The insurance data lifecycle part I: Data extraction

Insurance is a business of information. Policy submissions, claims applications, contracts, reports and assessments are all needed every day to carry out the basic work of insurance, and the paperwork adds up.

Until recently, insurance documentation had to be mined for usable data manually: underwriters and insurance brokers had to scan documents for data themselves, a task that can take up most of the working day and is often repeated across teams.

But modern technology means that manual data entry can be a thing of the past. Artificial intelligence and machine learning enable data extraction from insurance documents by computers, leaving insurance professionals free to deal with more important matters.

Which documents can AI & machine learning extract data from?

Machine learning can be used to extract data from a wide range of relevant insurance documents, including submissions, bordereaux files and loss runs. If it's got usable data, then that data can be extracted.

Bordereaux files are a fantastic illustration of how automated data extraction benefits insurers, both in time saved and in the quality of the end result. We've written before about how bordereaux reports can be analysed for data, citing an analysis in which we used data extraction tools to structure and standardise thousands of data sets.

As part of our analysis, we extracted a number of data points relating to written, premium and claim bordereaux including loss description, date of loss, and total incurred losses. In total, we were able to extract over 90% of the reports' key data points, which resulted in a master table comprising hundreds of thousands of rows of data.

The final result of our analysis was a structured database that stores key data in a standard format, making it easy to scan and analyse the data automatically using tools like Tableau. It also makes the data available for re-use to reduce rekeying and improve and automate decision making.
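To make the idea of a "standard format" concrete, here is a minimal Python sketch of how rows from differently laid-out bordereaux might be mapped into one schema. It is an illustration only, not the tooling used in the analysis above: the header aliases, the accepted date formats and the function names (standardise_row, extraction_rate) are all assumptions chosen for the example.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical header variants mapped to one standard field name.
HEADER_ALIASES = {
    "loss description": "loss_description",
    "description of loss": "loss_description",
    "date of loss": "date_of_loss",
    "loss date": "date_of_loss",
    "total incurred": "total_incurred",
    "total incurred losses": "total_incurred",
}

# Assumed date layouts; real files would likely need more.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d")


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Strip currency codes and thousands separators, then parse as Decimal."""
    cleaned = re.sub(r"[^\d.\-]", "", text)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def standardise_row(raw_row):
    """Map one raw row (header -> text) onto the standard schema."""
    row = {"loss_description": None, "date_of_loss": None, "total_incurred": None}
    for header, value in raw_row.items():
        field = HEADER_ALIASES.get(header.strip().lower())
        if field is None:
            continue  # unknown column: ignored in this sketch
        if field == "date_of_loss":
            row[field] = parse_date(value)
        elif field == "total_incurred":
            row[field] = parse_amount(value)
        else:
            row[field] = value.strip() or None
    return row


def extraction_rate(rows):
    """Share of standard fields that were successfully filled."""
    total = sum(len(r) for r in rows)
    filled = sum(1 for r in rows for v in r.values() if v is not None)
    return filled / total if total else 0.0
```

Once every row shares the same field names and value types, the rows can be loaded into a single master table, and a figure such as the share of key data points extracted becomes a simple count over that table.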

Why is automated data extraction in insurance a game-changer?

If the significance of automated data extraction isn't clear yet, it's time to look at the bigger picture. The insurance industry is evolving; modern technology is changing not just how insurers work but also what policyholders expect from their insurers.

Many of the biggest trends in insurance are driven by big data; that is, huge volumes of data that can't be analysed and understood without the help of machine learning. And the more time that passes, the bigger this data set gets; by one often-quoted estimate, 90% of all the world's data was created in the past two years.

Big data is a crucial aspect of ‘Internet of Things’ (IoT) insurance devices, such as car telematics, and it's also a necessary step for insurers hoping to offer more personalised insurance policies which truly meet the needs (and reflect the risk level) of the individual. Experts are predicting that the world will be home to over a trillion connected devices - all sources of possible data for insurers - by 2025.

Insurers have a huge amount of customer data at their fingertips, but until recently it wasn't accessible; today, with the help of ML, big data can finally be used to offer superior service and gain an edge over the competition.

How is data extracted from insurance documents?

Artificial intelligence and, more specifically, machine learning can be used to extract all kinds of data points from insurance documents. Data capture isn't new technology, but the level of accuracy and reliability that's needed in insurance data extraction means that only contemporary methods will do - which brings us to Optical Character Recognition, the benchmark of accurate data extraction for insurers.

Optical Character Recognition

Optical Character Recognition, or OCR, is an important aspect of digitisation in the insurance industry and across other sectors. With OCR, computers can scan images of handwritten or typed documents and convert what they find into machine-readable text. When paired with Natural Language Processing (NLP), OCR means that machine learning software can learn to 'read' documents, understanding where various data points belong in a structured format and logging them for later analysis.
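As a rough illustration of the step that follows OCR, the Python sketch below takes text that an OCR engine has already produced and pulls a few data points into a structured record. It deliberately swaps in simple regular expressions where a production system would use a trained NLP model, and the sample text, field names and patterns are all hypothetical.

```python
import re

# Hypothetical sample of text as it might come out of an OCR engine.
OCR_TEXT = """
CLAIM NOTIFICATION
Policy Number: PN-204817
Date of Loss: 14/09/2022
Loss Description: Storm damage to warehouse roof
Total Incurred: 18,400.00
"""

# Assumed label patterns; a trained model would replace these in practice.
FIELD_PATTERNS = {
    "policy_number": re.compile(r"Policy\s+Number\s*:\s*(\S+)", re.IGNORECASE),
    "date_of_loss": re.compile(
        r"Date\s+of\s+Loss\s*:\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE
    ),
    "loss_description": re.compile(r"Loss\s+Description\s*:\s*(.+)", re.IGNORECASE),
    "total_incurred": re.compile(
        r"Total\s+Incurred\s*:\s*([\d,]+(?:\.\d+)?)", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return a dict of field name -> extracted text (None when not found)."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1).strip() if match else None
    return record


if __name__ == "__main__":
    for field, value in extract_fields(OCR_TEXT).items():
        print(f"{field}: {value}")
```

Fields that cannot be found are returned as None rather than guessed, so gaps stay visible for later review instead of being silently filled in.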

OCR can help insurers to extract data from documents in a myriad of ways:

It is, clearly, a time-saver. Large insurance companies don't have the staff to manually pore over the millions of documents they receive a day; OCR makes this doable and frees staff up for other tasks too.
OCR means better service. Automating data extraction methods enables insurers to offer convenient, fast access to policy quotes and comparisons that may take other insurers days to put together. In today's competitive market, the importance of convenience can't be overestimated.
OCR can help insurers to add value and improve the quality of their service. Compared to manual data extraction, using OCR actually means fewer errors and omissions.
Because of the sheer quantity of data that can be captured, OCR enables insurers to make more accurate predictions and risk assessments, in turn meaning that premiums can be priced fairly for all customers.

What does automated data extraction mean for the industry?

The implications of automated data extraction for insurers are huge. The changes to data extraction in insurance mean that big data can finally be utilised to its full potential, but automated data extraction isn't the end of this story. The next stage in the data lifecycle is data augmentation, where extracted data is enriched with data from other sources to better inform decision making in insurance.

The push towards big data and automation has been made possible thanks to AI, a technology that is still developing and improving every year. At first glance, the number of AI options available to insurers can look overwhelming, which is why at Artificial we put together an AI buyer's guide to help insurance professionals make smart, informed decisions before settling on an AI product.

This article is the first part of our three-part series on ‘Data Lifecycle in Insurance’. Be sure to sign up to our newsletter below to receive parts two and three in your inbox.