Our Customers

 

Deep Learning Based Document Data Extraction

Text Extraction from Bank Statement for Fintech Client

A Growing Fintech Company

Business Problem :

The fintech company runs a loan origination platform on which loan applicants upload bank statements as part of their applications. The client already had an OCR solution to extract data from bank statements, but it could not deliver the desired accuracy on complex statements, and every change in a bank statement format required coding effort. The client was looking for a solution that could accommodate new bank statement formats with minimal or no coding effort.

Technology Used :

Python, CTPN, OpenCV, Deep Learning, Tesseract, Node.JS, React.JS, MongoDB

Solution :

Traditional OCR-based data extraction relies on fixed coordinates. If the structure of the input changes, the OCR solution fails. Also, if the image is very noisy, the OCR-based solution gives very poor extraction accuracy.
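The coordinate dependence described above can be illustrated with a small sketch. It is not the client's original OCR system; the `Word` type, the `ACCOUNT_NO_BOX` template, and the sample values are all hypothetical, and recognized words are assumed to arrive with pixel positions.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels


def extract_by_coordinates(words, box):
    """Return the text of all words whose top-left corner falls inside box.

    box is (x_min, y_min, x_max, y_max), a hand-written template region.
    """
    x_min, y_min, x_max, y_max = box
    hits = [w for w in words if x_min <= w.x <= x_max and y_min <= w.y <= y_max]
    return " ".join(w.text for w in sorted(hits, key=lambda w: w.x))


# Hypothetical template: the account number is expected in this region.
ACCOUNT_NO_BOX = (200, 40, 400, 60)

# Layout the template was written for.
old_layout = [Word("Account", 50, 50), Word("No:", 120, 50), Word("1234567890", 210, 50)]

# Same content, but the header block has moved 100 px further down the page.
new_layout = [Word(w.text, w.x, w.y + 100) for w in old_layout]

print(repr(extract_by_coordinates(old_layout, ACCOUNT_NO_BOX)))  # '1234567890'
print(repr(extract_by_coordinates(new_layout, ACCOUNT_NO_BOX)))  # ''
```

The template returns the account number for the layout it was written for and nothing once the block moves, which is why each format change needed new coding.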

To address this complex business problem, we combined deep learning with OCR. The deep learning based OCR solution involved image pre-processing to improve image resolution, automatic marking of the regions of interest, and text extraction and recognition. OpenCV was used for image processing, and CTPN was used to automatically mark the regions of interest and detect text. Tesseract was used for text extraction. The application was developed using Node.js and React.js.
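The three stages named above (pre-processing, region detection, recognition) can be sketched as a small pipeline. This is only a structural outline under stated assumptions: the function `extract_text` and the three stand-in stages are hypothetical, and in the delivered solution OpenCV, CTPN, and Tesseract would fill those roles.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]            # grayscale pixels, row-major
Box = Tuple[int, int, int, int]    # (x, y, width, height)


def extract_text(
    image: Image,
    preprocess: Callable[[Image], Image],
    detect_regions: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Run pre-processing, region detection, and recognition in sequence."""
    cleaned = preprocess(image)
    results = []
    for x, y, w, h in detect_regions(cleaned):
        # Crop the detected region and hand only that crop to the recognizer.
        crop = [row[x:x + w] for row in cleaned[y:y + h]]
        results.append(((x, y, w, h), recognize(crop)))
    return results


# Stand-in stages so the sketch runs without OpenCV, CTPN, or Tesseract.
def binarize(image: Image) -> Image:
    return [[255 if p > 127 else 0 for p in row] for row in image]


def whole_image_as_one_region(image: Image) -> List[Box]:
    return [(0, 0, len(image[0]), len(image))]


def describe_crop(crop: Image) -> str:
    return f"{len(crop[0])}x{len(crop)} crop"


page = [[10, 200, 30], [250, 90, 140]]
print(extract_text(page, binarize, whole_image_as_one_region, describe_crop))
```

Because the detector proposes the regions at run time, nothing in the pipeline is tied to a fixed page layout.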

The business problem was challenging, especially handling the changing format of the input documents. We delivered the solution with high extraction accuracy and saved the substantial manual effort involved in the data extraction process.

Text Extraction from Handwritten Insurance Claims Form

A General Insurance Client of an IT Service Provider

Business Problem :

A general insurance company in Europe wanted to extract key fields, such as vehicle number and insurance number, from a handwritten accident insurance claims form. The desired text extraction accuracy was 70%, but the insurance company was achieving only around 40%.
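The case study does not say how extraction accuracy was measured. One common choice, assumed here purely for illustration, is field-level exact match: the share of expected fields whose extracted value equals the ground truth. The function name, field names, and values below are made up.

```python
def field_accuracy(predicted, expected):
    """Fraction of expected fields whose extracted value matches exactly.

    Values are compared after trimming whitespace and ignoring case.
    """
    if not expected:
        raise ValueError("expected must contain at least one field")
    correct = sum(
        1
        for name, truth in expected.items()
        if predicted.get(name, "").strip().upper() == truth.strip().upper()
    )
    return correct / len(expected)


# Made-up values for one form; field names are illustrative only.
expected = {
    "vehicle_number": "1234 ABC",
    "insurance_number": "POL-00987",
    "claim_date": "12/03/2019",
    "driver_name": "MARIA LOPEZ",
    "policy_holder": "JUAN LOPEZ",
}
predicted = {
    "vehicle_number": "1234 abc",      # matches after case folding
    "insurance_number": "POL-00987",
    "claim_date": "12/08/2019",        # one digit misread
    "driver_name": "MARIA LOPEZ",
    # policy_holder was not extracted at all
}

print(f"{field_accuracy(predicted, expected):.0%}")  # 60%
```

Three of the five fields match, so this form scores 60%. A per-character metric would give a different figure for the same output.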

Technology Used :

Python, CTPN, Google OCR, Flask, OpenCV, Deep Learning

Solution :

The handwritten claims forms were supplied as images. The challenge was to identify the regions of interest in the form, extract the text, and recognize the digits and characters, which were in Spanish.

The technical solution was divided into reading the image, automatically marking the regions of interest, and text extraction and recognition. OpenCV was used for image processing, and CTPN, which uses deep learning for text detection, was used to automatically mark the regions of interest and detect text. For text extraction, we first tried Tesseract but did not get the desired accuracy, so we switched to Google OCR, with which we achieved acceptable text extraction accuracy. The model was invoked through a Flask API.

The overall solution was complex, but by carefully combining the solution components we delivered an accuracy of over 80%, exceeding the client's expectation.

Document Data Extraction for a Growing Automation Company

Growing Automation Company

Business Problem :

Piping and instrumentation diagrams (P&IDs) are used extensively in the process industry. Data extracted from these diagrams is used to prepare the bill of material. This extraction is a manual activity that is time-consuming and error-prone. The customer wanted to automate the process.

Technology Used :

Python, OpenCV, Tensorflow, Tesseract OCR

Solution :

The solution was developed using deep learning techniques for object detection. Symbol recognition is implemented using the Faster R-CNN algorithm. The document image is pre-processed using OpenCV-based image augmentation techniques. The encoding inside each symbol is extracted using OCR. The tabular text outside the main drawing is extracted using a Python-based Tesseract OCR component.
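How detected symbols might be turned into a bill of material can be sketched as follows. This covers only the step after detection and OCR, not the Faster R-CNN model itself. The `Detection` record, the 0.5 confidence cutoff, and the sample labels and tags are assumptions for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Detection(NamedTuple):
    label: str      # symbol class predicted by the detector
    score: float    # detector confidence in [0, 1]
    tag: str        # text read from inside the symbol by OCR, "" if none


def bill_of_material(detections, min_score=0.5):
    """Count confident detections per symbol class and collect their tags."""
    kept = [d for d in detections if d.score >= min_score]
    counts = Counter(d.label for d in kept)
    return {
        label: {
            "quantity": qty,
            "tags": sorted(d.tag for d in kept if d.label == label and d.tag),
        }
        for label, qty in sorted(counts.items())
    }


detections = [
    Detection("gate_valve", 0.97, "V-101"),
    Detection("gate_valve", 0.91, "V-102"),
    Detection("pump", 0.88, "P-201"),
    Detection("gate_valve", 0.32, ""),   # low confidence, dropped
]

for label, row in bill_of_material(detections).items():
    print(label, row["quantity"], row["tags"])
```

The confidence cutoff trades missed symbols against spurious ones, so it would need tuning against manually prepared bills of material.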

Automated Number Plate Recognition for CCTV Implementation Company

CCTV Implementation Company

Business Problem :

The CCTV implementation company partnered with us to implement an automatic number plate recognition system for a premium club in Pune, India.
Vehicles had to be verified manually at the entry gate, which was time-consuming and inconvenient for the club members. The management wanted to automate vehicle verification and entry at the gate.

Technology Used :

Python, OpenCV, Tensorflow, Deep Learning

Solution :

The solution involved detecting the number plate and recognizing the characters on it. Number plate detection is implemented using a deep learning based Faster R-CNN object detection model. The detected number plate is extracted as an image. In the current phase, character recognition is implemented for English only, using a deep neural network that identifies the individual letters and digits on the number plate.
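Because characters are identified individually, they have to be put back in reading order before the plate can be verified. A minimal sketch of that step, assuming a single-line plate and a hypothetical list of registered member plates (the detections and plate numbers are made up):

```python
import re


def assemble_plate(char_detections):
    """Join per-character predictions into one string, left to right.

    Each detection is (character, x_center) for a single-line plate.
    """
    ordered = sorted(char_detections, key=lambda d: d[1])
    return "".join(ch for ch, _ in ordered)


def normalize(plate):
    """Upper-case and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", plate.upper())


def is_registered(plate, registered_plates):
    """Check a recognized plate against the member list, ignoring spacing."""
    return normalize(plate) in {normalize(p) for p in registered_plates}


# Made-up detections, deliberately out of order, and a made-up member list.
detections = [("2", 52), ("M", 10), ("1", 38), ("H", 20), ("A", 66),
              ("B", 76), ("1", 90), ("2", 100), ("3", 110), ("4", 120)]
members = ["MH 12 AB 1234", "MH 14 CD 5678"]

plate = assemble_plate(detections)
print(plate, is_registered(plate, members))  # MH12AB1234 True
```

Two-line plates would need the detections grouped by row before sorting, which this sketch does not attempt.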

Financial Information Data Lake

Financial Information Provider in India

Business Problem :

Our client sells financial information about companies to other stakeholders. This information is gathered from filings submitted to the MCA, GST returns, and other sources. The data is available as PDF, as XBRL, and through APIs. The existing process took up to 8 hours to provide the required details, which the client wanted to cut down to 2 hours.

Technology Used :

Python Django, React, Deep Learning

Solution :

Information is retrieved through the APIs, a parser was written to extract details from XBRL, and STY's Docparser solution is used to extract data from the PDF documents. The information from all three sources is integrated and stored in the data lake. A web application was developed to access this information at any time. Interested parties need to buy a subscription to access the information. The solution significantly reduced manual effort and helped meet customer SLAs.
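The XBRL parsing and three-source integration can be sketched with the Python standard library. This is an illustration under stated assumptions, not the delivered parser: the sample document, the `fin` namespace, the element names, the company identifier, and the record layout are all invented.

```python
import xml.etree.ElementTree as ET

# Tiny XBRL-like instance document; the fin namespace and element names are invented.
SAMPLE_XBRL = """\
<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:fin="http://example.com/fin">
  <fin:Revenue contextRef="FY2020" unitRef="INR">1500000</fin:Revenue>
  <fin:NetProfit contextRef="FY2020" unitRef="INR">120000</fin:NetProfit>
</xbrl>
"""


def parse_xbrl_facts(xml_text):
    """Return {local_element_name: text} for every element carrying a contextRef."""
    root = ET.fromstring(xml_text)
    facts = {}
    for elem in root.iter():
        if "contextRef" in elem.attrib:
            local_name = elem.tag.split("}", 1)[-1]  # strip "{namespace}" prefix
            facts[local_name] = (elem.text or "").strip()
    return facts


def build_record(company_id, api_data, xbrl_facts, pdf_fields):
    """Combine the three sources into one record, keeping each source separate."""
    return {
        "company_id": company_id,
        "api": dict(api_data),
        "xbrl": dict(xbrl_facts),
        "pdf": dict(pdf_fields),
    }


record = build_record(
    "COMPANY-001",
    api_data={"status": "Active"},
    xbrl_facts=parse_xbrl_facts(SAMPLE_XBRL),
    pdf_fields={"auditor": "Example & Co"},
)
print(record["xbrl"])  # {'Revenue': '1500000', 'NetProfit': '120000'}
```

Keeping each source under its own key preserves provenance, so conflicting values can be traced back to the API, the XBRL filing, or the PDF.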

Claim Form Data Extraction

Claim Processing BPO based in India

Business Problem :

Our client wanted to improve the data extraction accuracy for UB and HCFA claim forms. They were getting an accuracy of 30% with the existing application. The scanned claim forms did not have a fixed layout, and the document quality was poor.

Technology Used :

Python, Deep Learning, OCR

Solution :

Our document data extraction solution was trained on the UB and HCFA claim forms. The solution has built-in capabilities for improving document quality, denoising images, and correcting tilt. A classification model was trained to identify the relevant documents from which data needs to be extracted. Table extraction models were trained to extract tabular data from these documents more accurately. The solution was hosted on-premise and integrated with the client's downstream application.
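As one illustration of the denoising step, a 3x3 median filter removes isolated specks from a scan. The pure-Python version below is a stand-in chosen so the sketch runs without extra libraries; it is an assumption, not the solution's actual method. In practice, a library routine such as OpenCV's `cv2.medianBlur` would normally be used.

```python
from statistics import median


def median_filter_3x3(image):
    """Replace each interior pixel with the median of its 3x3 neighborhood.

    image is a list of rows of grayscale values; border pixels are copied as-is.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


# White page (255) with one isolated black speck (0) in the middle.
noisy = [
    [255, 255, 255, 255],
    [255,   0, 255, 255],
    [255, 255, 255, 255],
]
print(median_filter_3x3(noisy))
```

The speck disappears because eight of the nine values in its neighborhood are white. Genuine pen strokes are wider than one pixel and are less affected by the filter.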