Adding handwritten data from PDF files to a dataset

The following flow shows the sequence of steps and the tools used to convert handwritten information in a public PDF file to text. Two steps are critical: OCR (optical character recognition) text extraction using the Azure Computer Vision API service, and KUTools, an Excel add-in that displays images from the cloud inside spreadsheet rows so that each image can be compared with its OCR output to validate accuracy.
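As a rough illustration of how OCR output can be prepared for row-by-row validation in Excel, the sketch below flattens an OCR response into one row per recognized line, with the image URL next to the text. It assumes the response follows the JSON layout of the Computer Vision Read API (analyzeResult, readResults, lines, text); the source does not say which API version was used. The function names, the sample response, and the example URL are hypothetical.

```python
import csv
import io


def ocr_lines_to_rows(image_url, read_response):
    """Flatten a Read-style OCR response into one dict per recognized line."""
    rows = []
    # Assumed layout: analyzeResult -> readResults (pages) -> lines -> text
    pages = read_response.get("analyzeResult", {}).get("readResults", [])
    for page in pages:
        for line in page.get("lines", []):
            rows.append(
                {
                    "image_url": image_url,
                    "page": page.get("page"),
                    "text": line.get("text", ""),
                }
            )
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text that can be opened in Excel."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["image_url", "page", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up response used only to exercise the functions above.
sample_response = {
    "status": "succeeded",
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "lines": [
                    {"text": "John Smith"},
                    {"text": "12/03/1987"},
                ],
            }
        ]
    },
}

sample_rows = ocr_lines_to_rows(
    "https://example.com/scans/page-001.png", sample_response
)
sample_csv = rows_to_csv(sample_rows)
```

Keeping the image URL in the same row as the extracted text is what lets an add-in that renders images from URLs show the original handwriting beside the OCR result.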

The whole process took a few hours, which included running multiple processes to extract different ranges of data.

The process successfully detected the correct information in more than 99% of cases.
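One way such an accuracy figure could be computed after manual validation is sketched below. It assumes each validated row pairs the OCR text with the value confirmed by a reviewer, and that a row counts as correct only on an exact match after trimming whitespace. The function name and the sample pairs are hypothetical and are not the data from this project.

```python
def ocr_accuracy(validated_rows):
    """Return the fraction of rows where the OCR text matches the reviewed value."""
    if not validated_rows:
        raise ValueError("at least one validated row is required")
    correct = sum(
        1
        for ocr_text, reviewed_text in validated_rows
        if ocr_text.strip() == reviewed_text.strip()
    )
    return correct / len(validated_rows)


# Made-up pairs of (OCR output, value confirmed by the reviewer).
example_pairs = [
    ("John Smith", "John Smith"),
    ("12/03/1987", "12/03/1987"),
    ("Sprlngfield", "Springfield"),
    (" 4021 ", "4021"),
]

example_accuracy = ocr_accuracy(example_pairs)
```

Exact matching is a strict criterion; a more lenient measure, such as character-level similarity, would report higher values for near misses.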