Optical character recognition (OCR) has been part of the digital transformation toolkit since its inception. Over the years, OCR accuracy rates have continued to climb, but what does a claim like “99% OCR accuracy” mean, in practical terms, for companies that rely on it for business-critical tasks such as invoice processing?
The answer is more complex, and less reassuring, than you might imagine. Despite its relative maturity in the field of artificial intelligence, OCR remains better suited to a support role than to serving as a primary driver of digital transformation or as a comprehensive automation solution. Fortunately, by investing in the proper tools and preparing the information being processed as thoroughly as possible, you can improve the accuracy rate of your OCR software and minimize errors while maximizing performance.
OCR Accuracy: More Than Just a Number
When discussing optical character recognition, it’s important to have a grasp of the basic concepts involved, and the actual meaning of the terminology used.
OCR is text recognition and data extraction technology; it converts printed text to digital images, then parses those images using a combination of data analysis technologies to render digitally editable text. Theoretically, OCR eliminates the need for manual data entry and mitigates the need for human intervention in common business processes such as purchase order creation and invoice processing.
In practice, OCR can be, well, a bit fiddly. Which is why a term like “99% accuracy” must be carefully examined, since the context is so crucial for business purposes. Generally speaking, these claims refer to:
- Character-level accuracy, meaning the percentage of scanned characters that are recognized correctly;
- Word-level accuracy, meaning the percentage of scanned words that are recognized correctly (a single wrong character makes the whole word wrong);
- Page-level accuracy, meaning the percentage of the content on any given page that is accurately read and transferred.
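To make the difference between character-level and word-level figures concrete, here is a minimal sketch that scores OCR output against a known-correct reference. It assumes accuracy is calculated as one minus the edit (Levenshtein) distance divided by the reference length, which is a common convention but not necessarily the one a given vendor uses. The function names and sample strings are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref_units, hyp_units):
    """1 - (edit distance / reference length), floored at zero."""
    if not ref_units:
        return 1.0 if not hyp_units else 0.0
    return max(0.0, 1.0 - edit_distance(ref_units, hyp_units) / len(ref_units))


def char_accuracy(reference, ocr_output):
    return accuracy(reference, ocr_output)


def word_accuracy(reference, ocr_output):
    return accuracy(reference.split(), ocr_output.split())


# Illustrative sample: two zeros misread as the letter "O".
reference = "Invoice 10045 total due 1200.00"
ocr_output = "Invoice 1OO45 total due 1200.00"

print(f"character-level: {char_accuracy(reference, ocr_output):.1%}")
print(f"word-level:      {word_accuracy(reference, ocr_output):.1%}")
```

The same two misread characters cost far more at the word level than at the character level, because the whole invoice number counts as one wrong word out of five.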
Some OCR systems claim accuracy rates as high as 99.9%. Yet even if only one character in 1,000 is misread or skipped (a 0.1% error rate), the practical accuracy of the OCR takes a serious hit if those errors fall in the wrong place. What matters for business purposes is whether each field in a document is captured without error, and how much trust you can place in the field-level confidence score, the OCR engine's own estimate of how likely a given field is to be correct.
For example, if an invoice has 10,000 characters, and the OCR application misreads or misses 0.1% of them, it might not seem like a major issue, unless all ten of those errors are in data fields containing transaction-specific information, such as the purchase order number, price, item name, or quantities.
Suddenly, the practical OCR accuracy rate has dropped precipitously.
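The arithmetic behind that drop can be sketched in a few lines. The 10,000-character invoice and the 99.9% figure come from the example above. The count of 20 key fields and the assumption that errors are independent are hypothetical, chosen only to illustrate the gap between character-level and field-level results.

```python
def expected_errors(total_chars, char_accuracy):
    """Average number of misread characters at a given character-level accuracy."""
    return total_chars * (1.0 - char_accuracy)


def field_level_accuracy(total_fields, fields_with_errors):
    """Share of fields captured with no errors at all."""
    return (total_fields - fields_with_errors) / total_fields


def clean_field_probability(field_length, char_accuracy):
    """Chance that every character in a field is read correctly,
    assuming errors are independent and evenly spread (a simplification)."""
    return char_accuracy ** field_length


# The invoice from the text: 10,000 characters at 99.9% character accuracy.
errors = expected_errors(10_000, 0.999)

# Hypothetical worst case: the invoice has 20 key fields and each of the
# ten errors lands in a different one.
worst_case = field_level_accuracy(total_fields=20, fields_with_errors=10)

print(f"expected misreads: {errors:.0f}")
print(f"field-level accuracy in the worst case: {worst_case:.0%}")
```

In this hypothetical worst case the character-level accuracy is still 99.9%, yet only half of the key fields are usable without correction.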
In light of these considerations, it’s critical to understand what technologies empower and improve OCR, as well as the techniques required to ensure the best possible input for the best possible OCR results.
OCR Accuracy Depends on Artificial Intelligence Technologies
It’s certainly a core component of modern automation solutions, but OCR is not a standalone solution. OCR converts unstructured data into digital text, but its output requires further analysis (and often correction) to achieve maximum accessibility and utility.
Consequently, OCR requires more advanced technologies to be truly useful in modern business processes. Three such technologies work together to support modern OCR functionality:
- Computer vision, or the automated extraction of useful data from images. The goal of computer vision is to help the computer “see” in the same way a human does, recognizing text as separate from the background and other visual content within an image. It is related to, but distinct from, image processing, which transforms an image (for example, by sharpening or de-skewing it) without interpreting what it contains.
- Natural Language Processing (NLP), or a series of algorithms used to parse the text extracted by computer vision for linguistic content (i.e., words and sentences). NLP is powered by artificial intelligence, and combines recognition of established characters and word forms with probability analysis to fill in any “holes” it finds in a scanned text with the most likely word(s) that fit the surrounding context.
- Supervised deep learning, a branch of machine learning in which neural networks are trained iteratively on labeled examples. With time and human supervision, OCR software using deep learning can be taught to recognize the same characters, words, and sentences written in a wide variety of fonts, as well as correct common errors that generic OCR tools would simply skip. Deep learning is also used in training OCR software to recognize more advanced character sets, such as cursive script (handwritten and otherwise).
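As a rough illustration of the "most likely word" idea described above, the sketch below repairs misread words by matching them against a list of known words. It is a deliberate simplification: real NLP-based correction weighs the surrounding context, whereas this uses only string similarity from Python's standard `difflib` module. The vocabulary, the cutoff value, and the function names are all assumptions made for the example.

```python
import difflib

# Hypothetical domain vocabulary; a real system would use a far larger
# lexicon or a language model.
VOCABULARY = ["invoice", "total", "due", "quantity", "purchase", "order", "amount"]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest known word, or the original if nothing is close enough."""
    lowered = word.lower()
    if lowered in vocabulary:
        return word
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text, vocabulary=VOCABULARY, cutoff=0.75):
    """Apply word-by-word correction to whitespace-separated OCR output."""
    return " ".join(correct_word(w, vocabulary, cutoff) for w in text.split())


# "l" read in place of "i", and "1" read in place of "l".
print(correct_text("lnvoice tota1 due"))
```

A cutoff that is too low will "correct" legitimate words such as vendor names, so any threshold like this one would need tuning against real documents.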
Common Roadblocks to Acceptable OCR Accuracy
Even with help from state-of-the-art artificial intelligence technologies, OCR simply can’t operate consistently at the human level. Accuracy levels vary for a number of reasons (not the least of which is the ongoing quest to achieve true artificial intelligence that thinks, reasons, and interacts with the world as we do—including text recognition), but some of the most common challenges are:
The Chosen OCR Engine
Like many other technologies, OCR comes in a wide variety of “flavors,” from open source to “freemium” to proprietary, purpose-built applications. All OCR solutions rely on a similar set of algorithms to do their work. But, as with other technologies, it pays to find the OCR solution that fits your specific needs, rather than look for some kind of jack-of-all-trades that might seem like a bargain until you’re putting it through its paces.
Open-source solutions such as Tesseract boast high accuracy from the jump, but require additional training and adjustments (e.g., image pre-processing) to reach enterprise-acceptable levels of performance.
For the best possible results, investing in a purpose-built, cloud-based procure-to-pay (P2P) and data management solution like Planergy can help. Pairing best-in-class OCR with advanced robotic process automation (RPA), deep data analytics, and powerful data centralization and organization capabilities eliminates the need for manual data entry, supports electronic invoicing, and makes it much easier to collect, analyze, and use data from a wide variety of sources (including OCR scans of electronic as well as physical documents).
Text Formatting
Quality OCR output depends heavily on high-quality input, and issues with the text itself can make the algorithms work harder than they have to when performing data extraction.
- Font overload. Documents with more than two fonts or a wide range of different sized text can be difficult for algorithms to process effectively.
- Handwriting. Whether it’s perfect Palmer script or a doctor’s chicken scratch, handwritten text can tax recognition at the character level.
- Poorly distinguished character sets. Some fonts don’t offer much visual distinction between easily confused characters such as “0” and “O” or “6” and “G”.
- Different alphabets. If the OCR system encounters a document written in an alphabet it hasn’t been trained to recognize (e.g., Cyrillic, Arabic, etc.), it will likely fail to recognize much of the content until such training has been performed.
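For fields that are known to contain only numbers, such as quantities or prices, easily confused characters can often be repaired after recognition. The sketch below is one possible approach, not a feature of any particular OCR product. The “0”/“O” and “6”/“G” pairs come from the list above, while the other substitutions and the function name are assumptions added for illustration.

```python
# Letters commonly misread in place of digits. "O" -> "0" and "G" -> "6"
# come from the text; the remaining pairs are illustrative assumptions.
CONFUSABLE_TO_DIGIT = str.maketrans({
    "O": "0", "o": "0",
    "G": "6",
    "I": "1", "l": "1",
    "S": "5",
    "B": "8",
})


def normalize_numeric_field(raw):
    """Map look-alike letters to digits in a field expected to be numeric.

    Returns the cleaned string, or None if the result still is not a plain
    number (at most one decimal point), so it can be sent for human review.
    """
    candidate = raw.strip().translate(CONFUSABLE_TO_DIGIT)
    if candidate.replace(".", "", 1).isdigit():
        return candidate
    return None


print(normalize_numeric_field("1O45"))     # letter O in an order number
print(normalize_numeric_field("l2OO.5O"))  # several look-alikes in a price
print(normalize_numeric_field("AB-12"))    # not recoverable, flagged as None
```

This only makes sense when the field type is known in advance. Applied to free text, the same mapping would corrupt ordinary words.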
Quality Issues with Original Images and Scanned Documents
Sometimes it’s not the text itself, but how the document has been printed or scanned that can compromise OCR results.
- Colored paper and background images. Colorful or graphically “busy” backgrounds can confuse the OCR system as it tries to separate text from the rest of the scanned image.
- Poor image quality. Even humans struggle with blurriness and glare. A printed document with a low dots per inch (DPI) resolution will be harder to analyze than one with a high DPI and clear, crisp characters. Text recognition will falter if the OCR engine can’t make out clear shapes to analyze and identify.
- Poor image alignment. If documents are scanned at a skewed angle or scanned in a non-standard way (e.g., scanning a standard invoice printed on letter or A4 paper in the “landscape” orientation), text alignment and OCR performance will both suffer.
Improving OCR Accuracy
Beyond investing in a purpose-built P2P solution like Planergy, you can obtain optimal OCR results and minimize the attendant headaches by prioritizing image quality.
- Standardize form sizes, fonts, and formatting for all documents (including invoices) for clarity. Communicate these requirements to any vendor who is still using paper invoices.
- Use a single approved file format, such as .tiff.
- Make sure all documents being scanned by your OCR engine have:
  - Clear, high-contrast text (ideally, black text on a white background).
  - Minimal noise.
  - The highest possible DPI during scanning for both documents scanned into your system and documents scanned elsewhere and submitted electronically (300 DPI is considered the minimum).
  - Well-aligned, properly oriented text. Skewed documents should be de-skewed during scan or resubmitted as appropriate.
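Parts of this checklist can be enforced automatically before a document ever reaches the OCR engine. The following is a minimal sketch of such an intake check. The 300 DPI minimum and the .tiff format come from the list above. The skew tolerance, the acceptance of the `.tif` spelling, and the `ScanInfo` structure are assumptions, and the sketch presumes that DPI and skew have already been measured by the scanner or another tool.

```python
from dataclasses import dataclass

MIN_DPI = 300                            # minimum stated in the checklist
APPROVED_EXTENSIONS = (".tiff", ".tif")  # ".tif" spelling added as an assumption
MAX_SKEW_DEGREES = 1.0                   # hypothetical tolerance


@dataclass
class ScanInfo:
    """Metadata about one scanned document (hypothetical structure)."""
    filename: str
    dpi: int
    skew_degrees: float


def intake_problems(scan):
    """Return a list of reasons the scan should be rejected or rescanned."""
    problems = []
    if not scan.filename.lower().endswith(APPROVED_EXTENSIONS):
        problems.append("unapproved file format")
    if scan.dpi < MIN_DPI:
        problems.append(f"resolution below {MIN_DPI} DPI")
    if abs(scan.skew_degrees) > MAX_SKEW_DEGREES:
        problems.append("document is skewed and needs de-skewing")
    return problems


good = ScanInfo("invoice_001.tiff", dpi=300, skew_degrees=0.2)
bad = ScanInfo("invoice_002.jpg", dpi=150, skew_degrees=4.0)

print(intake_problems(good))
print(intake_problems(bad))
```

Rejecting poor scans at intake is usually cheaper than correcting their OCR output afterward, though contrast and noise still need to be judged separately.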
Optimize Your OCR Processes for Peak Performance
It may be getting a little long in the tooth, but OCR technology can still play a vital part in optimizing your business processes. Ensure you have the best possible source material. Develop and implement the controls and processes that eliminate recognition roadblocks, and make sure to invest in best-in-class P2P software to ensure you’ve got top-notch OCR support. Your reward will be more efficient processes, better data management, and OCR output you can rely on when you need it most.