Optical character recognition (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photograph), or subtitle text superimposed on an image (for example, from a television broadcast).
Widely used to enter data from printed paper records (passports, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any other suitable documentation), it is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as cognitive computing, machine translation, and (extracted) text-to-speech. OCR is a field of research spanning pattern recognition, artificial intelligence, and computer vision.
Early versions had to be trained with images of each individual character and worked on a single typeface at a time. Advanced systems that recognize most typefaces with a high degree of accuracy are now widely available, as are systems that accept a variety of digital image file formats as input. Some systems can reproduce formatted output that closely approximates the original page, including images, columns, and other non-textual components.
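To make the idea of per-character, single-typeface training concrete, here is a minimal sketch of template matching, one simple way such a recognizer could work. It is an illustration under stated assumptions, not a description of any particular historical system: the 5x5 bitmaps, the `TEMPLATES` table, and the function names are all hypothetical, and the distance measure (a pixel-by-pixel Hamming distance) was chosen only for simplicity.

```python
# Hypothetical 5x5 bitmaps for three glyphs of a single, made-up typeface.
# "#" is an inked pixel, "." is background.
TEMPLATES = {
    "I": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "#####"],
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
    "T": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
}


def to_bits(rows):
    """Flatten a list of bitmap rows into a list of booleans."""
    return [ch == "#" for row in rows for ch in row]


def hamming(a, b):
    """Count the positions at which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("bitmaps must have the same size")
    return sum(x != y for x, y in zip(a, b))


def recognize(glyph_rows):
    """Return the template label with the fewest differing pixels."""
    bits = to_bits(glyph_rows)
    return min(TEMPLATES, key=lambda name: hamming(bits, to_bits(TEMPLATES[name])))


# A scanned "T" with one pixel lost to noise.
noisy_t = ["#####",
           "..#..",
           "..#..",
           ".....",
           "..#.."]

print(recognize(noisy_t))
```

Because the templates encode the pixel pattern of one typeface only, a glyph set in a different font or at a different size would be compared against the wrong shapes. That limitation is what the typeface-independent systems described above address.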