Optical Character Recognition (OCR) services for digitizing texts
At SIGNEWORDS we convert scanned documents, images and PDF files into Microsoft Word documents or other editable text formats. This service turns the content of a paper document or digital image into text you can edit.
Automating character input, rather than typing the text on a keyboard, saves significant time and increases productivity. We always aim to maintain (or even improve on) the quality of the original.
Optical Character Recognition (OCR) is an artificial intelligence application that automatically identifies characters or symbols in an image. A scanner sends the image of the text to the computer's OCR program, which then tries to identify each letter in order to turn the content into editable text.
With an ideal image (one with only two grey levels, that is, pure black and white), each character or symbol is recognized by comparing it with stored patterns that cover all the possible characters.
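The pattern comparison described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the 3x5 glyphs, the PATTERNS table and the function names are hypothetical, and real OCR engines use far more robust methods than counting matching pixels.

```python
# Hypothetical 3x5 two-level glyphs: "#" is an ink pixel, "." is background.
PATTERNS = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def match_score(glyph, pattern):
    """Count the pixels on which two equally sized bitmaps agree."""
    return sum(
        g == p
        for glyph_row, pattern_row in zip(glyph, pattern)
        for g, p in zip(glyph_row, pattern_row)
    )


def recognize(glyph, patterns=PATTERNS):
    """Return the label of the stored pattern that agrees with the glyph on the most pixels."""
    return max(patterns, key=lambda label: match_score(glyph, patterns[label]))


# A "T" with one flipped pixel still agrees with the "T" pattern on the most pixels.
noisy_t = ["###", ".#.", ".#.", "##.", ".#."]
print(recognize(noisy_t))  # prints "T"
```

Choosing the best-scoring pattern instead of requiring an exact match lets the comparison tolerate a small number of wrong pixels.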
Real images are rarely perfect, so several problems may arise with OCR:
– There may be noise, that is to say, dark areas that the program mistakenly identifies as text.
– There may be grey levels that do not belong to the original image and that confuse the program when it converts the image into text.
– The connection of two or more characters via shared pixels can also cause errors.
– Characters may be separated incorrectly when there is no fixed spacing between them.
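The first two problems in the list above are often reduced by cleaning the image before recognition. The sketch below is a hedged illustration, not the method of any particular OCR program: it assumes grey values from 0 (black) to 255 (white), a fixed threshold of 128, and hypothetical function names. It reduces the image to two levels and then clears dark pixels that have no dark neighbours.

```python
def binarize(gray, threshold=128):
    """Map each grey value (0 = black, 255 = white) to 1 (ink) or 0 (background)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def remove_isolated_pixels(binary):
    """Clear ink pixels that have no ink among their eight neighbours."""
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                cleaned[y][x] = 0  # treat a lone dark pixel as noise
    return cleaned


# Assumed sample: a vertical stroke on the left and a lone dark speck at the top right.
gray = [
    [250, 240, 245, 250, 30],
    [20, 25, 235, 240, 250],
    [30, 200, 245, 250, 240],
    [25, 180, 250, 245, 250],
]
binary = binarize(gray)
cleaned = remove_isolated_pixels(binary)
for row in cleaned:
    print("".join("#" if pixel else "." for pixel in row))
```

A fixed threshold is the simplest choice and works only when lighting is even; it is used here to keep the example short.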
List of programs
You can find different commercial OCR programs, for example ABBYY FineReader, AnyDoc Software, Brainware, ExperVision TypeReader & RTK, Image to OCR Converter, Microsoft Office Document Imaging, Microsoft Office OneNote 2007, Nicomsoft OCR, OmniPage, Readiris, ReadSoft, RelayFax, Scantron, SmartScore, Transym OCR and Zonal OCR. You can also find open source programs, such as CuneiForm/OpenOCR, GOCR, hOCR, Ocrad, Ocre, OCRopus, Puma.NET and Tesseract.
Many commercial and open source OCR systems are available for the most common writing systems, such as Latin, Cyrillic, Arabic, Hebrew, Devanagari (used for Hindi), Bengali, Tamil, Chinese, Japanese and Korean.
1870-1931: The first OCR ideas were conceived. Early devices included the optophone by Fournier d'Albe, invented to help blind people read; a machine that read characters and turned them into standard telegraph code; and the Tauschek reading machine.
1931-1954: The first OCR tools were invented and applied in industry. These tools could interpret Morse code and read text aloud.
1954-1974: The Optacon, the first portable OCR device, was developed. Similar devices were used to digitize Reader's Digest coupons and postal addresses.
1974-2000: Scanners were used to read price labels and passports. Companies such as Caere Corporation, ABBYY and Kurzweil Computer Products Inc. were created.
In the 2000s, OCR became available online as a service (WebOCR), in the cloud, and in mobile applications such as real-time translation of foreign-language signs on smartphones. With the arrival of smartphones and smartglasses, OCR services can be used through applications on Internet-connected mobile devices that extract text captured with the camera.