What is OCR?

According to Wikipedia, OCR (optical character recognition) is a “set of techniques or software used to recognise characters and entire texts in a raster-based image file. The primary task of OCR is to recognise text in a scanned document (for instance, a paper form or book page)”.

So much for the definition. As a service, OCR covers, among other things:

  • conversion of PDF files to Word (while retaining the layout and graphics)
  • conversion of image files (scans/photos) containing text to Word format
  • advanced image editing of PDF files (substitution of text/translation)
  • preparation of materials for translation agencies

Preparation of documents for translation

Our OCR services are aimed primarily at translation agencies, for which we compile source materials and prepare them for translation in .doc/.docx or in formats supported by CAT (computer-assisted translation) software.

Non-editable files in which visual presentation plays a key role, such as advertising brochures or instruction manuals, usually require preparation before translation.

The OCR service often requires both OCR software and the work of a DTP specialist, for example for brochures, manuals or advertising materials sent as scans, photos or poor-quality PDF files. With uncomplicated PDF documents, OCR software alone sometimes produces satisfactory results. However, the quality of the editable documents obtained depends to a large extent on the quality of the source document.