# OCR Service

The OCR service in ProcessMaker IDP extracts text from images and PDF files (with or without a text layer) after the file is created in the system. Key features include:

* **Text Extraction**: Recognizes and digitizes text from images or scanned documents.
* **Language Detection**: Automatically detects the language of the extracted text.
* **Full Integration**: Users only need to upload a file; metadata is updated automatically.

The OCR service returns the results to the integration module, which updates the `FULL_TEXT` and `LANGUAGE` attributes of the `FILE` entity with the extracted data. The message is then forwarded to the Classification service, where attributes are classified. If a user specifies the language during upload, it will not be overwritten by the OCR service.
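The update rule described above can be sketched as follows. This is a minimal illustration, not ProcessMaker IDP code: the `FileEntity` class and the `apply_ocr_result` function are hypothetical names, and only the `FULL_TEXT` and `LANGUAGE` attributes come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileEntity:
    """Hypothetical stand-in for the FILE entity (only the two attributes named on this page)."""
    FULL_TEXT: Optional[str] = None
    LANGUAGE: Optional[str] = None


def apply_ocr_result(file: FileEntity, text: str, detected_language: str) -> FileEntity:
    """Hypothetical helper: copy an OCR result onto a file's metadata."""
    # The extracted text is always written to FULL_TEXT.
    file.FULL_TEXT = text
    # A language chosen by the user at upload is kept; the detected
    # language is used only when none was specified.
    if not file.LANGUAGE:
        file.LANGUAGE = detected_language
    return file


# Usage: the user-specified language "nl" survives OCR detection of "en".
uploaded = apply_ocr_result(FileEntity(LANGUAGE="nl"), "Voorbeeldtekst", "en")
detected = apply_ocr_result(FileEntity(), "Sample text", "en")
```

In this sketch, `uploaded.LANGUAGE` stays `"nl"`, while `detected.LANGUAGE` is filled in with `"en"`.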

{% hint style="info" %}
ProcessMaker IDP supports 126 languages and language variants. If any other language is detected, it is marked as `other`.
{% endhint %}

If no text can be extracted from a file directly, OCR is performed on the following image file types: BMP, PNG, JFIF, JPEG, and TIFF (MIME types `image/bmp`, `image/png`, `image/jpeg`, and `image/tiff`). PDF files are automatically converted to TIFF before processing. Other file types are ignored.
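The file-type rule above could be expressed roughly as shown here. This is a sketch under stated assumptions: the function name `ocr_route` and its return labels are hypothetical, and `application/pdf` is the standard PDF MIME type rather than a value taken from this page.

```python
# MIME types listed on this page as eligible for OCR.
OCR_IMAGE_MIME_TYPES = {"image/bmp", "image/png", "image/jpeg", "image/tiff"}
# Standard MIME type for PDF (assumption: not stated on this page).
PDF_MIME_TYPE = "application/pdf"


def ocr_route(mime_type: str, text_already_extracted: bool) -> str:
    """Hypothetical helper: decide how a file is handled by OCR."""
    if text_already_extracted:
        # OCR is only attempted when no text could be extracted directly.
        return "no_ocr_needed"
    if mime_type == PDF_MIME_TYPE:
        # PDFs are converted to TIFF before OCR runs.
        return "convert_to_tiff_then_ocr"
    if mime_type in OCR_IMAGE_MIME_TYPES:
        return "ocr"
    # Any other file type is ignored.
    return "ignored"


# Usage: a scanned PDF without a text layer, a PNG, and a text file.
routes = [
    ocr_route("application/pdf", False),
    ocr_route("image/png", False),
    ocr_route("text/plain", False),
]
```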

***

## Conditions in OCR Configuration

A JSON condition can be added to each OCR configuration so that all files meeting the condition are processed through OCR. When a file meets the conditions of more than one configuration, ProcessMaker IDP merges the overlapping configurations.

***

## Reprocess OCR

With JSON conditions, administrators can target specific sets of documents for OCR reprocessing using the actions 'Reprocess dossiers', 'Reprocess folder', and 'Reprocess document'. Only ProcessMaker IDP Administrators can perform these actions, which are available from the action menu. All files contained in a dossier or folder are processed using the OCR configurations that match them.

<figure><img src="/files/WGyx4yBYyhp0bz7Z6QiN" alt=""><figcaption></figcaption></figure>

