Intelligent Document Processing

OCR Service


Last updated 11 months ago

The OCR service in ProcessMaker IDP extracts text from images and PDF files (with or without a text layer) after the file is created in the system. Key features include:

  • Text Extraction: Recognizes and digitizes text from images or scanned documents.

  • Language Detection: Automatically detects the language of the extracted text.

  • Full Integration: Users only need to upload a file; metadata is updated automatically.

The OCR service returns its results to the integration module, which writes the extracted text and the detected language to the FULL_TEXT and LANGUAGE attributes of the FILE entity. The message is then forwarded to the Classification service, where the attributes are classified. If a user specifies the language during upload, the OCR service does not overwrite it.

ProcessMaker IDP supports 126 languages and language variants. If a language outside this set is detected, it is marked as "other."
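The update step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the ProcessMaker IDP implementation: `FileEntity`, `OcrResult`, `apply_ocr_result`, the `language_set_by_user` flag, the `forward_to_classification` callback, and the small `SUPPORTED_LANGUAGES` subset are all hypothetical, and the sketch assumes the "other" mapping happens while the attributes are written.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical subset of supported language codes; the full list of
# 126 languages and variants is not reproduced here.
SUPPORTED_LANGUAGES = frozenset({"en", "en-GB", "de", "es", "fr", "pt-BR"})


@dataclass
class FileEntity:
    """Hypothetical stand-in for a FILE entity, modeling only OCR-related attributes."""
    full_text: str = ""
    language: Optional[str] = None
    # Assumption: a flag recording that the user chose the language at upload.
    language_set_by_user: bool = False


@dataclass
class OcrResult:
    """Hypothetical shape of the result returned by the OCR service."""
    text: str
    detected_language: str


def apply_ocr_result(
    entity: FileEntity,
    result: OcrResult,
    forward_to_classification: Callable[[FileEntity], None],
) -> FileEntity:
    # FULL_TEXT receives the extracted text.
    entity.full_text = result.text
    # LANGUAGE is only written when the user did not set it during upload;
    # detections outside the supported set are stored as "other".
    if not entity.language_set_by_user:
        detected = result.detected_language
        entity.language = detected if detected in SUPPORTED_LANGUAGES else "other"
    # The updated entity then moves on to the classification step.
    forward_to_classification(entity)
    return entity
```

Passing a list's `append` method as the callback is a simple way to observe which entities would be forwarded to classification.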

If no text can be extracted directly from a file (for example, from the text layer of a PDF), OCR is performed on the following image file types: BMP, PNG, JFIF, JPEG, and TIFF (MIME types “image/bmp”, “image/png”, “image/jpeg” for both JFIF and JPEG, and “image/tiff”). PDF files are automatically converted to TIFF before OCR. All other file types are ignored.
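The file-type rules above can be expressed as a small decision function. This is a sketch only: `plan_ocr` and the `OcrAction` values are hypothetical names, and the check for `application/pdf` (the standard PDF MIME type) assumes that is how PDF uploads are identified.

```python
from enum import Enum


class OcrAction(Enum):
    """Hypothetical outcomes of the OCR eligibility check."""
    USE_EXTRACTED_TEXT = "text extracted directly; no OCR needed"
    OCR_IMAGE = "run OCR on the image"
    CONVERT_PDF_THEN_OCR = "convert PDF to TIFF, then run OCR"
    IGNORE = "unsupported file type"


# MIME types listed in this documentation; JFIF and JPEG share image/jpeg.
OCR_IMAGE_MIME_TYPES = frozenset({"image/bmp", "image/png", "image/jpeg", "image/tiff"})


def plan_ocr(mime_type: str, has_extractable_text: bool) -> OcrAction:
    # OCR is only needed when no text could be extracted directly.
    if has_extractable_text:
        return OcrAction.USE_EXTRACTED_TEXT
    # PDFs without usable text are converted to TIFF before OCR.
    if mime_type == "application/pdf":
        return OcrAction.CONVERT_PDF_THEN_OCR
    if mime_type in OCR_IMAGE_MIME_TYPES:
        return OcrAction.OCR_IMAGE
    # Any other file type is ignored by the OCR service.
    return OcrAction.IGNORE
```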


Conditions in OCR Configuration

Each OCR configuration can include a JSON condition, and every file that meets the condition is processed through OCR with that configuration. When a file meets the conditions of more than one configuration, ProcessMaker IDP merges the overlapping configurations.
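The exact JSON condition syntax and merge rules are not described on this page, so the following sketch assumes a simple format: a condition is an object of attribute/value pairs that must all match, and matching configurations are merged in list order, with later settings overriding earlier ones. The `condition` and `settings` keys, the `dpi` and `deskew` settings, and the function names are hypothetical.

```python
import json
from typing import Any, Optional


def matches(condition: dict[str, Any], attributes: dict[str, Any]) -> bool:
    """Assumed condition format: every key/value pair must equal the file's attribute."""
    return all(attributes.get(key) == value for key, value in condition.items())


def effective_config(
    configs: list[dict[str, Any]], attributes: dict[str, Any]
) -> Optional[dict[str, Any]]:
    """Merge the settings of every configuration whose condition the file meets.

    Returns None when no configuration matches, i.e. the file is not
    selected for OCR by any configuration.
    """
    merged: dict[str, Any] = {}
    matched = False
    for config in configs:
        if matches(config["condition"], attributes):
            matched = True
            # Assumption: shallow merge, later configurations win on conflicts.
            merged.update(config["settings"])
    return merged if matched else None


# Hypothetical configurations written as JSON.
CONFIGS = json.loads("""
[
  {"condition": {"MIME_TYPE": "application/pdf"}, "settings": {"dpi": 300}},
  {"condition": {"FOLDER": "Invoices"}, "settings": {"dpi": 400, "deskew": true}}
]
""")
```

A shallow `dict.update` merge is the simplest possible choice; the product may merge nested settings or resolve conflicts differently.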


Reprocess OCR

With JSON conditions, administrators can target specific sets of documents for OCR reprocessing using the actions 'Reprocess dossiers', 'Reprocess folder', and 'Reprocess document'. Only ProcessMaker IDP Administrators can run these actions, which are available in the action menu. All files contained in a dossier or folder are reprocessed using the OCR configurations whose conditions they match.
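Reprocessing a folder can be sketched as a recursive walk that pairs each file with the OCR configurations it matches. Everything here is hypothetical: the `Folder` and `File` classes, the `IDP_ADMIN` role name, the attribute/value condition format, and the assumption that files in nested subfolders are included.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class File:
    name: str
    attributes: dict[str, Any]


@dataclass
class Folder:
    name: str
    files: list[File] = field(default_factory=list)
    subfolders: list["Folder"] = field(default_factory=list)


def iter_files(folder: Folder) -> Iterator[File]:
    """Yield every file in the folder and, recursively, in its subfolders."""
    yield from folder.files
    for sub in folder.subfolders:
        yield from iter_files(sub)


def matching_configs(
    configs: list[dict[str, Any]], attributes: dict[str, Any]
) -> list[dict[str, Any]]:
    # Assumed condition format: all key/value pairs must match the file.
    return [
        c for c in configs
        if all(attributes.get(k) == v for k, v in c["condition"].items())
    ]


def reprocess_folder(
    folder: Folder, configs: list[dict[str, Any]], user_roles: set[str]
) -> list[tuple[str, list[str]]]:
    # Hypothetical role name; only IDP administrators may trigger reprocessing.
    if "IDP_ADMIN" not in user_roles:
        raise PermissionError("Only IDP administrators can reprocess documents")
    plan = []
    for f in iter_files(folder):
        names = [c["name"] for c in matching_configs(configs, f.attributes)]
        if names:  # files that match no configuration are not queued
            plan.append((f.name, names))
    return plan
```

The function only builds a plan of (file, configuration names) pairs; queuing the actual OCR jobs is left out of the sketch.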
