Glossary | Intelligent Document Processing

AI Model Gateway

A feature that allows seamless data extraction using custom AI models, providing labels in JSON format and visual representation in images.

Analyzer

A component in Elasticsearch that defines how a string is processed into searchable terms, typically by combining character filters, a tokenizer, and token filters.
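
As a rough illustration of what an analyzer does, the sketch below imitates the three stages (character filter, tokenizer, token filters) in plain Python. It does not call Elasticsearch; the function name `analyze` and the stop-word list are assumptions made up for this example.

```python
import re

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"the", "a", "an", "of"}


def analyze(text: str) -> list[str]:
    """Toy analyzer: character filter -> tokenizer -> token filters."""
    # Character filter: strip HTML-like tags before tokenizing.
    cleaned = re.sub(r"<[^>]+>", " ", text)
    # Tokenizer: split the string into runs of word characters.
    tokens = re.findall(r"\w+", cleaned)
    # Token filters: lowercase each token, then drop stop words.
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in STOP_WORDS]


print(analyze("<p>The Invoice of ACME Corp</p>"))
# ['invoice', 'acme', 'corp']
```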

Anonymization

The process of hiding personally identifiable information (PII) in a document.
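
A minimal sketch of the idea, assuming simple pattern-based masking of email addresses. How ProcessMaker IDP actually detects personal data is not described here; the pattern, the `anonymize` function, and the mask text are hypothetical.

```python
import re

# Simplified email pattern for illustration; real PII detection needs far more care.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize(text: str, mask: str = "[REDACTED]") -> str:
    """Replace every email-like substring with a fixed mask."""
    return EMAIL_PATTERN.sub(mask, text)


print(anonymize("Contact jane.doe@example.com for details."))
# Contact [REDACTED] for details.
```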

Annotation

A feature that allows users to add specific, detailed notes or comments to documents. This helps in marking and highlighting key areas of interest or concern within the document for future reference.

Archive Extraction

The automatic unpacking of .zip and .rar archives, creating subfolders with the archive's name and placing its contents inside.
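
The behavior described above can be sketched for .zip files with Python's standard library. This is an illustration under assumptions, not the product's implementation: `extract_archive` is a hypothetical helper, and .rar archives are left out because the standard library cannot read them.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(archive_path: Path, target_root: Path) -> Path:
    """Unpack a .zip into a subfolder named after the archive (without extension)."""
    subfolder = target_root / archive_path.stem
    subfolder.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(subfolder)
    return subfolder


# Demo with a throwaway archive in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    archive = root / "invoices.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("scan-001.txt", "sample content")
    folder = extract_archive(archive, root / "unpacked")
    extracted_names = sorted(p.name for p in folder.iterdir())
    print(folder.name, extracted_names)
    # invoices ['scan-001.txt']
```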

Attribute

A property of an entity, such as a name for a customer or a price for a car.

Authorization

The process of determining user permissions and access to entities and operations within ProcessMaker IDP.

Binary Metadata

Metadata attributes for binary files, such as media type, hash, file size, and file name.
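
To make these attributes concrete, the sketch below derives them for an in-memory file. The key names, the `binary_metadata` helper, and the choice of SHA-256 as the hash are assumptions; the glossary does not say which hash algorithm or field names ProcessMaker IDP uses.

```python
import hashlib
import mimetypes


def binary_metadata(file_name: str, data: bytes) -> dict:
    """Collect basic metadata for a binary file given its name and content."""
    # Guess the media type from the file extension; fall back to a generic type.
    media_type, _ = mimetypes.guess_type(file_name)
    return {
        "file_name": file_name,
        "file_size": len(data),  # size in bytes
        "media_type": media_type or "application/octet-stream",
        "hash": hashlib.sha256(data).hexdigest(),  # SHA-256 is an assumption
    }


metadata = binary_metadata("note.txt", b"hello")
print(metadata["media_type"], metadata["file_size"])
# text/plain 5
```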

Bool Query

A type of query in Elasticsearch that combines multiple queries using Boolean clauses such as must, must_not, and should.
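
For illustration, a bool query request body might look like the following when built as a Python dictionary. The field names (`content`, `status`, `title`) and values are hypothetical; only the `bool`, `must`, `must_not`, `should`, `match`, and `term` keys come from the Elasticsearch query DSL.

```python
import json

# Hypothetical fields; the clause names are part of the Elasticsearch query DSL.
bool_query = {
    "query": {
        "bool": {
            # Documents must mention "invoice" in their content ...
            "must": [{"match": {"content": "invoice"}}],
            # ... must not be archived ...
            "must_not": [{"term": {"status": "archived"}}],
            # ... and score higher if the title mentions "urgent".
            "should": [{"match": {"title": "urgent"}}],
        }
    }
}

print(json.dumps(bool_query, indent=2))
```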

Bucket

A collection of documents in Elasticsearch that meet a specified criterion, used in aggregations.
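
As a sketch, the request body below asks for a terms aggregation, and the code after it imitates locally how documents end up in buckets. The aggregation name `by_type`, the field `doc_type`, and the sample documents are made up for this example; the `key` and `doc_count` names mirror the shape of buckets in an Elasticsearch response.

```python
from collections import Counter

# Request body for a terms aggregation (hypothetical names).
terms_aggregation = {
    "size": 0,  # return only aggregation results, no search hits
    "aggs": {"by_type": {"terms": {"field": "doc_type"}}},
}

# Local imitation: group sample documents by the value of "doc_type".
documents = [
    {"doc_type": "invoice"},
    {"doc_type": "contract"},
    {"doc_type": "invoice"},
]
counts = Counter(doc["doc_type"] for doc in documents)
buckets = [{"key": key, "doc_count": count} for key, count in counts.most_common()]

print(buckets)
# [{'key': 'invoice', 'doc_count': 2}, {'key': 'contract', 'doc_count': 1}]
```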

Classification

The process of categorizing documents using predefined models and scoring systems.

Core Schema

The foundational schema in ProcessMaker IDP that describes all entities in the datastore and is adjustable to fit organizational needs.

Dossier

A top-level folder in the Files and Folders module that can contain multiple subfolders and files.

Elasticsearch

A search engine based on the Lucene library, used in ProcessMaker IDP for indexing and searching data.

Entity

A type of data, such as a person, location, or organization, that can have various properties.

Field Mapping

The process of defining how an entity's attributes are mapped to Elasticsearch fields.
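
A field mapping could look roughly like this for a hypothetical customer entity. The attribute names are invented for the example; the `text`, `keyword`, `date`, and `double` types are standard Elasticsearch field types.

```python
import json

# Hypothetical "customer" entity mapped to Elasticsearch field types.
customer_mapping = {
    "mappings": {
        "properties": {
            # Full-text searchable, with an exact-match sub-field for sorting.
            "name": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            # Exact-match only.
            "email": {"type": "keyword"},
            "created_at": {"type": "date"},
            "credit_limit": {"type": "double"},
        }
    }
}

print(json.dumps(customer_mapping, indent=2))
```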

Index

A collection of documents in Elasticsearch that is made searchable and can be configured to suit specific business needs.

Named Entity Recognition (NER)

A process that identifies and categorizes entities within a text, such as names of people, organizations, and locations.

Normalizer

A component in Elasticsearch that is similar to an analyzer but guarantees that the analysis chain produces a single token.
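
The sketch below shows one way a custom normalizer can be declared for a keyword field, followed by a rough local imitation of its effect. The normalizer name `lowercase_ascii` and the field `company` are hypothetical, and the Python function only approximates what the `lowercase` and `asciifolding` filters do.

```python
import unicodedata

# Index settings declaring a custom normalizer and applying it to a keyword field.
index_settings = {
    "settings": {
        "analysis": {
            "normalizer": {
                "lowercase_ascii": {
                    "type": "custom",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "company": {"type": "keyword", "normalizer": "lowercase_ascii"}
        }
    },
}


def normalize(value: str) -> str:
    """Approximate the normalizer locally: one string in, one token out."""
    # Lowercase, then decompose accented characters and drop the accent marks.
    decomposed = unicodedata.normalize("NFKD", value.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(normalize("ACME Café"))
# acme cafe
```

Unlike an analyzer, the value is never split: "ACME Café" stays one token, so exact-match lookups on the keyword field become case- and accent-insensitive.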

OCR (Optical Character Recognition)

The technology used to convert different types of documents, such as scanned paper documents or PDFs, into editable and searchable data.

Query

A request to retrieve data from Elasticsearch, which can be composed of leaf queries and compound queries.

Retention Management

The process of managing the lifecycle of data, including defining retention rules and detection jobs.

Rendition

An alternative representation of a document or of its content, produced after ProcessMaker IDP has processed and analyzed it.

Scripts and Regexes

Custom scripts and regular expressions are used for various purposes in ProcessMaker IDP, such as extracting data to value scripts, setting default values, validating data, and controlling visibility.

Tokenizer

A component in Elasticsearch that breaks a string into tokens or terms, which are then indexed for search.
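
As a simplified illustration, the following Python function splits a string on non-word characters and records each token's position and character offsets, which is the kind of information a tokenizer keeps alongside each term. It is a toy stand-in, not an Elasticsearch tokenizer, and `tokenize` is a hypothetical name.

```python
import re


def tokenize(text: str) -> list[dict]:
    """Split text into word tokens, keeping position and character offsets."""
    return [
        {
            "token": match.group(),
            "position": index,
            "start_offset": match.start(),
            "end_offset": match.end(),
        }
        for index, match in enumerate(re.finditer(r"\w+", text))
    ]


tokens = tokenize("Quick brown fox")
print([t["token"] for t in tokens])
# ['Quick', 'brown', 'fox']
```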

Validation Rules

Rules configured to ensure data integrity during Excel import and export processes.

WebSocket

A protocol providing full-duplex communication channels over a single TCP connection, used in ProcessMaker IDP for real-time updates and notifications.
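
A WebSocket connection begins as an HTTP request that is upgraded to the WebSocket protocol. As a small protocol-level illustration (not specific to ProcessMaker IDP), the sketch below computes the `Sec-WebSocket-Accept` value a server returns during that handshake, following RFC 6455. The function name `accept_key` is an assumption; the sample key is the one used in the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
# s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```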