Intelligent Document Processing

Importer

The ProcessMaker IDP Importer is a standalone application that imports folder structures into ProcessMaker IDP from one or more locations. The primary use case is to import all files from a specific location. Depending on its configuration, the application can also function as a watcher, so the source data and application data stay synced.

By default, the Importer adds data to ProcessMaker IDP based on the creation date of items on the file share. The Importer does not have any dependency on ProcessMaker IDP and can, therefore, be reused for other platforms if needed.

Requirements and Guidelines

  • Every source filesystem should be mounted on the machine where the Importer is running.

  • Network shares should be mounted locally.

  • Because the Importer only reads from local mount points, it does not need any knowledge of how the source is mounted.

  • For the same reason, authentication credentials for the source are not necessary.

Data Model

To keep track of what has been imported, the Importer maintains a record of every imported item in an internal (embedded) database. Each record stores the external ID (the ProcessMaker IDP ID), absolute path, file size, and last modified date. The File and Folder entities are also extended with importer-specific attributes.
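As a rough illustration of the record described above, the following sketch models the four stored values as a Python dataclass. The class name, field names, and the `record_for` helper are hypothetical and chosen for readability; the Importer's actual schema is not documented here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class ImportRecord:
    """One tracked File or Folder (hypothetical field names)."""
    external_id: str         # ID assigned by ProcessMaker IDP
    absolute_path: str       # absolute path on the mounted source
    file_size: int           # size in bytes
    last_modified: datetime  # last modified timestamp, stored as UTC


def record_for(path: Path, external_id: str) -> ImportRecord:
    """Build a record from the item's current metadata on disk."""
    info = path.stat()
    return ImportRecord(
        external_id=external_id,
        absolute_path=str(path.resolve()),
        file_size=info.st_size,
        last_modified=datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    )
```

Keeping the size and last modified date next to the path is what allows a later scan to decide whether an item has changed without re-reading its content.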


Configure Importer Location

The Importer can be configured either via ProcessMaker IDP or via the application.yml file.

To configure the Importer via ProcessMaker IDP:

  1. In the admin panel, click the menu item "Importer Configuration".

    • The Importer re-fetches this configuration at a fixed interval of 15 minutes.

  2. Create an importer instance and add at least one importer_location instance.

    • Importer location:

      • One or more source locations (all source locations need to be on the same machine).

      • A path to the source location.

      • To archive files on the source location, set Archive enabled to true.

      • A path to the archive.

    • Target location:

      • Either an existing folder/dossier ID or null. If it is null, the folders in the main source are created as dossiers.

      • In default mode, or when the option is unchecked, the Importer imports the file share hierarchy.

      • A flag for enabling/disabling the importer.

      • A description for each source location (optional).

  3. In the importer application.yml file:

    • Set use-remote-settings=true and set importer-id to the importer ID from ProcessMaker IDP.

To configure the Importer via the application.yml file only:

  1. In the importer application.yml file:

    • Set use-remote-settings=false, importer-location-id=

  2. In the importer application.yml file, set the configuration for the source location and target location:

    • SOURCE_PATH: Path string for the main source directory

    • SOURCE_DELAY: Message processing delay in milliseconds

    • DESTINATION_FOLDER_ID: ID of the destination folder in ProcessMaker IDP. For the dossier type, it can be empty.
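To make the meaning of the three values above concrete, here is a minimal validation sketch. It assumes the values arrive as a plain string mapping (for example, read from the environment or from a parsed application.yml); the `LocalImporterSettings` class, the `parse_settings` function, and the default delay of 0 are assumptions made for this illustration, not part of the Importer.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LocalImporterSettings:
    source_path: str                      # main source directory
    source_delay_seconds: float           # SOURCE_DELAY converted from milliseconds
    destination_folder_id: Optional[str]  # None when left empty (dossier type)


def parse_settings(values: Mapping[str, str]) -> LocalImporterSettings:
    """Validate the raw settings and normalize their types."""
    source_path = values.get("SOURCE_PATH", "").strip()
    if not source_path:
        raise ValueError("SOURCE_PATH is required")

    # Assumption: a missing SOURCE_DELAY is treated as no delay.
    delay_ms = int(values.get("SOURCE_DELAY", "0"))
    if delay_ms < 0:
        raise ValueError("SOURCE_DELAY must not be negative")

    # An empty destination is allowed for the dossier type.
    folder_id = values.get("DESTINATION_FOLDER_ID", "").strip() or None

    return LocalImporterSettings(
        source_path=source_path,
        source_delay_seconds=delay_ms / 1000,
        destination_folder_id=folder_id,
    )
```

For example, `parse_settings({"SOURCE_PATH": "/mnt/share", "SOURCE_DELAY": "1500"})` yields a delay of 1.5 seconds and no destination folder.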

Configuration for ProcessMaker IDP Communication

  • DOCULAYER_URL: ProcessMaker IDP base URL.

Configuration for Authorization

The Importer uses client-based authentication via Keycloak.

  1. Create a new client in Keycloak.

    • Set Access-type to 'Confidential' and turn on 'Service accounts enabled'.

  2. Define the following parameters in the environment:

    • KEYCLOAK_TOKEN_URI: Keycloak token URL

    • KEYCLOAK_CLIENT_ID: Keycloak client name

    • KEYCLOAK_CLIENT_SECRET: Keycloak client secret text
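A confidential Keycloak client with service accounts enabled typically obtains its token through the standard OAuth 2.0 client credentials grant. The sketch below shows how the three parameters above would fit into such a request. It only builds the request and does not send it; the function name `build_token_request` is hypothetical, and whether the Importer issues exactly this request is an assumption.

```python
import urllib.parse
import urllib.request
from typing import Mapping


def build_token_request(env: Mapping[str, str]) -> urllib.request.Request:
    """Build (but do not send) a client credentials token request."""
    # Standard OAuth 2.0 client credentials form fields.
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": env["KEYCLOAK_CLIENT_ID"],
        "client_secret": env["KEYCLOAK_CLIENT_SECRET"],
    }).encode("ascii")

    return urllib.request.Request(
        env["KEYCLOAK_TOKEN_URI"],
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing `os.environ` as `env` and sending the request with `urllib.request.urlopen` should return a JSON body whose `access_token` field can then be used as a bearer token, assuming the client is configured as described above.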

Configuration for Internal Database

The Importer uses an internal database to check if a File or Folder has already been imported. The File/Folder data is stored in an H2 database (a local ContentItem DB for ProcessMaker IDP Importer). There are multiple ways to view the data:

  • Using the H2 web console:

    • A tunnel may need to be set up from your local machine to the machine where the Importer is running.

    • Once the Importer is running, direct your browser to the following address: http://SERVER:9595.


Scraping Logic

The ProcessMaker IDP Importer uses a recursive directory scanner that visits all levels of the file tree. To prevent files from being placed in the wrong folder, files located directly in the source root directory are not imported. All files should therefore be stored in subfolders of the source directory. Files at deeper levels of the hierarchy need no special handling, because the Importer also creates their folders.
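The scanning rule above can be sketched as follows. This is an illustration only: it uses Python's `os.walk` as a stand-in for the Importer's own scanner, and the function name `scan_source` is hypothetical.

```python
import os
from pathlib import Path
from typing import Iterator


def scan_source(root: Path) -> Iterator[Path]:
    """Yield every file below root, skipping files directly in root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a deterministic order
        if Path(dirpath) == root:
            # Files in the source root are not imported;
            # the walk still descends into its subfolders.
            continue
        for name in sorted(filenames):
            yield Path(dirpath) / name
```

With a source containing `root.txt`, `a/one.txt`, and `a/b/two.txt`, the scan yields only the two files inside subfolders.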


Process Files and Folders

  • When the file size and last modified date for the current path match the values stored in the internal database, the Importer assumes that the file or folder has not changed and does not process it.

  • When the file size or the last modified date is different, the Importer assumes that the file or folder has been updated and sends an update request to ProcessMaker IDP. Before sending the actual file, it first compares the hash.

  • When a file is updated in the source location, the corresponding file in ProcessMaker IDP is overwritten with the changed file.
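The decision rules above can be summarized in a short sketch. Several details are assumptions made for illustration: the hash algorithm (SHA-256 here), the treatment of an unchanged hash as "nothing to send", the CREATE case for items with no stored record, and all names (`Action`, `Tracked`, `decide`).

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    CREATE = "create"  # no record yet: import the item
    SKIP = "skip"      # nothing to send
    UPDATE = "update"  # overwrite the item in ProcessMaker IDP


@dataclass(frozen=True)
class Tracked:
    file_size: int
    last_modified: float  # seconds since the epoch
    sha256: str           # assumption: hash kept alongside the record


def sha256_of(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def decide(stored: Optional[Tracked], size: int, mtime: float,
           content: bytes) -> Action:
    """Apply the size/date check first, then the hash comparison."""
    if stored is None:
        return Action.CREATE
    if stored.file_size == size and stored.last_modified == mtime:
        return Action.SKIP  # same size and date: assumed unchanged
    if stored.sha256 == sha256_of(content):
        return Action.SKIP  # metadata differs but content is identical
    return Action.UPDATE
```

Checking the inexpensive metadata first means the hash only has to be computed for items whose size or date has actually changed.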

Last updated 11 months ago

The data in the Importer's internal database can also be viewed using an external tool (e.g., DBeaver: https://dbeaver.io/).