Data Collection Rules

Basic Usage Flow

There are two main ways to add new applications, windows, and processes to the Process Intelligence platform’s data collection:

Define Applications and Data Collection Rules Yourself:
- Navigate to Admin Panel -> Configure Data Collection -> Configuration Portal -> Add Custom App.
Use the Application Catalog:
- Navigate to Admin Panel -> Configure Data Collection -> Configuration Portal -> Catalog Apps.

Defining data collection in UI (No-code)

Using no-code tools, you can define basic data collection rules based on the title, URL, and process name attributes. These rules use keyword matching: you specify one or more keywords for each attribute. When all keywords match the active window's attributes, the window is labeled according to the defined Application/Window. The matching rules are AND-based, meaning every defined keyword must match, not just one of them.

Example 1

If a user sets the process name keyword to excel.exe and the title keyword to invoice for Excel's Invoice Window, both conditions must be met for the active window to be identified as the Invoice Window in Excel.

Example 2

A user can also define multiple keywords to match against the URL. For example, if the keywords http://salesforce.com and case are set, the URL must contain both keywords to be a match. For instance, http://salesforce.com/customers/case would match because both http://salesforce.com and case are found somewhere in the URL.
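The AND-based keyword matching described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the platform's implementation: the function name matches_keywords, the dictionary layout, and the case-insensitive comparison are all assumptions made for this sketch.

```python
def matches_keywords(window, keywords):
    """Return True only if every keyword of every attribute is found (AND-based)."""
    for attribute, words in keywords.items():
        # A missing or empty attribute is treated as an empty string.
        # Lowercasing both sides is an assumption of this sketch.
        value = (window.get(attribute) or "").lower()
        if not all(word.lower() in value for word in words):
            return False
    return True


# Example 1: the process name and title keywords must both match.
invoice_rule = {"process_name": ["excel.exe"], "title": ["invoice"]}
invoice_window = {"process_name": "excel.exe", "title": "Invoice 1042 - Excel"}
print(matches_keywords(invoice_window, invoice_rule))

# Example 2: two keywords on the same attribute must both appear in the URL.
case_rule = {"url": ["http://salesforce.com", "case"]}
case_window = {"url": "http://salesforce.com/customers/case"}
print(matches_keywords(case_window, case_rule))
```

Both calls print True. Removing either keyword's match from the window (for example, a title without "invoice") makes the function return False, which is the AND behavior the rules rely on.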

Defining data collection in Code (Code)

The data collection rules defined in the UI are automatically converted to JSON code. You can also manually edit this code for greater flexibility and advanced options. The JSON code includes four key fields for data collection: tags, extract_identifiers, matching_criteria, and dashboard_context. This chapter explains how to configure these key fields.

Extracted identifiers

Extracted identifiers specify the process-related identifiers and attributes collected from the active window. They can be gathered from sources such as the URL or title, using regex matching or URL query parameters. In advanced configurations, identifiers can also be collected from the UI; for this, you must first define the UI-capturing rule in the reactions field of the configuration JSON file (not covered in this tutorial). The extract_identifiers field is a list of extraction rules. Here’s an example of the extract_identifiers field:

"extract_identifiers": [
   {
      "id": "04c24a3a-86bb-41c7-a8e9-c546c94c4b56",
      "identifier_name": "SAP - Purchase Order",
      "key": null,
      "regex_pattern": "(?<=po/)(.*?)(?=[?/]|$)",
      "hash_identifier": false,
      "from_fields": [
         "url"
      ],
      "compiled_regex": "(?<=po/)(.*?)(?=[?/]|$)"
   }
]
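To see what the regex_pattern above captures, the following Python sketch applies it to a sample URL. It assumes the platform's regex engine behaves like Python's re module for this pattern; the helper extract_identifier and the sample URLs are hypothetical and exist only for this illustration.

```python
import re


def extract_identifier(rule, event):
    """Search each field listed in from_fields and return the first match, or None."""
    pattern = re.compile(rule["regex_pattern"])
    for field in rule["from_fields"]:
        value = event.get(field) or ""
        match = pattern.search(value)
        if match:
            # The lookbehind and lookahead are zero-width, so the whole match
            # is just the text between "po/" and the next "?", "/", or end of string.
            return match.group(0)
    return None


rule = {
    "identifier_name": "SAP - Purchase Order",
    "regex_pattern": r"(?<=po/)(.*?)(?=[?/]|$)",
    "from_fields": ["url"],
}

event = {"url": "https://mycompany.sap.com/po/4500012345?tab=items"}
print(extract_identifier(rule, event))
```

For the sample URL this prints 4500012345. A URL without a po/ segment yields None, meaning no identifier would be extracted for that event.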

Matching Criteria

While tags and extracted identifiers define what information is collected from the window, the matching criteria determine when this information is collected. Data (e.g. tags and extracted identifiers) is collected from the active window only if the matching criteria are met.

The matching criteria consist of two fields: rule_engine_rule and context.

rule_engine_rule: Allows users to define a simple Python-like expression that evaluates the matching criteria. You can automatically access some attributes of the active window (e.g., url, title_lower, and process_name_lower) and compare them to the matching criteria. For more information on the Python rule engine, visit: zeroSteiner/rule-engine.
context: Defines variables passed to the rule_engine_rule. These variables can be accessed within the rule with dictionary-style lookups. For example, if a variable variable1 is defined in context, its value can be accessed in the rule_engine_rule as context['variable1'].

Here’s an example of the matching criteria field:

"matching_criteria": {
   "rule_engine_rule": "active_process_name_lower and context['process_name'] in active_process_name_lower",
   "context": {
      "process_name": "code.exe"
   }
}
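The logic of the rule above can be mirrored in plain Python to make its behavior easier to reason about. This is a hand-written equivalent for illustration only; the platform evaluates the rule string through the rule engine, and the function name rule_matches is an assumption of this sketch.

```python
def rule_matches(active_process_name_lower, context):
    """Plain-Python equivalent of the example rule_engine_rule."""
    # The first operand guards against a missing or empty process name;
    # the second checks that the configured name occurs inside it.
    return bool(
        active_process_name_lower
        and context["process_name"] in active_process_name_lower
    )


context = {"process_name": "code.exe"}

print(rule_matches("code.exe", context))    # matching process
print(rule_matches("chrome.exe", context))  # different process
print(rule_matches(None, context))          # no process name available
```

Only the first call returns True. Because the check is a substring test, any process name that contains code.exe would also match, so a more specific context value narrows the rule.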

Dashboard Context

The dashboard_context has three fields that help categorize rules correctly in the admin panel:

app_name: Should match the appname tag defined in the tags field.
window_name: Should match the content-category tag. This field is optional if no content-category is defined.
process_name: Should match the identifier_name of a rule in the extract_identifiers field. This field is optional if no identifiers are defined.

Here are two examples of the dashboard_context field:

"dashboard_context": {
   "app_name": "Excel",
   "window_name": null,
   "process_name": null
}

"dashboard_context": {
   "app_name": "Salesforce",
   "window_name": "Case",
   "process_name": "Salesforce Case Tracking"
}
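The three "should match" conditions can be checked mechanically before uploading a configuration. The sketch below is a hypothetical helper, not part of the platform. It assumes that tags is a flat object keyed by tag name (appname, content-category); the real shape of the tags field is not shown in this chapter, so adjust the lookups to your configuration.

```python
def dashboard_context_issues(rule):
    """Return a list of human-readable mismatches (empty if consistent)."""
    tags = rule.get("tags") or {}                    # assumed flat dict of tags
    identifiers = rule.get("extract_identifiers") or []
    ctx = rule.get("dashboard_context") or {}
    issues = []

    if ctx.get("app_name") != tags.get("appname"):
        issues.append("app_name does not match the appname tag")
    if ctx.get("window_name") != tags.get("content-category"):
        issues.append("window_name does not match the content-category tag")

    names = [item.get("identifier_name") for item in identifiers]
    process_name = ctx.get("process_name")
    if process_name is None:
        if names:
            issues.append("process_name is empty although identifiers are defined")
    elif process_name not in names:
        issues.append("process_name does not match any identifier_name")
    return issues


salesforce_rule = {
    "tags": {"appname": "Salesforce", "content-category": "Case"},
    "extract_identifiers": [{"identifier_name": "Salesforce Case Tracking"}],
    "dashboard_context": {
        "app_name": "Salesforce",
        "window_name": "Case",
        "process_name": "Salesforce Case Tracking",
    },
}
print(dashboard_context_issues(salesforce_rule))
```

For a consistent rule the function returns an empty list; each mismatch adds one message describing which field disagrees.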

Defining Tests in UI (No-code)

To verify that the matching criteria function correctly, you can define test cases. Each test case includes a full URL, process name, and title. By running these tests, you can ensure that keyword-based matching identifies the window as expected.

For example:

The user has created keywords sap.com for the URL and Purchase Order for the title.
In the test case, the user can define an example event with a realistic URL, title, and process name:
- URL: https://mycompany.sap.com/view/po
- Process Name: chrome.exe
- Title: SAP - Purchase Order | Google Chrome

Running this test will verify if the defined keyword rules correctly match the window based on the provided test case.

Running the tests: Admin Panel -> Configure Data Collection -> Run Tests

Defining Tests in code (Code)

The tests defined in the UI are automatically converted to JSON code. You can manually edit this code by downloading the file from Configure Data Collection -> Advanced Setup -> Test File Management. After making changes, upload the edited tests back to the dashboard through the Test File Management interface. Each test in the JSON code has two key fields: row_event and expected_processed_event.

Row Event

This field includes information similar to the UI, allowing you to input an example process name, URL, and title. The row_event field is an object with three optional fields:

process_name
url
title

You can define all three fields or just one of them.

Expected Processed Event

The expected_processed_event field specifies what information is expected to be collected from a window given the fields in row_event. This field is an object, and the most common fields in the expected_processed_event object are:

tags
extracted_identifiers

For additional examples of Data Collection, see Advanced Examples for Data Collection.

