# Introduction to Data Collection

This chapter describes the data collection options and the source-computer identification method. Process Intelligence Platform’s data collection combines accurate capture of business process data with advanced anonymization techniques. Data collection is justified, minimized, and defined.

<figure><img src="/files/YNOT0n0icBVnmrqGAEXB" alt=""><figcaption><p>Process Intelligence data collection principles</p></figcaption></figure>

Anonymity of the source computer is achieved with: (1) team-level source-computer identification, (2) opting in business process applications, and (3) versatile configuration of business process data collection for each opted-in business application. This approach does not collect any Personally Identifiable Information (PII).

Process Intelligence Agent uses several technologies, grouped under the term Work API, to collect data on Windows platforms. Examples include Windows COM (Component Object Model), DOM (Document Object Model), and Windows DLLs (Dynamic Link Libraries).

## Defining data collection settings

Process Intelligence Platform does not collect any data until the customer has defined data collection settings with the Process Intelligence Dashboard’s Work API Configuration functionality:

* Opt in applications: define which applications are part of the analysis.
* Configure the data collected for each opted-in application: define what data is collected from those applications.
* Apply configurations to Agents: data collection settings are automatically updated on all computers. Different teams can have different settings.

The collected business process data is clearly defined, and it is defined by the customer. Process Intelligence Agents then observe the use of opted-in business applications and collect business data according to the settings. All other applications are ignored and are out of scope of the analysis.
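
The relationship between teams, opted-in applications, and per-application settings can be pictured with a small sketch. It is illustrative only: the class names, field names, and the `settings_for` helper are hypothetical and do not reflect the actual Work API Configuration format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class AppSettings:
    """Hypothetical per-application collection settings."""
    collect_window_titles: bool = False
    collect_click_events: bool = False


@dataclass
class TeamConfig:
    """Hypothetical settings for one team: opted-in apps and what to collect."""
    team_token: str
    apps: Dict[str, AppSettings] = field(default_factory=dict)


def settings_for(config: TeamConfig, application: str) -> Optional[AppSettings]:
    """Return the settings of an opted-in application, or None if it is out of scope."""
    return config.apps.get(application.lower())


# Example values are assumptions made for illustration.
finance = TeamConfig(
    team_token="team-finance-example",
    apps={"sapgui.exe": AppSettings(collect_window_titles=True)},
)

observed = ["sapgui.exe", "notepad.exe"]
# Applications without settings are ignored entirely.
in_scope = [app for app in observed if settings_for(finance, app) is not None]
```

In this sketch an application with no entry in the team's settings yields `None`, which mirrors the rule that nothing is collected from applications that have not been opted in.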

## Identification of Agents

Each Process Intelligence Agent is linked to a specific customer organization and team by a unique team token. This way, the Platform does not need personal information to identify and separate different source-computer users.

Agents create random session IDs to separate repetitive workflows on source computers. These session IDs change, so the data sent to Process Intelligence Platform cannot be assembled into data sets about individual source-computer users.
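
As a rough illustration of changing random session IDs, the sketch below generates IDs with Python's `secrets` module and replaces them on request. The `SessionIdSource` class, the ID length, and the point at which `rotate` is called are assumptions; this page does not specify how or when the Agent rotates IDs.

```python
import secrets


class SessionIdSource:
    """Hands out a random session ID and replaces it on request (hypothetical helper)."""

    def __init__(self) -> None:
        # 16 random bytes rendered as 32 hexadecimal characters.
        self._current = secrets.token_hex(16)

    @property
    def current(self) -> str:
        return self._current

    def rotate(self) -> str:
        """Start a new session with a fresh ID that is unrelated to the previous one."""
        self._current = secrets.token_hex(16)
        return self._current


source = SessionIdSource()
first_id = source.current
second_id = source.rotate()
```

Because each ID is drawn independently at random, nothing in `second_id` links it back to `first_id`.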

For the sake of clarity: because Process Intelligence Platform collects no unique identifiers such as computer names or usernames, individual computer users cannot be identified from the data stored in Process Intelligence's databases.

## Opting-in business applications

The first step in allow-listing business process applications is to define the applications used to perform business transactions. The options are explained below.

<table><thead><tr><th width="219">Application Type</th><th width="240">Example</th><th>Notes</th></tr></thead><tbody><tr><td>Desktop Application</td><td>sapgui.exe</td><td>Native Windows applications</td></tr><tr><td>Web Application</td><td>organization.salesforce.com</td><td>The minimum part of the domain to include. E.g., different.salesforce.com would be excluded.</td></tr><tr><td>Web Portal</td><td>invoices.organization.de</td><td></td></tr><tr><td>Application on a virtual or remote desktop</td><td>Wfica32.exe (Citrix)</td><td>Depending on the design of the target application, some data collection might be limited. Process Intelligence Agent can also be installed on the virtual machine for more granular data collection.</td></tr></tbody></table>
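
The allow-list behavior in the table can be sketched as follows. This is an assumption-laden illustration rather than the Agent's actual matching logic: the function names are hypothetical, and treating subdomains of a listed domain as included is a guess based on the "minimum part of the domain" note.

```python
from urllib.parse import urlsplit

# Example entries taken from the table above.
OPTED_IN_DOMAINS = {"organization.salesforce.com", "invoices.organization.de"}
OPTED_IN_EXECUTABLES = {"sapgui.exe", "wfica32.exe"}


def is_opted_in_url(url: str) -> bool:
    """True if the URL's host equals a listed domain or is a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in OPTED_IN_DOMAINS)


def is_opted_in_executable(name: str) -> bool:
    """True if the executable name is on the allow-list (case-insensitive)."""
    return name.lower() in OPTED_IN_EXECUTABLES
```

With these entries, `https://organization.salesforce.com/lightning` is in scope, while `https://different.salesforce.com/` is not, which matches the example in the table. URLs must include a scheme for `urlsplit` to recognize the host.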

## Configurable data collection types

More detailed business process data collection for opted-in applications can be defined using data tagging, identifiers, and salvage fields. These options are explained below.

<table><thead><tr><th width="207">Data Type</th><th>Explanation</th><th>Examples and Notes</th></tr></thead><tbody><tr><td>Tag</td><td>Fixed keyword identified in titles, URLs, or UI.</td><td>E.g., “Invoice” or “Report”.</td></tr><tr><td>Identifier</td><td>Variable/process identifier in title, URL, or UI. Option for value hashing.</td><td>E.g., “Invoice number” = 12345, “Customer name” = Workfellow.</td></tr><tr><td>Salvaged Data</td><td>Data collected in its original format and used as training data to help define tags and identifiers.</td><td>E.g., enabling URL salvage for the domain app.sap.com means that visits to subpages are collected: app.sap.com/reports, app.sap.com/invoices…</td></tr></tbody></table>
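
To make the difference between tags and identifiers concrete, the sketch below extracts both from a window title and optionally hashes the identifier value. The regular expression, the function names, and the use of SHA-256 are assumptions chosen for illustration; this page does not state which hashing method the Platform applies.

```python
import hashlib
import re
from typing import List, Optional

# Fixed keywords (tags) and a variable value (identifier), following the table's examples.
TAGS = ("Invoice", "Report")
INVOICE_NUMBER = re.compile(r"Invoice number\s*=\s*(\d+)")


def find_tags(title: str) -> List[str]:
    """Return the configured tags that occur in the title, ignoring case."""
    lowered = title.lower()
    return [tag for tag in TAGS if tag.lower() in lowered]


def extract_identifier(title: str, hash_value: bool = False) -> Optional[str]:
    """Return the invoice number found in the title, hashed if requested."""
    match = INVOICE_NUMBER.search(title)
    if match is None:
        return None
    value = match.group(1)
    if hash_value:
        # SHA-256 is an assumption; any one-way hash would fit the description.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    return value


title = "Invoice number = 12345 - Accounts Payable"
tags = find_tags(title)
plain = extract_identifier(title)
hashed = extract_identifier(title, hash_value=True)
```

Hashing keeps equal identifiers comparable across events, since the same input always produces the same digest, without storing the original value.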

{% hint style="info" %}
For the sake of clarity: the data collected through opt-in and allow-listing is defined by the customer.
{% endhint %}

## Configurable data collection capabilities

<table><thead><tr><th width="202">Data</th><th width="198">Collection Status</th><th>Purpose</th></tr></thead><tbody><tr><td>Team token</td><td>Yes</td><td>To identify which team and organization the computer belongs to.</td></tr><tr><td>Session ID</td><td>Yes for opt-in</td><td>To separate workflows.</td></tr><tr><td>Timestamps</td><td>Yes for opt-in</td><td>To identify time ranges and durations.</td></tr><tr><td>Application name</td><td>Yes for opt-in</td><td>To separate different applications.</td></tr><tr><td>Mouse click elements</td><td>Click events for opt-in, tagging element names is an option</td><td>To identify field changes in a process. E.g., identify address field edits in an invoicing application to find master data problems.</td></tr><tr><td>Typed keyboard length</td><td>Yes for opt-in, salvage option for specific application windows</td><td>Yes for opt-in, salvage option for specific application windows</td></tr><tr><td>Keyboard shortcuts</td><td>Typical ones for opt-in</td><td>To identify manual data flows and activities. E.g., CTRL+C, CTRL+V.</td></tr><tr><td>Clipboard activity type: text, image or file</td><td>Yes for opt-in, salvage option for specific application windows</td><td>To identify manual data flows between applications.</td></tr><tr><td>Window titles</td><td>Yes for allow-listed</td><td>To separate different windows within applications.</td></tr><tr><td>Case identifiers</td><td>Yes for allow-listed</td><td>To identify process transactions.</td></tr><tr><td>File type</td><td>Yes for allow-listed</td><td>To identify the file formats used.</td></tr><tr><td>Business process-related web URLs</td><td>Yes for allow-listed</td><td>Only specified business process-related web applications are included in data collection.</td></tr></tbody></table>
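
A single collected event could be modeled roughly as below. The `CollectedEvent` record and the `record_typing` helper are hypothetical and cover only some rows of the table. The sketch also assumes the default case in which only the length of typed input is kept, and it ignores the salvage option.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CollectedEvent:
    """Hypothetical event record built from fields listed in the table above."""
    team_token: str
    session_id: str
    timestamp: datetime
    application_name: str
    window_title: str
    typed_length: int                      # number of typed characters only
    clipboard_type: Optional[str] = None   # "text", "image", "file", or None


def record_typing(team_token: str, session_id: str, application_name: str,
                  window_title: str, typed_text: str) -> CollectedEvent:
    """Build an event that stores how much was typed, not what was typed."""
    return CollectedEvent(
        team_token=team_token,
        session_id=session_id,
        timestamp=datetime.now(timezone.utc),
        application_name=application_name,
        window_title=window_title,
        typed_length=len(typed_text),
    )


# Example values are assumptions made for illustration.
event = record_typing("team-finance-example", "3f9c0a", "sapgui.exe",
                      "Invoice 12345", "ACME Ltd")
```

The record has no field for the typed text itself, so the content of the input is discarded as soon as its length has been measured.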

