# Introduction to Data Collection

This chapter describes the data collection options and the method used to identify source computers. The Process Intelligence Platform’s data collection combines accurate capture of business process data with advanced anonymization techniques. Data collection is justified, minimized, and defined.

Process Intelligence data collection principles

Anonymity of the source computer is achieved with: (1) team-level source-computer identification, (2) opting in business process applications, and (3) versatile configuration of business process data collection for each opted-in business application. This approach does not collect Personally Identifiable Information (PII) at all.

The Process Intelligence Agent collects data on Windows platforms using various technologies that are grouped under the term Work API. Examples include Windows COM (Component Object Model), DOM (Document Object Model), and Windows DLLs (Dynamic Link Libraries).

## Defining data collection settings

The Process Intelligence Platform does not collect data until the customer has defined data collection settings with the Work API Configuration functionality of the Process Intelligence Dashboard:

* Define opt-in of applications: which applications are part of the analysis.
* Configure collected data for each opted-in application: define what data is collected from those applications.
* Apply configurations to Agents: data collection settings are automatically updated on all computers. Different teams can have different settings.

The collected business process data is clearly defined, and the definition is made by the customer. Process Intelligence Agents then observe the use of opted-in business applications and collect business data according to the settings. All other applications are ignored and are out of scope of the analysis.

## Identification of Agents

Each Process Intelligence Agent is linked to a specific customer organization and team by a unique team token. This way the Platform does not need personal information to identify and separate different source-computer users. Agents create random session IDs that they use to separate repetitive workflows on source computers. These session IDs change, so the data sent to the Process Intelligence Platform does not build up data sets about individual source-computer users.

For the sake of clarity: because the Process Intelligence Platform collects no unique identifiers such as a computer name or username, individual computer users cannot be identified from the data stored in Process Intelligence's databases.

## Opting-in business applications

The first step in allow-listing business process applications is to define the applications used to perform business transactions. The options are explained below.

| Application Type | Example | Notes |
| --- | --- | --- |
| Desktop Application | sapgui.exe | Native Windows applications. |
| Web Application | organization.salesforce.com | The minimum part of the included domain. E.g., different.salesforce.com would be excluded. |
| Web Portal | invoices.organization.de | |
| Application on a virtual or remote desktop | Wfica32.exe (Citrix) | Depending on the target application design, some data collection might be limited. Process Intelligence Agent can also be installed on the virtual machine for more granular data collection. |
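
The opt-in check described in the table above can be illustrated with a minimal Python sketch. This is not the Agent's actual implementation: the function names, the allow-list sets, and the rule that subdomains of a listed domain also match are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; the entries mirror the examples in the table above.
OPTED_IN_EXECUTABLES = {"sapgui.exe", "wfica32.exe"}
OPTED_IN_DOMAINS = {"organization.salesforce.com", "invoices.organization.de"}


def is_opted_in_executable(process_name: str) -> bool:
    """Match desktop applications by executable name, ignoring case."""
    return process_name.lower() in OPTED_IN_EXECUTABLES


def is_opted_in_url(url: str) -> bool:
    """Match web applications by host.

    Assumption: a host matches if it equals a listed domain or is a
    subdomain of it. Sibling domains such as different.salesforce.com
    do not match.
    """
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in OPTED_IN_DOMAINS)
```

With these example entries, `is_opted_in_url("https://different.salesforce.com/")` returns `False`, which corresponds to the exclusion described in the Notes column.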

## Configurable data collection types

More detailed business process data collection for opt-in applications can be defined using data tagging, identifiers, and salvage fields. Those options are explained below.

| Data Type | Explanation | Examples and Notes |
| --- | --- | --- |
| Tag | Fixed keyword identified in titles, URLs, or UI. | E.g., “Invoice” or “Report”. |
| Identifier | Variable/process identifier in title, URL, or UI. Option for value hashing. | E.g., “Invoice number” = 12345, “Customer name” = Workfellow. |
| Salvaged Data | Data collected in its original format as training data to help define tags and identifiers. | E.g., enabling URL salvage for the domain app.sap.com means that visits to subpages are collected: app.sap.com/reports, app.sap.com/invoices… |
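
As a rough illustration of how tags and identifiers from the table above might be applied to a window title, consider the following Python sketch. The rule format, the regular expressions, the `extract` function, and the choice of SHA-256 for value hashing are assumptions; the documentation does not specify how the Agent implements these options.

```python
import hashlib
import re

# Hypothetical rules for one opted-in application.
TAGS = ["Invoice", "Report"]
IDENTIFIERS = {
    # name: (pattern with one capture group, hash the captured value?)
    "invoice_number": (re.compile(r"Invoice\s+(\d+)"), False),
    "customer_name": (re.compile(r"Customer:\s*(\w+)"), True),
}


def hash_value(value: str) -> str:
    """Return a SHA-256 hex digest so the raw value is not stored in the result."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def extract(title: str) -> dict:
    """Apply the tag and identifier rules to a window title."""
    found_tags = [t for t in TAGS if t.lower() in title.lower()]
    identifiers = {}
    for name, (pattern, hashed) in IDENTIFIERS.items():
        match = pattern.search(title)
        if match:
            value = match.group(1)
            identifiers[name] = hash_value(value) if hashed else value
    return {"tags": found_tags, "identifiers": identifiers}
```

For a title such as `"Invoice 12345 - Customer: Workfellow"`, this sketch would record the tag `Invoice`, the invoice number in clear text, and only a digest of the customer name.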

{% hint style="info" %}
For the sake of clarity: the data collected through opt-in and allow-listing is defined by the customer.
{% endhint %}

## Configurable data collection capabilities

| Data | Collection Status | Purpose |
| --- | --- | --- |
| Team-token | Yes | To identify to which team and organization the computer belongs. |
| Session-ID | Yes for opt-in | To separate workflows. |
| Time stamps | Yes for opt-in | To identify time ranges and durations. |
| Application name | Yes for opt-in | To separate different applications. |
| Mouse click elements | Click events for opt-in; tagging element names is an option | To identify field changes in a process. E.g., identify address field edits in an invoicing application to find master data problems. |
| Typed keyboard length | Yes for opt-in, salvage option for specific application windows | |
| Keyboard shortcuts | Typical ones for opt-in | To identify manual data flows and activities. E.g., CTRL+C, CTRL+V. |
| Clipboard activity type: text, image or file | Yes for opt-in, salvage option for specific application windows | To identify manual data flows between applications. |
| Window titles | Yes for allow-listed | To separate different windows within applications. |
| Case identifiers | Yes for allow-listed | To identify process transactions. |
| File type | Yes for allow-listed | To identify used file formats. |
| Business process related web URLs | Yes for allow-listed | Only specified business process-related web applications are included in data collection. |
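
To make the table above more concrete, the following Python sketch shows what a single collected event could look like when it carries only a team token, a random session ID, a timestamp, the application name, and the typed length. The `CollectedEvent` class, its field names, and the use of UUIDs for session IDs are hypothetical; the actual data format of the Agent is not documented here.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


def new_session_id() -> str:
    """Create a random session ID with no link to a computer name or username."""
    return uuid.uuid4().hex


@dataclass(frozen=True)
class CollectedEvent:
    team_token: str      # identifies the team and organization, not a person
    session_id: str      # random ID used to separate workflows
    application: str     # name of the opted-in application
    typed_length: int    # number of characters typed, not the text itself
    timestamp: datetime  # when the event was recorded (UTC)


def record_typing(team_token: str, session_id: str,
                  application: str, typed_text: str) -> CollectedEvent:
    """Build an event that keeps only the length of the typed text."""
    return CollectedEvent(
        team_token=team_token,
        session_id=session_id,
        application=application,
        typed_length=len(typed_text),
        timestamp=datetime.now(timezone.utc),
    )
```

In this sketch the typed text is discarded as soon as its length has been measured, and a fresh call to `new_session_id()` starts a new, unrelated session.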
