Create Data Collection Configurations
This document describes how to set up data collection configurations for Workfellow Task Miners. The process consists of the following steps:
Identifying initial data collection requirements
Developing and testing data collection settings
Reviewing data collection requirements based on initial data collection results
Modifying data collection settings in accordance with updated requirements
Configuring process mining features on the Dashboard
For more detail on the overall onboarding process, refer to the onboarding process documentation. The onboarding support material may also be helpful as you complete these tasks.
Create initial data collection requirements
Start by gathering the initial data collection requirements and filling in the requirements template with the relevant information. It is essential that the information you gather is accurate and up to date, because it is used to configure data collection on the task miners.
To complete this task, you will need to do the following:
Compile a list of the applications that Workfellow should track. For each tracked application, define whether tracking should cover application usage only or also collect more details (title and/or URL). These details can later be used to define application-specific tagging and identifier extraction. Identifier extraction is required to use Workfellow’s process mining features.
Use the template provided to document the information you gather. Make sure to double-check that all data is accurate and complete.
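One way to double-check the template before handing it over is a small script that flags missing or inconsistent entries. This is a minimal sketch, not part of Workfellow's tooling: it assumes the template has been exported to CSV with hypothetical columns `application` and `tracking_level`, where the level is one of `usage`, `title`, `url`, or `title+url`. The real template layout may differ.

```python
import csv
import io

# Hypothetical tracking levels: "usage" = application usage only;
# the others additionally collect the title and/or URL.
ALLOWED_LEVELS = {"usage", "title", "url", "title+url"}


def check_requirements(csv_text: str) -> list[str]:
    """Return a list of problems found in a (hypothetical) requirements template export."""
    problems = []
    seen = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 because line 1 is the header row.
    for line_no, row in enumerate(reader, start=2):
        app = (row.get("application") or "").strip()
        level = (row.get("tracking_level") or "").strip().lower()
        if not app:
            problems.append(f"line {line_no}: missing application name")
            continue
        # Application names are compared case-insensitively to catch duplicates.
        if app.lower() in seen:
            problems.append(f"line {line_no}: duplicate entry for {app!r}")
        seen.add(app.lower())
        if level not in ALLOWED_LEVELS:
            problems.append(f"line {line_no}: invalid tracking level {level!r} for {app!r}")
    return problems


if __name__ == "__main__":
    sample = "application,tracking_level\nExcel,usage\nChrome,title+url\nexcel,usage\nOutlook,\n"
    for problem in check_requirements(sample):
        print(problem)
```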
Accurate data collection requirements are crucial to success. Data collection settings are automatically verified against the requirements template before they are uploaded to task miner users’ computers, so any error in the requirements carries over into the settings. Such errors can lead to a variety of risks, including:
Delays in the onboarding process: If the information you gather is incorrect or incomplete, it may take longer to set up the data collection.
Security risks: Incorrect requirements may cause Workfellow to unintentionally collect sensitive data from users’ computers.
Make sure that all information you gather is correct and up to date. If you have any questions or concerns, reach out to the [onboarding team/relevant point of contact].
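The automatic verification described above can be pictured as a check that no setting collects more than the requirements allow, which is what guards against the security risk. The sketch below is not Workfellow's actual verifier; it assumes hypothetical in-memory structures in which each application maps to the set of extra fields (`title`, `url`) that it may collect (requirements) or does collect (settings).

```python
# Hypothetical structures: application name -> extra fields beyond usage ("title", "url").
Requirements = dict[str, set[str]]
Settings = dict[str, set[str]]


def verify_settings(requirements: Requirements, settings: Settings) -> list[str]:
    """Return violations where the settings collect data the requirements do not allow."""
    violations = []
    for app, fields in sorted(settings.items()):
        if app not in requirements:
            violations.append(f"{app}: not listed in the requirements")
            continue
        # Any field collected but not permitted is a potential sensitive-data leak.
        extra = fields - requirements[app]
        if extra:
            violations.append(f"{app}: collects {sorted(extra)}, which the requirements do not allow")
    return violations


if __name__ == "__main__":
    reqs = {"Excel": set(), "Chrome": {"title", "url"}}
    cfg = {"Excel": {"title"}, "Chrome": {"url"}, "Slack": set()}
    for violation in verify_settings(reqs, cfg):
        print(violation)
```

Collecting fewer fields than allowed is not flagged here, because it cannot expose sensitive data; whether it should fail verification is a separate policy decision.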
Create and test settings
Create the data collection settings and run automated tests to validate that they fulfill all the data collection requirements. This includes the following steps:
Create data collection settings.
Test the created settings against the data collection requirements. Refine settings until all tests pass.
Generate unit tests for the settings and check that all settings are tested.
If any issues are identified during testing, troubleshoot and resolve them with the person responsible for the data collection requirements before completing this task.
Refer to Sluice documentation for details on data collection settings creation and testing.
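Checking that every setting is tested amounts to a coverage comparison between settings and unit tests. Sluice's own test tooling is not described here, so this sketch uses hypothetical inputs: a list of setting IDs and a list of unit test records, each naming the setting it exercises.

```python
from dataclasses import dataclass


@dataclass
class UnitTest:
    """Hypothetical unit test record: the setting it exercises and a sample input."""
    setting_id: str
    sample: str


def untested_settings(setting_ids: list[str], tests: list[UnitTest]) -> list[str]:
    """Return the IDs of settings that no unit test exercises, in their original order."""
    covered = {test.setting_id for test in tests}
    return [sid for sid in setting_ids if sid not in covered]


if __name__ == "__main__":
    # Hypothetical setting IDs and sample titles.
    settings = ["excel-usage", "chrome-invoice-title", "chrome-crm-url"]
    tests = [
        UnitTest("excel-usage", "Book1.xlsx - Excel"),
        UnitTest("chrome-invoice-title", "Invoice 1234 - Google Chrome"),
    ]
    missing = untested_settings(settings, tests)
    print("Untested settings:", missing or "none")
```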
Review initial settings
Review the settings before they are put into use. This includes the following steps:
Check that the requirements used in the settings generation correspond to the approved data collection requirements linked in the main task.
Run the tests and check that all of them pass.
Check that unit tests are included.
Browse through the data collection settings and try to spot any errors.
If any issues are identified during the review, troubleshoot and resolve them before completing this task.
Refer to Sluice documentation for details on data collection settings creation and testing.
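For the first review check, one option, assuming both the approved requirements and the copy used for settings generation are available as files, is to compare content hashes. The file names below are hypothetical; the approved version is the one linked in the main task.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def same_requirements(approved: Path, used: Path) -> bool:
    """True if the requirements used for settings generation match the approved file byte for byte."""
    return sha256_hex(approved.read_bytes()) == sha256_hex(used.read_bytes())


if __name__ == "__main__":
    # Hypothetical file names; replace with the actual exported files.
    approved = Path("approved_requirements.csv")
    used = Path("requirements_used_for_settings.csv")
    if approved.exists() and used.exists():
        print("Match" if same_requirements(approved, used) else "Mismatch: re-check the requirements")
```

A byte-level comparison is strict: re-exporting the same template can change formatting and produce a mismatch, in which case a row-by-row comparison is the next step.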
Upload initial settings
Upload the data collection settings to production. Refer to Dashboard Server documentation for instructions.
Generate requirements template
After data has been collected for a while, the requirements can be refined to include more details. To support this, generate requirements template(s) based on the initial data collection:
Use Sluice's requirement generator to convert data sample(s) from MongoDB into requirements templates.
Generate one template for each application that was tracked in “training mode” during the initial data collection, that is, applications for which the title and/or URL fields were collected.
Upload the requirements templates to the customer’s folder in Google Drive.
Add a link to the template folder in the main task.
Refer to Sluice documentation for details on data collection requirement template generation.
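Sluice's requirement generator performs this conversion, and its interface is not covered here. To illustrate the idea only, the sketch below assumes the data sample has already been exported from MongoDB as a list of records with hypothetical keys `app`, `title`, and `url`. It produces one CSV template per application that has at least one title or URL, listing each distinct title/URL combination with its number of occurrences and empty (hypothetical) `tag` and `identifier` columns to be filled in later.

```python
import csv
import io
from collections import Counter, defaultdict


def build_templates(records: list[dict]) -> dict[str, str]:
    """Group sampled records by application and return one CSV template per application.

    Applications with no title or URL in any record (usage-only tracking)
    are skipped, mirroring the "training mode" selection described above.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        title, url = rec.get("title") or "", rec.get("url") or ""
        if title or url:
            counts[rec["app"]][(title, url)] += 1

    templates = {}
    for app, counter in counts.items():
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        # Hypothetical columns; tag and identifier are left empty for manual completion.
        writer.writerow(["title", "url", "occurrences", "tag", "identifier"])
        for (title, url), n in counter.most_common():
            writer.writerow([title, url, n, "", ""])
        templates[app] = buf.getvalue()
    return templates


if __name__ == "__main__":
    sample = [
        {"app": "Chrome", "title": "Invoice 1001 - ERP", "url": "https://erp.example.com/inv/1001"},
        {"app": "Chrome", "title": "Invoice 1001 - ERP", "url": "https://erp.example.com/inv/1001"},
        {"app": "Excel"},  # usage-only application: no title or URL
    ]
    for app, text in build_templates(sample).items():
        print(f"--- {app} ---\n{text}")
```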
Data analysis: remove unnecessary rows from requirements template(s)
Requirements templates often need to be cleaned up by removing near duplicates: records that are very similar to each other but not necessarily identical.
Identify near duplicates: Use a deduplication tool or manual review to identify near duplicates in each requirement template.
Determine which records to keep: Decide which of the near-duplicate records to keep and which to remove, based on factors such as completeness, accuracy, and relevance.
Remove near duplicates: Use a deduplication tool or manual review to remove the records marked for removal in the previous step.
The link to the data collection requirements templates is in the main task.
Refer to Sluice documentation for possible deduplication tools (under construction).
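Until a dedicated tool is documented, a simple similarity measure can shortlist candidate near duplicates for manual review. This sketch uses Python's standard `difflib.SequenceMatcher` ratio on title strings with an assumed threshold of 0.9. It keeps the first title of each group and returns the rest as candidates rather than deleting them, since deciding which record to keep still needs judgment.

```python
from difflib import SequenceMatcher


def near_duplicates(titles: list[str], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Return (kept, candidate) pairs where candidate closely resembles an earlier kept title.

    Greedy single pass: each title is compared (case-insensitively) with the titles
    kept so far. This is quadratic in the worst case, which is acceptable for
    template-sized inputs.
    """
    kept: list[str] = []
    pairs: list[tuple[str, str]] = []
    for title in titles:
        match = next(
            (k for k in kept if SequenceMatcher(None, k.lower(), title.lower()).ratio() >= threshold),
            None,
        )
        if match is None:
            kept.append(title)
        else:
            pairs.append((match, title))
    return pairs


if __name__ == "__main__":
    titles = ["Invoice 1001 - ERP", "Invoice 1002 - ERP", "Purchase order 77 - ERP"]
    for kept, candidate in near_duplicates(titles):
        print(f"{candidate!r} looks like {kept!r}")
```

The threshold is a tuning knob: titles that differ only by an identifier, such as invoice numbers, typically score high and end up as candidates, which is usually the intent when cleaning a template.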
Fill in detailed data collection requirements
Fill in the detailed data collection requirements based on the results of the initial data collection. It is important that the information you provide is accurate and up to date, because it is used to configure data collection on the task miners.
To complete this sub-task, you will need to do the following:
Go through each cleaned requirement template linked in the main task. There should be one template for each application that was in “training mode” during the initial data collection.
Add appropriate content-category tags.
Add definitions for extracted identifiers as necessary.
Review the tagging and identifier extraction for correctness.
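The exact format of tagging and identifier extraction rules is described in the Sluice documentation. Purely as an illustration, the sketch below models a rule as a hypothetical pair of a content-category tag and a regular expression with an optional named group `id`, applied to a title. The example tags and patterns are invented.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Hypothetical tagging rule: a content-category tag plus a pattern with an optional 'id' group."""
    tag: str
    pattern: re.Pattern


def classify(title: str, rules: list[Rule]) -> tuple[Optional[str], Optional[str]]:
    """Return (tag, identifier) for the first matching rule, or (None, None) if none match."""
    for rule in rules:
        m = rule.pattern.search(title)
        if m:
            # groupdict() is empty when the pattern has no named groups.
            return rule.tag, m.groupdict().get("id")
    return None, None


# Invented example rules for illustration only.
RULES = [
    Rule("invoice", re.compile(r"^Invoice (?P<id>\d+)\b")),
    Rule("email", re.compile(r" - Outlook$")),
]

if __name__ == "__main__":
    for title in ["Invoice 1001 - ERP", "Re: meeting - Outlook", "Untitled"]:
        print(title, "->", classify(title, RULES))
```

Reviewing the tagging can then mean running the rules over the cleaned template rows and checking the resulting tags and identifiers by eye.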
Accurate data collection requirements are crucial to success. Data collection settings are automatically verified against the requirements template before they are uploaded to task miner users’ computers, so any error in the requirements carries over into the settings. Such errors can lead to a variety of risks, including:
Delays in the onboarding process: If the information you gather is incorrect or incomplete, it may take longer to set up the data collection.
Security risks: Incorrect requirements may cause Workfellow to unintentionally collect sensitive data from users’ computers.
Make sure that all information you gather is correct and up to date. If you have any questions or concerns, reach out to the [onboarding team/relevant point of contact].
Update and test settings
Update the data collection settings to reflect the detailed data collection requirements and run automated tests to validate that they fulfill all the requirements. This includes the following steps:
Update the data collection settings.
Test the created settings against the data collection requirements. Refine settings until all tests pass.
Generate unit tests for the settings and check that all settings are tested.
If any issues are identified during testing, troubleshoot and resolve them with the person responsible for the data collection requirements before completing this task.
Refer to Sluice documentation for details on data collection settings creation and testing.
Review updated settings
Review the settings before they are put into use. This includes the following steps:
Check that the requirements used in the settings generation correspond to the approved data collection requirements linked in the main task.
Run the tests and check that all of them pass.
Check that unit tests are included.
Browse through the data collection settings and try to spot any errors.
If any issues are identified during the review, troubleshoot and resolve them before completing this task.
Refer to Sluice documentation for details on data collection settings creation and testing.
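When reviewing updated settings, it can help to focus on what changed since the initial upload. Assuming both versions of the settings are available as text files (the file names and format here are assumptions), Python's standard `difflib.unified_diff` produces a diff that is easier to read through than the full settings:

```python
import difflib
from pathlib import Path


def settings_diff(old_text: str, new_text: str) -> str:
    """Return a unified diff between the initial and the updated settings text ("" if identical)."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="initial_settings",
        tofile="updated_settings",
    )
    return "".join(diff)


if __name__ == "__main__":
    # Hypothetical file names for the two settings versions.
    old, new = Path("initial_settings.json"), Path("updated_settings.json")
    if old.exists() and new.exists():
        print(settings_diff(old.read_text(), new.read_text()) or "No changes")
```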
Upload updated settings
Upload the data collection settings to production. Refer to Dashboard Server documentation for instructions.