...your ever-present help in time of need

Data Quality Pipeline Solution

Clients

Multinational pharmaceutical and healthcare company.

Problem

Scientist has a biological sample inventory report with several data collection file (DCT) templates in MS Excel format and registered in different inventory systems.

Data are separated in different DCT according to type of samples. VBA macro perform few checks i.e. dictionary-controlled values, missing mandatory data/ However due to the diverse data and complex level of detail,  there is a need to find a more reliable, robust and accurate solution to perform data quality checks.

Solution

  • Developed a Script to connect to Bi-ample data store and write the data into Cloud Storage
  • Configure a scheduler rule to trigger Step Functions State machine.
  • Defined and Developed Workflows to perform data quality checks & verify sample registrations.

Technology

*Python  *AWS  *REST API   *EventBridge

Success

  • Provision of a visual data preparation framework tool which makes it easier for the Scientist to clean and normalize data for analytics and machine learning.
  • Consistent data patterns and data quality.
  • Easy data profiling and anomaly detection capabilities.