Using Machine Learning to De-Risk Trading Decisions

Intrepid leverages machine learning to de-risk both machine learning models and trading decisions

About Our Customer

A global investment bank and financial services firm headquartered in Zürich, Switzerland. It maintains offices in all major financial centres around the world and provides services in investment banking, private banking, asset management, and shared services.


Their Use Case

The profit and loss of strategic trading decisions depend on the availability and quality of data. If the available data is of low quality or faulty, then trading decisions based on it can result in losses instead of profits. Failed trading decisions cause reputational and financial damage for the bank and may also lead to the loss of customers.

Moreover, business operations and decision-making are becoming increasingly dependent on machine learning (ML) algorithms, and poor data quality significantly degrades the insights these algorithms produce.

It is therefore vital to identify as many potential data quality risks and issues as possible before the data is used in ML algorithms or trading decisions.

Our Solution

Our approach started by working with business stakeholders to create a Business Requirement Document (BRD) that captured the required data quality metrics.

In collaboration with our client, we developed a Python-based Data Quality Management (DQM) solution that used Data Science and Machine Learning techniques to identify issues with data quality.
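
The specific models used in the engagement are not public; as a hedged illustration of the kind of ML technique this can involve, the scikit-learn isolation forest below flags anomalous records in a synthetic price feed (the column name and data are illustrative assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Illustrative price feed with a few injected anomalies
# (the client's real features and data are not public).
rng = np.random.default_rng(seed=42)
prices = pd.DataFrame({"mid_price": rng.normal(100.0, 0.5, 1_000)})
prices.loc[[100, 500, 900], "mid_price"] = [150.0, 20.0, 180.0]

# An isolation forest isolates anomalies with short random partition
# paths; `contamination` is the assumed fraction of anomalous rows.
model = IsolationForest(contamination=0.005, random_state=0)
prices["is_anomaly"] = model.fit_predict(prices[["mid_price"]]) == -1

print(prices[prices["is_anomaly"]])
```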

The quality of the data was assessed using metrics such as the following; a sketch of several of these checks appears after the list:

  • Missing Data

  • Incorrect Data Format

  • Incorrect Timestamps

  • Duplicate Data

  • Outliers & Anomalous Data

  • Sudden Spikes and Dips in Data

  • Late Data

  • Not Enough Data Available
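
A minimal pandas sketch of how several of these checks can be expressed, assuming a flat DataFrame of tick data (the column names, thresholds, and window size are illustrative defaults, not the client's actual schema or tuning):

```python
import pandas as pd

def run_quality_checks(df: pd.DataFrame,
                       value_col: str = "mid_price",
                       ts_col: str = "event_ts",
                       arrival_col: str = "arrival_ts",
                       min_rows: int = 1_000,
                       max_lateness: str = "5min",
                       window: int = 20) -> dict:
    """Return the share of rows failing each check (illustrative defaults)."""
    event_ts = pd.to_datetime(df[ts_col], errors="coerce")
    arrival_ts = pd.to_datetime(df[arrival_col], errors="coerce")
    roll = df[value_col].rolling(window)
    return {
        # Missing Data: any null in the row
        "missing": df.isna().any(axis=1).mean(),
        # Incorrect Timestamps / Format: values that fail to parse
        "bad_timestamp": event_ts.isna().mean(),
        # Duplicate Data: exact duplicate rows
        "duplicate": df.duplicated().mean(),
        # Late Data: arrival too long after the event time
        "late": (arrival_ts - event_ts > pd.Timedelta(max_lateness)).mean(),
        # Sudden Spikes and Dips: values far from a rolling mean
        "spike_or_dip": ((df[value_col] - roll.mean()).abs()
                         > 3 * roll.std()).mean(),
        # Not Enough Data Available: volume below an expected floor
        "insufficient_volume": len(df) < min_rows,
    }
```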

Our implementation featured the following:

  • We packaged the data quality checks as a generic Python library so that they can be adapted and reused in other domains.

  • The data quality library was deployed on the cloud (Azure Databricks); a sketch of how such checks can run on Spark follows this list.

  • The data quality library was optimized for performance and scalability where possible.

  • A visualization dashboard was also created to present the results of the data quality checks in an easy-to-follow format.
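
The library's actual API is not public; the sketch below shows, under assumptions, how a few of the checks can be pushed down to Spark so that Databricks parallelises them across cores (the path and column names are made up for illustration):

```python
from pyspark.sql import SparkSession, functions as F

# On Databricks a SparkSession named `spark` already exists;
# getOrCreate() keeps the sketch runnable elsewhere too.
spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("/mnt/feeds/trades")  # illustrative path

total = df.count()

# Missing Data: fraction of nulls per column, computed in one pass.
null_fractions = df.select(
    [(F.sum(F.col(c).isNull().cast("int")) / total).alias(c)
     for c in df.columns]
)

# Duplicate Data: rows removed when exact duplicates are dropped.
duplicate_fraction = (total - df.dropDuplicates().count()) / total

# Late Data: arrival more than five minutes after the event time
# (`arrival_ts` and `event_ts` are assumed column names).
late_fraction = (
    df.filter(F.col("arrival_ts") > F.col("event_ts") + F.expr("INTERVAL 5 MINUTES"))
      .count() / total
)

null_fractions.show()
print(f"duplicates: {duplicate_fraction:.2%}, late: {late_fraction:.2%}")
```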

The Results

Our solution was delivered on time and within budget. Both business and IT customers were very pleased with the results and wanted to adopt our DQM solution to identify more data problems in other areas of the bank. The solution was generic and flexible enough to be used with a wide range of data types and sources, and it scaled to large data volumes by adding more processor cores.

Running our solution against one of the data sources, we found:

  • Around 2% of missing data

  • Around 1% of duplicate data

  • Around 4% of late data

  • Around 1.5% of outlier data

  • Around 5% of invalid format data