By Margarita Młodziejewska and Henning Soller
Enabling personalization in customer experience. Detecting fraudulent activity in financial services. Optimizing supply chains in manufacturing. Accelerating drug discovery and testing. The common thread in all of these efforts is advanced analytics.
Over the past decade, advanced analytics has become a top priority across industries: 90 percent of companies recognize its value, and many have started to put internal analytics organizations in place, with an eye toward scaling use cases.
To date, however, the majority of companies have not been able to unlock the full potential of advanced analytics, largely because they lack the capabilities and repeatable processes needed to roll out new algorithms and analytics models. In a recent survey of major advanced-analytics programs, we found that companies spend 80 percent of their time on analytics projects on repetitive tasks such as preparing data, leaving little room for actual value-added work. Moreover, just 10 percent of companies believe they have this issue under control.1
Many organizations have pointed to DataOps as the remedy, though they define it in various ways, describing it, for instance, as an organizational concept or a toolchain. In fact, DataOps is an automated, process-oriented methodology used by analytics and data teams to improve quality and reduce the cycle time of advanced analytics. This emerging approach can enable companies to gain more value from their data by expediting the process of building models.
Overcoming barriers to advanced analytics
Despite the nearly limitless applications of advanced analytics, many companies have found the impact of their investments to be quite limited, given the challenges of scaling use cases and integrating advanced analytics into their environments and processes.
In an effort to harness data, companies typically implement modern data architectures, such as data lakes, lab environments, and next-generation tooling for advanced analytics. While these investments ease the preparation of new algorithms, many tasks still present obstacles. For example, models are not documented and therefore not scalable, and the testing of models is handled manually, which significantly lengthens the process. In addition, preparing environments and data takes time, and models must be adjusted during testing and production to account for different configurations and technologies. The teams involved in advanced analytics are then often kept busy maintaining their existing use cases, which constrains the time and motivation they have to develop new ones.
Companies seeking to address these challenges won’t make progress by taking a piecemeal approach. Instead, they must account for not only technology but also processes and people.
DataOps to the rescue
Leading companies have begun to embrace DataOps. As the name suggests, this approach applies agile development, DevOps, and lean manufacturing to data-analytics development and operations. As such, it represents a comprehensive change along the key dimensions of people, process, and technology:
- People. A shift in skill sets and culture toward the continuous use of data and the automation of data-enhanced processes.
- Process. An end-to-end revision of processes aimed at the streamlined, fully automated deployment of all types of new analytics models to production.
- Technology. The setup of an end-to-end toolchain that fully automates the integration and deployment pipeline for models, from model definition to production.
Accordingly, the scope of DataOps is much broader and its aims more transformative. It is not a mere extension of DevOps; the automated testing of models, for example, requires different and more elaborate tooling than the test scripts used in conventional software development. In this regard, DataOps encompasses MLOps, the application of DataOps principles to machine-learning models.
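To make that difference concrete, the sketch below illustrates what automated model testing might look like in a Python-based pipeline. It is a minimal illustration rather than a prescribed toolchain: the schema, the accuracy threshold, and the synthetic training data are assumptions made for the sake of the example, and a real setup would pull versioned data from the data platform instead.

```python
# A minimal sketch of automated model testing: the checks validate data and
# model behavior, not just code. Thresholds, column names, and the model are
# illustrative assumptions, not a prescribed standard.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

EXPECTED_COLUMNS = {"feature_a", "feature_b", "label"}  # assumed schema
MIN_ACCURACY = 0.80                                     # assumed quality gate

def load_training_data() -> pd.DataFrame:
    # Placeholder: in practice this would pull a versioned dataset from the
    # data platform. Here we synthesize data so the sketch runs end to end.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    return pd.DataFrame({"feature_a": X[:, 0], "feature_b": X[:, 1], "label": y})

def test_schema_is_stable():
    # Data test: fail fast if upstream columns or label values change.
    df = load_training_data()
    assert set(df.columns) == EXPECTED_COLUMNS
    assert df["label"].isin([0, 1]).all()

def test_model_meets_quality_gate():
    # Model test: retrain and verify the quality gate before deployment.
    df = load_training_data()
    X_train, X_test, y_train, y_test = train_test_split(
        df[["feature_a", "feature_b"]], df["label"], random_state=0
    )
    model = LogisticRegression().fit(X_train, y_train)
    assert accuracy_score(y_test, model.predict(X_test)) >= MIN_ACCURACY
```

In a DataOps pipeline, checks like these run automatically on every change to model code or training data, so a degraded model or a broken schema is caught before it reaches production.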
DataOps strives to foster collaboration among data scientists, engineers, and technologists so that every team works in sync to use data more effectively and in less time. In our experience, this visibility and coordination among multidisciplinary teams lead to more accurate analysis, better insights, improved business strategies, and higher profitability (exhibit).
Companies can apply DataOps as an enablement tool across the value chain, from data ingestion through processing, modeling, and insights for the end user. It enables the provisioning of production data through automated ingestion and loading from multiple sources. Automating data transformation reduces time-consuming and error-prone steps in the pipeline, continuously improves analytics operations and performance, and allows for faster deployments and releases. Finally, it accelerates the time to value from data by enabling teams to access real-time data and adjust their business decisions based on the results.
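As a simple illustration of what automated ingestion and transformation can look like in practice, the sketch below combines two hypothetical sources and applies repeatable cleaning steps; the file names, columns, and derived fields are assumptions, not a reference implementation.

```python
# A minimal sketch of automated ingestion and transformation, assuming two
# illustrative sources (a CSV extract and a JSON feed); file and column
# names are hypothetical.
import pandas as pd

def ingest() -> pd.DataFrame:
    # Load and combine raw data from multiple sources into one frame.
    orders = pd.read_csv("orders.csv", parse_dates=["order_date"])
    customers = pd.read_json("customers.json")
    return orders.merge(customers, on="customer_id", how="left")

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    # Repeatable cleaning steps that would otherwise be done by hand:
    # deduplicate, derive the fields the models expect, drop unusable rows.
    df = raw.drop_duplicates(subset=["order_id"])
    df = df.assign(
        revenue=df["quantity"] * df["unit_price"],
        order_month=df["order_date"].dt.strftime("%Y-%m"),
    )
    return df.dropna(subset=["customer_id"])

if __name__ == "__main__":
    # An orchestration tool would trigger this on every new data delivery
    # instead of relying on manual, ad hoc preparation.
    prepared = transform(ingest())
    prepared.to_csv("prepared_orders.csv", index=False)
```

Separating ingestion from transformation keeps each step individually testable, which is what allows the pipeline to be automated end to end.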
Companies that embed DataOps in their organization are able to achieve a range of performance improvements. In our experience, the volume of new features can be increased by 50 percent because data automation enables quicker development iterations. By automating repeated operational tasks, companies can also reduce errors. And continuous code quality checks and early detection of data inconsistencies and risks can improve processes and reduce an organization’s tech debt. In the same way, adjustments across people, processes, and technology can, in our experience, reduce time to market by 30 percent, enhance productivity by up to 10 percent, and cut IT costs by as much as 10 percent.
DataOps in action
The experience of a global pharmaceutical company illustrates the impact of DataOps. The company was struggling with a range of challenges, including difficulty deploying its data-science and data-engineering resources effectively and a culture that didn't rely on data and analysis to inform decision making. Too often, data engineers and data scientists were focused on finding and modeling the data needed to run their models, meaning that it took three to six months for new algorithms to be incorporated into actual processes on the ground.
The company saw an opportunity to significantly improve the performance of its drug-discovery process by accelerating the integration of advanced analytics into its operations. Specific use cases included the lead-finding process for new drugs as well as the actual operations of the plants.
After adopting DataOps, the company was able to improve its development and deployment of analytics processes as well as the quality of its insights. It has automated the generation of test data and developed improved methodologies for engineering the data that feeds its models. It can now catalog its inventory of models and algorithms, automate testing for a large share of those models across its tool landscape, and broaden access for various stakeholders. As a next step, the company is looking to harness DataOps to rapidly accelerate its drug-development cycles.
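To illustrate what automated test-data generation can involve (a minimal sketch under an assumed schema and value ranges, not the company's actual implementation), a pipeline might regenerate synthetic records on demand so that models and downstream processes can be tested without touching production data:

```python
# Illustrative test-data generation: synthetic records that mimic an assumed
# production schema. All column names and ranges are hypothetical.
import numpy as np
import pandas as pd

def generate_test_batch(n_rows: int = 1_000, seed: int = 7) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "compound_id": [f"CMP-{i:05d}" for i in range(n_rows)],
        "assay_result": rng.normal(loc=50.0, scale=12.0, size=n_rows),
        "batch_temperature_c": rng.uniform(18.0, 25.0, size=n_rows),
        "passed_qc": rng.random(n_rows) > 0.1,
    })

if __name__ == "__main__":
    # Regenerated on demand in the pipeline, so every test run starts from
    # fresh, reproducible data rather than manually prepared extracts.
    generate_test_batch().to_csv("synthetic_assay_batch.csv", index=False)
```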
Getting started
Implementing DataOps is not a quick and easy fix. It requires companies to adopt a combination of strict governance, the right technologies, and upskilling programs. As such, it is not a one-step process but a longer journey.
The specific challenge, in contrast to DevOps, is that standardized toolchains and out-of-the-box solutions are often unavailable and need to be assembled. Completing such tasks requires time, dedication, and the right data-engineering talent.
To extract more value from data and analytics, companies must set a clear ambition from the top of the organization, build highly skilled data-engineering teams, and implement and develop the necessary tools. This will mean ring-fencing the necessary resources for a period of time. Organizations can begin by taking the following three steps:
- Set up new governance for the different steps in DataOps, mandate full automation, and integrate more closely with the business through translators.
- Upskill the existing data teams to enable the automation.
- Build the required tools to support operations and boost performance.
Companies should seek to implement these three steps at scale. By developing a common language and standardizing deployment models for advanced analytics, companies can pursue the industrialization of DataOps.
Margarita Młodziejewska is a consultant in McKinsey’s Warsaw office, and Henning Soller is a partner in the Frankfurt office.
1 McKinsey survey of 32 companies globally, conducted in 2020 for this post.