QuantumBlack, the advanced analytics firm we acquired in 2015, has now launched Kedro, an open source tool created specifically for data scientists and engineers. It is a library of code that can be used to create data and machine-learning pipelines. For our non-developer readers, these are the building blocks of an analytics or machine-learning project.
“Kedro can change the way data scientists and engineers work,” explains product manager Yetunde Dada, “making it easier to manage large workflows and ensuring a consistent quality of code throughout a project.”
Introducing Kedro
McKinsey has never before created a publicly available, open source tool. “It represents a significant shift for the firm,” notes Jeremy Palmer, CEO of QuantumBlack, “as we continue to balance the value of our proprietary assets with opportunities to engage as part of the developer community, and accelerate as well as share our learning.”
The name Kedro, which derives from the Greek word meaning center or core, signifies that this open-source software provides crucial code for ‘productionizing’ advanced analytics projects.
Kedro has two major benefits. First, it allows teams to collaborate more easily by structuring analytics code in a uniform way so that it flows seamlessly through all stages of a project. These stages can include consolidating data sources, cleaning data, creating features and feeding the data into machine-learning models for explanatory or predictive analytics.
Kedro also helps deliver code that is ‘production-ready,’ making it easier to integrate into a business process. “Data scientists are trained in mathematics, statistics and modeling—not necessarily in the software engineering principles required to write production code,” explains Yetunde. “Often, converting a pilot project into production code can add weeks to a timeline, a pain point with clients. Now, they can spend less time on the code, and more time focused on applying analytics to solving their clients’ problems.”
At a feature level, Kedro helps teams build data pipelines that are modular, tested, reproducible in any environment and versioned, allowing users to access previous data states.
“More importantly, the same code can make the transition from a single developer’s laptop to an enterprise-level project using cloud computing,” explains Ivan Danov, Kedro’s technical lead. “And it is agnostic, working across industries, models and data sources.”
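For developer readers, the idea of a modular, testable pipeline can be illustrated with a short sketch. The example below is plain Python and is not Kedro's API; the names Node and run_pipeline, the stage functions and the dataset names are all hypothetical, chosen only to mirror the stages described above (consolidating sources, cleaning data, creating features and fitting a model). Each step is an ordinary function with named inputs and a named output, so it can be tested on its own and the run order follows from the data dependencies.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Node:
    """One pipeline step: a function plus the names of its inputs and output."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str


def run_pipeline(nodes: List[Node], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run nodes in dependency order, reading and writing named datasets."""
    data = dict(catalog)  # copy so the caller's catalog is left untouched
    pending = list(nodes)
    while pending:
        # A node is ready once every dataset it reads has been produced.
        ready = [n for n in pending if all(name in data for name in n.inputs)]
        if not ready:
            raise ValueError("missing dataset or circular dependency")
        for n in ready:
            data[n.output] = n.func(*(data[name] for name in n.inputs))
            pending.remove(n)
    return data


# Hypothetical stage functions; each one is small enough to unit test alone.
def consolidate(source_a, source_b):
    return source_a + source_b


def clean(records):
    return [r for r in records if r is not None]


def make_features(records):
    return [(r, r * r) for r in records]


def fit_mean_model(features):
    # Stand-in for a real model: the mean of the first feature.
    return sum(x for x, _ in features) / len(features)


# Nodes are listed out of order on purpose; the runner sorts out dependencies.
pipeline = [
    Node(fit_mean_model, ["features"], "model"),
    Node(make_features, ["clean_data"], "features"),
    Node(clean, ["raw_data"], "clean_data"),
    Node(consolidate, ["source_a", "source_b"], "raw_data"),
]

catalog = {"source_a": [1.0, None, 3.0], "source_b": [5.0]}
results = run_pipeline(pipeline, catalog)
print(results["clean_data"])
print(results["model"])
```

Because every intermediate dataset is kept under its own name, the result of any stage can be inspected after a run, which is the property that makes such pipelines easier to test and to reproduce. Kedro itself offers far more than this sketch, and its actual interfaces are documented in the project's GitHub repository.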
Two years in the making, Kedro was the brainchild of two QuantumBlack engineers, Nikolaos Tsaousis and Aris Valtazanos, and a QuantumBlack alumnus, Peteris Erins, who created it to manage their numerous workstreams. Kedro started as a prototype library and was being quickly adopted by different teams when they brought it to QuantumBlack Labs, a technical innovation group led by Michele Battelli.
“Client teams can rotate into our lab and have the resources to convert a one-off piece of software or database [such as Kedro] into a viable product that can be used across industries, and that will be continually improved,” explains Michele. “It is a powerful way of innovating; our tech teams can move faster, more efficiently, and make a lasting contribution.”
McKinsey has used Kedro on more than 50 projects to date. According to Nikolaos, clients especially like its pipeline visualization. He explains that Kedro makes conversations much easier, as clients can immediately see the different transformation stages and the types of models involved, and can trace outputs all the way back to the raw data source.
“Kedro began as a proprietary program, but when a project was over, clients couldn’t access the tool any more. We had created a technical debt,” Nikolaos said. “By converting Kedro into an open source tool, clients can use it after we leave a project. It is one way we are giving back.”
“There is a lot of work ahead, but our hope and vision is that Kedro should help advance the standard for how data and modelling pipelines are built around the world, while enabling continuous and accelerated learning. There are huge opportunities for organizations to improve their performance and decision-making based on data, but capturing these opportunities at scale, and safely, is extremely complex and requires intense collaboration,” says Jeremy. “We’re keenly interested to see what the community does with this and how we can work and learn faster together.”
Learn more about Kedro on GitHub, where you can engage with our team and watch for new features in the coming months.