By Henning Soller and Asin Tavakoli
Business leaders around the world recognize the value of becoming a data-driven organization, and many companies have started to implement targeted advanced analytics and artificial intelligence use cases. However, technology leaders such as the CIO or CTO face challenges in enabling their current legacy systems to support new requirements such as analytics at scale.
Executives who seek to expand their advanced analytics efforts have been engaged in long-standing discussions on the right IT architecture. These conversations, however, miss the point and can be sidestepped entirely. The solution: a reference architecture that modernizes the best of the legacy systems while radically replacing the rest, with a specific focus on expensive legacy databases and cumbersome data access.
Data architectures have taken shape over time
At most companies, the data architecture has grown over a number of years. Legacy systems can be broadly categorized into three horizons:
- Horizon 1: transactional systems built when the business first started using computers and focused on core processes such as enterprise resource planning (ERP) and accounting
- Horizon 2: data warehouses implemented to provide a far better interface for analytics and regulatory requests that spanned the full business
- Horizon 3: recently introduced platforms, such as master-data-management systems, customer-relationship-management (CRM) systems, and data lakes, developed to take some burden off the traditional data warehouse and serve specific use cases
Because these systems were added gradually to meet expanding business demand, their scalability and flexibility often fall short. For example, many traditional data warehouses could not cope with the new requirements of big data.
This dynamic doesn’t necessarily mean that systems introduced over time must be completely decommissioned. Rather, the trick is to compose and structure them so that the organization regains the ability to scale and to flexibly integrate additional platforms, for example by adding a data lake alongside the old data warehouse to absorb new demands.
Reference data architecture
We have worked with a multitude of clients on their IT transformations. Based on this firsthand experience, we have distilled the key elements of the best data architectures, which make traditional use cases more cost efficient and support new use cases in a flexible manner. Interestingly, this reference data architecture, in both its elements and its structure, is relevant across industries despite widely differing use cases.
The reference data architecture is based on three pillars that sit on a foundational data-ingestion layer:
- Classical data warehouse. This pillar supports predictable, highly critical reporting, such as regulatory compliance and financial reporting.
- Data lake. Data lakes are ideal for less stringent reporting needs as well as advanced analytics use cases that require large-scale data processing.
- Real-time streaming. This pillar enables real-time use cases as well as rule-based analytics.
The transactional source databases feed the pillars either directly (via streaming) or through the data lake (exhibit).
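To make the two connection paths concrete, the following is a minimal sketch of the ingestion layer in PySpark. It assumes a hypothetical Kafka topic named transactions, S3-based landing-zone and data-lake paths, and the Spark Kafka connector on the classpath; none of these names come from the article itself.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ingestion-layer").getOrCreate()

# Batch path: land raw transactional extracts in the data lake,
# where warehouse loads and analytics use cases pick them up later.
raw = spark.read.json("s3a://landing-zone/transactions/")
raw.write.mode("append").parquet("s3a://data-lake/raw/transactions/")

# Streaming path: connect the source topic directly to a real-time,
# rule-based use case (here, flagging unusually large transactions).
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "transactions")
    .load()
    .select(
        F.get_json_object(F.col("value").cast("string"), "$.amount")
        .cast("double")
        .alias("amount")
    )
)
alerts = stream.filter(F.col("amount") > 10_000)
alerts.writeStream.format("console").start().awaitTermination()
```

The point of the sketch is the split itself: predictable, high-volume loads flow through the lake, while latency-critical rules run against the stream without touching the warehouse at all.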
The advantage of a reference data architecture is that it allows companies to concentrate on connecting the three pillars instead of undertaking massive new implementations. The pillars themselves are simple to set up and use and typically do not require a massive up-front investment.
Further scalability can be gained by abstracting from the infrastructure layer, for example by containerizing the underlying source systems or selected parts of the data lake. The data warehouse and the full data lake, in contrast, will typically be too large to permit simple auto-scaling. In such cases, cloud-based solutions and the corresponding proprietary database solutions may provide good alternatives, even if they are based on traditional SQL.
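As an illustration of what containerized scaling can look like in practice, here is a minimal sketch using the official Kubernetes Python client. It assumes the ingestion components are packaged as a deployment named ingest-workers in a data-platform namespace; the deployment, namespace, and cluster are all hypothetical.

```python
from kubernetes import client, config

# Minimal sketch: scale containerized ingestion workers to match load.
# Assumes kubeconfig access to a cluster that runs the hypothetical
# deployment "ingest-workers" in the "data-platform" namespace.
config.load_kube_config()
apps = client.AppsV1Api()

# Scale out for a heavy load window; in practice an autoscaler would
# make this decision based on CPU, memory, or queue-lag metrics.
apps.patch_namespaced_deployment_scale(
    name="ingest-workers",
    namespace="data-platform",
    body={"spec": {"replicas": 8}},
)
```

Because the containerized components are stateless, capacity becomes a single parameter to change rather than a hardware procurement exercise, which is precisely the flexibility the monolithic warehouse cannot offer.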
Benefits through IT modernization
Organizations that transition to a reference architecture can capture multiple benefits. The data architecture can become more scalable and resilient, accommodate additional use cases, and prove more cost-effective.
Specifically, the architecture makes it possible to offload the data warehouse by moving new use cases to the data lake and performing initial data loading there, as sketched below. Similarly, using open-source solutions for streaming increases resilience while lowering cost. The same holds true for replacing traditional vendor-based SQL databases and for using cloud-based databases and Hadoop variants in place of legacy archival and large-scale storage systems.
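A minimal sketch of this offloading pattern, assuming a PySpark environment, hypothetical lake paths, and a PostgreSQL-based warehouse reachable over JDBC (driver on the classpath): the heavy transformation runs in the data lake, and only a compact aggregate lands in the warehouse.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("warehouse-offload").getOrCreate()

# Do the heavy initial load and transformation in the data lake ...
raw = spark.read.parquet("s3a://data-lake/raw/transactions/")
daily = (
    raw.groupBy("account_id", F.to_date("ts").alias("day"))
    .agg(F.sum("amount").alias("daily_total"))
)

# ... and ship only the compact, report-ready aggregate to the warehouse,
# keeping the expensive relational system free for critical reporting.
(
    daily.write.format("jdbc")
    .option("url", "jdbc:postgresql://warehouse:5432/dwh")
    .option("dbtable", "reporting.daily_totals")
    .option("user", "etl_user")
    .option("password", "***")  # placeholder; use a secrets manager in practice
    .mode("append")
    .save()
)
```

The same pattern relieves the warehouse of one-off initial loads: bulk history is parsed and validated in the lake, and only the cleansed result is copied into the relational system.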
By adopting a new data-lake-based solution within this reference architecture, a chemical company overcame the challenges of its traditional master-data management on a much lower budget and in less time. Similarly, a bank reduced the time needed to implement new use cases from six months to six weeks by introducing a full suite of new technologies and relieving its data warehouse of use cases for which it was not designed.
Key takeaways and trade-offs
Organizations interested in making the move to a reference data architecture must overcome three core misconceptions that have often shaped company culture:
- Deploying vendor-based technologies is always better. In reality, open-source solutions can be cheaper and more resilient.
- Using the data warehouse for advanced analytics is the safe option. In reality, a reference data architecture helps maintain the stability and resilience of systems that may not have been built for this use.
- Achieving analytics at scale requires investments in underlying technology components. In reality, it is far more important to put the components to the right use and enable easy integration.
In our experience, companies need to overcome all three cultural challenges at once. Once technology leaders adopt this mindset, they tend to embrace a far more pragmatic and successful way of managing data architecture, which allows them to harness the technological benefits described above.
Beyond the technical perspective on data-architecture modernization, the governance and operations of such environments have a direct impact on data quality and the long-term viability of the solution. Successful companies tackle these elements in tandem with the implementation of a reference data architecture.
Henning Soller is a partner in McKinsey’s Frankfurt office, and Asin Tavakoli is a partner in the Düsseldorf office.