Data warehouses play a crucial role in today’s data-intensive operations. However, the physical infrastructures required for Big Data are increasingly being jettisoned in favour of more agile solutions. Logical data warehouses add a virtual architecture layer that sits apart from traditional data services and sources. They abstract physical storage, expand data aggregation and simplify access. In this article, we’ll explain more about logical data warehouses and their uses. We’ll draw on insights from S2E, a digital transformation company, and look at the architecture behind QuantiaS Data Director.
Definition and architecture
Data warehouses (DWH) provide resources for data-intensive computing tasks such as AI, data mining, data analytics and business intelligence. Though they typically include some kind of relational database, they operate on a much larger scale than regular databases and also provide much wider functionality such as analysis and reporting.
Limitations of traditional data warehouses
Traditional data warehouses have been tied to extensive physical server infrastructures and fixed implementations. This has meant limits on the range of data sources, data types and processing that are possible. One of the main functions of a traditional DWH is to extract, clean and prepare data to match the required storage architecture, which, in turn, restricts flexibility.
It’s worth listing the key problems businesses face with traditional DWH solutions:
- Expense. The very high total cost of ownership (TCO) of proprietary software (and hardware) solutions is a strong disincentive. Dedicated staff and additional infrastructure are also often required for complex operations.
- Inflexibility. Traditional DWH tends to have low adaptability and responsiveness to changing demands. In today’s fast-paced business and technology environment, this is a major drag.
- Poor integration. From data lakes (raw data repositories) to analytics, a fragmented system is difficult to navigate.
Business needs
Businesses have needs that mirror these shortfalls, such as reduced TCO, improved time to market and support for intelligence and analytics operations. They need access to best-of-breed technologies and the means to monetise data through sales or cost-cutting. Data types have evolved and the expectation has arisen that data can be processed and transferred across the system with ease.
The solution? Logical data warehouses. As the technology research firm Gartner states, ‘the logical data warehouse is now well established as a best practice’. LDWs like QuantiaS Data Director provide greater flexibility for accessing data sources, allowing real-time access through RDBMS, NoSQL databases like CouchDB or distributed solutions like Hadoop. LDWs also enable new analytical styles and roles within an organisation, offering a unified source of truth and streamlined BI.
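To make this concrete, here is a minimal sketch of the idea in Python with Apache Spark (one plausible engine for this role; the connection details, the customers table and the events files are all invented for illustration, and QuantiaS’s own connectors may differ). A relational table and raw files on a Hadoop-compatible store are queried and joined in place, with no prior consolidation:

```python
from pyspark.sql import SparkSession

# Start a Spark session (a stand-in for the LDW's virtual query layer).
spark = SparkSession.builder.appName("ldw-federation-sketch").getOrCreate()

# Source 1: a relational table, read live over JDBC (hypothetical details).
customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/crm")
    .option("dbtable", "public.customers")
    .option("user", "reader")
    .option("password", "secret")
    .load()
)

# Source 2: raw event files on a Hadoop-compatible store, queried in place.
events = spark.read.parquet("hdfs://namenode:8020/lake/events/")

# The "logical" step: expose both as virtual views and join them directly.
customers.createOrReplaceTempView("customers")
events.createOrReplaceTempView("events")

report = spark.sql("""
    SELECT c.segment, COUNT(*) AS event_count
    FROM events e
    JOIN customers c ON e.customer_id = c.id
    GROUP BY c.segment
""")
report.show()
```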
Industry-independent solution, by design
The best way to achieve the kind of flexibility required, not just for today’s data needs but also to insure against future changes, is to leverage standardised technologies. Using standardised architectures means that industry-specific logic and design can be confined to the software layer and data definitions. Infrastructure developers need not repeat well-understood developments or rewrite designs from scratch.
Data processing
The logical data warehouse solution has three core aspects: data processing, data modelling and analysis. In many ways, data processing is the foundation, allowing LDWs to incorporate data from diverse sources with ease. QuantiaS Data Director takes a Big Data approach to data integration from the very beginning, covering key areas such as ETL/ELT processes.
ELT (extract, load, transform) and ETL (extract, transform, load) are core processes that move data from a source system to a target. They differ in how much processing happens before the data is made available for consumption. With ELT, the data is loaded into the target in its raw form and transformed afterwards, inside the warehouse itself. With ETL, the raw data is first transformed into a format the data warehouse can consume natively, and only then loaded.
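The difference is easiest to see side by side. The sketch below uses plain Python with SQLite standing in for the target warehouse (the table names and sample rows are invented): the ETL path transforms before loading, while the ELT path loads the raw extract first and transforms it with SQL inside the target:

```python
import sqlite3

raw_rows = [("2024-01-05", "199.90"), ("2024-01-06", "49.50")]  # source extract

con = sqlite3.connect(":memory:")  # stand-in for the target warehouse

# --- ETL: transform in the pipeline, then load the cleaned result ---
cleaned = [(date, float(amount)) for date, amount in raw_rows]  # transform first
con.execute("CREATE TABLE sales_etl (sale_date TEXT, amount REAL)")
con.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)

# --- ELT: load the raw extract as-is, transform later inside the warehouse ---
con.execute("CREATE TABLE sales_raw (sale_date TEXT, amount TEXT)")
con.executemany("INSERT INTO sales_raw VALUES (?, ?)", raw_rows)
con.execute("""
    CREATE TABLE sales_elt AS
    SELECT sale_date, CAST(amount AS REAL) AS amount FROM sales_raw
""")

print(con.execute("SELECT SUM(amount) FROM sales_elt").fetchone())  # (249.4,)
```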
The benefit of the logical data warehouse here is that it does not require a consolidation step before data is accessible. Because the operation is logical rather than physical, an LDW can combine multiple data sources or engines, allowing real-time incorporation and a seamless access experience for client analysis. Data sources are not restricted, and the Big Data ethos is fostered from the very beginning.
With QuantiaS, this data-processing pipeline is facilitated by a range of software and utilities. Open-source tools often provide standards-compliant, drop-in solutions for core operations. Where requirements are more particular, proprietary tools are leveraged, along with custom parallelised serverless computation. The choice is based on customer preferences and requirements.
Big Data and high-performance computing from the start
The shift towards data-intensive operations has presented businesses with some challenges in recent years. For a time, the decision to embrace Big Data and concomitant high-performance computing (HPC) solutions was a weighty one. Typically it involved an enormous financial investment in terms of both hardware and software. And the learning curve for managing such solutions was steep and difficult.
The availability of cloud-native Big Data and HPC solutions has changed all that. Cheap, cloud-based data lake technologies and pay-per-use computation allow IT managers to adopt Big Data solutions early, without commitments of scale. Even for those managing their own hardware with on-premises solutions, the availability of open-source platforms has lowered the hurdle considerably. Not least among the virtues of such software is that it is often capable of running on commodity-grade hardware.
Such solutions allow skilled software engineers and data architects to work from the off with cutting-edge tech without having to spend huge sums of cash. And the easy scalability of these solutions means systems can expand rapidly in response to data volume without re-engineering. All of this allows businesses to gain quick wins in data monetisation, foregrounding the old motto: data is the new oil.
Data modelling
LDW architectures entail a range of specific data modelling techniques, such as dimensional modelling and logical data model mapping. A well-designed data model is strategically essential to handle ongoing customer requirements, including support for future business and analytics needs.
The data model must be both precise and flexible. Along with the chosen software solutions, which are often open-source, the design of the data model must be ready to support Big Data and HPC from the outset. And as the customer’s data volume increases, the data model must grow with it.
With these provisos in mind, QuantiaS uses a design that combines tableless, SQL-enabled data lake repositories with specific RDBMS schemas. These allow users to analyse both structured and unstructured data, depending on requirements. The dual strategy also serves the primary aim of supporting Big Data. The data lake approach is suitable for cloud architectures – easily scalable and distributed by design. Meanwhile, the use of RDBMS schemas is targeted at specific data marts and BI-oriented information.
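As a rough illustration of this dual strategy (an assumption-laden sketch, not QuantiaS’s actual stack: DuckDB stands in for the SQL-enabled lake query layer, SQLite for the RDBMS data mart, and the file paths and table names are invented):

```python
import sqlite3
import duckdb

# Lake side: query raw Parquet files in place, schema-on-read, no fixed tables.
lake = duckdb.connect()
daily = lake.execute("""
    SELECT order_date, SUM(amount) AS revenue
    FROM 'lake/orders/*.parquet'
    GROUP BY order_date
""").fetchall()

# Mart side: a small, fixed RDBMS schema serving BI-oriented queries.
mart = sqlite3.connect("bi_mart.db")
mart.execute("CREATE TABLE IF NOT EXISTS daily_revenue (order_date TEXT, revenue REAL)")
mart.executemany("INSERT INTO daily_revenue VALUES (?, ?)", daily)
mart.commit()
```

The division of labour is the point: the lake layer scales out over raw files, while the mart layer keeps a compact, query-friendly schema for BI tools.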
Querying and analysis
The final stage in the LDW journey is querying and analysis. This is where data becomes useful for end purposes like AI and machine learning, analytics, BI and other algorithmic outputs. There are three main methods of querying and analysing data in LDWs:
- SQL-based analytics
- Data filtering, aggregation and enrichment
- Advanced analytics
The approach used in QuantiaS Data Director centres on the importance of Big Data. It uses parallel high-performance computing from the beginning to maintain readiness in the face of increasing volumes of data. This avoids the need to re-engineer to meet scaling requirements, and the spiralling costs that come with it. QuantiaS leverages SQL-oriented data processing with scalable technology, delivering predictable computation times and reliable results.
With such a versatile and proven technical approach, customer-driven operations such as data filtering, aggregation and enrichment are possible without excessive expenditure. The solution uses enterprise-grade analytics engines such as Apache Spark for large-scale data processing. Spark’s implicit parallelism and fault tolerance make it an excellent fit for data-rich HPC applications. QuantiaS also uses high-performance cloud query engines like Amazon Redshift, which again are specifically targeted at Big Data workloads with massively parallel processing.
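A minimal sketch of such a filter-aggregate-enrich step with Spark follows (the datasets, column names and paths are hypothetical; real pipelines are customer-specific):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("filter-aggregate-enrich").getOrCreate()

# Hypothetical inputs: settled-or-not transactions, plus an FX reference table.
transactions = spark.read.parquet("hdfs://namenode:8020/lake/transactions/")
fx_rates = spark.read.parquet("hdfs://namenode:8020/lake/fx_rates/")

result = (
    transactions
    .filter(F.col("status") == "settled")           # filtering
    .join(fx_rates, on="currency", how="left")      # enrichment
    .withColumn("amount_eur", F.col("amount") * F.col("eur_rate"))
    .groupBy("country")                             # aggregation
    .agg(F.sum("amount_eur").alias("revenue_eur"))
)

# Persist the aggregate as a BI-ready data mart table.
result.write.mode("overwrite").parquet("hdfs://namenode:8020/marts/revenue_by_country/")
```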
Advanced analytics using modern technologies: ML & AI
To serve the complexities and scale of Advanced Analytics, QuantiaS uses AI and ML-based Smart Engines to extract all latent knowledge from customer data. Beginning with a Lakehouse (a hybrid of data lake and data warehouse architectures) or even raw data, ML algorithms are able to extract information buried deep inside complex data stores, enriching data models (see the sketch after this list) to feed:
- BI tools and reports
- Recommendation engines
- Data-driven process optimisations
- Data-driven decision-making
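Here is a minimal sketch of the enrichment idea using pandas and scikit-learn (stand-ins for whatever engines a given deployment actually employs; the file paths, feature columns and churn target are all invented): a model is trained on Lakehouse features and its scores are written back to enrich the data model:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Features exported from the Lakehouse layer (hypothetical path and columns).
df = pd.read_parquet("lake/customer_features.parquet")
X = df[["recency_days", "order_count", "avg_basket_eur"]]
y = df["churned"]

# Hold out a test split to sanity-check the model before trusting its scores.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# Enrichment: write churn-risk scores back to feed BI tools and decisions.
df["churn_risk"] = model.predict_proba(X)[:, 1]
df.to_parquet("lake/customer_features_scored.parquet")
```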
Open-source solutions and cloud services provide a welcome range of options based on statistical learning or deep-learning techniques such as neural networks. Using standardised, well-known frameworks, this approach is cost-effective and efficient. In particular, pay-per-use computation resources allow easy scalability without big financial commitments. The results are easily accessed using serverless SQL queries and can also be exported to standard RDBMS engines.
Data warehouses: conclusions, challenges and future trends
LDWs and their associated sub-components across analytics engines, cloud architectures and HPC services are now well established in the Big Data ecosystem. They empower data consumers with richer analyses for business intelligence, risk management, predictive analytics and, as it gains further traction, the Internet of Things.
Of course, there are challenges: poorly managed unstructured data can still be problematic, and despite these capabilities, users must still think critically about their data sources. And despite the ubiquity of scalable, open-source platforms, organisations will still need to draw on proprietary solutions or in-house specialisation for more specific use cases. However, logical data warehouses like QuantiaS Data Director offer our best approach yet to the challenges of Big Data.