20+ Petabyte Analytical Big Data Lake on IZAC

Streaming data for 300+ million subscribers, handling 15+ billion records a day

Case : The customer had a data warehouse with a data store of more than 10 petabytes. The warehouse had been filling up steadily and had already reached 80% capacity utilization. It was becoming increasingly important to manage the situation, either by refreshing the entire architecture or by augmenting it with offloading mechanisms that handle the workload in parallel in a cost-effective manner.

The main objectives of the existing warehouse and ETL landscape were:

  • Application and business reporting management
  • Managing ad hoc data exploration
  • Performing ETL operations on production data sets being extracted from application databases
  • Providing insights on archival data by bringing it back into the warehouse as needed

Challenge : Enterprise data warehouses like this one are under considerable strain from growing data volumes, and from bottlenecks in real-time data ingestion and change data capture that they were not built to accommodate. These challenges were forcing us, and organisations like VI, to rethink data infrastructure and architect an enterprise data hub that could ingest, store and process large volumes of structured, semi-structured and unstructured data to deliver richer business insights, while also building a parallel data exchange architecture. The result was a move to an enterprise-ready, open architecture.

This enables our business layer to do what it does best: business-critical reporting that supports high concurrency with low latency, rather than spending CPU cycles on transformations. Big data-based open-source systems are increasingly becoming the central data repository, supporting batch, interactive and real-time use cases. The IZAC framework, along with the Hadoop platform, fulfilled the need for open-source, enterprise-ready software and databases, showing that VI could meet all of the above objectives at a substantially lower cost.

Action : DWH Optimization—Stage 1

This initiative of warehouse optimization provides the following critical capabilities that VIL required:

  • A data management platform that helps store large volumes of data at a lower cost than alternatives such as cheaper servers and storage.
  • Improved responsiveness of the data warehouse by performing ETL transformations on a different real-time platform.
  • The ability to store, process and analyse new types of data, such as weblogs, security logs, or data from any of our upcoming web/mobile applications.
  • The ability to restore data warehouse CPU and storage capacity and eventually sunset the warehouse.

VIL hoped to be able to use this new architecture to reduce overall system costs by performing transformations on the new open platform and releasing previously used storage and capacity.

In addition, they could add more types and sources of data into this real-time architecture of “Single Source of Truth” for more granular and richer analytics across the combined solution.

Our Solution :

Our approach to data exchange, informed by the new architectures that leading telecoms are pursuing or already using, led us to choose open-source technologies and build a full data architecture for DWH and real-time analytics use cases on modern frameworks. The three major components of this solution were:

  • Apache Hadoop and Hortonworks Data Platform: Developing an enterprise data lake architecture that includes database technologies such as Hive and HBase to allow analysts to perform ad hoc data exploration.
  • Apache Kafka and Spark Streaming: A custom IZAC framework for real-time ETL and data exchange platform management.
  • Apache Druid for real-time data warehousing: Offloading workload from an existing DB2 or IIAS solution in order to keep it optimised and lower TCO.
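
To make the offloading idea concrete, here is a minimal sketch of the pattern: raw records are grouped into micro-batches and aggregated outside the warehouse, so that only compact, query-ready rows reach the serving layer. It is plain Python standing in for the streaming stack and does not use Kafka, Spark or Druid APIs. The record shape (UsageRecord), the field names and the batch size are all assumptions made for illustration and do not describe the actual IZAC implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical record shape; field names are illustrative only.
    subscriber_id: str
    event_hour: str  # e.g. "2024-01-01T10"
    bytes_used: int


def micro_batches(records: Iterable[UsageRecord], size: int) -> Iterator[List[UsageRecord]]:
    """Group an incoming stream of records into fixed-size micro-batches."""
    batch: List[UsageRecord] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def transform(batch: List[UsageRecord]) -> Dict[Tuple[str, str], int]:
    """Aggregate raw records per (subscriber, hour), outside the warehouse."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for record in batch:
        totals[(record.subscriber_id, record.event_hour)] += record.bytes_used
    return dict(totals)


def run_pipeline(records: Iterable[UsageRecord], batch_size: int = 1000) -> Dict[Tuple[str, str], int]:
    """Return the aggregated rows that would be loaded into the serving layer."""
    serving_table: Dict[Tuple[str, str], int] = defaultdict(int)
    for batch in micro_batches(records, batch_size):
        for key, total in transform(batch).items():
            serving_table[key] += total  # merge per-batch results
    return dict(serving_table)
```

In this sketch the warehouse (or serving store) only ever receives one row per subscriber and hour, which is the sense in which transformation work and storage are moved off the warehouse.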