Data Lake for demand planning

Challenge

The aim of the project was to build an efficient environment for analyzing energy consumption data. The project covered designing the Data Lake architecture and developing data flows from several source systems. The business goal was to enable efficient forecasting of energy consumption. The technical goal was to optimize the existing data transformation and reporting processes.

Solution

Apache Hadoop (HDFS, Hive, Spark)
Airflow
Zeppelin Notebook
Oracle (source system)
SAS

Result

Cluster installation and configuration (OS, cluster components, security)
Integration of the cluster with external systems
Reports for analyzing energy consumption data
Development of a data repository based on distributed processing technology
Reduction of billing report preparation time from several hours to a few seconds
More effective sales analyses from a cross-channel perspective