The aim of the project was to implement an efficient environment for analyzing energy consumption data. The scope covered designing the architecture of the Data Lake environment and developing data flows from several source systems. The business goal was to enable efficient forecasting of energy consumption. The technical goal was to optimize the existing data transformation and reporting processes.
Solution
Apache Hadoop (HDFS, Hive, Spark)
Airflow
Zeppelin Notebook
Oracle (source system)
SAS
Result
Cluster installation and configuration (OS, cluster components, security)
Integration of the cluster with external systems
Building reports for analyzing energy consumption data
Development of a data repository based on distributed processing technology
Reducing the time needed to prepare billing reports from several hours to several seconds
More effective sales analyses from a cross-channel perspective