Data Lake for demand planning

Challenge

The aim of the project was to build an efficient environment for analyzing energy consumption data. Within the scope of the project, the Data Lake architecture was designed and data flows from several source systems were developed. The business goal was to enable effective forecasting of energy consumption. The technical goal was to optimize the existing data transformation and reporting processes.

Solution

  • Apache Hadoop (HDFS, Hive, Spark)
  • Airflow
  • Zeppelin Notebook
  • Oracle (source system)
  • SAS

Result

  • Cluster installation and configuration (OS, cluster components, security)
  • Integration of the cluster with external systems
  • Reports for analyzing energy consumption data
  • Development of a data repository on the distributed processing platform
  • Reduction of billing report preparation time from several hours to a few seconds
