Data Lake for roaming data monitoring

Challenge

The aim of the project was to develop a system for managing and monitoring roaming telephone connections (data, text messages, voice) for an international telecommunications operator. The business objective was to monitor the continuity of telecommunications services and to detect fraud and anomalies in wholesale traffic.

Solution

Design complexity:

  • cluster size: ~2 PB
  • over 100 servers
  • maintenance of several dozen services

Technologies used:

  • HDFS, Hive, Spark, Ranger
  • Elasticsearch (ELK)
  • Apache Kafka

Result

  • Deployment of the Elasticsearch and Apache Hadoop clusters
  • Design of data flows (metadata, flow control, etc.)
  • Integration with Active Directory (AD) and Kerberos authentication
  • Ensuring continuity of system operation
  • Design and implementation of a disaster recovery (DR) site