Telecom – Predictive Maintenance Pilot

Problem Statement: Determining Telecom Supplier-End Outages

Solution Overview:

Whiteklay partnered with Tata Telecommunications to capture real-time metrics from telecom device logs, identify anomalies, and enable predictive maintenance.

To deliver the use cases, we calculated metrics from the raw call data record (CDR) data. These metrics serve as inputs to machine learning models, which produce scores that are used to detect anomalies.

Input parameters: Raw call data records in CSV or text format

Metrics Calculated: We aggregated the raw CDR data per device and per day, for outbound calls only, and stored the results in IZAC using custom workflows.

Country
Count of total in-country calls
Count of total out-of-country calls
% in-country calls

Deliverables:

  • Ingest the log messages into the IZAC Kafka queue.
  • Transform the data into tables of aggregated metrics.
  • Run transformation flows in IZAC and store the results in time-series format.
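The translation deliverables can be illustrated with a minimal sketch that parses raw CSV CDR text into typed records and flattens aggregated metrics into long-format time-series rows. It does not model IZAC or the Kafka queue; the column layout in `FIELDS` and the helper names `parse_cdr_csv` and `to_time_series_rows` are hypothetical.

```python
import csv
import io

# Hypothetical column layout; the actual CDR export format is not given in the pilot.
FIELDS = ["call_id", "start_ts", "supplier", "direction", "duration_s", "cause_code"]

def parse_cdr_csv(text):
    """Parse raw CSV CDR text into typed dicts, skipping malformed rows."""
    records = []
    for row in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        try:
            records.append({
                **row,
                "start_ts": int(row["start_ts"]),
                "duration_s": float(row["duration_s"]),
                "cause_code": int(row["cause_code"]),
            })
        except (TypeError, ValueError):
            continue  # missing or non-numeric field: drop the row
    return records

def to_time_series_rows(aggregates):
    """Flatten {(entity, window_ts): {metric: value}} into (window_ts, entity, metric, value) rows."""
    rows = []
    for (entity, window_ts), metrics in sorted(aggregates.items()):
        for metric, value in sorted(metrics.items()):
            rows.append((window_ts, entity, metric, value))
    return rows
```

Long format (one row per timestamp, entity, and metric) keeps the table schema stable when new metrics are added, which suits a time-series store.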

Use Case 1: In-Country Dialling % for Customer Outbound Calls

Input: raw call data records in CSV or text format.

The following aggregated metrics were calculated:

Daily count of all in-country calls
Daily count of all out-of-country calls
In-country calling %
Out-of-country calling %
ASR (Answer Seizure Ratio) of in-country calls
NER (Network Effectiveness Ratio) of in-country calls
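A minimal Python sketch of the Use Case 1 aggregation follows. It assumes a hypothetical `CDR` record layout, treats a call as in-country when the origin and destination countries match, and computes ASR as answered calls over all attempts and NER as answered calls plus busy, no-answer, and rejected calls over all attempts, approximating the ITU-T definitions with assumed Q.850 cause codes. The names `CDR`, `NER_CAUSES`, `pct`, and `daily_country_metrics` are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical CDR layout; the pilot does not document the raw field names.
@dataclass
class CDR:
    device: str           # device that produced the record
    day: str              # call date, e.g. "2020-01-15"
    direction: str        # "outbound" or "inbound"
    origin_country: str
    dest_country: str
    answered: bool
    cause_code: int       # assumed ITU-T Q.850 release cause

# Assumed Q.850 causes that count toward NER in addition to answered calls:
# 17 user busy, 18 no user responding, 19 no answer from user, 21 call rejected.
NER_CAUSES = {17, 18, 19, 21}

def pct(part, whole):
    """Percentage, or None when there is nothing to divide by."""
    return 100.0 * part / whole if whole else None

def daily_country_metrics(cdrs):
    """Aggregate outbound CDRs per (device, day) into the Use Case 1 metrics."""
    groups = defaultdict(list)
    for c in cdrs:
        if c.direction == "outbound":  # Use Case 1 covers outbound calls only
            groups[(c.device, c.day)].append(c)
    result = {}
    for key, calls in sorted(groups.items()):
        # Assumption: "in country" means the dialled number is in the caller's country.
        in_country = [c for c in calls if c.origin_country == c.dest_country]
        n_in, n_out = len(in_country), len(calls) - len(in_country)
        result[key] = {
            "in_country_calls": n_in,
            "out_country_calls": n_out,
            "in_country_pct": pct(n_in, len(calls)),
            "out_country_pct": pct(n_out, len(calls)),
            "in_country_asr": pct(sum(c.answered for c in in_country), n_in),
            "in_country_ner": pct(
                sum(c.answered or c.cause_code in NER_CAUSES for c in in_country), n_in
            ),
        }
    return result
```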

Use Case 2: Identifying Supplier Outages

Input: raw call data records in CSV or text format.

Step 1: Calculate the following metrics

Count of calls
Total count of inbound invalid cause codes per supplier
Total count of outbound invalid cause codes per supplier
ASR (Answer Seizure Ratio) of inbound and outbound calls
ACD (Average Call Duration) of inbound and outbound calls
NER (Network Effectiveness Ratio) of inbound and outbound calls
Standard deviation of call duration
Total invalid calls
Total inbound call count
Total outbound call count
Average hold duration
Call-duration buckets

The aggregation interval was chosen based on the data set and discussions with the user team. The metrics above were calculated using custom Apache Spark code and loaded into a MapR-DB table.

These metrics help us identify outliers, such as calls being disconnected at specific call durations.
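The Step 1 metrics could be aggregated along the following lines. The pilot used custom Apache Spark code, which is not reproduced here; this plain-Python sketch only illustrates the per-window calculations. The record layout, the one-hour window, the set of "invalid" cause codes, the duration bucket edges, and the meaning of `hold_s` are all assumptions, and `supplier_interval_metrics` is a hypothetical name.

```python
import statistics
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical record layout; the real CDR schema is not described in the pilot.
@dataclass
class SupplierCDR:
    supplier: str
    start_ts: int        # call start, Unix seconds
    direction: str       # "inbound" or "outbound"
    answered: bool
    duration_s: float    # conversation time; 0 for unanswered calls
    hold_s: float        # assumed meaning: time before answer or release
    cause_code: int      # assumed ITU-T Q.850 release cause

INTERVAL_S = 3600                    # assumed 1-hour window; the pilot tuned this per data set
INVALID_CAUSES = {1, 3, 28}          # assumed "invalid": unallocated number, no route, invalid format
NER_CAUSES = {17, 18, 19, 21}        # busy, no user responding, no answer, rejected
BUCKET_EDGES = [0, 10, 30, 60, 300]  # assumed lower bounds (seconds) of duration buckets

def bucket_of(duration_s):
    """Lower edge of the duration bucket a call falls into."""
    return max(edge for edge in BUCKET_EDGES if duration_s >= edge)

def supplier_interval_metrics(cdrs):
    """Aggregate CDRs per (supplier, window start, direction) into Step 1 metrics."""
    groups = defaultdict(list)
    for c in cdrs:
        window = c.start_ts - c.start_ts % INTERVAL_S
        groups[(c.supplier, window, c.direction)].append(c)
    out = {}
    for key, calls in sorted(groups.items()):
        n = len(calls)
        durations = [c.duration_s for c in calls if c.answered]
        out[key] = {
            "call_count": n,  # inbound/outbound totals follow from the direction in the key
            "invalid_count": sum(c.cause_code in INVALID_CAUSES for c in calls),
            "asr": 100.0 * len(durations) / n,
            "acd_s": statistics.fmean(durations) if durations else 0.0,
            "ner": 100.0 * sum(c.answered or c.cause_code in NER_CAUSES for c in calls) / n,
            "duration_stdev_s": statistics.pstdev(durations) if durations else 0.0,
            "avg_hold_s": statistics.fmean(c.hold_s for c in calls),
            "duration_buckets": dict(Counter(bucket_of(d) for d in durations)),
        }
    return out
```

Keying by direction keeps inbound and outbound ASR, ACD, and NER separate, as the metric list requires; summing `invalid_count` across both directions gives the total invalid calls.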

Step 2: Detect any sudden increase or decrease in the metrics

We start with a simple calculation of the slope of each metric listed above at each interval. A slope close to vertical signals a sudden rise or drop, and a slope that stays close to horizontal for a metric that normally fluctuates signals a flatline; either can indicate a major fault, such as a power outage. The thresholds for both cases can be decided after analysing the provided data. A major warning can be raised at this step if such a situation occurs.
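The steep-slope check can be sketched as a first difference divided by the interval length, compared against a threshold. The function name `slope_alerts` and its default threshold are hypothetical, and the flatline case is not covered here.

```python
def slope_alerts(series, interval=1.0, max_abs_slope=1.0):
    """Return (index, slope) pairs where a metric changes faster than the threshold."""
    alerts = []
    for t in range(1, len(series)):
        # Slope between consecutive aggregation intervals.
        slope = (series[t] - series[t - 1]) / interval
        if abs(slope) > max_abs_slope:
            alerts.append((t, slope))
    return alerts
```

For example, `slope_alerts([55, 54, 56, 12, 13, 55], max_abs_slope=20)` returns `[(3, -44.0), (5, 42.0)]`, flagging both the sudden drop to 12 and the recovery to 55.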

Step 3: Remove seasonality (de-seasonalization)

Before proceeding further, we had to remove seasonality from the data, since our metrics vary heavily with the time of day. We consider only daily seasonality here, not seasonality between months or years. An STL (Seasonal-Trend decomposition using LOESS) analysis is also needed to remove trends.
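As a simpler stand-in for STL, the sketch below uses classical additive decomposition: a centred moving average over one daily period estimates the trend, the average detrended value at each position in the day estimates the seasonal profile, and the remainder is the residual passed on to the anomaly models. It assumes hourly aggregation (`period=24`); `deseasonalize` and `centered_moving_average` are hypothetical names. An STL implementation such as `statsmodels.tsa.seasonal.STL(series, period=24).fit()`, whose result exposes `.trend`, `.seasonal`, and `.resid`, would be the closer match to the text.

```python
def centered_moving_average(x, period):
    """Trend estimate: centred MA over one period (a 2 x period MA when period is even)."""
    half = period // 2
    trend = [None] * len(x)  # ends without a full window stay None
    for t in range(half, len(x) - half):
        if period % 2:
            trend[t] = sum(x[t - half : t + half + 1]) / period
        else:
            inner = sum(x[t - half + 1 : t + half])
            trend[t] = (0.5 * x[t - half] + inner + 0.5 * x[t + half]) / period
    return trend

def deseasonalize(x, period=24):
    """Classical additive decomposition: x = trend + seasonal + residual."""
    trend = centered_moving_average(x, period)
    by_phase = [[] for _ in range(period)]
    for t, (value, tr) in enumerate(zip(x, trend)):
        if tr is not None:
            by_phase[t % period].append(value - tr)
    means = [sum(v) / len(v) if v else 0.0 for v in by_phase]
    offset = sum(means) / period
    seasonal = [m - offset for m in means]  # seasonal profile sums to zero
    residual = [
        None if tr is None else value - tr - seasonal[t % period]
        for t, (value, tr) in enumerate(zip(x, trend))
    ]
    return trend, seasonal, residual
```

If the metrics were aggregated at a different interval, `period` would change accordingly (for example, 96 for 15-minute windows).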

Step 4: Train the models and look for outliers

The approach was to feed each calculated metric into its own ARIMA model, giving a set of models that each learn the trend of one of the metrics listed above. Each model predicts what its metric's value should be on the next day; if the actual value differs from the prediction by more than one standard deviation, that data point is classified as an anomaly for that metric. Each model therefore yields its own set of anomalous data points. These per-metric results can be combined into a single output using a weighted sum (with the weights tuned during the testing phase), which determines whether the data point as a whole is an anomaly.
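The per-metric model-and-threshold logic can be sketched as follows. To keep the example dependency-free it swaps full ARIMA for a least-squares AR(1) model, i.e. ARIMA(1,0,0), and uses the standard deviation of the in-sample residuals as the "standard deviation" in the text; both choices are assumptions. With statsmodels, `ARIMA(history, order=(p, d, q)).fit()` followed by `.forecast(1)` would take the place of `fit_ar1`. The inputs would normally be the de-seasonalized series from Step 3, and `fit_ar1` and `flag_anomalies` are hypothetical names.

```python
import statistics

def fit_ar1(history):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi, residual std dev)."""
    z, y = history[:-1], history[1:]
    mz, my = statistics.fmean(z), statistics.fmean(y)
    var_z = sum((a - mz) ** 2 for a in z)
    cov_zy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    phi = cov_zy / var_z if var_z else 0.0
    c = my - phi * mz
    resid = [b - (c + phi * a) for a, b in zip(z, y)]
    return c, phi, statistics.pstdev(resid)

def flag_anomalies(histories, actuals, k=1.0):
    """Fit one model per metric; flag a metric when |actual - forecast| > k * sigma."""
    flags = {}
    for metric, history in histories.items():
        c, phi, sigma = fit_ar1(history)
        forecast = c + phi * history[-1]  # one-step-ahead prediction
        flags[metric] = abs(actuals[metric] - forecast) > k * sigma
    return flags
```

For the history `[10, 11, 10, 12, 11, 10, 11, 12, 10, 11]`, the fitted model forecasts 10.8 with a residual standard deviation of about 0.67, so an actual value of 11 is treated as normal while 15 is flagged.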

Output: The combined score described above indicates the severity of a supplier-end outage. Based on this severity score, we can determine whether a particular supplier's outages exceed a normal level.
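The weighted-sum combination and the severity decision can be sketched as below. The weights, the 0.5 threshold, and the function names `severity_score` and `suppliers_above_normal` are hypothetical; in the pilot the weights were to be tuned during testing. The score is the weighted share of metrics flagged as anomalous, so it lies between 0 and 1, and each supplier's flags are assumed to use the same metric names as the weights.

```python
def severity_score(flags, weights):
    """Weighted share of metrics flagged as anomalous, in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[m] for m, anomalous in flags.items() if anomalous) / total

def suppliers_above_normal(flags_by_supplier, weights, threshold=0.5):
    """Suppliers whose severity score reaches the (hypothetical) threshold, worst first."""
    scores = {s: severity_score(f, weights) for s, f in flags_by_supplier.items()}
    return sorted(
        ((s, score) for s, score in scores.items() if score >= threshold),
        key=lambda item: item[1],
        reverse=True,
    )
```

Normalising by the total weight keeps the threshold meaningful when metrics are added or their weights are re-tuned.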