Problem Statement : Determining Telecom Supplier End Outages
Solution Overview :
Whiteklay joined hands with Tata Telecommunications to capture real-time metrics from telecom device logs to identify anomalies and enable predictive maintenance.
For use case delivery, we calculated metrics from the raw CDR (call data record) data. These metrics act as input to the machine learning models, which calculate the scores used to detect anomalies.
Input parameters : Raw call data records in CSV/text format
Metrics Calculated : We aggregated raw CDR data at the device end on a daily basis, for outbound calls only, and stored the data in IZAC using custom workflows.
Country |
Count of Total In country calls |
Count of Total In country calls |
% In country Calls |
Deliverables:
- Get the log messages into the IZAC Kafka queue.
- Translate data into tables with aggregated metrics.
- Run transformation flows in IZAC and insert the data in a time series format.
Use Case 1: In country dialling % for customer outbound
Input: Raw call data records in CSV or text format.
Below are the aggregated metrics which were calculated.
Calculation of all in country calls on a daily basis |
Calculation of all out country calls on a daily basis |
Calculation of in country calling % |
Calculation of out-country calling % |
ASR (Answer Seizure Ratio) of in country calls |
NER (Network Effectiveness Ratio) of in country calls |
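The daily in-country metrics listed above could be computed along the following lines. This is a minimal sketch, not the custom IZAC workflow itself: the record keys (`day`, `origin_country`, `dest_country`, `answered`) and the function name are hypothetical, and a call is assumed to be "in country" when origin and destination country match.

```python
from collections import defaultdict


def in_country_metrics(records):
    """Aggregate outbound call records per (day, origin country).

    Each record is a dict with hypothetical keys:
    day, origin_country, dest_country, answered (bool).
    """
    totals = defaultdict(
        lambda: {"in_country": 0, "out_country": 0, "in_country_answered": 0}
    )
    for rec in records:
        bucket = totals[(rec["day"], rec["origin_country"])]
        # Assumption: "in country" means origin and destination match.
        if rec["dest_country"] == rec["origin_country"]:
            bucket["in_country"] += 1
            if rec["answered"]:
                bucket["in_country_answered"] += 1
        else:
            bucket["out_country"] += 1

    result = {}
    for key, b in totals.items():
        total = b["in_country"] + b["out_country"]  # always >= 1 here
        result[key] = {
            "in_country_calls": b["in_country"],
            "out_country_calls": b["out_country"],
            "in_country_pct": 100.0 * b["in_country"] / total,
            "out_country_pct": 100.0 * b["out_country"] / total,
            # ASR = answered calls / call attempts, as a percentage.
            "in_country_asr": (
                100.0 * b["in_country_answered"] / b["in_country"]
                if b["in_country"]
                else None
            ),
        }
    return result
```

Keying the result by (day, country) mirrors the daily, per-country aggregation described above; NER would follow the same pattern once the relevant cause codes are agreed.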
Use Case 2: Identifying Supplier Outages
Input: Raw call data records in CSV or text format.
Step 1: Calculating the given metrics
Count of Calls |
Total number of supplier inbound invalid cause codes |
Total number of supplier outbound invalid cause codes |
ASR (Answer Seizure Ratio) of inbound and outbound calls |
ACD (Average Call Duration) of inbound and outbound calls |
NER (Network Effectiveness Ratio) of inbound and outbound calls |
Standard Deviation in call duration |
Total Invalid calls |
Total Inbound call count |
Total outbound call count |
Average hold duration |
Buckets of call duration |
The time interval over which the records were aggregated was chosen based on the data set and discussions with the user team. The metrics identified above were calculated and loaded into the MAPR DB table using custom Apache Spark code.
These metrics help us identify outliers, such as calls being disconnected at specific call durations.
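As an illustration of Step 1, several of the per-interval metrics can be sketched in plain Python. This is not the Spark code used in the project: the `outcome` labels, the `duration` field (seconds), the bucket edges and the function name are all assumptions made for the example. NER is taken here as the share of call attempts that were answered or ended for a user-side reason (busy, no answer, rejected).

```python
import statistics
from bisect import bisect_right
from collections import Counter

# Hypothetical outcome labels that count as "network effective" for NER.
USER_OUTCOMES = {"answered", "busy", "no_answer", "rejected"}
# Hypothetical bucket edges in seconds; each bucket is [low, high).
BUCKET_EDGES = [10, 30, 60, 300]
BUCKET_LABELS = ["0-10s", "10-30s", "30-60s", "60-300s", "300s+"]


def interval_metrics(calls):
    """Compute metrics for the calls of one supplier in one interval.

    Each call is a dict with hypothetical keys: outcome, duration.
    """
    attempts = len(calls)
    durations = [c["duration"] for c in calls if c["outcome"] == "answered"]
    effective = sum(1 for c in calls if c["outcome"] in USER_OUTCOMES)
    buckets = Counter(
        BUCKET_LABELS[bisect_right(BUCKET_EDGES, d)] for d in durations
    )
    return {
        "call_count": attempts,
        "asr": 100.0 * len(durations) / attempts if attempts else None,
        "ner": 100.0 * effective / attempts if attempts else None,
        # ACD and standard deviation are taken over answered calls only.
        "acd": statistics.fmean(durations) if durations else None,
        "duration_stdev": statistics.pstdev(durations) if durations else None,
        "duration_buckets": dict(buckets),
    }
```

A spike in one duration bucket relative to its neighbours is the kind of signal that points to calls being cut off at a specific duration.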
Step 2: Recognize any sudden increase or decrease in any of the metrics.
We start with a simple calculation of the slope of each metric given above at each interval. A slope that is too close to vertical or horizontal (based on a threshold to be decided after analysing the provided data) flags a sudden rise or drop, which can be an indicator of a major fault such as a power outage. A major warning can be raised at this step if such a situation has occurred.
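A minimal sketch of the slope check, covering only the steep ("close to vertical") case. The function names, the interval length in minutes and the threshold value are illustrative assumptions; the real threshold would come from analysing the data.

```python
def slopes(values, interval_minutes):
    """Slope between each pair of consecutive interval values."""
    return [(b - a) / interval_minutes for a, b in zip(values, values[1:])]


def sudden_changes(values, interval_minutes, max_abs_slope):
    """Return the indices of values reached by a too-steep rise or drop."""
    return [
        i + 1
        for i, s in enumerate(slopes(values, interval_minutes))
        if abs(s) > max_abs_slope
    ]


# Example: a call count series sampled every 15 minutes.
call_counts = [100, 102, 101, 20, 98]
flagged = sudden_changes(call_counts, interval_minutes=15, max_abs_slope=2.0)
```

Here `flagged` holds the positions of both the drop to 20 and the recovery to 98, since each exceeds the assumed threshold of 2 calls per minute.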
Step 3: Remove seasonality (or de-seasonalization).
Before we could proceed further, we had to remove seasonality from the data, since our data varies heavily depending on the time of day. We consider only daily seasonality, not seasonality between months or years. STL (Seasonal-Trend decomposition using Loess) analysis has to be done to remove trends as well.
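To show the idea of removing daily seasonality, the sketch below subtracts the average value of each time-of-day slot from the series. This is a deliberately simpler stand-in for STL, not STL itself: it does not use Loess smoothing and does not remove trend. The function name and the `period` parameter (number of intervals per day) are assumptions.

```python
from statistics import fmean


def remove_daily_seasonality(values, period):
    """Subtract the mean of each time-of-day slot from the series.

    values: metric values at equally spaced intervals, oldest first.
    period: number of intervals in one day.
    """
    if len(values) < period:
        raise ValueError("need at least one full day of data")
    # Mean of slot 0 across all days, slot 1 across all days, and so on.
    slot_means = [fmean(values[slot::period]) for slot in range(period)]
    return [v - slot_means[i % period] for i, v in enumerate(values)]


# Two days with three intervals each; the second day runs slightly higher.
residuals = remove_daily_seasonality([10, 20, 30, 12, 22, 32], period=3)
```

The residuals keep the day-to-day difference while the repeating within-day shape is gone, which is what the models in the next step need as input.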
Step 4: Train the model and look for outliers
The approach was to feed each calculated metric into its own ARIMA model, giving a set of models that each “learn” the trend of one of the metrics listed above. Each model predicts what the value of its metric should be on the next day. If the actual value differs from the predicted value by more than the standard deviation, we classify that data point as an anomaly for that metric. The per-metric results are then combined into a single output using a weighted sum (with the weights tuned in the testing phase), which determines whether the data point is an anomaly or not.
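The flagging and combination logic can be sketched as follows. The ARIMA models themselves are not shown: their predictions are passed in as plain numbers. Two points are assumptions rather than details from the project: the standard deviation is taken over the metric's historical values, and the weighted sum is normalised by the total weight so the score falls between 0 and 1. Function names and weights are illustrative.

```python
from statistics import pstdev


def is_anomalous(history, predicted, actual):
    """Flag a point whose prediction error exceeds the historical std dev.

    Assumption: the standard deviation is computed over past values of
    the same metric.
    """
    return abs(actual - predicted) > pstdev(history)


def severity_score(flags, weights):
    """Weighted sum of per-metric anomaly flags, normalised to [0, 1]."""
    total = sum(weights[metric] for metric in flags)
    hit = sum(weights[metric] for metric, flagged in flags.items() if flagged)
    return hit / total


# Illustrative values for two metrics on one day.
flags = {
    "asr": is_anomalous([50, 52, 48, 50], predicted=50, actual=40),
    "acd": is_anomalous([100, 110, 90, 100], predicted=100, actual=103),
}
score = severity_score(flags, weights={"asr": 0.7, "acd": 0.3})
```

With these numbers only ASR is flagged, so the score reflects the ASR weight alone. A cut-off on the score would then decide whether the day counts as an anomaly for the supplier.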
Output: The result is a calculated score that indicates the severity of a supplier-end outage. Based on the severity score, it can be determined whether a particular supplier has outages beyond a normal level.