Impact of Call Fidelity on Accuracy of Transcriptions and Scorecards

By Muhammad Saqib, Mingren Xiang, Allen Shapiro, Ronald Mueller

Executive Summary

This paper is the second installment of our series on the impact of call fidelity on automated speech analytics, specifically on transcription and scorecard accuracy.

In the first part, we discussed the importance of having high fidelity audio recordings to fully utilize CallMiner features. We also covered the use of aliases, which are especially important when call fidelity is poor. Furthermore, we defined an initial framework for quantifying the impact of fidelity on the accuracy of the results.

In this paper, we go one step further and describe in more detail how we quantify the impact of call fidelity improvements. The quantified results are shown in multiple tables, and we provide analysis to illustrate the direct positive relationship between transcription accuracy and call fidelity.

Background

Based on our experience when engaging clients, there are several shortcomings when it comes to traditional QA as it is implemented in a contact center:

  1. Highly subjective: While most contact centers have scoresheets, the actual review of the calls remains highly subjective and unquantified.
  2. Sample too small: Sample sets used for QA analysis are random and very small. Before engaging with us on CallMiner, one of our clients held a weekly QA selection in which they randomly picked 10 calls from the thousands of calls that came in that week. Clearly, a sample that small cannot represent the dynamics of the whole contact center.
  3. Low efficiency: The company often needs to staff a QA specialist or have contact center supervisors spend extra hours sitting down with agents to listen to a select set of calls, hoping to find some coachable moments. And of course, since the sample evaluated is so small, there is often the agent retort “I have 100 good calls this week and the one we review happens to be the worst one”.

All three of these major issues can be solved proactively with the CallMiner speech analytics platform, where the system takes in 100% of your customer interactions. In addition to call recordings, the system can also take in chat sessions, emails, texts, and social conversations and bring them all together for analysis. Your call recordings (plus chat, email, text, social) are an unmined treasure of customer and agent information, and automated speech analytics is the key to unlocking the valuable business insights contained within this trove.

Call fidelity is by far the most important success factor in speech analysis with CallMiner: most insights come from the transcriptions of the audio, and without good fidelity the transcriptions will be problematic. The more accurate your transcriptions are, the more benefit you will derive from your speech analytics platform. To get the most out of that platform, good call fidelity is imperative.

Our first fidelity paper showed a simple case study to illustrate how important fidelity is. This paper will provide additional case studies showing the impact of call fidelity on the accuracy of speech analytics.

Call Fidelity and Transcription Quality

To recap, here we provide the list of factors that may have an impact on transcription accuracy and a description of the potential impact on accuracy:

Table 1 – Factors Affecting Transcription Accuracy

Factor | Potential Impact
Recording channel (mono vs. stereo) | Having stereo will improve transcript quality as well as allow us to do precise speaker separation.
Audio file quality (mp3 vs. wav) | High-quality non-compressed audio (like wav) is preferred over compressed audio files (like mp3).
Many speakers; over-talk | Generally, over-talk is much more of an issue in mono. More over-talk generally leads to lower transcription accuracy.
Cell phone vs. land line | Calls made on/to cell phones tend to have more disruptions and lower fidelity.
Headset quality and usage | Replace bad headsets with good-quality noise-canceling headsets. Make sure the headset/handset is at a good distance from the mouth when talking.
Scripted vs. free-flowing real conversation | Consistent language for an issue helps in defining scorecards.
Agent or customer accent | Strong accents can be difficult to transcribe.
Type of calls | For example, service vs. sales calls.
Silence on the call | Longer periods of silence are indicative of lower call fidelity.
Language of the call | Some languages are harder to transcribe.
Background noise | Background noise is always a detriment to call fidelity.

Case Study

Following the framework defined in our previous post, we sampled 212 calls for one agent. Half of those calls (i.e. 106 calls) were low fidelity audio files (24kbps mp3), while the other half had high fidelity audio (128kbps mp3). To maintain language consistency in this comparative analysis, we selected the Agent Introduction score. This score captures whether an agent properly introduced himself/herself on the call. The table below summarizes the distribution of the calls based on a manual transcription review:

Table 2 – Sample Calls’ Distribution

Calls’ Review Summary
Metrics | Low Fidelity Calls | High Fidelity Calls
Total Calls | 106 | 106
No Agent Identification Occurred | 30 | 35
Incorrectly Transcribed Identification Audio | 39 | 7
Correctly Transcribed Identification Audio | 37 | 64

After carefully reviewing the input audio files, the generated transcripts, and the calculated scores within CallMiner, we were able to quantify the following three quality metrics for both the high and low fidelity segments:

  1. Transcription Accuracy Rate
    1. This indicates the proportion of calls that were correctly transcribed for the selected score
  2. False Positives generated due to usage of aliases
    1. This count indicates the number of false score hits that were driven by the usage of aliases.
  3. Alias coverage
    1. An indicator to capture the share of score hits that were driven by the usage of aliases. The less dependency we have on aliases, the better it is.
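The three metrics above can be sketched in a few lines of Python. This is only an illustration, not CallMiner functionality: the `ReviewedCall` record layout, the field names, and the sample data are all hypothetical, and the denominators (calls where an introduction was spoken for accuracy, all score hits for the two alias metrics) are assumptions, since the exact formulas are not spelled out here.

```python
from dataclasses import dataclass


@dataclass
class ReviewedCall:
    """One manually reviewed call (hypothetical record layout)."""
    introduction_spoken: bool    # reviewer heard the agent introduce themselves
    transcribed_correctly: bool  # transcript matches the audio for that phrase
    score_hit: bool              # the Agent Introduction score fired
    hit_via_alias: bool          # the hit relied on an alias, not the exact phrase


def share(part, whole):
    """Return len(part) / len(whole), or 0.0 when the denominator is empty."""
    return len(part) / len(whole) if whole else 0.0


def fidelity_metrics(calls):
    """Compute the three quality metrics under the assumptions stated above."""
    spoken = [c for c in calls if c.introduction_spoken]
    correct = [c for c in spoken if c.transcribed_correctly]
    hits = [c for c in calls if c.score_hit]
    alias_hits = [c for c in hits if c.hit_via_alias]
    # Assumed definition: an alias-driven hit is false when no introduction was spoken.
    false_alias_hits = [c for c in alias_hits if not c.introduction_spoken]
    return {
        "transcription_accuracy": share(correct, spoken),
        "alias_false_positive_rate": share(false_alias_hits, hits),
        "alias_coverage": share(alias_hits, hits),
    }


# Made-up sample, for illustration only.
sample = [
    ReviewedCall(True, True, True, False),    # clean transcript, direct hit
    ReviewedCall(True, False, True, True),    # garbled transcript rescued by an alias
    ReviewedCall(False, False, True, True),   # alias fired with no introduction spoken
    ReviewedCall(True, False, False, False),  # garbled transcript, score missed
]
metrics = fidelity_metrics(sample)
print(metrics)
```

Keeping the review as one record per call makes it easy to change a denominator later if a different definition of the false positive rate turns out to be more appropriate.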

The below table summarizes the results we obtained from our analysis:

Table 3 – Fidelity Impact Indicators

Score: Agent Introduction
Metrics | Low Fidelity Calls | High Fidelity Calls
Transcription Accuracy | 49% | 90%
False Positive Rate (because of Aliases) | 20% | 14%
Alias Coverage % | 49% | 10%

As Table 3 shows, improving the fidelity of the audio raised transcription accuracy by 41 percentage points (from 49% to 90%). At the same time, the rate of false positive score hits caused by aliases fell by 6 percentage points (from 20% to 14%) relative to the low fidelity calls.
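The accuracy figures can be checked directly against the counts in Table 2. The sketch below assumes that calls with no agent identification are excluded from the denominator; that assumption is ours, but it reproduces the 49% and 90% reported in Table 3. It also separates the gain in percentage points from the relative gain, which are easy to confuse.

```python
# Counts copied from Table 2 (manual review, 106 calls per segment).
low = {"no_identification": 30, "incorrect": 39, "correct": 37}
high = {"no_identification": 35, "incorrect": 7, "correct": 64}


def transcription_accuracy(counts):
    """Share of calls containing an identification that were transcribed correctly.

    Assumption: calls with no identification are left out of the denominator.
    """
    attempted = counts["correct"] + counts["incorrect"]
    return counts["correct"] / attempted


low_acc = transcription_accuracy(low)
high_acc = transcription_accuracy(high)

# Difference in percentage points vs. relative improvement over the low fidelity baseline.
point_gain = (high_acc - low_acc) * 100
relative_gain = (high_acc / low_acc - 1) * 100

print(f"Low fidelity accuracy:  {low_acc:.1%}")
print(f"High fidelity accuracy: {high_acc:.1%}")
print(f"Gain: {point_gain:.1f} percentage points ({relative_gain:.1f}% relative)")
```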

Lastly, with the better fidelity audio files, the transcription quality improved significantly, and in turn we can rely less on aliases to help identify relevant text for the given score. Reduced reliance on aliases is good for two reasons: (1) we don’t have to spend hours extracting them in the first place, and (2) we avoid more of the false positives they cause. It’s worth pointing out that these three quality metrics improve so significantly because the fidelity of the recordings takes a huge jump in this case study, from 24kbps mp3 to 128kbps mp3.

To further validate our results, we chose another sample call set that consists of language around order inquiries from inbound calls. Using a similar approach to the one above, we selected a total of 165 calls, 63 of poor quality (24kbps mp3) and 102 of better quality (32kbps mp3).

The below table summarizes the review of these calls with respect to ‘order’ language:

Table 4 – Sample Calls’ Distribution

Calls’ Review Summary
Metrics | Low Fidelity Calls | Better Fidelity Calls
Total Calls | 63 | 102
No Agent Identification Occurred | 6 | 11
Incorrectly Transcribed Identification Audio | 30 | 43
Correctly Transcribed Identification Audio | 27 | 48
Transcription Accuracy | 47% | 53%

It’s important to point out that for this second call set, both the ‘low’ and ‘better’ fidelity groups are below CallMiner’s recommended minimum bitrate for achieving good transcriptions. CallMiner requires a minimum of 64kbps stereo mp3. In the group with ‘better’ fidelity, the bitrate of the calls is only 32kbps, which is half of the recommended minimum. Still, we see an increase of roughly 6 percentage points in transcription accuracy (from 47% to 53% in Table 4) from this slight improvement in audio fidelity.

Notice that the increase in transcription accuracy in this second case study is much smaller than in the first. This is because the bitrate difference between the call sets in our first experiment (24kbps vs. 128kbps) is much larger than in our second experiment (24kbps vs. 32kbps). This result further supports our overall view that fidelity has a significant impact on transcription accuracy, and that the relationship is positive: the better your recording fidelity, the better your transcription accuracy.
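To put the two experiments side by side, the sketch below recomputes the accuracy gain for each call set from the counts in Tables 2 and 4 and pairs it with the bitrate ratio. As before, it assumes that calls with no identification are excluded from the accuracy denominator; the dictionary layout is purely illustrative.

```python
# (correct, incorrect) counts from Tables 2 and 4; bitrates in kbps as stated in the text.
studies = {
    "Agent Introduction": {"bitrates": (24, 128), "low": (37, 39), "better": (64, 7)},
    "Order inquiries": {"bitrates": (24, 32), "low": (27, 30), "better": (48, 43)},
}


def accuracy(correct, incorrect):
    """Correctly transcribed calls as a share of calls with an identification."""
    return correct / (correct + incorrect)


summary = {}
for name, study in studies.items():
    low_kbps, better_kbps = study["bitrates"]
    bitrate_ratio = better_kbps / low_kbps
    # Gain expressed in percentage points, computed from unrounded accuracies.
    gain_points = (accuracy(*study["better"]) - accuracy(*study["low"])) * 100
    summary[name] = (bitrate_ratio, gain_points)
    print(f"{name}: bitrate x{bitrate_ratio:.2f}, accuracy gain {gain_points:.1f} points")
```

Two data points are far too few to fit any curve, so this comparison is only a sanity check that the larger bitrate jump goes with the larger accuracy gain.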

Conclusion

As our two case studies above show, call recording fidelity is a critically important factor when engaging with speech analytics. We recommend that you ask your internal infrastructure organization or your call recording vendor about your contact center’s current level of call fidelity. This really must be step one in any automated speech analytics work program.

If the fidelity is not good enough, you need to consider upgrading the quality before beginning in earnest with your automated speech analytics work program. Please refer to the fidelity vs transcription accuracy chart in our first paper on fidelity research to see the recommended bitrate of recordings for achieving accurate transcriptions.

As part of our customer onboarding process, Macrosoft will work with you to examine your call fidelity to make sure it is sufficient for you to get the most out of CallMiner’s speech analytics platform. Not sure if your recording quality is ready for speech analytics? Contact us and we will be happy to work with you to get the answer.

By Muhammad Saqib, Mingren Xiang, Allen Shapiro, Ronald Mueller | December 1st, 2020 | CallMiner

About the Author

Muhammad Saqib

Saqib is a Data Science professional at Macrosoft with over 8 years of experience in the field. He enjoys breaking down complex business problems and solving them using data, statistics, and machine learning techniques. He has a penchant for natural language processing, reinforcement learning, and time series analysis. He’s a long-time Python enthusiast and a fan of data visualization, econometrics, nachos, and snooker. He holds a master’s degree in Data Science from University of California San Diego and a bachelor’s in economics from LUMS, Pakistan.

Mingren Xiang

Mingren is a Data Science professional at Macrosoft. He is Macrosoft's technical lead in voice and conversational analytics using the CallMiner suite of utilities. The practice includes both partnering with CallMiner to deliver speech analytic solutions and developing customized NLP applications. Mingren has a Master of Science from the University of Wisconsin-Milwaukee.

Aside from leading the speech analytics practice at Macrosoft, Mingren focuses his research on deep learning applications for medical image processing. His master’s thesis covered training a CNN (convolutional neural network) based encoder-decoder model to reconstruct CT scans using only one X-ray image. Such a task remains one of the hardest challenges in the computer graphics community.

Coming from a strong computer science background, Mingren is also proficient in multiple programming languages such as Java, Python, C/C++, various JavaScript libraries, and SQL scripts. His specialty in software development is using APIs to create functional backend services with web development frameworks such as Java Spring and Django in Python.

Allen Shapiro, Director – CCM Practice

Allen brings more than 25 years of diverse experience in Marketing and Vendor Management to Macrosoft Inc. As the Managing Director of our Customer Communications Management (CCM) practice, Allen leads the Onshore and Off-shore CCM development teams. Additionally, Allen oversees pre-sales activities and is responsible for managing the relationship with our CCM software provider Quadient.

Dr. Ronald Mueller CEO of Macrosoft

Ron is CEO and Founder of Macrosoft, Inc. He heads up all company strategic activities and directs the day-to-day work of the Leadership Team at Macrosoft. As Macrosoft’s Chief Scientist, Ron defines and structures Macrosoft’s path forward. Ron focuses on new technologies and products, such as Cloud, Big Data, and AI/ML/WFP. Ron has a Ph.D. in Theoretical Physics from New York University and worked in physics for over a decade at Yale University, The Fusion Energy Institute in Princeton, New Jersey, and at Argonne National Laboratory.

Ron also worked at Bell Laboratories in Murray Hill, New Jersey, where he managed a group on Big Data. Ron’s work focused on early neural network research. Ron has a career-long passion for ultra-large-scale data processing and analysis, including predictive analytics, data mining, machine learning, and deep learning.
