Automatically identify customer & agent speakers within single-channel audio recordings
We are well into the “Age of the Customer”, and the first step toward competing successfully in this era is capturing the Voice-of-the-Customer. Many organizations are sitting on a gold mine of customer intelligence they have already captured: their contact center call recordings. Unfortunately, call recording wasn’t designed for analytics and insight. Many recording platforms pre-date the rise of AI-fueled speech analytics, which automatically transcribes and analyzes call recordings. Call recording was originally designed for storage and archiving, often to meet specific regulatory requirements. As a result, storage efficiency was prioritized over recording quality: recordings are often highly compressed, which degrades audio quality. Many older systems also record the customer and the agent on a single channel.
Why Mono Recording Poses Challenges for Speech Analytics
The primary issue is that a transcript of a mono recording cannot show whether a given stretch of speech came from the agent or the caller. That makes it difficult to zero in on customer satisfaction or agent performance: with mono recording, it is impossible to pinpoint whether the caller or the agent said something, or to attribute the associated sentiment and acoustic measures to the right party.
So, what should you do if you have already invested in a mono call recording system? Don't worry: check out CallMiner Speaker Separation.
What is CallMiner Speaker Separation?
CallMiner Speaker Separation is voice-biometrics-based software that divides mono recordings into speaker channels representing the agent and customer portions of a call, improving the effectiveness of speech analytics.
The system uses a “passive enrollment” process: it trains on a group of calls handled by the same agent, identifies the most prevalent talker across those calls (on the assumption that the customer changes from call to call), and assigns that talker’s voiceprint to the agent. Once the agent’s voiceprint exists, the system assumes any speaker who is NOT the agent is the customer. During the speech-to-text process, each part of the conversation is then attributed to the correct speaker. This improves transcript readability, reduces the time needed to target agent or customer issues via speaker filtering, and enables speaker-targeted topic mining.
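The enrollment-then-attribution flow described above can be illustrated with a short sketch. This is not CallMiner's implementation: it assumes each call has already been split into speech segments with one speaker embedding per segment (how those embeddings are produced is out of scope), compares voices with cosine similarity against a hypothetical 0.8 threshold, and uses hypothetical function names (`enroll_agent`, `label_segments`). The voice that recurs in the most calls is treated as the agent, and every other voice is labeled as the customer.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def enroll_agent(calls: List[List[Vector]], threshold: float = 0.8) -> List[float]:
    """Hypothetical passive enrollment: calls[i] holds the segment embeddings
    of call i, all handled by the same agent. Returns a voiceprint for the
    voice that appears in the most calls."""
    best_seg: Optional[Vector] = None
    best_count = -1
    for i, call in enumerate(calls):
        for seg in call:
            # Count how many OTHER calls contain a voice similar to this segment.
            count = sum(
                1
                for j, other in enumerate(calls)
                if j != i and any(cosine(seg, o) >= threshold for o in other)
            )
            if count > best_count:
                best_seg, best_count = seg, count
    if best_seg is None:
        raise ValueError("no segments to enroll from")
    # Average every segment that matches the most prevalent voice.
    members = [s for call in calls for s in call if cosine(s, best_seg) >= threshold]
    return [sum(s[k] for s in members) / len(members) for k in range(len(best_seg))]


def label_segments(call: List[Vector], voiceprint: Vector,
                   threshold: float = 0.8) -> List[str]:
    """Label each segment 'agent' if it matches the voiceprint, else 'customer'."""
    return ["agent" if cosine(seg, voiceprint) >= threshold else "customer"
            for seg in call]


if __name__ == "__main__":
    # Toy 2-D embeddings: the agent is near [1, 0]; each call has a different customer.
    calls = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.98, 0.05], [-1.0, 0.2]],
        [[0.99, -0.02], [0.1, -1.0]],
    ]
    voiceprint = enroll_agent(calls)
    print(label_segments([[0.95, 0.1], [0.2, 0.9], [-0.5, -0.5]], voiceprint))
    # ['agent', 'customer', 'customer']
```

Because the customer differs from call to call, only the agent's voice scores a match in other calls, which is what lets enrollment stay passive. A production system would rely on a trained speaker-embedding model and tuned thresholds rather than the toy vectors and fixed cutoff used here.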
Why Speaker Separation is Helpful to Your Organization
- Transcript Usability – Speaker separation enables speaker-associated search, with category tags applied automatically. Customer issues such as satisfaction and escalations are easy to identify, and automated scoring makes agent performance, including compliance and sentiment, clear.
- Topic Discovery – Trending issues, whether or not they were previously identified, surface from agent or customer utterances. Topic circles, sized by call volume and split between agent and customer speakers, support root-cause awareness and action.
- Accuracy – Speaker separation between an agent and a caller is highly reliable when good-quality audio is available, subject to the following considerations:
  - The call should be longer than 5 seconds.
  - Third-party speech or background overtalk: the system assigns such voices to whichever of the agent or customer they most resemble.
  - Hold music, especially music with a heavy proportion of vocals or instrumentals.
- Efficiency – Storage requirements stay the same as for mono, because only the transcripts, not the audio, are separated between caller and agent. Voiceprint processing overhead is likely 5% or less, compared with 25% or perhaps significantly more for stereo call recording. Speaker separation also eliminates the need for a stereo call recording upgrade.
- Unobtrusive – Passive voiceprint enrollment means agents always remain in service. Also, if speaker separation fails, the transcription content remains intact; it is simply not associated with a speaker identity.
By Mingren Xiang | April 7th, 2020 | CallMiner