Introduction:
Progress in AI technology this year has significantly enhanced the capabilities of AI voice assistants. These assistants can now listen to spoken language, convert it into text, and respond with a synthesised voice much more quickly than before. AI voice agents are increasingly being integrated into traditional phone call queues and Interactive Voice Response (IVR) systems. This article examines the latest advancements in AI voice and how they will make automated dialers a more effective tool for businesses aiming to improve their marketing, sales, and customer service strategies.
Reduced Latency in Language Models:
One of the key factors contributing to the readiness of AI voice agents is the recent reduction in the processing time and latency of Large Language Models (LLMs). LLMs are AI systems trained on vast amounts of text data, enabling them to understand and generate human-like responses [1]. The reduced latency in these models allows for faster and more efficient processing of customer queries, leading to smoother and more natural phone call conversations.
Latency during phone calls can also be reduced by using a local LLM that is physically closer to the automated dialer and speech processing engines. This has recently become possible with the release of premises-based proprietary systems such as Azure AI and smaller open-source models such as Llama 3 8B, which can be installed on a modestly priced GPU server. Because network latency increases with physical distance, the closer the model the better, and localised models are closer than large proprietary models, which are typically based in the USA.
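As a rough illustration of why distance matters, the sketch below estimates the minimum round-trip propagation delay over optical fibre. It assumes light travels through fibre at about 200,000 km/s (roughly two thirds of its speed in a vacuum). The two distances are hypothetical examples, and a real round trip adds routing, queuing and model inference time on top of this floor.

```python
# Approximate speed of light in optical fibre (about two thirds of c).
FIBRE_SPEED_KM_PER_S = 200_000


def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip propagation delay in milliseconds."""
    return 2 * distance_km / FIBRE_SPEED_KM_PER_S * 1000


# Hypothetical distances between the dialer and the LLM server.
local_km = 50       # model hosted in a nearby data centre
remote_km = 6_000   # model hosted on another continent

print(f"local:  {round_trip_ms(local_km):.1f} ms")
print(f"remote: {round_trip_ms(remote_km):.1f} ms")
```

Under these assumptions the nearby server adds about half a millisecond per round trip and the distant one about 60 ms. A voice conversation involves several such round trips per turn, so the difference accumulates.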
Improved Text-to-Speech Systems:
Another significant advancement that has paved the way for AI-powered voice agents is the improvement in text-to-speech (TTS) systems. Modern TTS engines have achieved remarkable accuracy and realism, making it difficult for customers to distinguish between a human agent and an AI-generated voice [2]. This enhancement in TTS technology ensures that AI voice agents can deliver a more personalised and engaging customer experience.
Function Tools:
Large language models are excellent at understanding and generating text conversations but, without function-calling tools, they cannot perform tasks such as sending emails, querying databases and connecting to APIs. Function tools broaden the range of ‘skills’ that an AI customer service agent can be given, and the language model can use them without additional coding or training. Instructions for invoking custom tools can be added to the prompt, such as “….having retrieved the customer’s account balance using function tool ‘account_data’, send the pricing information to them using function tool ‘send_email’”.
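One way to picture this is a small tool registry that the application consults whenever the model asks for a function. The sketch below is a minimal illustration and not any particular vendor's API. The tool names `account_data` and `send_email` come from the prompt example above, while the function bodies, the `TOOLS` registry, the `dispatch` helper and the JSON shape of the tool call are all assumptions made for the example.

```python
import json


def account_data(customer_id: str) -> dict:
    """Hypothetical lookup; a real tool would query a database or API."""
    return {"customer_id": customer_id, "balance": 42.50}


def send_email(to: str, body: str) -> dict:
    """Hypothetical sender; a real tool would call a mail service."""
    return {"to": to, "body": body, "status": "queued"}


# Registry mapping the tool names used in the prompt to Python callables.
TOOLS = {"account_data": account_data, "send_email": send_email}


def dispatch(tool_call_json: str) -> dict:
    """Run the tool named in a model-issued call (assumed JSON format)."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**call["arguments"])


# Example of a call the model might emit after reading the prompt.
result = dispatch('{"name": "account_data", "arguments": {"customer_id": "C123"}}')
print(result)
```

The model only ever names a tool and supplies arguments. The application decides which functions are exposed in the registry, which keeps control over what the agent is allowed to do.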
Enhanced Conversation Handling:
AI-powered automated dialers have also benefited from advancements in conversation handling capabilities. These systems now feature improved interrupt management, allowing them to handle customer interruptions gracefully and maintain the flow of the conversation [3]. Additionally, AI algorithms have become more adept at understanding customer intent, enabling automated dialers to provide relevant and targeted responses to customer queries [4]. Furthermore, improved accuracy with local accents has made AI-powered automated dialers more relatable and accessible to customers from different regions [5].
Benefits for Businesses:
The integration of AI-powered automated dialers offers several benefits for businesses. Firstly, it enables companies to handle a higher volume of outbound calls, increasing their reach and productivity. Secondly, AI voice agents offer a significant advantage over human teams in meeting the compliance goal of zero abandoned calls during outbound dialing. Unlike human agents, who are limited in number and availability, AI voice agents can be deployed in virtually unlimited numbers, so there is always an agent available to handle every call. This means that even during peak calling periods or unexpected spikes in call volume, AI voice agents can step in to prevent calls from being dropped or abandoned for lack of a live agent.
Moreover, the use of AI technology helps businesses to reduce costs associated with hiring and training human agents, while still maintaining a high level of customer service quality.
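The abandoned-call point above can be illustrated with some simple arithmetic. The sketch below is a deliberately simplified model, not a regulator's official formula: it assumes a call is abandoned whenever it is answered and no agent is free, and the call and agent counts are hypothetical.

```python
def abandoned_calls(answered_calls: int, available_agents: int) -> int:
    """Answered calls left without an agent to take them."""
    return max(0, answered_calls - available_agents)


def abandoned_rate(answered_calls: int, available_agents: int) -> float:
    """Share of answered calls that are abandoned (0.0 to 1.0)."""
    if answered_calls == 0:
        return 0.0
    return abandoned_calls(answered_calls, available_agents) / answered_calls


# Hypothetical spike: 120 calls answered at once.
answered = 120
human_team = 100      # fixed headcount
ai_agents = answered  # one AI agent started per answered call

print(abandoned_rate(answered, human_team))  # 20 of 120 calls abandoned
print(abandoned_rate(answered, ai_agents))   # no calls abandoned
```

With a fixed team, any spike beyond the headcount produces abandoned calls. When agent capacity scales with the number of answered calls, the rate in this model stays at zero.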
Conclusion:
The advent of AI-powered automated dialers marks a significant milestone in the evolution of outbound customer service. With the recent advancements in AI technology, including reduced latency in language models, improved text-to-speech systems, and enhanced conversation handling capabilities, automated dialers are now ready to take on the role of AI customer service agents. As businesses continue to embrace this technology, customers can expect a more efficient, personalised, and seamless customer service experience.
FAQ:
1. What is an AI voice agent?
An AI voice agent is a software application that uses artificial intelligence to simulate human-like conversations through voice. These agents are designed to understand spoken language, process the content, and deliver spoken responses, making them highly effective for tasks such as customer service, handling inquiries, scheduling, or providing information. The latest generation of AI voice agents is increasingly used in contact centres, vehicle system control, smart devices, healthcare appointment management and retail applications.
2. Are AI voice agents capable of handling complex customer queries?
Yes. With the advancements in AI technology, voice agents can understand customer intent and provide relevant and targeted responses to a wide range of queries, in the same way as they do with text interactions. Previous generations of AI voice agents would quickly lose their way when interrupted, asked a ‘curve-ball’ question, or talking on a poor-quality connection. These are areas where improvements can now be demonstrated.
3. Will AI voice agents replace human customer service agents?
While AI voice agents can handle a significant portion of customer interactions, human agents will still play a crucial role in handling complex or sensitive issues that require empathy and a personal touch. A good starting point for contact centres is using AI voice agents to prevent FCA, ICO and Ofcom breaches, such as those involving abandoned calls, mis-selling and fraud.
4. How do AI voice agents ensure data privacy and security?
Businesses implementing AI voice agents must adhere to strict GDPR privacy regulations and employ robust security measures to protect customer information. On a website, the voice conversation would be encrypted using TLS 1.3, and SIP phone conversations are secured in a similar way. PSTN calls are already very secure. The AI voice agent server should be firewalled and hosted in a secure data centre, such as those operated by AWS or Azure.
5. Where will I typically interact with them?
Typical places to interact with an AI voice agent are on a telephone call or via a web page that supports voice.
6. Is an AI voice agent the same as Siri or Alexa?
In some ways they are similar. However, agents can be trained on your custom data, can work in teams, and are able to execute specific tasks autonomously. Assistants, whilst useful, are generally limited to searching the internet, summarising text and responding to FAQs.
7. Can it be deployed in an outbound predictive dialing campaign?
Yes. Recent improvements in latency, conversational competence and intent matching mean that AI voice agents can now respond in a very human-like manner, with increased accuracy, and quickly enough to avoid abandoned call issues.
8. Is it able to take a payment securely?
Yes, it is. However, security needs to be extremely robust, and we are in the process of developing a PCI DSS payment plugin.
9. Is it UK GDPR compliant?
Yes. For UK customers, your encrypted data transits and rests in the UK region. Hostcomm is a Level 1 PCI DSS certified supplier, so the AI voice agent has very robust security based on this.
Citations:
[1] Radford, A., et al. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8).
[2] Wang, Y., et al. (2017). Tacotron: Towards end-to-end speech synthesis. arXiv preprint arXiv:1703.10135.
[3] Gulati, A., et al. (2020). Conformer: Convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100.
[4] Devlin, J., et al. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[5] Sun, L., et al. (2020). Personalized TTS with multi-speaker modeling and adaptive speaker embedding. arXiv preprint arXiv:2002.07439.