Moveo.AI’s LLM vs GPT-4 for Customer Experience

George

Chief of AI

June 26, 2024

Aren’t we all tired of OpenAI’s GPT-4 trying to be everything for everyone? Enterprises need vertical-specific LLMs, and we put ours to the ultimate test! We compared our proprietary LLM with GPT-4’s latest edition on seven dimensions, and the results are clear:

Moveo’s LLM surpasses GPT-4 in CX!

Moveo.AI’s LLM Agents

In the rapidly evolving world of customer experience (CX), Moveo.AI stands out as a pioneering platform that leverages Generative AI (GenAI) to transform how enterprises interact with their customers. Moveo.AI’s proprietary Large Language Models (LLMs), trained on historical and real-time CX data, power human-like Virtual Agents (VAs) that can seamlessly connect to real-time data and unstructured knowledge bases to provide accurate and contextually relevant answers to customer inquiries.

Moveo.AI’s VAs are designed to follow instructions and perform tasks by executing workflows that describe specific business processes. These workflows can be created from natural language descriptions or a no-code drag-and-drop builder. Additionally, the VAs respond to user questions using unstructured knowledge bases through a built-in Retrieval-Augmented Generation (RAG) pipeline.


System Architecture

At the core of Moveo.AI’s system is an advanced decision-making process that efficiently routes user messages. When a user sends a message, an LLM-powered planner analyzes the question and selects the most appropriate response mechanism. Depending on the analysis, the system may execute a predefined workflow or utilize a dynamic RAG pipeline to generate accurate and contextually relevant responses.
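The routing decision described above can be sketched roughly as follows. This is a minimal illustration, not Moveo.AI's implementation: the `Workflow` type, the keyword-based `plan` function (a stand-in for the LLM-powered planner), and `answer_with_rag` are all hypothetical names introduced here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Workflow:
    """Hypothetical description of a predefined business process."""
    name: str
    trigger_keywords: Tuple[str, ...]
    run: Callable[[str], str]


def plan(message: str, workflows: List[Workflow]) -> Optional[Workflow]:
    """Stand-in for the LLM-powered planner (assumption: keyword matching).

    Returns the first workflow whose keywords appear in the message,
    or None to signal that the RAG pipeline should answer instead.
    """
    words = set(re.findall(r"[a-z0-9']+", message.lower()))
    for workflow in workflows:
        if words & set(workflow.trigger_keywords):
            return workflow
    return None


def answer_with_rag(message: str) -> str:
    """Placeholder for the RAG pipeline; a real one would retrieve and generate."""
    return f"[RAG answer grounded in the knowledge base for: {message}]"


def route(message: str, workflows: List[Workflow]) -> str:
    """Execute a matching workflow, otherwise fall back to RAG."""
    workflow = plan(message, workflows)
    if workflow is not None:
        return workflow.run(message)
    return answer_with_rag(message)
```

In a real system the planner would be an LLM call rather than keyword matching; the point of the sketch is only the two-way branch between a predefined workflow and the RAG fallback.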

[Figure: system architecture diagram]

Moveo’s LLM vs GPT-4

We compare Moveo.AI’s LLM with GPT-4, focusing on the RAG pipeline since GPT-4 does not possess planning capabilities. We also place strong emphasis on latency, which is crucial in enterprise environments.


Analysis of the RAG Pipeline

At a high level, Moveo.AI’s RAG pipeline works as follows:

When a user sends a message, the system retrieves the most relevant documents from a defined collection. This ensures the LLM’s responses are grounded in reliable and accurate information. The retrieved documents, conversation history, custom instructions, live instructions, and AI profile data are then passed to the LLM. The LLM synthesizes this information to generate a coherent and informative response.
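The retrieval step can be illustrated with a toy example. The source does not describe how relevance is scored, so the token-overlap scoring below is purely an assumption chosen for simplicity (production systems typically use embeddings or a search index), and `tokenize` and `retrieve` are hypothetical names.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question: str, documents: List[str], k: int = 2) -> List[str]:
    """Return up to k documents ranked by token overlap with the question.

    Assumption: overlap counting is only a stand-in for a real relevance model.
    Documents that share no tokens with the question are dropped.
    """
    question_counts = Counter(tokenize(question))
    scored = []
    for doc in documents:
        doc_counts = Counter(tokenize(doc))
        overlap = sum(min(question_counts[t], doc_counts[t]) for t in question_counts)
        scored.append((overlap, doc))
    # Sort by score only; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]
```

The retrieved documents would then be handed to the LLM together with the conversation history and instructions.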

To compare Moveo.AI’s LLM with GPT-4, we evaluated the quality of responses to end-user questions. The evaluation was based on a random sample of thousands of entries from Moveo’s production data, which neither our LLM nor GPT-4 had encountered before. Each entry was converted into a prompt consisting of the user question, conversation history, grounding knowledge from the collection documents, live instructions, and custom instructions. For example, a prompt would be similar to the following format:
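As a rough illustration only, a prompt combining these inputs could be assembled as below. The section titles, their order, and the `PromptInputs` and `build_prompt` names are assumptions made for this sketch, not Moveo.AI's actual prompt template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptInputs:
    """Hypothetical container for the inputs listed in the text."""
    question: str
    history: List[str] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)
    live_instructions: str = ""
    custom_instructions: str = ""


def build_prompt(inputs: PromptInputs) -> str:
    """Join the non-empty inputs into one prompt string (assumed layout)."""
    sections = [
        ("Custom instructions", inputs.custom_instructions),
        ("Live instructions", inputs.live_instructions),
        ("Grounding knowledge", "\n".join(f"- {d}" for d in inputs.documents)),
        ("Conversation history", "\n".join(inputs.history)),
        ("User question", inputs.question),
    ]
    # Skip sections with no content so the prompt stays compact.
    return "\n\n".join(f"### {title}\n{body}" for title, body in sections if body)
```

For example, `build_prompt(PromptInputs(question="How do I block my card?", documents=["Cards can be blocked in the app."]))` yields a prompt with a grounding-knowledge section followed by the user question.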


Grading dimensions & methodology

The grading process assessed Moveo’s LLM and GPT-4 responses across seven dimensions that capture critical traits within the customer experience setting. Each dimension received a score, determining which LLM provided a better response.

Grading Dimensions

  1. Hallucination: Ensures the LLM adheres to grounding knowledge without generating incorrect responses.

  2. Repetition: Measures the LLM’s ability to avoid repeating itself and consider dialog context.

  3. Disambiguation: Evaluates whether the LLM asks follow-up questions to clarify ambiguous user questions.

  4. Live agent handover: Checks if the LLM suggests connecting with a customer support agent only when appropriate.

  5. Readability: Assesses the clarity and formatting of the LLM’s responses.

  6. Language: Evaluates the syntactic correctness and clarity of the LLM’s language.

  7. Markdown: Measures the LLM’s correct use of markdown syntax for formatting.


Methodology

To evaluate the performance of the different models, we used a separate GPT-4 instance as a “grader,” performing a single API call for each of the samples.
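A grading loop of this kind might look like the sketch below. The grader is passed in as a plain function so that no particular API is implied; its return shape (a pair of scores per dimension) and the names `tally_wins` and `win_rate` are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

DIMENSIONS = (
    "hallucination", "repetition", "disambiguation", "live_agent_handover",
    "readability", "language", "markdown",
)

# Assumed grader contract: given (prompt, response_a, response_b), return
# a (score_a, score_b) pair for each dimension. In practice this would wrap
# a single LLM API call per sample.
Grader = Callable[[str, str, str], Dict[str, Tuple[int, int]]]


def tally_wins(samples: Iterable[Tuple[str, str, str]], grader: Grader) -> Dict[str, Counter]:
    """Count, per dimension, how often model "a" wins, model "b" wins, or they tie."""
    wins = {dim: Counter() for dim in DIMENSIONS}
    for prompt, response_a, response_b in samples:
        scores = grader(prompt, response_a, response_b)  # one grader call per sample
        for dim in DIMENSIONS:
            score_a, score_b = scores[dim]
            if score_a > score_b:
                wins[dim]["a"] += 1
            elif score_b > score_a:
                wins[dim]["b"] += 1
            else:
                wins[dim]["tie"] += 1
    return wins


def win_rate(counter: Counter, side: str) -> float:
    """Fraction of samples won by the given side ("a", "b" or "tie")."""
    total = sum(counter.values())
    return counter[side] / total if total else 0.0
```

Keeping the grader behind a function boundary also makes it straightforward to swap in a different grading model or a human rater later.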

It’s important to note that GPT-4, when used as a grader, tends to favor responses generated by itself. This bias, which arises when the same model is used both to generate and to evaluate responses, is documented in the AI community. Despite this bias towards GPT-4, we continue to use it as the grader because it is the most powerful closed-source model.


Results

The results show that Moveo’s custom LLM, tuned for Customer Experience, outperforms GPT-4-0613 in every grading dimension except Markdown, where GPT-4 produces better stylistic formatting.

Most importantly, GPT-4 performs worse on hallucination, which could hurt Customer Experience (CX). For example, if GPT-4 provides incorrect information about a product feature, it could lead to potential liabilities, customer dissatisfaction, and increased support requests. Real-life incidents, such as Air Canada’s chatbot giving incorrect information to a traveler, illustrate the risk.

Response time is crucial, especially in enterprise scenarios where customers expect instant replies to their questions. Imagine a customer reaching out with an urgent query, e.g., about a stolen credit card that needs to be blocked. With Moveo’s LLM, specifically optimized for CX, they receive a response in just 5 seconds. In contrast, GPT-4 takes at least 18 seconds. In that time, Moveo.AI could have handled more than three inquiries, significantly enhancing support efficiency and customer satisfaction.
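The comparison follows directly from the two latencies quoted above. The short calculation below assumes inquiries are handled one after another at a constant 5 seconds each, which is a simplification; the variable names are illustrative.

```python
moveo_latency_s = 5    # response time quoted for Moveo's LLM
gpt4_latency_s = 18    # lower bound quoted for GPT-4

# How many times longer a single GPT-4 response takes.
latency_ratio = gpt4_latency_s / moveo_latency_s

# Whole inquiries Moveo's LLM could finish, back to back, within one GPT-4 response.
inquiries_completed = gpt4_latency_s // moveo_latency_s

print(f"ratio: {latency_ratio:.1f}x, completed inquiries: {inquiries_completed}")
```

With these figures the ratio is 3.6, so three inquiries are fully handled and a fourth is already under way by the time GPT-4 responds.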


Conclusion

The benchmark results highlight the superiority of Moveo’s LLM for CX over GPT-4. Despite the inherent bias of using GPT-4 as a grader, Moveo’s LLM excelled in most dimensions and demonstrated significantly lower latency. For a more comprehensive assessment, future evaluations will incorporate different LLMs as graders and human evaluators.

Moveo.AI’s innovative approach and focus on CX make it a powerful tool for enterprises looking to enhance customer interactions with advanced AI solutions.

Learn more about Moveo.AI and how it can transform your customer experience today!
