The real AI Chatbot metrics that matter

Today, I’m talking about one of the most misunderstood aspects of AI support implementation: how to measure success. We’re in the middle of a technological shift in how AI chatbots fit into the customer experience, and in how we measure the right metrics for AI to succeed.

I’m going to show you which KPIs actually measure effectiveness and how to set these metrics up for your team.

If you love data, this episode is for you. If you feel data is better handled by someone else in the company, send them this podcast.

AI EFFICIENCY METRICS

Let’s look at traditional AI metrics that most companies are currently measuring:

  1. Deflection Rate: The percentage of conversations that never get escalated to a human
  2. Conversation Volume: The number of chats the AI chatbot handles
  3. Average Handle Time: How long conversations last with the AI chatbot
  4. Cost Savings: Calculated based on agent time saved
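
To make these concrete, here is a minimal sketch in Python of how these four numbers are typically pulled out of conversation records. The field names and the cost-per-minute figure are my own assumptions for illustration, not any particular platform’s schema.

    from statistics import mean

    # Hypothetical conversation records exported from your chat platform.
    conversations = [
        {"escalated": False, "duration_minutes": 4.0},
        {"escalated": True, "duration_minutes": 12.5},
        {"escalated": False, "duration_minutes": 6.0},
    ]

    AGENT_COST_PER_MINUTE = 1.25  # assumed fully loaded agent cost

    volume = len(conversations)  # Conversation Volume
    deflection_rate = sum(not c["escalated"] for c in conversations) / volume  # Deflection Rate
    avg_handle_time = mean(c["duration_minutes"] for c in conversations)  # Average Handle Time

    # Cost Savings: agent minutes the AI absorbed on conversations it kept.
    cost_savings = AGENT_COST_PER_MINUTE * sum(
        c["duration_minutes"] for c in conversations if not c["escalated"]
    )

    print(f"Volume: {volume}, deflection: {deflection_rate:.0%}, "
          f"AHT: {avg_handle_time:.1f} min, savings: ${cost_savings:.2f}")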

The main thing to note about these 4 metrics is that they measure efficiency, not effectiveness. In fact, 3 of them are the same ways we measure human agent efficiency.

You can’t measure the true success of AI chatbots with efficiency alone. You have to balance the picture by also measuring effectiveness.

I want to make sure you hear me right – I’m not saying you shouldn’t be focused on efficiency metrics. This is just one piece of the pizza pie.

Only a couple of years ago, AI chatbots were viewed as a way to reduce headcount and conversation volume. When companies focused only on these efficiency metrics, performance looked great on paper while customer feedback looked like this:

  • “I spent 20 minutes with the bot going in circles before giving up.”
  • “It answered my question but gave me the wrong solution.”
  • “I had to contact support three separate times to solve one issue.”

According to efficiency metrics, all of these were “successful” AI interactions because they didn’t reach a human agent.

But they were terrible customer experiences. And you might have BEEN this customer, in this same scenario, with a product or service. Because you, as a customer, never reached a human, your interaction counted toward a higher deflection rate.

YOU SHOULD ALSO BE MEASURING

1. First-Contact Resolution Rate

First-contact resolution rate measures whether the customer’s issue was actually solved in a single interaction, not just whether they gave up trying.

How to measure it:

  • Track unique customer identifiers across channels
  • Look for repeat contacts within a 7-day window after AI interaction
  • Calculate FCR with this formula (see the sketch below):
    (Total AI conversations – All repeat contacts) ÷ Total AI conversations

FCR matters because it reveals whether your AI is actually solving problems or just deflecting them temporarily.
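
Here’s a minimal sketch of that FCR calculation in Python, assuming you can export one row per AI conversation with a customer identifier and a timestamp; the field names are placeholders, not a specific platform’s schema.

    from datetime import datetime, timedelta

    # Hypothetical export: one row per AI conversation, across all channels.
    conversations = [
        {"customer_id": "c-101", "started_at": datetime(2024, 5, 1, 9, 0)},
        {"customer_id": "c-101", "started_at": datetime(2024, 5, 4, 14, 0)},  # repeat within 7 days
        {"customer_id": "c-202", "started_at": datetime(2024, 5, 2, 11, 0)},
    ]

    WINDOW = timedelta(days=7)

    def is_repeat(convo, all_convos):
        # A conversation is a repeat contact if the same customer already
        # contacted us within the previous 7 days.
        return any(
            other["customer_id"] == convo["customer_id"]
            and timedelta(0) < convo["started_at"] - other["started_at"] <= WINDOW
            for other in all_convos
        )

    total = len(conversations)
    repeats = sum(is_repeat(c, conversations) for c in conversations)

    # FCR = (Total AI conversations - All repeat contacts) / Total AI conversations
    fcr = (total - repeats) / total
    print(f"First-contact resolution rate: {fcr:.0%}")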

2. Customer Effort Score

Next is the Customer Effort Score (CES), which measures how hard customers had to work to get their issue resolved. It’s a survey sent after the conversation, so it would replace a CSAT survey.

We’re moving into a new culture of measuring success with AI chatbots, which means we need to reconsider outdated measurements like CSAT.

Generally, only about 15% of customers provide CSAT feedback, yet this metric has been put on a pedestal as customer support’s gold standard for measuring success. It’s time to measure something more effective.

So, what’s different between CSAT and CES?

CSAT questions are worded like:

  • How satisfied were you with the support you received today?
  • Was the issue resolved?
  • Rate your conversation

Even when it was easy for the customer to get the right answer, receive a refund, or make an adjustment to their account, they still might be salty that they had to contact support in the first place, or that they had to talk to AI, and unfortunately the AI or human agent then receives a negative CSAT score.

CES questions trigger a different psychological response because they focus on a specific aspect of the customer’s experience, the effort it took, which engages different cognitive and emotional processes.

Here are a few examples of CES survey questions:

  • How easy was it to resolve your issue?
  • How simple was the process to address your concern with our support team?
  • Did you find it easy to navigate our support process to resolve your issue?

How to implement it:

  • Set up a single-question survey after AI interactions
  • Set up automatic triggers to survey customers post-conversation
  • Segment results by issue type and conversation length

If your customer conversation platform doesn’t let you customize the CSAT question (that is, repurpose the CSAT survey into a CES survey), consider using a 3rd party system, and test the results for at least 2 quarters.
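
To give you an idea of the segmentation step, here’s a minimal sketch using pandas, assuming you’ve exported survey responses with the issue type and conversation length attached; the column names and the 1-to-7 scale are assumptions for illustration.

    import pandas as pd

    # Hypothetical post-conversation CES responses (1 = very difficult, 7 = very easy).
    responses = pd.DataFrame([
        {"issue_type": "billing", "conversation_minutes": 3, "ces": 6},
        {"issue_type": "billing", "conversation_minutes": 18, "ces": 2},
        {"issue_type": "shipping", "conversation_minutes": 5, "ces": 7},
        {"issue_type": "shipping", "conversation_minutes": 9, "ces": 5},
    ])

    # Bucket conversation length so short and long chats can be compared.
    responses["length"] = pd.cut(
        responses["conversation_minutes"],
        bins=[0, 5, 15, float("inf")],
        labels=["short", "medium", "long"],
    )

    # Average effort score, segmented by issue type and conversation length.
    print(responses.groupby(["issue_type", "length"], observed=True)["ces"].mean())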

3. Sentiment Score

Analyzing sentiment changes helps you measure how customer emotions shift from the beginning to the end of an AI interaction. In practice, this means predicting the customer’s intent for reaching out to support, analyzing the tone of their language, and then assessing their sentiment throughout the conversation.

How to implement it:

  • Many customer communication platforms now have this feature built in, such as Zendesk, Help Scout, and Gorgias. Amazon also offers sentiment analysis through its Comprehend service, and there are 3rd party platforms that focus solely on this.
  • The sentiment shifts themselves are straightforward:
    • Positive → Positive
    • Negative → Positive
    • Positive → Negative
    • Negative → Negative

By measuring customer sentiment across both AI and human support, you’ll be able to assess whether your AI is making situations better or worse. You might find that your AI is great at answering simple questions but makes frustrated customers even more upset. If that’s the case, implement emotion-based routing, where customers showing negative sentiment are quickly routed to specialized human agents.
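
If your platform doesn’t have this built in, here’s a minimal sketch of the idea using Amazon Comprehend via boto3; the message data and the routing step are my own assumptions for illustration, and it assumes AWS credentials are already configured.

    import boto3

    comprehend = boto3.client("comprehend")

    def sentiment(text):
        # Returns POSITIVE, NEGATIVE, NEUTRAL, or MIXED for a piece of text.
        return comprehend.detect_sentiment(Text=text, LanguageCode="en")["Sentiment"]

    # Hypothetical customer messages from one conversation, oldest first.
    customer_messages = [
        "My order never arrived and I'm getting really frustrated.",
        "Okay, the replacement shipped. Thanks, that works for me.",
    ]

    opening = sentiment(customer_messages[0])
    closing = sentiment(customer_messages[-1])
    print(f"Sentiment shift: {opening} -> {closing}")

    # Emotion-based routing: if the customer is still negative at the end,
    # hand the conversation off to a specialized human agent.
    if closing == "NEGATIVE":
        print("Route to a specialized human agent.")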

Subscribe with your favorite media platform.