Measure and Improve

Lyro gets better when you treat improvement as a loop, not a one-time setup. This guide walks you through measuring how your agent performs, acting on the gaps Lyro surfaces, and re-testing your changes before you publish them.

The improvement loop

Improving your agent comes down to four repeating steps:

  1. Read your Analytics dashboards to see how the agent is doing.
  2. Act on Findings by adding knowledge articles or intents.
  3. Re-test your changes with Evaluations.
  4. Re-publish the agent and watch the next cycle of data.

Each step feeds the next, so the more often you run the loop, the sharper your agent becomes.

Step 1 - Read your analytics

Open Analytics to see how your agent is performing. The dashboard ships with a standard set of charts and lets you build your own.

The default charts cover:

Metric                 What it tells you
Resolution rate        How often conversations get resolved without a human
Answer rate            How often the agent actually answers instead of deflecting
Intents                Which topics customers ask about most
CSAT                   How satisfied customers are with the replies
Conversations          Overall volume over time
Response time          How fast the agent replies
Knowledge grounding    How well answers are backed by your sources
Tool health            Whether connected tools are firing reliably

Use the controls at the top of the page to focus your view:

  • Date range picker - narrow charts to the period you care about.
  • New chart - build a custom chart for a metric or breakdown you want to track.
  • Settings menu - export a CSV, include or exclude test traffic, or reset to the default chart set.

Tip: Test traffic from the playground is excluded by default so your numbers reflect real customers. Toggle "Include test traffic" only when you are debugging your own tests.

A low answer rate, a dipping CSAT score, or weak knowledge grounding on a popular intent is your cue to dig deeper.
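
If you prefer to check this outside the dashboard, the CSV export can be sliced per intent. The sketch below is illustrative only: the column names (intent, answered, resolved, is_test) and the thresholds are assumptions, not the documented export schema, so adjust them to match the file you actually download.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export. The real CSV columns may be named differently.
SAMPLE_CSV = """intent,answered,resolved,is_test
billing,true,true,false
billing,false,false,false
billing,true,false,false
shipping,true,true,false
shipping,true,true,true
"""


def rates_by_intent(csv_text, include_test=False):
    """Return per-intent conversation counts, answer rate, and resolution rate."""
    counts = defaultdict(lambda: {"conversations": 0, "answered": 0, "resolved": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["is_test"] == "true" and not include_test:
            continue  # mirror the dashboard default: skip test traffic
        bucket = counts[row["intent"]]
        bucket["conversations"] += 1
        bucket["answered"] += row["answered"] == "true"
        bucket["resolved"] += row["resolved"] == "true"
    return {
        intent: {
            "conversations": c["conversations"],
            "answer_rate": c["answered"] / c["conversations"],
            "resolution_rate": c["resolved"] / c["conversations"],
        }
        for intent, c in counts.items()
    }


def intents_to_review(rates, min_conversations=2, max_answer_rate=0.7):
    """Intents with enough volume whose answer rate is below the threshold."""
    return sorted(
        intent
        for intent, r in rates.items()
        if r["conversations"] >= min_conversations
        and r["answer_rate"] < max_answer_rate
    )


rates = rates_by_intent(SAMPLE_CSV)
print(intents_to_review(rates))
```

With the sample rows, billing is flagged because only two of its three real conversations were answered, while shipping is skipped because one of its two rows is test traffic.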

Step 2 - Act on findings

Findings is where Lyro tells you exactly what to fix. The Operator sweeps your workspace and surfaces knowledge gaps, recurring questions your agent could not answer, and configuration issues.

Each finding is grouped so you can triage fast:

  • Urgent - high-confidence issues worth handling first.
  • Worth a look - lower-confidence items to review when you have time.

You can run a sweep on demand with the Run sweep button, or schedule recurring sweeps from Settings. Filter findings by status, knowledge base, or intent to focus on one area.

Turn a finding into a fix

Open any finding to see the detail view, then act on it:

  • Draft an article - Lyro drafts a knowledge article that answers the gap. Review it and publish.
  • Add or refine an intent - close the gap with a dedicated intent for a recurring topic so your agents route and answer it correctly.
  • Ask the Operator - open the Operator panel for more context on why a finding was flagged.

Once you have acted, update the finding's status - mark it resolved, snooze it, or dismiss it - so your task list stays clean.

Note: Resolved and dismissed findings stay in the History tab, so you always have a record of what changed and when.

Step 3 - Re-test with evaluations

Before you trust a change, prove it works. Evaluations are test suites that measure how your agent handles real questions.

To build a suite, you can:

  • Generate from KB - auto-create a starter set of cases from one of your knowledge bases.
  • Save as eval - capture a real turn from the playground as a case.
  • New set - create a suite by hand and add your own cases.

Each case can carry an expected answer, which Lyro uses as the ground truth to judge against.

Run and read results

Pick the agent to test, choose a version (live, draft, or an older published version), and click Run. Lyro scores every case and shows a pass or fail along with:

  • Pass rate across the whole suite.
  • Per-axis scores - whether it routed to the right agent, matched the expected answer, and stayed grounded in your sources.
  • Judge rationale - why each case passed or failed.
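
To make the relationship between these numbers concrete, here is a minimal sketch of how a suite-level pass rate and per-axis scores could be derived from per-case results. The CaseResult type, its field names, and the rule that a case passes only when all three axes pass are assumptions made for illustration. Lyro's actual scoring may work differently.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical per-case result with one boolean per axis."""
    name: str
    routing: bool       # routed to the right agent
    answer_match: bool  # matched the expected answer
    grounded: bool      # stayed grounded in the sources

    @property
    def passed(self):
        # Assumption: a case passes only if every axis passes.
        return self.routing and self.answer_match and self.grounded


def summarize(results):
    """Return the suite pass rate and the share of cases passing each axis."""
    total = len(results)
    if total == 0:
        return {"pass_rate": 0.0, "routing": 0.0, "answer_match": 0.0, "grounded": 0.0}
    return {
        "pass_rate": sum(r.passed for r in results) / total,
        "routing": sum(r.routing for r in results) / total,
        "answer_match": sum(r.answer_match for r in results) / total,
        "grounded": sum(r.grounded for r in results) / total,
    }


results = [
    CaseResult("refund policy", True, True, True),
    CaseResult("reset password", True, False, True),
    CaseResult("shipping times", False, False, True),
    CaseResult("cancel plan", True, True, True),
]
print(summarize(results))
```

Reading the axes separately tells you where to look: in this sample, grounding is fine across the board, so the failures point at routing and expected-answer mismatches rather than at your sources.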

Compare the actual answer against the expected one side by side. If a new answer is better, click "Use as expected" to promote it so future runs are judged against it.

Tip: Run the same suite against your draft before publishing. If the pass rate holds or improves, you know your fix did not break anything that already worked.
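
A steady pass rate can still hide a case that flipped from pass to fail while another flipped the other way, so it can help to compare the two runs case by case. The sketch below assumes you have noted each case's pass or fail result for the live and draft runs as simple name-to-boolean mappings. That data shape is hypothetical and is not a Lyro export format.

```python
def pass_rate(results):
    """Share of cases that passed. Returns 0.0 for an empty run."""
    return sum(results.values()) / len(results) if results else 0.0


def regressions(live, draft):
    """Case names that passed on the live version but fail (or are missing) on the draft."""
    return sorted(name for name, ok in live.items() if ok and not draft.get(name, False))


# Hypothetical results for the same suite run against two versions.
live = {
    "refund policy": True,
    "reset password": True,
    "shipping times": False,
    "cancel plan": False,
}
draft = {
    "refund policy": True,
    "reset password": False,
    "shipping times": True,
    "cancel plan": False,
}

print(pass_rate(live), pass_rate(draft))
print(regressions(live, draft))
```

Here both runs score the same pass rate, yet "reset password" has regressed, which is exactly the kind of change worth checking before you publish.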

Step 4 - Re-publish and repeat

When your evals look good, publish the updated agent. Then come back to Analytics in a few days to confirm the metric you were chasing actually moved.

Run this loop on a regular cadence - read, act, re-test, re-publish - and your agent closes more gaps with each cycle. For a refresher on the building blocks, see Train Lyro on Your Content.