Metrics

In MixedVoices, metrics help you evaluate and analyze your voice agent's performance. Each metric can be either binary (PASS/FAIL) or continuous (0-10 scale), allowing for both strict checks and nuanced performance evaluation.

Built-in Metrics

MixedVoices comes with several pre-defined metrics that cover common evaluation needs:

from mixedvoices.metrics import (
    empathy,                # Measures emotional intelligence and response appropriateness
    hallucination,          # Checks for made-up information
    conciseness,            # Evaluates response brevity and clarity
    context_awareness,      # Assesses understanding of conversation context
    adaptive_qa,            # Measures ability to handle follow-up questions
    objection_handling,     # Evaluates handling of customer objections
    scheduling,             # Assesses appointment scheduling accuracy
    verbatim_repetition,    # Checks for unnecessary repetition
)

# Get all default metrics at once
from mixedvoices.metrics import get_all_default_metrics
metrics = get_all_default_metrics()
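
Each built-in metric is an ordinary Metric object, so you can inspect it before deciding which ones to keep. The snippet below is a minimal sketch that assumes Metric exposes name, definition, and scoring as attributes, mirroring the constructor arguments shown later on this page.

# Sketch: review the default metrics before adding them to a project.
# Assumption: Metric objects expose name, definition, and scoring attributes.
for metric in get_all_default_metrics():
    print(f"{metric.name} ({metric.scoring}): {metric.definition}")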

Creating Custom Metrics

You can create custom metrics to evaluate specific aspects of your agent's performance:

from mixedvoices.metrics import Metric

# Binary metric (PASS/FAIL)
call_hangup = Metric(
    name="call_hangup",
    definition="FAILS if the bot faces problems in ending the call appropriately",
    scoring="binary"
)

# Continuous metric (0-10 scale)
accent_handling = Metric(
    name="accent_handling",
    definition="Measures how well the agent understands and adapts to different accents",
    scoring="continuous"
)

# Metric that needs to be checked against the agent's prompt
factual_accuracy = Metric(
    name="factual_accuracy",
    definition="Checks if agent makes claims not supported by its prompt",
    scoring="binary",
    include_prompt=True  # Prompt will be included during evaluation
)

Using Metrics in Projects

Metrics can be added when creating a project or updated later:

import mixedvoices as mv
from mixedvoices.metrics import Metric, empathy, get_all_default_metrics

# Create project with specific metrics
project = mv.create_project(
    "dental_clinic",
    metrics=[empathy, call_hangup]  # call_hangup is the custom metric defined above
)

# Or use all default metrics
project = mv.create_project(
    "medical_clinic",
    metrics=get_all_default_metrics()
)

# Add new metrics to existing project
project.add_metrics([accent_handling])

# Update an existing metric by passing a Metric with the same name
new_call_hangup = Metric(
    name="call_hangup",
    definition="FAILS if the bot does not end the call cleanly once the user is done",
    scoring="binary"
)
project.update_metric(new_call_hangup)

# List available metrics
metric_names = project.list_metric_names()
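
If you are unsure whether a metric is already registered on a project, you can check list_metric_names first. The sketch below combines the calls shown above; it assumes metric names are unique within a project and that Metric objects expose a name attribute.

# Sketch: add accent_handling only if the project doesn't already have it.
# Assumption: metric names are unique within a project.
if accent_handling.name not in project.list_metric_names():
    project.add_metrics([accent_handling])
else:
    project.update_metric(accent_handling)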

Example: Metric Set

import mixedvoices as mv
from mixedvoices.metrics import Metric

# Create a comprehensive set of metrics for a medical receptionist
metrics = [
    Metric(
        name="hipaa_compliance",
        definition="Checks if agent maintains patient privacy standards",
        scoring="binary"
    ),
    Metric(
        name="urgency_detection",
        definition="Measures ability to identify and escalate medical emergencies",
        scoring="continuous"
    ),
    Metric(
        name="insurance_handling",
        definition="Evaluates accuracy in insurance information collection",
        scoring="continuous"
    )
]

project = mv.create_project("medical_reception", metrics=metrics)

Evaluation with Metrics

When creating an evaluator, you can choose which metrics to use. Check Agent Evaluation for more details.

# Use all project metrics by default
evaluator = project.create_evaluator(test_cases)

# Use a specific subset of the project's metrics
evaluator = project.create_evaluator(
    test_cases,
    metric_names=["hipaa_compliance", "urgency_detection"]
)
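
Because metric_names should be a subset of the project's metrics, one defensive pattern is to intersect the requested names with list_metric_names before creating the evaluator. This sketch uses only the calls shown on this page; how create_evaluator reacts to an unknown name is an assumption, and "response_time" is a hypothetical metric name.

# Sketch: drop any requested names the project doesn't know about.
requested = ["hipaa_compliance", "urgency_detection", "response_time"]  # "response_time" is hypothetical
available = set(project.list_metric_names())
evaluator = project.create_evaluator(
    test_cases,
    metric_names=[name for name in requested if name in available]
)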

Tips

Creating Effective Metrics

  1. Clear Definitions: Make metric definitions specific and measurable

  2. Appropriate Scoring: Choose binary for pass/fail requirements, continuous for nuanced evaluation

  3. Prompt Awareness: Use include_prompt=True for metrics that need to be checked against the agent's prompt

  4. Consistent Naming: Use lowercase, descriptive names without spaces. The sketch below puts these tips together.
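
As an illustration, the following sketch defines one metric that follows all four tips: a lowercase name without spaces, a specific and measurable definition, binary scoring for a hard requirement, and include_prompt=True because the check depends on what the prompt promises. The metric itself is hypothetical.

from mixedvoices.metrics import Metric

# Hypothetical metric that follows the tips above.
pricing_accuracy = Metric(
    name="pricing_accuracy",  # lowercase, descriptive, no spaces
    definition="FAILS if the agent quotes a price that differs from the prompt",  # specific and measurable
    scoring="binary",         # hard pass/fail requirement
    include_prompt=True       # needs the agent prompt to verify claims
)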
