In MixedVoices, metrics help you evaluate and analyze your voice agent's performance. Each metric can be either binary (PASS/FAIL) or continuous (0-10 scale), allowing for both strict checks and nuanced performance evaluation.
Built-in Metrics
MixedVoices comes with several pre-defined metrics that cover common evaluation needs:
from mixedvoices.metrics import (
    empathy,              # Measures emotional intelligence and response appropriateness
    hallucination,        # Checks for made-up information
    conciseness,          # Evaluates response brevity and clarity
    context_awareness,    # Assesses understanding of conversation context
    adaptive_qa,          # Measures ability to handle follow-up questions
    objection_handling,   # Evaluates handling of customer objections
    scheduling,           # Assesses appointment scheduling accuracy
    verbatim_repetition,  # Checks for unnecessary repetition
)

# Get all default metrics at once
from mixedvoices.metrics import get_all_default_metrics

metrics = get_all_default_metrics()
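You can also hand-pick a subset of the defaults rather than taking all of them. A minimal sketch; the filtering variant assumes each built-in metric is a regular Metric object exposing the name it was constructed with, which is not confirmed here:

from mixedvoices.metrics import (
    get_all_default_metrics,
    empathy,
    conciseness,
)

# Option 1: hand-pick the built-in metrics you care about
selected_metrics = [empathy, conciseness]

# Option 2: filter the full default set by name
# (assumes each Metric exposes a .name attribute matching its constructor argument)
wanted = {"empathy", "conciseness", "scheduling"}
selected_metrics = [m for m in get_all_default_metrics() if m.name in wanted]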
Creating Custom Metrics
You can create custom metrics to evaluate specific aspects of your agent's performance:
from mixedvoices.metrics import Metric

# Binary metric (PASS/FAIL)
call_hangup = Metric(
    name="call_hangup",
    definition="FAILS if the bot faces problems in ending the call appropriately",
    scoring="binary"
)

# Continuous metric (0-10 scale)
accent_handling = Metric(
    name="accent_handling",
    definition="Measures how well the agent understands and adapts to different accents",
    scoring="continuous"
)

# Metric that needs to check against the agent prompt
hallucination_check = Metric(
    name="factual_accuracy",
    definition="Checks if agent makes claims not supported by its prompt",
    scoring="binary",
    include_prompt=True  # Prompt will be included during evaluation
)
Using Metrics in Projects
Metrics can be added when you create a project, and added or updated later:
import mixedvoices as mv
from mixedvoices.metrics import empathy, get_all_default_metrics

# Create project with specific metrics
project = mv.create_project("dental_clinic", metrics=[empathy, call_hangup])

# Or use all default metrics
project = mv.create_project("medical_clinic", metrics=get_all_default_metrics())

# Add new metrics to existing project
project.add_metrics([accent_handling])

# Update existing metric by creating a metric with the same name
project.update_metric(new_call_hangup)

# List available metrics
metric_names = project.list_metric_names()
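The update_metric call above expects a Metric object that reuses the name of the metric being replaced. As a sketch, new_call_hangup (the variable name is just illustrative) could be built like the original call_hangup metric with a revised definition:

from mixedvoices.metrics import Metric

# Reuse the existing name so update_metric replaces the old definition
new_call_hangup = Metric(
    name="call_hangup",
    definition="FAILS if the bot ends the call abruptly or cannot end it when the caller asks to",
    scoring="binary"
)

project.update_metric(new_call_hangup)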
Example: Metric Set
# Create a comprehensive set of metrics for a medical receptionist
metrics = [
    Metric(
        name="hipaa_compliance",
        definition="Checks if agent maintains patient privacy standards",
        scoring="binary"
    ),
    Metric(
        name="urgency_detection",
        definition="Measures ability to identify and escalate medical emergencies",
        scoring="continuous"
    ),
    Metric(
        name="insurance_handling",
        definition="Evaluates accuracy in insurance information collection",
        scoring="continuous"
    )
]

project = mv.create_project("medical_reception", metrics=metrics)
Evaluation with Metrics
When creating an evaluator, you can choose which metrics to use. See Agent Evaluation for more details.
# Use all project metrics by default
evaluator = project.create_evaluator(test_cases)

# Use specific metrics; these must be a subset of the project metrics
evaluator = project.create_evaluator(
    test_cases,
    metric_names=["hipaa_compliance", "urgency_detection"]
)
Tips
Creating Effective Metrics
Clear Definitions: Make metric definitions specific and measurable
Appropriate Scoring: Choose binary for pass/fail requirements, continuous for nuanced evaluation
Prompt Awareness: Use include_prompt=True for metrics that need to check against agent knowledge
Consistent Naming: Use lowercase, descriptive names without spaces
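Putting these guidelines together, a custom metric might look like the following sketch (the name and definition are illustrative, not built-in):

from mixedvoices.metrics import Metric

# Lowercase name without spaces, a specific and measurable definition,
# binary scoring for a hard requirement, and include_prompt so the
# evaluation can check claims against the agent's own prompt
pricing_accuracy = Metric(
    name="pricing_accuracy",
    definition="FAILS if the agent quotes a price or fee that is not stated in its prompt",
    scoring="binary",
    include_prompt=True
)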