
Detect LLM Hallucinations
Trace every LLM call, evaluate output alignment with your system prompts, and get real-time alerts when your AI goes off-script.
```python
# pip install hallutraceai
from hallutraceai import HalluTrace

ht = HalluTrace(api_key="sk_live_...")
ht.trace(
    session_id="chat-123",
    type="agent",
    input="What is Python?",
    output="Python is a programming language.",
    system_prompt="You are a helpful assistant.",
)
# Non-blocking — returns instantly. We evaluate in the background.
```

Works with any LLM provider.
134+ Evals/sec · <1.2s Latency · 99.9% Uptime
Catch Every Hallucination in Real Time
Live scores, detection trends, model comparisons, and session monitoring — updating as your LLM runs.
Project: My AI Chatbot (last 7 days overview)
Avg Score: 12.4 · Sessions: 342 · Flagged: 18 · Messages: 2.8K
Dashboard panels:
- Hallucination Score Distribution
- Eval Detection Trends (Last 7 Days)
- Topic Detection
- Custom Detection Breakdown
- Model Comparison
- RAG Faithfulness Score
- Sentiment & Tone Analysis
- Response Cost Analytics
- Prompt Compliance Score
- Anomaly Detection
| Session | Msgs | Score |
|---|---|---|
| chat-52fc | 5 | 24 |
| chat-3927 | 14 | 17 |
| chat-c954 | 13 | 44 |
| chat-25fe | 16 | 72 |
| chat-6284 | 15 | 80 |
Real-Time Hallucination Correction
HalluTrace scans every agent response against your RAG sources and system prompt in real time. When a hallucination or prompt violation is detected, it automatically signals your agent to retry — before the user ever sees a wrong answer.
Auto-Scan Every Response
Every agent reply is checked against RAG sources, system prompt, and context — inputs, outputs, and metadata.
Detect & Intercept
Catches RAG data mismatches and system prompt violations. Score above threshold triggers correction.
Auto-Correct & Verify
Signals your agent to retry with source grounding. Re-scans the corrected response before delivery.
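The three steps above amount to an evaluate-retry loop. Here is a minimal stdlib-only sketch of that pattern; the `evaluate` and `answer_with_correction` functions below are illustrative stand-ins for HalluTrace's scan and the retry signal, not the SDK's actual API:

```python
def evaluate(response: str, sources: list[str]) -> int:
    """Toy hallucination scorer: 0 if the response is grounded
    in a provided source, else 100. The real evaluator is an LLM judge."""
    return 0 if any(src in response for src in sources) else 100

def answer_with_correction(generate, sources, threshold=50, max_retries=2):
    """Retry generation until the hallucination score falls below threshold."""
    response = generate(grounded=False)
    for _ in range(max_retries):
        if evaluate(response, sources) < threshold:
            break  # grounded enough; deliver to the user
        # Score crossed the threshold: signal the agent to retry
        # with explicit source grounding, then re-scan the new answer.
        response = generate(grounded=True)
    return response

# Toy agent: hallucinates unless told to ground itself in the sources.
sources = ["Python 3.12 was released in October 2023."]
def agent(grounded: bool) -> str:
    return sources[0] if grounded else "Python 3.12 shipped in 2021."

print(answer_with_correction(agent, sources))
# prints the grounded answer, never the hallucinated one
```

The key design point is that the re-scan happens before delivery, so a still-hallucinated retry is never shown to the user.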
Everything you need to trust your LLM
From trace ingestion to hallucination scoring to real-time alerts — one platform to monitor and evaluate your AI outputs.
Real-Time Tracing
Capture every LLM call — inputs, outputs, system prompts, model names. Grouped by chat session automatically.
Hallucination Detection
LLM-as-judge evaluates if outputs align with your system prompts. Scores from 0 (perfect) to 100 (hallucinated).
Instant Alerts
Get notified via email, SMS, or webhook when hallucination scores exceed your threshold. Default at 50.
Rich Analytics
Score trends, distributions, model comparisons, session breakdowns — all with animated, interactive charts.
CSV Data Tables
No SDK? Upload CSV files with your LLM data and run hallucination checks directly from the dashboard.
Simple Integration
3 lines of Python. Or use our REST API. Or swap your OpenAI base URL. Works with any LLM provider.
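The base-URL swap mentioned above would look roughly like this with the official `openai` client (the proxy URL shown here is a placeholder, not a documented endpoint):

```python
# Placeholder proxy URL; consult the HalluTrace docs for the real endpoint.
from openai import OpenAI

client = OpenAI(
    api_key="sk_live_...",  # your HalluTrace key
    base_url="https://proxy.hallutrace.example/v1",  # swapped from api.openai.com
)
# Chat completion calls now flow through the proxy, which traces each
# request before forwarding it to the underlying provider.
```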
Three steps to hallucination-free AI
Integrate SDK
Install our Python or JS SDK. Add 3 lines of code. Every LLM call is now traced — inputs, outputs, system prompts, and metadata.
Auto Evaluate
Our engine automatically scores each response for hallucination. LLM-as-judge checks alignment with your system prompt. Score 0-100.
Monitor & Alert
View scores in your dashboard with rich charts. Set thresholds. Get instant alerts via email, SMS, or webhook when things go wrong.
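On the receiving end, a webhook handler only has to compare the reported score against your threshold. A stdlib-only sketch, assuming a hypothetical payload shape (the real schema may differ):

```python
import json

THRESHOLD = 50  # the default alert threshold

def should_alert(payload: str, threshold: int = THRESHOLD) -> bool:
    """Return True when a trace's hallucination score exceeds the threshold.
    The payload fields here are assumptions, not a documented schema."""
    event = json.loads(payload)
    return event["score"] > threshold

payload = json.dumps({"session_id": "chat-123", "score": 72})
print(should_alert(payload))  # prints True: 72 exceeds the default of 50
```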

Simple, token-based pricing
Start free with 10M tokens/month. Scale as you grow. All plans include Zero Storage mode. Paid plans include bring-your-own-model support.
Free
Perfect for trying out hallucination detection.
Start Free
Infrastructure
- 10M tokens / month
- Standard rate limit
- Up to 1 Gbps uplink
- 128K tokens per trace
Platform
- Unlimited projects
- 1 team member
- 7-day data retention
- Zero Storage option
- CSV data tables
Monitoring
- Full analytics & charts
- Email alerts
- End-to-end encryption
Pay as You Go
Deposit $25 to start. Scale without limits. Pay only for what you use.
Get Started
Infrastructure
- Unlimited tokens
- Enhanced rate limit
- Up to 2 Gbps uplink
- Up to 2M tokens per trace *
Platform
- Unlimited projects
- 1 team member
- 90-day data retention
- Zero Storage option
- CSV data tables
Pricing
- 128K model: $0.08/1M input, $0.18/1M output
- 2M model: $0.28/1M input, $0.68/1M output
Monitoring
- Full analytics & charts
- Email & SMS alerts
- End-to-end encryption
Advanced Analytics
- Topic Detection (auto-classify conversation topics)
- Custom Detection Breakdown
- Model Comparison analytics
- RAG Faithfulness scoring
- Sentiment & Tone analysis
- Response Cost Analytics
- Prompt Compliance scoring
- Anomaly Detection with alerts
Bring Your Own Model
- Custom OpenAI SDK /v1 endpoint
- 80% token discount with your own model
MAX-T
Max Tracing — full control, custom integrations, and dedicated support.
Start 30-Day Free Trial
Infrastructure
- Highest priority rate limit
- Up to 5 Gbps uplink
- Up to 2M tokens per trace *
- 15% discount on all transactions
Platform
- Everything in Pay as You Go
- 10 projects
- 365-day data retention
- Zero Storage option
- Unlimited team members
Advanced Eval
- Go Beyond Hallucination (toxicity, bias, PII & more)
- Custom Eval LLM Endpoint
- Custom OpenAI SDK /v1 endpoint
- 80% token discount with your own model
Advanced Analytics
- Topic Detection (auto-classify conversation topics)
- Custom Detection Breakdown
- Model Comparison analytics
- RAG Faithfulness scoring
- Sentiment & Tone analysis
- Response Cost Analytics
- Prompt Compliance scoring
- Anomaly Detection with alerts
Integrations
- Webhook integration with custom output
- JSON export API
Dedicated Support
- 12 feature requests per year
- Unlimited 24/7/365 email priority support
Compare plans in detail
Each feature below includes a short description of what it does.
| Feature | Free | Pay as You Go | MAX-T |
|---|---|---|---|
| Usage & Limits | | | |
| Monthly tokens: Total tokens (input + output + system prompt + RAG context) processed per month. | 10M | Unlimited | Unlimited |
| Rate limit: How fast you can send trace data to our API. Higher tiers get priority throughput. | Standard | Enhanced | Highest priority |
| Server uplink speed: Dedicated server connection speed for your account — faster uplink means lower latency for trace ingestion. | 1 Gbps | 2 Gbps | 5 Gbps |
| Max tokens per trace: Maximum token size for a single trace detection — includes input, output, system prompt, and RAG context combined. * 2M context models are subject to higher per-token pricing. | 128K | Up to 2M * | Up to 2M * |
| Projects: Separate environments for different apps or services, each with its own API key. | Unlimited | Unlimited | 10 |
| Team members: Number of users who can access the dashboard and manage projects. | 1 | 1 | Unlimited |
| Data retention: How long we store your trace data and evaluation results before auto-deletion. | 7 days | 90 days | 365 days |
| Core Features | | | |
| LLM call tracing: Capture inputs, outputs, system prompts, and model info for every LLM call automatically. | ✓ | ✓ | ✓ |
| Hallucination detection: LLM-as-judge evaluates each response for alignment with your system prompt. Score 0–100. | ✓ | ✓ | ✓ |
| Full analytics & charts: Interactive dashboards with score trends, distributions, model comparisons, and session breakdowns. | ✓ | ✓ | ✓ |
| End-to-end encryption: Your LLM API keys and data are encrypted in transit. We never store keys in plain text. | ✓ | ✓ | ✓ |
| Alerts & Integrations | | | |
| Email alerts: Get notified by email when hallucination scores exceed your configured threshold. | ✓ | ✓ | ✓ |
| SMS alerts: Receive SMS notifications for critical hallucination events in real time. | — | ✓ | ✓ |
| Webhook integration with custom output: Send custom-formatted data to your webhook — integrate with Slack, PagerDuty, Jira, or any automation workflow. | — | — | ✓ |
| JSON export API: Programmatically export all your trace data and evaluation results as JSON for external analysis. | — | — | ✓ |
| Data & Storage | | | |
| CSV data tables: Upload CSV files with your LLM data and run hallucination checks directly — no SDK needed. | ✓ | ✓ | ✓ |
| Zero Storage option: Your trace data is deleted immediately after hallucination detection runs. We store nothing. | ✓ | ✓ | ✓ |
| Advanced & Customization | | | |
| Custom OpenAI SDK /v1 endpoint: Use your own LLM model via an OpenAI-compatible /v1 endpoint for hallucination evaluation. | — | ✓ | ✓ |
| 80% token discount (BYOM): Bring your own model and we only charge for traffic & bandwidth — 80% off standard token pricing. | — | ✓ | ✓ |
| Go Beyond Hallucination: Custom eval prompts — detect toxicity, bias, off-topic responses, PII leaks, or anything custom. You define the eval criteria and output format. | — | — | ✓ |
| Topic Detection: Automatically classify conversation topics — politics, healthcare, finance, tech, and more. See what your users are asking about in real time. | — | ✓ | ✓ |
| Custom Detection Breakdown: Visual breakdown of all detection types — hallucination, toxicity, bias, PII leaks, and off-topic responses in one interactive chart. | — | ✓ | ✓ |
| Model Comparison analytics: Compare hallucination rates, toxicity, and bias across different LLM models side by side. | — | ✓ | ✓ |
| RAG Faithfulness scoring: Measure how faithfully your LLM responses match the provided RAG context — context match, source citation, and grounding scores. | — | ✓ | ✓ |
| Sentiment & Tone analysis: Analyze the sentiment and tone of LLM responses — professional, friendly, neutral, formal, or negative. | — | ✓ | ✓ |
| Response Cost Analytics: Track token costs per model, per request, and per project — see exactly where your LLM budget goes. | — | ✓ | ✓ |
| Prompt Compliance scoring: Measure how well LLM outputs follow your system prompt instructions — format, tone, boundaries, and instruction adherence. | — | ✓ | ✓ |
| Anomaly Detection: Automatically detect unusual spikes in hallucination scores and trigger alerts when anomalies are found. | — | ✓ | ✓ |
| Custom Eval LLM Endpoint: Create a custom eval endpoint. We POST your trace data to your own model for evaluation — full control over the eval pipeline. | — | — | ✓ |
| Volume discounts (15% off): Get 15% off total token pricing at scale — the more you use, the less you pay. | — | — | ✓ |
| Support | | | |
| Community support: Access to documentation, guides, and community resources. | ✓ | ✓ | ✓ |
| 12 feature requests per year: Submit up to 12 custom feature or integration requests per year. Send us a ticket with your idea and we'll build it for you. | — | — | ✓ |
| Unlimited 24/7/365 email priority support: Unlimited email priority support available around the clock for your critical needs. Separate from the 12 feature requests. | — | — | ✓ |
* 2M context model pricing: $0.28/1M input, $0.68/1M output. 128K model: $0.08/1M input, $0.18/1M output. MAX-T gets 15% off all transactions.
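To make the footnote's rates concrete, here is the arithmetic as a small helper (the function name is illustrative; the rates are the published per-1M-token prices):

```python
# Published rates in USD per 1M tokens: (input, output).
RATES = {"128K": (0.08, 0.18), "2M": (0.28, 0.68)}

def trace_cost(input_tokens, output_tokens, model="128K", maxt=False):
    """Token cost for a batch of traces; MAX-T gets 15% off all transactions."""
    in_rate, out_rate = RATES[model]
    cost = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    return cost * 0.85 if maxt else cost

# 50M input + 20M output tokens on the 128K model:
print(round(trace_cost(50e6, 20e6), 2))             # 50*0.08 + 20*0.18 = 7.6
print(round(trace_cost(50e6, 20e6, maxt=True), 2))  # with 15% off: 6.46
```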
Stop hallucinations before your users notice
Join teams using HalluTrace AI to monitor, evaluate, and improve their LLM outputs. Start free with 10M tokens every month.