Now in Public Beta

Detect LLM Hallucinations

Trace every LLM call, evaluate output alignment with your system prompts, and get real-time alerts when your AI goes off-script.

Near-instant eval — up to 2,000+ TPS
Integrate in under 5 minutes
quickstart.py
# pip install hallutraceai
from hallutraceai import HalluTrace

ht = HalluTrace(api_key="sk_live_...")

ht.trace(
    session_id="chat-123",
    type="agent",
    input="What is Python?",
    output="Python is a programming language.",
    system_prompt="You are a helpful assistant."
)
# Non-blocking — returns instantly. We evaluate in background.

Works with any LLM provider

OpenAI · Anthropic · Google Gemini · Mistral · Cohere · LangChain · LlamaIndex · n8n · FlowiseAI · Hugging Face · Any LLM

134+ Evals/sec · <1.2s Latency · 99.9% Uptime

Dashboard

Catch Every Hallucination in Real Time

Live scores, detection trends, model comparisons, and session monitoring — updating as your LLM runs.

Project: My AI Chatbot — last 7 days overview (Healthy · 1,247 evals)

Avg Score: 12.4 · Sessions: 342 · Flagged: 18 · Messages: 2.8K

Hallucination Score Distribution

  • 0–10: 872
  • 11–30: 561
  • 31–50: 312
  • 51–70: 149
  • 71–100: 62

Eval Detection Trends (Last 7 Days)

Hallucination, toxicity, bias, and PII detections plotted over the past week.

Topic Detection


Custom Detection Breakdown

  • Clean: 68%
  • Hallucination: 14%
  • Toxicity: 7%
  • Bias: 5%
  • PII Leak: 4%
  • Off-Topic: 2%

Model Comparison (hallucination / toxicity / bias rates)

  • GPT-4o: 12%
  • Claude 3.5: 9%
  • Gemini Pro: 18%
  • Mistral L: 22%
  • Llama 3: 16%

RAG Faithfulness Score

Context match: 91% · Source citation: 84% · Grounded: 87%

Sentiment & Tone Analysis

  • Professional: 42%
  • Friendly: 28%
  • Neutral: 18%
  • Formal: 8%
  • Negative: 4%

Response Cost Analytics

  • GPT-4o: $125 / 12.4K requests
  • Claude 3.5: $89 / 8.9K requests
  • Gemini Pro: $46 / 15.2K requests
  • Mistral L: $28 / 6.1K requests

Total: $288 across 42.6K requests · Avg: $6.76/1K

Prompt Compliance Score

Format: 96% · Tone: 92% · Boundaries: 94% · Instructions: 91%

Anomaly Detection

2 anomalies (spikes above threshold) detected across the 24-hour window · Auto-alert triggered

Session | Msgs | Score
chat-52fc | 5 | 24
chat-3927 | 14 | 17
chat-c954 | 13 | 44
chat-25fe | 16 | 72
chat-6284 | 15 | 80

Real-Time Hallucination Correction

HalluTrace scans every agent response against your RAG sources and system prompt in real time. When hallucination or prompt violation is detected, it automatically signals your agent to retry — before the user ever sees a wrong answer.

Available on Pay as You Go & MAX-T plans.

Auto-Scan Every Response

Every agent reply is checked against RAG sources, system prompt, and context — inputs, outputs, and metadata.

Detect & Intercept

Catches RAG data mismatches and system prompt violations. A score above your threshold triggers automatic correction.

Auto-Correct & Verify

Signals your agent to retry with source grounding. Re-scans the corrected response before delivery.
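The scan → intercept → retry loop above can be sketched in a few lines of Python. Everything here is a stand-in, not the real SDK surface: `evaluate` mimics HalluTrace's verdict with a toy grounding check, and the agent signature (including the `grounded` flag) is hypothetical — in production, HalluTrace evaluates asynchronously and signals your agent.

```python
# Sketch of the detect-and-retry loop, assuming a hypothetical agent callable.
THRESHOLD = 50  # default hallucination threshold

def evaluate(output: str, sources: list[str]) -> int:
    """Toy stand-in scorer: flag output that cites nothing from the RAG sources."""
    grounded = any(src.lower() in output.lower() for src in sources)
    return 10 if grounded else 80  # 0 = perfect, 100 = hallucinated

def answer_with_retry(agent, question: str, sources: list[str], max_retries: int = 2) -> str:
    output = agent(question, sources)
    for _ in range(max_retries):
        if evaluate(output, sources) <= THRESHOLD:
            return output  # passes the check -- safe to deliver
        # Re-prompt the agent with explicit grounding instructions and re-scan.
        output = agent(question, sources, grounded=True)
    return output
```

The point is the ordering: the corrected response is re-scanned before it is ever returned, so the user only sees an answer that passed the check (or the last retry).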

Features

Everything you need to trust your LLM

From trace ingestion to hallucination scoring to real-time alerts — one platform to monitor and evaluate your AI outputs.

Real-Time Tracing

Capture every LLM call — inputs, outputs, system prompts, model names. Grouped by chat session automatically.

Hallucination Detection

LLM-as-judge evaluates if outputs align with your system prompts. Scores from 0 (perfect) to 100 (hallucinated).

Instant Alerts

Get notified via email, SMS, or webhook when hallucination scores exceed your threshold — 50 by default.

Rich Analytics

Score trends, distributions, model comparisons, session breakdowns — all with animated, interactive charts.

CSV Data Tables

No SDK? Upload CSV files with your LLM data and run hallucination checks directly from the dashboard.
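For the no-SDK path, a compatible CSV can be produced with the standard library. The column names below mirror the SDK's trace fields and are an assumption about the expected upload format — match whatever the upload dialog asks for.

```python
# Build an upload-ready CSV from existing LLM logs (assumed column names).
import csv
import io

rows = [
    {"session_id": "chat-123",
     "input": "What is Python?",
     "output": "Python is a programming language.",
     "system_prompt": "You are a helpful assistant."},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["session_id", "input", "output", "system_prompt"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()  # save to a .csv file and upload via the dashboard
```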

Simple Integration

3 lines of Python. Or use our REST API. Or swap your OpenAI base URL. Works with any LLM provider.
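The REST path might be assembled like this. The endpoint URL and payload field names are assumptions modeled on the quickstart above, not documented values — check the API reference for the real ones.

```python
# Sketch of the REST alternative to the SDK (hypothetical endpoint and fields).
import json

def build_trace_request(api_key: str, session_id: str, input_text: str,
                        output_text: str, system_prompt: str) -> dict:
    """Assemble the parts of a hypothetical POST /v1/trace call."""
    return {
        "url": "https://api.hallutrace.ai/v1/trace",  # assumed endpoint
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "session_id": session_id,
            "type": "agent",
            "input": input_text,
            "output": output_text,
            "system_prompt": system_prompt,
        }),
    }
```

Pass the three parts to any HTTP client; like the SDK, the call is fire-and-forget.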

How It Works

Three steps to hallucination-free AI

01

Integrate SDK

Install our Python or JS SDK. Add 3 lines of code. Every LLM call is now traced — inputs, outputs, system prompts, and metadata.

02

Auto Evaluate

Our engine automatically scores each response for hallucination. LLM-as-judge checks alignment with your system prompt. Score 0-100.

03

Monitor & Alert

View scores in your dashboard with rich charts. Set thresholds. Get instant alerts via email, SMS, or webhook when things go wrong.

Pricing

Simple, token-based pricing

Start free with 10M tokens/month. Scale as you grow. All plans include Zero Storage mode. Paid plans include bring-your-own-model support.

Free

$0 / forever

Perfect for trying out hallucination detection.

Start Free

Infrastructure

  • 10M tokens / month
  • Standard rate limit
  • Up to 1 Gbps uplink
  • 128K tokens per trace

Platform

  • Unlimited projects
  • 1 team member
  • 7-day data retention
  • Zero Storage option
  • CSV data tables

Monitoring

  • Full analytics & charts
  • Email alerts
  • End-to-end encryption
Most Popular

Pay as You Go

$0.08 / 1M input tokens (128K model)

Deposit $25 to start. Scale without limits. Pay only for what you use.

Get Started

Infrastructure

  • Unlimited tokens
  • Enhanced rate limit
  • Up to 2 Gbps uplink
  • Up to 2M tokens per trace *

Platform

  • Unlimited projects
  • 1 team member
  • 90-day data retention
  • Zero Storage option
  • CSV data tables

Pricing

  • 128K model: $0.08/1M input, $0.18/1M output
  • 2M model: $0.28/1M input, $0.68/1M output

Monitoring

  • Full analytics & charts
  • Email & SMS alerts
  • End-to-end encryption

Advanced Analytics

  • Topic Detection (auto-classify conversation topics)
  • Custom Detection Breakdown
  • Model Comparison analytics
  • RAG Faithfulness scoring
  • Sentiment & Tone analysis
  • Response Cost Analytics
  • Prompt Compliance scoring
  • Anomaly Detection with alerts

Bring Your Own Model

  • Custom OpenAI SDK /v1 endpoint
  • 80% token discount with your own model
Max Tracing

MAX-T

$895 / month

Max Tracing — full control, custom integrations, and dedicated support.

Start 30-Day Free Trial

Infrastructure

  • Highest priority rate limit
  • Up to 5 Gbps uplink
  • Up to 2M tokens per trace *
  • 15% discount on all transactions

Platform

  • Everything in Pay as You Go
  • 10 projects
  • 365-day data retention
  • Zero Storage option
  • Unlimited team members

Advanced Eval

  • Go Beyond Hallucination (toxicity, bias, PII & more)
  • Custom Eval LLM Endpoint
  • Custom OpenAI SDK /v1 endpoint
  • 80% token discount with your own model

Advanced Analytics

  • Topic Detection (auto-classify conversation topics)
  • Custom Detection Breakdown
  • Model Comparison analytics
  • RAG Faithfulness scoring
  • Sentiment & Tone analysis
  • Response Cost Analytics
  • Prompt Compliance scoring
  • Anomaly Detection with alerts

Integrations

  • Webhook integration with custom output
  • JSON export API

Dedicated Support

  • 12 feature requests per year
  • Unlimited 24/7/365 email priority support
Zero Storage — data deleted after detection
Bring your own model — 80% off (paid plans)
No hidden fees. No minimum commitment.

Compare plans in detail

Each feature below includes a short note on what it does.

Feature | Free | Pay as You Go | MAX-T
Usage & Limits
Monthly tokens

Total tokens (input + output + system prompt + RAG context) processed per month.

10M | Unlimited | Unlimited
Rate limit

How fast you can send trace data to our API. Higher tiers get priority throughput.

Standard | Enhanced | Highest priority
Server uplink speed

Dedicated server connection speed for your account — faster uplink means lower latency for trace ingestion.

1 Gbps | 2 Gbps | 5 Gbps
Max tokens per trace

Maximum token size for a single trace detection — includes input, output, system prompt, and RAG context combined. * 2M context models are subject to higher per-token pricing.

128K | Up to 2M * | Up to 2M *
Projects

Separate environments for different apps or services, each with its own API key.

Unlimited | Unlimited | 10
Team members

Number of users who can access the dashboard and manage projects.

1 | 1 | Unlimited
Data retention

How long we store your trace data and evaluation results before auto-deletion.

7 days | 90 days | 365 days
Core Features
LLM call tracing

Capture inputs, outputs, system prompts, and model info for every LLM call automatically.

Hallucination detection

LLM-as-judge evaluates each response for alignment with your system prompt. Score 0–100.

Full analytics & charts

Interactive dashboards with score trends, distributions, model comparisons, and session breakdowns.

End-to-end encryption

Your LLM API keys and data are encrypted in transit. We never store keys in plain text.

Alerts & Integrations
Email alerts

Get notified by email when hallucination scores exceed your configured threshold.

SMS alerts

Receive SMS notifications for critical hallucination events in real time.

Webhook integration with custom output

Send custom-formatted data to your webhook — integrate with Slack, PagerDuty, Jira, or any automation workflow.
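A consumer for those webhook alerts might look like this minimal sketch. The payload fields (`session_id`, `score`, `detection`) are assumptions based on the alert features above, not a documented schema — with custom output you define the shape yourself.

```python
# Toy webhook alert router (assumed payload fields).
import json

def handle_alert(raw_body: str):
    """Return a routing decision for an incoming alert, or None to ignore it."""
    event = json.loads(raw_body)
    if event.get("score", 0) >= 70:
        return f"page-oncall:{event['session_id']}"      # critical score
    if event.get("detection") == "pii_leak":
        return f"notify-security:{event['session_id']}"  # sensitive category
    return None  # below threshold -- no action
```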

JSON export API

Programmatically export all your trace data and evaluation results as JSON for external analysis.

Data & Storage
CSV data tables

Upload CSV files with your LLM data and run hallucination checks directly — no SDK needed.

Zero Storage option

Your trace data is deleted immediately after hallucination detection runs. We store nothing.

Advanced & Customization
Custom OpenAI SDK /v1 endpoint

Use your own LLM model via an OpenAI-compatible /v1 endpoint for hallucination evaluation.

80% token discount (BYOM)

Bring your own model and we only charge for traffic & bandwidth — 80% off standard token pricing.

Go Beyond Hallucination

Custom eval prompts — detect toxicity, bias, off-topic responses, PII leaks, or anything custom. You define the eval criteria and output format.

Topic Detection

Automatically classify conversation topics — politics, healthcare, finance, tech, and more. See what your users are asking about in real time.

Custom Detection Breakdown

Visual breakdown of all detection types — hallucination, toxicity, bias, PII leaks, and off-topic responses in one interactive chart.

Model Comparison analytics

Compare hallucination rates, toxicity, and bias across different LLM models side by side.

RAG Faithfulness scoring

Measure how faithfully your LLM responses match the provided RAG context — context match, source citation, and grounding scores.

Sentiment & Tone analysis

Analyze the sentiment and tone of LLM responses — professional, friendly, neutral, formal, or negative.

Response Cost Analytics

Track token costs per model, per request, and per project — see exactly where your LLM budget goes.

Prompt Compliance scoring

Measure how well LLM outputs follow your system prompt instructions — format, tone, boundaries, and instruction adherence.

Anomaly Detection

Automatically detect unusual spikes in hallucination scores and trigger alerts when anomalies are found.

Custom Eval LLM Endpoint

Create a custom eval endpoint. We POST your trace data to your own model for evaluation — full control over the eval pipeline.
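A custom eval endpoint reduces to: receive a POSTed trace, reply with a score. This toy sketch shows that shape; beyond the documented 0–100 scale, the request and response field names are assumptions.

```python
# Toy request handler for a custom eval endpoint (assumed field names).
import json

def score_trace(raw_body: str) -> str:
    """Score a POSTed trace and return a JSON response body."""
    trace = json.loads(raw_body)
    output = trace.get("output", "")
    # Replace this trivial heuristic with a call to your own model;
    # constants keep the sketch self-contained. Empty output = worst score.
    score = 5 if output else 100
    return json.dumps({"score": score})
```

Wrap `score_trace` in whatever HTTP framework you already run; full control over the eval pipeline means the logic inside is entirely yours.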

Volume discounts (15% off)

Get 15% off total token pricing at scale — the more you use, the less you pay.

Support
Community support

Access to documentation, guides, and community resources.

12 feature requests per year

Submit up to 12 custom feature or integration requests per year. Send us a ticket with your idea and we'll build it for you.

Unlimited 24/7/365 email priority support

Unlimited email priority support available around the clock for your critical needs. Separate from the 12 feature requests.

* 2M context model pricing: $0.28/1M input, $0.68/1M output. 128K model: $0.08/1M input, $0.18/1M output. MAX-T gets 15% off all transactions.
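As a quick sanity check of those numbers for the 128K model, a month of 50M input and 20M output tokens works out to 50 × $0.08 + 20 × $0.18 = $4.00 + $3.60 = $7.60 on Pay as You Go:

```python
# Worked example of 128K-model token pricing (rates from the footnote above).
RATE_IN, RATE_OUT = 0.08, 0.18  # dollars per 1M tokens

def monthly_cost(input_tokens: int, output_tokens: int, maxt: bool = False) -> float:
    cost = input_tokens / 1e6 * RATE_IN + output_tokens / 1e6 * RATE_OUT
    if maxt:
        cost *= 0.85  # MAX-T's 15% discount on all transactions
    return round(cost, 2)

# monthly_cost(50_000_000, 20_000_000) -> 7.6; with maxt=True -> 6.46
```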

Stop hallucinations before your users notice

Join teams using HalluTrace AI to monitor, evaluate, and improve their LLM outputs. Start free with 10M tokens every month.