Root Cause Analysis
Automatically identify why anomalies, drift, and incidents occur in your AI agents.
How RCA Works
When TuringPulse detects an anomaly, drift event, or incident, it automatically performs root cause analysis by correlating the event with:
- Fingerprint Changes - Prompt, config, or structure modifications
- Deployments - Recent code deployments
- Input Distribution - Changes in input patterns
- External Dependencies - Third-party API changes
- Resource Metrics - Memory, CPU, rate limits
- Time Patterns - Time-of-day or day-of-week effects
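The correlation step can be pictured as a time-window lookup: given the time of an event, collect the changes that happened shortly before it. The sketch below is illustrative only and is not TuringPulse's actual algorithm, which this page does not describe. The `changes_before` helper, the one-hour window, the record layout, and the 10:40 anomaly time are all hypothetical.

```python
from datetime import datetime, timedelta


def changes_before(event_time: datetime, changes: list[dict],
                   window: timedelta = timedelta(hours=1)) -> list[dict]:
    """Return changes that happened within `window` before the event, newest first."""
    candidates = [
        c for c in changes
        if timedelta(0) <= event_time - c["at"] <= window
    ]
    return sorted(candidates, key=lambda c: c["at"], reverse=True)


# Hypothetical anomaly time and change log, for illustration only.
anomaly_at = datetime(2024, 1, 15, 10, 40)
recent_changes = [
    {"type": "deployment", "at": datetime(2024, 1, 15, 10, 25)},
    {"type": "fingerprint_change", "at": datetime(2024, 1, 15, 10, 30)},
    {"type": "deployment", "at": datetime(2024, 1, 14, 9, 0)},
]

for change in changes_before(anomaly_at, recent_changes):
    print(change["type"], change["at"].isoformat())
```

The deployment from the previous day falls outside the window and is dropped, which is why registering every deploy with an accurate timestamp matters for correlation.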
Attribution Scores
Each potential cause is assigned an attribution score (0-100%) indicating how likely it is to be the root cause. The RCA API sample response expresses the same score as a fraction, such as 0.85 for 85%.
| Score Range | Interpretation | Action |
|---|---|---|
| 80-100% | High confidence | Investigate this cause first |
| 50-79% | Moderate confidence | Likely contributing factor |
| 20-49% | Low confidence | Possible but not primary |
| 0-19% | Unlikely | Probably not related |
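If you triage causes in your own tooling, the table above translates directly into a small lookup. This is a minimal sketch: `interpret_attribution` is a hypothetical helper, not part of the TuringPulse SDK, and it takes the score as a percentage.

```python
def interpret_attribution(score: float) -> tuple[str, str]:
    """Map an attribution score (percent, 0-100) to its interpretation and action."""
    if not 0 <= score <= 100:
        raise ValueError("attribution score must be between 0 and 100")
    if score >= 80:
        return ("High confidence", "Investigate this cause first")
    if score >= 50:
        return ("Moderate confidence", "Likely contributing factor")
    if score >= 20:
        return ("Low confidence", "Possible but not primary")
    return ("Unlikely", "Probably not related")


# The RCA API sample reports scores as fractions, so scale by 100 first.
print(interpret_attribution(0.85 * 100))
```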
Viewing RCA Reports
From Anomalies
- Navigate to Operations → Overview → Anomalies
- Click on an anomaly to open the detail drawer
- Scroll to the Root Cause Analysis section
- Review the attributed causes and their scores
From Incidents
- Navigate to Operations → Incidents
- Click on an incident to view details
- The Analysis tab shows RCA results
- Click on a cause to see supporting evidence
From Drift Events
- Navigate to Operations → Overview → Drift
- Click on a drift event
- View correlated changes in the detail panel
Fingerprint-Based RCA
TuringPulse tracks "fingerprints" of your agent configuration to detect changes:
What's Fingerprinted
- Prompts - System prompts, user prompt templates
- Model Config - Temperature, max_tokens, model version
- Workflow Structure - Node graph, tool definitions
- Dependencies - External API versions
Enable Fingerprinting
fingerprint.py
from turingpulse_sdk import init, FingerprintConfig

init(
    api_key="sk_live_...",
    fingerprint=FingerprintConfig(
        enabled=True,
        capture_prompts=True,      # Track prompt changes
        capture_configs=True,      # Track model config changes
        capture_structure=True,    # Track workflow structure
        sensitive_config_keys=[    # Keys to redact (not hash)
            "api_key", "password", "secret"
        ],
    ),
)
Privacy
Prompts are hashed, not stored in plain text. Only the hash is used for change detection.
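To see why a hash is enough for change detection, consider the sketch below. It assumes SHA-256 purely for illustration; this page does not say which hash function the SDK uses, and `prompt_fingerprint` is a hypothetical helper, not an SDK function.

```python
import hashlib


def prompt_fingerprint(prompt: str) -> str:
    """Return a hex digest that changes whenever the prompt text changes."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


previous = prompt_fingerprint("You are a helpful support agent.")
current = prompt_fingerprint("You are a concise support agent.")

# Only the digests are compared; the prompt text itself is never needed.
prompt_changed = previous != current
print(prompt_changed)
```

Identical prompts always produce identical digests, so a differing digest signals an edit without revealing what the prompt says.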
Deploy Correlation
Register deployments to correlate anomalies with code changes:
deploy.py
from turingpulse_sdk import register_deploy

# Auto-detect from CI/CD environment
register_deploy(
    workflow_id="customer-support-agent",
    auto_detect=True,  # Detects GitHub Actions, GitLab CI, etc.
)

# Or provide explicit values
register_deploy(
    workflow_id="customer-support-agent",
    version="v1.2.3",
    git_sha="abc123def",
    commit_message="Fix prompt template",
    deployed_by="ci/cd",
)
CI/CD Integration
.github/workflows/deploy.yml
# GitHub Actions example
- name: Register Deploy
  run: |
    curl -X POST https://api.turingpulse.ai/v1/deploys \
      -H "Authorization: Bearer ${{ secrets.TP_API_KEY }}" \
      -H "Content-Type: application/json" \
      -d '{
        "workflow_id": "customer-support-agent",
        "version": "${{ github.sha }}",
        "git_sha": "${{ github.sha }}",
        "commit_message": "${{ github.event.head_commit.message }}",
        "deployed_by": "github-actions"
      }'
RCA API
Access RCA results programmatically:
rca_api.py
import requests

anomaly_id = "anom_123"  # ID of the anomaly to analyze

# Get RCA for an anomaly (f-string fills in the anomaly ID)
response = requests.get(
    f"https://api.turingpulse.ai/v1/anomalies/{anomaly_id}/rca",
    headers={"Authorization": "Bearer sk_live_..."},
)
response.raise_for_status()  # Fail fast on HTTP errors
rca = response.json()
# {
#   "anomaly_id": "anom_123",
#   "causes": [
#     {
#       "type": "fingerprint_change",
#       "attribution_score": 0.85,
#       "description": "Prompt template changed",
#       "evidence": {
#         "changed_at": "2024-01-15T10:30:00Z",
#         "previous_hash": "abc123",
#         "current_hash": "def456"
#       }
#     },
#     {
#       "type": "deployment",
#       "attribution_score": 0.72,
#       "description": "Deploy v1.2.3 at 10:25 AM",
#       "evidence": {
#         "version": "v1.2.3",
#         "git_sha": "abc123def",
#         "deployed_at": "2024-01-15T10:25:00Z"
#       }
#     }
#   ]
# }
Cause Types
| Type | Description | Evidence |
|---|---|---|
| fingerprint_change | Prompt, config, or structure changed | Hash diff, change timestamp |
| deployment | Code deployment correlated with event | Version, git SHA, deploy time |
| input_distribution | Input patterns changed | Distribution stats, examples |
| external_dependency | Third-party API issue | API name, error rates |
| resource_constraint | Rate limit, timeout, OOM | Resource metrics |
| temporal_pattern | Time-based pattern | Time correlation stats |
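Once you have an RCA response, a common next step is to rank its causes by attribution score. The sketch below works on a trimmed copy of the sample response from the RCA API section; `rank_causes` is a hypothetical helper, and the field names are taken from that sample rather than from a published schema.

```python
def rank_causes(rca: dict, min_score: float = 0.0) -> list[dict]:
    """Return causes at or above min_score, highest attribution first."""
    causes = [c for c in rca.get("causes", []) if c["attribution_score"] >= min_score]
    return sorted(causes, key=lambda c: c["attribution_score"], reverse=True)


# Trimmed copy of the sample RCA response.
sample_rca = {
    "anomaly_id": "anom_123",
    "causes": [
        {"type": "deployment", "attribution_score": 0.72},
        {"type": "fingerprint_change", "attribution_score": 0.85},
    ],
}

# Keep only high-confidence causes (80% and above).
for cause in rank_causes(sample_rca, min_score=0.8):
    print(f'{cause["type"]}: {cause["attribution_score"]:.0%}')
# prints: fingerprint_change: 85%
```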
Best Practices
- Enable fingerprinting - This is the most valuable source of RCA data.
- Register all deploys - Even small changes can cause unexpected behavior.
- Review high-attribution causes first - Focus on causes with 80%+ attribution.
- Document resolutions - Add notes to incidents about what fixed the issue.