Root Cause Analysis

Automatically identify why anomalies, drift, and incidents occur in your AI agents.

How RCA Works

When TuringPulse detects an anomaly, drift event, or incident, it automatically performs root cause analysis by correlating the event with:

Fingerprint Changes - Prompt, config, or structure modifications
Deployments - Recent code deployments
Input Distribution - Changes in input patterns
External Dependencies - Third-party API changes
Resource Metrics - Memory, CPU, rate limits
Time Patterns - Time-of-day or day-of-week effects
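
To make the idea of correlation concrete, the sketch below shortlists changes that occurred shortly before an event. It is an illustration only, not TuringPulse's actual algorithm: the `Change` class, the `candidate_causes` function, and the one-hour lookback window are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    """Hypothetical record of something that changed (not an SDK type)."""
    kind: str              # e.g. "deployment" or "fingerprint_change"
    occurred_at: datetime


def candidate_causes(event_time, changes, lookback=timedelta(hours=1)):
    """Return changes inside the lookback window before the event, most recent first."""
    window_start = event_time - lookback
    recent = [c for c in changes if window_start <= c.occurred_at <= event_time]
    return sorted(recent, key=lambda c: c.occurred_at, reverse=True)
```

Changes that happened after the event, or long before it, are dropped. A real system would presumably weigh more signals than time proximity alone.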

Attribution Scores

Each potential cause is assigned an attribution score (0-100%) indicating how likely it is to be the root cause. The RCA API reports the same score as a fraction between 0 and 1 (for example, 0.85 for 85%):

Score Range	Interpretation	Action
80-100%	High confidence	Investigate this cause first
50-79%	Moderate confidence	Likely contributing factor
20-49%	Low confidence	Possible but not primary
0-19%	Unlikely	Probably not related
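
The table above can be expressed as a small helper. This is a minimal sketch: `interpret_score` is a hypothetical function, not part of the SDK, and it assumes each band's lower bound is inclusive so that fractional values such as 79.5% fall into the lower band.

```python
def interpret_score(score_percent: float) -> str:
    """Map an attribution score in percent (0-100) to its interpretation band."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score_percent must be between 0 and 100")
    if score_percent >= 80:
        return "High confidence"
    if score_percent >= 50:
        return "Moderate confidence"
    if score_percent >= 20:
        return "Low confidence"
    return "Unlikely"
```

For a score taken from the API, multiply by 100 first, for example `interpret_score(0.85 * 100)`.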

Viewing RCA Reports

From Anomalies

Navigate to Operations → Overview → Anomalies
Click on an anomaly to open the detail drawer
Scroll to the Root Cause Analysis section
Review the attributed causes and their scores

From Incidents

Navigate to Operations → Incidents
Click on an incident to view details
The Analysis tab shows RCA results
Click on a cause to see supporting evidence

From Drift Events

Navigate to Operations → Overview → Drift
Click on a drift event
View correlated changes in the detail panel

Fingerprint-Based RCA

TuringPulse tracks "fingerprints" of your agent configuration to detect changes:

What's Fingerprinted

Prompts - System prompts, user prompt templates
Model Config - Temperature, max_tokens, model version
Workflow Structure - Node graph, tool definitions
Dependencies - External API versions

Enable Fingerprinting

from turingpulse_sdk import init, FingerprintConfig

init(
    api_key="sk_live_...",
    fingerprint=FingerprintConfig(
        enabled=True,
        capture_prompts=True,      # Track prompt changes
        capture_configs=True,      # Track model config changes
        capture_structure=True,    # Track workflow structure
        sensitive_config_keys=[    # Keys to redact (not hash)
            "api_key", "password", "secret"
        ],
    )
)

Privacy

Prompts are hashed, not stored in plain text. Only the hash is used for change detection.
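
Hash-based change detection can be sketched as follows. The SDK's actual hashing scheme is not documented here, so SHA-256, the `[REDACTED]` placeholder, and the function names `fingerprint_prompt` and `fingerprint_config` are assumptions chosen for illustration.

```python
import hashlib
import json


def fingerprint_prompt(prompt: str) -> str:
    """Hash a prompt so changes can be detected without storing the text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def fingerprint_config(config: dict, sensitive_keys=("api_key", "password", "secret")) -> str:
    """Hash a config dict after replacing sensitive values with a placeholder."""
    redacted = {
        key: "[REDACTED]" if key in sensitive_keys else value
        for key, value in config.items()
    }
    # Sort keys so that key order does not affect the fingerprint
    canonical = json.dumps(redacted, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

In this sketch, rotating an API key leaves the config fingerprint unchanged because the value is redacted before hashing, while changing `temperature` produces a new fingerprint.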

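Deploy Correlation

Register each deployment so that RCA can correlate it with later anomalies, drift events, and incidents:
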
from turingpulse_sdk import register_deploy

# Auto-detect from CI/CD environment
register_deploy(
    workflow_id="customer-support-agent",
    auto_detect=True,  # Detects GitHub Actions, GitLab CI, etc.
)

# Or provide explicit values
register_deploy(
    workflow_id="customer-support-agent",
    version="v1.2.3",
    git_sha="abc123def",
    commit_message="Fix prompt template",
    deployed_by="ci/cd",
)
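
For intuition, CI auto-detection can work by reading environment variables that CI systems set. The sketch below is not the SDK's implementation: `detect_deploy_metadata` and its return shape are hypothetical, while `GITHUB_ACTIONS`, `GITHUB_SHA`, `GITLAB_CI`, and `CI_COMMIT_SHA` are the variables that GitHub Actions and GitLab CI define.

```python
import os


def detect_deploy_metadata(env=None):
    """Return deploy metadata from CI environment variables, or None outside CI."""
    env = os.environ if env is None else env
    if env.get("GITHUB_ACTIONS") == "true":
        return {"git_sha": env.get("GITHUB_SHA"), "deployed_by": "github-actions"}
    if env.get("GITLAB_CI") == "true":
        return {"git_sha": env.get("CI_COMMIT_SHA"), "deployed_by": "gitlab-ci"}
    return None
```

Passing `env` explicitly makes the function easy to exercise locally without running inside a CI job.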

CI/CD Integration

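# GitHub Actions example
- name: Register Deploy
  env:
    TP_API_KEY: ${{ secrets.TP_API_KEY }}
    GIT_SHA: ${{ github.sha }}
    COMMIT_MESSAGE: ${{ github.event.head_commit.message }}
  run: |
    # Build the JSON with jq so quotes or newlines in the commit message
    # are escaped instead of breaking the payload or the shell command
    jq -n \
      --arg sha "$GIT_SHA" \
      --arg msg "$COMMIT_MESSAGE" \
      '{workflow_id: "customer-support-agent", version: $sha, git_sha: $sha,
        commit_message: $msg, deployed_by: "github-actions"}' |
    curl -X POST https://api.turingpulse.ai/v1/deploys \
      -H "Authorization: Bearer $TP_API_KEY" \
      -H "Content-Type: application/json" \
      -d @-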

RCA API

Access RCA results programmatically:

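import requests

anomaly_id = "anom_123"  # ID of the anomaly to inspect

# Get RCA for an anomaly
response = requests.get(
    f"https://api.turingpulse.ai/v1/anomalies/{anomaly_id}/rca",
    headers={"Authorization": "Bearer sk_live_..."},
    timeout=10,
)
response.raise_for_status()  # Fail fast on HTTP errors

rca = response.json()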
# {
#   "anomaly_id": "anom_123",
#   "causes": [
#     {
#       "type": "fingerprint_change",
#       "attribution_score": 0.85,
#       "description": "Prompt template changed",
#       "evidence": {
#         "changed_at": "2024-01-15T10:30:00Z",
#         "previous_hash": "abc123",
#         "current_hash": "def456"
#       }
#     },
#     {
#       "type": "deployment",
#       "attribution_score": 0.72,
#       "description": "Deploy v1.2.3 at 10:25 AM",
#       "evidence": {
#         "version": "v1.2.3",
#         "git_sha": "abc123def",
#         "deployed_at": "2024-01-15T10:25:00Z"
#       }
#     }
#   ]
# }
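
Once the response is parsed, the causes can be ranked so the most likely one is reviewed first. The following is a minimal sketch that assumes the response shape shown above; `top_causes` is a hypothetical helper, not an SDK function, and `sample_rca` is a trimmed copy of the example response.

```python
def top_causes(rca: dict, min_score: float = 0.0) -> list:
    """Return causes with at least min_score attribution, highest score first."""
    causes = [
        cause for cause in rca.get("causes", [])
        if cause.get("attribution_score", 0.0) >= min_score
    ]
    return sorted(causes, key=lambda c: c.get("attribution_score", 0.0), reverse=True)


# Trimmed copy of the example response above
sample_rca = {
    "anomaly_id": "anom_123",
    "causes": [
        {"type": "deployment", "attribution_score": 0.72},
        {"type": "fingerprint_change", "attribution_score": 0.85},
    ],
}

for cause in top_causes(sample_rca):
    print(f"{cause['type']}: {cause['attribution_score']:.0%}")
```

Passing `min_score=0.8` keeps only the high-confidence causes, matching the 80% threshold in the attribution score table.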

Cause Types

Type	Description	Evidence
`fingerprint_change`	Prompt, config, or structure changed	Hash diff, change timestamp
`deployment`	Code deployment correlated with event	Version, git SHA, deploy time
`input_distribution`	Input patterns changed	Distribution stats, examples
`external_dependency`	Third-party API issue	API name, error rates
`resource_constraint`	Rate limit, timeout, OOM	Resource metrics
`temporal_pattern`	Time-based pattern	Time correlation stats
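
Client code that branches on the cause type may benefit from an enumeration of the values in the table above. This is a sketch under the assumption that the six listed types are the complete set; `CauseType` and `parse_cause_type` are hypothetical names, not part of the SDK.

```python
from enum import Enum
from typing import Optional


class CauseType(str, Enum):
    """Cause types listed in the table above."""
    FINGERPRINT_CHANGE = "fingerprint_change"
    DEPLOYMENT = "deployment"
    INPUT_DISTRIBUTION = "input_distribution"
    EXTERNAL_DEPENDENCY = "external_dependency"
    RESOURCE_CONSTRAINT = "resource_constraint"
    TEMPORAL_PATTERN = "temporal_pattern"


def parse_cause_type(value: str) -> Optional[CauseType]:
    """Return the matching CauseType, or None for a value not in the table."""
    try:
        return CauseType(value)
    except ValueError:
        return None
```

Returning `None` for unknown values keeps the client from failing if a type appears that this sketch does not list.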

Best Practices

Enable fingerprinting - This is the most valuable source of RCA data.
Register all deploys - Even small changes can cause unexpected behavior.
Review high-attribution causes first - Focus on causes with 80%+ attribution.
Document resolutions - Add notes to incidents about what fixed the issue.