Root Cause Analysis

Automatically identify why anomalies, drift, and incidents occur in your AI agents.

How RCA Works

When TuringPulse detects an anomaly, drift event, or incident, it automatically performs root cause analysis by correlating the event with:

  • Fingerprint Changes - Prompt, config, or structure modifications
  • Deployments - Recent code deployments
  • Input Distribution - Changes in input patterns
  • External Dependencies - Third-party API changes
  • Resource Metrics - Memory, CPU, rate limits
  • Time Patterns - Time-of-day or day-of-week effects
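
The time-proximity idea behind this correlation can be sketched as below. This is only an illustration, not TuringPulse's actual algorithm: the `changes_before` helper, the one-hour lookback window, and the shape of the change records are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def changes_before(anomaly_at, changes, lookback=timedelta(hours=1)):
    """Return changes inside the lookback window before the anomaly, most recent first."""
    window_start = anomaly_at - lookback
    in_window = [c for c in changes if window_start <= c["at"] <= anomaly_at]
    return sorted(in_window, key=lambda c: c["at"], reverse=True)


utc = timezone.utc
# Hypothetical change log for one workflow.
changes = [
    {"type": "deployment", "at": datetime(2024, 1, 15, 7, 0, tzinfo=utc)},
    {"type": "deployment", "at": datetime(2024, 1, 15, 10, 25, tzinfo=utc)},
    {"type": "fingerprint_change", "at": datetime(2024, 1, 15, 10, 30, tzinfo=utc)},
]

# Anomaly detected at 10:40; the 07:00 deploy falls outside the window.
candidates = changes_before(datetime(2024, 1, 15, 10, 40, tzinfo=utc), changes)
```

Changes that land shortly before the event are natural candidates; a real scoring step would also weigh the other signals listed above.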

Attribution Scores

Each potential cause is assigned an attribution score (0-100%) indicating how likely it is to be the root cause. In the RCA API example response, the same score appears as a fraction, so 0.85 corresponds to 85%.

Score Range   Interpretation        Action
80-100%       High confidence       Investigate this cause first
50-79%        Moderate confidence   Likely contributing factor
20-49%        Low confidence        Possible but not primary
0-19%         Unlikely              Probably not related
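
If you triage causes in your own tooling, the bands above can be encoded as a small lookup. The function name `interpret_attribution` is hypothetical and not part of the SDK; it assumes the score is given as a percentage, and that fractional values such as 79.5 fall into the band below the next threshold.

```python
def interpret_attribution(score_pct: float) -> tuple[str, str]:
    """Map an attribution score (0-100) to (interpretation, action)."""
    if not 0 <= score_pct <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score_pct}")
    if score_pct >= 80:
        return ("High confidence", "Investigate this cause first")
    if score_pct >= 50:
        return ("Moderate confidence", "Likely contributing factor")
    if score_pct >= 20:
        return ("Low confidence", "Possible but not primary")
    return ("Unlikely", "Probably not related")


# A fractional API score such as 0.85 would be passed in as 85.
label, action = interpret_attribution(0.85 * 100)
```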

Viewing RCA Reports

From Anomalies

  1. Navigate to Operations → Overview → Anomalies
  2. Click on an anomaly to open the detail drawer
  3. Scroll to the Root Cause Analysis section
  4. Review the attributed causes and their scores

From Incidents

  1. Navigate to Operations → Incidents
  2. Click on an incident to view details
  3. The Analysis tab shows RCA results
  4. Click on a cause to see supporting evidence

From Drift Events

  1. Navigate to Operations → Overview → Drift
  2. Click on a drift event
  3. View correlated changes in the detail panel

Fingerprint-Based RCA

TuringPulse tracks "fingerprints" of your agent configuration to detect changes:

What's Fingerprinted

  • Prompts - System prompts, user prompt templates
  • Model Config - Temperature, max_tokens, model version
  • Workflow Structure - Node graph, tool definitions
  • Dependencies - External API versions

Enable Fingerprinting

fingerprint.py
from turingpulse_sdk import init, FingerprintConfig

init(
    api_key="sk_live_...",
    fingerprint=FingerprintConfig(
        enabled=True,
        capture_prompts=True,      # Track prompt changes
        capture_configs=True,      # Track model config changes
        capture_structure=True,    # Track workflow structure
        sensitive_config_keys=[    # Keys to redact (not hash)
            "api_key", "password", "secret"
        ],
    )
)
Privacy: Prompts are hashed, not stored in plain text. Only the hash is used for change detection.
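
To make hash-based change detection concrete, here is a minimal sketch. It assumes SHA-256 and a canonical JSON encoding; the SDK's real hashing scheme is not documented here, and `fingerprint_prompt`, `fingerprint_config`, and the `[REDACTED]` placeholder are hypothetical names chosen for the example.

```python
import hashlib
import json


def fingerprint_prompt(prompt: str) -> str:
    """Hash a prompt so changes can be detected without storing the text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def fingerprint_config(config: dict, sensitive_keys=("api_key", "password", "secret")) -> str:
    """Redact sensitive keys, then hash a canonical JSON form of the config."""
    redacted = {k: ("[REDACTED]" if k in sensitive_keys else v) for k, v in config.items()}
    # sort_keys makes the hash independent of key order.
    canonical = json.dumps(redacted, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


before = fingerprint_prompt("You are a helpful support agent.")
after = fingerprint_prompt("You are a helpful support agent. Be concise.")
prompt_changed = before != after  # any edit to the prompt yields a different hash
```

Because sensitive values are redacted before hashing in this sketch, rotating an API key would not register as a config change, while changing `temperature` would.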

Deploy Correlation

Register deployments to correlate anomalies with code changes:

deploy.py
from turingpulse_sdk import register_deploy

# Auto-detect from CI/CD environment
register_deploy(
    workflow_id="customer-support-agent",
    auto_detect=True,  # Detects GitHub Actions, GitLab CI, etc.
)

# Or provide explicit values
register_deploy(
    workflow_id="customer-support-agent",
    version="v1.2.3",
    git_sha="abc123def",
    commit_message="Fix prompt template",
    deployed_by="ci/cd",
)

CI/CD Integration

.github/workflows/deploy.yml
# GitHub Actions example
- name: Register Deploy
  run: |
    curl -X POST https://api.turingpulse.ai/v1/deploys \
      -H "Authorization: Bearer ${{ secrets.TP_API_KEY }}" \
      -H "Content-Type: application/json" \
      -d '{
        "workflow_id": "customer-support-agent",
        "version": "${{ github.sha }}",
        "git_sha": "${{ github.sha }}",
        "commit_message": "${{ github.event.head_commit.message }}",
        "deployed_by": "github-actions"
      }'
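
Commit messages that contain quotes or newlines can break a JSON body that is assembled by string interpolation, as in the curl example. One way around this is to build the payload in a small script and let `json.dumps` do the escaping. The sketch below uses only the standard library; the `build_deploy_request` helper and the placeholder key are hypothetical, while the endpoint and field names are taken from the curl example. `GITHUB_SHA` is an environment variable that GitHub Actions sets by default.

```python
import json
import os
import urllib.request

DEPLOYS_URL = "https://api.turingpulse.ai/v1/deploys"  # endpoint from the curl example


def build_deploy_request(workflow_id: str, commit_message: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a deploy registration request."""
    # GITHUB_SHA is provided by GitHub Actions; fall back to "unknown" elsewhere.
    sha = os.environ.get("GITHUB_SHA", "unknown")
    payload = {
        "workflow_id": workflow_id,
        "version": sha,
        "git_sha": sha,
        "commit_message": commit_message,
        "deployed_by": "github-actions",
    }
    return urllib.request.Request(
        DEPLOYS_URL,
        data=json.dumps(payload).encode("utf-8"),  # json.dumps escapes quotes and newlines
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_deploy_request(
    "customer-support-agent",
    'Fix "greeting" prompt\n\nSecond line',
    "sk_test_placeholder",  # placeholder; read the real key from a CI secret
)
# In the CI job, send it with: urllib.request.urlopen(req)
```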

RCA API

Access RCA results programmatically:

rca_api.py
import requests

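anomaly_id = "anom_123"  # ID of the anomaly to inspect

# Get RCA for an anomaly (f-string so the ID is substituted into the URL)
response = requests.get(
    f"https://api.turingpulse.ai/v1/anomalies/{anomaly_id}/rca",
    headers={"Authorization": "Bearer sk_live_..."},
    timeout=10,
)
response.raise_for_status()  # Fail fast on HTTP errors

rca = response.json()
# Example response: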
# {
#   "anomaly_id": "anom_123",
#   "causes": [
#     {
#       "type": "fingerprint_change",
#       "attribution_score": 0.85,
#       "description": "Prompt template changed",
#       "evidence": {
#         "changed_at": "2024-01-15T10:30:00Z",
#         "previous_hash": "abc123",
#         "current_hash": "def456"
#       }
#     },
#     {
#       "type": "deployment",
#       "attribution_score": 0.72,
#       "description": "Deploy v1.2.3 at 10:25 AM",
#       "evidence": {
#         "version": "v1.2.3",
#         "git_sha": "abc123def",
#         "deployed_at": "2024-01-15T10:25:00Z"
#       }
#     }
#   ]
# }
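
Once the response is parsed, a common next step is to rank the causes and keep only the high-confidence ones. The sketch below works on the example response shown above; the `top_causes` helper is hypothetical, and it assumes scores arrive as fractions (0.8 meaning 80%), as that example suggests.

```python
def top_causes(rca: dict, min_score: float = 0.8) -> list[dict]:
    """Return causes at or above min_score, highest attribution first."""
    causes = rca.get("causes", [])
    strong = [c for c in causes if c.get("attribution_score", 0.0) >= min_score]
    return sorted(strong, key=lambda c: c["attribution_score"], reverse=True)


# Trimmed copy of the example response above.
sample_rca = {
    "anomaly_id": "anom_123",
    "causes": [
        {"type": "deployment", "attribution_score": 0.72},
        {"type": "fingerprint_change", "attribution_score": 0.85},
    ],
}

high_confidence = top_causes(sample_rca)               # only the 0.85 cause
all_ranked = top_causes(sample_rca, min_score=0.0)     # both causes, ranked
```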

Cause Types

Type                 Description                             Evidence
fingerprint_change   Prompt, config, or structure changed    Hash diff, change timestamp
deployment           Code deployment correlated with event   Version, git SHA, deploy time
input_distribution   Input patterns changed                  Distribution stats, examples
external_dependency  Third-party API issue                   API name, error rates
resource_constraint  Rate limit, timeout, OOM                Resource metrics
temporal_pattern     Time-based pattern                      Time correlation stats

Best Practices

  • Enable fingerprinting - This is the most valuable source of RCA data.
  • Register all deploys - Even small changes can cause unexpected behavior.
  • Review high-attribution causes first - Focus on causes with 80%+ attribution.
  • Document resolutions - Add notes to incidents about what fixed the issue.
