Instrumentation at Scale: The Universal Plugin Architecture for AI Agents
How TuringPulse's plugin architecture delivers zero-code instrumentation across 12+ AI frameworks and 6+ LLM providers — without framework lock-in or dependency bloat.
Why One SDK Isn't Enough
The AI framework landscape in 2026 is not fragmented — it is shattered. LangChain, CrewAI, AutoGen, LlamaIndex, Pydantic AI, DSPy, Haystack, Semantic Kernel, Google ADK, Strands, Mastra, Vercel AI — each framework has its own execution model, its own callback system, its own way of representing agent steps, tool calls, and LLM interactions. On the provider side, OpenAI, Anthropic, Google GenAI, Amazon Bedrock, Cohere, and Mistral each expose different client libraries with different interfaces, different streaming patterns, and different metadata structures.
A team running a LangGraph-based agent with OpenAI today might switch to CrewAI with Anthropic next quarter. A platform team supporting multiple product squads will have LlamaIndex in retrieval pipelines, DSPy in optimization loops, and AutoGen in multi-agent orchestration — all running simultaneously. Each combination of framework and provider produces telemetry in a different shape, at different lifecycle points, with different semantics.
Building a single monolithic SDK that understands every framework and every provider is a losing strategy. The integration surface is too large, the release cadence too fast, and the dependency graph too deep. Every framework update risks breaking the entire instrumentation layer. Every new provider requires changes to a shared codebase that touches every other integration. The monolith approach does not scale — not in engineering effort, not in dependency management, and not in release velocity.
The alternative is a plugin architecture: a thin, stable core that defines what telemetry looks like, and a constellation of independent plugins that know how to extract that telemetry from each specific framework and provider. The core never changes when a new framework appears. A plugin never breaks other plugins when it updates. Teams install only the plugins they actually use. This is the architecture that makes instrumentation scale.
The Plugin Architecture
A well-designed instrumentation SDK is organized into three layers. The core SDK handles everything framework-agnostic: trace lifecycle management, span creation, metric computation, governance evaluation, and telemetry transport. It defines the universal data model that all plugins emit into.
Framework plugins translate framework-specific execution patterns into the universal model. A LangGraph plugin understands graph execution and node callbacks. A CrewAI plugin knows how to intercept the agent-task-crew hierarchy. Each framework plugin is a separate package with minimal dependencies.
Provider plugins handle LLM-specific instrumentation — wrapping API calls to capture token counts, model parameters, cost data, and streaming responses. Each provider plugin captures metadata that the core cannot: pricing tiers, model identifiers, and provider-specific error codes.
The installation model reflects this separation: each integration ships as its own package, so teams pull in only what they need — no unnecessary dependencies, no version conflicts, no bloat.
Each plugin should be independently versioned and released, so updating your framework instrumentation never risks breaking your provider instrumentation — and vice versa. This decoupling is what makes the plugin architecture scale across dozens of integration targets.
A good SDK lets you install everything with one command for experimentation, or pick specific plugins for production to keep dependencies minimal.
Zero-Code Instrumentation
The goal of every plugin is zero-code instrumentation: teams should get comprehensive telemetry without modifying their agent code. A developer should not need to add tracing decorators, wrap function calls, or manually emit spans. The instrumentation should be invisible — activated by importing a plugin and calling a single initialization function, then silently capturing every relevant event as the agent runs.
A well-designed plugin system achieves this through three instrumentation patterns, chosen based on what each framework supports.
The patch pattern uses monkey-patching to intercept library functions at the module level. When a provider plugin initializes, it patches the client's API methods to wrap every call with span creation, token counting, and metadata capture. Every LLM call in the entire process — regardless of which framework invokes it — is automatically instrumented. This pattern is powerful for global coverage but coarse-grained: it instruments everything or nothing.
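A minimal sketch of the patch pattern, using a stand-in client class rather than a real provider library (the class and method names here are invented for illustration):

```python
import functools

class FakeClient:
    """Stand-in for a provider SDK client class."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

captured = []  # stand-in telemetry sink

def patch_client(cls):
    """Replace the class method so every instance, everywhere, is instrumented."""
    original = cls.complete

    @functools.wraps(original)
    def wrapper(self, prompt: str) -> str:
        result = original(self, prompt)
        captured.append({"name": "llm_call", "prompt": prompt})  # emit a span
        return result

    cls.complete = wrapper  # module-level patch: affects all callers

patch_client(FakeClient)
out = FakeClient().complete("hi")  # instrumented transparently -> "echo: hi"
```

Because the patch lives on the class, any framework that constructs its own client internally is still covered — which is exactly why this pattern is all-or-nothing.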
The proxy pattern wraps individual client instances rather than patching the library globally. Instead of using the client directly, teams wrap it with a proxy function like client = instrument(OpenAI()). The proxy intercepts the same calls as the patch pattern but provides per-client control: teams can instrument their production client while leaving their test client unwrapped. This pattern is ideal when multiple clients exist with different roles or when selective instrumentation is required.
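The proxy pattern can be sketched with a small wrapper that forwards attribute access and records a span around each method call. Again, the names are illustrative, not the real SDK surface:

```python
class Proxy:
    """Wrap one client instance; other instances stay uninstrumented."""
    def __init__(self, client, sink):
        self._client = client
        self._sink = sink

    def __getattr__(self, name):
        attr = getattr(self._client, name)
        if not callable(attr):
            return attr

        def wrapped(*args, **kwargs):
            result = attr(*args, **kwargs)
            self._sink.append({"method": name})  # record a span per call
            return result

        return wrapped

class FakeClient:
    """Stand-in for a provider SDK client."""
    def complete(self, prompt: str) -> str:
        return prompt.upper()

spans = []
prod = Proxy(FakeClient(), spans)  # instrumented
untracked = FakeClient()           # left unwrapped, e.g. for tests
result = prod.complete("hello")
```

The per-instance scoping is the whole point: `prod` emits spans, `untracked` never does, and both hit the same underlying API.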
The callback pattern leverages framework-native extension points. LangGraph exposes callback handlers that fire at graph, node, and edge transitions. The LangGraph plugin registers a callback handler that translates these events into standardized spans with the correct parent-child relationships, timing data, and metadata. This pattern produces the richest telemetry because it captures framework-level semantics — graph structure, node types, conditional edges — that are invisible to lower-level instrumentation.
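The essence of the callback pattern is a handler that the framework invokes at lifecycle boundaries, maintaining a stack so that spans nest correctly. A simplified sketch (the hook names are hypothetical, not LangGraph's actual callback interface):

```python
import time

class CallbackHandler:
    """Translates framework lifecycle events into parent-linked spans."""
    def __init__(self):
        self.spans = []   # finished spans
        self._stack = []  # currently open spans

    def on_node_start(self, node: str):
        parent = self._stack[-1]["name"] if self._stack else None
        self._stack.append({"name": node, "parent": parent, "start": time.time()})

    def on_node_end(self):
        span = self._stack.pop()
        span["end"] = time.time()
        self.spans.append(span)

# A framework would fire these hooks as the graph executes:
h = CallbackHandler()
h.on_node_start("graph")
h.on_node_start("retrieve")
h.on_node_end()
h.on_node_start("generate")
h.on_node_end()
h.on_node_end()
```

The stack is what recovers parent-child structure: `retrieve` and `generate` both end up parented to `graph`, mirroring the workflow topology.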
All three patterns produce the same standardized telemetry. A span created by the patch pattern is structurally identical to one created by the callback pattern. The universal data model ensures that downstream systems — dashboards, alerts, drift detection, governance evaluation — work identically regardless of how the telemetry was captured.
What Gets Captured, Automatically
Regardless of which framework or provider is in use, well-designed instrumentation plugins capture a standardized set of telemetry that covers the full agent execution lifecycle. The universal data model ensures that every agent — whether built on LangGraph, CrewAI, AutoGen, or bare API calls — produces structurally identical telemetry that the platform can analyze, compare, and govern consistently.
For LLM calls, every plugin captures the model identifier, input and output token counts, estimated cost based on provider pricing, wall-clock latency, the presence and structure of tool calls, and the model parameters in effect (temperature, top_p, max_tokens, stop sequences). Streaming calls capture time-to-first-token and chunk-level timing in addition to aggregate metrics. These captures happen at the provider plugin level, so they work identically whether the LLM call originates from a framework or from direct API usage.
For agent steps, framework plugins capture the inputs presented to the agent at each step, the outputs produced, intermediate reasoning or chain-of-thought where the framework exposes it, tool selection decisions, and the outcome of each step (success, error, retry). Multi-agent frameworks like CrewAI and AutoGen additionally capture inter-agent communication, task delegation, and role assignments — the social structure of the agent ensemble, not just the individual actions.
For workflow structure, plugins that instrument graph-based frameworks capture the DAG representation of the workflow: nodes, edges, conditional routing logic, entry and exit points, and parallel execution branches. This structural metadata enables the platform to detect when a workflow's topology changes — a node added, an edge rerouted, a conditional modified — and correlate those structural changes with performance shifts.
For framework metadata, every plugin captures the framework version, SDK version, plugin version, runtime environment (Python version, OS), and relevant configuration. This contextual metadata is essential for debugging — knowing that a regression appeared after upgrading from LangGraph 0.2.x to 0.3.x is immediately actionable in a way that a raw metric drop is not.
Plugin packages should be intentionally separate to keep dependency trees minimal. A framework plugin should not pull in any LLM provider libraries, and vice versa. Each plugin depends only on the core SDK and the library it instruments — nothing more.
From Instrumentation to Intelligence
Raw telemetry is the starting material, not the end product. The value of universal instrumentation is that it feeds every downstream intelligence system in the platform with consistent, high-fidelity data — regardless of the source framework or provider. The same data model flows through the entire pipeline, from SDK capture to actionable insight.
Traces flow into observability dashboards where teams can inspect individual agent runs, drill into specific spans, compare execution timelines across workflow versions, and identify bottlenecks. Because the telemetry is standardized, a LangGraph trace and a CrewAI trace render in the same dashboard with the same drill-down capabilities. Teams do not need separate tools for separate frameworks.
Metrics feed KPI monitoring and drift detection. Aggregated token counts, latency distributions, cost trends, and quality scores are computed from span-level telemetry and tracked against configured baselines. When a metric drifts beyond its threshold, the governance engine fires an alert. Because the metrics are derived from the universal data model, drift detection works identically across frameworks — a latency regression in a LangGraph workflow triggers the same detection logic as one in a CrewAI pipeline.
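At its simplest, this kind of threshold-based drift check compares the current window's mean against a baseline. A sketch, with an invented 20% default threshold:

```python
from statistics import mean

def check_drift(samples: list[float], baseline: float,
                threshold_pct: float = 20.0) -> tuple[bool, float]:
    """Return (drifted?, percent deviation of the window mean from baseline)."""
    deviation = abs(mean(samples) - baseline) / baseline * 100
    return deviation > threshold_pct, deviation

# Recent latencies (ms) against a 1000 ms baseline: mean 1500 -> 50% deviation
drifted, dev = check_drift([1450, 1500, 1550], baseline=1000)
```

Production drift detection typically uses distribution tests rather than a simple mean, but the framework-agnostic input — span-derived metric samples — is the same.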
Fingerprints enable change correlation. The SDK generates behavioral fingerprints from prompt templates, model configurations, and workflow structures. When a fingerprint changes — a prompt was modified, a model parameter was adjusted, a new node was added to a graph — the platform records the change and correlates it with any subsequent KPI movements. This closes the loop between “what changed” and “what happened as a result.”
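One way to implement such a fingerprint is a stable hash over the canonicalized prompt template and model configuration, so that any edit produces a new value. A sketch under that assumption:

```python
import hashlib
import json

def behavior_fingerprint(prompt_template: str, model_config: dict) -> str:
    """Stable short hash; changes whenever the prompt or config changes."""
    payload = json.dumps(
        {"prompt": prompt_template, "config": model_config},
        sort_keys=True,  # canonical ordering -> deterministic hash
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

before = behavior_fingerprint("Answer briefly: {q}",
                              {"model": "gpt-4o", "temperature": 0.2})
after = behavior_fingerprint("Answer briefly: {q}",
                             {"model": "gpt-4o", "temperature": 0.7})
```

Because the hash is deterministic, identical configurations always fingerprint identically, while even a single-parameter change (here, temperature) yields a distinct value the platform can timestamp and correlate with KPI movements.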
Events populate governance review queues. When an agent execution triggers a governance policy — exceeding a cost threshold, producing a low-quality output, invoking a restricted tool — the event is captured with full context and routed to the appropriate review queue. Reviewers see the complete trace, the policy that triggered, and the specific span that violated the constraint. Governance is not a separate system consuming separate data; it operates on the same telemetry that powers observability.
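A governance policy check of this kind can be sketched as a simple predicate over span attributes — the policy shape and field names below are invented for illustration:

```python
def evaluate_policies(span: dict, policies: list[dict]) -> list[dict]:
    """Return the policies a span violates (illustrative threshold policies)."""
    return [p for p in policies if span.get(p["metric"], 0) > p["limit"]]

policies = [
    {"name": "cost-cap", "metric": "cost_usd", "limit": 0.01},
    {"name": "latency-cap", "metric": "duration_ms", "limit": 5000},
]

# This span exceeds the cost cap but not the latency cap:
violations = evaluate_policies({"cost_usd": 0.03, "duration_ms": 1500}, policies)
```

Each violation would then be routed to a review queue together with the full trace, so reviewers see both the triggering policy and the offending span in context.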
Getting Started
Instrumenting your AI agents with the right SDK takes minutes, not days. The setup follows the same pattern regardless of your framework and provider combination: install the core SDK and the plugins you need, initialize with your project credentials, and run your agent as usual. Below are complete examples for Python and TypeScript.
Python: Install and Initialize
Install the core SDK alongside the framework and provider plugins you need:
pip install turingpulse-sdk turingpulse-sdk-langgraph turingpulse-sdk-openai

Then initialize the SDK at application startup:
from turingpulse_sdk import init
init(api_key="sk_...", workflow_name="my-agent")
# That's it — LangGraph and OpenAI calls are now auto-instrumented

TypeScript: Install and Initialize
The TypeScript SDK follows the same plugin model:
npm install @turingpulse/sdk @turingpulse/sdk-langgraph @turingpulse/sdk-openai

Initialize at application startup:
import { init } from '@turingpulse/sdk';
init({ apiKey: 'sk_...', workflowName: 'my-agent' });
// LangGraph and OpenAI calls are now auto-instrumented

Framework Integration Examples
Beyond basic initialization, the SDK supports multiple integration styles depending on how much control you need. Here are real-world examples for the most common patterns.
LangGraph
For LangGraph-based agents, the framework plugin wraps your compiled graph and automatically converts every node execution into a span with timing, inputs, outputs, and model metadata:
from turingpulse_sdk_langgraph import instrument_langgraph
# Wrap your compiled graph — all node executions become spans
app = instrument_langgraph(graph, name="Research Pipeline", model="gpt-4o")
# Run as usual — telemetry flows automatically
result = await app.ainvoke({"query": "Analyze Q4 earnings..."})

Custom Agent with Decorators
For agents that don't use a supported framework, or when you want fine-grained control over KPI tracking and governance policies, the @instrument decorator provides a declarative approach:
from turingpulse_sdk import instrument, KPIConfig, GovernanceDirective
@instrument(
    name="Customer Support Agent",
    kpis=[
        KPIConfig(kpi_id="response_quality", from_result_path="quality_score"),
        KPIConfig(kpi_id="latency_ms", use_duration=True, alert_threshold=5000),
    ],
    governance=GovernanceDirective(hitl=True, reviewers=["support-leads@co.com"]),
)
async def handle_ticket(ticket: dict) -> dict:
    # Your agent logic here — all tracing, KPIs, governance handled by the decorator
    ...

REST API (Direct Integration)
For teams working in languages without a native SDK, or for custom pipeline integrations, the REST API provides full control over telemetry submission:
curl -X POST https://api.turingpulse.ai/api/v1/sdk/events \
  -H "X-API-Key: sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "events": [{
      "run_id": "abc-123",
      "agent_id": "my-agent",
      "type": "span",
      "workflow_name": "Custom Agent",
      "payload": {
        "name": "llm_inference",
        "duration_ms": 1500,
        "status": "success",
        "tokens": { "prompt": 1000, "completion": 200 },
        "cost_usd": 0.003
      }
    }]
  }'

Choosing the Right Integration Method
| Method | Setup Time | Coverage | Flexibility |
|---|---|---|---|
| Framework Plugin | Minutes | Automatic, comprehensive | Framework-specific |
| @instrument Decorator | Minutes | Per-function, customizable | Any Python/TS code |
| REST API | Hours | Manual, full control | Any language |
This is exactly how TuringPulse's SDK works — a lightweight core with 19 Python and 12 TypeScript plugins covering every major AI framework and LLM provider. Each plugin is independently versioned and released, so updating your LangGraph instrumentation never risks breaking your OpenAI instrumentation. Whether you choose zero-code framework plugins, declarative decorators, or direct REST API calls, every integration method produces the same standardized telemetry that powers the full observability and governance platform.