Cohere Integration
Full observability for the Cohere API. Capture chat, embed, and rerank calls with tool use tracking and automatic instrumentation.
Cohere SDK >= 5.0 · Command R+ · Embed v3 · Rerank
Installation
Terminal
pip install turingpulse_sdk turingpulse_sdk_cohere cohere
Quick Start
1. Initialize & Instrument
setup.py
from turingpulse_sdk import init, TuringPulseConfig
from turingpulse_sdk_cohere import patch_cohere
# Initialize TuringPulse
init(TuringPulseConfig(
    api_key="sk_live_your_api_key",
    workflow_name="my-project",
))
# Enable auto-instrumentation for Cohere
patch_cohere()
2. Use Cohere Normally
main.py
import cohere
client = cohere.ClientV2(api_key="your-cohere-key")
# Chat with Command R+ - traces are captured automatically
response = client.chat(
    model="command-r-plus",
    messages=[
        {"role": "user", "content": "Explain the theory of relativity"},
    ],
)
print(response.message.content[0].text)
Zero Code Changes
Once auto-instrumentation is enabled, all Cohere API calls, including chat, embed, and rerank, are traced automatically.
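Streaming chat is consumed normally as well (we assume streamed calls are instrumented like regular ones). A small helper, ours rather than part of either SDK, that accumulates text from Cohere V2 `content-delta` stream events:

```python
def collect_stream_text(events):
    # Accumulate text deltas from a Cohere V2 chat stream into one string.
    chunks = []
    for event in events:
        if event.type == "content-delta":
            chunks.append(event.delta.message.content.text)
    return "".join(chunks)

# Hypothetical usage once patch_cohere() is active:
# stream = client.chat_stream(
#     model="command-r-plus",
#     messages=[{"role": "user", "content": "Hello"}],
# )
# print(collect_stream_text(stream))
```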
What Gets Captured
| Data Point | Description | Example |
|---|---|---|
| Chat Calls | Model, messages, and completion with metadata | command-r-plus, tokens: 420 |
| Embed Calls | Embedding model, input texts, and dimensions | embed-v3.0, 10 texts, 1024 dims |
| Rerank Calls | Query, documents, and relevance scores | rerank-v3.0, 25 docs, top=5 |
| Tool Use | Tool calls with arguments and results | search_db(query='revenue Q4') |
| Token Usage | Input and output token counts per call | prompt: 280, completion: 140 |
| Latency | End-to-end and per-call timing | total: 1800ms, chat: 1500ms |
| Errors | API errors with status codes and context | TooManyRequestsError: rate limited |
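Errors recorded on the trace still propagate to your code. For transient failures like the rate-limit example above, a small retry wrapper keeps the failed attempts visible in TuringPulse while your application recovers. This is a generic sketch; the helper name is ours, and in practice you would catch the Cohere SDK's specific rate-limit error rather than a bare `Exception`:

```python
import time

def call_with_retry(fn, retries=3, backoff=1.0):
    # Retry a callable with exponential backoff; errors raised on the
    # final attempt propagate, so instrumentation still records them.
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))

# Hypothetical usage:
# response = call_with_retry(
#     lambda: client.chat(model="command-r-plus", messages=messages)
# )
```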
Advanced Configuration
config.py
from turingpulse_sdk import instrument, KPIConfig
from turingpulse_sdk_cohere import patch_cohere
import cohere
patch_cohere(name="cohere-service")
client = cohere.ClientV2(api_key="your-cohere-key")
@instrument(
    name="cohere-agent",
    kpis=[
        KPIConfig(kpi_id="latency_ms", use_duration=True, alert_threshold=5000),
        KPIConfig(kpi_id="tokens", alert_threshold=4000, comparator="gt"),
    ],
)
def my_agent(query: str):
    return client.chat(
        model="command-r-plus",
        messages=[{"role": "user", "content": query}],
    )
Tool Use
tools.py
import cohere
client = cohere.ClientV2(api_key="your-cohere-key")
tools = [
    {
        "type": "function",
        "function": {
            "name": "query_database",
            "description": "Query the sales database",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "SQL query to execute"},
                },
                "required": ["query"],
            },
        },
    },
]
response = client.chat(
    model="command-r-plus",
    messages=[{"role": "user", "content": "What were our Q4 sales?"}],
    tools=tools,
)
# Tool calls are automatically captured in the trace
Embeddings & Rerank
embed-rerank.py
import cohere
client = cohere.ClientV2(api_key="your-cohere-key")
# Embeddings - batch size and dimensions are tracked
embed_response = client.embed(
    texts=["Hello world", "Machine learning is fascinating"],
    model="embed-english-v3.0",
    input_type="search_document",
    embedding_types=["float"],
)
# Rerank - query, documents, and scores are tracked
rerank_response = client.rerank(
    query="What is machine learning?",
    documents=["ML is a subset of AI...", "Deep learning uses neural networks..."],
    model="rerank-english-v3.0",
    top_n=3,
)
# Both embed and rerank calls are captured with full metadata
RAG Pipeline Tracking
Combine Cohere embed, rerank, and chat tracing to get full visibility into your RAG pipeline performance.
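A minimal sketch of such a pipeline, built from the calls shown above. The `rag_answer` helper and its prompt format are ours, not part of either SDK, and a production pipeline would first select candidate documents via an embed-backed vector search; once `patch_cohere()` is active, each stage is traced individually, giving per-stage latency and token usage:

```python
def rag_answer(client, query, documents, top_n=3, model="command-r-plus"):
    # Stage 1: rerank candidate documents by relevance to the query.
    reranked = client.rerank(
        query=query,
        documents=documents,
        model="rerank-english-v3.0",
        top_n=top_n,
    )
    # Each rerank result carries the index of the original document.
    context = "\n".join(documents[r.index] for r in reranked.results)
    # Stage 2: answer the question grounded in the top-ranked context.
    response = client.chat(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}",
        }],
    )
    return response.message.content[0].text
```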