TryVox

Stream

Stream live call audio to a WebSocket endpoint.

The Stream verb sends live call audio to a WebSocket endpoint in real time. It is the primary verb for integrating AI agents and other real-time audio processing.

Example

{
  "voxml_version": "1.0",
  "instructions": [
    {
      "verb": "Stream",
      "url": "wss://ai-agent.example.com/stream",
      "track": "both_tracks"
    }
  ]
}

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | yes | | WebSocket URL (must start with wss://) |
| name | string | no | | Identifier for this stream |
| track | string | no | both_tracks | "inbound_track", "outbound_track", or "both_tracks" |
| status_callback_url | string | no | | URL to POST stream status events to |
| parameters | object | no | | Custom parameters sent in start message |
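
The constraints in the parameter table can be checked before a document is returned to TryVox. The following is a minimal sketch, not part of any TryVox SDK: the helper names build_stream_instruction and build_document are hypothetical, and the validation covers only what the table states (the wss:// prefix and the three track values).

```python
import json
from typing import Any, Dict, List, Optional

# Track values taken from the parameter table above.
VALID_TRACKS = {"inbound_track", "outbound_track", "both_tracks"}


def build_stream_instruction(
    url: str,
    name: Optional[str] = None,
    track: str = "both_tracks",
    status_callback_url: Optional[str] = None,
    parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build one Stream instruction (hypothetical helper, not a TryVox API)."""
    if not url.startswith("wss://"):
        raise ValueError("url must start with wss://")
    if track not in VALID_TRACKS:
        raise ValueError(f"track must be one of {sorted(VALID_TRACKS)}")

    instruction: Dict[str, Any] = {"verb": "Stream", "url": url, "track": track}
    # Optional fields are emitted only when the caller supplies them.
    if name is not None:
        instruction["name"] = name
    if status_callback_url is not None:
        instruction["status_callback_url"] = status_callback_url
    if parameters is not None:
        instruction["parameters"] = parameters
    return instruction


def build_document(instructions: List[Dict[str, Any]]) -> str:
    """Wrap instructions in the document envelope used in the example above."""
    return json.dumps({"voxml_version": "1.0", "instructions": instructions}, indent=2)
```

Calling build_document([build_stream_instruction("wss://ai-agent.example.com/stream")]) produces a document with the same fields as the example at the top of this page.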

Audio Tracks

Both Tracks (Default)

Stream both caller and callee audio:

{
  "verb": "Stream",
  "url": "wss://ai.example.com/stream",
  "track": "both_tracks"
}

Inbound Track Only

Stream only the caller's audio:

{
  "verb": "Stream",
  "url": "wss://ai.example.com/stream",
  "track": "inbound_track"
}

Outbound Track Only

Stream only the callee's audio (what the caller hears):

{
  "verb": "Stream",
  "url": "wss://ai.example.com/stream",
  "track": "outbound_track"
}

Custom Parameters

Pass metadata to your WebSocket endpoint:

{
  "verb": "Stream",
  "url": "wss://ai.example.com/stream",
  "parameters": {
    "agent_id": "agent_123",
    "language": "en-US",
    "context": "customer_support"
  }
}

WebSocket Protocol

Connection

TryVox connects to your WebSocket URL and sends a start message:

{
  "event": "start",
  "stream_id": "stream_abc123",
  "call_uuid": "call_xyz789",
  "account_id": "acc_123",
  "from": "+919876543210",
  "to": "+911234567890",
  "custom_parameters": {
    "agent_id": "agent_123"
  }
}

Media Messages

Audio is delivered in media messages with the structure below. The payload field holds the base64-encoded audio:

{
  "event": "media",
  "stream_id": "stream_abc123",
  "track": "inbound",
  "payload": "<base64-encoded-audio>",
  "timestamp": 1234567890
}

Audio format:

  • Codec: μ-law (PCMU)
  • Sample rate: 8 kHz
  • Encoding: Base64
  • Frame size: 20 ms
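
Most speech and audio libraries expect linear PCM, not μ-law, so a receiver usually decodes each frame first. At 8 kHz with 20 ms frames, a frame holds 8000 × 0.020 = 160 samples, which is 160 bytes of μ-law (one byte per sample) and 320 bytes of 16-bit PCM. The sketch below assumes each media message parses as the JSON shown above. The function names are hypothetical, and the decoder is the standard G.711 μ-law expansion written out by hand so it does not depend on the audioop module, which was removed in Python 3.13.

```python
import base64
import json
import struct
from typing import Tuple, Union

SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples per frame


def ulaw_to_pcm16(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF            # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_media(raw: Union[str, bytes]) -> Tuple[str, bytes]:
    """Return (track, little-endian PCM16 bytes) for one media message.

    Assumes the message is JSON with "event", "track" and "payload" fields,
    as in the sample above. json.loads accepts both str and bytes.
    """
    message = json.loads(raw)
    if message.get("event") != "media":
        raise ValueError("not a media message")
    ulaw = base64.b64decode(message["payload"])
    samples = [ulaw_to_pcm16(b) for b in ulaw]
    pcm = struct.pack("<%dh" % len(samples), *samples)
    return message["track"], pcm
```

A full 20 ms frame passed through decode_media yields 320 bytes of PCM that can be handed to an STT engine or written to a WAV file with the wave module.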

Stop Message

When streaming ends, TryVox sends:

{
  "event": "stop",
  "stream_id": "stream_abc123",
  "call_uuid": "call_xyz789"
}
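
The three message types map naturally onto a small dispatcher that allocates per-stream state on start and releases it on stop. This is a sketch of one way to organise a receiver, assuming every message parses as JSON with the event and stream_id fields shown above. The StreamRegistry class is hypothetical, and the WebSocket server itself is left out: handle() would be called once per incoming message by whichever server library you use.

```python
import json
from typing import Any, Dict, Optional, Union


class StreamRegistry:
    """Tracks active streams by stream_id (hypothetical helper)."""

    def __init__(self) -> None:
        self.active: Dict[str, Dict[str, Any]] = {}

    def handle(self, raw: Union[str, bytes]) -> Optional[str]:
        """Process one message and return its event name."""
        message = json.loads(raw)
        event = message.get("event")
        stream_id = message.get("stream_id")

        if event == "start":
            # Allocate per-stream state; custom_parameters carries the
            # metadata passed via the Stream verb's "parameters" field.
            self.active[stream_id] = {
                "call_uuid": message.get("call_uuid"),
                "custom_parameters": message.get("custom_parameters", {}),
                "frames": 0,
            }
        elif event == "media":
            session = self.active.get(stream_id)
            if session is not None:
                session["frames"] += 1  # real code would process the payload here
        elif event == "stop":
            # Release state; pop() tolerates a stop for an unknown stream.
            self.active.pop(stream_id, None)
        return event
```

Keying state by stream_id rather than by connection keeps the cleanup path the same whether the stream ends with a stop message or the socket simply closes, provided the close handler also removes the entry.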

Status Callbacks

Monitor stream lifecycle events:

{
  "verb": "Stream",
  "url": "wss://ai.example.com/stream",
  "status_callback_url": "https://example.com/stream-status"
}

TryVox POSTs to status_callback_url for these events:

  • stream-started - Stream connected successfully
  • stream-stopped - Stream ended normally
  • stream-failed - Stream connection failed

Example callback payload:

{
  "call_uuid": "call_xyz789",
  "stream_id": "stream_abc123",
  "event": "stream-started",
  "timestamp": 1234567890
}
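
A callback receiver only needs to parse the payload and branch on the event name. The sketch below uses Python's standard http.server module and rests on assumptions this page does not confirm: that the POST body is JSON shaped like the example payload, and that a 200 response is an acceptable acknowledgement. The names summarize_status and StatusHandler are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Tuple, Union

# Event names from the list above, mapped to a log severity.
SEVERITY = {
    "stream-started": "info",
    "stream-stopped": "info",
    "stream-failed": "error",
}


def summarize_status(body: Union[str, bytes]) -> Tuple[str, str]:
    """Return (severity, one-line summary) for a status callback body."""
    payload = json.loads(body)
    event = payload.get("event")
    severity = SEVERITY.get(event, "unknown")
    summary = "%s stream=%s call=%s" % (
        event, payload.get("stream_id"), payload.get("call_uuid"))
    return severity, summary


class StatusHandler(BaseHTTPRequestHandler):
    """Minimal POST handler; assumes a JSON body with a Content-Length header."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        severity, summary = summarize_status(self.rfile.read(length))
        print("[%s] %s" % (severity, summary))
        self.send_response(200)  # assumed acknowledgement
        self.end_headers()
```

To try it locally, serve the handler with http.server.HTTPServer(("", 8080), StatusHandler).serve_forever() and point status_callback_url at a publicly reachable address that forwards to that port.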

AI Agent Integration

Stream is designed for AI voice agents. Here's a typical flow:

{
  "voxml_version": "1.0",
  "instructions": [
    {
      "verb": "Say",
      "text": "Connecting you to our AI assistant."
    },
    {
      "verb": "Stream",
      "url": "wss://ai-agent.example.com/voice",
      "track": "both_tracks",
      "parameters": {
        "user_id": "user_123",
        "intent": "support"
      }
    }
  ]
}

Your AI agent receives audio, processes it, and can:

  • Send audio back through the WebSocket
  • Use real-time STT/TTS
  • Maintain conversation context
  • Trigger actions based on speech

Best Practices

  • Always use secure WebSocket (wss://) URLs
  • Implement reconnection logic in your WebSocket server
  • Handle start and stop events to manage resources
  • Use parameters to pass call context to your AI agent
  • Monitor status_callback_url for debugging
  • Process audio frames as they arrive, without buffering, to avoid adding latency
  • Use both_tracks for full conversation context in AI agents

Common Use Cases

  • AI voice agents and conversational AI
  • Real-time speech analytics
  • Call quality monitoring
  • Live transcription
  • Sentiment analysis
  • Voice biometrics
  • Custom audio processing pipelines
