System Architecture
The Semantic Router implements a Mixture-of-Models (MoM) architecture built on Envoy Proxy, with an External Processor (ExtProc) service that makes the routing decisions. Envoy handles traffic management while ExtProc decides where each request should go, a separation intended to keep production LLM deployments performant, scalable, and maintainable.
High-Level Architecture Overview
Core Components
1. Envoy Proxy - Traffic Management Layer
Role: Acts as the entry point and traffic director for all LLM requests.
Key Responsibilities:
- Load Balancing: Distributes requests across backend model endpoints
- Health Checking: Monitors backend model availability and health
- Request/Response Processing: Handles HTTP protocol management
- Header Management: Manages routing headers set by the ExtProc service
- Timeout Management: Configures appropriate timeouts for different model types
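The interplay between header management, health checking, and timeouts can be sketched in a few lines of Python. This is only an illustration of the idea, not Envoy's implementation: the header name `x-selected-model`, the cluster names, and the timeout values are all hypothetical, and in a real deployment this mapping lives in Envoy's route and cluster configuration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical header name; the header actually set by ExtProc may differ.
ROUTING_HEADER = "x-selected-model"


@dataclass(frozen=True)
class Backend:
    cluster: str            # upstream cluster name (illustrative)
    timeout_seconds: float  # per-model request timeout (illustrative)
    healthy: bool = True    # outcome of the most recent health check


def select_backend(
    headers: Mapping[str, str],
    backends: Mapping[str, Backend],
    default_model: str,
) -> Optional[Backend]:
    """Pick a backend from the routing header, falling back to the default model."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    model = normalized.get(ROUTING_HEADER, default_model)
    backend = backends.get(model) or backends.get(default_model)
    # Refuse to route to a backend that failed its health check.
    if backend is None or not backend.healthy:
        return None
    return backend


# Illustrative backends: a slower reasoning model gets a longer timeout.
BACKENDS = {
    "math-model": Backend("math_cluster", timeout_seconds=300.0),
    "general-model": Backend("general_cluster", timeout_seconds=60.0),
}
```

Calling `select_backend({"X-Selected-Model": "math-model"}, BACKENDS, "general-model")` returns the `math_cluster` backend with its longer timeout, while a request without the header falls back to `general_cluster`.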
Configuration Highlights:
# Envoy listener configuration
listeners:
- name: listener_0
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 8801  # Main entry point

http_filters:
- name: envoy.filters.http.ext_proc
  typed_config:
    grpc_service:
      envoy_grpc:
        cluster_name: extproc_service
    processing_mode:
      request_header_mode: "SEND"     # Send headers for routing decisions
      response_header_mode: "SEND"    # Process response headers
      request_body_mode: "BUFFERED"   # Analyze request content
      response_body_mode: "BUFFERED"  # Process response content
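Each `processing_mode` field takes a value from a small enum, so a typo such as `BUFERED` is easy to catch before deployment. The sketch below is a hypothetical pre-flight check, not part of the Semantic Router or Envoy. The allowed values reflect Envoy's ext_proc `ProcessingMode` as commonly documented, and should be treated as an assumption to verify against the Envoy version in use, since newer releases may accept additional values.

```python
from typing import Dict, List

# Assumed enum values for Envoy's ext_proc ProcessingMode; verify against
# the Envoy release you deploy, since newer versions may add more.
HEADER_MODES = {"DEFAULT", "SEND", "SKIP"}
BODY_MODES = {"NONE", "STREAMED", "BUFFERED", "BUFFERED_PARTIAL"}


def validate_processing_mode(mode: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the mode looks valid."""
    problems: List[str] = []
    for key, value in mode.items():
        # Header and trailer fields share the same set of send modes.
        if key.endswith(("_header_mode", "_trailer_mode")):
            allowed = HEADER_MODES
        elif key.endswith("_body_mode"):
            allowed = BODY_MODES
        else:
            problems.append(f"unknown field: {key}")
            continue
        if value not in allowed:
            problems.append(f"{key}: {value!r} not in {sorted(allowed)}")
    return problems


# The processing mode from the configuration above.
processing_mode = {
    "request_header_mode": "SEND",
    "response_header_mode": "SEND",
    "request_body_mode": "BUFFERED",
    "response_body_mode": "BUFFERED",
}

if __name__ == "__main__":
    for problem in validate_processing_mode(processing_mode):
        print(problem)
```

With `BUFFERED` body modes, Envoy collects the whole body before handing it to ExtProc. That suits content analysis of the request, at the cost of holding the full payload in memory before forwarding.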