The autogen_ext package provides model clients, integrations, and extensions for AutoGen.
Model Clients
OpenAIChatCompletionClient
Client for OpenAI chat completion models.

from autogen_ext.models.openai import OpenAIChatCompletionClient
client = OpenAIChatCompletionClient(
    model="gpt-4o",
    api_key="sk-...",
    temperature=0.7,
    max_tokens=1000,
)
# Use with agents
from autogen_agentchat.agents import AssistantAgent
agent = AssistantAgent(
    name="assistant",
    model_client=client,
    description="GPT-4 powered assistant",
)
model: Model identifier (e.g., "gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo")
api_key: OpenAI API key (defaults to the OPENAI_API_KEY environment variable)
temperature: Sampling temperature (0.0 to 2.0)
max_tokens: Maximum number of tokens in the response
top_p: Nucleus sampling parameter
timeout: Request timeout in seconds
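The fallback and range described in the parameter list can be checked before a client is constructed. The following is a minimal sketch using only the standard library, not part of autogen_ext: `resolve_api_key` and `check_temperature` are hypothetical helpers that mirror the documented fallback to OPENAI_API_KEY and the 0.0 to 2.0 temperature range.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the explicit key if given, else fall back to OPENAI_API_KEY."""
    key = explicit or env.get("OPENAI_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set OPENAI_API_KEY")
    return key


def check_temperature(value: float) -> float:
    """Reject temperatures outside the documented 0.0 to 2.0 range."""
    if not 0.0 <= value <= 2.0:
        raise ValueError(f"temperature must be in [0.0, 2.0], got {value}")
    return value


# Hypothetical usage with an in-memory environment mapping.
fake_env = {"OPENAI_API_KEY": "sk-from-env"}
key_from_env = resolve_api_key(env=fake_env)
key_explicit = resolve_api_key("sk-explicit", env=fake_env)
```

Validating early like this surfaces configuration mistakes before any request is sent.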
Methods
create
Generate a chat completion.

from autogen_core.models import UserMessage
result = await client.create(
    messages=[UserMessage(content="Hello!", source="user")],
    # Per-call sampling options are passed through extra_create_args
    extra_create_args={"temperature": 0.8},
)
print(result.content)

Returns: CreateResult
create_stream
Stream chat completion chunks.

async for chunk in client.create_stream(messages=messages):
    # The stream yields text chunks (str) followed by a final CreateResult
    if isinstance(chunk, str):
        print(chunk, end="", flush=True)
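A stream consumer usually needs both the incremental text and the final result. The following is a minimal sketch of that pattern, assuming the stream yields `str` chunks followed by one final result object; `fake_stream` and `FinalResult` are hypothetical stand-ins so the example runs without a model client or network access.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Union


@dataclass
class FinalResult:
    """Hypothetical stand-in for the final result object of a stream."""
    content: str


async def fake_stream() -> AsyncGenerator[Union[str, FinalResult], None]:
    """Yield text chunks, then one final result (stand-in for a model client)."""
    parts = ["Hel", "lo", "!"]
    for part in parts:
        yield part
    yield FinalResult(content="".join(parts))


async def consume() -> tuple:
    chunks: List[str] = []
    final = None
    async for item in fake_stream():
        if isinstance(item, str):
            chunks.append(item)  # incremental text, e.g. print it as it arrives
        else:
            final = item  # the last item carries the complete response
    return chunks, final


chunks, final = asyncio.run(consume())
```

Branching on the item type keeps the loop correct whether or not the final object is of interest to the caller.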
AzureOpenAIChatCompletionClient
Client for Azure OpenAI Service.

from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
client = AzureOpenAIChatCompletionClient(
    model="gpt-4",
    azure_endpoint="https://your-resource.openai.azure.com",
    api_key="your-api-key",
    api_version="2024-02-15-preview",
    azure_deployment="gpt-4-deployment",
)
azure_endpoint: Azure OpenAI resource endpoint
api_key: Azure OpenAI API key (or use Azure AD authentication instead)
azure_ad_token: Azure Active Directory token
azure_ad_token_provider: Function that provides Azure AD tokens
AnthropicChatCompletionClient
Client for Anthropic Claude models.

from autogen_ext.models.anthropic import AnthropicChatCompletionClient
client = AnthropicChatCompletionClient(
    model="claude-3-5-sonnet-20241022",
    api_key="sk-ant-...",
    max_tokens=4096,
)
model: Model name (e.g., "claude-3-5-sonnet-20241022", "claude-3-opus-20240229")
max_tokens: Maximum number of tokens in the response (required for Anthropic)
OllamaChatCompletionClient
Client for local Ollama models.

from autogen_ext.models.ollama import OllamaChatCompletionClient
client = OllamaChatCompletionClient(
    model="llama3.1:8b",
    base_url="http://localhost:11434",
)
SemanticKernelChatCompletionClient
Client using Semantic Kernel integration.

from autogen_ext.models.semantic_kernel import SemanticKernelChatCompletionClient
from semantic_kernel import Kernel
kernel = Kernel()
# Configure kernel...
client = SemanticKernelChatCompletionClient(
    kernel=kernel,
    service_id="chat-gpt",
)

service_id: Service identifier in the kernel
LlamaCppChatCompletionClient
Client for llama.cpp models.

from autogen_ext.models.llama_cpp import LlamaCppChatCompletionClient
client = LlamaCppChatCompletionClient(
    model_path="/path/to/model.gguf",
    n_ctx=4096,
    n_gpu_layers=35,
)

n_gpu_layers: Number of layers to offload to the GPU
ReplayChatCompletionClient
Client that replays recorded responses for testing.

from autogen_ext.models.replay import ReplayChatCompletionClient
client = ReplayChatCompletionClient(
    responses=[
        "First response",
        "Second response",
        "Third response",
    ]
)
# Useful for testing and deterministic behavior

responses: Pre-recorded responses to replay in order
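The replay-in-order behavior is easy to picture with a small stand-in. This is a sketch under stated assumptions, not the library's implementation: `ReplaySketch` is a hypothetical class, and raising an error once the responses are exhausted is an assumed behavior.

```python
import asyncio
from typing import List


class ReplaySketch:
    """Hypothetical stand-in that returns canned responses in order."""

    def __init__(self, responses: List[str]) -> None:
        self._responses = list(responses)
        self._index = 0

    async def create(self, messages: List[str]) -> str:
        # The input is ignored: the output depends only on call order.
        if self._index >= len(self._responses):
            raise ValueError("No more recorded responses")  # assumed behavior
        reply = self._responses[self._index]
        self._index += 1
        return reply


async def run_replay() -> List[str]:
    client = ReplaySketch(["First response", "Second response"])
    return [await client.create(["hi"]), await client.create(["again"])]


replies = asyncio.run(run_replay())
```

Because the replies never depend on the prompt, a test built on such a client can assert on exact agent output.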
Model Configuration
OpenAIClientConfiguration
Configuration dataclass for OpenAI clients.

from autogen_ext.models.openai import OpenAIClientConfiguration
config = OpenAIClientConfiguration(
    model="gpt-4o",
    api_key="sk-...",
    temperature=0.7,
    max_tokens=2000,
)
# Use with component system
client = OpenAIChatCompletionClient.from_config(config)
AzureOpenAIClientConfiguration
Configuration for Azure OpenAI clients.

from autogen_ext.models.openai import AzureOpenAIClientConfiguration
config = AzureOpenAIClientConfiguration(
    model="gpt-4",
    azure_endpoint="https://your-resource.openai.azure.com",
    api_version="2024-02-15-preview",
    azure_deployment="gpt-4-deployment",
)
Caching
CachedChatCompletionClient
Wrapper that caches model responses.

from autogen_ext.models.cache import CachedChatCompletionClient
from autogen_ext.models.openai import OpenAIChatCompletionClient
base_client = OpenAIChatCompletionClient(model="gpt-4o")
cached_client = CachedChatCompletionClient(
    client=base_client,
    cache_dir=".autogen_cache",
)
# First call hits the API
result1 = await cached_client.create(messages=messages)
# Second identical call uses cache
result2 = await cached_client.create(messages=messages)

client (ChatCompletionClient, required): Underlying model client to cache
cache_dir: Directory for cache storage (default: ".autogen_cache")
Seed for cache key generation
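A response cache of this kind is typically keyed on a hash of the request. The sketch below shows one way a seed and the messages could be combined into a stable key; `make_cache_key` and `cached_create` are hypothetical, and the hashing scheme is an illustration rather than the one the library uses.

```python
import hashlib
import json
from typing import Any, Dict, List


def make_cache_key(messages: List[Dict[str, Any]], seed: int = 0) -> str:
    """Hash the seed and messages into a stable hex key (illustrative scheme)."""
    payload = json.dumps({"seed": seed, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


cache: Dict[str, str] = {}
calls = 0


def cached_create(messages: List[Dict[str, Any]]) -> str:
    """Return a cached reply when the same messages were seen before."""
    global calls
    key = make_cache_key(messages)
    if key not in cache:
        calls += 1  # stands in for a real API call
        cache[key] = f"reply #{calls}"
    return cache[key]


msgs = [{"source": "user", "content": "Hello!"}]
first = cached_create(msgs)
second = cached_create(msgs)  # served from the cache, no second "API call"
```

Changing the seed changes every key, which gives a simple way to invalidate earlier entries without deleting them.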
Code Execution
DockerCommandLineCodeExecutor
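Execute code in Docker containers.

from autogen_core import CancellationToken
from autogen_core.code_executor import CodeBlock
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor
executor = DockerCommandLineCodeExecutor(
    image="python:3.11",
    timeout=60,
    work_dir="/workspace",
)
await executor.start()  # start the container before executing code
result = await executor.execute_code_blocks(
    code_blocks=[CodeBlock(language="python", code="print('Hello from Docker!')")],
    cancellation_token=CancellationToken(),
)
await executor.stop()  # stop the container when done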

image: Docker image to use (default: "python:3.11-slim")
timeout: Execution timeout in seconds
work_dir: Working directory in the container
LocalCommandLineCodeExecutor
Execute code locally (use with caution).

from autogen_core import CancellationToken
from autogen_core.code_executor import CodeBlock
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
executor = LocalCommandLineCodeExecutor(
    timeout=30,
    work_dir="./code_execution",
)
# WARNING: Executes code on your local machine
result = await executor.execute_code_blocks(
    code_blocks=[CodeBlock(language="python", code="print('Hello')")],
    cancellation_token=CancellationToken(),
)

timeout: Execution timeout in seconds
work_dir: Working directory for execution
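Local execution with a timeout and a working directory can be approximated with the standard library alone. The sketch below is not how the executor is implemented; `run_python_block` is a hypothetical helper that writes a snippet into the working directory and runs it in a subprocess.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python_block(code: str, work_dir: Path, timeout: float = 30.0):
    """Write the code to a file in work_dir and run it with a timeout."""
    work_dir.mkdir(parents=True, exist_ok=True)
    script = work_dir / "snippet.py"
    script.write_text(code, encoding="utf-8")
    completed = subprocess.run(
        [sys.executable, script.name],
        cwd=work_dir,
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired when exceeded
    )
    return completed.returncode, completed.stdout


with tempfile.TemporaryDirectory() as tmp:
    exit_code, output = run_python_block("print('Hello')", Path(tmp) / "code_execution")
```

Nothing here sandboxes the snippet, which is the reason for the caution attached to local execution.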
Memory Extensions
Vector-based semantic memory.

from autogen_ext.memory import VectorMemory
from autogen_ext.models.openai import OpenAIEmbeddingClient
memory = VectorMemory(
    embedding_client=OpenAIEmbeddingClient(model="text-embedding-3-small"),
    collection_name="agent_memory",
    top_k=5,
)
# Use with AssistantAgent
agent = AssistantAgent(
    name="assistant",
    model_client=client,
    memory=[memory],
    description="Assistant with semantic memory",
)
Redis-backed persistent memory.

from autogen_ext.memory import RedisMemory
memory = RedisMemory(
    redis_url="redis://localhost:6379",
    namespace="agent_memory",
)
Runtimes & Deployment
Distributed runtime using gRPC.

from autogen_ext.runtimes.grpc import GrpcAgentRuntime
runtime = GrpcAgentRuntime(
    host="0.0.0.0",
    port=50051,
)
await runtime.start()
Cloud-based runtime for distributed agents.

from autogen_ext.runtimes.cloud import CloudRuntime
runtime = CloudRuntime(
    project_id="my-project",
    region="us-central1",
)
Utilities
Rate limiting for API calls.

from autogen_ext.utils import RateLimiter
limiter = RateLimiter(max_calls=100, time_window=60)
async with limiter:
    result = await client.create(messages=messages)
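To make the `max_calls` and `time_window` semantics concrete, here is a minimal sliding-window limiter built on asyncio. It is a sketch under stated assumptions: `SlidingWindowLimiter` is a hypothetical class, and the sliding-window policy is one plausible reading of the two parameters, not a description of the library's behavior.

```python
import asyncio
import time
from collections import deque
from typing import Deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most max_calls entries per time_window seconds."""

    def __init__(self, max_calls: int, time_window: float) -> None:
        self.max_calls = max_calls
        self.time_window = time_window
        self._stamps: Deque[float] = deque()
        self._lock = asyncio.Lock()

    async def __aenter__(self) -> "SlidingWindowLimiter":
        async with self._lock:
            while True:
                now = time.monotonic()
                # Drop timestamps that have left the window.
                while self._stamps and now - self._stamps[0] >= self.time_window:
                    self._stamps.popleft()
                if len(self._stamps) < self.max_calls:
                    self._stamps.append(now)
                    return self
                # Sleep until the oldest call leaves the window.
                await asyncio.sleep(self.time_window - (now - self._stamps[0]))

    async def __aexit__(self, exc_type, exc, tb) -> None:
        return None


async def demo() -> float:
    limiter = SlidingWindowLimiter(max_calls=2, time_window=0.2)
    start = time.monotonic()
    for _ in range(3):  # the third call must wait for the window to slide
        async with limiter:
            pass
    return time.monotonic() - start


elapsed = asyncio.run(demo())
```

With two calls allowed per 0.2 seconds, the third entry is delayed until the first timestamp ages out of the window.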
Count tokens for cost estimation.

from autogen_ext.utils import TokenCounter
counter = TokenCounter(model="gpt-4")
tokens = counter.count_messages(messages)
print(f"Estimated cost: ${tokens * 0.00003}")
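The cost estimate is a single multiplication of token count by per-token price. The sketch below works through that arithmetic with exact decimals; the four-characters-per-token heuristic is a rough assumption standing in for a real tokenizer, and the price is copied from the example above rather than from any current price list.

```python
from decimal import Decimal
from typing import List

PRICE_PER_TOKEN = Decimal("0.00003")  # from the example above; not a current price
CHARS_PER_TOKEN = 4  # rough heuristic, an assumption; real tokenizers differ


def estimate_tokens(messages: List[str]) -> int:
    """Approximate the token count from character length (ceiling division)."""
    total_chars = sum(len(m) for m in messages)
    return -(-total_chars // CHARS_PER_TOKEN)


def estimate_cost(tokens: int) -> Decimal:
    """Multiply tokens by the per-token price using exact decimal arithmetic."""
    return tokens * PRICE_PER_TOKEN


tokens = estimate_tokens(["Hello, world!", "How are you today?"])
cost = estimate_cost(tokens)
print(f"Estimated cost: ${cost:.5f}")
```

Using `Decimal` avoids the floating-point noise that appears when very small prices are multiplied as floats.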
Configuration Models
All model clients support declarative configuration through the component system:
from autogen_core import ComponentModel
from autogen_ext.models.openai import OpenAIChatCompletionClient
# Create from config
config = ComponentModel(
    component_type="OpenAIChatCompletionClient",
    config={
        "model": "gpt-4o",
        "temperature": 0.7,
    },
)
client = OpenAIChatCompletionClient.from_config(config)
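Declarative configuration is useful because the description can live in a JSON file and be turned back into an object later. The following sketch shows that round trip with the standard library only; `FakeClient`, `REGISTRY`, and `load_from_json` are hypothetical, and the `component_type` and `config` keys simply follow the example above.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class FakeClient:
    """Hypothetical stand-in for a model client built from configuration."""
    model: str
    temperature: float = 1.0


# Hypothetical registry: component type name -> factory taking keyword config.
REGISTRY: Dict[str, Callable[..., Any]] = {"FakeClient": FakeClient}


def load_from_json(text: str) -> Any:
    """Parse a JSON component description and build the named component."""
    spec = json.loads(text)
    factory = REGISTRY[spec["component_type"]]
    return factory(**spec["config"])


spec_text = json.dumps(
    {"component_type": "FakeClient", "config": {"model": "gpt-4o", "temperature": 0.7}}
)
loaded = load_from_json(spec_text)
```

Keeping the type name separate from its settings lets one loader build any registered component from the same file format.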
See Also