Model clients provide the interface between AutoGen agents and large language models. AutoGen supports multiple LLM providers through the autogen-ext package.
Installation
Install the extension for your chosen provider. Each provider (OpenAI, Anthropic, Azure, Ollama, Llama.cpp) has its own extra; for OpenAI:
pip install "autogen-ext[openai]"
OpenAI
The OpenAIChatCompletionClient supports GPT-4, GPT-3.5, o1, and o3 models.
Basic Usage
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.agents import AssistantAgent

# Create OpenAI client
model_client = OpenAIChatCompletionClient(
    model="gpt-4o",
    api_key="sk-...",  # Or set the OPENAI_API_KEY environment variable
)

# Use with an agent
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    system_message="You are a helpful assistant.",
)
Configuration Options
model: The model name (e.g., gpt-4o, gpt-4-turbo, gpt-3.5-turbo)
api_key: OpenAI API key. If not provided, it is read from the OPENAI_API_KEY environment variable
temperature: Sampling temperature between 0 and 2
top_p: Nucleus sampling parameter
max_tokens: Maximum number of tokens to generate
timeout: Request timeout in seconds
base_url: Overrides the default OpenAI API endpoint
Advanced Example
from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="gpt-4o",
    api_key="sk-...",
    temperature=0.7,
    top_p=0.9,
    max_tokens=4096,
    timeout=120.0,
    # Custom OpenAI-compatible endpoint (for Azure OpenAI Service itself,
    # use AzureOpenAIChatCompletionClient, described in the next section)
    base_url="https://custom-endpoint.openai.azure.com/",
)
Azure OpenAI
The AzureOpenAIChatCompletionClient connects to Azure OpenAI Service.
Basic Usage
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient

client = AzureOpenAIChatCompletionClient(
    model="gpt-4o",
    api_version="2024-02-01",
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com",
    api_key="...",  # Or use Azure AD authentication
    azure_deployment="gpt-4o-deployment",  # Your deployment name
)
Configuration Options
azure_endpoint: The Azure OpenAI endpoint URL
api_version: Azure OpenAI API version (e.g., 2024-02-01)
azure_deployment: Your deployment name in Azure
azure_ad_token: Azure Active Directory token for authentication
Azure AD Authentication
from azure.identity import DefaultAzureCredential
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient

# Using Azure AD authentication
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

client = AzureOpenAIChatCompletionClient(
    model="gpt-4o",
    api_version="2024-02-01",
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com",
    azure_ad_token=token.token,
    azure_deployment="gpt-4o-deployment",
)
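A token fetched once with get_token is short-lived, so a long-running process has to obtain a fresh one before the old one expires. The sketch below shows one possible way to cache a token and refresh it shortly before expiry. CachedToken, fetch, skew_seconds, and clock are hypothetical names introduced for illustration; fetch stands in for a call such as credential.get_token(...) and is assumed to return a (token, expires_on) pair with expires_on as a Unix timestamp.

```python
import time
from typing import Callable, Optional, Tuple


class CachedToken:
    """Caches a bearer token and refreshes it shortly before it expires.

    Hypothetical helper: `fetch` must return (token, expires_on_unix_seconds).
    """

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        skew_seconds: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._skew = skew_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_on = 0.0

    def get(self) -> str:
        # Refresh when nothing is cached or the token is inside the skew window.
        if self._token is None or self._clock() >= self._expires_on - self._skew:
            self._token, self._expires_on = self._fetch()
        return self._token
```

The Azure SDK also ships its own helper for this purpose (get_bearer_token_provider in azure.identity). Whether your client version accepts a token provider instead of a static token is worth checking in its API reference.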
Anthropic
The AnthropicChatCompletionClient supports Claude models.
Basic Usage
from autogen_ext.models.anthropic import AnthropicChatCompletionClient

client = AnthropicChatCompletionClient(
    model="claude-3-5-sonnet-20241022",
    api_key="sk-ant-...",  # Or set ANTHROPIC_API_KEY
    max_tokens=4096,
)
Configuration Options
model: Claude model name, for example:
  claude-3-5-sonnet-20241022 - Most capable of the models listed here
  claude-3-opus-20240229 - Previous flagship
  claude-3-sonnet-20240229 - Balanced
  claude-3-haiku-20240307 - Fast and compact
api_key: Anthropic API key. Falls back to the ANTHROPIC_API_KEY environment variable
max_tokens: Maximum number of tokens to generate. Required for Anthropic models
temperature: Sampling temperature between 0 and 1
top_p: Nucleus sampling parameter
top_k: Only sample from the top K options
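The temperature range differs between providers (0 to 2 for OpenAI, 0 to 1 for Anthropic, per the option descriptions on this page), so a value that is valid for one client can be rejected by another. A minimal sketch of a pre-flight check follows. check_sampling and TEMPERATURE_MAX are hypothetical names, and the top_p bounds of 0 to 1 are an assumption based on its meaning as a probability mass.

```python
from typing import Dict, Optional

# Upper bounds taken from the option descriptions on this page.
TEMPERATURE_MAX: Dict[str, float] = {"openai": 2.0, "anthropic": 1.0}


def check_sampling(
    provider: str,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
) -> Dict[str, float]:
    """Return the sampling kwargs that are set; raise ValueError if out of range."""
    kwargs: Dict[str, float] = {}
    if temperature is not None:
        limit = TEMPERATURE_MAX[provider]
        if not 0.0 <= temperature <= limit:
            raise ValueError(f"temperature must be in [0, {limit}] for {provider}")
        kwargs["temperature"] = temperature
    if top_p is not None:
        # Assumed range: top_p is a cumulative probability.
        if not 0.0 <= top_p <= 1.0:
            raise ValueError("top_p must be in [0, 1]")
        kwargs["top_p"] = top_p
    return kwargs
```

The returned dictionary can be unpacked into a client constructor, for example AnthropicChatCompletionClient(model=..., max_tokens=..., **check_sampling("anthropic", temperature=0.7)).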
Extended Thinking (Claude 3.7 Sonnet and later)
Extended thinking is available on Claude 3.7 Sonnet and later models; Claude 3.5 Sonnet does not support it:
from autogen_ext.models.anthropic import AnthropicChatCompletionClient

client = AnthropicChatCompletionClient(
    model="claude-3-7-sonnet-20250219",
    api_key="sk-ant-...",
    max_tokens=16000,  # Must be larger than the thinking budget
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # Tokens reserved for thinking
    },
)
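The thinking budget counts against max_tokens, so the two values have to be chosen together. The sketch below checks that relationship before a client is built. thinking_config and MIN_THINKING_BUDGET are hypothetical names, and the minimum of 1024 tokens is an assumption that should be confirmed against the Anthropic API reference.

```python
from typing import Dict, Union

# Assumed minimum budget; confirm against the Anthropic API reference.
MIN_THINKING_BUDGET = 1024


def thinking_config(max_tokens: int, budget_tokens: int) -> Dict[str, Union[str, int]]:
    """Build a thinking config dict after checking the budget fits under max_tokens."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

With the values from the example above, thinking_config(16000, 10000) leaves 6000 tokens for the visible answer.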
AWS Bedrock
Use Claude models through AWS Bedrock:
from autogen_ext.models.anthropic import AnthropicBedrockChatCompletionClient

client = AnthropicBedrockChatCompletionClient(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",
    max_tokens=4096,
    # AWS credentials from environment or ~/.aws/credentials
    aws_region="us-west-2",
)
aws_region: AWS region (e.g., us-west-2, us-east-1)
aws_session_token: AWS session token for temporary credentials
Ollama
The OllamaChatCompletionClient connects to local Ollama instances.
Basic Usage
from autogen_ext.models.ollama import OllamaChatCompletionClient

client = OllamaChatCompletionClient(
    model="llama3.2",
    host="http://localhost:11434",
)
Configuration Options
model: Ollama model name (e.g., llama3.2, mistral, qwen2.5)
host (string, default: "http://localhost:11434"): Ollama server URL
top_p: Nucleus sampling parameter
num_predict: Maximum number of tokens to generate
Advanced Configuration
from autogen_ext.models.ollama import OllamaChatCompletionClient

client = OllamaChatCompletionClient(
    model="llama3.2",
    host="http://localhost:11434",
    temperature=0.7,
    top_p=0.9,
    top_k=40,
    num_ctx=8192,  # Context window
    num_predict=2048,  # Max generation
    repeat_penalty=1.1,
    seed=42,  # For reproducibility
)
Llama.cpp
Run GGUF models locally with llama.cpp:
Installation
pip install "autogen-ext[llama-cpp]"
Basic Usage
from autogen_ext.models.llama_cpp import LlamaCppChatCompletionClient

client = LlamaCppChatCompletionClient(
    model_path="./models/llama-3.2-3b-instruct-q8_0.gguf",
    n_ctx=8192,  # Context window
    n_gpu_layers=35,  # Offload layers to GPU
)
Configuration Options
model_path: Path to the GGUF model file
n_gpu_layers: Number of layers to offload to GPU
Maximum tokens to generate
Azure AI
Connect to Azure AI model deployments:
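from azure.core.credentials import AzureKeyCredential
from autogen_ext.models.azure import AzureAIChatCompletionClient

client = AzureAIChatCompletionClient(
    endpoint="https://YOUR-ENDPOINT.inference.ai.azure.com",
    # The underlying Azure SDK expects a credential object, not a bare string
    credential=AzureKeyCredential("YOUR-API-KEY"),
    model="gpt-4o",
    # Depending on the autogen-ext version, a model_info argument describing
    # the model's capabilities may also be required
)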
Streaming Responses
All model clients support streaming:
from autogen_core import CancellationToken
from autogen_core.models import UserMessage

async def stream_example(client):
    messages = [UserMessage(content="Tell me a story", source="user")]
    # cancellation_token is a keyword argument
    async for chunk in client.create_stream(messages, cancellation_token=CancellationToken()):
        # The stream yields text chunks as str, followed by a final CreateResult
        if isinstance(chunk, str):
            print(chunk, end="", flush=True)
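When the full text is needed as well as incremental output, the chunks can be accumulated while they arrive. The sketch below assumes the stream yields plain strings followed by one final non-string result object. collect_stream and fake_stream are hypothetical names, and fake_stream only stands in for a real call to client.create_stream(...).

```python
import asyncio
from typing import Any, AsyncIterator, List, Optional, Tuple


async def collect_stream(stream: AsyncIterator[Any]) -> Tuple[str, Optional[Any]]:
    """Join the text chunks of a stream and return them with the final item."""
    parts: List[str] = []
    final: Optional[Any] = None
    async for item in stream:
        if isinstance(item, str):
            parts.append(item)
        else:
            final = item  # Assumed to be the final result object
    return "".join(parts), final


async def fake_stream() -> AsyncIterator[Any]:
    # Stand-in for client.create_stream(...): text chunks, then a result object.
    for piece in ["Once ", "upon ", "a time"]:
        yield piece
    yield {"finish_reason": "stop"}
```

Calling asyncio.run(collect_stream(fake_stream())) returns the joined text together with the trailing result object, which is where usage information would normally be read from.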
Model Capabilities
Query model capabilities:
# model_info is a dictionary of capability flags
model_info = client.model_info
print(f"Vision: {model_info['vision']}")
print(f"Function calling: {model_info['function_calling']}")
print(f"JSON output: {model_info['json_output']}")
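Capability flags are most useful when checked before an agent relies on a feature, for example before attaching tools to a model that cannot call functions. The sketch below assumes the flags are available as a mapping from flag name to boolean. missing_capabilities is a hypothetical helper, not part of AutoGen.

```python
from typing import List, Mapping, Sequence


def missing_capabilities(flags: Mapping[str, object], required: Sequence[str]) -> List[str]:
    """Return the required capability names that are absent or falsy in flags."""
    return [name for name in required if not flags.get(name, False)]


# Example flags for a text-only model with tool support (illustrative values).
example_flags = {"vision": False, "function_calling": True, "json_output": True}
```

An empty result means every required capability is present, so a caller can raise or fall back to another client when the list is non-empty.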
Token Counting
Count tokens before sending requests:
from autogen_core.models import UserMessage

messages = [UserMessage(content="Hello, world!", source="user")]
token_count = client.count_tokens(messages)
print(f"Message uses {token_count} tokens")
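A token count becomes actionable once it is compared against a budget. The sketch below drops the oldest messages until the remainder fits, which is one simple policy among several. trim_to_budget is a hypothetical helper, and count_tokens is any callable that maps a message list to a token count, such as the client method shown above.

```python
from typing import Callable, List, Sequence, TypeVar

M = TypeVar("M")


def trim_to_budget(
    messages: Sequence[M],
    count_tokens: Callable[[Sequence[M]], int],
    budget: int,
) -> List[M]:
    """Drop the oldest messages until the rest fit within budget tokens.

    The most recent message is always kept, even if it alone exceeds the budget.
    """
    kept = list(messages)
    while len(kept) > 1 and count_tokens(kept) > budget:
        kept.pop(0)
    return kept
```

A real conversation usually needs its system message preserved, so a production version would likely trim from the second message onward rather than from the first.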
Usage Tracking
Track token usage from responses:
from autogen_core import CancellationToken
from autogen_core.models import UserMessage

messages = [UserMessage(content="Explain quantum computing", source="user")]
# Run inside an async function; cancellation_token is a keyword argument
result = await client.create(messages, cancellation_token=CancellationToken())
print(f"Prompt tokens: {result.usage.prompt_tokens}")
print(f"Completion tokens: {result.usage.completion_tokens}")
Error Handling
Handle common errors:
import asyncio
from autogen_core import CancellationToken
from openai import APIError, RateLimitError
from anthropic import AnthropicError  # Only needed when using Anthropic clients

async def create_with_retry(client, messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return await client.create(messages, cancellation_token=CancellationToken())
        except RateLimitError:
            # RateLimitError subclasses APIError, so it must be caught first
            if attempt < max_retries - 1:
                await asyncio.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise
        except APIError as e:
            print(f"API error: {e}")
            raise
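When many workers hit a rate limit at the same moment, a fixed exponential schedule makes them all retry together. Adding random jitter and a cap on the delay is a common refinement. The sketch below is provider-agnostic and uses only the standard library. retry_with_backoff and its parameters are hypothetical names, and the exception types to retry on are supplied by the caller.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Await call(), retrying on the given exceptions with jittered backoff."""
    for attempt in range(max_retries):
        try:
            return await call()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error
            # Full jitter: sleep a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, delay))
    raise RuntimeError("max_retries must be at least 1")
```

A call site could look like retry_with_backoff(lambda: client.create(messages), (RateLimitError,)), with the exception tuple adjusted to the provider in use.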
Environment Variables
Model clients respect standard environment variables:
# OpenAI
export OPENAI_API_KEY="sk-..."
export OPENAI_ORG_ID="org-..."

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Azure OpenAI
export AZURE_OPENAI_ENDPOINT="https://..."
export AZURE_OPENAI_API_KEY="..."

# AWS (for Bedrock)
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-west-2"
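A missing variable typically surfaces only when the first request fails, so it can help to check the environment at startup. The sketch below uses the variable names listed above. REQUIRED_ENV, its provider keys, and missing_env are hypothetical, and the grouping is illustrative only (Bedrock, for instance, can also read credentials from ~/.aws/credentials).

```python
import os
from typing import Dict, List, Mapping

# Variable names as listed on this page; the grouping is illustrative.
REQUIRED_ENV: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "azure_openai": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
}


def missing_env(provider: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the variables for provider that are unset or empty in env."""
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]
```

Passing a plain dictionary as env keeps the check easy to test without touching the real process environment.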
Best Practices
Use Environment Variables
Store API keys in environment variables instead of hardcoding:
import os
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Good: reads from environment
client = OpenAIChatCompletionClient(
    model="gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Better: automatic from environment
client = OpenAIChatCompletionClient(model="gpt-4o")
Set Timeouts
Always configure appropriate timeouts:
client = OpenAIChatCompletionClient(
    model="gpt-4o",
    timeout=120.0,  # 2 minute timeout
)
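A client-level timeout bounds a single request. An overall deadline around a whole operation, including any retries, can be layered on top with asyncio.wait_for. The sketch below shows that pattern with the standard library only. with_deadline is a hypothetical helper, and returning None on timeout is a design choice made for brevity.

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


async def with_deadline(awaitable: Awaitable[T], seconds: float) -> Optional[T]:
    """Await awaitable, returning None if it does not finish within seconds."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the inner task before raising.
        return None
```

Because None doubles as the timeout signal here, code that can legitimately produce None would be better served by letting the timeout exception propagate.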
Monitor Usage
Track token usage to manage costs:
total_prompt_tokens = 0
total_completion_tokens = 0

# Repeat the next three lines for each request
result = await client.create(messages, cancellation_token=CancellationToken())
total_prompt_tokens += result.usage.prompt_tokens
total_completion_tokens += result.usage.completion_tokens

print(f"Total usage: {total_prompt_tokens + total_completion_tokens} tokens")
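Once more than a couple of requests are involved, the running totals are easier to manage in a small accumulator. The sketch below is one way to do that. UsageTotals is a hypothetical class, the usage objects are assumed to expose prompt_tokens and completion_tokens attributes as in the example above, and the prices passed to estimated_cost are caller-supplied placeholders, not real rates.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass
class UsageTotals:
    """Accumulates token usage across responses (hypothetical helper)."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

    def add(self, usage: Any) -> None:
        # Expects an object with prompt_tokens and completion_tokens attributes.
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens
        self.requests += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    def estimated_cost(self, prompt_price_per_1k: float, completion_price_per_1k: float) -> float:
        """Estimate cost from caller-supplied prices per 1,000 tokens."""
        return (
            self.prompt_tokens * prompt_price_per_1k
            + self.completion_tokens * completion_price_per_1k
        ) / 1000


# Stand-ins for result.usage from two responses.
totals = UsageTotals()
totals.add(SimpleNamespace(prompt_tokens=120, completion_tokens=30))
totals.add(SimpleNamespace(prompt_tokens=80, completion_tokens=70))
```

AutoGen model clients also keep cumulative counters of their own (for example a total_usage() method), which may make a separate tracker unnecessary when only one client is in play.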
Next Steps
Code Executors: Set up code execution environments
Tools: Add tools and capabilities to agents