GadaaLabs
AI Automation
Lesson 2

Function Calling and Tool Use

14 min

Function calling is the mechanism by which an LLM signals its intent to invoke an external tool. Understanding what actually travels over the wire — and where it can go wrong — is essential for building reliable agents.

What Happens Under the Hood

When you define tools, the LLM receives them in the system context as JSON Schema. When it decides to call one, it returns a structured tool-use block instead of text:

json
{
  "type": "tool_use",
  "id":   "toolu_01XY",
  "name": "get_weather",
  "input": {
    "city":  "Addis Ababa",
    "units": "celsius"
  }
}

Your code parses this, executes the real function, and returns the result as a tool_result block in the next message. The LLM then continues reasoning with that result.
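Concretely, the follow-up message pairs each result with the id of the tool-use block it answers (the payload values here are illustrative):

```json
{
  "role": "user",
  "content": [
    {
      "type":        "tool_result",
      "tool_use_id": "toolu_01XY",
      "content":     "{\"temp\": 21, \"units\": \"celsius\"}"
    }
  ]
}
```

The `tool_use_id` is what lets the model match results to calls when it issued several tool-use blocks in one turn.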

Defining Tools with JSON Schema

python
from anthropic import Anthropic

client = Anthropic()

tools = [
    {
        "name":        "search_knowledge_base",
        "description": "Search the internal knowledge base for relevant documents. "
                       "Use this when the user asks about company policy or procedures.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type":        "string",
                    "description": "The search query. Be specific.",
                    "minLength":   3,
                    "maxLength":   200,
                },
                "max_results": {
                    "type":    "integer",
                    "default": 5,
                    "minimum": 1,
                    "maximum": 20,
                },
                "category": {
                    "type": "string",
                    "enum": ["policy", "technical", "hr", "legal"],
                    "description": "Filter results by document category.",
                },
            },
            "required": ["query"],
        },
    }
]

Write descriptions from the model's perspective: "Use this when..." is more effective than a technical summary of what the function does.
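For instance, for a hypothetical `create_ticket` tool:

```python
# Weak: restates the implementation; gives the model no cue for when to pick it.
bad_description = "Wraps the TicketService.create() RPC with default priority."

# Stronger: written from the model's perspective, with a selection cue.
good_description = (
    "Create a support ticket in the internal tracker. "
    "Use this when the user reports a bug or asks for help "
    "that cannot be resolved within the conversation."
)
```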

Executing Tool Calls Safely

python
import json
from collections.abc import Callable
from typing import Any

TOOL_REGISTRY: dict[str, Callable] = {
    "search_knowledge_base": search_knowledge_base,
    "get_weather":           get_weather,
}

def execute_tool(tool_name: str, tool_input: dict) -> dict[str, Any]:
    """Execute a tool with argument validation and error containment."""
    if tool_name not in TOOL_REGISTRY:
        return {"error": f"Unknown tool: {tool_name}"}

    try:
        result = TOOL_REGISTRY[tool_name](**tool_input)
        return {"result": result}
    except TypeError as e:
        return {"error": f"Invalid arguments: {e}"}
    except Exception as e:
        return {"error": f"Tool execution failed: {type(e).__name__}: {e}"}

Never let a tool exception propagate to the agent loop — wrap every call in an error boundary and return a structured error the LLM can reason about.
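A quick check of that boundary, as a standalone sketch — the registry and `get_weather` stub below are illustrative stand-ins, not part of the lesson's real tools:

```python
from typing import Any

def get_weather(city: str, units: str = "celsius") -> dict:
    # Stub standing in for a real weather lookup.
    return {"city": city, "temp": 21, "units": units}

TOOL_REGISTRY = {"get_weather": get_weather}

def execute_tool(tool_name: str, tool_input: dict) -> dict[str, Any]:
    """Execute a tool with argument validation and error containment."""
    if tool_name not in TOOL_REGISTRY:
        return {"error": f"Unknown tool: {tool_name}"}
    try:
        return {"result": TOOL_REGISTRY[tool_name](**tool_input)}
    except TypeError as e:
        return {"error": f"Invalid arguments: {e}"}
    except Exception as e:
        return {"error": f"Tool execution failed: {type(e).__name__}: {e}"}

print(execute_tool("get_weather", {"city": "Addis Ababa"}))
print(execute_tool("get_weather", {"city": "Addis Ababa", "bogus": 1}))  # invalid kwarg
print(execute_tool("no_such_tool", {}))                                  # unknown tool
```

In every failure mode the function still returns a dict, so the agent loop can serialize it into a `tool_result` and let the model decide how to recover.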

Full Agent Loop with Retries

python
MAX_ITERATIONS = 10

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    for iteration in range(MAX_ITERATIONS):
        response = client.messages.create(
            model      = "claude-opus-4-5",
            max_tokens = 1024,
            tools      = tools,
            messages   = messages,
        )

        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Final text answer
            return next((b.text for b in response.content if b.type == "text"), "")

        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type":        "tool_result",
                        "tool_use_id": block.id,
                        "content":     json.dumps(result),
                    })

            messages.append({"role": "user", "content": tool_results})

    return "Maximum iterations reached without a final answer."

Argument Validation Before Execution

Validate LLM-generated arguments with Pydantic before running the function:

python
from pydantic import BaseModel, Field, ValidationError

class SearchInput(BaseModel):
    query:       str = Field(..., min_length=3, max_length=200)
    max_results: int = Field(default=5, ge=1, le=20)
    category:    str | None = None

def safe_search(tool_input: dict) -> dict:
    try:
        validated = SearchInput(**tool_input)
    except ValidationError as e:
        return {"error": f"Invalid input: {e.errors()}"}
    return search_knowledge_base(**validated.model_dump())

| Validation layer | What it catches |
|---|---|
| JSON Schema (in tool definition) | Tells the LLM what is valid |
| Pydantic (before execution) | Rejects malformed LLM output |
| Business logic checks | Domain-specific constraints |
| Error boundary (try/except) | Runtime failures in the actual function |

Summary

  • Function calling works by the LLM returning a structured tool_use block, which your code executes and returns as a tool_result.
  • Write tool descriptions from the LLM's perspective using "Use this when..." phrasing to improve tool selection accuracy.
  • Always wrap tool execution in an error boundary and return structured errors — never let exceptions propagate to the loop.
  • Validate LLM-generated arguments with Pydantic before execution; JSON Schema in the tool definition is advisory, not enforced.
  • Set a hard iteration limit in the agent loop to prevent runaway executions.