🔧 LLM Tools

Despite their impressive capabilities, LLMs have inherent limitations that prevent them from functioning as fully autonomous agents. These limitations fall into three categories: temporal, interaction, and functional.

🕰️ Temporal limitation

They cannot access information beyond their training data or retrieve data that changes in real time. This makes them unreliable for tasks requiring current knowledge, such as answering questions about recent events or live market conditions.

➡️ Interaction limitation

They cannot directly influence the external world. While they can generate text describing an action, they cannot actually book a flight, send an email, or control a device.

🧮 Functional limitations

In specialized domains certain tasks requiring precise computation, code execution, or media generation often exceed what language models can reliably accomplish through text prediction alone.

Tools can address each of these limitations.

Temporal toolsaugment information to bridge the temporal gap by injecting up-to-date knowledge into the agent's context. They can be used to extend the LLM's awareness beyond its training cutoff date. These can include tooling to allow your Agent web access, real-time APIs such as weather or stock prices, or custom databases.

Action-executing tools overcome the interaction limitation by enabling agents to affect the real world. Examples include booking systems for hotels and flights, communication tools for emails and calendar management, automation tools for document generation, and IoT controllers for smart devices. These tools trigger Actions and receive Observations in return, forming a feedback loop with the environment.

Domain-specialised tools address functional limitations by handling tasks where LLMs fall short. Calculators, code interpreters, format converters, image generators, and voice synthesis engines allow hybrid problem-solving—combining the LLM's reasoning with reliable execution in specialised domains.

Common LLM providers such as OpenAI and Anthropic already offer some tooling, for example OpenAI's web search feature and Anthropic's web search tool. These tools are available for us to use immediately without any development effort. The trade off is that they cannot be easily modified or extended, and they can be changed at the providers discretion. LLM providers will list the available non-custom, tools available to you.

Custom tools

We are going to focus on building custom tooling for our learning and to provide understanding to help debug and build fully bespoke Agents for our needs. However, it's good to be aware that provider tools exist for rapid prototyping or if they fit your business need.

How does an LLM use tools?

So how can an LLM, that generates text based on probabilities, actually select and execute a tool? This is made possible by a feature called Tool Calling. We are able to define tools via structured descriptions that tell the LLM what tools (functions, APIs or actions) are available, what parameters they expect and what they do. Once the LLM receives this information, it can reason about which tool may be appropriate for a given task and how to use it, purely through text generation. If the LLM decides to use a tool, it generates a tool call; a structured output that includes the name of the tool and the arguments required to invoke it. This tool call is passed to an external executor, which actually runs the tool with the arguments in the real world (i.e. our code).

How tool calling works

Tool calling is a multi-step process. It starts with the developer sending the users prompt along with tool definitions to the LLM. Note: the LLM itself does not execute the tool(s) itself, it generates a textual response indicating which tool (if any) should be called and with what parameters. Think of the LLM like a mediator between the user prompt and the available tools. Our code, will then execute the tool identified by the LLM with the parameters provided also by the LLM. Once the tool has executed, the result is returned to the LLM, along with all prior messages, which then interprets the output and continues to conversation, assessing if a final result can be returned or further tools are required.

So within our agent, we must:

Define the tool definitions we will provide to the LLM.
Define the actual tool implementation.
Provide the LLM with the task to perform and tool definitions from step 1.
Build a systems that can execute the tools based on the tool call generated by the LLM.
Reflect the execution results back into the LLM's context.

Step 1: Tool Definitions

Structured definition of the calculator tool. Think of this like the "instruction manual" for the LLM.

"type: function" indicates to the LLM that this is a callable tool
"name:" the tool's identifier that the LLM will use to reference it ("calculator")
"description:" When and why to use this tool (perform basic arithmetic operations)
"parameters:" The specification of inputs needed to use the tool

calculator_tool_definition = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Perform basic arithmetic operations between two numbers.",
        "parameters": {
            "type": "object",
            "properties": {
                "operator": {
                    "type": "string",
                    "description": "Arithmetic operation to perform",
                    "enum": ["add", "subtract", "multiply", "divide"],
                },
                "first_number": {
                    "type": "number",
                    "description": "First number for the calculation",
                },
                "second_number": {
                    "type": "number",
                    "description": "Second number for the calculation",
                },
            },
            "required": ["operator", "first_number", "second_number"],
        },
    },
}

Step 2: Set up the tool

This is a pure python function that matches the input schema provided in our calculator tool definition Notice the function name matches that provided to the LLM in the tool definition so the correct function is invoked with the appropriate parameters.

def calculator(operator: str, first_number: float, second_number: float) -> float:
    if operator == "add":
        return first_number + second_number
    elif operator == "subtract":
        return first_number - second_number
    elif operator == "multiply":
        return first_number * second_number
    elif operator == "divide":
        if second_number == 0:
            raise ValueError("Cannot divide by zero")
        return first_number / second_number
    else:
        raise ValueError(f"Unsupported operator: {operator}")

Step 3: Executing the tool call

When we call our LLM we provide our tool definitions (calculator) alongside the user's prompt. The LLM decides whether a tool is needed to answer the prompt and if our definition contains an appropriate tool to use. For example, suppose the first prompt asks "What is the capital of Scotland?" it is highly probable the LLM can answer this with a plain text answer from it's training data. However a subsequent prompt of "What is 1234 * 5678?", the LLM may choose to respond by generating a Tool Call, including the correct operator (mulitply) and operands (1234, 5678).

See agent_2_tools.py for this example in code. tool_calls is None if the LLM decided not to use one of our provided tools. On the contrary, the content will be None and the tools_call will be populated if the LLM identifies a tool available to it should be called, in this instance the calculator tool.

Step 4: Executing the tool in our code

It is up to our code to call the 'calculator' function with the parameters provided from the LLM and add this back to the context and inform the LLM of the response.

See excerpt from agent_2_tools.py :

   if ai_message.tool_calls:
        for tool_call in ai_message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)

            if function_name == "calculator":
                result = calculator(**function_args)

We extract 'function_name' and 'function_args' from the LLM response and use them to call the corresponding tool function, passing it the arguements.

Step 5: Reflect the results of the tool back to the LLM

We then pass the results of our tools output (the calculator function) to the LLM. We ensure we append a new message to the conversation (remember LLM's are stateless, so we need to provide the updated context). The new message defines the role of "tool" and includes the output in the content. This is then passed back to the LLM to determine if further tooling is required or there is enough information to merit a final reponse which will be returned to the users initial prompt.

See excerpt from agent_2_tools.py :

 messages.append(
                    {
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": str(result),  # the result of the calculator tool execution
                    }
                )

    final_response = completion(model="gpt-5-mini", messages=messages) # call the LLM with this new context

To summarise the journey of the "What is 1234 x 5678?" prompt from the user. We pass this along with the calculator tool definition to the LLM. The LLM responds to our Agent stating the calculator tool should be called with the multiply operator and the operands 1234 and 5678. Our Agent calls the calculator function we defined and adds the result to the context and once more calls the LLM with this updated context. The LLM then determines no further tool calls are required and responds with the final answer, our agent can then present this to the user.