"Tool calling" from LLM. Understanding hot it works

I am interested in learning how LLMs understand requests that require a "tool call".
The post "Tool Calling" and Ollama gives a nice description of how tool calling works with Ollama.
The idea of this feature is that an LLM can be given access to a set of tools (i.e., external APIs) and can call them to get extra information. To do this, the LLM has to understand the current request, determine that it could be forwarded to a tool, and extract the arguments. The canonical example is "What is the weather in ….": answering it requires access to live data, which no LLM has baked into it.
Here is a shortened version of the code from the original article:
```bash
#!/bin/bash
SERVICE_URL="http://localhost:11434"

# The request offers the model two tools: say_hello and add_numbers.
read -r -d '' DATA <<- EOM
{
  "model": "llama3.1",
  "messages": [
    {
      "role": "user",
      "content": "This is Bob. We are doing math. Help us to add 2 and 3. BTW. Say hello to him"
    }
  ],
  "stream": false,
  "tools": [
    {
      "function": {
        "description": "Say hello to a given person with his name",
        "name": "say_hello",
        "parameters": {
          "properties": {
            "name": {
              "description": "The name of the person",
              "type": "string"
            }
          },
          "required": ["name"],
          "type": "object"
        }
      },
      "type": "function"
    },
    {
      "function": {
        "description": "Add two numbers",
        "name": "add_numbers",
        "parameters": {
          "properties": {
            "number1": {
              "description": "The first number",
              "type": "number"
            },
            "number2": {
              "description": "The second number",
              "type": "number"
            }
          },
          "required": ["number1", "number2"],
          "type": "object"
        }
      },
      "type": "function"
    }
  ]
}
EOM

curl --no-buffer ${SERVICE_URL}/api/chat \
  -H "Content-Type: application/json" \
  -d "${DATA}" | jq '.'
```
I wanted to understand how reliably an LLM recognizes that a message should be forwarded to a tool.
It is also interesting to see how an LLM selects a tool when the message could plausibly match more than one tool.
And what if the message is not just short and simple, but also calls for a "normal" text response?
I tested three models, qwen2.5:1.5b, mistral-nemo, and llama3.1, to see whether the behavior is consistent or varies by model.
Finally, to check whether the results are stable for a given message, I repeated each request 100 times.
To automate this task, I created a small Go application.
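The Go code itself is not essential here; the same loop can be sketched in a few lines of shell. The sketch below reuses SERVICE_URL and DATA from the script above, and assumes the response shape shown earlier:

```bash
#!/bin/bash
# Send the same chat request 100 times and tally the outcomes:
# "text" when the model answered with plain content, otherwise the
# sorted list of tools it called (e.g. "add_numbers+say_hello").
for i in $(seq 1 100); do
  curl -s ${SERVICE_URL}/api/chat \
    -H "Content-Type: application/json" \
    -d "${DATA}" |
  jq -r 'if .message.tool_calls
         then [.message.tool_calls[].function.name] | sort | join("+")
         else "text" end'
done | sort | uniq -c
```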
Here are the user messages I sent to each LLM, together with the results.
Results for "qwen2.5:1.5b"
| User message | Text response | say_hello call | add_numbers call |
| --- | :-: | :-: | :-: |
| Hello | ✓ | | |
| Say hello to Bob | | ✓ | |
| Add 2 and 3 | | | ✓ |
| I came with my friend Bob. We will stay for 2 days. Say hello to him | | ✓ | |
| This is Bob. He needs to add 2 and 3. Say hello to him | | ✓ | |
| Help us to add 2 and 3 | | | ✓ |
| This is Bob. We are doing math. Help us to add 2 and 3. BTW. Say hello to him | | ✓ | ✓ |
| We need to know what is the smallest natural number. Can you tell us? | ✓ | | |
| This is Bob. We are doing math. Help us to add 2 and 3. BTW. Say hello to him. And we need to know what is the smallest natural number. Can you tell us? | | ✓ | ✓ |
The models differ in a few cases:
| User message | qwen2.5:1.5b | mistral-nemo | llama3.1 |
| --- | --- | --- | --- |
| Hello | Text | Text | say_hello call |
| Say hello to Bob | say_hello call | say_hello call | say_hello call |
| Add 2 and 3 | add_numbers call | add_numbers call | add_numbers call |
| I came with my friend Bob. We will stay for 2 days. Say hello to him | say_hello call | say_hello call | say_hello call |
| This is Bob. He needs to add 2 and 3. Say hello to him | say_hello call | say_hello + add_numbers call | say_hello + add_numbers call |
| Help us to add 2 and 3 | add_numbers call | add_numbers call | add_numbers call |
| This is Bob. We are doing math. Help us to add 2 and 3. BTW. Say hello to him | say_hello + add_numbers call | say_hello + add_numbers call | say_hello + add_numbers call |
| We need to know what is the smallest natural number. Can you tell us? | Text | Text | add_numbers call |
| This is Bob. We are doing math. Help us to add 2 and 3. BTW. Say hello to him. And we need to know what is the smallest natural number. Can you tell us? | say_hello + add_numbers call | say_hello + add_numbers call | say_hello + add_numbers call |
Unexpected results
llama3.1 calls say_hello for the prompt "Hello". The "name" argument is set to "Hello", and sometimes it is empty.
llama3.1 treats the prompt "We need to know what is the smallest natural number. Can you tell us?" as an add_numbers call with the arguments 0 and 1. Why?
Conclusions
For a single prompt, a tool call and a text response do not happen together: the model either returns content or calls a tool (not proven, just what I observed).
For a single prompt, an LLM can call more than one tool.