Building Powerful AI Agents with smolagents: A Minimalist Approach


If you've been following our previous discussions on AI agents, you know that frameworks often add unnecessary complexity to what should be a straightforward process. While we've shown how to build agents from scratch without frameworks, there are times when a minimal abstraction layer can save you from writing repetitive boilerplate code.

That's where smolagents enters the ring - a refreshingly minimal library for building powerful AI agents in just a few lines of code, built by the nice folks at Hugging Face.

Why smolagents Stands Out

Unlike bloated frameworks that obscure the core agent functionality behind layers of abstraction, smolagents keeps things remarkably simple:

  • Truly minimal: The entire agent logic fits in ~1,000 lines of code
  • Maximum transparency: You can easily inspect and understand what's happening under the hood
  • First-class code execution: Agents write actions as Python code snippets rather than structured tool calls - more on that later.
  • Model-agnostic: Works with virtually any LLM from OpenAI, Anthropic, Hugging Face, or local models
  • Multi-modal support: Handles text, vision, video, and even audio inputs, similar to what we explored in our analyzing images with GPT-4o guide
  • Extensive tool compatibility: Use tools from LangChain, Anthropic's MCP, or even Hugging Face Spaces

Let's experience hands-on how smolagents provides the perfect balance between simplicity and power.

Getting Started with smolagents

To warm up, let's run a simple agent using smolagents.

First, install the package:

pip install smolagents

Then define your agent, give it the tools it needs, and run it:

import os

from smolagents import CodeAgent, DuckDuckGoSearchTool
from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)

agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

answer = agent.run("What's the population in New York? What was it in 1980?")
print(answer)

That's it! The agent will:

  1. Parse the question
  2. Decide to search for information about New York's current population and its population in 1980
  3. Write and execute Python code to calculate the answer
  4. Return a complete, well-reasoned response

Note: By default, smolagents outputs a lot of debug information, helping you understand what's happening at each step. You can disable this by setting verbosity_level=LogLevel.ERROR in the agent constructor.

How Code Agents Work - and why they're better than traditional agents

Unlike traditional agents that use structured JSON for tool calls, smolagents' CodeAgent writes its actions as Python code snippets. This approach is particularly powerful when working with complex data structures, as we've seen when creating knowledge graphs with Neo4j. For example, if your question was about the speed of a leopard and the length of the Pont des Arts bridge in Paris, the agent might generate code like this:

# The agent might generate code like this:
requests_to_search = ["leopard top speed", "pont des arts length paris"]
for request in requests_to_search:
    print(f"Searching for: {request}")
    results = web_search(request)
    print(f"Results: {results}")

# Then calculate the answer
leopard_speed = 58  # km/h from search results
bridge_length = 155  # meters from search results
seconds_to_cross = (bridge_length / 1000) / (leopard_speed / 3600)
print(f"A leopard would cross Pont des Arts in {seconds_to_cross:.2f} seconds")

This approach has been shown to:

  • Use 30% fewer steps (and thus 30% fewer LLM calls)
  • Achieve higher performance on difficult benchmarks
  • Provide more flexibility in how tools are used

(These results were reported for code agents in the research smolagents builds on, such as the paper "Executable Code Actions Elicit Better LLM Agents".)

Security Considerations

Since code execution can be (or actually is) a security concern, smolagents provides several options:

  1. A secure Python interpreter to run code more safely in your environment: Hugging Face created a new, restricted Python interpreter that only allows importing whitelisted modules (and even only whitelisted submodules). This is the default behavior; you can widen the whitelist yourself, as sketched below.
  2. Sandboxed environments using E2B or Docker to isolate execution: This lets you execute the agent code in remote, more isolated environments.
  3. The ability to review code before execution
# Using a sandboxed E2B environment (requires `pip install "smolagents[e2b]"`)
from smolagents import CodeAgent, DuckDuckGoSearchTool

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
    executor_type="e2b",  # older smolagents versions use use_e2b_executor=True
)
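
For the default local interpreter, you can widen the import whitelist explicitly via additional_authorized_imports. A minimal sketch (the extra modules here are just examples):

from smolagents import CodeAgent, DuckDuckGoSearchTool

# Allow the agent's generated code to import numpy and pandas
# on top of the default whitelisted modules
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
    additional_authorized_imports=["numpy", "pandas"],
)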

Comparing smolagents with Other Frameworks

If you've read our previous post on building agents from scratch, you might wonder why use any framework at all. The answer lies in the balance:

Approach           | Pros                                            | Cons
-------------------|-------------------------------------------------|----------------------------------------------------
Raw implementation | Complete control, no dependencies               | Repetitive boilerplate, reinventing the wheel
smolagents         | Minimal abstraction, transparent code, flexible | Small learning curve
Large frameworks   | Many pre-built components                       | Complex abstractions, hard to debug, less flexible

smolagents handles the non-trivial parts (like maintaining consistent code formats throughout the system prompt, parser, and execution) while keeping everything else transparent and accessible.

Model Compatibility of smolagents

smolagents works with virtually any LLM:

# Using OpenAI (through an OpenAI-compatible endpoint)
from smolagents import CodeAgent, OpenAIServerModel
agent = CodeAgent(tools=[...], model=OpenAIServerModel(model_id="gpt-4o"))

# Using Anthropic (through LiteLLM)
from smolagents import CodeAgent, LiteLLMModel
agent = CodeAgent(tools=[...], model=LiteLLMModel(model_id="anthropic/claude-3-opus-20240229"))

# Using local models via Ollama (also through LiteLLM)
agent = CodeAgent(tools=[...], model=LiteLLMModel(model_id="ollama_chat/llama3"))

# Using 100+ LLMs via LiteLLM
agent = CodeAgent(tools=[...], model=LiteLLMModel(model_id="gpt-4o"))
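
You can also run models through the Hugging Face Inference API using HfApiModel, the same class the CLI examples below use (newer smolagents versions rename it to InferenceClientModel). A short sketch:

from smolagents import CodeAgent, HfApiModel

agent = CodeAgent(
    tools=[],  # add your tools here
    model=HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct"),
)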

Sharing Your Agents created with smolagents

One of the most powerful features of smolagents is the ability to share your agents on Hugging Face Hub:

# Save your agent to the Hub
agent.push_to_hub("username/my_agent")

# Load an agent from the Hub (trust_remote_code is required,
# since you are about to run downloaded code)
from smolagents import CodeAgent
agent = CodeAgent.from_hub("username/my_agent", trust_remote_code=True)

Command Line Interface

smolagents also provides convenient CLI commands:

# Run a general-purpose agent
smolagent "Plan a trip to Tokyo, Kyoto and Osaka between Mar 28 and Apr 7." \
    --model-type "HfApiModel" --model-id "Qwen/Qwen2.5-Coder-32B-Instruct" \
    --imports "pandas numpy" --tools "web_search"

# Run a specialized web browsing agent
webagent "go to xyz.com/men, get to sale section, click the first clothing item you see. Get the product details, and the price, return them." \
    --model-type "LiteLLMModel" --model-id "gpt-4o"

Note: To run the web agent, you need to install the helium package (pip install helium).

Best Practices for Building Effective Agents

When building agents with smolagents, following some key principles can dramatically improve performance:

Simplify Your Workflow

The best agentic systems are often the simplest ones. Giving an LLM agency introduces some risk of errors, so simplifying your workflow is crucial:

  • Reduce the number of LLM calls: Whenever possible, group multiple tools into one. For example, instead of separate "travel distance API" and "weather API" calls, create a unified "return_spot_information" function (see the sketch after this list).
  • Prefer deterministic functions: When possible, use deterministic functions rather than agentic decisions. This reduces costs, latency, and error risk all at once!
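
Here's a minimal sketch of such a unified tool using smolagents' @tool decorator. The helpers get_travel_distance and get_weather are hypothetical placeholders for your real APIs:

from smolagents import tool

@tool
def return_spot_information(spot_name: str) -> str:
    """
    Returns travel distance and current weather for a surf spot in one call.

    Args:
        spot_name: the name of the surf spot, e.g. "Anchor Point, Taghazout, Morocco"
    """
    # Hypothetical helpers - replace with your actual APIs
    distance_km = get_travel_distance(spot_name)
    weather = get_weather(spot_name)
    return f"{spot_name}: {distance_km} km away, current weather: {weather}"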

Improve Information Flow to the LLM

Remember that your LLM engine is like an intelligent robot trapped in a room, with notes passed under the door as its only communication with the outside world. It won't know anything you don't explicitly tell it.

For example, here's a poorly designed weather tool:

def get_weather_api(location: str, date_time: str) -> str:
    """Returns the weather report."""
    lon, lat = convert_location_to_coordinates(location)
    date_time = datetime.strptime(date_time)
    return str(get_weather_report_at_coordinates((lon, lat), date_time))

And here's a better version:

from datetime import datetime

def get_weather_api(location: str, date_time: str) -> str:
    """
    Returns the weather report.

    Args:
        location: the name of the place, formatted as "Place, City, Country"
        date_time: the date and time, formatted as '%m/%d/%y %H:%M:%S'
    """
    lon, lat = convert_location_to_coordinates(location)
    try:
        date_time = datetime.strptime(date_time, "%m/%d/%y %H:%M:%S")
    except Exception as e:
        raise ValueError("Date format error, use '%m/%d/%y %H:%M:%S': " + str(e))
    temperature, rain_risk, wave_height = get_weather_report_at_coordinates((lon, lat), date_time)
    return f"Weather for {location}: {temperature}°C, rain risk {rain_risk*100:.0f}%, waves {wave_height}m"

The improved version provides clear format requirements, better error handling, and a structured output format.
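
To actually hand such a function to an agent, wrap it as a smolagents tool. A quick sketch, assuming the improved get_weather_api from above and an already configured model:

from smolagents import CodeAgent, tool

# Wrap the improved function as a smolagents tool
weather_tool = tool(get_weather_api)

agent = CodeAgent(tools=[weather_tool], model=model)
print(agent.run("What's the weather at the Eiffel Tower, Paris, France on 06/15/25 14:00:00?"))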

Tips when working with smolagents

Enable Planning Mode

smolagents provides a powerful planning feature that helps agents reflect on their progress and plan next steps:

agent = CodeAgent(
    tools=[search_tool, image_generation_tool],
    model=HfApiModel(model_id="Qwen/Qwen2.5-72B-Instruct"),
    planning_interval=3,  # Activate planning every 3 steps
)

This adds a supplementary planning step where the LLM updates its knowledge and reflects on what steps to take next, without making any tool calls.

Reduce logging output

By default, smolagents logs a lot of information to help you understand what's happening at each step. If you find this overwhelming, you can reduce the verbosity level:

from smolagents import LogLevel

agent = CodeAgent(
    tools=[search_tool, image_generation_tool],
    model=model,
    verbosity_level=LogLevel.ERROR,
)

Change final answer output format

The smolagents agent uses the FinalAnswerTool to generate the final, user-facing answer. By default, this tool attempts to create a natural language answer. However, we found that sometimes the LLM answered with a JSON object.

Thankfully, it's quite easy to change the output format:

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
)

# Make sure to include the `{{task}}` placeholder
agent.prompt_templates["final_answer"][
    "post_messages"
] = "Based on the above, please provide an answer to the following user task: {{task}}. Always answer in user-readable text, don't use json."

Streaming the agent output

Since LLMs are generally quite slow to produce their full output, chat applications often deliver the answer as a message stream - this gives the user an immediate response even though the complete answer is not yet available.

smolagents also allows you to stream the output. Simply set the stream parameter to True. This returns a generator that yields messages as they become available:

result = agent.run("What's the population in New York? What was it in 1980?", stream=True)
for message in result:
    print(message)

Interested in how to train your very own Large Language Model?

We prepared a well-researched guide on how to use the latest advancements in open-source technology to fine-tune your own LLM. This has many advantages, like:

  • Cost control
  • Data privacy
  • Excellent performance - adjusted specifically for your intended use

Conclusion

smolagents strikes a good balance between simplicity and power. It handles the complex parts of agent development while keeping everything transparent and accessible.

By focusing on code-based actions rather than structured tool calls, smolagents enables more flexible, efficient agents that can tackle complex tasks with fewer steps and higher success rates.

If you've been frustrated by the complexity of larger frameworks or tired of writing the same boilerplate code for every agent project, smolagents might be exactly what you're looking for.

Further Reading

More information on our managed RAG solution?
To Pondhouse AI
More tips and tricks on how to work with AI?
To our Blog