Advanced Self-Made Agent: Building a Smarter, Multi-Step AI Agent
In our previous blog post on creating AI agents without frameworks, we highlighted the simplicity of designing an agent by directly exposing tools like `query_database` and `search_wikipedia`. We showcased how your agent only needs to:
- Understand available tools
- Decide when and how to use them
- Maintain relevant context
- Assemble the final answer
That initial design covered the fundamental loop of processing user input and optionally calling tools. Now, let's take things further by adding explicit state management and a multi-step processing loop. This approach helps your agent to:
- Dynamically decide if it needs more information from the user
- Run multiple steps of "thinking" before settling on a final answer
- Exit gracefully when the maximum number of iterations is reached

You can also refine your prompting approach by exploring DSPy: Build Better AI Systems with Automated Prompt Optimization, allowing for more targeted instructions within these multiple iterations.
The new code still relies on the same functions and tool definitions from the last blog post. However, we've introduced an `AgentState` enum and a multi-step processing loop, allowing the AI agent to refine its answers.
Note: This is a continuation of our previous tutorial on building AI agents from scratch. If you haven't read it yet, we recommend starting there to understand the basics before diving into this advanced agent.
What Makes This Agent Better?
- **State Management via Enum**: We capture states like `THINKING`, `DONE`, `ERROR`, or even `NEED_MORE_INFO` for clarity. Instead of storing ephemeral states in local variables, an enum ensures we always know exactly where we are in the conversation cycle.
- **Iteration Loop for Complex Reasoning**: By repeating a think→act→evaluate cycle up to `max_iterations` times, we let our agent handle more elaborate tasks, verify partial results, and explore multiple reasoning paths.
- **Interactive Follow-Ups**: The agent can return a "need more info" response if the conversation context is missing something essential. This allows for more natural user–agent collaboration, instead of abruptly ending in an error message.
- **Consistent Tooling**: We still use the same tool descriptors from the first tutorial (`query_database` and `search_wikipedia`). This ensures that even though our agent logic has evolved, we don't have to reinvent existing capabilities.
Key Highlights of the Code
Below is the conceptual overview of our advanced agent. Notice how we start with an enum, a multi-iteration loop, and a step-by-step approach for each iteration:
- **AgentState Definition**: We capture the possible states in an enum:
  - `THINKING`: The agent is still hypothesizing an answer or deciding on next steps
  - `DONE`: The agent has arrived at a final answer
  - `ERROR`: An error occurred or the agent couldn't continue
  - `NEED_MORE_INFO`: The agent requires additional clarification from the user before proceeding
- **Loop with Max Iterations**: Where the previous agent ended after a single pass, this advanced one allows repeated attempts at answering. This loop ensures the agent thoroughly processes or clarifies issues before finalizing an answer.
- **Asking for Clarification**: If in the middle of a conversation the LLM realizes it can't proceed with the provided data, it can transition to `NEED_MORE_INFO`. When the user responds with more context, we resume the loop.
- **Separation of "Think and Act"**: We keep the logic for choosing an action (`TOOL_CALL`, `ASK_USER`, `FINAL_ANSWER`) in a separate `_think_and_act()` method. This keeps your main loop clear, well-structured, and easy to debug.
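The enum and iteration loop described above can be sketched as follows. This is a minimal skeleton, not the post's full listing: the `_think_and_act` call is a placeholder for your own LLM-driven step, and `MAX_ITERATIONS` is an illustrative default.

```python
from enum import Enum


class AgentState(Enum):
    THINKING = "thinking"
    DONE = "done"
    ERROR = "error"
    NEED_MORE_INFO = "need_more_info"


MAX_ITERATIONS = 5


def process_with_loop(agent, user_input):
    """Run think->act->evaluate cycles until a terminal state is reached."""
    agent.messages.append({"role": "user", "content": user_input})
    for _ in range(MAX_ITERATIONS):
        state, answer = agent._think_and_act()
        # Any non-THINKING state terminates the loop for this turn
        if state is not AgentState.THINKING:
            return state, answer
    # Exit gracefully when the iteration budget is exhausted
    return AgentState.ERROR, "Maximum iterations reached without a final answer."
```

Note that `NEED_MORE_INFO` is a terminal state for the current turn: the loop returns, the user supplies more context, and the next call to `process_with_loop` resumes the conversation.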
Why Multi-Step Agents Are Useful
Imagine a user interacting with your AI about a complex database topic:
- In the initial iteration, the agent checks whether it already has the necessary data. If not, it calls `query_database`.
- The tool call might produce partial or ambiguous data. The agent iterates once more, deciding whether to refine the query or ask the user a clarifying question.
- Only after processing multiple steps of context will your AI produce a final answer or realize more info is needed.
This advanced approach ensures the user is never stuck with a half-baked reply or a single-step error message. The multi-iteration loop fosters a natural back-and-forth that resembles real human conversation.
By building on the existing ideas in our previous blog post and reusing the same tools, you now have a more versatile AI agent. You retain the debuggability and extensibility of a framework-free approach while benefiting from iterative reasoning cycles.
For anyone serious about reliability—and not just one-shot answers—this is the next step in building robust AI systems.
Detailed Agent Implementation
Below is a fully working code example of an advanced multi-step agent. It uses an enum to define possible states, a loop to allow multiple "thinking" iterations, and a simple mechanism for asking the user for more info if needed. The code is heavily commented so you can copy it directly to your project and follow the explanatory notes.
Note: In the code example below, we use mock functions for the tools (`query_database` and `search_wikipedia`). Please have a look at our previous blog post for the original tool definitions.
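The full listing from the original post is not reproduced here, so the code below is a minimal, self-contained sketch of the agent described in this section. The tools are mocks, and `_think_and_act()` uses a trivial keyword heuristic in place of a real LLM call so the example runs offline; the `Agent` class name, the heuristic, and the mock return values are all illustrative assumptions.

```python
import json
from enum import Enum


# --- Mock tools (see the previous post for the real definitions) ---
def query_database(query: str) -> str:
    """Mock: pretend to run a SQL query and return rows as JSON."""
    return json.dumps({"rows": [{"id": 1, "name": "example"}], "query": query})


def search_wikipedia(topic: str) -> str:
    """Mock: pretend to fetch a Wikipedia summary."""
    return json.dumps({"summary": f"Mock summary about {topic}."})


TOOLS = {"query_database": query_database, "search_wikipedia": search_wikipedia}


class AgentState(Enum):
    THINKING = "thinking"
    DONE = "done"
    ERROR = "error"
    NEED_MORE_INFO = "need_more_info"


class Agent:
    def __init__(self, max_iterations: int = 5):
        self.max_iterations = max_iterations
        self.messages: list[dict] = []
        self.trace: list[str] = []

    def _think_and_act(self) -> tuple[AgentState, str]:
        """Stubbed decision step.

        A real implementation would send self.messages to an LLM and parse
        an 'ACTION: TOOL_CALL | ASK_USER | FINAL_ANSWER' directive from its
        reply. Here a keyword heuristic stands in for the model.
        """
        last = self.messages[-1]["content"].lower()
        has_tool_result = any(m.get("role") == "tool" for m in self.messages)
        if "database" in last and not has_tool_result:
            result = self.execute_tool("query_database", {"query": "SELECT * FROM items"})
            self.messages.append({"role": "tool", "content": result})
            return AgentState.THINKING, ""
        if has_tool_result or last.strip().endswith("?"):
            return AgentState.DONE, "Here is what I found: " + self.messages[-1]["content"]
        return AgentState.NEED_MORE_INFO, "Could you give me more detail about what you need?"

    def execute_tool(self, name: str, args: dict) -> str:
        """Dispatch a tool call; return a JSON string with results or an error."""
        try:
            return TOOLS[name](**args)
        except Exception as exc:
            return json.dumps({"error": str(exc)})

    def process_with_loop(self, user_input: str) -> tuple[AgentState, str]:
        """Main loop: iterate think->act until a terminal state or the budget runs out."""
        self.messages.append({"role": "user", "content": user_input})
        for i in range(self.max_iterations):
            state, answer = self._think_and_act()
            self.trace.append(f"iteration {i + 1}: {state.value}")
            if state is not AgentState.THINKING:
                return state, answer
        return AgentState.ERROR, "Maximum iterations reached without a final answer."


def interact_with_agent():
    """Minimal REPL: type a question, 'quit' to exit."""
    agent = Agent()
    while True:
        user_input = input("You: ")
        if user_input.lower() in {"quit", "exit"}:
            break
        state, answer = agent.process_with_loop(user_input)
        print(f"[{state.value}] {answer}")
```

To turn this into a real agent, replace the heuristic in `_think_and_act()` with an actual LLM call and swap the mock tools for the definitions from the previous post.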
To run the agent, simply call `interact_with_agent()`. You can then input questions or statements to see how the agent processes them over multiple iterations. The code is designed to be easily extendable with more complex tools, additional states, or advanced reasoning logic.
Explanation In Detail
- **AgentState Enum**: We define the possible states the agent can have at the end of each iteration:
  - `THINKING`: The agent is still deciding its next step.
  - `DONE`: We have a final answer to deliver to the user.
  - `ERROR`: The agent encountered some problem that prevents further progress.
  - `NEED_MORE_INFO`: The agent cannot proceed without additional details from the user.
- **`process_with_loop()`**: This is the main method that handles multiple "thinking" iterations. It appends user input to the conversation, calls `_think_and_act()` each time, checks the state, and either asks for more info, returns a final answer, or stops if max iterations are reached.
- **`_think_and_act()`**: Called within each iteration. In a real scenario, we would pass our conversation so far to the LLM, parse its "ACTION:" directive, and respond accordingly: calling the necessary tool, asking the user for more data, or returning a final answer.
- **`execute_tool()`**: If the LLM requests a tool call via "TOOL_CALL", we parse out the arguments and run the corresponding Python function (for example, searching Wikipedia or querying a database). This method returns a JSON string with the results or an error message.
- **`interact_with_agent()`**: A basic read–eval–print loop (REPL) that demonstrates how the agent handles real user input. It feeds each user query into `process_with_loop()`, then displays the final or partial results, plus an optional iteration trace.
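To make the "ACTION:" parsing step concrete, here is one way it could look. The exact directive format is an assumption for illustration; it simply mirrors the `TOOL_CALL` / `ASK_USER` / `FINAL_ANSWER` convention described above, with a JSON payload after the action name.

```python
import json


def parse_action(llm_reply: str) -> tuple[str, dict]:
    """Parse an 'ACTION:' directive from a raw LLM reply.

    Assumed one-line format, e.g.:
        ACTION: TOOL_CALL {"tool": "search_wikipedia", "args": {"topic": "Python"}}
        ACTION: ASK_USER {"question": "Which table do you mean?"}
        ACTION: FINAL_ANSWER {"text": "..."}
    """
    for line in llm_reply.splitlines():
        line = line.strip()
        if line.startswith("ACTION:"):
            rest = line[len("ACTION:"):].strip()
            action, _, payload = rest.partition(" ")
            return action, json.loads(payload) if payload else {}
    # No directive found: treat the whole reply as a final answer
    return "FINAL_ANSWER", {"text": llm_reply.strip()}
```

In a production agent you would also guard against malformed JSON here (for example, by catching `json.JSONDecodeError` and transitioning to `ERROR` or retrying).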
By following the above code and commentary, you can incorporate additional logic—like actual OpenAI calls, real tool integrations, or more advanced planning. The key benefit is flexibility: you control the logic, error handling, and iteration loops rather than having them dictated by a large framework.
For an even deeper dive into integrating external knowledge, check out How to: RAG with Azure AI Search. It shows how connecting additional data sources can bolster your agent's accuracy and context.
What's next?
While this agent already gives you a lot of flexibility and a good starting point for more complex AI agents, there are still ways to improve it further. One of the most prominent is adding 'memory', meaning the agent can remember previous interactions and use them to improve its answers. We'll cover this in the next tutorial.
Interested in how to train your very own Large Language Model?
We prepared a well-researched guide on how to use the latest advancements in open-source technology to fine-tune your own LLM. This has many advantages, like:
- Cost control
- Data privacy
- Excellent performance, tailored specifically to your intended use