Advanced Self-Made Agent: Building a Smarter, Multi-Step AI Agent

In our previous blog post on creating AI agents without frameworks, we highlighted the simplicity of designing an agent by directly exposing tools like query_database and search_wikipedia. We showcased how your agent only needs to:

  1. Understand available tools
  2. Decide when and how to use them
  3. Maintain relevant context
  4. Compose the final answer

That initial design covered the fundamental loop of processing user input and optionally calling tools. Now, let's take things further by adding more explicit state management and a multi-step processing loop. This approach helps your agent to:

  • Dynamically decide if it needs more information from the user
  • Run multiple steps of "thinking" before settling on a final answer
  • Exit gracefully when the maximum iterations are reached

You can also refine your prompting approach by exploring DSPy: Build Better AI Systems with Automated Prompt Optimization, allowing for more targeted instructions within these multiple iterations.

The new code still relies on the same functions and tool definitions from the last blog post. However, we've introduced an AgentState enum and a multi-step processing loop, allowing the AI agent to refine its answers.

Note: This is a continuation of our previous tutorial on building AI agents from scratch. If you haven't read it yet, we recommend starting there to understand the basics before diving into this advanced agent.

What Makes This Agent Better?

  1. State Management via Enum: We capture states like THINKING, DONE, ERROR, or even NEED_MORE_INFO for clarity. Instead of storing ephemeral states in local variables, an enum ensures we always know exactly where we are in the conversation cycle.

  2. Iteration Loop for Complex Reasoning: By repeating a think→act→evaluate cycle up to max_iterations times, we let our agent handle more elaborate tasks, verify partial results, and explore multiple reasoning paths.

  3. Interactive Follow-Ups: The agent can return a "need more info" response if the conversation context is missing something essential. This allows for more natural user–agent collaboration, instead of abruptly ending in an error message.

  4. Consistent Tooling: We still use the same tool descriptors from the first tutorial (query_database and search_wikipedia). This ensures that even though our agent logic has evolved, we don't have to reinvent existing capabilities.

Key Highlights of the Code

Below is a conceptual overview of our advanced agent: an enum for the agent's state, a multi-iteration loop, and a step-by-step approach within each iteration:

  1. AgentState Definition: We capture the possible states in an enum:

    • THINKING – The agent is still hypothesizing an answer or deciding on next steps
    • DONE – The agent has arrived at a final answer
    • ERROR – An error occurred or the agent couldn't continue
    • NEED_MORE_INFO – The agent requires additional clarifications from the user before proceeding
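
    In code, this is a small Enum subclass; the full listing later in this post uses exactly these four members:

    class AgentState(Enum):
        THINKING = "thinking"
        DONE = "done"
        ERROR = "error"
        NEED_MORE_INFO = "need_more_info"
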
  2. Loop with Max Iterations: Where the previous agent ended after a single pass, this advanced one allows for repeated attempts at answering:

    while self.iteration_count < self.max_iterations:
        # 1. Agent decides next step
        # 2. Possibly calls tools
        # 3. Handles results or asks user for more info

    This loop ensures the agent thoroughly processes or clarifies issues before finalizing an answer.

  3. Asking for Clarification: If, mid-conversation, the LLM realizes it can't proceed with the provided data, it can transition to NEED_MORE_INFO. When the user responds with more context, we resume the loop.

  4. Separation of "Think and Act": We keep the logic for choosing an action (TOOL_CALL, ASK_USER, FINAL_ANSWER) in a separate _think_and_act() method. This keeps your main loop clear, well-structured, and easy to debug. An example of the response format we prompt for follows below.
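
To make the parsing in _think_and_act() concrete, here is what a clarification request from the model might look like (the wording is illustrative; only the ACTION and CONTENT markers matter to the parser):

    THOUGHT: The user asked about "the database", but I don't know which table they mean.
    ACTION: ASK_USER
    CONTENT: Which table should I query to answer your question?

Note that TOOL_CALL actions are carried by the API's structured tool-call mechanism rather than the CONTENT section; CONTENT parsing only matters for ASK_USER and FINAL_ANSWER.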

Why Multi-Step Agents Are Useful

Imagine a user interacting with your AI about a complex database topic:

  1. In the initial iteration, the agent checks whether the conversation context already contains the necessary data. If not, it calls query_database.
  2. The tool call might produce partial or ambiguous data. The agent iterates once more, deciding whether to refine the query or ask the user a clarifying question.
  3. Only after processing multiple steps of context will your AI produce a final answer or realize more info is needed.

This advanced approach ensures the user is never stuck with a half-baked reply or a single-step error message. The multi-iteration loop fosters a natural back-and-forth that resembles real human conversation.

By building on the ideas from our previous blog post and reusing the same tools, you now have a more versatile AI agent. You keep the transparency of a framework-free approach for debugging and extensibility, while gaining iterative reasoning cycles.

For anyone serious about reliability—and not just one-shot answers—this is the next step in building robust AI systems.

Detailed Agent Implementation

Below is a fully working code example of an advanced multi-step agent. It uses an enum to define possible states, a loop to allow multiple "thinking" iterations, and a simple mechanism for asking the user for more info if needed. The code is heavily commented so you can copy it directly to your project and follow the explanatory notes.

Note: In the code example below, we use mock functions for the tools (query_database and search_wikipedia) together with minimal tool schemas, so the listing runs on its own. Please have a look at our previous blog post for the original tool definitions.

from enum import Enum
from typing import Any, Dict, Tuple
import json
import time

from openai import OpenAI

# Mocked tools for example purposes
def query_database(query: str) -> str:
    return json.dumps({"mock_result": f"Database queried with: {query}"})

def search_wikipedia(query: str) -> str:
    return json.dumps({"mock_result": f"Wikipedia searched for: {query}"})


# Minimal tool schemas for the mocks above, so this listing runs on its own.
# See the previous blog post for the original, full definitions.
tools = [
    {
        "type": "function",
        "function": {
            "name": "query_database",
            "description": "Run a query against the database.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_wikipedia",
            "description": "Search Wikipedia for a given topic.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]


class AgentState(Enum):
    THINKING = "thinking"
    DONE = "done"
    ERROR = "error"
    NEED_MORE_INFO = "need_more_info"


class Agent:
    def __init__(self, max_iterations: int = 5, think_time: float = 0.5):
        self.client = OpenAI()
        self.max_iterations = max_iterations
        self.think_time = think_time  # Time between iterations
        self.messages = []
        self.iteration_count = 0

    def process_with_loop(self, user_input: str) -> Dict:
        """
        Process user input with multiple iterations if needed.
        Returns both final answer and execution trace.
        """
        self.iteration_count = 0
        trace = []
        response = None  # Last response, reported if we hit the iteration cap

        # Initial prompt
        self.messages.append({"role": "user", "content": user_input})

        while self.iteration_count < self.max_iterations:
            self.iteration_count += 1

            try:
                # Get agent's thoughts and next action
                state, response = self._think_and_act()

                # Record this iteration
                trace.append(
                    {
                        "iteration": self.iteration_count,
                        "state": state.value,
                        "response": response,
                    }
                )

                # Handle different states
                if state == AgentState.DONE:
                    return {
                        "status": "success",
                        "answer": response,
                        "iterations": trace,
                        "iteration_count": self.iteration_count,
                    }

                elif state == AgentState.ERROR:
                    return {
                        "status": "error",
                        "error": response,
                        "iterations": trace,
                        "iteration_count": self.iteration_count,
                    }

                elif state == AgentState.NEED_MORE_INFO:
                    return {
                        "status": "need_more_info",
                        "question": response,
                        "iterations": trace,
                        "iteration_count": self.iteration_count,
                    }

                # Add thinking time between iterations
                time.sleep(self.think_time)

            except Exception as e:
                return {
                    "status": "error",
                    "error": str(e),
                    "iterations": trace,
                    "iteration_count": self.iteration_count,
                }

        return {
            "status": "max_iterations_reached",
            "iterations": trace,
            "iteration_count": self.iteration_count,
            "final_state": response,
        }

    def _think_and_act(self) -> Tuple[AgentState, str]:
        """
        Single iteration of thinking and acting.
        Returns state and response.
        """
        completion = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                *self.messages,
                {
                    "role": "system",
                    "content": f"""
                    This is iteration {self.iteration_count} of {self.max_iterations}.
                    Determine if you:
                    1. Need to use tools to gather more information
                    2. Need to ask the user for clarification
                    3. Have enough information to provide a final answer

                    Format your response as:
                    THOUGHT: your reasoning process
                    ACTION: TOOL_CALL or ASK_USER or FINAL_ANSWER
                    CONTENT: your tool call, question, or final answer
                    """,
                },
            ],
            tools=tools,
        )

        response = completion.choices[0].message
        self.messages.append(response)

        # Parse the response. Content can be None when the model only returns
        # structured tool calls, so fall back to an empty string.
        content = response.content or ""
        if response.tool_calls or "ACTION: TOOL_CALL" in content:
            # Handle tool calls through function calling
            if response.tool_calls:
                tool_results = []
                for tool_call in response.tool_calls:
                    result = self.execute_tool(tool_call)
                    tool_results.append(result)
                    self.messages.append(
                        {
                            "role": "tool",
                            "tool_call_id": tool_call.id,
                            "content": result,
                        }
                    )
                return AgentState.THINKING, "Executed tools: " + ", ".join(tool_results)

        elif "ACTION: ASK_USER" in content:
            # Extract question from CONTENT section
            question = content.split("CONTENT:")[1].strip()
            return AgentState.NEED_MORE_INFO, question

        elif "ACTION: FINAL_ANSWER" in content:
            # Extract final answer from CONTENT section
            answer = content.split("CONTENT:")[1].strip()
            return AgentState.DONE, answer

        return AgentState.ERROR, "Could not determine next action"

    def execute_tool(self, tool_call: Any) -> str:
        """
        Execute a tool based on the LLM's decision.

        Args:
            tool_call: The function call object from OpenAI's API

        Returns:
            str: JSON-formatted result of the tool execution
        """
        try:
            # Extract function details
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)

            # Log tool usage (helpful for debugging)
            print(f"Executing tool: {function_name} with args: {function_args}")

            # Execute the appropriate tool
            if function_name == "query_database":
                result = query_database(function_args["query"])
            elif function_name == "search_wikipedia":
                result = search_wikipedia(function_args["query"])
            else:
                result = json.dumps({"error": f"Unknown tool: {function_name}"})

            # Log tool result (helpful for debugging)
            print(f"Tool result: {result}")

            return result

        except json.JSONDecodeError:
            return json.dumps({"error": "Failed to parse tool arguments"})
        except Exception as e:
            return json.dumps({"error": f"Tool execution failed: {str(e)}"})


# Usage example:
def interact_with_agent():
    agent = Agent(max_iterations=5)

    while True:
        user_input = input("\nYour question (or 'quit' to exit): ")
        if user_input.lower() == "quit":
            break

        result = agent.process_with_loop(user_input)

        if result["status"] == "success":
            print(f"\nAnswer: {result['answer']}")
        elif result["status"] == "need_more_info":
            print(f"\nNeed more information: {result['question']}")
        else:
            print(f"\nError or max iterations reached: {result['status']}")

        # Optional: Show iteration trace
        print("\nExecution trace:")
        for step in result["iterations"]:
            print(f"\nIteration {step['iteration']} ({step['state']}):")
            print(step["response"])

To run the agent, simply call interact_with_agent(). You can then input questions or statements to see how the agent processes them over multiple iterations. The code is designed to be easily extendable with more complex tools, additional states, or advanced reasoning logic.
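
To give you a feel for the flow, here is what a short session might look like. The output below is hypothetical; the mock tools and the model's wording will differ on your machine:

    Your question (or 'quit' to exit): How many customers signed up last month?
    Executing tool: query_database with args: {'query': 'SELECT COUNT(*) FROM customers ...'}
    Tool result: {"mock_result": "Database queried with: SELECT COUNT(*) FROM customers ..."}

    Answer: According to the database, 42 customers signed up last month.

    Execution trace:

    Iteration 1 (thinking):
    Executed tools: {"mock_result": "Database queried with: SELECT COUNT(*) FROM customers ..."}

    Iteration 2 (done):
    According to the database, 42 customers signed up last month.

Because self.messages persists on the Agent instance, a follow-up answer to a "need more information" question simply continues the same conversation on the next call to process_with_loop().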

Explanation in Detail

  1. AgentState Enum: We define the possible states the agent can be in at the end of each iteration:

    • THINKING – The agent is still deciding its next step.
    • DONE – We have a final answer to deliver to the user.
    • ERROR – The agent encountered a problem that prevents further progress.
    • NEED_MORE_INFO – The agent cannot proceed without additional details from the user.
  2. process_with_loop(): This is the main method that handles multiple "thinking" iterations. It appends the user input to the conversation, calls _think_and_act() each time, checks the resulting state, and either asks for more info, returns a final answer, or stops when the maximum number of iterations is reached.

  3. _think_and_act(): Called within each iteration. We pass the conversation so far to the LLM, parse its "ACTION:" directive, and respond accordingly: calling the necessary tool, asking the user for more data, or returning a final answer.

  4. execute_tool(): If the LLM requests a TOOL_CALL, we parse out the arguments and run the corresponding Python function (for example, searching Wikipedia or querying a database). This method returns a JSON string with the results or an error message. A possible refactor for registering additional tools follows this list.

  5. interact_with_agent(): A basic read–eval–print loop (REPL) to demonstrate how the agent interacts with real user input. It feeds each user query into process_with_loop(), then displays the final or partial results, plus an optional iteration trace.
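
If you plan to add more tools over time, a small optional refactor (not part of the listing above) is to replace the if/elif chain in execute_tool() with a dispatch dictionary, so registering a new tool becomes a one-line change. A minimal sketch, assuming the functions from the listing above are in scope:

    # Hypothetical helper: map tool names to their implementations
    TOOL_REGISTRY = {
        "query_database": query_database,
        "search_wikipedia": search_wikipedia,
    }

    def dispatch_tool(function_name: str, function_args: dict) -> str:
        handler = TOOL_REGISTRY.get(function_name)
        if handler is None:
            return json.dumps({"error": f"Unknown tool: {function_name}"})
        return handler(**function_args)

Inside execute_tool(), the whole if/elif block then collapses to a single call: result = dispatch_tool(function_name, function_args).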

By following the above code and commentary, you can incorporate additional logic, such as real tool integrations, retry handling, or more advanced planning. The key benefit is flexibility: you control the logic, error handling, and iteration loops rather than having them dictated by a large framework.

For an even deeper dive into integrating external knowledge, check out How to: RAG with Azure AI Search. It shows how connecting additional data sources can bolster your agent's accuracy and context.

What's next?

While this agent already gives you a lot of flexibility and a solid starting point for more complex AI agents, there are still ways to improve it further. One of the most prominent is adding 'memory', meaning the agent can remember previous interactions and use them to improve its answers. This will be covered in the next tutorial.

