Advanced Self-Made Agent: Building a Smarter, Multi-Step AI Agent

In our previous blog post on creating AI agents without frameworks, we highlighted the simplicity of designing an agent by directly exposing tools like query_database and search_wikipedia. We showcased how your agent only needs to:

  1. Understand available tools
  2. Decide when and how to use them
  3. Maintain relevant context
  4. Compose the final answer

That initial design covered the fundamental loop of processing user input and optionally calling tools. Now, let's take things further by adding more explicit state management and a multi-step processing loop. This approach helps your agent to:

  • Dynamically decide if it needs more information from the user
  • Run multiple steps of "thinking" before settling on a final answer
  • Exit gracefully when the maximum iterations are reached

You can also refine your prompting approach by exploring DSPy: Build Better AI Systems with Automated Prompt Optimization, allowing for more targeted instructions within these multiple iterations.

The new code still relies on the same functions and tool definitions from the last blog post. However, we've introduced an AgentState enum and a multi-step processing loop, allowing the AI agent to refine its answers.

Note: This is a continuation of our previous tutorial on building AI agents from scratch. If you haven't read it yet, we recommend starting there to understand the basics before diving into this advanced agent.

What Makes This Agent Better?

  1. State Management via Enum: We capture states like THINKING, DONE, ERROR, or even NEED_MORE_INFO for clarity. Instead of storing ephemeral states in local variables, an enum ensures we always know exactly where we are in the conversation cycle.

  2. Iteration Loop for Complex Reasoning: By repeating a think→act→evaluate cycle up to max_iterations times, we let our agent handle more elaborate tasks, verify partial results, and explore multiple reasoning paths.

  3. Interactive Follow-Ups: The agent can return a "need more info" response if the conversation context is missing something essential. This allows for more natural user–agent collaboration, instead of abruptly ending in an error message.

  4. Consistent Tooling: We still use the same tool descriptors from the first tutorial (query_database and search_wikipedia). This ensures that even though our agent logic has evolved, we don't have to reinvent existing capabilities.

Key Highlights of the Code

Below is a conceptual overview of our advanced agent: an enum for the agent's state, a multi-iteration loop, and a step-by-step approach within each iteration:

  1. AgentState Definition: We capture the possible states in an enum:

    • THINKING – The agent is still hypothesizing an answer or deciding on next steps
    • DONE – The agent has arrived at a final answer
    • ERROR – An error occurred or the agent couldn't continue
    • NEED_MORE_INFO – The agent requires additional clarifications from the user before proceeding
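
    In code, this is a small Enum subclass; the full listing later in this post uses exactly these four members:

    class AgentState(Enum):
        THINKING = "thinking"
        DONE = "done"
        ERROR = "error"
        NEED_MORE_INFO = "need_more_info"
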
  2. Loop with Max Iterations: Where the previous agent ended after a single pass, this advanced one allows for repeated attempts at answering:

    while self.iteration_count < self.max_iterations:
        # 1. Agent decides next step
        # 2. Possibly calls tools
        # 3. Handles results or asks user for more info

    This loop ensures the agent thoroughly processes or clarifies issues before finalizing an answer.

  3. Asking for Clarification: If, mid-conversation, the LLM realizes it can't proceed with the provided data, it can transition to NEED_MORE_INFO. When the user responds with more context, we resume the loop.

  4. Separation of "Think and Act": We keep the logic for choosing an action (TOOL_CALL, ASK_USER, FINAL_ANSWER) in a separate _think_and_act() method. This keeps your main loop clear, well-structured, and easy to debug. An example of the response format we prompt for follows below.
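
To make the parsing in _think_and_act() concrete, here is what a clarification request from the model might look like (the wording is illustrative; only the ACTION and CONTENT markers matter to the parser):

    THOUGHT: The user asked about "the database", but I don't know which table they mean.
    ACTION: ASK_USER
    CONTENT: Which table should I query to answer your question?

Note that TOOL_CALL actions are carried by the API's structured tool-call mechanism rather than the CONTENT section; CONTENT parsing only matters for ASK_USER and FINAL_ANSWER.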

Why Multi-Step Agents Are Useful

Imagine a user interacting with your AI about a complex database topic:

  1. In the initial iteration, the agent checks whether the conversation context already contains the necessary data. If not, it calls query_database.
  2. The tool call might produce partial or ambiguous data. The agent iterates once more, deciding whether to refine the query or ask the user a clarifying question.
  3. Only after processing multiple steps of context will your AI produce a final answer or realize more info is needed.

This advanced approach ensures the user is never stuck with a half-baked reply or a single-step error message. The multi-iteration loop fosters a natural back-and-forth that resembles real human conversation.

By building on the ideas from our previous blog post and reusing the same tools, you now have a more versatile AI agent. You keep the transparency of a framework-free approach for debugging and extensibility, while gaining iterative reasoning cycles.

For anyone serious about reliability—and not just one-shot answers—this is the next step in building robust AI systems.

Detailed Agent Implementation

Below is a fully working code example of an advanced multi-step agent. It uses an enum to define possible states, a loop to allow multiple "thinking" iterations, and a simple mechanism for asking the user for more info if needed. The code is heavily commented so you can copy it directly to your project and follow the explanatory notes.

Note: In the code example below, we use mock functions for the tools (query_database and search_wikipedia) together with minimal tool schemas, so the listing runs on its own. Please have a look at our previous blog post for the original tool definitions.

from enum import Enum
from typing import Any, Dict, Tuple
import json
import time

from openai import OpenAI

# Mocked tools for example purposes
def query_database(query: str) -> str:
    return json.dumps({"mock_result": f"Database queried with: {query}"})

def search_wikipedia(query: str) -> str:
    return json.dumps({"mock_result": f"Wikipedia searched for: {query}"})


# Minimal tool schemas for the mocks above, so this listing runs on its own.
# See the previous blog post for the original, full definitions.
tools = [
    {
        "type": "function",
        "function": {
            "name": "query_database",
            "description": "Run a query against the database.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_wikipedia",
            "description": "Search Wikipedia for a given topic.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]


class AgentState(Enum):
    THINKING = "thinking"
    DONE = "done"
    ERROR = "error"
    NEED_MORE_INFO = "need_more_info"


class Agent:
    def __init__(self, max_iterations: int = 5, think_time: float = 0.5):
        self.client = OpenAI()
        self.max_iterations = max_iterations
        self.think_time = think_time  # Time between iterations
        self.messages = []
        self.iteration_count = 0

    def process_with_loop(self, user_input: str) -> Dict:
        """
        Process user input with multiple iterations if needed.
        Returns both final answer and execution trace.
        """
        self.iteration_count = 0
        trace = []
        response = None  # Last response, reported if we hit the iteration cap

        # Initial prompt
        self.messages.append({"role": "user", "content": user_input})

        while self.iteration_count < self.max_iterations:
            self.iteration_count += 1

            try:
                # Get agent's thoughts and next action
                state, response = self._think_and_act()

                # Record this iteration
                trace.append(
                    {
                        "iteration": self.iteration_count,
                        "state": state.value,
                        "response": response,
                    }
                )

                # Handle different states
                if state == AgentState.DONE:
                    return {
                        "status": "success",
                        "answer": response,
                        "iterations": trace,
                        "iteration_count": self.iteration_count,
                    }

                elif state == AgentState.ERROR:
                    return {
                        "status": "error",
                        "error": response,
                        "iterations": trace,
                        "iteration_count": self.iteration_count,
                    }

                elif state == AgentState.NEED_MORE_INFO:
                    return {
                        "status": "need_more_info",
                        "question": response,
                        "iterations": trace,
                        "iteration_count": self.iteration_count,
                    }

                # Add thinking time between iterations
                time.sleep(self.think_time)

            except Exception as e:
                return {
                    "status": "error",
                    "error": str(e),
                    "iterations": trace,
                    "iteration_count": self.iteration_count,
                }

        return {
            "status": "max_iterations_reached",
            "iterations": trace,
            "iteration_count": self.iteration_count,
            "final_state": response,
        }

    def _think_and_act(self) -> Tuple[AgentState, str]:
        """
        Single iteration of thinking and acting.
        Returns state and response.
        """
        completion = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                *self.messages,
                {
                    "role": "system",
                    "content": f"""
                    This is iteration {self.iteration_count} of {self.max_iterations}.
                    Determine if you:
                    1. Need to use tools to gather more information
                    2. Need to ask the user for clarification
                    3. Have enough information to provide a final answer

                    Format your response as:
                    THOUGHT: your reasoning process
                    ACTION: TOOL_CALL or ASK_USER or FINAL_ANSWER
                    CONTENT: your tool call, question, or final answer
                    """,
                },
            ],
            tools=tools,
        )

        response = completion.choices[0].message
        self.messages.append(response)

        # Parse the response. Content can be None when the model only returns
        # structured tool calls, so fall back to an empty string.
        content = response.content or ""
        if response.tool_calls or "ACTION: TOOL_CALL" in content:
            # Handle tool calls through function calling
            if response.tool_calls:
                tool_results = []
                for tool_call in response.tool_calls:
                    result = self.execute_tool(tool_call)
                    tool_results.append(result)
                    self.messages.append(
                        {
                            "role": "tool",
                            "tool_call_id": tool_call.id,
                            "content": result,
                        }
                    )
                return AgentState.THINKING, "Executed tools: " + ", ".join(tool_results)

        elif "ACTION: ASK_USER" in content:
            # Extract question from CONTENT section
            question = content.split("CONTENT:")[1].strip()
            return AgentState.NEED_MORE_INFO, question

        elif "ACTION: FINAL_ANSWER" in content:
            # Extract final answer from CONTENT section
            answer = content.split("CONTENT:")[1].strip()
            return AgentState.DONE, answer

        return AgentState.ERROR, "Could not determine next action"

    def execute_tool(self, tool_call: Any) -> str:
        """
        Execute a tool based on the LLM's decision.

        Args:
            tool_call: The function call object from OpenAI's API

        Returns:
            str: JSON-formatted result of the tool execution
        """
        try:
            # Extract function details
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)

            # Log tool usage (helpful for debugging)
            print(f"Executing tool: {function_name} with args: {function_args}")

            # Execute the appropriate tool
            if function_name == "query_database":
                result = query_database(function_args["query"])
            elif function_name == "search_wikipedia":
                result = search_wikipedia(function_args["query"])
            else:
                result = json.dumps({"error": f"Unknown tool: {function_name}"})

            # Log tool result (helpful for debugging)
            print(f"Tool result: {result}")

            return result

        except json.JSONDecodeError:
            return json.dumps({"error": "Failed to parse tool arguments"})
        except Exception as e:
            return json.dumps({"error": f"Tool execution failed: {str(e)}"})


# Usage example:
def interact_with_agent():
    agent = Agent(max_iterations=5)

    while True:
        user_input = input("\nYour question (or 'quit' to exit): ")
        if user_input.lower() == "quit":
            break

        result = agent.process_with_loop(user_input)

        if result["status"] == "success":
            print(f"\nAnswer: {result['answer']}")
        elif result["status"] == "need_more_info":
            print(f"\nNeed more information: {result['question']}")
        else:
            print(f"\nError or max iterations reached: {result['status']}")

        # Optional: Show iteration trace
        print("\nExecution trace:")
        for step in result["iterations"]:
            print(f"\nIteration {step['iteration']} ({step['state']}):")
            print(step["response"])

To run the agent, simply call interact_with_agent(). You can then input questions or statements to see how the agent processes them over multiple iterations. The code is designed to be easily extendable with more complex tools, additional states, or advanced reasoning logic.
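
To give you a feel for the flow, here is what a short session might look like. The output below is hypothetical; the mock tools and the model's wording will differ on your machine:

    Your question (or 'quit' to exit): How many customers signed up last month?
    Executing tool: query_database with args: {'query': 'SELECT COUNT(*) FROM customers ...'}
    Tool result: {"mock_result": "Database queried with: SELECT COUNT(*) FROM customers ..."}

    Answer: According to the database, 42 customers signed up last month.

    Execution trace:

    Iteration 1 (thinking):
    Executed tools: {"mock_result": "Database queried with: SELECT COUNT(*) FROM customers ..."}

    Iteration 2 (done):
    According to the database, 42 customers signed up last month.

Because self.messages persists on the Agent instance, a follow-up answer to a "need more information" question simply continues the same conversation on the next call to process_with_loop().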

Explanation in Detail

  1. AgentState Enum: We define the possible states the agent can be in at the end of each iteration:

    • THINKING – The agent is still deciding its next step.
    • DONE – We have a final answer to deliver to the user.
    • ERROR – The agent encountered a problem that prevents further progress.
    • NEED_MORE_INFO – The agent cannot proceed without additional details from the user.
  2. process_with_loop(): This is the main method that handles multiple "thinking" iterations. It appends the user input to the conversation, calls _think_and_act() each time, checks the resulting state, and either asks for more info, returns a final answer, or stops when the maximum number of iterations is reached.

  3. _think_and_act(): Called within each iteration. We pass the conversation so far to the LLM, parse its "ACTION:" directive, and respond accordingly: calling the necessary tool, asking the user for more data, or returning a final answer.

  4. execute_tool(): If the LLM requests a TOOL_CALL, we parse out the arguments and run the corresponding Python function (for example, searching Wikipedia or querying a database). This method returns a JSON string with the results or an error message. A possible refactor for registering additional tools follows this list.

  5. interact_with_agent(): A basic read–eval–print loop (REPL) to demonstrate how the agent interacts with real user input. It feeds each user query into process_with_loop(), then displays the final or partial results, plus an optional iteration trace.
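
If you plan to add more tools over time, a small optional refactor (not part of the listing above) is to replace the if/elif chain in execute_tool() with a dispatch dictionary, so registering a new tool becomes a one-line change. A minimal sketch, assuming the functions from the listing above are in scope:

    # Hypothetical helper: map tool names to their implementations
    TOOL_REGISTRY = {
        "query_database": query_database,
        "search_wikipedia": search_wikipedia,
    }

    def dispatch_tool(function_name: str, function_args: dict) -> str:
        handler = TOOL_REGISTRY.get(function_name)
        if handler is None:
            return json.dumps({"error": f"Unknown tool: {function_name}"})
        return handler(**function_args)

Inside execute_tool(), the whole if/elif block then collapses to a single call: result = dispatch_tool(function_name, function_args).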

By following the above code and commentary, you can incorporate additional logic, such as real tool integrations, retry handling, or more advanced planning. The key benefit is flexibility: you control the logic, error handling, and iteration loops rather than having them dictated by a large framework.

For an even deeper dive into integrating external knowledge, check out How to: RAG with Azure AI Search. It shows how connecting additional data sources can bolster your agent's accuracy and context.

What's next?

While this agent already gives you a lot of flexibility and a solid starting point for more complex AI agents, there are still ways to improve it further. One of the most prominent is adding 'memory', meaning the agent can remember previous interactions and use them to improve its answers. This will be covered in the next tutorial.

