Advanced RAG: Recursive Retrieval with llamaindex
When it comes to Retrieval Augmented Generation (RAG), the quality of both the created document index and the retrieval process is crucial for getting good and consistent answers based on your documentation. One especially challenging aspect is how to model relationships between text chunks of your documents.
As a quick reminder, in text-based RAG, documentation is first parsed into text, which is then divided into smaller chunks. We need to divide the full text into smaller portions, as LLMs have a maximum input length. Additionally, they are charged per token, so we want to keep the input as short as possible. These chunks are then indexed and used for retrieval during the generation process.
Herein lies the challenge: If we divide the text into smaller chunks and then use these chunks for retrieval, how can we make sure to retrieve all relevant information, which might be scattered across multiple chunks? This challenge is even more pronounced when the text contains tables and complex structures - as tables mostly need different handling than flowing text. How do we capture the relationship between a table and its accompanying text?
That's where recursive retrieval comes into play. Recursive retrieval allows RAG to generate more coherent and contextually relevant responses by recursively retrieving and incorporating relevant information from retrieved document nodes.
In this guide, we will introduce you to the concept of recursive retrieval and demonstrate it hands-on by using llamaindex.
Note: This guide is heavily influenced by this very good tutorial from llamaindex. We'll add some additional context and explanation to make it more accessible - but full credit goes to the llamaindex team.
What is Recursive Retrieval?
To understand why recursive retrieval is such a powerful concept, let's look at it in detail. During normal retrieval, we use the user query to find potentially relevant documents - the ones our LLM needs in order to answer that query. This is mostly done by comparing the semantic meaning of the user query with the semantic meaning of the documents in our index. (This sentence is not 100% correct, as we simply compare embeddings of the query and the documents - which is not exactly the semantic meaning - but a good enough approximation for now).
When looking at how we create these documents, we can see that we divide the full texts of our source documents into smaller chunks, which we then index. This is done to make sure that we can retrieve relevant information from our documents, even if the full document is too long to be processed by our LLM model.
However, this approach has a downside: If the relevant information is spread across multiple chunks, we might not be able to retrieve all relevant information with a single retrieval. If we look at tables, for example, oftentimes the 'semantic meaning' of a table is not captured by the table itself, but by the text surrounding it.
Recursive retrieval solves this problem by recursively looking at not only the semantically most similar documents, but also document chunks which might be related to these documents. This way, we can make sure to capture all relevant information, even if it is spread across multiple chunks.
This means recursive retrieval consists of two main components:
- A way to identify relationships between document chunks
- A way to recursively retrieve related document chunks
While there are multiple ways to implement recursive retrieval, we will focus on how to implement it with llamaindex, as it provides a proven implementation of recursive retrieval (and is great for RAG in general).
What is llamaindex?
Llamaindex is a library, available in Python and TypeScript, for building LLM applications in the area of "Context Augmentation" (which basically means RAG). It provides tools for indexing documents, retrieving relevant documents and document chunks, and generating answers based on the retrieved documents.
More specifically, llamaindex provides these main components:
- Data connectors to ingest existing data from their native source and format. These could be APIs, PDFs, SQL, and (much) more.
- Data indexes to structure your data in intermediate representations that are easy and performant for LLMs to consume.
- Engines provide natural language access to your data. For example:
- Query engines are powerful retrieval interfaces for knowledge-augmented output.
- Chat engines are conversational interfaces for multi-message, “back and forth” interactions with your data.
- Data agents are LLM-powered knowledge workers augmented by tools, from simple helper functions to API integrations and more.
- Application integrations tie llamaindex back into the rest of your ecosystem. These could be LangChain, Flask, Docker, ChatGPT, or many others.
More information about llamaindex can be found in their absolutely brilliant documentation.
How to Implement Recursive Retrieval with llamaindex
The main tools required to implement recursive retrieval with llamaindex are Data Indexes and Query Engines. Before getting stuck in theory, let's directly jump into a hands-on example.
Before getting started, you can download the example data from our website.
To use camelot to extract tables from PDFs, we first need the following system dependencies:
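Camelot relies on Ghostscript (and Tkinter) being available on the system. On a Debian/Ubuntu machine this could look like the following - the package names are an assumption and will differ on other operating systems:

```bash
# System dependencies for camelot (names assume Debian/Ubuntu - adjust for your OS)
sudo apt-get update
sudo apt-get install -y ghostscript python3-tk
```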
Then, we need to install llamaindex and its dependencies:
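A minimal installation could look like this - treat the exact package list as a starting point rather than a definitive requirements file:

```bash
# Remove a potentially outdated pre-0.10 installation first (see the note below)
pip uninstall -y llama-index

# llamaindex plus the libraries used in this guide for PDF parsing and table extraction
pip install -U llama-index pymupdf camelot-py opencv-python pandas
```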
Note: There was quite a big update from llamaindex 0.9 to 0.10. Best to remove the old version and then install the latest version.
Next, we can import the required libraries and define which OpenAI models we want to use. Change the OPENAI_API_KEY to your own API key.
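The snippet below is a minimal sketch assuming the llamaindex 0.10+ package layout; the model names are example choices, not a recommendation from the original tutorial:

```python
import os

from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Replace the placeholder with your own API key (or set the variable outside the script)
os.environ["OPENAI_API_KEY"] = "sk-..."

# Example model choices - any OpenAI chat and embedding model will do
Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
```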
Optionally, you can set up debug logging to see exactly which prompts llamaindex is sending to the LLM and which responses it gets back.
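One way to do this is via Python's standard logging module - a small sketch:

```python
import logging
import sys

# DEBUG level prints the full prompts sent to the LLM and the raw responses
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
```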
Now we are ready to extract the text from our PDFs. Note that this does not load the tables in the file as real tabular data, but just as plain text. We'll see how to handle tables better in the next step.
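A sketch using the PyMuPDFReader bundled with llamaindex - the file path is a placeholder for the example data you downloaded:

```python
from llama_index.readers.file import PyMuPDFReader

# Placeholder path - point this at the downloaded example PDF
pdf_path = "data/example.pdf"

reader = PyMuPDFReader()
docs = reader.load_data(file_path=pdf_path)

print(len(docs))         # one document object per page
print(docs[0].metadata)  # page number, file path, etc.
```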
The docs object now contains the text of the PDF as well as some metadata like page numbers. As you can see, llamaindex makes it really easy to load documents and extract the text from them.
As mentioned above, while we already got the text from the tables in the PDF, this method of simply parsing tables as text is not ideal and often misses important information in the tables - mainly because the PDF standard does not define a table as a specific object; a table is just text and lines. Normal text parsers have a hard time extracting this information.
However, there is a tool called camelot which is specifically designed to recognize tables in PDFs and extract them as tabular data - like a pandas dataframe.
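A minimal sketch of the extraction step - the file path and page numbers are assumptions about the example document, so point them at the pages that actually contain tables:

```python
import camelot

def get_tables(path, pages):
    """Extract the first table found on each of the given pages as a pandas dataframe."""
    table_dfs = []
    for page in pages:
        tables = camelot.read_pdf(path, pages=str(page))
        if len(tables) > 0:
            table_dfs.append(tables[0].df)
    return table_dfs

# Placeholder path and page numbers - adjust to your PDF
table_dfs = get_tables("data/example.pdf", pages=[3, 25])
print(table_dfs[0].head())
```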
The above snippet extracts the tables from the PDF and stores them in a list of pandas dataframes. We can now use these dataframes to create a more structured representation of the tables in our index.
Ok, so far we have extracted the information from our source document - but how can we make it accessible to our LLM? Meaning - how can we search for relevant information at query time? This is where llamaindex' QueryEngine comes into play. It abstracts the data and provides an interface to "connect" these data to an LLM. Using our parsed documents and asking questions against them is as easy as the following lines of code:
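(A sketch - the import path assumes an early llamaindex 0.10 release; in newer versions, PandasQueryEngine has moved to the separate llama-index-experimental package. The question is a placeholder.)

```python
from llama_index.core.query_engine import PandasQueryEngine

# One query engine per extracted table
df_query_engines = [PandasQueryEngine(df) for df in table_dfs]

# Example question - ask whatever fits the content of your table
response = df_query_engines[0].query("How many rows does this table contain?")
print(str(response))
```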
There are multiple query engines for various data sources, like SQL, CSV, and more. The PandasQueryEngine is specifically designed to work with pandas dataframes. It works as follows:
- During query time, the query engine sends the user query along with the output of df.head() to the LLM. The LLM is asked to return Python code that answers the user's question.
- The returned code is then executed against the dataframe, and the result is used to formulate the response.
This is quite powerful, as the LLM can therefore indirectly work with the data in the dataframe - without needing to see the whole dataframe.
Now that we know how to query the tabular data, we can link these table data to the flowing text. For that, we are going to build a VectorStoreIndex, which is a special index that can store and retrieve document chunks based on their semantic similarity. Before diving into the code, let's outline the strategy.
Llamaindex uses "Nodes" to represent the data in the index. These nodes can have relationships to other nodes. For example, a node representing the full text of a document can have relationships to nodes representing the tables in the document. Therefore, we can do something like this:
- Create a node for each of the tables, with either a short description or - better - related text, so that we can retrieve them based on the user query.
- Create nodes from the textual data of the PDF.
- Combine the nodes of the tables and the nodes of the textual data into one index.
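Put into code, this could look roughly like the following sketch - the table summaries are hand-written placeholders that you would adapt to your own tables:

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import IndexNode

# Short, manually written descriptions of each table (placeholders for your data)
summaries = [
    "This table contains the key figures discussed in the first section of the document.",
    "This table contains the yearly statistics referenced later in the document.",
]

# One IndexNode per table; the index_id later links the node to its PandasQueryEngine
df_nodes = [
    IndexNode(text=summary, index_id=f"pandas{idx}")
    for idx, summary in enumerate(summaries)
]

# Chunk the flowing text of the PDF into nodes
doc_nodes = SentenceSplitter().get_nodes_from_documents(docs)

# One index over both the text nodes and the table nodes
vector_index = VectorStoreIndex(doc_nodes + df_nodes)
vector_retriever = vector_index.as_retriever(similarity_top_k=1)
```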
Note: In the example above, we manually describe the table nodes. In a real-world scenario, you would probably want to extract this information automatically by sending parts of the tables to an LLM and asking it to describe the table. Or alternatively, use the table surrounding text to describe the table.
Now we have a VectorStoreIndex which contains the nodes of the tables and the nodes of the textual data. We can now use this index to create a RecursiveRetriever and a RetrieverQueryEngine to query the index. Using the latter, we again get a handy interface to ask questions via LLM.
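A sketch of wiring everything together - the "pandas{idx}" ids must match the index_id values of the table nodes created above:

```python
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import RecursiveRetriever

# Map each table node's id to the query engine of the corresponding dataframe
df_id_query_engine_mapping = {
    f"pandas{idx}": engine for idx, engine in enumerate(df_query_engines)
}

recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever},
    query_engine_dict=df_id_query_engine_mapping,
    verbose=True,
)

# Optional: synthesize a nicer final answer instead of returning raw retrieved text
response_synthesizer = get_response_synthesizer(response_mode="compact")

query_engine = RetrieverQueryEngine.from_args(
    recursive_retriever, response_synthesizer=response_synthesizer
)
```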
Note: In the example above, we use a response_synthesizer to make the response of the LLM nicer. This is optional and can be omitted. More information about the response_synthesizer can be found here.
To use the interface, we just call the query method.
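For example (the question is, of course, specific to your own document):

```python
response = query_engine.query(
    "What do the figures in the table say about the topic discussed in the text?"
)
print(str(response))
```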
And that's it! We have now implemented recursive retrieval with llamaindex.
Conclusion
In conclusion, RAG enhanced by recursive retrieval and llamaindex offers a significant leap forward in how we approach information retrieval and generation tasks. This guide has walked you through the complexities and intricacies of breaking down documentation into manageable chunks, the challenges in ensuring comprehensive information retrieval, and the innovative solution that recursive retrieval presents. By implementing this with llamaindex, we demonstrated not just a theoretical concept but a practical application that can be integrated into your projects to enhance the accuracy and contextuality of responses.
The journey from understanding the limitations of traditional retrieval methods to executing a hands-on example with llamaindex highlights the transformative potential of recursive retrieval in AI-driven applications. This technology allows us to capture and utilize scattered information across multiple document chunks, ensuring that even the most complex queries are answered with the highest degree of relevance and completeness.
As we continue to push the boundaries of what's possible with AI and machine learning, the integration of recursive retrieval and llamaindex into RAG processes represents a significant step towards more intelligent, efficient, and context-aware systems. Whether you're a developer, a researcher, or an enthusiast, the advancements discussed in this guide open new avenues for exploration and innovation in the field of artificial intelligence.
We encourage you to dive deeper into the concepts, experiment with the code samples provided, and consider how recursive retrieval can be applied to your own projects. The possibilities are as limitless as the knowledge that fuels them. With tools like llamaindex and the power of recursive retrieval, the future of AI looks more promising and exciting than ever.
Further Reading
- Increase RAG performance using ColBERT reranker
- How to test your RAG pipeline?
- Semantic search for databases
Interested in how to train your very own Large Language Model?
We prepared a well-researched guide on how to use the latest advancements in Open Source technology to fine-tune your own LLM. This has many advantages, like:
- Cost control
- Data privacy
- Excellent performance - adjusted specifically for your intended use