Agentic RAG
Traditional RAG pipelines lose context when chunking documents and often retrieve irrelevant results. This guide covers two techniques that address these problems: Contextual RAG for smarter ingestion, and Agentic Retrieval for more accurate search.
1. Ingestion: Contextual RAG (Anthropic, Sep 2024)
Parsing PDFs to text is notoriously painful: tables, charts, and complex layouts break most pipelines. OCR fails. LlamaParse (paid) fails. What actually works: VLMs (GPT-4o, Gemini-2.5), or better yet, Agentic Document Extraction (Andrew Ng @ Landing.AI). Once you have clean text, the contextual step itself is simple: have an LLM write a short blurb situating each chunk within the full document, and prepend it to the chunk before embedding.
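Here is a minimal sketch of that step, assuming the document is already parsed to text. The prompt wording, the gpt-4o-mini choice, and the contextualize helper are my own assumptions; the pattern follows the Anthropic article.

```python
from openai import OpenAI

client = OpenAI()  # any chat-completion API works here

CONTEXT_PROMPT = """<document>
{document}
</document>
Here is a chunk from the document above:
<chunk>
{chunk}
</chunk>
Write 1-2 sentences situating this chunk within the overall document,
to improve search retrieval of the chunk. Answer with only the context."""

def contextualize(document: str, chunks: list[str]) -> list[str]:
    """Prepend a document-aware context blurb to each chunk before embedding."""
    enriched = []
    for chunk in chunks:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: any cheap model; caching the document prompt helps
            messages=[{"role": "user",
                       "content": CONTEXT_PROMPT.format(document=document, chunk=chunk)}],
        )
        enriched.append(resp.choices[0].message.content.strip() + "\n\n" + chunk)
    return enriched
```

The enriched chunks then go through your usual embedding/indexing; nothing else in the pipeline changes.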
2. Agentic Retrieval
Despite the fancy name, "agentic" retrieval is just a loop with three tools, and the implementation takes about 10 lines (see the sketches after this list):
- Search: Return top 50–100 results with preview text only (not full content—just like Google snippets).
- Fetch: Review the previews, then "click" to retrieve full content for relevant results.
- Extending-read: Solves the "cut-in-the-middle" chunking problem by reading adjacent pages until the LLM determines it has enough context.
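A minimal sketch of the three tools, under toy assumptions: a naive keyword-overlap scorer stands in for a real vector/BM25 index, and each chunk id maps back to a page number so extending-read can walk neighbors. The names here (PAGES, CHUNKS, the signatures) are mine, not from any library.

```python
PAGES: dict[int, str] = {}               # page number -> full page text
CHUNKS: dict[str, tuple[int, str]] = {}  # chunk id -> (page number, chunk text)

def search(query: str, k: int = 50) -> list[dict]:
    """Top-k hits as id + preview only (Google-snippet style), never full content."""
    terms = set(query.lower().split())
    ranked = sorted(CHUNKS.items(),
                    key=lambda item: -len(terms & set(item[1][1].lower().split())))
    return [{"id": cid, "preview": text[:200]} for cid, (_, text) in ranked[:k]]

def fetch(chunk_id: str) -> str:
    """'Click' a promising preview: return the chunk's full text."""
    return CHUNKS[chunk_id][1]

def extend_read(chunk_id: str, direction: int = 1) -> str:
    """Read the adjacent page (+1 = next, -1 = previous) to fix chunks cut mid-thought."""
    page, _ = CHUNKS[chunk_id]
    return PAGES.get(page + direction, "")
```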
Note: the retrieval agent (R-Agent) runs as a separate LLM instance with its own context. This keeps the retrieval process isolated from your main LLM's context.
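And the loop itself, sketched under the same assumptions and reusing the three tools above. The isolation is nothing exotic: the R-Agent gets a fresh messages list the caller never sees, and only its final answer flows back to the main LLM. Model name and prompts are mine.

```python
import json
from openai import OpenAI

client = OpenAI()

def spec(name: str, desc: str, params: dict) -> dict:
    """Build an OpenAI-style function-tool schema."""
    return {"type": "function",
            "function": {"name": name, "description": desc,
                         "parameters": {"type": "object", "properties": params,
                                        "required": list(params)}}}

TOOL_SPECS = [
    spec("search", "Search the index; returns ids and previews only.",
         {"query": {"type": "string"}}),
    spec("fetch", "Return the full text of one chunk.",
         {"chunk_id": {"type": "string"}}),
    spec("extend_read", "Return the page adjacent to a chunk.",
         {"chunk_id": {"type": "string"}}),
]
TOOLS = {"search": search, "fetch": fetch, "extend_read": extend_read}

def retrieve(question: str, max_steps: int = 10) -> str:
    # Fresh history: the R-Agent's browsing noise never leaks into the caller's context.
    messages = [{"role": "system",
                 "content": "Use the tools to gather enough context, then answer."},
                {"role": "user", "content": question}]
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=TOOL_SPECS)
        msg = resp.choices[0].message
        if not msg.tool_calls:    # the agent decided it has enough context
            return msg.content    # only this compact answer reaches the main LLM
        messages.append(msg)
        for call in msg.tool_calls:
            result = TOOLS[call.function.name](**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
    return "retrieval budget exhausted"
```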
3. Notes
- The original Contextual RAG article combines vector search + BM25 + Cohere reranking for solid performance; I'm just in the mood for playing around.
- "Agentic," "vibe coding," "context engineering"—buzzwords I dislike, even when technically accurate.
- There are some even more impressive techniques at https://pageindex.ai/blog, but that's for another weekend.
Conclusion
The core idea is simple: give the LLM enough context to retrieve intelligently, rather than relying on naive similarity matching. Contextual ingestion embeds document-level understanding into each chunk. Agentic retrieval lets the model browse results like a human would. Both techniques are straightforward to implement—the hard part is parsing documents cleanly in the first place.