Agentic RAG
Traditional RAG pipelines lose context when chunking documents and often retrieve irrelevant results. This guide covers two techniques that address these problems: Contextual RAG for smarter ingestion, and Agentic Retrieval for more accurate search.
1. Ingestion: Contextual RAG (Anthropic, Sep 2024)
Parsing PDFs to text is notoriously painful: tables, charts, and complex layouts break most pipelines. OCR fails. LlamaParse (paid) fails. What actually works: VLMs (GPT-4o, Gemini-2.5), or better yet, Agentic Document Extraction (Andrew Ng @ Landing.AI). Once you have clean text, the contextual step itself is simple: have an LLM write a short blurb situating each chunk within the full document, and prepend it to the chunk before embedding.
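Here is a minimal sketch of that step, assuming the document is already parsed to text. The prompt wording, the gpt-4o-mini choice, and the contextualize helper are my own assumptions; the pattern follows the Anthropic article.

```python
from openai import OpenAI

client = OpenAI()  # any chat-completion API works here

CONTEXT_PROMPT = """<document>
{document}
</document>
Here is a chunk from the document above:
<chunk>
{chunk}
</chunk>
Write 1-2 sentences situating this chunk within the overall document,
to improve search retrieval of the chunk. Answer with only the context."""

def contextualize(document: str, chunks: list[str]) -> list[str]:
    """Prepend a document-aware context blurb to each chunk before embedding."""
    enriched = []
    for chunk in chunks:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: any cheap model; caching the document prompt helps
            messages=[{"role": "user",
                       "content": CONTEXT_PROMPT.format(document=document, chunk=chunk)}],
        )
        enriched.append(resp.choices[0].message.content.strip() + "\n\n" + chunk)
    return enriched
```

The enriched chunks then go through your usual embedding/indexing; nothing else in the pipeline changes.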
2. Agentic Retrieval
Despite the fancy name, "agentic" retrieval is just a loop with three tools, and the implementation takes about 10 lines (see the sketches after this list):
- Search: Return top 50–100 results with preview text only (not full content—just like Google snippets).
- Fetch: Review the previews, then "click" to retrieve full content for relevant results.
- Extending-read: Solves the "cut-in-the-middle" chunking problem by reading adjacent pages until the LLM determines it has enough context.
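A minimal sketch of the three tools, under toy assumptions: a naive keyword-overlap scorer stands in for a real vector/BM25 index, and each chunk id maps back to a page number so extending-read can walk neighbors. The names here (PAGES, CHUNKS, the signatures) are mine, not from any library.

```python
PAGES: dict[int, str] = {}               # page number -> full page text
CHUNKS: dict[str, tuple[int, str]] = {}  # chunk id -> (page number, chunk text)

def search(query: str, k: int = 50) -> list[dict]:
    """Top-k hits as id + preview only (Google-snippet style), never full content."""
    terms = set(query.lower().split())
    ranked = sorted(CHUNKS.items(),
                    key=lambda item: -len(terms & set(item[1][1].lower().split())))
    return [{"id": cid, "preview": text[:200]} for cid, (_, text) in ranked[:k]]

def fetch(chunk_id: str) -> str:
    """'Click' a promising preview: return the chunk's full text."""
    return CHUNKS[chunk_id][1]

def extend_read(chunk_id: str, direction: int = 1) -> str:
    """Read the adjacent page (+1 = next, -1 = previous) to fix chunks cut mid-thought."""
    page, _ = CHUNKS[chunk_id]
    return PAGES.get(page + direction, "")
```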
Note: the retrieval agent (R-Agent) runs as a separate LLM instance with its own context. This keeps the retrieval process isolated from your main LLM's context.
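And the loop itself, sketched under the same assumptions and reusing the three tools above. The isolation is nothing exotic: the R-Agent gets a fresh messages list the caller never sees, and only its final answer flows back to the main LLM. Model name and prompts are mine.

```python
import json
from openai import OpenAI

client = OpenAI()

def spec(name: str, desc: str, params: dict) -> dict:
    """Build an OpenAI-style function-tool schema."""
    return {"type": "function",
            "function": {"name": name, "description": desc,
                         "parameters": {"type": "object", "properties": params,
                                        "required": list(params)}}}

TOOL_SPECS = [
    spec("search", "Search the index; returns ids and previews only.",
         {"query": {"type": "string"}}),
    spec("fetch", "Return the full text of one chunk.",
         {"chunk_id": {"type": "string"}}),
    spec("extend_read", "Return the page adjacent to a chunk.",
         {"chunk_id": {"type": "string"}}),
]
TOOLS = {"search": search, "fetch": fetch, "extend_read": extend_read}

def retrieve(question: str, max_steps: int = 10) -> str:
    # Fresh history: the R-Agent's browsing noise never leaks into the caller's context.
    messages = [{"role": "system",
                 "content": "Use the tools to gather enough context, then answer."},
                {"role": "user", "content": question}]
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=TOOL_SPECS)
        msg = resp.choices[0].message
        if not msg.tool_calls:    # the agent decided it has enough context
            return msg.content    # only this compact answer reaches the main LLM
        messages.append(msg)
        for call in msg.tool_calls:
            result = TOOLS[call.function.name](**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
    return "retrieval budget exhausted"
```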
3. Notes
- The original Contextual RAG article combines vector search + BM25 + Cohere reranking for solid performance; I'm just in the mood for playing around.
- "Agentic," "vibe coding," "context engineering"—buzzwords I dislike, even when technically accurate.
- There are some even more impressive techniques at https://pageindex.ai/blog, but that's for another weekend.
Conclusion
The core idea is simple: give the LLM enough context to retrieve intelligently, rather than relying on naive similarity matching. Contextual ingestion embeds document-level understanding into each chunk. Agentic retrieval lets the model browse results like a human would. Both techniques are straightforward to implement—the hard part is parsing documents cleanly in the first place.