Will the Larger Context Window Kill RAG?
"640 KB ought to be enough for anybody" — Bill Gates, 1981
"There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days" — Eric Schmidt, 2010
"Information is the oil of the 21st century, and analytics is the combustion engine." — Peter Sondergaard, 2011
"The context window will kill RAG" — Every second AI specialist, 2024
Disclaimer:
There is no solid proof that all the quotes above are accurate. The text below is purely the author’s own imagination. I assume a wonderful future is just around the corner: a super-duper chip will be invented that resolves memory constraints, LLMs will become cheaper and faster, and the hallucination problem will be solved. Therefore, this text should not be taken as an ultimate truth.
Lately, there’s been a lot of buzz around the arrival of LLMs with large context windows — millions of tokens. Some people are already saying that this will make RAG obsolete.
But is that really the case?
Are we so sure that larger context windows will always keep up with the exponential growth of data? According to estimates, the total amount of data in the world doubles every two to three years. At some point, even these huge context windows might start looking a bit too cramped.
Let’s say we’re talking about a million tokens right now. That’s roughly 2,000 pages of text, or about 20 contracts of a hundred pages each. Not that impressive if we’re talking about large-scale company archives. Even at 10 million tokens, we get about 20,000 pages of English text. And English is the best case: Slavic and many Eastern languages tokenize less efficiently, so the same window holds noticeably fewer pages.
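The page counts above are a back-of-envelope estimate, which can be sketched as follows. The ~500 tokens per page figure is an assumption (a dense English page); the `token_inflation` factor stands in for languages whose tokenizers emit more tokens per word:

```python
TOKENS_PER_PAGE_EN = 500  # assumption: a dense page of English prose

def pages_that_fit(context_tokens: int, token_inflation: float = 1.0) -> int:
    """How many pages fit in a context window, given a per-language
    token inflation factor (1.0 = English baseline)."""
    return int(context_tokens / (TOKENS_PER_PAGE_EN * token_inflation))

print(pages_that_fit(1_000_000))       # 2000 pages of English
print(pages_that_fit(10_000_000))      # 20000 pages
print(pages_that_fit(1_000_000, 2.0))  # 1000 pages if tokens double
```

Even doubling the per-word token count, as can happen with morphologically rich languages, halves the effective capacity of the window.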
So, we’re not talking about fitting an entire corporate database into a single context just yet. Instead, it’s more about reducing the requirement for search accuracy. You can just grab a broad set of a few hundred relevant documents, and let the model do the fact extraction on its own.
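That "grab a broad set and let the model sort it out" idea can be sketched as a coarse retriever that fills a large token budget instead of hunting for the five best chunks. Everything here is illustrative: the word-overlap `score` is a toy stand-in for a real embedding model, and `tokens_per_word` is a rough assumption.

```python
def score(query: str, doc: str) -> float:
    # Toy relevance score: fraction of query words present in the doc.
    # A real system would use dense embeddings instead.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def coarse_retrieve(query: str, docs: list[str],
                    token_budget: int = 1_000_000,
                    tokens_per_word: float = 1.3) -> list[str]:
    """Rank all docs by relevance, then pack as many as fit into
    the context window's token budget, best-scoring first."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    context, used = [], 0
    for doc in ranked:
        cost = int(len(doc.split()) * tokens_per_word)
        if used + cost > token_budget:
            break
        context.append(doc)
        used += cost
    return context
```

The design point is that a huge window turns retrieval precision into a soft constraint: instead of needing the single right document in the top 5, we only need it somewhere in the top few hundred.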
But here’s what’s important. We’re still in the early days of RAG. Right now, RAG handles information retrieval well but struggles with more complex analytical tasks, like the ones in the infamous FinanceBench. And if we’re talking about creative tasks that need deep integration with unique, user-specific content, RAG is still hovering at the edge of what’s possible. In other words, at this stage, a million tokens feel like more of a “buffer” than a solution.
But the larger context windows might give RAG a major boost! Here’s why:
• Tackling more complex tasks. As context windows grow, RAG will be able to handle much more sophisticated analytical and creative challenges, weaving internal data together to produce insights and narratives.
• Blending internal and external data. With larger context, RAG will be able to mix internal company data with real-time info from the web, unlocking new possibilities for hybrid use cases.
• Keeping interaction context intact. Longer contexts mean keeping the entire conversation history alive, turning interactions into richer dialogues that are deeply rooted in “your” data.
So, what’s next? Once people and companies have tools to find and analyze all their stored data, they’re going to start digitizing everything. Customer calls, online and offline behavior patterns, competitor info, logs from every single meeting… You name it. Data volumes will start skyrocketing again, and no context window — no matter how big — will ever be able to capture it all.
And that’s when we’ll be heading into the next RAG evolution, which will need even more advanced techniques to keep up.