A LinkedIn post crossed every engineering feed last week with a single number.
PageIndex (Vectorless RAG): 98.7% on FinanceBench. GPT-4o Search: 31%.
The number is real. The framework is real. The architecture is genuinely new: reasoning-based retrieval over hierarchical document structure, with no embeddings, no chunking, and no vector database. The GitHub repo has crossed twenty-three thousand stars. The paper is on arXiv. The company behind it, VectifyAI, has published a reproducible evaluation harness.
The number is also incomplete in a way that matters if you are paying for the infrastructure.
This piece takes the claim apart on engineering terms. What PageIndex actually does. What the 98.7% includes and excludes. What the marketing leaves out — latency, cost shape, document-class fit, the failure mode at scale. And the hybrid architecture that the honest independent reviewers keep landing on, which is what any team running production retrieval against a real corpus will end up building.
What PageIndex actually does
The architecture is two phases.
Indexing. Parse a PDF or Markdown document. Rather than chunking it into arbitrary 1,000-character spans, segment it along natural section boundaries — the table of contents, headings, page ranges. Build a tree where each node is a section, holding { title, id, page range, short summary }. No embeddings get computed. No vector store gets touched.
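To make the indexing output concrete, here is a minimal sketch of the tree described above. The class name, field names, and the toy filing are illustrative assumptions, not PageIndex's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class SectionNode:
    """One section of a document. Field names are hypothetical, chosen to
    mirror the { title, id, page range, short summary } description."""
    node_id: str
    title: str
    start_page: int
    end_page: int
    summary: str
    children: list["SectionNode"] = field(default_factory=list)


def count_nodes(node: SectionNode) -> int:
    """Count this node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children)


# A hand-built toy tree standing in for a parsed 10-K (invented contents).
toy_tree = SectionNode(
    "0", "Annual Report", 1, 120, "Full filing.",
    children=[
        SectionNode("1", "Business", 3, 20, "What the company does."),
        SectionNode(
            "2", "Financial Statements", 60, 110, "Audited statements and notes.",
            children=[
                SectionNode("2.1", "Balance Sheet", 62, 64, "Assets and liabilities."),
                SectionNode("2.2", "Income Statement", 65, 67, "Revenue and expenses."),
            ],
        ),
    ],
)
```

Note what is absent: no vector field, no chunk offsets. Everything a navigator needs is a title, a page range, and a short summary per node.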
Retrieval. When a query arrives, an LLM walks the tree. It reads the root summary, decides which child to descend into, repeats at the next level, and stops when it has the leaf node containing the answer. The "search" is sequential reasoning across the document's actual structure, not similarity over embedded chunks.
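The descent loop can be sketched in a few lines. This is an illustration under stated assumptions, not PageIndex's code: the `Node` class, `navigate`, and `keyword_chooser` are hypothetical names, and the word-overlap chooser is only a stand-in for the LLM decision so that the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    title: str
    summary: str
    pages: tuple[int, int]
    children: list["Node"] = field(default_factory=list)


# Stand-in for one LLM call: given the query and the candidate children,
# return the index of the child to descend into.
Chooser = Callable[[str, list[Node]], int]


def keyword_chooser(query: str, candidates: list[Node]) -> int:
    """Toy navigator (NOT an LLM): pick the child sharing the most words with the query."""
    words = set(query.lower().split())

    def overlap(n: Node) -> int:
        return len(words & set(f"{n.title} {n.summary}".lower().split()))

    return max(range(len(candidates)), key=lambda i: overlap(candidates[i]))


def navigate(root: Node, query: str, choose: Chooser) -> tuple[Node, list[str]]:
    """Descend one level per decision until a leaf is reached.
    Returns the leaf and the list of section titles visited."""
    node, path = root, [root.title]
    while node.children:
        node = node.children[choose(query, node.children)]
        path.append(node.title)
    return node, path


# Invented toy document for illustration.
doc = Node("10-K", "annual filing", (1, 120), [
    Node("Business", "company operations and products", (3, 20)),
    Node("Financial Statements", "balance sheet income statement revenue", (60, 110), [
        Node("Balance Sheet", "assets liabilities equity", (62, 64)),
        Node("Income Statement", "revenue expenses net income", (65, 67)),
    ]),
])

leaf, path = navigate(doc, "total revenue in income statement", keyword_chooser)
```

Each iteration of the `while` loop corresponds to one navigator decision. In a real system that decision is an LLM call, which is why the number of levels descended matters later for latency and cost.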
Conceptually this is the way a financial analyst opens a 10-K filing. You do not embed the document — you look at the table of contents, jump to the section that looks right, scan its headers, jump to the page that has the number. PageIndex is automating that exact behaviour with an LLM as the navigator.
It is genuinely a different architecture, not a wrapper over vector search.
The 98.7% is real — and narrowly defined
The benchmark is FinanceBench: a standard test of open-book question-answering over SEC filings — 10-K, 10-Q, 8-K. The evaluation runs the system over a curated set of financial questions, each with a known correct answer and the exact page it should have come from.
VectifyAI's evaluation harness is public (VectifyAI/Mafin2.5-FinanceBench). The 98.7% is reproducible. Traditional vector RAG on the same benchmark scores around 50%. GPT-4o with web search scores 31%.
Three things to hold steady about that number before extrapolating from it.
One. The benchmark is single-document question answering. Each query is paired with a specific filing and the system answers from that filing. There is no corpus retrieval — no "find the right document among 50,000 contracts" — only "find the right page inside this one document."
Two. The document class is highly structured. SEC filings have stable, deep tables of contents, predictable section numbering, regulator-driven taxonomy. Tree traversal works extraordinarily well on these. It also works on legal contracts, academic textbooks, technical manuals, and anything with a real structural hierarchy.
Three. The benchmark scores accuracy. It does not score latency, cost per query, or token consumption. None of those numbers appear in the leaderboard.
The 98.7% is a real win for a real class of problem. The mistake is reading it as "vector RAG is over." The benchmark does not measure the problem vector RAG is actually solving in most production deployments.
What the marketing does not say
Three trade-offs are absent from the LinkedIn post that any team planning to deploy this needs to model honestly.
Latency
The independent tests that exist all land in the same range: two to five seconds per query for PageIndex, depending on document complexity. Vector similarity search runs in milliseconds.
The reason is structural, not implementation-level. Every step of the tree traversal is an LLM call. A query that needs to descend three levels through a 10-K is making three serial LLM calls before the system can even read the answer page, and another to synthesise the response. None of those calls are parallelisable in the obvious way — the descent decisions depend on each prior decision.
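The serial dependency reduces to simple arithmetic. The sketch below is a back-of-envelope model only: the function names are hypothetical and the per-call latency is a placeholder assumption, not a measured figure.

```python
def traversal_latency_s(depth: int, per_call_s: float, synthesis_calls: int = 1) -> float:
    """Latency when each descent step is a blocking LLM call,
    followed by the call(s) that synthesise the answer."""
    if depth < 0 or synthesis_calls < 0:
        raise ValueError("depth and synthesis_calls must be non-negative")
    return (depth + synthesis_calls) * per_call_s


def fits_budget(depth: int, per_call_s: float, budget_s: float) -> bool:
    """True when the serial traversal finishes inside the latency budget."""
    return traversal_latency_s(depth, per_call_s) <= budget_s


# ASSUMPTION: 0.8 s per LLM call is a placeholder, not a benchmark result.
ASSUMED_CALL_S = 0.8
three_level = traversal_latency_s(depth=3, per_call_s=ASSUMED_CALL_S)
```

A three-level descent is four serial calls in this model. Because the calls cannot overlap, the only levers are fewer levels or faster calls.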
For a chatbot, two to five seconds is acceptable. For a real-time customer-facing application — a banking portal answering balance enquiries, a support widget retrieving article snippets — it is not. The latency budget has to be designed around the retrieval architecture, not retrofitted to it.
Cost shape
Vector RAG amortises its cost. You pay once to embed the corpus — a one-shot job — and then the per-query cost is essentially the vector store lookup (cheap) plus one LLM call to generate the answer. Embedding storage is small. The unit economics work.
PageIndex inverts this. There is no embedding step, so no upfront cost. But every retrieval is multiple LLM calls. On a high-traffic deployment, you are running inference for retrieval itself, not just answer generation. Independent tests consistently report that PageIndex costs more per query than vector RAG by a meaningful multiple, though VectifyAI has not published token-consumption numbers and the public repository's README does not include cost analysis.
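The inversion can be expressed as a small cost model. Everything here is an assumption for illustration: the names are hypothetical, the prices are placeholders to be replaced with your own measurements, and the model simplifies by treating every LLM call as equally expensive and by following the framing above of no upfront cost for the tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostAssumptions:
    llm_call_usd: float       # assumed average cost of one LLM call
    vector_lookup_usd: float  # assumed cost of one vector-store query
    embed_corpus_usd: float   # assumed one-off cost of embedding the corpus


def vector_rag_total(queries: int, c: CostAssumptions) -> float:
    """One-off embedding cost, then one lookup plus one generation call per query."""
    return c.embed_corpus_usd + queries * (c.vector_lookup_usd + c.llm_call_usd)


def tree_rag_total(queries: int, depth: int, c: CostAssumptions) -> float:
    """No upfront cost; each query pays for `depth` navigation calls plus one generation call."""
    return queries * (depth + 1) * c.llm_call_usd


def break_even_queries(depth: int, c: CostAssumptions) -> float:
    """Query volume above which tree traversal costs more in total than vector RAG."""
    per_query_gap = depth * c.llm_call_usd - c.vector_lookup_usd
    if per_query_gap <= 0:
        return float("inf")  # traversal is never more expensive under these inputs
    return c.embed_corpus_usd / per_query_gap


# Placeholder inputs only; substitute measured values before drawing conclusions.
example = CostAssumptions(llm_call_usd=0.01, vector_lookup_usd=0.0001, embed_corpus_usd=50.0)
```

The useful output is the break-even volume rather than any single price. Below it, skipping the embedding job wins. Above it, the per-query navigation calls dominate.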
The cost shape we wrote about for multimodal RAG pipelines applies here too: the architecture's true cost only becomes visible at production volume, and by then it is structural.
Corpus scale
The benchmark is one-document QA. The architecture does not extend gracefully to thousands of documents.
If you have a corpus of fifty thousand support articles, an enterprise wiki, a legal-document repository, or any large unstructured collection, PageIndex is not the right tool. Vector search excels at this exact problem — which document, among many, is relevant — because it is doing approximate nearest-neighbour over embeddings, which scales to billions of items with sub-100ms latency.
Tree traversal cannot do this. You would first have to identify the relevant document by some other mechanism (in practice, vector search), and only then hand that document to PageIndex for the within-document retrieval. That is the hybrid pattern the independent reviewers keep arriving at, and the one the marketing does not surface.
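The "which document" step can be illustrated with a brute-force similarity ranking. This is a sketch under assumptions: the document IDs and three-dimensional vectors are invented, and the exact scan shown here stands in for the approximate nearest-neighbour index a production vector store would use.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_documents(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 1) -> list[str]:
    """Exact (brute-force) ranking of documents by similarity to the query.
    A real deployment would replace this scan with an ANN index."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]


# Invented toy embeddings; real ones come from an embedding model.
doc_vecs = {
    "acme_10k": [0.9, 0.1, 0.0],
    "support_faq": [0.1, 0.9, 0.2],
    "msa_contract": [0.0, 0.2, 0.9],
}
selected = top_documents([0.8, 0.2, 0.1], doc_vecs, k=1)
```

The IDs returned here are what would be handed to the tree-based stage: vector search answers "which document", traversal answers "where in it".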
The complex-PDF cliff
The open-source PageIndex parser takes the standard PDF-library route. It handles clean, well-structured filings well. It does not handle scanned documents, complex multi-column layouts, embedded tables that span pages, or charts with critical numeric data.
The repository's README acknowledges this and points to VectifyAI's paid cloud service for enhanced OCR and tree-building. That is a fair business model, but it is a closed surface — the marketing claim "no vector database" implicitly becomes "no vector database, plus a paid hosted parser for any document that is not a clean 10-K."
Where tree-based retrieval is the right answer
Subtract the corpus-scale problem and the latency-sensitive problem, and a specific class of workload remains where PageIndex genuinely dominates.
SEC filing analysis. Equity research, compliance review, financial due diligence. Long, structured documents. Accuracy matters more than latency. Cost per query is acceptable because the alternative is an analyst doing it by hand.
Legal contract review. Same structural properties — table of contents, defined sections, predictable hierarchy. The "find the indemnification clause" or "what does this say about termination" pattern is exactly what tree traversal handles.
Academic and technical manual QA. Engineering reference texts, regulatory handbooks, internal SOPs at depth. Stable structure. High accuracy bar.
Audit-defensible single-document retrieval. Because every retrieval is traceable back to specific page and section references, the answer is verifiable against the source. For regulated workflows where the auditor has to be able to trace any conclusion back to a specific paragraph, this is a strong fit. We laid out the verifiability pattern in Designing Strict RBAC for Enterprise Knowledge Bases — tree-based retrieval is a structural cousin of that approach.
Where vector RAG remains the right answer
Corpus retrieval at scale. Anything where the first question is "which document," not "where in this document."
Latency-sensitive interactive applications. Customer-facing chat, real-time support, anything with a sub-second response budget.
Cost-sensitive high-volume workloads. Where the per-query cost difference between vector and tree-traversal compounds into a budget problem inside the first quarter of production.
Heterogeneous content. Knowledge bases mixing structured documents, ad-hoc memos, transcripts, tickets, code snippets, and emails. None of these have the kind of clean hierarchical structure tree-based retrieval depends on.
The hybrid that actually ships
The honest independent reviewers — including the engineers who built and ran their own A/B tests against real corpora — keep arriving at the same conclusion. The production architecture is not vector versus tree. It is vector into tree.
The pattern, in five lines:
- The user query hits an intent classifier (cheap, fast, Haiku-tier) that decides whether the question is a corpus-discovery question or a within-document question
- Corpus-discovery questions route to vector search over the embedded corpus — fast, cheap, returns the top one or two documents
- Within-document questions, or the documents identified by the vector search, route to tree-based traversal for the deep retrieval inside the document
- The synthesis layer combines the retrieved evidence and generates the answer, with citations back to the page and section that tree traversal provides natively
- The audit trail records both the corpus-discovery path and the tree-traversal path — so every claim in the answer is traceable to a specific source location
This architecture inherits the strengths of both — vector for scale, tree for precision — and lets the cost and latency budget be governed by the type of question, not the limitations of a single retrieval mechanism.
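The routing pattern can be sketched as a single function with the three components injected. All names here are hypothetical, the stubs replace the real classifier, vector store, and tree traversal, and the synthesis step is left out so the example stays focused on routing and the audit trail.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Retrieval:
    evidence: list[str]
    audit_trail: list[str] = field(default_factory=list)


def route(
    query: str,
    classify: Callable[[str], str],            # returns "corpus" or "within_document"
    vector_search: Callable[[str], list[str]],  # returns candidate document IDs
    tree_search: Callable[[str, str], str],     # returns a page/section citation
    doc_id: Optional[str] = None,
) -> Retrieval:
    """Send corpus-discovery questions through vector search first, then run
    tree-based retrieval inside each selected document, logging both paths."""
    trail = []
    intent = classify(query)
    trail.append(f"intent={intent}")
    if intent == "corpus" or doc_id is None:
        doc_ids = vector_search(query)
        trail.append(f"vector_search -> {doc_ids}")
    else:
        doc_ids = [doc_id]
    evidence = []
    for d in doc_ids:
        citation = tree_search(d, query)
        trail.append(f"tree_search({d}) -> {citation}")
        evidence.append(citation)
    return Retrieval(evidence, trail)


# Stubs standing in for the real components (assumptions for illustration).
def stub_classify(q: str) -> str:
    return "corpus" if q.lower().startswith("which") else "within_document"


def stub_vector_search(q: str) -> list[str]:
    return ["acme_10k"]


def stub_tree_search(doc: str, q: str) -> str:
    return f"{doc}: Income Statement, p.65"


result = route("Which filing reports the revenue decline?",
               stub_classify, stub_vector_search, stub_tree_search)
```

Injecting the three callables keeps the router testable and lets each mechanism be swapped or measured independently, which is also what makes the audit trail cheap to produce.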
It is also the only architecture we would put in front of an enterprise client who has to defend the system to a regulator. The marketing line "no vector layer" is correct only for the within-document phase. Production-grade retrieval needs both.
What this teaches us about enterprise scaling
Two things.
One. A 98.7% benchmark is a real engineering achievement, and PageIndex deserves the attention it is getting. The marketing reading — that this is "the end of chunking" or that vector RAG is obsolete — is not what the benchmark shows. The benchmark shows that for a specific class of single-document question answering over structured filings, reasoning-based traversal outperforms similarity-based retrieval. That is a real result. It does not mean every retrieval problem has the same shape.
Two. The market for enterprise RAG is bifurcating along workload class, not along architecture preference. Teams shipping production AI into regulated environments are going to need to articulate which retrieval mechanism they are using for which kind of question, and why, and what the cost and latency shape of each looks like at production volume. Anyone who shows up with "we use [single architecture] for everything" is going to lose to the team that picked the right mechanism for each problem and documented the trade-offs explicitly.
The 98.7% headline is a useful forcing function for that conversation. It is not a replacement for the conversation.
Companion content
- Designing Strict RBAC for Enterprise Knowledge Bases — the audit-trail discipline that pairs with tree-based citation
- Optimising Cold-Start Latency and Cost of Multimodal RAG Pipelines — the cost shape under production volume, applied to a different retrieval architecture
- RAG with Amazon Bedrock Knowledge Bases — production-grade vector RAG with native Bedrock integration
- Retrieve First, Reason Second — the architectural primer on why retrieval shape determines reasoning quality
- Why 95% of Enterprise AI Pilots Fail at the Deployment Phase — the broader pattern this is one specific case of
How to engage
We design and ship retrieval architectures for regulated enterprises — vector, tree-based, hybrid, and the audit-grade observability that makes any of them defensible. If you are evaluating tree-based retrieval against your specific corpus and workload, we will help you do the honest comparison before you commit to the architecture. Talk to us at creativeminds.dev/contact.
