CVS Technology — Hybrid RAG, 5 Parallel Retrievers, RRF Fusion & Abstention

Ingestion

A five-stage ingestion pipeline turns one document into searchable evidence.

CVS connects directly to where your knowledge already lives (SharePoint, Google Drive, Confluence, S3, and on-premises file servers), then parses every format through triple OCR and vision: PDFs, scans, DOCX, PPTX, XLSX, and images. Tables, figures, and page anchors survive parsing intact so the original evidence can be returned later, not paraphrased away.

Smart chunking produces semantically coherent fragments rather than blind fixed-width splits. Each chunk is enriched with entities, metadata, document diffs, and temporal facts, then written to a multi-layer index simultaneously: a pgvector store for semantic recall, a BM25F full-text index for exact terms, a Neo4j temporal knowledge graph for relationships, plus metadata and temporal indexes. One pass, five retrieval surfaces.

Connectors for SharePoint, Google Drive, Confluence, S3, and local file shares — no copy-paste migrations
Triple OCR plus vision enrichment across PDF, scanned PDF, DOCX, PPTX, XLSX, and images
Semantic chunking that preserves tables, figures, and page anchors as first-class evidence
Multi-layer indexing into pgvector, BM25F, Neo4j temporal knowledge graph, metadata, and temporal stores
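As a rough illustration of the "one pass, five retrieval surfaces" idea, the sketch below writes one enriched chunk to five in-memory stand-ins for the index layers. Everything in it is hypothetical: the `Chunk` fields, the `MultiLayerIndex` class, and the dictionary-backed stores are assumptions made for illustration, not the CVS schema and not the pgvector, BM25F, or Neo4j APIs.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Chunk:
    """Hypothetical enriched chunk; field names are illustrative assumptions."""
    chunk_id: str
    text: str
    source: str                      # document path or URL
    page: int                        # page anchor, kept so the original evidence can be shown
    embedding: list[float]           # would come from an embedding model
    entities: list[str] = field(default_factory=list)
    valid_from: Optional[date] = None
    metadata: dict[str, str] = field(default_factory=dict)


class MultiLayerIndex:
    """In-memory stand-ins for the five retrieval surfaces (not real database clients)."""

    def __init__(self) -> None:
        self.vectors: dict[str, list[float]] = {}                # stand-in for a vector store
        self.full_text: dict[str, set[str]] = defaultdict(set)   # term -> chunk ids
        self.graph: dict[str, set[str]] = defaultdict(set)       # entity -> chunk ids
        self.temporal: dict[str, date] = {}                      # chunk id -> validity start
        self.metadata: dict[str, dict[str, str]] = {}            # chunk id -> attributes

    def write(self, chunk: Chunk) -> None:
        """Write one chunk to every layer in a single pass."""
        self.vectors[chunk.chunk_id] = chunk.embedding
        for term in chunk.text.lower().split():
            self.full_text[term].add(chunk.chunk_id)
        for entity in chunk.entities:
            self.graph[entity].add(chunk.chunk_id)
        if chunk.valid_from is not None:
            self.temporal[chunk.chunk_id] = chunk.valid_from
        attributes = {"source": chunk.source, "page": str(chunk.page)}
        attributes.update(chunk.metadata)
        self.metadata[chunk.chunk_id] = attributes


index = MultiLayerIndex()
index.write(Chunk(
    chunk_id="c1",
    text="Travel policy effective January",
    source="policies/travel.pdf",
    page=3,
    embedding=[0.1, 0.2, 0.3],
    entities=["Travel Policy"],
    valid_from=date(2024, 1, 1),
    metadata={"department": "finance"},
))
```

Keeping `source` and `page` on every chunk is what lets a later answer point back to the original page rather than a paraphrase.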

Routing

An intent router sends each query down the cheapest path that can answer it.

Not every question deserves a full reasoning run. A central intent router classifies each query and dispatches it into one of four lanes: an instant, zero-token cache hit; a standard fast hybrid search; a deep multi-document synthesis; or an ultra reasoning path that decomposes the question into a directed acyclic graph of sub-queries.

This token-saving cascade means simple questions never wake up an expensive LLM, while genuinely hard, multi-document questions get the full decomposition treatment. The result is predictable latency, predictable cost, and no per-query token surprises — the cascade alone cuts LLM spend by 85–95% versus naive RAG.

Instant lane: zero-token cache for repeated and trivially answerable queries
Standard lane: fast hybrid search for the majority of everyday questions
Deep lane: multi-document synthesis when one source is not enough
Ultra lane: decomposition DAG that breaks complex questions into auditable sub-steps
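The four lanes can be pictured with a small dispatch function. This is a minimal sketch under loud assumptions: the cue-word lists, the `route` function, and the dictionary cache are hypothetical, and a production router would more likely use a trained classifier than keyword matching.

```python
from enum import Enum


class Lane(Enum):
    INSTANT = "instant"    # zero-token cache hit
    STANDARD = "standard"  # fast hybrid search
    DEEP = "deep"          # multi-document synthesis
    ULTRA = "ultra"        # decomposition into a DAG of sub-queries


# Hypothetical cue phrases, chosen only to make the example concrete.
MULTI_DOC_CUES = ("compare", "summarize", "across", "difference between")
MULTI_STEP_CUES = ("and then", "step by step", "what would happen if")


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so equivalent queries share a cache key."""
    return " ".join(query.lower().split())


def route(query: str, cache: dict[str, str]) -> Lane:
    """Pick the cheapest lane that plausibly answers the query."""
    q = normalize(query)
    if q in cache:
        return Lane.INSTANT   # answered before: no LLM call needed
    if any(cue in q for cue in MULTI_STEP_CUES):
        return Lane.ULTRA     # needs decomposition into sub-queries
    if any(cue in q for cue in MULTI_DOC_CUES):
        return Lane.DEEP      # needs several sources
    return Lane.STANDARD      # default: fast hybrid search


cache = {"what is our vpn policy?": "cached answer"}
lane_cached = route("What is our  VPN policy?", cache)
lane_simple = route("Who approves purchase orders?", cache)
lane_deep = route("Compare the 2023 and 2024 travel policies", cache)
lane_ultra = route("What would happen if we changed payroll vendors?", cache)
```

The cache check comes first so that repeated questions never reach the more expensive lanes.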

Retrieval

5 parallel retrievers, fused by RRF, reranked by a cross-encoder.

CVS runs five retrievers at once — vector search, knowledge-graph traversal, BM25F full text, temporal retrieval, and metadata filtering. Each sees the corpus differently, so they catch different evidence: semantics, relationships, exact terms, time validity, and structured attributes. No single retriever has to be perfect.
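Running the retrievers "at once" can be sketched with `asyncio.gather`. The stub retrievers, the `retrieve_all` helper, and the document ids below are hypothetical placeholders; real retrievers would issue database or index calls where the stub simply yields control.

```python
import asyncio
from typing import Awaitable, Callable

Retriever = Callable[[str], Awaitable[list[str]]]


def make_stub(doc_ids: list[str]) -> Retriever:
    """Build a stub retriever that returns a fixed ranking (illustration only)."""
    async def retrieve(query: str) -> list[str]:
        await asyncio.sleep(0)  # placeholder for real I/O
        return doc_ids
    return retrieve


async def retrieve_all(query: str, retrievers: dict[str, Retriever]) -> dict[str, list[str]]:
    """Run every retriever concurrently and return one ranked list per retriever."""
    names = list(retrievers)
    results = await asyncio.gather(*(retrievers[name](query) for name in names))
    return dict(zip(names, results))


retrievers = {
    "vector": make_stub(["d1", "d2", "d3"]),
    "graph": make_stub(["d2", "d4"]),
    "bm25f": make_stub(["d2", "d1"]),
    "temporal": make_stub(["d3", "d2"]),
    "metadata": make_stub(["d1", "d4"]),
}
rankings = asyncio.run(retrieve_all("travel policy", retrievers))
```

With concurrent execution, total retrieval latency is bounded by the slowest retriever rather than the sum of all five.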

Their ranked outputs merge through Reciprocal Rank Fusion (k=60), then a cross-encoder reranks the fused candidates to assemble a tight evidence set for the answer builder. This is why CVS reaches 94.7% answer accuracy versus the 67–73% typical of single-retriever systems like basic RAG or Copilot.

Vector (pgvector) + Neo4j knowledge graph + BM25F + temporal + metadata, all in parallel
Reciprocal Rank Fusion (k=60) merges five independent rankings into one consensus
Cross-encoder reranking sharpens the final evidence set before answer generation
94.7% answer accuracy versus 67–73% for single-retriever systems
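Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every ranking in which it appears, so a document that several retrievers place near the top beats one that a single retriever ranks first. The sketch below is a minimal version of that standard formula with k=60; the function name, the 1-based ranks, the alphabetical tie-break, and the sample document ids are assumptions for illustration, not details confirmed for CVS.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document ids into one list sorted by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fused = reciprocal_rank_fusion([
    ["d1", "d2", "d3"],  # vector
    ["d2", "d4"],        # knowledge graph
    ["d2", "d1"],        # BM25F
    ["d3", "d2"],        # temporal
    ["d1", "d4"],        # metadata
])
top_id = fused[0][0]  # "d2": it appears in four of the five rankings
```

Because only ranks enter the formula, the five retrievers do not need comparable score scales. The fused list would then be passed to a cross-encoder for reranking.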

Abstention

Adversarial abstention: the system knows when it does not know.

After retrieval, CVS asks one question before answering: is the evidence sufficient? If yes, it answers with inline citations and writes the interaction to a tamper-evident audit log. If no, it abstains plainly instead of fabricating a plausible-sounding response. Fabrication is the single behavior that kills most enterprise RAG pilots.

An abstention is not a dead end. The unanswered question routes to the designated subject-matter expert, their verified answer is captured, and the knowledge base is patched so the next person gets an instant response. In production this drives the hallucination rate below 2% versus roughly 19% for ordinary RAG.

Confidence gate evaluates evidence sufficiency before any answer is generated
Sufficient evidence → cited answer plus a full audit-log entry
Insufficient evidence → clear abstention, then expert escalation
Captured expert answers patch the knowledge base: hallucination rate under 2% versus ~19% for ordinary RAG
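The gate logic above can be sketched as a single decision function. This is an illustration under stated assumptions: the `Evidence` and `Decision` types, the 0.7 score threshold, the two-chunk minimum, and the expert address are all hypothetical, and how CVS actually measures sufficiency is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    chunk_id: str
    source: str
    score: float  # reranker relevance, assumed to lie in [0, 1]


@dataclass
class Decision:
    answered: bool
    citations: list[str]
    escalate_to: Optional[str]


def gate(evidence: list[Evidence], expert: str,
         min_score: float = 0.7, min_chunks: int = 2) -> Decision:
    """Answer only when enough strong evidence exists; otherwise abstain and escalate."""
    strong = [e for e in evidence if e.score >= min_score]
    if len(strong) >= min_chunks:
        # Sufficient evidence: answer, citing every strong source.
        return Decision(answered=True, citations=[e.source for e in strong], escalate_to=None)
    # Insufficient evidence: abstain and hand the question to the expert.
    return Decision(answered=False, citations=[], escalate_to=expert)


sufficient = gate(
    [Evidence("c1", "travel.pdf#p3", 0.91), Evidence("c2", "travel.pdf#p4", 0.82)],
    expert="finance-sme@example.com",
)
insufficient = gate(
    [Evidence("c9", "misc.docx#p1", 0.35)],
    expert="finance-sme@example.com",
)
```

The gate runs before any answer text is generated, so an abstention costs no generation tokens.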

How a document becomes a verified, citable answer.

Run CVS against your hardest question.