When building retrieval-augmented generation (RAG) systems, most implementations focus on retrieving and synthesizing text chunks. This works well for narrative content, but research papers, technical reports, and business documents often contain their most valuable information in tables. Standard RAG systems struggle here. Tables get split across chunks, their structure is lost, and queries like “compare the metrics in Table 3” return incomplete or unusable answers.
I have been working on a library called Tablemind that takes a different approach: treating tables as first-class citizens in the RAG pipeline. Rather than being handled as an afterthought, tables are preserved in their entirety, indexed with their structure intact, and actively retrieved when relevant to a query.
The Problem with Tables in RAG
To understand why a table-first approach matters, consider what happens in a typical RAG pipeline when you ask about data in a table. RAG systems chunk documents to work around LLM context limits — you retrieve only the most relevant fragments rather than feeding entire documents. This works for narrative text, but creates several problems for tables and structured data:
- Chunking destroys structure: A 20-row table might be split across 3–4 chunks during ingestion. When retrieved, the LLM sees only partial rows and columns, losing the complete picture.
- Reference chains break: A text chunk might mention “as shown in Table 2,” but the table itself lives in a different chunk with no semantic overlap. The retrieval system fails to connect the reference to its target.
- Semantic search misses exact data: Vector embeddings capture semantic meaning, but table queries often require exact-match operations (“What’s the F1 score for the baseline model?”). The embedding may not rank the table chunk highly enough for retrieval.
- Figure captions get orphaned: Similar issues affect figures and their captions — the visual content is described in one chunk, referenced in another, and the connection is lost.
- Aggregation queries fail: Questions like “What’s the average accuracy across all models?” or “Which approach performed best?” require seeing the entire table. Partial chunks can’t support these calculations or comparisons.
- Column headers get separated: When a table splits mid-way, later chunks may contain data rows without their column headers, making the values meaningless or ambiguous to the LLM.
- Context-dependent values become unclear: Numbers like “95.2” are meaningless without knowing the column (is it accuracy? precision? F1?) and row context (which model? which dataset?).
- Document-level insights remain hidden: Some queries require understanding relationships across multiple sections or synthesizing information from the entire document — “What are the key innovations in this paper?” or “Summarize the methodology.” Chunk-based retrieval often misses the big picture, leaving the LLM with fragments that can’t support holistic analysis.
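To make the chunking and header-separation problems concrete, here is a minimal sketch of what a naive fixed-size chunker does to a small Markdown table. The chunker, the three-line chunk size, and the table values are illustrative placeholders, not Tablemind's ingestion code or real results.

```python
def chunk_lines(text: str, max_lines: int) -> list[str]:
    """Naive fixed-size chunker: group the text into blocks of at most max_lines lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


# Placeholder table; the numbers are made up for illustration only.
TABLE = "\n".join([
    "| Model | Accuracy | F1 |",
    "|-------|----------|----|",
    "| A | 0.91 | 0.88 |",
    "| B | 0.93 | 0.90 |",
    "| C | 0.95 | 0.92 |",
    "| D | 0.89 | 0.85 |",
])

chunks = chunk_lines(TABLE, max_lines=3)

# Only the first chunk still carries the column headers.
has_header = ["Accuracy" in chunk for chunk in chunks]
```

The second chunk contains rows B, C, and D with no header row, so an LLM that sees only that chunk cannot tell which column is accuracy and which is F1.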
A Table-First Architecture
The library addresses these issues through several design choices:
1. Structure-Preserving Parsing with Docling
Instead of treating PDFs as raw text streams, the library uses Docling for document parsing. Docling understands document structure — it knows which items are tables, which are figures, and how sections relate to each other.
Tables are extracted as complete Markdown representations with their row/column structure preserved. This means that even if vector search fails to retrieve a table, you can still access it directly by its table number.
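The idea of a table-number index can be sketched without any parsing library. The function below is a simplified stand-in, not Docling's API or Tablemind's implementation: it assumes the document has already been exported to Markdown, treats each run of consecutive lines starting with "|" as one table, and numbers tables sequentially from 1.

```python
def index_tables(markdown: str) -> dict[int, str]:
    """Map sequential 1-based table numbers to complete Markdown tables."""
    tables: dict[int, str] = {}
    current: list[str] = []
    # The trailing empty string acts as a sentinel that flushes a table at end of text.
    for line in markdown.splitlines() + [""]:
        if line.lstrip().startswith("|"):
            current.append(line.strip())
        elif current:
            tables[len(tables) + 1] = "\n".join(current)
            current = []
    return tables


# Hypothetical document text used only to exercise the function.
DOC = "\n".join([
    "Intro paragraph.",
    "| a | b |",
    "|---|---|",
    "| 1 | 2 |",
    "",
    "More prose.",
    "| x |",
    "|---|",
    "| 9 |",
])

table_index = index_tables(DOC)
```

With an index like this, a lookup such as `table_index[2]` returns the whole second table regardless of how the surrounding text was chunked.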
2. Multi-Format Support
Documents come in various formats, and the library handles them uniformly:
- PDF — Full Docling support with optional OCR for scanned documents
- Markdown — Native parsing with table extraction
- HTML — Table structure preserved from web content
- DOCX — Word documents with tables intact
- Text — Plain text with optional Markdown table detection
This flexibility matters because you might ingest papers from arXiv (PDF), documentation from GitHub (Markdown), and internal reports (DOCX) into the same knowledge base.
3. Vector + Keyword + Agentic Retrieval
The RAG system implements a three-pronged retrieval strategy. Semantic search using sentence-transformers embeddings handles conceptual queries. Keyword search (BM25) handles exact-term queries like model names and specific metrics. Hybrid search combines both with configurable weights.
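One common way to combine the two score types is a weighted sum after normalization, since BM25 scores and cosine similarities live on different scales. The sketch below assumes min-max normalization and a single `semantic_weight` parameter; both are my illustrative choices, and the function names are hypothetical rather than the library's API.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them all to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_scores(
    semantic: dict[str, float],
    keyword: dict[str, float],
    semantic_weight: float = 0.5,
) -> dict[str, float]:
    """Weighted sum of normalized semantic and keyword scores per chunk ID."""
    sem, kw = min_max(semantic), min_max(keyword)
    return {
        chunk_id: semantic_weight * sem.get(chunk_id, 0.0)
        + (1.0 - semantic_weight) * kw.get(chunk_id, 0.0)
        for chunk_id in set(sem) | set(kw)
    }


# Made-up scores: cosine similarities on the left, BM25 scores on the right.
fused = hybrid_scores({"a": 0.9, "b": 0.1}, {"b": 12.0, "c": 3.0}, semantic_weight=0.7)
ranking = sorted(fused, key=fused.get, reverse=True)
```

A chunk that appears in only one result list simply contributes zero for the other, so raising `semantic_weight` favors conceptual matches and lowering it favors exact-term matches.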
The key innovation is agentic table fetching. After initial retrieval, an LLM analyzes the results to detect missing references. If a query mentions “Table 4”, but that table is not in the retrieved chunks, the system automatically fetches it by table number. This reference detection happens automatically — the query itself implies what’s needed.
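The fetch-what-is-missing step can be illustrated with a much simpler detector than an LLM. The sketch below swaps in a regular expression for the LLM analysis described above, and it assumes each retrieved chunk is a dict with a "text" field and an optional "table_number" field; those field names are hypothetical.

```python
import re

# Matches references such as "Table 4" or "table 12" (single numbers only).
TABLE_REF = re.compile(r"\btable\s+(\d+)\b", re.IGNORECASE)


def referenced_tables(text: str) -> set[int]:
    """Return the table numbers mentioned in a piece of text."""
    return {int(number) for number in TABLE_REF.findall(text)}


def missing_tables(query: str, chunks: list[dict]) -> set[int]:
    """Tables referenced by the query or the retrieved text but absent from the results."""
    wanted = referenced_tables(query)
    for chunk in chunks:
        wanted |= referenced_tables(chunk["text"])
    present = {
        chunk["table_number"]
        for chunk in chunks
        if chunk.get("table_number") is not None
    }
    return wanted - present


retrieved = [
    {"text": "As shown in Table 2, the baseline lags behind.", "table_number": None},
    {"text": "| Model | F1 |", "table_number": 2},
]
to_fetch = missing_tables("Compare the metrics in Table 4", retrieved)
```

Each number in `to_fetch` would then be looked up directly by table number and appended to the context. A regex cannot catch indirect references ("the results table above"), which is one reason to use an LLM for this step instead.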
4. Table Prioritization
When a query appears to focus on tabular data (detected via keywords like “table”, “metrics”, “results”, “compare”), the system applies a 3× score multiplier to table chunks during retrieval. Table embeddings often score lower on semantic search because their content is mostly numbers and short headers. The boost compensates for this disadvantage, ensuring tables are not consistently outranked by prose chunks that happen to use similar words.
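The boost itself is a small amount of logic. The following sketch uses the keywords and the 3× multiplier named above; the hit format (dicts with "id", "score", and "is_table") and the prefix-matching regex are assumptions made for illustration.

```python
import re

# Keywords that signal a table-focused query; \b avoids matching "stable" for "table".
TABLE_QUERY = re.compile(r"\b(?:table|metrics|results|compare)", re.IGNORECASE)
TABLE_BOOST = 3.0


def is_table_query(query: str) -> bool:
    """True when the query contains one of the table-related keywords."""
    return TABLE_QUERY.search(query) is not None


def rerank(query: str, hits: list[dict]) -> list[dict]:
    """Multiply table-chunk scores by TABLE_BOOST for table queries, then sort."""
    boost = TABLE_BOOST if is_table_query(query) else 1.0
    rescored = [
        {**hit, "score": hit["score"] * (boost if hit["is_table"] else 1.0)}
        for hit in hits
    ]
    return sorted(rescored, key=lambda hit: hit["score"], reverse=True)


# Made-up retrieval scores: the prose chunk initially outranks the table chunk.
hits = [
    {"id": "prose", "score": 0.6, "is_table": False},
    {"id": "tbl", "score": 0.3, "is_table": True},
]
boosted_order = [hit["id"] for hit in rerank("Compare the results", hits)]
plain_order = [hit["id"] for hit in rerank("What motivated this work?", hits)]
```

For the table-focused query the table chunk moves to the top; for the other query the original ordering is left alone.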
5. Complete Table Context in Answers
When generating answers, the LLM receives table data in its original Markdown format, not as fragmented text. The prompt explicitly instructs it to:
- Analyze the full table structure
- Report all variants (important when tables compare multiple models)
- Present data in structured formats when helpful
- Cite table sources properly
This avoids a common failure mode where the LLM picks one row from a multi-variant table and ignores the rest.
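A prompt along these lines might be assembled as follows. The instruction wording and the `build_answer_prompt` name are hypothetical; the point is only that each table reaches the LLM as one intact Markdown block, labeled with its number so it can be cited.

```python
def build_answer_prompt(question: str, tables: dict[int, str]) -> str:
    """Assemble a prompt that carries each table as complete Markdown."""
    instructions = "\n".join([
        "- Analyze the full structure of every table before answering.",
        "- Report all variants that are relevant, not just the first match.",
        "- Present data in a structured format when that helps.",
        "- Cite the table number for every value you report.",
    ])
    table_blocks = "\n\n".join(
        f"Table {number}:\n{markdown}" for number, markdown in sorted(tables.items())
    )
    return f"Instructions:\n{instructions}\n\n{table_blocks}\n\nQuestion: {question}"


# Placeholder table with made-up values.
prompt = build_answer_prompt(
    "Which model has the best F1?",
    {2: "| Model | F1 |\n|---|---|\n| A | 0.88 |\n| B | 0.90 |"},
)
```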
6. Autonomous Query Mode Detection
One of the thorniest challenges in RAG is knowing when chunk-based retrieval isn’t enough. The library includes auto mode detection: an LLM analyzes each incoming query to determine whether it requires standard retrieval or full document review.
Chunk-based retrieval is designed to work within LLM context limits — you retrieve only relevant fragments. But some questions need the full picture. Broad questions that require synthesis across the entire document — “Does the paper’s narrative flow logically from problem to solution?”, “Are the claims properly supported throughout the document?”, “How could the structure be improved for clarity?” — trigger full review mode, where the system hierarchically summarizes each section and synthesizes a comprehensive answer. This requires more tokens and processing time, but captures relationships that chunk-based retrieval misses. Specific, targeted questions (“What’s the F1 score in Table 3?”) use efficient vector retrieval. This decision happens automatically — the user doesn’t need to know which mode is being used.
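The routing decision can be sketched as a thin wrapper around any LLM call. Everything named here is an assumption for illustration: the `QueryMode` enum, the prompt wording, and the fallback to standard retrieval when the reply cannot be parsed. The `fake_llm` function is a keyword-based stand-in so the example runs offline; it is not how a real model decides.

```python
from enum import Enum
from typing import Callable


class QueryMode(Enum):
    STANDARD = "standard"
    FULL_REVIEW = "full_review"


ROUTER_PROMPT = (
    "Classify the question as 'standard' (a specific fact or table lookup) or "
    "'full_review' (synthesis across the whole document). Reply with one word.\n\n"
    "Question: {query}"
)


def choose_mode(query: str, llm: Callable[[str], str]) -> QueryMode:
    """Ask the LLM for a label; fall back to the cheaper mode on unexpected replies."""
    reply = llm(ROUTER_PROMPT.format(query=query)).strip().lower()
    try:
        return QueryMode(reply)
    except ValueError:
        return QueryMode.STANDARD


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a model call: looks for a few broad-question cues."""
    question = prompt.rsplit("Question: ", 1)[-1].lower()
    broad_cues = ("structure", "throughout", "flow")
    return "full_review" if any(cue in question for cue in broad_cues) else "standard"


broad = choose_mode("How could the structure be improved for clarity?", fake_llm)
targeted = choose_mode("What's the F1 score in Table 3?", fake_llm)
```

Defaulting to standard retrieval on a malformed reply keeps a routing error from triggering the slower, more expensive full review.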
Technical Implementation
The library is organized into three core modules:
docling_parser.py — Document Parsing
Handles all document parsing and table/figure extraction. Returns a ParsedDocument object with:
- Full text and Markdown representations
- Structured table data (as Markdown, with row/column counts)
- Figure captions and descriptions
- Section hierarchy
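As a rough picture of that return value, the four items above could map onto a small set of dataclasses. The field names, the `ParsedTable` and `ParsedFigure` types, and the `get_table` helper are hypothetical; the actual ParsedDocument may be shaped differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedTable:
    number: int       # table number within the document
    markdown: str     # complete table as Markdown
    num_rows: int
    num_cols: int


@dataclass
class ParsedFigure:
    number: int
    caption: str


@dataclass
class ParsedDocument:
    text: str
    markdown: str
    tables: list[ParsedTable] = field(default_factory=list)
    figures: list[ParsedFigure] = field(default_factory=list)
    # Section hierarchy as (heading level, title) pairs in document order.
    sections: list[tuple[int, str]] = field(default_factory=list)

    def get_table(self, number: int) -> Optional[ParsedTable]:
        """Look up a table by its number; return None if it does not exist."""
        return next((t for t in self.tables if t.number == number), None)


doc = ParsedDocument(
    text="Results are in Table 1.",
    markdown="Results are in Table 1.",
    tables=[ParsedTable(number=1, markdown="| a | b |\n|---|---|\n| 1 | 2 |", num_rows=1, num_cols=2)],
    sections=[(1, "Results")],
)
```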
rag_ingestion.py — Vector Database Ingestion
Manages document ingestion into Qdrant (or other vector databases):
- Intelligent chunking with Docling’s HybridChunker, preserving context and coherence
- Embedding generation with configurable models
- Metadata-rich chunks (table/figure flags, headings, captions)
- Efficient incremental updates with stable document IDs based on SHA256 content hashing
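Content hashing is what makes incremental updates cheap: an unchanged file produces the same ID and can be skipped. The sketch below uses Python's hashlib and uuid modules; the function names are hypothetical. Qdrant accepts only unsigned integers or UUIDs as point IDs, so the sketch derives a deterministic UUID per chunk from the document hash, which is one possible scheme and not necessarily the one the library uses.

```python
import hashlib
import uuid


def document_id(content: bytes) -> str:
    """Stable ID: the SHA-256 hex digest of the raw document bytes."""
    return hashlib.sha256(content).hexdigest()


def chunk_point_id(doc_id: str, chunk_index: int) -> str:
    """Deterministic UUID for one chunk, derived from the document ID and chunk position."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{doc_id}:{chunk_index}"))


def needs_ingest(content: bytes, known_ids: set[str]) -> bool:
    """True when this exact content has not been ingested before."""
    return document_id(content) not in known_ids


original = b"Table 1 reports accuracy."
edited = b"Table 1 reports accuracy and F1."
known = {document_id(original)}

skip_original = not needs_ingest(original, known)
reingest_edited = needs_ingest(edited, known)
```

Because the chunk IDs are derived from the content hash, re-ingesting an unchanged document overwrites the same points instead of creating duplicates.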
rag_library.py — Query System
The main RAG pipeline:
- Multi-mode retrieval (semantic, keyword, hybrid)
- Cross-encoder reranking for relevance
- Agentic table/figure reference detection
- LLM-based answer generation with multi-LLM support
- Full document review mode with hierarchical section summarization
- Auto query mode detection (autonomous selection between standard RAG and full review)
Web Interface
For interactive use, web_app.py provides a streaming chat interface.
The system supports dynamic updates to the LLM provider configuration via environment variables, so models can be changed without restarting the server. The interface provides:
- Real-time streaming
- Conversation history
- Memory compaction
- Semantic / Keyword / Hybrid / Auto search
- Table prioritization toggle
- Query mode selection
- File watcher & auto-refresh
- Agentic mode indicator
- Source panel
Practical Usage
The library offers multiple ways to query documents. For specific questions about facts, figures, or table data, use standard RAG mode with semantic or hybrid search. This chunk-based approach works within LLM context limits, retrieving only relevant fragments for fast, efficient answers. Enable table prioritization and agentic fetching when your query mentions tables or requires comparison across multiple rows.
For broader analysis requiring synthesis across the entire document, use full document review mode. The library supports both automatic and manual query mode selection. By default, auto mode detection analyzes each query to determine which approach is appropriate. You can also manually override this when you want explicit control — human-in-the-loop mode lets you force standard RAG for faster results, or full review for comprehensive analysis.
For large-context LLMs, you can skip vector retrieval entirely and pass the full document, together with selected tables, directly as context.
Limitations and Considerations
No approach is universally optimal. Here are some trade-offs:
- Large tables — Very wide tables (20+ columns) can still be challenging for LLMs to parse accurately.
- Table number inference — If a document lacks explicit table numbers, the system falls back to sequential counting.
- Multi-file references — Cross-document table references aren’t currently handled.
- OCR quality — Scanned PDFs require OCR, which introduces potential errors in table structure.
- Full review processing — Complete document review requires reading and summarizing all sections, which is slower and more expensive than chunk-based retrieval. Auto mode detection helps by only triggering this when necessary, but broad queries will inherently take longer to process.
The library is designed to be modular — you can use just the parsing component, or combine it with your own RAG pipeline.
Conclusion
Standard RAG systems work well for narrative text, but they break down when documents contain their most valuable information in tables and other structured elements. Tables get fragmented, references get orphaned, and document-level insights remain hidden.
Tablemind addresses these gaps through a table-first architecture: complete table preservation, agentic reference detection, intelligent query mode routing, and full document review for questions that require synthesis beyond chunk-based retrieval. Auto mode detection chooses between fast vector retrieval and comprehensive review, with manual override available when human judgment is needed.
You can use the complete pipeline, or pull out individual components (parser, ingestion, or query system) for your own applications.