Tablemind RAG Tables Docling

When building retrieval-augmented generation (RAG) systems, most implementations focus on retrieving and synthesizing text chunks. This works well for narrative content, but research papers, technical reports, and business documents often contain their most valuable information in tables. Standard RAG systems struggle here. Tables get split across chunks, their structure is lost, and queries like “compare the metrics in Table 3” return incomplete or unusable answers.

I have been working on a library called Tablemind that takes a different approach: treating tables as first-class citizens in the RAG pipeline. Rather than being handled as an afterthought, tables are preserved in their entirety, indexed with their structure intact, and actively retrieved when relevant to a query.

The Problem with Tables in RAG

To understand why a table-first approach matters, consider what happens in a typical RAG pipeline when you ask about data in a table. RAG systems chunk documents to work around LLM context limits — you retrieve only the most relevant fragments rather than feeding entire documents. This works for narrative text, but creates several problems for tables and structured data:

Chunking destroys structure: A 20-row table might be split across 3–4 chunks during ingestion. When retrieved, the LLM sees only partial rows and columns, losing the complete picture.
Reference chains break: A text chunk might mention “as shown in Table 2,” but the table itself lives in a different chunk with no semantic overlap. The retrieval system fails to connect the reference to its target.
Semantic search misses exact data: Vector embeddings capture semantic meaning, but table queries often require exact-match operations (“What’s the F1 score for the baseline model?”). The embedding may not rank the table chunk highly enough for retrieval.
Figure captions get orphaned: Similar issues affect figures and their captions — the visual content is described in one chunk, referenced in another, and the connection is lost.
Aggregation queries fail: Questions like “What’s the average accuracy across all models?” or “Which approach performed best?” require seeing the entire table. Partial chunks can’t support these calculations or comparisons.
Column headers get separated: When a table splits mid-way, later chunks may contain data rows without their column headers, making the values meaningless or ambiguous to the LLM.
Context-dependent values become unclear: Numbers like “95.2” are meaningless without knowing the column (is it accuracy? precision? F1?) and row context (which model? which dataset?).
Document-level insights remain hidden: Some queries require understanding relationships across multiple sections or synthesizing information from the entire document — “What are the key innovations in this paper?” or “Summarize the methodology.” Chunk-based retrieval often misses the big picture, leaving the LLM with fragments that can’t support holistic analysis.

These problems all stem from the same root cause: chunk-based retrieval, designed for prose within LLM context limits, doesn’t handle structured or cross-sectional content well.
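
To make the header problem concrete, here is a minimal sketch of a naive fixed-size chunker applied to a small Markdown table. The chunker, the table, and its values are all invented for illustration; real chunkers count tokens rather than lines, but the failure looks the same.

```python
def fixed_size_chunks(lines, max_lines):
    """Split a list of lines into consecutive chunks of at most max_lines."""
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


# Illustrative table with made-up values.
table = [
    "| Model    | Accuracy | F1   |",
    "|----------|----------|------|",
    "| Baseline | 91.0     | 88.4 |",
    "| Ours-S   | 93.5     | 90.1 |",
    "| Ours-L   | 95.2     | 92.7 |",
]

chunks = fixed_size_chunks(table, max_lines=3)

# Check which chunks still carry the column headers.
has_header = [any("Accuracy" in line for line in chunk) for chunk in chunks]

for chunk, ok in zip(chunks, has_header):
    print("header present" if ok else "header MISSING")
    print("\n".join(chunk))
```

The second chunk contains the rows for two models but no column names, so a value such as 95.2 can no longer be tied to a metric.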
    

A Table-First Architecture

The library addresses these issues through several design choices:

1. Structure-Preserving Parsing with Docling

Instead of treating PDFs as raw text streams, the library uses Docling for document parsing. Docling understands document structure — it knows which items are tables, which are figures, and how sections relate to each other.

Tables are extracted as complete Markdown representations with their row/column structure preserved. Each table is also indexed by its table number, so it can be fetched directly even when vector search fails to surface it.
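
Table number indexing can be sketched roughly as follows. This is not Tablemind's actual code: the function name, the caption regex, and the `(caption, markdown)` input shape are assumptions made for illustration. When a caption carries no explicit number, the sketch falls back to the table's position in the document.

```python
import re

# Matches "Table 3", "table 12", and similar at a word boundary.
TABLE_CAPTION = re.compile(r"\bTable\s+(\d+)\b", re.IGNORECASE)


def build_table_index(tables):
    """Map table number -> Markdown.

    `tables` is a list of (caption, markdown) pairs in document order.
    """
    index = {}
    for position, (caption, markdown) in enumerate(tables, start=1):
        match = TABLE_CAPTION.search(caption or "")
        # Fall back to sequential counting when no number is found.
        number = int(match.group(1)) if match else position
        # Keep the first table seen for a given number.
        index.setdefault(number, markdown)
    return index


tables = [
    ("Table 1: Datasets", "| Dataset | Size |"),
    ("", "| Model | Params |"),  # no caption, so it becomes table 2
    ("Table 3. Results", "| Model | F1 |"),
]
index = build_table_index(tables)
```

With an index like this, a lookup such as `index[3]` returns the whole table regardless of how the embedding for that table scores.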

2. Multi-Format Support

Documents come in various formats, and the library handles them uniformly:

PDF — Full Docling support with optional OCR for scanned documents
Markdown — Native parsing with table extraction
HTML — Table structure preserved from web content
DOCX — Word documents with tables intact
Text — Plain text with optional Markdown table detection

This flexibility matters because you might ingest papers from arXiv (PDF), documentation from GitHub (Markdown), and internal reports (DOCX) into the same knowledge base.

3. Vector + Keyword + Agentic Retrieval

The RAG system offers three search modes. Semantic search using sentence-transformers embeddings handles conceptual queries. Keyword search (BM25) handles exact-term queries such as model names and specific metrics. Hybrid search combines both with configurable weights.
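
One common way to combine the two signals is a weighted sum over normalized scores. The sketch below assumes min-max normalization and a single `semantic_weight` parameter; both are illustrative choices, not a description of how the library computes its hybrid score.

```python
def min_max(scores):
    """Rescale a {chunk_id: score} dict to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {chunk_id: 1.0 for chunk_id in scores}
    return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}


def hybrid_scores(semantic, keyword, semantic_weight=0.7):
    """Weighted sum of normalized semantic and keyword (e.g. BM25) scores."""
    sem = min_max(semantic) if semantic else {}
    kw = min_max(keyword) if keyword else {}
    fused = {}
    for chunk_id in set(sem) | set(kw):
        # A chunk missing from one result list contributes 0 for that signal.
        fused[chunk_id] = (
            semantic_weight * sem.get(chunk_id, 0.0)
            + (1.0 - semantic_weight) * kw.get(chunk_id, 0.0)
        )
    return fused


# Made-up scores: cosine similarities and raw BM25 values.
semantic = {"prose": 0.9, "table": 0.6, "other": 0.3}
keyword = {"table": 10.0, "prose": 2.0}
fused = hybrid_scores(semantic, keyword, semantic_weight=0.5)
best = max(fused, key=fused.get)
```

Normalizing first matters because BM25 scores are unbounded while cosine similarities are not, so a raw sum would let the keyword signal dominate.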

The key innovation is agentic table fetching. After initial retrieval, an LLM analyzes the results to detect missing references. If a query mentions “Table 4” but that table is not among the retrieved chunks, the system fetches it by table number. No extra input is needed from the user: the query itself implies what is required.
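
The control flow can be sketched without an LLM. In the example below, a regular expression stands in for the LLM-based reference detection described above, and the chunk dictionaries, the `table_index` mapping, and the function name are hypothetical.

```python
import re

TABLE_REF = re.compile(r"\btable\s+(\d+)\b", re.IGNORECASE)


def fetch_missing_tables(query, retrieved_chunks, table_index):
    """Append any table named in the query that retrieval did not return.

    `table_index` maps table number -> full Markdown table.
    """
    wanted = {int(n) for n in TABLE_REF.findall(query)}
    present = {
        chunk["table_number"]
        for chunk in retrieved_chunks
        if chunk.get("table_number") is not None
    }
    extra = [
        {"table_number": n, "text": table_index[n], "source": "direct_fetch"}
        for n in sorted(wanted - present)
        if n in table_index  # ignore references to tables that do not exist
    ]
    return retrieved_chunks + extra


retrieved = [{"table_number": 3, "text": "| Model | F1 |", "source": "vector"}]
table_index = {3: "| Model | F1 |", 4: "| Model | Latency |"}
result = fetch_missing_tables(
    "Compare the metrics in Table 3 and Table 4", retrieved, table_index
)
```

A regex only catches explicit mentions. An LLM-based detector can also follow indirect references, such as a retrieved text chunk that says "as shown in Table 2".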

4. Table Prioritization

When a query appears to focus on tabular data (detected via keywords like “table”, “metrics”, “results”, “compare”), the system applies a 3× score multiplier to table chunks during retrieval. Table embeddings often score lower on semantic search because their content is mostly numbers and short headers. The boost compensates for this disadvantage, ensuring tables are not consistently outranked by prose chunks that happen to use similar words.
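
A minimal sketch of this boost, assuming each hit is a dictionary with a `score` and an `is_table` flag (hypothetical field names). The keyword list and the 3× factor come from the description above; the substring matching is a simplification.

```python
TABLE_KEYWORDS = ("table", "metrics", "results", "compare")
TABLE_BOOST = 3.0


def is_table_query(query):
    """Return True if the query contains any table-related keyword."""
    lowered = query.lower()
    return any(keyword in lowered for keyword in TABLE_KEYWORDS)


def rerank_with_table_boost(query, hits):
    """Multiply table-chunk scores by TABLE_BOOST for table-focused queries."""
    boost = TABLE_BOOST if is_table_query(query) else 1.0
    rescored = [
        dict(hit, score=hit["score"] * (boost if hit.get("is_table") else 1.0))
        for hit in hits
    ]
    return sorted(rescored, key=lambda hit: hit["score"], reverse=True)


hits = [
    {"id": "prose", "score": 0.6, "is_table": False},
    {"id": "table", "score": 0.3, "is_table": True},
]
boosted = rerank_with_table_boost("Compare the results across models", hits)
unboosted = rerank_with_table_boost("Who wrote this paper?", hits)
```

The original hit dictionaries are copied rather than mutated, so the same result list can be rescored with the toggle on or off.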

5. Complete Table Context in Answers

When generating answers, the LLM receives table data in its original Markdown format, not as fragmented text. The prompt explicitly instructs it to:

Analyze the full table structure
Report all variants (important when tables compare multiple models)
Present data in structured formats when helpful
Cite table sources properly

This avoids a common failure mode where the LLM picks one row from a multi-variant table and ignores the rest.
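
As a rough illustration, a prompt of this kind could be assembled as below. The instruction wording and the `build_prompt` helper are assumptions for the sketch, not the library's actual prompt.

```python
INSTRUCTIONS = (
    "Analyze the full table structure.",
    "Report all variants, not just a single row.",
    "Present data in a structured format when helpful.",
    "Cite the table you used (for example, 'Table 3').",
)


def build_prompt(question, tables):
    """Build an answer prompt from complete Markdown tables.

    `tables` maps table number -> Markdown, passed through unmodified.
    """
    parts = ["Answer the question using the tables below."]
    parts += [f"- {line}" for line in INSTRUCTIONS]
    for number, markdown in sorted(tables.items()):
        parts.append(f"\nTable {number}:\n{markdown}")
    parts.append(f"\nQuestion: {question}")
    return "\n".join(parts)


prompt = build_prompt(
    "Which model is best?",
    {2: "| Model | F1 |\n|-------|----|\n| A | 0.9 |\n| B | 0.8 |"},
)
```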

6. Autonomous Query Mode Detection

One of the thorniest challenges in RAG is knowing when chunk-based retrieval isn’t enough. The library includes auto mode detection: an LLM analyzes each incoming query to determine whether it requires standard retrieval or full document review.

Chunk-based retrieval is designed to work within LLM context limits — you retrieve only relevant fragments. But some questions need the full picture. Broad questions that require synthesis across the entire document — “Does the paper’s narrative flow logically from problem to solution?”, “Are the claims properly supported throughout the document?”, “How could the structure be improved for clarity?” — trigger full review mode, where the system hierarchically summarizes each section and synthesizes a comprehensive answer. This requires more tokens and processing time, but captures relationships that chunk-based retrieval misses. Specific, targeted questions (“What’s the F1 score in Table 3?”) use efficient vector retrieval. This decision happens automatically — the user doesn’t need to know which mode is being used.
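
The routing logic can be sketched with the LLM calls injected as plain callables. Everything named here (`answer`, `classify`, `llm`, `retrieve`, the mode labels, the prompt wording) is hypothetical; the sketch only shows the shape of the two paths and the manual override.

```python
VALID_MODES = {"standard", "full_review"}


def answer(query, sections, classify, llm, retrieve, mode_override=None):
    """Route a query to standard retrieval or full document review.

    sections: {section title: section text}
    classify: callable(query) -> mode label (an LLM call in practice)
    llm:      callable(prompt) -> text
    retrieve: callable(query) -> context string from chunk retrieval
    """
    mode = mode_override or classify(query)
    if mode not in VALID_MODES:
        mode = "standard"  # unexpected label: fall back to the cheaper path

    if mode == "full_review":
        # First summarize each section, then synthesize over the summaries.
        summaries = []
        for title, body in sections.items():
            summary = llm("Summarize this section:\n" + body)
            summaries.append(f"{title}: {summary}")
        prompt = f"Question: {query}\nSection summaries:\n" + "\n".join(summaries)
    else:
        prompt = f"Question: {query}\nContext:\n{retrieve(query)}"
    return mode, llm(prompt)
```

Full review costs one LLM call per section plus one for synthesis, while the standard path costs a single call, which is why the routing decision matters for latency and token spend.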

Technical Implementation

The library is organized into three core modules:

`docling_parser.py` — Document Parsing

Handles all document parsing and table/figure extraction. Returns a ParsedDocument object with:

Full text and Markdown representations
Structured table data (as Markdown, with row/column counts)
Figure captions and descriptions
Section hierarchy
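
A container with those contents might look like the dataclasses below. The class and field names are assumptions chosen to mirror the list above (hence the `Sketch` suffix); they are not the library's actual definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TableRecord:
    number: int
    markdown: str  # complete table, structure preserved
    n_rows: int
    n_cols: int


@dataclass
class FigureRecord:
    number: int
    caption: str
    description: str = ""


@dataclass
class ParsedDocumentSketch:
    text: str
    markdown: str
    tables: List[TableRecord] = field(default_factory=list)
    figures: List[FigureRecord] = field(default_factory=list)
    sections: List[str] = field(default_factory=list)  # e.g. "3 > 3.1 Setup"

    def get_table(self, number: int) -> Optional[TableRecord]:
        """Return the table with the given number, or None if absent."""
        return next((t for t in self.tables if t.number == number), None)


doc = ParsedDocumentSketch(
    text="Results are in Table 1.",
    markdown="Results are in Table 1.",
    tables=[TableRecord(number=1, markdown="| Model | F1 |", n_rows=1, n_cols=2)],
)
```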

`rag_ingestion.py` — Vector Database Ingestion

Manages document ingestion into Qdrant (or other vector databases):

Intelligent chunking with Docling’s HybridChunker, preserving context and coherence
Embedding generation with configurable models
Metadata-rich chunks (table/figure flags, headings, captions)
Efficient incremental updates with stable document IDs based on SHA256 content hashing
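
Content-hash IDs can be sketched with the standard library alone. The helper names are hypothetical. Deriving a UUID for each chunk is an assumption added because Qdrant point IDs must be unsigned integers or UUIDs; the library may derive them differently.

```python
import hashlib
import uuid


def document_id(content: bytes) -> str:
    """Stable ID: the SHA256 hex digest of the document's bytes."""
    return hashlib.sha256(content).hexdigest()


def point_id(doc_id: str, chunk_index: int) -> str:
    """Deterministic UUID string for one chunk of a document."""
    digest = hashlib.sha256(f"{doc_id}:{chunk_index}".encode("utf-8")).hexdigest()
    return str(uuid.UUID(hex=digest[:32]))


def needs_ingest(content: bytes, known_ids: set) -> bool:
    """True if this exact content has not been ingested before."""
    return document_id(content) not in known_ids


known = {document_id(b"report v1")}
unchanged = needs_ingest(b"report v1", known)
changed = needs_ingest(b"report v2", known)
```

Because the ID depends only on the bytes, re-running ingestion over an unchanged file is a no-op, and any edit produces a new ID that triggers re-indexing.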

`rag_library.py` — Query System

The main RAG pipeline:

Multi-mode retrieval (semantic, keyword, hybrid)
Cross-encoder reranking for relevance
Agentic table/figure reference detection
LLM-based answer generation with multi-LLM support
Full document review mode with hierarchical section summarization
Auto query mode detection (autonomous selection between standard RAG and full review)

Web Interface

For interactive use, `web_app.py` provides a streaming chat interface. The LLM provider configuration can be updated dynamically via environment variables, so models can be changed without restarting the server. The interface includes:

Real-time streaming
Conversation history
Memory compaction
Semantic / Keyword / Hybrid / Auto search
Table prioritization toggle
Query mode selection
File watcher & auto-refresh
Agentic mode indicator
Source panel

Practical Usage

The library offers multiple ways to query documents. For specific questions about facts, figures, or table data, use standard RAG mode with semantic or hybrid search. This chunk-based approach works within LLM context limits, retrieving only relevant fragments for fast, efficient answers. Enable table prioritization and agentic fetching when your query mentions tables or requires comparison across multiple rows.

For broader analysis requiring synthesis across the entire document, use full document review mode. The library supports both automatic and manual query mode selection. By default, auto mode detection analyzes each query to determine which approach is appropriate. You can also manually override this when you want explicit control — human-in-the-loop mode lets you force standard RAG for faster results, or full review for comprehensive analysis.

For large-context LLMs, you can skip vector retrieval entirely and pass the full document, together with selected tables, directly as context.

Limitations and Considerations

No approach is universally optimal. Here are some trade-offs:

Large tables — Very wide tables (20+ columns) can still be challenging for LLMs to parse accurately.
Table number inference — If a document lacks explicit table numbers, the system falls back to sequential counting.
Multi-file references — Cross-document table references aren’t currently handled.
OCR quality — Scanned PDFs require OCR, which introduces potential errors in table structure.
Full review processing — Complete document review requires reading and summarizing all sections, which is slower and more expensive than chunk-based retrieval. Auto mode detection helps by only triggering this when necessary, but broad queries will inherently take longer to process.

The library is designed to be modular — you can use just the parsing component, or combine it with your own RAG pipeline.

Conclusion

Standard RAG systems work well for narrative text, but they break down when documents contain their most valuable information in tables and other structured content. Tables get fragmented, references get orphaned, and document-level insights remain hidden.

Tablemind addresses these gaps through a table-first architecture: complete table preservation, agentic reference detection, intelligent query mode routing, and full document review for questions that require synthesis beyond chunk-based retrieval. The auto mode detection autonomously chooses between fast vector retrieval and comprehensive review, with manual override available when human judgment is needed.

You can use the complete pipeline, or pull out individual components (parser, ingestion, or query system) for your own applications.