Welcome to Your AI Research Partner
The Deep Research Agent is a powerful tool designed to automate the entire research lifecycle. It transforms a single topic into a comprehensive, well-structured, and fully-cited report. This guide will walk you through its features and show you how to get the most out of your automated research assistant.
Why It Matters: Beyond the Filter Bubble
The Problem: Algorithmic Bias & Informational Silos
Traditional research workflows often rely on a single search engine or database. This creates a "filter bubble," where algorithmic biases can inadvertently hide critical information and diverse perspectives, leading to an incomplete or skewed understanding of a topic.
The Solution: Multi-Engine Synthesis with an AI Judge
This agent systematically overcomes this limitation. By generating separate reports from different search engines (e.g., a standard web search vs. an academic search) and then using a powerful LLM as an impartial "Judge," it identifies and merges the most valuable, unique insights from each source into a single, definitive report.
The Value Proposition: Higher-Fidelity Insights
This method produces a final analysis that is more comprehensive, objective, and robust than what any single source could provide. Furthermore, the **Scoring** utility provides direct feedback on report quality, helping you iteratively refine the research plan itself. By comparing how different engines cover a topic, you can identify and fill knowledge gaps, leading to a truly superior research outcome.
Core Features
Automated Planning
The AI first acts as an expert planner, breaking down your topic into logical objectives and sub-topics to ensure comprehensive coverage.
Deep Web Research
For each sub-topic, the AI performs a deep dive using modern search-enabled models, gathering relevant information from across the web.
Cited Synthesis
The system synthesizes all findings into a single report, ensuring every claim is backed by a citation and providing a clean, centralized reference list.
Engine Flexibility
Choose different LLM providers (like Claude, Perplexity, Gemini, and OpenAI) for different stages to optimize for quality and cost.
AI Judging
Generate multiple report versions, then have a final "AI Judge" analyze them to create a single, superior version that combines their strengths.
Automated Scoring
Quantify the quality of each report. The AI scores syntheses against the original plan on metrics like objective fulfillment, coverage, and insight.
How to Use: The Research Workflow
Step 1: Define Your Research Topic
This is your starting point. Provide a clear and concise research topic. The more specific your topic, the better the resulting plan will be.
- Enter your topic in the text area. Good examples include "The impact of quantum computing on modern cryptography" or "Sustainable urban planning strategies in coastal cities."
Two Options for Creating Your Research Plan:
Option A: AI-Generated Plan (Recommended)
- Select a Plan Generation Engine from the dropdown (Gemini, Claude, Perplexity, OpenAI, or DeepSeek)
- Choose Research Depth: Select the number of research questions:
- 2 - Quick overview (fastest)
- 3 - Standard (recommended for most topics)
- 5 - Comprehensive (deeper exploration)
- 7 - Deep dive (maximum detail)
- Click the Generate Research Plan button to let AI create a structured plan for you
Option B: Manual Plan Entry
- Check the box "I'll enter my own research plan" to skip AI generation
- Click Continue to Plan Editor to write your own plan manually
- Useful when you have a specific research structure in mind or want full control over the research questions
Step 2: Review and Refine the Plan
This is the most critical step for ensuring a high-quality outcome. The AI will generate a detailed plan, but you are the expert. You can now edit the plan directly in the text box.
- Review the Objectives: Do they align with your goals?
- Check the Subtopics: Are they relevant? Should any be added, removed, or rephrased?
- Adjust Search Queries: You can modify the suggested search queries under each subtopic to guide the research agent more effectively.
Tip: Taking a few minutes to refine the plan here can dramatically improve the final report's focus and relevance.
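For orientation, a plan in the editor generally lays out an objective, a set of subtopics, and suggested search queries under each one. The outline below is purely illustrative; the exact wording and layout produced by your chosen planning engine will differ.

```
Objective: Assess how quantum computing threatens modern cryptography
and what migration paths exist.

Subtopic 1: Vulnerable algorithms
  - Search: impact of Shor's algorithm on RSA and ECC
  - Search: symmetric cryptography resistance to Grover's algorithm

Subtopic 2: Post-quantum standards
  - Search: NIST post-quantum cryptography standardization status
  - Search: migration strategies to post-quantum cryptography
```

Editing any of these lines directly in the text box changes what the research agent will search for in Step 3.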
Step 3: Configure and Execute
Once you are satisfied with your research plan, configure the execution parameters and start the main process.
- Select a Search Engine: This model will perform the actual web research for each subtopic. Some engines have special parameters (like recency) you can adjust.
- Select a Synthesis Engine: This model will read all the research findings and write the final, cohesive report.
- Set Target Report Length: Provide an approximate word count for the final report.
- Click **Generate New Research**. The agent will now work in the background with real-time status updates showing progress through each step (analyzing subtopics, synthesizing, etc.). This initial run gathers all the necessary data and unlocks the advanced features below.
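Conceptually, the execution stage works through the plan as in the short sketch below. The function and method names are illustrative only and do not reflect the agent's actual internal API.

```python
# Conceptual sketch of the execution stage; all names here are hypothetical.
def run_research(plan, search_engine, synthesis_engine, target_words=2000):
    findings = []
    for subtopic in plan["subtopics"]:
        for query in subtopic["search_queries"]:
            # Each query is sent to the selected search-enabled model.
            findings.append(search_engine.research(query))
    # The synthesis engine reads every finding and writes one cited report.
    return synthesis_engine.synthesize(plan, findings, target_words=target_words)
```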
Visualizing the Workflow
The Full Research Lifecycle
The agent provides a seamless user experience, guiding you from the initial topic definition to the final, advanced stages of judging and scoring multiple AI-generated reports.
Sample Score Report
After generating reports, the AI scoring utility provides a detailed, quantitative breakdown of each version's performance, helping you objectively identify the highest-quality synthesis.
External Prompt Management
All LLM prompts are now stored externally in the prompts/ directory, allowing you to customize the agent's behavior without modifying the core code.
Why External Prompts?
- No Code Changes: Modify prompt templates by editing text files - no Python knowledge required
- Runtime Reload: Use the "Reload Prompts" button in the UI to apply changes instantly without restarting the server
- Experimentation Friendly: Test different prompt strategies quickly and iterate based on results
- Version Control: Track prompt changes in Git alongside your code
Available Prompt Templates
Search Engine Prompts
- research_claude.txt - Claude web search with citations
- research_perplexity.txt - Perplexity web search with citations
- research_tavily.txt - Tavily API documentation
- research_serpapi.txt - SerpApi API documentation
Processing Engine Prompts
Shared by Gemini, OpenAI, DeepSeek, Perplexity, Claude
- planning.txt - Generate research plans from topic
- synthesis.txt - Synthesize findings into report
- judge.txt - Merge multiple reports into superior one
- scoring.txt - Score reports against research plan
- parse_plan.txt - Parse text plan back to JSON
How to Reload Prompts
- Edit any .txt file in the prompts/ directory
- Save your changes
- Click the "Reload Prompts" button in the navigation bar (or restart the server)
- Verify success by checking the confirmation alert showing loaded prompts
Note: All prompt templates support Python .format() style placeholders like {variable_name}.
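For example, if a template contains placeholders such as {topic} or {target_words}, the agent fills them with Python's standard str.format(). The snippet below shows the mechanism in isolation; the placeholder names are illustrative, so check the shipped templates in prompts/ to see which variables each one actually expects.

```python
from pathlib import Path

# Load a prompt template and fill its placeholders with str.format().
# Placeholder names are illustrative; each template defines its own.
template = Path("prompts/synthesis.txt").read_text(encoding="utf-8")
prompt = template.format(
    topic="RAG System Design",
    findings="...collected research findings...",
    target_words=1500,
)
print(prompt[:500])  # preview the rendered prompt
```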
Demo: Multi-Engine Fusion Results
See the power of multi-engine synthesis with this real example using two different search engines to research RAG system design.
Research Topic & Plan
Topic: RAG System Design
Generated with 2 research questions for a broad overview. More questions = deeper research (you can choose 2, 3, 5, or 7 questions).
AI Judge Fusion Report
The AI Judge analyzed both source reports and created a superior synthesis that combines their strengths while preserving the best content from each.
Key Fusion Benefits:
- ✓ Preserves unique insights from both search engines
- ✓ Resolves contradictions by presenting multiple viewpoints
- ✓ Eliminates redundancy while maintaining citation density
- ✓ Creates coherent narrative with better structure
What to Observe
When comparing these three reports, notice how:
- Source Reports 1 & 2 may have different coverage, sources, and perspectives due to using different search engines
- Fusion Report (labeled "Fusion (2 sources)") combines the best elements from both, creating a more comprehensive and balanced analysis
- Citation density is maintained throughout - all claims remain well-supported
- Structure is improved with better flow and organization
Advanced Features & Tips
Recent Improvements (v1.2.4+)
Better Table Formatting
All synthesis engines now properly render markdown tables with clean borders, alternating row colors, and hover effects. Tables are automatically formatted with proper pipe (`|`) alignment.
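For example, a table written in standard pipe-delimited markdown, like the illustrative one below, now renders with clean borders and row styling:

```
| Engine     | Stage     |
|------------|-----------|
| Perplexity | Search    |
| Claude     | Synthesis |
| Gemini     | Judging   |
```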
Perplexity Citation Mapping Fixed
Reports generated using Perplexity as the search engine now have accurate citation mapping between in-text citations and the References section. The system properly handles Perplexity's unique citation format.
Adjustable Research Depth
Control how detailed your research plan is by choosing 2, 3, 5, or 7 research questions. More questions mean deeper exploration but longer research time.
Iteration with "Re-Synthesize"
After a report is generated, the **Re-Synthesize** button appears. This powerful feature allows you to re-write the final report using the *already-gathered research data*. This is incredibly useful for:
- Trying a different synthesis model to see if it produces a better narrative.
- Adjusting the report length without having to re-run the entire (and time-consuming) research process.
- Creating multiple versions of the report to compare before final judging.
Step 4 (Optional): Synthesis Management
After generating one or more syntheses, the **Synthesis Management** panel appears. This is your command center for comparing different versions and creating a definitive final report.
Scoring Reports: Quantifying Quality
The scoring utility provides an objective, AI-driven evaluation of your reports, allowing you to quickly compare their performance. The scorer uses the same AI-powered analysis tools (Draft Statistics and Citation Overlap Analysis) to inform its evaluation.
The Scoring Workflow:
- Generate Reports: Create one or more reports using the "Generate" or "Re-Synthesize" buttons.
- Select Reports to Score: In the management panel, use the checkboxes to select one or more reports.
- Choose an Engine: The "Judge Engine" dropdown is also used for scoring. Select the LLM you want to act as the evaluator.
- Click "Score Selected": A modal window will appear, showing a detailed score card for each selected report. You'll see an overall score and a breakdown across key metrics like Objective Fulfillment, Question Coverage, and Depth & Insight.
The AI Judge: Creating the Definitive Report
The judge's goal is to analyze multiple drafts and produce a single, superior report that best fulfills the original research plan's objectives. It intelligently merges content, resolves contradictions, and ensures the best information is included.
AI-Powered Analysis Tools:
The judge is assisted by two automated analysis tools that provide insights into the source reports:
- 📊 Draft Statistics: Automatically calculates quantitative metrics for each report including word count, total citations, unique citations, citation density (citations per 100 words), and citation coverage (percentage of paragraphs with citations). This helps identify which reports are most comprehensive and well-supported.
- 🔗 Citation Overlap Analysis: Analyzes which sources are cited by multiple reports (consensus sources) versus sources unique to each report. This helps the judge identify broadly-supported information versus unique insights that only one engine discovered.
These tools run automatically during judging and scoring, providing the AI with data-driven insights to make better fusion decisions.
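To make the Draft Statistics metrics concrete, the sketch below shows roughly how citation density and coverage can be computed from a report's text. It is a simplification for illustration, not the agent's actual implementation, and it assumes in-text citations appear as bracketed numbers like [12].

```python
import re

def draft_statistics(report_text: str) -> dict:
    """Rough sketch of draft metrics; not the agent's exact implementation."""
    words = report_text.split()
    citations = re.findall(r"\[(\d+)\]", report_text)  # assumes [n]-style markers
    paragraphs = [p for p in report_text.split("\n\n") if p.strip()]
    cited = [p for p in paragraphs if re.search(r"\[\d+\]", p)]
    return {
        "word_count": len(words),
        "total_citations": len(citations),
        "unique_citations": len(set(citations)),
        "citation_density": 100 * len(citations) / max(len(words), 1),
        "citation_coverage": 100 * len(cited) / max(len(paragraphs), 1),
    }
```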
How the Judge Works:
- Quality Filtering: The judge actively filters out low-quality content including unsubstantiated statistics, unsupported claims, vague filler statements, and redundant information.
- Selective Citations: Only sources that were actually cited in the report text are included in the References section. If the highest in-text citation is [48], only 48 sources will be listed.
- Citation Accuracy: Every statistic and factual claim in the judged report must have a proper citation marker [CITATION:X] attached to it.
- Quality Over Quantity: The judge is instructed that a shorter, well-supported report is better than a longer report filled with unverified claims.
- Tool-Guided Synthesis: The judge uses the citation overlap analysis to prioritize consensus sources (information verified by multiple engines) while preserving unique insights from individual reports.
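The selective-citation rule can be pictured as keeping only reference entries whose numbers actually appear in the text, as in the simplified sketch below (again, an illustration rather than the judge's real post-processing code).

```python
import re

def keep_cited_references(report_text: str, references: dict) -> dict:
    """Keep only reference entries whose numbers appear as [n] in the text.

    `references` maps citation numbers to source descriptions; this is a
    simplified illustration of the selective-citation idea.
    """
    used = {int(n) for n in re.findall(r"\[(\d+)\]", report_text)}
    return {num: src for num, src in references.items() if num in used}
```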
The Judging Workflow:
- Generate Multiple Versions: Use the **Re-Synthesize** button to create at least two reports to compare.
- Select Reports for Judging: Use the checkboxes to select two or more reports you want the judge to analyze.
- Choose a Judge Engine: Select the LLM you want to act as the final editor.
- Click "Judge Selected": Watch as real-time status updates show the judge's progress (analyzing reports, generating citation overlap statistics, synthesizing, post-processing). The AI will analyze the reports and create a new, consolidated version, which will be added to the list and displayed.
Note: Because the judge applies strict quality filtering, the resulting report may be shorter than the source reports. This is intentional - the judge removes filler, redundancy, and unsupported content while preserving only well-sourced, verifiable information.
Report Management
At the top of the displayed report, you have three options:
- Start New Topic: Clears the entire session (plan, research data, and all syntheses) and takes you back to Step 1.
- Copy HTML: Copies the raw HTML of the currently viewed report to your clipboard, perfect for pasting into a CMS or other web-based editors.
- Save as HTML: Downloads a complete, standalone HTML file of your currently viewed report for easy sharing and archiving.