Multi-Agent System for Research Project Analysis & SDG Alignment with Interactive Dashboard
v3.0 | Two-Phase Architecture: Extract → Analyze → Visualize

The SDG Analysis Pipeline is a comprehensive multi-agent system that extracts active research projects from institute websites, analyzes their alignment with the UN Sustainable Development Goals, assesses technology integration, and generates enhancement recommendations. The system produces an interactive dashboard for visualization and exploration.
- Separates web crawling (extract.py) from analysis (s2p.py) for better error handling, checkpointing, and reusability.
- Driven by input/institutes.json with two-level parallelism: institutes are processed concurrently, and projects within each institute are processed concurrently.
- Agents leverage objectives, funding, period, keywords, themes, team, and outputs from additional_info for deeper analysis.
- Generates both a combined output file and individual files per institute in the output/institutes/ directory.
- Reports real-time progress with elapsed time, ETA calculations, and detailed timing statistics on completion.
- dashboard20.py produces advanced visualizations with sunburst, Sankey, and radar charts, plus detailed project modals.
The pipeline consists of two main phases plus a final visualization step:

1. Extraction (extract.py): Two-pass web scraping with Crawl4AI. The first pass extracts candidates from listing pages; the second pass fetches full project details. Jobs, completed projects, and non-relevant content are filtered out.
2. Analysis (s2p.py): Institute-driven parallel processing. Reads institutes.json, filters projects from active_projects.json by institute_id, and runs SDG classification, technology analysis, and enhancement recommendations.
3. Visualization (dashboard20.py): Generates an interactive HTML dashboard with global and institute-level views, SDG hierarchy visualization, technology assessment profiles, and detailed project cards with modal deep-dives.
- Separation of concerns: Extraction and analysis are separate, so you can re-run analysis without re-crawling.
- Checkpointing: If analysis fails, you still have the extracted data to resume from.
- Scalability: Two-level parallelism maximizes throughput while preventing API overload.
The extractor uses a two-pass approach with Crawl4AI and LLM-based structured extraction (a minimal sketch follows the list):

1. Pass 1, candidate discovery: Fetch the institute listing page and extract candidate projects with titles, URLs, brief descriptions, and status indicators.
2. Filtering: Exclude jobs (vacancy, hiring), page elements (cookies, privacy), and completed projects (finished, closed).
3. Pass 2, detail extraction: For each candidate, fetch the project page and extract description, objectives, duration, funding, partners, contact, and additional_info.
4. Status verification: Confirm the project is active/ongoing; default to "active" if no status is explicitly mentioned.
5. Output: Write to input/active_projects.json with complete metadata and timing statistics.
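A minimal sketch of the two-pass flow, assuming Crawl4AI's AsyncWebCrawler / arun() interface; parse_candidates() and extract_details() are hypothetical stand-ins for the LLM-based structured extraction steps:

```python
from crawl4ai import AsyncWebCrawler

EXCLUDED_TERMS = ("vacancy", "hiring", "cookie", "privacy", "finished", "closed")

def parse_candidates(markdown: str) -> list[dict]:
    return []  # hypothetical pass-1 LLM extraction of {title, url, status}

def extract_details(markdown: str) -> dict:
    return {}  # hypothetical pass-2 LLM extraction of full project metadata

def looks_relevant(title: str) -> bool:
    # Filter out jobs, page chrome, and completed projects.
    lowered = title.lower()
    return not any(term in lowered for term in EXCLUDED_TERMS)

async def extract_institute(listing_url: str) -> list[dict]:
    async with AsyncWebCrawler() as crawler:
        # Pass 1: candidate discovery on the listing page.
        listing = await crawler.arun(url=listing_url)
        candidates = [c for c in parse_candidates(listing.markdown)
                      if looks_relevant(c.get("title", ""))]
        projects = []
        for cand in candidates:
            # Pass 2: fetch each candidate's page for full details.
            page = await crawler.arun(url=cand["url"])
            details = extract_details(page.markdown)
            details.setdefault("status", "active")  # default when unstated
            projects.append(details)
        return projects
```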
| Argument | Short | Description |
|---|---|---|
| --skip-existing | | Skip institutes with existing projects in the output file |
| --force | | Force reprocess all institutes (overrides --skip-existing and --reprocess) |
| --reprocess | | Comma-separated institute IDs, short names, or full names to reprocess (e.g., '1,2,3' or 'UNU-WIDER,MERIT') |
| --output | -o | Custom output file path |
| --input | -i | Custom institutes.json path |
| --max-projects | | Max projects per institute (overrides env) |
```bash
# Run with defaults
python extract.py

# Skip already processed institutes
python extract.py --skip-existing

# Force reprocess all
python extract.py --force

# Reprocess specific institutes by ID
python extract.py --reprocess 1,2,3

# Reprocess by short name (recommended)
python extract.py --reprocess UNU-WIDER,MERIT

# Reprocess by partial full name
python extract.py --reprocess "Biotechnology","Comparative Regional"
```
See EXTRACT_PROJECTS.html for complete extraction documentation.
The analysis pipeline is institute-driven for maximum parallelism, using two levels of concurrency for optimal throughput:
| Level | Concurrency Setting | Description |
|---|---|---|
| Institute-level | MAX_INSTITUTE_CONCURRENCY (default: 2) | Number of institutes processed simultaneously |
| Project-level | MAX_LLM_CONCURRENCY (default: 3) | Max concurrent LLM calls per institute (SDG + Tech + Recommendation agents) |
Example: with an institute concurrency of 2 and an LLM concurrency of 3, up to 6 LLM calls run in parallel (2 institutes × 3 calls each), as the sketch below illustrates.
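A minimal sketch of the two-level parallelism using asyncio semaphores; the setting names mirror the documentation, while run_agents() is an illustrative stub:

```python
import asyncio

MAX_INSTITUTE_CONCURRENCY = 2
MAX_LLM_CONCURRENCY = 3

institute_sem = asyncio.Semaphore(MAX_INSTITUTE_CONCURRENCY)

async def run_agents(project: dict) -> dict:
    return {}  # stand-in for the per-project SDG/tech/recommendation agents

async def analyze_institute(projects: list[dict]) -> list[dict]:
    # A per-institute semaphore caps concurrent LLM calls, so at most
    # MAX_INSTITUTE_CONCURRENCY * MAX_LLM_CONCURRENCY calls run globally.
    llm_sem = asyncio.Semaphore(MAX_LLM_CONCURRENCY)

    async def analyze_project(project: dict) -> dict:
        async with llm_sem:
            return await run_agents(project)

    async with institute_sem:
        return list(await asyncio.gather(*(analyze_project(p) for p in projects)))

async def main(projects_by_institute: list[list[dict]]) -> None:
    await asyncio.gather(*(analyze_institute(p) for p in projects_by_institute))
```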
The pipeline supports incremental processing to resume interrupted runs or add new institutes without reprocessing existing ones:
| Feature | Description |
|---|---|
| SKIP_EXISTING | When enabled, skips institutes with existing output files in output/institutes/ |
| Automatic detection | Checks whether output/institutes/{institute_name}.json exists before processing |
| Smart logging | Shows "Loading existing result" or "Skipped (loaded N projects)" for cached institutes |
- Resuming interrupted runs: If the pipeline crashes mid-run, set SKIP_EXISTING=true to continue from where it left off.
- Adding new institutes: When new institutes are added to input/institutes.json, only those institutes are processed.
- Cost savings: Avoids re-processing completed institutes when re-running the analysis (the detection logic is sketched below).
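A minimal sketch of the skip-existing check, assuming per-institute JSON files under output/institutes/ with the filename scheme documented above:

```python
import json
from pathlib import Path

OUTPUT_DIR = Path("output/institutes")

def load_or_process(institute_name: str, skip_existing: bool):
    out_file = OUTPUT_DIR / f"{institute_name}.json"
    if skip_existing and out_file.exists():
        # Cached result: reuse it instead of re-running the agents.
        projects = json.loads(out_file.read_text())
        print(f"Skipped (loaded {len(projects)} projects): {institute_name}")
        return projects
    return None  # caller proceeds with full analysis
```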
```bash
# Skip existing output files (CLI flag)
python s2p.py --skip-existing

# Or using an environment variable
SKIP_EXISTING=true python s2p.py

# Reprocess specific institutes by ID
python s2p.py --reprocess 1,2,3

# Reprocess by short name (recommended)
python s2p.py --reprocess UNU-WIDER,MERIT
```
| Argument | Short | Description |
|---|---|---|
| --skip-existing | | Skip institutes with existing output files |
| --force | | Force reprocess all institutes (overrides --skip-existing, --reprocess, and env) |
| --reprocess | | Comma-separated institute IDs, short names, or full names to reprocess |
| --output | -o | Custom output file path |
| --institutes | -i | Custom institutes.json path |
| --projects | -p | Custom active_projects.json path |
For each project, three agents run in parallel using asyncio.to_thread() (see the sketch after this list):

- SDG Classification Agent: multi-stage analysis: keywords → semantic LLM → calibration with priority regions
- Technology Analysis Agent: assesses technology integration and maturity, and identifies gaps
- Enhancement Recommendation Agent: suggests emerging technologies with feasibility scoring
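A minimal sketch of running the three agents in parallel with asyncio.to_thread(); the agent functions are illustrative stubs for the synchronous LLM calls:

```python
import asyncio

def classify_sdgs(project: dict) -> dict:
    return {}  # blocking SDG classification call (stub)

def analyze_technology(project: dict) -> dict:
    return {}  # blocking technology analysis call (stub)

def recommend_enhancements(project: dict) -> dict:
    return {}  # blocking enhancement recommendation call (stub)

async def analyze_project(project: dict) -> dict:
    # to_thread() offloads each synchronous call to a worker thread,
    # so the three agents for one project run concurrently.
    sdg, tech, recs = await asyncio.gather(
        asyncio.to_thread(classify_sdgs, project),
        asyncio.to_thread(analyze_technology, project),
        asyncio.to_thread(recommend_enhancements, project),
    )
    return {**project, "sdg": sdg, "technology": tech, "recommendations": recs}
```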
Agents use project metadata to support more detailed analysis. When fields are missing, agents use graceful fallbacks (e.g., "Not specified") to ensure analysis continues:
| Context Source | Fields Used | Benefits | Fallback |
|---|---|---|---|
| Core Project Data | title, description, location, partners | Basic project understanding | Empty string / empty list |
| Objectives & Period | objectives, period, funding | Timeline-aware recommendations | "Not specified" |
| Additional Info | keywords, themes, team, outputs | Domain-specific analysis | "Not specified" / "None" |
Note: Not all projects have complete metadata. Agents are designed to work with partial information, using whatever context is available and inserting "Not specified" placeholders for missing fields in their analysis prompts. A sketch of this fallback pattern follows.
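A minimal sketch of building an agent prompt context with graceful fallbacks, mirroring the fallbacks in the table above; the field names follow the documented project schema, but the helper itself is illustrative:

```python
def build_context(project: dict) -> dict:
    info = project.get("additional_info", {}) or {}
    return {
        # Core project data: empty-string / empty-list fallbacks.
        "title": project.get("title", ""),
        "description": project.get("description", ""),
        "partners": project.get("partners", []),
        # Objectives & period: "Not specified" placeholders.
        "objectives": project.get("objectives", "Not specified"),
        "period": project.get("period", "Not specified"),
        "funding": project.get("funding", "Not specified"),
        # Additional info: "Not specified" / "None" placeholders.
        "keywords": info.get("keywords", "Not specified"),
        "themes": info.get("themes", "Not specified"),
        "outputs": info.get("outputs", "None"),
    }
```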
```bash
python s2p.py
```
During execution, the pipeline provides real-time feedback:
```
Progress: 5/20 institutes completed | Elapsed: 125.3s (2.1m) | ETA: 376.0s (6.3m)
```
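A minimal sketch of the progress/ETA arithmetic behind this output (ETA extrapolates the average time per completed institute to the remaining ones); the helper name and formatting are illustrative:

```python
import time

def progress_line(done: int, total: int, start: float) -> str:
    elapsed = time.monotonic() - start
    # Extrapolate remaining time from the average pace so far.
    eta = (elapsed / done) * (total - done) if done else float("inf")
    return (f"Progress: {done}/{total} institutes completed | "
            f"Elapsed: {elapsed:.1f}s ({elapsed/60:.1f}m) | "
            f"ETA: {eta:.1f}s ({eta/60:.1f}m)")
```

With 5 of 20 institutes done after 125.3s, this yields an ETA of 125.3/5 × 15 = 376.0s, matching the example above.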
On completion, detailed timing statistics are displayed:
```
========================================
TIMING STATISTICS
========================================
Total time: 1500.00s (25.00m)
Average per institute: 75.00s
Average per project: 3.50s
Projects per minute: 17.14
```
The Maturity Score is a comprehensive multi-dimensional metric that evaluates technology readiness across four key dimensions. Each dimension is scored independently and then combined using a weighted formula.
| Dimension | Weight | Description | Key Indicators |
|---|---|---|---|
| Technical Readiness | 35% | How developed and proven the technology is | Production keywords, deployment status, tech count, temporal analysis |
| Operational Status | 25% | Current deployment and operational state | Deployed/piloting/development keywords, scale indicators |
| Adoption Level | 20% | User adoption and scale of implementation | User/beneficiary counts, partner count, geographic reach |
| Evidence Base | 20% | Validation and research backing | Peer-reviewed publications, evaluations, measured results, outputs |
| Score Range | Level | Characteristics | Example Indicators |
|---|---|---|---|
| 0.90-1.00 | Production | Fully deployed, proven at scale, operational in multiple sites | "deployed", "operational", "live", "scaling", "proven", "fully operational" |
| 0.70-0.89 | Advanced | Field-tested, validated pilots, ready for scale-up | "field-tested", "validated", "beta", "commercialized", "implemented" |
| 0.50-0.69 | Intermediate | Working prototypes, active pilots, showing promise | "pilot", "prototype", "testing", "demonstration", "user testing" |
| 0.25-0.49 | Early | Early prototypes, proof of concept, experimental | "experimental", "proof of concept", "exploratory", "initial development" |
| 0.00-0.24 | Planning | Concept phase, planning, research only | "proposed", "planned", "roadmap", "design phase", "conceptual" |
1. Technical Readiness (35% weight)
2. Operational Status (25% weight)
3. Adoption Level (20% weight)
4. Evidence Base (20% weight)
5. Quantitative Boosts (added to overall score)
```
overall = (technical × 0.35) + (operational × 0.25) +
          (adoption × 0.20) + (evidence × 0.20) +
          quantitative_boost
```
High Maturity Project (0.89): AI diagnostic platform deployed across 50 clinics, 100K+ patients processed, peer-reviewed study published.
→ Technical: 0.90 (AI proven), Operational: 0.95 (deployed at scale), Adoption: 0.80 (50 clinics), Evidence: 0.90 (published)
→ Overall: (0.90×0.35) + (0.95×0.25) + (0.80×0.20) + (0.90×0.20) = 0.89
Medium Maturity Project (0.55): Blockchain land registry pilot in 3 communities, prototype working, conference proceedings published.
→ Technical: 0.55 (prototype), Operational: 0.60 (pilot), Adoption: 0.40 (3 communities), Evidence: 0.65 (conference)
→ Overall: (0.55×0.35) + (0.60×0.25) + (0.40×0.20) + (0.65×0.20) = 0.55
Low Maturity Project (0.22): Exploring IoT sensors for water monitoring, researching options, seeking funding.
→ Technical: 0.25 (research), Operational: 0.20 (planning), Adoption: 0.10 (no users), Evidence: 0.30 (exploratory)
→ Overall: (0.25×0.35) + (0.20×0.25) + (0.10×0.20) + (0.30×0.20) = 0.22
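A minimal sketch of the weighted maturity formula above; the quantitative boost is passed in rather than derived, since its triggers are project-specific, and the cap at 1.0 is an assumption:

```python
def maturity_score(technical: float, operational: float, adoption: float,
                   evidence: float, quantitative_boost: float = 0.0) -> float:
    # Weighted sum of the four dimensions plus any quantitative boost.
    overall = (technical * 0.35) + (operational * 0.25) \
            + (adoption * 0.20) + (evidence * 0.20) + quantitative_boost
    return min(overall, 1.0)  # assumption: scores are capped at 1.0

# Reproduces the high-maturity example above: prints 0.89
print(round(maturity_score(0.90, 0.95, 0.80, 0.90), 2))
```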
The Innovation Score is a multi-dimensional metric that evaluates how innovative and forward-looking a project's technology approach is. It considers not just what technologies are used, but how they're combined, the novelty of the approach, and the visionary nature of the project.
| Dimension | Weight | Description | Key Indicators |
|---|---|---|---|
| Emerging Technology Usage | 40% | Presence of cutting-edge technologies | AI/ML, blockchain, IoT, spatial tech, data analytics, cloud, mobile |
| Novelty Indicators | 30% | Language indicating innovative approaches | Breakthrough, novel, pioneering, cutting-edge, proprietary, patented |
| Technology Combination | 20% | Combinatorial innovation from tech synergy | Premium tech pairs (AI+IoT, AI+Blockchain, etc.), diversity bonus |
| Forward-Looking Language | 10% | Future-oriented, visionary statements | Will transform, next-generation, paradigm shift, revolutionize |
| Score Range | Level | Characteristics | Example Indicators |
|---|---|---|---|
| 0.80-1.00 | Transformative | Breakthrough innovation combining multiple cutting-edge technologies | Multiple premium tech combos, novel language, strong vision |
| 0.60-0.79 | High Innovation | Advanced use of emerging technologies with novel approaches | AI/ML + spatial, cutting-edge language, some combinations |
| 0.40-0.59 | Moderate Innovation | Good technology mix with some innovative elements | Single emerging tech, moderate novelty, basic combinations |
| 0.20-0.39 | Low Innovation | Conventional technology approach with limited novelty | Commoditized tech only, standard approaches, minimal novelty |
| 0.00-0.19 | Minimal Innovation | Traditional or no significant technology component | No emerging tech, established/conventional language only |
1. Emerging Technology Usage (40% weight)
2. Novelty Indicators (30% weight)
3. Technology Combination (20% weight)
4. Forward-Looking Language (10% weight)
```
innovation = (tech_innovation × 0.40) + (novelty_score × 0.30) +
             (combinatorial_score × 0.20) + (forward_looking_score × 0.10)
```
High Innovation Project (0.71): AI-powered satellite imagery analysis for disaster response. Uses cutting-edge deep learning with geospatial data. Breakthrough approach with proprietary algorithms.
→ Tech: 0.60 (AI 0.35 + Spatial 0.25), Novelty: 1.00 (breakthrough + cutting-edge + proprietary), Combination: 0.62 (AI+Spatial premium), Forward: 0.50 (will transform + next-generation)
→ Overall: (0.60×0.40) + (1.00×0.30) + (0.62×0.20) + (0.50×0.10) = 0.71
Medium Innovation Project (0.41): Mobile app for data collection in rural health clinics. Uses cloud storage and basic analytics. Emerging technology approach with a modern UI.
→ Tech: 0.30 (Mobile 0.15 + Cloud 0.15), Novelty: 0.65 (emerging + modern + advanced), Combination: 0.30 (2 categories), Forward: 0.38 (scalable to + next phase)
→ Overall: (0.30×0.40) + (0.65×0.30) + (0.30×0.20) + (0.38×0.10) = 0.41
Low Innovation Project (0.09): Traditional database system for record management. Established technology with conventional methods. Standard approach.
→ Tech: 0.00 (no emerging tech), Novelty: 0.20 (established + conventional, -0.20), Combination: 0.00 (1 category), Forward: 0.30 (minimal forward-looking)
→ Overall: (0.00×0.40) + (0.20×0.30) + (0.00×0.20) + (0.30×0.10) = 0.09
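A minimal sketch of the weighted innovation formula, paired with an illustrative novelty detector built from the indicator terms listed above; the term list is from the dimension table, but the scoring increments are assumptions:

```python
NOVELTY_TERMS = ("breakthrough", "novel", "pioneering", "cutting-edge",
                 "proprietary", "patented")

def novelty_score(text: str) -> float:
    # Illustrative: count distinct novelty terms, saturating at three hits.
    lowered = text.lower()
    hits = sum(term in lowered for term in NOVELTY_TERMS)
    return min(hits / 3, 1.0)

def innovation_score(tech: float, novelty: float,
                     combination: float, forward: float) -> float:
    # Weighted sum matching the documented formula.
    return (tech * 0.40) + (novelty * 0.30) + (combination * 0.20) + (forward * 0.10)
```

On the high-innovation example above, "breakthrough", "cutting-edge", and "proprietary" give three hits, so novelty_score() saturates at 1.00, consistent with the worked numbers.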
The Integration Level is a sophisticated multi-dimensional metric that evaluates how deeply and effectively technologies are embedded within a project. Unlike simple technology counts, this assessment considers breadth, depth, interconnectedness, and architectural sophistication.
| Dimension | Weight | Description | Key Indicators |
|---|---|---|---|
| Breadth | 25% | Technology count and category diversity | Number of technologies, diversity across categories (AI, IoT, cloud, etc.) |
| Depth | 30% | How deeply technologies are embedded in project implementation | Tech in objectives, implementation language, multiple mentions across sections |
| Interconnectedness | 25% | Whether technologies form an integrated ecosystem | Premium tech combinations, integration keywords, data flow indicators |
| Sophistication | 20% | Architectural complexity and modern development practices | Advanced patterns (microservices, serverless), DevOps, scalability considerations |
| Score Range | Level | Characteristics | Example Profile |
|---|---|---|---|
| 0.75-1.00 | High | Multiple diverse technologies deeply integrated with sophisticated architecture | 5+ technologies across 3+ categories, integrated ecosystem, advanced patterns |
| 0.45-0.74 | Medium | Good technology mix with some integration and implementation depth | 3+ technologies or 2+ categories, moderate interconnectedness |
| 0.00-0.44 | Low | Limited technology use with minimal integration | Few technologies, single category, shallow implementation |
1. Breadth Score (25% weight)
2. Depth Score (30% weight)
3. Interconnectedness Score (25% weight)
4. Sophistication Score (20% weight)
```
integration = (breadth × 0.25) + (depth × 0.30) +
              (interconnectedness × 0.25) + (sophistication × 0.20)
```
High Integration Project (0.83): Distributed AI platform with microservices architecture integrating IoT sensors, blockchain for data integrity, and cloud infrastructure.
→ Breadth: 0.80 (6 techs, 4 categories), Depth: 0.90 (tech in objectives + implementation language), Interconnectedness: 0.85 (AI+IoT+Blockchain combos + ecosystem), Sophistication: 0.75 (microservices + DevOps)
→ Overall: (0.80×0.25) + (0.90×0.30) + (0.85×0.25) + (0.75×0.20) = 0.83
Medium Integration Project (0.51): Mobile application using cloud storage and basic data analytics for health data collection.
→ Breadth: 0.55 (3 techs, 2 categories), Depth: 0.60 (use/utilize language), Interconnectedness: 0.40 (mobile+data combo), Sophistication: 0.45 (API-based + scalable)
→ Overall: (0.55×0.25) + (0.60×0.30) + (0.40×0.25) + (0.45×0.20) = 0.51
Low Integration Project (0.26): Basic website using standard HTML/CSS with minimal technology stack.
→ Breadth: 0.15 (1 tech, 1 category), Depth: 0.35 (superficial mention), Interconnectedness: 0.20 (no combinations), Sophistication: 0.35 (basic patterns)
→ Overall: (0.15×0.25) + (0.35×0.30) + (0.20×0.25) + (0.35×0.20) = 0.26
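A minimal sketch of the weighted integration formula, with an illustrative premium-pair detector for the interconnectedness dimension; the pair list follows the examples in the text (AI+IoT, AI+Blockchain, etc.), but the base value and per-pair increment are assumptions:

```python
PREMIUM_PAIRS = {frozenset(p) for p in [("ai", "iot"), ("ai", "blockchain"),
                                        ("ai", "spatial"), ("iot", "blockchain")]}

def interconnectedness(categories: set[str]) -> float:
    # Illustrative: reward premium combinations among detected tech categories.
    present = {c.lower() for c in categories}
    pairs = sum(1 for pair in PREMIUM_PAIRS if pair <= present)
    return min(0.3 + 0.25 * pairs, 1.0)  # assumed base and per-pair increment

def integration_level(breadth: float, depth: float,
                      interconnect: float, sophistication: float) -> float:
    # Weighted sum matching the documented formula.
    return (breadth * 0.25) + (depth * 0.30) \
         + (interconnect * 0.25) + (sophistication * 0.20)
```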
The pipeline uses a hybrid approach that combines deterministic rule-based algorithms with LLM semantic understanding. This provides the best of both worlds: consistent, reproducible scoring with deep semantic nuance.
- Rule-based layer: deterministic algorithms compute baseline scores for Integration Level, Maturity, and Innovation using keyword matching, pattern detection, and weighted formulas.
- LLM layer: the LLM receives the rule-based scores as context, performs deep semantic analysis, and can refine the scores based on nuanced understanding that rule-based methods might miss.
| Dimension | Rule-Based Contribution | LLM Enhancement Role | Override Behavior |
|---|---|---|---|
| Integration Level | Multi-dimensional analysis across breadth, depth, interconnectedness, and sophistication with 4 sub-methods | Semantic understanding of how technologies meaningfully contribute to project goals and work together | LLM can override - When deep semantic analysis suggests different integration than rule-based patterns indicate |
| Maturity Score | Four-dimensional model (Technical 35%, Operational 25%, Adoption 20%, Evidence 20%) with context-weighted scoring | Interprets nuanced deployment status, validates evidence quality, assesses real-world operational readiness | LLM can override - When semantic context indicates different maturity than keyword patterns suggest |
| Innovation Score | Four-dimensional model (Tech 40%, Novelty 30%, Combination 20%, Forward-Looking 10%) with weighted tech categories | Detects genuine breakthrough approaches vs marketing buzzwords, assesses true novelty in domain context | LLM can override - When semantic analysis reveals genuine innovation or identifies inflated claims |
- Reproducibility: rule-based scoring ensures the same input always produces the same baseline score, enabling comparability across projects.
- Semantic depth: the LLM adds human-like understanding of context, intent, and genuine innovation that keyword matching cannot capture.
- Grounding: the rule-based baseline anchors the LLM analysis, reducing the risk of hallucination by providing a grounded starting point.
- Reliability: if the LLM fails or returns incomplete data, the system falls back to the rule-based scores.
Step 1: Rule-based algorithms compute initial scores with full debug logging
Step 2: Initial scores are provided to LLM as context/suggested values in the prompt
Step 3: LLM can choose to:
• Keep the score if it seems accurate based on semantic analysis
• Adjust the score if deep understanding suggests different assessment
• Return nothing (falls back to rule-based via data.get("key", rule_based_value))
Step 4: The final score uses the LLM value if provided, otherwise the rule-based value (sketched below)
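A minimal sketch of this override-with-fallback merge; the response-dict keys are illustrative, but the data.get(...) pattern is the one named in Step 3:

```python
def merge_scores(rule_based: dict, llm_response: dict | None) -> dict:
    if not llm_response:
        # LLM failed entirely: keep the deterministic baseline.
        return dict(rule_based)
    return {
        # Prefer the LLM's refinement; fall back to the baseline per key.
        key: llm_response.get(key, baseline)
        for key, baseline in rule_based.items()
    }

# Example: the LLM refines maturity but stays silent on the other scores.
baseline = {"maturity": 0.55, "innovation": 0.41, "integration": 0.51}
print(merge_scores(baseline, {"maturity": 0.62}))
# {'maturity': 0.62, 'innovation': 0.41, 'integration': 0.51}
```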
Beyond quantitative scores, the LLM provides essential qualitative analysis that rule-based methods cannot generate:
| Output Field | Purpose | Why LLM is Essential |
|---|---|---|
| analysis | Narrative explanation of technology approach | Synthesizes complex information into coherent human-readable summary |
| strengths | What's done well technically | Identifies specific technical merits and implementation quality |
| gaps | Missing technologies or capabilities | Recognizes what's missing based on domain knowledge and best practices |
| scalability_assessment | Scaling potential and limitations | Evaluates architectural decisions for scalability implications |
| interoperability_notes | Integration with existing systems | Assesses compatibility with standards and existing infrastructure |
| tech_recommendations | Specific technology improvements | Suggests relevant, actionable enhancements based on project context |
The dashboard generator creates an interactive HTML visualization with both Global Analysis and Institute View tabs:
```bash
python dashboard20.py
```
Output: dashboard_advanced.html. Open it in any modern web browser.