SDG Analysis Pipeline

Multi-Agent System for Research Project Analysis & SDG Alignment with Interactive Dashboard

v3.0 | Two-Phase Architecture (Extract → Analyze) with Visualization

Overview

The SDG Analysis Pipeline is a comprehensive multi-agent system that extracts active research projects from institute websites, analyzes their alignment with UN Sustainable Development Goals, assesses technology integration, and generates enhancement recommendations. The system produces an interactive dashboard for visualization and exploration.

🔍 Two-Phase Pipeline

Separates web crawling (extract.py) from analysis (s2p.py) for better error handling, checkpointing, and reusability.

🌍 Institute-Driven Processing

Driven by input/institutes.json with two-level parallelism: institutes process concurrently, projects within each institute process concurrently.

📊 Rich Context Analysis

Agents leverage objectives, funding, period, keywords, themes, team, and outputs from additional_info for deeper analysis.

🎯 Per-Institute Outputs

Generates both a combined output file AND individual per-institute files in the output/institutes/ directory.

⏱️ Progress Tracking

Real-time progress with elapsed time, ETA calculations, and detailed timing statistics on completion.

📈 Interactive Dashboard

dashboard20.py produces advanced visualizations with sunburst, sankey, and radar charts, plus detailed project modals.

Pipeline Architecture

The pipeline consists of two main phases with a final visualization step:

📥 Phase 1: Extraction (extract.py)

Two-pass web scraping with Crawl4AI: First pass extracts candidates from listing pages, second pass fetches full project details. Filters out jobs, completed projects, and non-relevant content.

📊 Phase 2: Analysis (s2p.py)

Institute-driven parallel processing: Reads institutes.json, filters projects from active_projects.json by institute_id, runs SDG classification, technology analysis, and enhancement recommendations.

📈 Phase 3: Visualization (dashboard20.py)

Generates interactive HTML dashboard with global and institute-level views, SDG hierarchy visualization, technology assessment profiles, and detailed project cards with modal deep-dive.

Key Architectural Benefits

  • Separation of Concerns: Extraction and analysis are separate, so you can re-run analysis without re-crawling.
  • Checkpointing: If analysis fails, the extracted data is available to resume from.
  • Scalability: Two-level parallelism maximizes throughput while preventing API overload.

Phase 1: Extraction (extract.py)

The extractor uses a two-pass approach with Crawl4AI and LLM-based structured extraction:

1️⃣ Candidate Discovery

Fetch institute listing page, extract candidate projects with titles, URLs, brief descriptions, and status indicators.

2️⃣ Smart Filtering

Exclude jobs (vacancy, hiring), page elements (cookies, privacy), and completed projects (finished, closed).

3️⃣ Full Details

For each candidate, fetch project page and extract description, objectives, duration, funding, partners, contact, and additional_info.

4️⃣ Status Verification

Confirm project is active/ongoing. Default to "active" if no status is explicitly mentioned.

5️⃣ Save Results

Output to input/active_projects.json with complete metadata and timing statistics.
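The saved records can be pictured as Python dicts. A hypothetical example follows, with field names inferred from the extraction steps above; the actual schema produced by extract.py may differ:

```python
# Hypothetical shape of one record in input/active_projects.json.
# Field names follow the extraction steps described above; values
# are illustrative only.
example_project = {
    "institute_id": 1,
    "title": "Example Water Monitoring Project",
    "url": "https://example.org/projects/water-monitoring",
    "status": "active",  # defaults to "active" when no status is stated
    "description": "Short summary extracted from the project page.",
    "objectives": ["Objective 1", "Objective 2"],
    "duration": "2023-2026",
    "funding": "Not specified",
    "partners": [],
    "contact": "Not specified",
    "additional_info": {
        "keywords": ["water", "monitoring"],
        "themes": [],
        "team": [],
        "outputs": [],
    },
}
```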

Key Features

CLI Arguments Reference (extract.py)

Argument Short Description
--skip-existing Skip institutes with existing projects in output file
--force Force reprocess all institutes (overrides --skip-existing and --reprocess)
--reprocess Comma-separated institute IDs, short names, or full names to reprocess (e.g., '1,2,3' or 'UNU-WIDER,MERIT')
--output -o Custom output file path
--input -i Custom institutes.json path
--max-projects Max projects per institute (overrides env)

# Run with defaults
python extract.py

# Skip already processed institutes
python extract.py --skip-existing

# Force reprocess all
python extract.py --force

# Reprocess specific institutes by ID
python extract.py --reprocess 1,2,3

# Reprocess by short name (recommended)
python extract.py --reprocess UNU-WIDER,MERIT

# Reprocess by partial full name
python extract.py --reprocess "Biotechnology","Comparative Regional"

See EXTRACT_PROJECTS.html for complete extraction documentation.

Phase 2: Analysis Overview (s2p.py)

The analysis pipeline is institute-driven for maximum parallelism:

Input Sources

  • input/institutes.json - List of institutes to process
  • input/active_projects.json - Extracted project data from extract.py
  • input/sdgs.json - SDG keyword taxonomy for classification

Output Files

  • output/analyzed_projects.json - Combined results from all institutes
  • output/institutes/{name}.json - Per-institute result files
  • output/pipeline.log - Detailed execution logs

Parallel Processing Model

The system uses two-level parallelism for optimal throughput:

Level Concurrency Setting Description
Institute-Level MAX_INSTITUTE_CONCURRENCY (default: 2) Number of institutes processed simultaneously
Project-Level MAX_LLM_CONCURRENCY (default: 3) Max LLM calls per institute (SDG + Tech + Recommendation agents)

Example: With an institute concurrency of 2 and an LLM concurrency of 3, up to 6 LLM calls can run in parallel (2 institutes × 3 calls each).
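The two-level model can be sketched with asyncio semaphores. The function names and structure here are illustrative, not the pipeline's actual code; only the two concurrency settings come from the documentation:

```python
import asyncio

MAX_INSTITUTE_CONCURRENCY = 2  # institutes running at once
MAX_LLM_CONCURRENCY = 3        # LLM calls per institute

async def analyze_project(project, llm_sem):
    async with llm_sem:          # caps concurrent LLM calls per institute
        await asyncio.sleep(0)   # placeholder for an agent LLM call
        return {"title": project, "analyzed": True}

async def process_institute(name, projects, institute_sem):
    async with institute_sem:    # caps institutes running simultaneously
        llm_sem = asyncio.Semaphore(MAX_LLM_CONCURRENCY)
        return await asyncio.gather(
            *(analyze_project(p, llm_sem) for p in projects))

async def main(institutes):
    institute_sem = asyncio.Semaphore(MAX_INSTITUTE_CONCURRENCY)
    return await asyncio.gather(
        *(process_institute(n, ps, institute_sem)
          for n, ps in institutes.items()))

results = asyncio.run(main({"UNU-WIDER": ["p1", "p2"], "MERIT": ["p3"]}))
```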

Resume & Incremental Processing

The pipeline supports incremental processing to resume interrupted runs or add new institutes without reprocessing existing ones:

Feature Description
SKIP_EXISTING When enabled, skips institutes with existing output files in output/institutes/
Automatic Detection Checks if output/institutes/{institute_name}.json exists before processing
Smart Logging Shows "Loading existing result" or "Skipped (loaded N projects)" for cached institutes
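The automatic detection step might look like this minimal sketch; load_existing is a hypothetical helper name, and only the output/institutes/{institute_name}.json layout comes from the table above:

```python
import json
from pathlib import Path

# Hypothetical skip-existing check: if a per-institute output file
# already exists, load it instead of reprocessing the institute.
def load_existing(institute_name, out_dir="output/institutes"):
    path = Path(out_dir) / f"{institute_name}.json"
    if path.exists():
        with path.open() as f:
            projects = json.load(f)
        print(f"Skipped (loaded {len(projects)} projects)")
        return projects
    return None  # not cached: process this institute normally
```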

Use Cases for SKIP_EXISTING

Resuming Interrupted Runs: If the pipeline crashes mid-run, set SKIP_EXISTING=true to continue from where it left off.
Adding New Institutes: When adding new institutes to input/institutes.json, only new institutes will be processed.
Cost Savings: Avoid re-processing completed institutes when re-running analysis.

# Skip existing output files (CLI flag)
python s2p.py --skip-existing

# Or using environment variable
SKIP_EXISTING=true python s2p.py

# Reprocess specific institutes by ID
python s2p.py --reprocess 1,2,3

# Reprocess by short name (recommended)
python s2p.py --reprocess UNU-WIDER,MERIT

CLI Arguments Reference

Argument Short Description
--skip-existing Skip institutes with existing output files
--force Force reprocess all institutes (overrides --skip-existing, --reprocess and env)
--reprocess Comma-separated institute IDs, short names, or full names to reprocess
--output -o Custom output file path
--institutes -i Custom institutes.json path
--projects -p Custom active_projects.json path

Agent Execution Flow

For each project, three agents run in parallel using asyncio.to_thread():

SDG Classification Agent

Multi-stage analysis: Keywords → Semantic LLM → Calibration with priority regions

  • Confidence scores per SDG
  • Specific target mapping (e.g., 7.2)
  • Justification chains
  • Impact pathways

Technology Analysis Agent

Assesses tech integration, maturity, and identifies gaps

  • Integration level (Low/Medium/High)
  • Maturity & Innovation scores
  • Technology categorization
  • Gap identification

Enhancement Recommendation Agent

Suggests emerging technologies with feasibility scoring

  • 3 high-impact recommendations
  • Feasibility score (ROI + Complexity)
  • Implementation roadmaps
  • Risk assessment
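The parallel fan-out of the three agents can be sketched as follows. The agent functions are stand-ins for the real synchronous agent calls; their return shapes are illustrative:

```python
import asyncio

# Stand-ins for the three synchronous agent calls described above.
def sdg_agent(project):
    return {"sdgs": [{"goal": 7, "target": "7.2", "confidence": 0.8}]}

def tech_agent(project):
    return {"integration_level": "Medium"}

def recommendation_agent(project):
    return {"recommendations": ["example"]}

async def analyze(project):
    # Run all three agents concurrently in worker threads.
    sdg, tech, rec = await asyncio.gather(
        asyncio.to_thread(sdg_agent, project),
        asyncio.to_thread(tech_agent, project),
        asyncio.to_thread(recommendation_agent, project),
    )
    return {**sdg, **tech, **rec}

result = asyncio.run(analyze({"title": "Example"}))
```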

Rich Context Injection

Agents use project metadata to support more detailed analysis. When fields are missing, agents use graceful fallbacks (e.g., "Not specified") to ensure analysis continues:

Context Source Fields Used Benefits Fallback
Core Project Data title, description, location, partners Basic project understanding Empty string / empty list
Objectives & Period objectives, period, funding Timeline-aware recommendations "Not specified"
Additional Info keywords, themes, team, outputs Domain-specific analysis "Not specified" / "None"

Note: Not all projects have complete metadata. Agents are designed to work with partial information, using whatever context is available while providing "Not specified" placeholders for missing fields in their analysis prompts.
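The fallback behavior can be sketched like this; build_context and the exact field set are assumptions based on the table above:

```python
# Hypothetical context builder: missing fields fall back to the
# placeholders listed in the table ("Not specified" / "None" / empty).
def build_context(project):
    info = project.get("additional_info") or {}
    return {
        "title": project.get("title", ""),
        "partners": project.get("partners", []),
        "objectives": project.get("objectives", "Not specified"),
        "period": project.get("period", "Not specified"),
        "funding": project.get("funding", "Not specified"),
        "keywords": info.get("keywords", "Not specified"),
        "outputs": info.get("outputs", "None"),
    }

# A project with partial metadata still yields a complete context.
ctx = build_context({"title": "Example", "objectives": "Improve access"})
```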

# Run the full analysis pipeline
python s2p.py

Progress & Timing Output

During execution, the pipeline provides real-time feedback:

Progress: 5/20 institutes completed | Elapsed: 125.3s (2.1m) | ETA: 376.0s (6.3m)

On completion, detailed timing statistics are displayed:

========================================
TIMING STATISTICS
========================================
Total time: 1500.00s (25.00m)
Average per institute: 75.00s
Average per project: 3.50s
Projects per minute: 17.14

Maturity Level Assessment

The Maturity Score is a comprehensive multi-dimensional metric that evaluates technology readiness across four key dimensions. Each dimension is scored independently and then combined using a weighted formula.

Four-Dimensional Maturity Model

Dimension Weight Description Key Indicators
Technical Readiness 35% How developed and proven the technology is Production keywords, deployment status, tech count, temporal analysis
Operational Status 25% Current deployment and operational state Deployed/piloting/development keywords, scale indicators
Adoption Level 20% User adoption and scale of implementation User/beneficiary counts, partner count, geographic reach
Evidence Base 20% Validation and research backing Peer-reviewed publications, evaluations, measured results, outputs

Maturity Level Tiers

Score Range Level Characteristics Example Indicators
0.90-1.00 Production Fully deployed, proven at scale, operational in multiple sites "deployed", "operational", "live", "scaling", "proven", "fully operational"
0.70-0.89 Advanced Field-tested, validated pilots, ready for scale-up "field-tested", "validated", "beta", "commercialized", "implemented"
0.50-0.69 Intermediate Working prototypes, active pilots, showing promise "pilot", "prototype", "testing", "demonstration", "user testing"
0.25-0.49 Early Early prototypes, proof of concept, experimental "experimental", "proof of concept", "exploratory", "initial development"
0.00-0.24 Planning Concept phase, planning, research only "proposed", "planned", "roadmap", "design phase", "conceptual"
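The tier boundaries above translate directly into a small lookup function:

```python
# Map an overall maturity score to the tiers in the table above.
def maturity_level(score):
    if score >= 0.90:
        return "Production"
    if score >= 0.70:
        return "Advanced"
    if score >= 0.50:
        return "Intermediate"
    if score >= 0.25:
        return "Early"
    return "Planning"
```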

Scoring Algorithm Details

1. Technical Readiness (35% weight)

2. Operational Status (25% weight)

3. Adoption Level (20% weight)

4. Evidence Base (20% weight)

5. Quantitative Boosts (added to overall score)

Overall Score Formula

overall = (technical × 0.35) + (operational × 0.25) +
          (adoption × 0.20) + (evidence × 0.20) +
          quantitative_boost
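As a runnable sketch of the formula; the quantitative boost heuristics are not reproduced here, so quantitative_boost simply defaults to 0:

```python
# Weighted maturity formula from above: Technical 35%, Operational 25%,
# Adoption 20%, Evidence 20%, plus an optional quantitative boost.
def maturity_score(technical, operational, adoption, evidence,
                   quantitative_boost=0.0):
    return (technical * 0.35 + operational * 0.25 +
            adoption * 0.20 + evidence * 0.20 + quantitative_boost)
```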

Example Calculations

High Maturity Project (0.89): AI diagnostic platform deployed across 50 clinics, 100K+ patients processed, peer-reviewed study published.
→ Technical: 0.90 (AI proven), Operational: 0.95 (deployed at scale), Adoption: 0.80 (50 clinics), Evidence: 0.90 (published)
→ Overall: (0.90×0.35) + (0.95×0.25) + (0.80×0.20) + (0.90×0.20) = 0.89

Medium Maturity Project (0.55): Blockchain land registry pilot in 3 communities, prototype working, conference proceedings published.
→ Technical: 0.55 (prototype), Operational: 0.60 (pilot), Adoption: 0.40 (3 communities), Evidence: 0.65 (conference)
→ Overall: (0.55×0.35) + (0.60×0.25) + (0.40×0.20) + (0.65×0.20) = 0.55

Low Maturity Project (0.22): Exploring IoT sensors for water monitoring, researching options, seeking funding.
→ Technical: 0.25 (research), Operational: 0.20 (planning), Adoption: 0.10 (no users), Evidence: 0.30 (exploratory)
→ Overall: (0.25×0.35) + (0.20×0.25) + (0.10×0.20) + (0.30×0.20) = 0.22

Innovation Score Assessment

The Innovation Score is a multi-dimensional metric that evaluates how innovative and forward-looking a project's technology approach is. It considers not just what technologies are used, but how they're combined, the novelty of the approach, and the visionary nature of the project.

Four-Dimensional Innovation Model

Dimension Weight Description Key Indicators
Emerging Technology Usage 40% Presence of cutting-edge technologies AI/ML, blockchain, IoT, spatial tech, data analytics, cloud, mobile
Novelty Indicators 30% Language indicating innovative approaches Breakthrough, novel, pioneering, cutting-edge, proprietary, patented
Technology Combination 20% Combinatorial innovation from tech synergy Premium tech pairs (AI+IoT, AI+Blockchain, etc.), diversity bonus
Forward-Looking Language 10% Future-oriented, visionary statements Will transform, next-generation, paradigm shift, revolutionize

Innovation Score Tiers

Score Range Level Characteristics Example Indicators
0.80-1.00 Transformative Breakthrough innovation combining multiple cutting-edge technologies Multiple premium tech combos, novel language, strong vision
0.60-0.79 High Innovation Advanced use of emerging technologies with novel approaches AI/ML + spatial, cutting-edge language, some combinations
0.40-0.59 Moderate Innovation Good technology mix with some innovative elements Single emerging tech, moderate novelty, basic combinations
0.20-0.39 Low Innovation Conventional technology approach with limited novelty Commoditized tech only, standard approaches, minimal novelty
0.00-0.19 Minimal Innovation Traditional or no significant technology component No emerging tech, established/conventional language only

Scoring Algorithm Details

1. Emerging Technology Usage (40% weight)

2. Novelty Indicators (30% weight)

3. Technology Combination (20% weight)

4. Forward-Looking Language (10% weight)

Overall Score Formula

innovation = (tech_innovation × 0.40) + (novelty_score × 0.30) +
              (combinatorial_score × 0.20) + (forward_looking_score × 0.10)
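The Technology Combination dimension can be illustrated with a pair-bonus sketch. The pair list and bonus values below are invented for illustration, not the pipeline's actual weights:

```python
from itertools import combinations

# Hypothetical premium technology pairs (assumed, for illustration).
PREMIUM_PAIRS = {
    frozenset({"ai", "iot"}),
    frozenset({"ai", "blockchain"}),
    frozenset({"ai", "spatial"}),
}

def combination_score(categories):
    score = 0.0
    for pair in combinations(set(categories), 2):
        if frozenset(pair) in PREMIUM_PAIRS:
            score += 0.25  # premium combo bonus (assumed value)
    # Diversity bonus per extra technology category (assumed value).
    score += 0.05 * max(len(set(categories)) - 1, 0)
    return min(score, 1.0)
```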

Example Calculations

High Innovation Project (0.71): AI-powered satellite imagery analysis for disaster response. Uses cutting-edge deep learning with geospatial data. Breakthrough approach with proprietary algorithms.
→ Tech: 0.60 (AI 0.35 + Spatial 0.25), Novelty: 1.00 (breakthrough + cutting-edge + proprietary), Combination: 0.62 (AI+Spatial premium), Forward: 0.50 (will transform + next-generation)
→ Overall: (0.60×0.40) + (1.00×0.30) + (0.62×0.20) + (0.50×0.10) = 0.71

Medium Innovation Project (0.41): Mobile app for data collection in rural health clinics. Uses cloud storage and basic analytics. Emerging technology approach with modern UI.
→ Tech: 0.30 (Mobile 0.15 + Cloud 0.15), Novelty: 0.65 (emerging + modern + advanced), Combination: 0.30 (2 categories), Forward: 0.38 (scalable to + next phase)
→ Overall: (0.30×0.40) + (0.65×0.30) + (0.30×0.20) + (0.38×0.10) = 0.41

Minimal Innovation Project (0.09): Traditional database system for record management. Established technology with conventional methods. Standard approach.
→ Tech: 0.0 (no emerging tech), Novelty: 0.20 (established + conventional -0.20), Combination: 0.0 (1 category), Forward: 0.30 (minimal forward-looking)
→ Overall: (0.0×0.40) + (0.20×0.30) + (0.0×0.20) + (0.30×0.10) = 0.09

Technology Integration Level Assessment

The Integration Level is a sophisticated multi-dimensional metric that evaluates how deeply and effectively technologies are embedded within a project. Unlike simple technology counts, this assessment considers breadth, depth, interconnectedness, and architectural sophistication.

Four-Dimensional Integration Model

Dimension Weight Description Key Indicators
Breadth 25% Technology count and category diversity Number of technologies, diversity across categories (AI, IoT, cloud, etc.)
Depth 30% How deeply technologies are embedded in project implementation Tech in objectives, implementation language, multiple mentions across sections
Interconnectedness 25% Whether technologies form an integrated ecosystem Premium tech combinations, integration keywords, data flow indicators
Sophistication 20% Architectural complexity and modern development practices Advanced patterns (microservices, serverless), DevOps, scalability considerations

Integration Level Tiers

Score Range Level Characteristics Example Profile
0.75-1.00 High Multiple diverse technologies deeply integrated with sophisticated architecture 5+ technologies across 3+ categories, integrated ecosystem, advanced patterns
0.45-0.74 Medium Good technology mix with some integration and implementation depth 3+ technologies or 2+ categories, moderate interconnectedness
0.00-0.44 Low Limited technology use with minimal integration Few technologies, single category, shallow implementation

Scoring Algorithm Details

1. Breadth Score (25% weight)

2. Depth Score (30% weight)

3. Interconnectedness Score (25% weight)

4. Sophistication Score (20% weight)

Overall Score Formula

integration = (breadth × 0.25) + (depth × 0.30) +
               (interconnectedness × 0.25) + (sophistication × 0.20)

Example Calculations

High Integration Project (0.83): Distributed AI platform with microservices architecture integrating IoT sensors, blockchain for data integrity, and cloud infrastructure.
→ Breadth: 0.80 (6 techs, 4 categories), Depth: 0.90 (tech in objectives + implement language), Interconnectedness: 0.85 (AI+IoT+Blockchain combos + ecosystem), Sophistication: 0.75 (microservices + DevOps)
→ Overall: (0.80×0.25) + (0.90×0.30) + (0.85×0.25) + (0.75×0.20) = 0.83

Medium Integration Project (0.51): Mobile application using cloud storage and basic data analytics for health data collection.
→ Breadth: 0.55 (3 techs, 2 categories), Depth: 0.60 (use/utilize language), Interconnectedness: 0.40 (mobile+data combo), Sophistication: 0.45 (API-based + scalable)
→ Overall: (0.55×0.25) + (0.60×0.30) + (0.40×0.25) + (0.45×0.20) = 0.51

Low Integration Project (0.26): Basic website using standard HTML/CSS with minimal technology stack.
→ Breadth: 0.15 (1 tech, 1 category), Depth: 0.35 (superficial mention), Interconnectedness: 0.20 (no combinations), Sophistication: 0.35 (basic patterns)
→ Overall: (0.15×0.25) + (0.35×0.30) + (0.20×0.25) + (0.35×0.20) = 0.26

Hybrid Scoring: Rule-Based + LLM Semantic Analysis

The pipeline uses a hybrid approach that combines deterministic rule-based algorithms with LLM semantic understanding. This provides the best of both worlds: consistent, reproducible scoring with deep semantic nuance.

Two-Stage Scoring Process

1️⃣ Rule-Based Scoring

Deterministic algorithms compute baseline scores for Integration Level, Maturity, and Innovation using keyword matching, pattern detection, and weighted formulas.

2️⃣ LLM Semantic Enhancement

LLM receives rule-based scores as context, performs deep semantic analysis, and can refine scores based on nuanced understanding that rule-based methods might miss.

Dimension-Specific Approaches

Dimension Rule-Based Contribution LLM Enhancement Role Override Behavior
Integration Level Multi-dimensional analysis across breadth, depth, interconnectedness, and sophistication with 4 sub-methods Semantic understanding of how technologies meaningfully contribute to project goals and work together LLM can override - When deep semantic analysis suggests different integration than rule-based patterns indicate
Maturity Score Four-dimensional model (Technical 35%, Operational 25%, Adoption 20%, Evidence 20%) with context-weighted scoring Interprets nuanced deployment status, validates evidence quality, assesses real-world operational readiness LLM can override - When semantic context indicates different maturity than keyword patterns suggest
Innovation Score Four-dimensional model (Tech 40%, Novelty 30%, Combination 20%, Forward-Looking 10%) with weighted tech categories Detects genuine breakthrough approaches vs marketing buzzwords, assesses true novelty in domain context LLM can override - When semantic analysis reveals genuine innovation or identifies inflated claims

Benefits of Hybrid Approach

✓ Consistency

Rule-based scoring ensures same input always produces same baseline score, enabling comparability across projects.

✓ Semantic Nuance

LLM adds human-like understanding of context, intent, and genuine innovation that keyword matching cannot capture.

✓ Hallucination Mitigation

Rule-based baseline anchors LLM analysis, reducing risk of hallucination by providing grounded starting point.

✓ Graceful Degradation

If LLM fails or returns incomplete data, system falls back to rule-based scores, ensuring reliability.

How LLM Override Works

Step 1: Rule-based algorithms compute initial scores with full debug logging
Step 2: Initial scores are provided to LLM as context/suggested values in the prompt
Step 3: LLM can choose to:
    • Keep the score if it seems accurate based on semantic analysis
    • Adjust the score if deep understanding suggests different assessment
    • Return nothing (falls back to rule-based via data.get("key", rule_based_value))
Step 4: Final score uses LLM value if provided, otherwise rule-based value
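Step 4's fallback pattern can be sketched as a dictionary merge; merge_scores is a hypothetical helper, and only the data.get("key", rule_based_value) idiom comes from the text:

```python
# Final scores take the LLM's value when present, otherwise fall back
# to the rule-based baseline via dict.get(key, default).
def merge_scores(rule_based, llm_data):
    return {key: llm_data.get(key, value)
            for key, value in rule_based.items()}

rule_based = {"maturity_score": 0.50, "innovation_score": 0.40}
llm_data = {"maturity_score": 0.65}  # LLM adjusted one score only

final = merge_scores(rule_based, llm_data)
```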

LLM's Qualitative Contributions

Beyond quantitative scores, the LLM provides essential qualitative analysis that rule-based methods cannot generate:

Output Field Purpose Why LLM is Essential
analysis Narrative explanation of technology approach Synthesizes complex information into coherent human-readable summary
strengths What's done well technically Identifies specific technical merits and implementation quality
gaps Missing technologies or capabilities Recognizes what's missing based on domain knowledge and best practices
scalability_assessment Scaling potential and limitations Evaluates architectural decisions for scalability implications
interoperability_notes Integration with existing systems Assesses compatibility with standards and existing infrastructure
tech_recommendations Specific technology improvements Suggests relevant, actionable enhancements based on project context
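A hypothetical instance of this qualitative output, mirroring the fields in the table above (all values are illustrative, not real pipeline output):

```python
# Illustrative shape of the technology agent's qualitative analysis.
tech_analysis = {
    "analysis": "The project combines mobile data collection with cloud storage.",
    "strengths": ["Offline-first mobile client"],
    "gaps": ["No data analytics layer"],
    "scalability_assessment": "API-based design supports horizontal scaling.",
    "interoperability_notes": "Uses standard REST interfaces.",
    "tech_recommendations": ["Add a lightweight analytics dashboard"],
}
```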

Phase 3: Dashboard (dashboard20.py)

The dashboard generator creates an interactive HTML visualization with both Global Analysis and Institute View tabs:

Global Analysis Tab

Institute View Tab

Interactive Features

# Generate the interactive dashboard
python dashboard20.py

Output: dashboard_advanced.html - Open in any modern web browser.