AEO Compliance Engine | Technical Documentation

Strategic Purpose

As detailed in "The New Architecture of Visibility," search engines have evolved into Answer Engines. The goal of this tool is not to check for keywords but to verify AEO (Answer Engine Optimization) compliance.

It audits for the three pillars of the new search economy:

Entity Verification: Does the site speak "Schema" so AI models understand the concepts?
Citation Stacking: Is content formatted (lists/tables) for easy AI synthesis?
Digital Trust (E-E-A-T): Are authors and sources explicitly defined to prevent hallucinations?

1. The Schema Extraction (Entity Protocol)

The Challenge

Modern AI crawlers (like Gemini or Copilot) act as "Knowledge Acquisition" agents. If they cannot parse your structured data, they cannot "trust" your content. Standard text converters often strip this hidden metadata.

The Solution

The script implements a dedicated extract_schema_tags function. It bypasses the visual rendering layer and reads the raw HTML directly, so that every JSON-LD block (the language of entities) is captured and handed to the LLM judge for evaluation.

import re

def extract_schema_tags(html: str) -> list[str]:
    # Captures JSON-LD to verify if "Entities" are properly defined.
    # The type attribute may appear anywhere inside the <script> tag.
    pattern = r'<script\b[^>]*\stype=["\']application/ld\+json["\'][^>]*>(.*?)</script>'
    # Returns the raw JSON-LD payloads; this data is fed directly to the LLM judge
    matches = re.findall(pattern, html, re.DOTALL | re.IGNORECASE)
    return matches
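
Before the captured blocks reach the judge, it can help to confirm that each one is parseable JSON and to see which entity types it declares. The following is a minimal sketch, not part of the published tool: summarize_json_ld is a hypothetical helper, and it assumes the input is a list of raw JSON-LD strings such as the one extract_schema_tags returns.

```python
import json


def summarize_json_ld(blocks):
    """Parse raw JSON-LD strings and collect the declared @type values.

    Hypothetical helper for illustration; returns (types, errors).
    """
    types, errors = [], []
    for index, raw in enumerate(blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            # Malformed JSON-LD is recorded instead of aborting the audit.
            errors.append(f"block {index}: {exc.msg}")
            continue
        # A block may hold one entity, a list of entities, or an @graph wrapper.
        if isinstance(data, dict) and isinstance(data.get("@graph"), list):
            items = data["@graph"]
        elif isinstance(data, list):
            items = data
        else:
            items = [data]
        for item in items:
            if isinstance(item, dict) and "@type" in item:
                declared = item["@type"]
                types.extend(declared if isinstance(declared, list) else [declared])
    return types, errors
```

An empty types list alongside a non-empty input suggests that markup is present but does not define any entity, which is the distinction the audit cares about.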
                

2. The LLM Judge (Qualitative Analysis)

Why not use standard validators?

Traditional SEO tools check if code is valid. This tool checks if code is meaningful. We use an LLM to simulate how a search engine's AI actually interprets your page.

The Metrics

Schema.org Status: Is the JSON-LD merely present, or does it actually describe the content?
GEO Formatting (Generative Engine Optimization): Does the text use bullets, headers, and tables that allow an AI to summarize it easily?
E-E-A-T: Does the content explicitly link to human experts, satisfying the "Trust" requirement?

system_prompt = """
You are an AI SEO Auditor...
"metrics": [
  {"name": "Schema.org", "details": "Is JSON-LD present?"},
  {"name": "GEO Formatting", "details": "Is it optimized for synthesis?"},
  {"name": "E-E-A-T", "details": "Is authorship explicit?"}
]
"""
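
Because an LLM's reply is free text, the judge's output is worth checking before it is written to a report. The sketch below assumes the judge answers with a JSON object containing a "metrics" list of name/details pairs, mirroring the prompt fragment above; the full response format is not shown in this document, and parse_judge_response is a hypothetical helper.

```python
import json

# Metric names taken from the prompt fragment above.
REQUIRED_METRICS = ("Schema.org", "GEO Formatting", "E-E-A-T")


def parse_judge_response(raw: str) -> dict[str, str]:
    """Return {metric name: details}; raise ValueError if the reply is unusable."""
    payload = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(payload, dict):
        raise ValueError("judge response must be a JSON object")
    metrics = {}
    for entry in payload.get("metrics", []):
        # Skip entries that do not follow the assumed name/details shape.
        if isinstance(entry, dict) and "name" in entry and "details" in entry:
            metrics[entry["name"]] = entry["details"]
    missing = [name for name in REQUIRED_METRICS if name not in metrics]
    if missing:
        raise ValueError(f"judge response is missing metrics: {missing}")
    return metrics
```

Failing loudly on a missing metric keeps a partial or malformed answer from silently producing an incomplete audit row.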
                

3. Crawler Configuration (Resilience)

In the era of JavaScript-heavy frameworks (Next.js, React), a simple HTTP request is insufficient. This tool uses a headless browser with a 3-second hydration delay. This ensures that dynamic content, often where the most valuable "Answers" live, is fully loaded before analysis begins.
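
The document does not name the headless browser library, so the following sketch assumes Playwright's synchronous API purely for illustration. fetch_rendered_html and HYDRATION_DELAY_MS are hypothetical names; only the 3-second delay comes from the description above.

```python
HYDRATION_DELAY_MS = 3000  # the 3-second hydration delay described above


def fetch_rendered_html(url: str, delay_ms: int = HYDRATION_DELAY_MS) -> str:
    """Load a page in headless Chromium and return the DOM after hydration."""
    # Imported lazily so this module can be loaded without Playwright installed.
    # Assumption: Playwright is the browser driver; the actual tool may differ.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded")
            # Give client-side JavaScript time to render dynamic content.
            page.wait_for_timeout(delay_ms)
            return page.content()
        finally:
            browser.close()
```

A fixed delay is simple and predictable, at the cost of waiting the full interval on pages that finish rendering sooner.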

4. Data Output (Actionable Intelligence)

The script generates a seo_report.csv file. It is designed to help webmasters prioritize fixes based on Citation Probability rather than ranking position alone. Each row highlights the "Top Recommendation" for making the page "machine-readable."

Figure 1: Sample structure of the generated CSV report, highlighting the AI Score and Schema Status columns.
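
A report of this shape can be produced with Python's standard csv module. The sketch below is an assumption-laden illustration: the column names are inferred from the figure caption and the "Top Recommendation" mentioned above, the sample rows are made up, and sorting the lowest score first is one possible way to surface the most urgent fixes, not necessarily what the tool does.

```python
import csv
import io

# Assumed column names; the real report may use different headers.
FIELDNAMES = ["URL", "AI Score", "Schema Status", "Top Recommendation"]


def write_report(rows, handle):
    """Write audit rows as CSV, lowest AI Score first."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["AI Score"]):
        writer.writerow(row)


# Illustrative rows only; these are not real audit results.
sample_rows = [
    {"URL": "https://example.com/a", "AI Score": 82,
     "Schema Status": "Valid", "Top Recommendation": "Add an author bio"},
    {"URL": "https://example.com/b", "AI Score": 35,
     "Schema Status": "Missing", "Top Recommendation": "Add Article JSON-LD"},
]

buffer = io.StringIO()
write_report(sample_rows, buffer)
print(buffer.getvalue())
```

To write the file to disk instead, pass a handle opened with open("seo_report.csv", "w", newline="", encoding="utf-8").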

5. Open Source & Availability

This AEO Auditor is now available as open-source software under the MIT License. The complete source code, documentation, and installation instructions are publicly available for use and contribution.

Get the Tool

GitHub Repository: https://github.com/ngstcf/ai-seo-auditor
Author: Ng Chong (@ngstcf)
License: MIT License — free for commercial and personal use

Key Features

Schema markup (JSON-LD) detection and analysis
llms.txt presence checking
AI-friendly content structure evaluation
E-E-A-T signal assessment
Multi-page crawling with configurable depth
CSV export for historical tracking

Reference Documentation

For more on SEO in the AI era, see the accompanying article: "SEO for the AI Era: A 2025 Quick Guide"