Methodology
ScoreGeo analyzes your URL server-side: we fetch your raw HTML, your /robots.txt and your /llms.txt, then evaluate 9 weighted criteria totaling 100 points.
The 9 criteria and their weights
Server rendering (20 points)
We fetch the raw HTML without executing JavaScript and count the visible words. A single-page application (SPA) whose content only appears after client-side hydration is nearly invisible to LLMs.
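To make the word count concrete, here is a minimal sketch using only Python's standard `html.parser`. It is not ScoreGeo's implementation. The set of non-visible tags and the `target_words=300` threshold for a full sub-score are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Elements whose text never reaches the reader (hypothetical list).
NON_VISIBLE_TAGS = {"script", "style", "template", "title"}


class VisibleTextParser(HTMLParser):
    """Collects text found outside non-visible elements, without running any JavaScript."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a non-visible element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NON_VISIBLE_TAGS:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in NON_VISIBLE_TAGS and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if self.hidden_depth == 0:
            self.chunks.append(data)


def visible_word_count(raw_html):
    """Counts whitespace-separated words in the visible text of the raw HTML."""
    parser = VisibleTextParser()
    parser.feed(raw_html)
    parser.close()
    return len(" ".join(parser.chunks).split())


def server_rendering_subscore(raw_html, target_words=300):
    """Maps the word count to [0, 1]; target_words is a hypothetical threshold."""
    return min(visible_word_count(raw_html) / target_words, 1.0)
```

An SPA shell such as `<div id="root"></div>` plus a script scores 0 words here, because the text it would render only exists after JavaScript runs.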
Structured data (JSON-LD) (20 points)
We extract every <script type="application/ld+json"> block, recursing into nested @graph arrays. The types AI systems make most use of earn a bonus: FAQPage, HowTo, LocalBusiness and Product.
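The extraction step can be sketched as follows, assuming a standard-library parser and a traversal that descends into lists and `@graph` arrays only (not into every nested object). Silently skipping malformed blocks is also an assumption; the helper names are hypothetical.

```python
import json
from html.parser import HTMLParser

# Types named above as earning a bonus.
BONUS_TYPES = {"FAQPage", "HowTo", "LocalBusiness", "Product"}


class JsonLdParser(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self.in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks[-1] += data


def iter_nodes(value):
    """Yields JSON-LD nodes, descending into lists and (recursively) into @graph."""
    if isinstance(value, list):
        for item in value:
            yield from iter_nodes(item)
    elif isinstance(value, dict):
        yield value
        if "@graph" in value:
            yield from iter_nodes(value["@graph"])


def extract_types(raw_html):
    """Returns every @type string found across all JSON-LD blocks."""
    parser = JsonLdParser()
    parser.feed(raw_html)
    parser.close()
    types = set()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed block: skipped in this sketch
        for node in iter_nodes(data):
            node_type = node.get("@type")
            if isinstance(node_type, str):
                types.add(node_type)
            elif isinstance(node_type, list):
                types.update(t for t in node_type if isinstance(t, str))
    return types


def bonus_types(raw_html):
    return extract_types(raw_html) & BONUS_TYPES
```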
Answer-first structure (10 points)
We look for the first meaningful paragraph inside <main> or <article>, or after the <h1>. The sweet spot is 15 to 80 words and at most 600 characters, which is the format LLMs extract as a direct citation.
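A minimal sketch of the paragraph selection and the length test follows. The search order (a paragraph inside `<main>`/`<article>` first, then one after the `<h1>`) and the 5-word floor for a "meaningful" paragraph are assumptions; only the 15-80 word and 600-character bounds come from the text above. The parser also assumes well-formed markup with explicit `</p>` tags.

```python
from html.parser import HTMLParser

MIN_MEANINGFUL_WORDS = 5  # hypothetical cut-off for a "meaningful" paragraph


class ParagraphParser(HTMLParser):
    """Records each <p> with two flags: inside <main>/<article>, and after the <h1>."""

    def __init__(self):
        super().__init__()
        self.container_depth = 0  # number of open <main>/<article> elements
        self.seen_h1 = False
        self.current = None       # text chunks of the <p> being read, if any
        self.paragraphs = []      # (in_container, after_h1, text) tuples

    def handle_starttag(self, tag, attrs):
        if tag in ("main", "article"):
            self.container_depth += 1
        elif tag == "p":
            self.current = []

    def handle_endtag(self, tag):
        if tag in ("main", "article") and self.container_depth > 0:
            self.container_depth -= 1
        elif tag == "h1":
            self.seen_h1 = True
        elif tag == "p" and self.current is not None:
            text = " ".join("".join(self.current).split())
            self.paragraphs.append((self.container_depth > 0, self.seen_h1, text))
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)


def first_meaningful_paragraph(raw_html):
    """Prefers a paragraph in <main>/<article>, then one after the <h1> (assumed order)."""
    parser = ParagraphParser()
    parser.feed(raw_html)
    parser.close()
    candidates = [p for p in parser.paragraphs if len(p[2].split()) >= MIN_MEANINGFUL_WORDS]
    for in_container, _, text in candidates:
        if in_container:
            return text
    for _, after_h1, text in candidates:
        if after_h1:
            return text
    return None


def in_answer_sweet_spot(paragraph):
    """True for 15-80 words and at most 600 characters."""
    text = " ".join(paragraph.split())  # collapse whitespace before measuring
    return 15 <= len(text.split()) <= 80 and len(text) <= 600
```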
Content freshness (10 points)
We read dateModified and datePublished from the JSON-LD, plus any <time datetime> tags. A page updated less than 6 months ago is rewarded; one older than 36 months is penalized.
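The two documented thresholds can be turned into a sub-score as below. This is a sketch under stated assumptions: the most recent date wins, the score decays linearly between 6 and 36 months, a page older than 36 months scores 0, and a page with no parseable date also scores 0. None of these choices are confirmed by the text; only the 6- and 36-month thresholds are.

```python
from datetime import date

FRESH_MONTHS = 6    # below this age: full credit
STALE_MONTHS = 36   # above this age: penalized (0 in this sketch)


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the current month is not complete yet
    return max(months, 0)


def freshness_subscore(date_strings, today):
    """Scores the most recent ISO 8601 date among dateModified, datePublished and <time datetime>."""
    parsed = []
    for value in date_strings:
        try:
            parsed.append(date.fromisoformat(value[:10]))  # keep the YYYY-MM-DD part
        except ValueError:
            continue  # unparseable value: ignored
    if not parsed:
        return 0.0  # no usable date (assumption)
    age = months_between(max(parsed), today)
    if age < FRESH_MONTHS:
        return 1.0
    if age > STALE_MONTHS:
        return 0.0
    # Hypothetical linear decay between the two documented thresholds.
    return 1.0 - (age - FRESH_MONTHS) / (STALE_MONTHS - FRESH_MONTHS)
```

Passing `today` explicitly keeps the function deterministic and easy to test.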
Semantic HTML (10 points)
We check for a single <h1>, a coherent heading hierarchy (no skipped levels such as h2 → h4), a single <main>, and the presence of <article> or <section>.
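The four checks can be sketched as follows. Only a downward skip (for example h2 followed by h4) counts as a jump; moving back up the hierarchy is allowed. Weighting the four checks equally in `semantic_subscore` is an assumption for illustration.

```python
from html.parser import HTMLParser

HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class OutlineParser(HTMLParser):
    """Records heading levels in document order and counts landmark elements."""

    def __init__(self):
        super().__init__()
        self.levels = []
        self.counts = {"main": 0, "article": 0, "section": 0}

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            self.levels.append(HEADING_LEVELS[tag])
        elif tag in self.counts:
            self.counts[tag] += 1


def heading_jumps(levels):
    """Returns (previous, current) pairs where at least one level is skipped going down."""
    return [(prev, cur) for prev, cur in zip(levels, levels[1:]) if cur > prev + 1]


def semantic_checks(raw_html):
    parser = OutlineParser()
    parser.feed(raw_html)
    parser.close()
    return {
        "single_h1": parser.levels.count(1) == 1,
        "no_heading_jump": not heading_jumps(parser.levels),
        "single_main": parser.counts["main"] == 1,
        "has_article_or_section": parser.counts["article"] + parser.counts["section"] > 0,
    }


def semantic_subscore(raw_html):
    """Share of checks passed (equal weighting is an assumption)."""
    checks = semantic_checks(raw_html)
    return sum(checks.values()) / len(checks)
```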
SEO metadata (10 points)
We check the title (30-65 characters), the meta description (70-160 characters), the canonical URL, the OpenGraph tags (title, description, image, type) and the Twitter Card. LLMs also rely on these signals when summarizing a page.
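Here is a minimal sketch of these checks using the standard library. It reads `<title>`, `<meta name=…>`/`<meta property=…>` and `<link rel="canonical">`. Collapsing whitespace before measuring lengths and treating any non-empty `twitter:card` value as present are assumptions.

```python
from html.parser import HTMLParser

OG_PROPERTIES = ("og:title", "og:description", "og:image", "og:type")


class MetaParser(HTMLParser):
    """Collects the <title> text, meta name/property values and the canonical link."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}         # lower-cased name or property -> content
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = {key: (value or "") for key, value in attrs}
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            key = attributes.get("name") or attributes.get("property")
            if key:
                self.meta[key.lower()] = attributes.get("content", "")
        elif tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def seo_checks(raw_html):
    parser = MetaParser()
    parser.feed(raw_html)
    parser.close()
    title = " ".join(parser.title.split())
    description = " ".join(parser.meta.get("description", "").split())
    return {
        "title_length": 30 <= len(title) <= 65,
        "description_length": 70 <= len(description) <= 160,
        "canonical": bool(parser.canonical),
        "opengraph": all(parser.meta.get(prop) for prop in OG_PROPERTIES),
        "twitter_card": bool(parser.meta.get("twitter:card")),
    }
```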
AI crawler access (10 points)
We parse /robots.txt and check access for the 6 major AI crawlers: GPTBot, ChatGPT-User, Google-Extended (Gemini), ClaudeBot, PerplexityBot and CCBot.
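Python's standard `urllib.robotparser` is enough to sketch this check. Be aware that its matching is simpler than Google's: the first matching rule wins rather than the most specific one, and user-agent names are matched as substrings. Treat the result as an approximation. Scoring the share of allowed crawlers is an assumption about how the sub-score is normalized.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "Google-Extended", "ClaudeBot", "PerplexityBot", "CCBot"]


def blocked_ai_crawlers(robots_txt, page_url):
    """Returns the AI user agents that robots.txt forbids from fetching page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, page_url)]


def crawler_access_subscore(robots_txt, page_url):
    """Share of the 6 AI crawlers allowed to fetch the page (assumed normalization)."""
    blocked = blocked_ai_crawlers(robots_txt, page_url)
    return 1 - len(blocked) / len(AI_CRAWLERS)
```

For example, a robots.txt that disallows only `GPTBot` leaves 5 of the 6 crawlers allowed.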
Entity richness (5 points)
We count numbers, lists, tables, definition lists and FAQ-style questions ("How…?", "Why…?"). The denser the content is in such entities, the more citable it becomes.
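The counters can be sketched like this. The number pattern, the list of question words and the decision to count `<ul>`/`<ol>`/`<table>`/`<dl>` elements are all hypothetical stand-ins for ScoreGeo's actual heuristics. Turning the counts into a density score is left out because the thresholds are not documented.

```python
import re
from html.parser import HTMLParser

NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*%?")  # integers, decimals, percentages
QUESTION_RE = re.compile(r"\b(?:How|Why|What|When|Which|Who)\b[^?.!]*\?")  # hypothetical FAQ pattern
STRUCTURE_TAGS = ("ul", "ol", "table", "dl")


class EntityParser(HTMLParser):
    """Counts structural tags and gathers text outside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.text = []
        self.tag_counts = {tag: 0 for tag in STRUCTURE_TAGS}

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in self.tag_counts:
            self.tag_counts[tag] += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.text.append(data)


def entity_counts(raw_html):
    parser = EntityParser()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.text)
    counts = dict(parser.tag_counts)
    counts["numbers"] = len(NUMBER_RE.findall(text))
    counts["questions"] = len(QUESTION_RE.findall(text))
    return counts
```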
llms.txt file (5 points)
We check for a /llms.txt file at the site root. This emerging standard, proposed by Jeremy Howard, points LLMs to your priority content in Markdown.
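Because the file lives at the root, its URL is derived from the submitted page URL rather than appended to it. The sketch below resolves that URL and performs a presence-only check. Requiring a 200 status with a non-empty body is an assumption, and so is the `geo-audit-sketch/0.1` User-Agent string, which is a hypothetical placeholder.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def llms_txt_url(page_url):
    """Resolves /llms.txt against the site root, whatever page was submitted."""
    return urljoin(page_url, "/llms.txt")


def has_llms_txt(page_url, timeout=10.0):
    """True if GET /llms.txt answers 200 with a non-empty body (presence check only)."""
    request = Request(llms_txt_url(page_url), headers={"User-Agent": "geo-audit-sketch/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200 and bool(response.read(1))
    except OSError:  # HTTPError, URLError and timeouts are all OSError subclasses
        return False
```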
Score calculation
Each criterion returns a sub-score normalized between 0 and 1, which is multiplied by the criterion's weight. Summing the weighted sub-scores gives the global score out of 100. Each criterion's severity (passed / to improve / failed) is derived from its sub-score:
- ≥ 0.85 → passed
- ≥ 0.40 → to improve
- < 0.40 → failed
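The calculation and the severity thresholds can be written out directly. The weights and thresholds come from this page. The dictionary keys are hypothetical names, and counting a missing criterion as 0 is an assumption.

```python
# Weights from the criteria list; the key names are hypothetical.
WEIGHTS = {
    "server_rendering": 20,
    "structured_data": 20,
    "answer_first": 10,
    "freshness": 10,
    "semantic_html": 10,
    "seo_metadata": 10,
    "ai_crawler_access": 10,
    "entity_richness": 5,
    "llms_txt": 5,
}
assert sum(WEIGHTS.values()) == 100


def severity(subscore):
    """Maps a normalized sub-score to the three documented severities."""
    if subscore >= 0.85:
        return "passed"
    if subscore >= 0.40:
        return "to improve"
    return "failed"


def global_score(subscores):
    """Weighted sum of sub-scores in [0, 1]; a missing criterion counts as 0 (assumption)."""
    for name, value in subscores.items():
        if name not in WEIGHTS:
            raise KeyError(f"unknown criterion: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} sub-score must be within [0, 1], got {value}")
    return sum(weight * subscores.get(name, 0.0) for name, weight in WEIGHTS.items())
```

For instance, full marks everywhere except half credit on structured data and semantic HTML, 0.4 on entity richness and nothing on answer-first or llms.txt gives 20 + 10 + 0 + 10 + 5 + 10 + 10 + 2 + 0 = 67.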
Global rating
- Excellent: ≥ 80
- Good: 60-79
- Average: 40-59
- At risk: < 40
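The bands are written as integer ranges, but a weighted sum can be fractional (for example 79.5). The sketch below assumes each band works as a lower bound, so 79.5 is rated Good. That reading is an assumption, not something the bands state.

```python
def global_rating(score):
    """Maps a global score out of 100 to its rating band, treating bands as lower bounds."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Average"
    return "At risk"
```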
Acknowledged limits
- We read the HTML as served, without executing JavaScript. This is deliberate: it is what ChatGPT, Claude and Perplexity see when they crawl your page.
- Our heuristics are fast and easy to interpret, but they are not an absolute truth. A site can score poorly on technical criteria and still be widely cited (rare, but possible).
- The optional "Are you cited by AI?" module measures whether Claude actually cites your brand. It is activated by entering a brand, sector and city, and requires an Anthropic API key configured on the server side.