Geo Scoring Explained: The 13-Criteria Methodology Behind Modern AI Citation Audits
Geo scoring is the practice of measuring how citable a website is by generative AI engines, expressed as a single rating out of 100 built from a weighted basket of technical, editorial and off-page criteria. It plays the same role for AI search that domain authority or PageSpeed play for classic SEO: a leading indicator you can track, benchmark and act on. This guide breaks down the exact 13-criteria methodology behind a modern geo score, the empirical weights, and the studies they come from.
Why geo scoring exists: the problem LLMs created
Classic SEO scoring (domain authority, PageRank, Lighthouse) was built for one user journey: type a query, get ten blue links, click one. That model is breaking. Per Sistrix (April 2026), 58% of Google queries in France now trigger an AI Overview that synthesizes an answer before any blue link. ChatGPT Search reports 600M monthly active users (OpenAI, late 2025) and Perplexity exceeds 30M. For a growing share of queries, the first touchpoint with a brand is an AI paragraph that either cites you or does not.
AI engines pick sources using rules that overlap only partially with SEO. Vercel + MERJ observed zero JavaScript execution across 500M+ GPTBot fetches, and report the same behavior for ClaudeBot and PerplexityBot: an SEO-optimized React SPA can score 90 on Lighthouse and be invisible to ChatGPT. Conversely, a static page with strong brand mentions on Wikipedia and Reddit can be cited despite a mediocre Lighthouse score. Two different games. Geo scoring fills the gap with signals that empirical studies tie to AI citations.
The 13 weighted criteria of geo scoring
The methodology below was recalibrated in May 2026 after the Ahrefs controlled JSON-LD study forced a downward revision of structured-data weights and the Ahrefs 75k-brands study elevated off-page brand mentions to a primary signal. Total: 100 points.
1. Server rendering — 15 points
The heaviest criterion. GPTBot, ClaudeBot and PerplexityBot do not execute JavaScript: Vercel + MERJ verified zero JS execution across 500M+ GPTBot fetches. A React, Vue or Angular SPA without SSR appears to these crawlers as an empty <div id="root"></div>. Among the major crawlers, only Googlebot performs a two-phase render. Three of the four major LLM crawlers are therefore blocked by this single point, hence the dominant weight.
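To make the "empty shell" idea concrete, here is a minimal Python sketch of how a server-rendering check could work on raw HTML. It is not ScoreGeo's implementation: the names `VisibleTextCounter` and `server_rendering_score`, and the 150-word threshold, are assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts words in text nodes, ignoring non-content elements."""

    # Elements whose text a reader never sees as body copy.
    SKIPPED = {"script", "style", "noscript", "template", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.word_count = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.word_count += len(data.split())


def server_rendering_score(raw_html: str, min_words: int = 150) -> float:
    """Return a 0..1 score: 1.0 once the raw HTML holds at least min_words
    visible words, near 0.0 for an empty SPA shell.
    min_words is a hypothetical threshold, not a published value."""
    parser = VisibleTextCounter()
    parser.feed(raw_html)
    parser.close()
    return min(1.0, parser.word_count / min_words)
```

The key design choice is that nothing is rendered: the function only sees the bytes a non-JavaScript crawler would receive.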
2. Answer-first structure — 13 points
ALM Corp's 2025 analysis of 1.2M ChatGPT responses and 18k citations found 44.2% of citations come from the first third of the page. The Princeton, Allen Institute and Georgia Tech paper ("GEO: Generative Engine Optimization", KDD 2024) tested nine optimization strategies on a 10k-query benchmark and reported +27.8% generation-visibility for the "Quotation Addition" treatment. A question-shaped H2 followed by a 40-75 word standalone answer is the strongest single editorial predictor of AI citation.
3. Off-page brand mentions — 11 points
Ahrefs analyzed 75k brands in December 2025 and reported a 0.664 correlation between branded web mentions and ChatGPT visibility, roughly three times the correlation for backlinks (0.218). YouTube mentions correlated even more strongly, at 0.737. Brand mentions are therefore treated as a primary signal. The score is produced by querying Claude's parametric knowledge of the brand across third-party sources rather than by parsing the site's HTML.
4. Content freshness — 10 points
Ahrefs's analysis of 17M citations across seven AI platforms found cited pages are 25.7% fresher than the Google organic average. Practical threshold: pages with a visible dateModified under 12 months are favored; pages older than 36 months are penalized. The score rewards a recent last-updated date in both JSON-LD and visible byline.
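The two thresholds above can be turned into a sub-score in several ways. The sketch below assumes a linear decay between 12 and 36 months, which is an illustrative choice and not something the studies specify. The function names are hypothetical.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from earlier to later (never negative)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the current month is not complete yet
    return max(months, 0)


def freshness_score(date_modified: date, today: date) -> float:
    """1.0 under 12 months, 0.0 beyond 36 months, linear in between.
    The linear interpolation is an assumption made for this sketch."""
    age = months_between(date_modified, today)
    if age < 12:
        return 1.0
    if age > 36:
        return 0.0
    return 1.0 - (age - 12) / 24
```

For example, a page last modified exactly 24 months ago lands halfway between the two thresholds and would receive 0.5 under this assumption.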
5. Entity and number richness — 9 points
KDD 2024 reported +25.9% generation visibility for "Statistics Addition" — sourced numbers, percentages and dates added to existing copy. GenOptima 2025: pages with 3+ statistics per 300 words are cited 2.1× more often. The score counts numbers, percentages, amounts, dates, lists and tables in the rendered HTML.
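As a rough illustration of the "3+ statistics per 300 words" threshold, the following Python sketch counts numeric tokens in plain text and normalizes by word count. The regular expression and the function name `stats_per_300_words` are assumptions. A production scorer would also need to handle dates, lists and tables, which this sketch ignores.

```python
import re

# Matches standalone numbers such as "42", "3.5", "1,885" or "27.8%".
# The lookbehind skips digits glued to a word, e.g. the "2" in "H2".
NUMBER_RE = re.compile(r"(?<!\w)\d+(?:[.,]\d+)*%?")


def stats_per_300_words(text: str) -> float:
    """Number of numeric tokens per 300 words of text."""
    words = text.split()
    if not words:
        return 0.0
    numeric_tokens = NUMBER_RE.findall(text)
    return len(numeric_tokens) * 300 / len(words)


def meets_density_threshold(text: str, threshold: float = 3.0) -> bool:
    """True when the text reaches the 3-per-300-words level cited above."""
    return stats_per_300_words(text) >= threshold
```

Counting tokens says nothing about whether a number is sourced or accurate, so a check like this can only be a first filter.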
6. Presence on AI-cited sources — 9 points
Semrush analyzed 150k LLM citations in June 2025 and identified three dominant sources: Reddit (40.1%), Wikipedia (26.3%) and YouTube (23.5%). Profound's parallel study on 680M citations placed Wikipedia in top-10 sources for 47.9% of ChatGPT answers. Scoring probes brand presence on these three platforms via Claude's knowledge, not by scraping the platforms themselves.
7. AI crawler access — 8 points
A binary blocker. The score parses /robots.txt against seven crawlers: GPTBot, OAI-SearchBot, ChatGPT-User, Google-Extended, ClaudeBot, PerplexityBot and CCBot. A single blocked bot makes the site invisible to that LLM. Many sites silently inherit Disallow: / clauses from CMS templates or security plugins.
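A check of this kind can be sketched with Python's standard `urllib.robotparser` module. The function name `blocked_ai_crawlers` and the default test URL are assumptions. The list of seven user agents is the one given above.

```python
from urllib.robotparser import RobotFileParser

# The seven AI-related user agents named in the criterion.
AI_CRAWLERS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "Google-Extended",
    "ClaudeBot",
    "PerplexityBot",
    "CCBot",
]


def blocked_ai_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that the given robots.txt body forbids
    from fetching url. The robots.txt content is passed in as text, so
    no network request is made here."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]
```

Passing the file body as a string keeps the sketch testable offline. Fetching `/robots.txt` itself is left to the caller.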
8. E-E-A-T author signals — 7 points
Contently 2026: +40% citation rate for pages with a visible author bio. Wellows: 96% of AI Overviews citations come from sources with explicit E-E-A-T signals. The score looks for <meta name="author">, Person JSON-LD, byline patterns and credentials in visible text.
9. Semantic HTML — 5 points
Single <h1>, coherent heading hierarchy, structured lists and tables, content inside <main>. Useful but partially redundant with answer-first scoring — weight cut from 10 to 5 in the May 2026 recalibration.
10. Structured data / JSON-LD — 4 points
Biggest cut of the May 2026 update, from 14 to 4. Ahrefs's controlled study (March 2026, 1,885 pages) found no statistically significant lift from adding schema markup on ChatGPT or AI Mode, and a slight negative effect (-12 citations/day) on AI Overviews. The historical 3× correlation was selection bias — well-marked sites also tend to be well-resourced editorially. JSON-LD is kept because it still helps Google's SERP features.
11. SEO metadata — 4 points
Title (30-65 chars), meta description (70-160 chars), canonical, OpenGraph, Twitter Card. Important for Google and social sharing, but LLMs mostly read the body. Low weight by design.
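The length ranges above are easy to verify on raw HTML. The sketch below uses only the standard library and covers the title, meta description and canonical link. OpenGraph and Twitter Card checks are omitted for brevity. Class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadMetadata(HTMLParser):
    """Collects <title>, meta description and canonical link from raw HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.has_canonical = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = (attributes.get("content") or "").strip()
        elif tag == "link" and (attributes.get("rel") or "").lower() == "canonical":
            self.has_canonical = bool(attributes.get("href"))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def metadata_checks(raw_html: str) -> dict[str, bool]:
    """Apply the 30-65 and 70-160 character ranges from the criterion."""
    parser = HeadMetadata()
    parser.feed(raw_html)
    parser.close()
    title = parser.title.strip()
    return {
        "title_ok": 30 <= len(title) <= 65,
        "description_ok": 70 <= len(parser.description) <= 160,
        "canonical_ok": parser.has_canonical,
    }
```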
12. Original first-party data — 4 points
Yext's 6.8M-citations study: 86% of AI citations come from brand-managed sources. Am I Cited 2026: +30-40% visibility for pages featuring original research. The score measures density of proprietary numbers (customer base sizes, measured results, named case studies) over recycled industry stats.
13. llms.txt file — 1 point
Symbolic weight. Gary Illyes (Google, July 24 2025) stated publicly that Google does not support llms.txt. Anthropic and Perplexity acknowledge fetching it at the margin. Present = micro-bonus, absent = no real penalty.
How each criterion is measured
All on-page criteria are measured against the raw HTML returned by a single GET request, with no JavaScript execution. This is exactly what GPTBot, ClaudeBot and PerplexityBot see. Measuring the rendered DOM would inflate scores beyond what LLMs actually ingest.
Server rendering returns 1.0 if the main content is in the raw HTML, and near 0.0 if there is only an empty SPA shell. Answer-first scoring locates the first paragraph after the H1 and rewards a 40-75 word standalone statement with no marketing filler. The off-page brand mentions criterion queries Claude with a structured prompt about the brand's footprint on Wikipedia, Reddit, YouTube and press. Together with presence on AI-cited sources, it is one of only two criteria that do not parse the URL itself.
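The 40-75 word window for the answer-first criterion can be sketched as a simple length check. The proportional falloff outside the window is an assumption made for this example, and the detection of marketing filler is deliberately left out. The function name is hypothetical.

```python
def answer_first_score(first_paragraph: str, min_words: int = 40, max_words: int = 75) -> float:
    """Score the opening paragraph on length alone.

    Returns 1.0 inside the 40-75 word window and decays proportionally
    outside it. The decay shape is an illustrative assumption."""
    word_count = len(first_paragraph.split())
    if word_count == 0:
        return 0.0
    if word_count < min_words:
        return word_count / min_words  # too short to stand alone
    if word_count > max_words:
        return max_words / word_count  # answer buried in a long intro
    return 1.0
```

Under these assumptions, a 20-word opener and a 150-word opener both score 0.5, one for being too thin and the other for burying the answer.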
Freshness reads <meta property="article:modified_time">, dateModified in Article JSON-LD, and any visible "Updated on" pattern. Entity richness counts numeric tokens, percentages, ISO dates and list items normalized by word count. Crawler access parses /robots.txt against seven crawlers. E-E-A-T scoring looks for <meta name="author">, Person JSON-LD and byline patterns near the H1. JSON-LD scoring validates @type against an allowlist and rewards required-field completeness. Original data is flagged by a heuristic distinguishing first-person numbers ("our customers", "we measured") from externally cited numbers. Full details at scoregeo.ai/methodology.
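As an illustration of the freshness signals described here, the sketch below pulls `article:modified_time` and a top-level JSON-LD `dateModified` out of raw HTML. It is a simplified assumption of how such extraction might look: it does not handle `@graph` nesting or the visible "Updated on" pattern, and the names are hypothetical.

```python
import json
from html.parser import HTMLParser


class FreshnessSignals(HTMLParser):
    """Collects modification dates from meta tags and JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.meta_modified = None
        self.jsonld_modified = None
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("property") == "article:modified_time":
            self.meta_modified = attributes.get("content")
        elif tag == "script" and (attributes.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "script" or not self._in_jsonld:
            return
        self._in_jsonld = False
        try:
            document = json.loads("".join(self._buffer))
        except json.JSONDecodeError:
            return  # malformed JSON-LD is ignored rather than raising
        items = document if isinstance(document, list) else [document]
        for item in items:
            if isinstance(item, dict) and "dateModified" in item and self.jsonld_modified is None:
                self.jsonld_modified = item["dateModified"]


def extract_freshness_signals(raw_html: str) -> dict:
    """Return the raw date strings found, or None where a signal is absent."""
    parser = FreshnessSignals()
    parser.feed(raw_html)
    parser.close()
    return {"meta_modified": parser.meta_modified, "jsonld_modified": parser.jsonld_modified}
```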
Geo scoring vs SEO scoring: 5 fundamental differences
Both produce a number out of 100, but they measure different things and rely on different signals. The 5 differences below explain why a strong SEO score doesn't automatically imply a strong geo score.
Difference 1: JavaScript execution. SEO tools render JavaScript (Googlebot two-phase rendering since 2019, Lighthouse runs in headless Chrome). Geo scoring deliberately does not, because GPTBot, ClaudeBot and PerplexityBot don't. A site scoring 92 on Lighthouse can score 35 on a geo audit if it's a non-SSR SPA.
Difference 2: weighting of structured data. SEO scoring still rewards JSON-LD generously for rich snippets. Geo scoring, post-Ahrefs March 2026, weights it at only 4 points. The two systems diverged on this exact criterion in early 2026.
Difference 3: editorial format. SEO barely measures intro format; Google tolerates storytelling. Geo scoring weights answer-first at 13 points because LLMs prefer pages that answer in the first 75 words. The largest gap.
Difference 4: off-page signals. SEO scoring relies on backlinks and domain authority (Ahrefs DR, Moz DA). Geo scoring relies on brand mentions across Wikipedia, Reddit, YouTube and press — three times more predictive of AI citation than backlinks per Ahrefs's 75k-brands study.
Difference 5: primary KPI. SEO scoring tracks SERP position and organic clicks. Geo scoring tracks AI citation rate, measured by sampling ChatGPT, Claude, Perplexity, Gemini and AI Overviews on target queries. Different dashboards (Search Console vs ScoreGeo).
Calculating your Geo Score
Each of the 13 criteria returns a normalized sub-score between 0 and 1, multiplied by its weight; the sum gives the global rating out of 100. Each criterion also gets a severity tier derived from its sub-score (≥ 0.85 passed, ≥ 0.40 to improve, < 0.40 failed). Global tiers: 80+ Excellent, 60-79 Good, 40-59 Average, < 40 At risk.
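The arithmetic can be sketched in a few lines of Python. The weights and tier boundaries are the ones listed in this guide. The dictionary keys and function names are hypothetical labels, and treating a missing sub-score as 0 is an assumption.

```python
# Weights from the 13-criteria list; they sum to 100.
WEIGHTS = {
    "server_rendering": 15,
    "answer_first": 13,
    "brand_mentions": 11,
    "freshness": 10,
    "entity_richness": 9,
    "ai_cited_sources": 9,
    "crawler_access": 8,
    "eeat": 7,
    "semantic_html": 5,
    "json_ld": 4,
    "seo_metadata": 4,
    "original_data": 4,
    "llms_txt": 1,
}


def geo_score(sub_scores: dict[str, float]) -> float:
    """Weighted sum of normalized sub-scores (each between 0 and 1).
    A criterion missing from sub_scores counts as 0 in this sketch."""
    return sum(weight * sub_scores.get(name, 0.0) for name, weight in WEIGHTS.items())


def severity(sub_score: float) -> str:
    """Per-criterion tier from the normalized sub-score."""
    if sub_score >= 0.85:
        return "passed"
    if sub_score >= 0.40:
        return "to improve"
    return "failed"


def global_tier(score: float) -> str:
    """Global tier from the rating out of 100."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Average"
    return "At risk"
```

As a worked example, a site that is perfect on every criterion except server rendering (sub-score 0) would total 100 - 15 = 85, although in practice a failed render usually drags other on-page sub-scores down too.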
Two takeaways. The per-criterion breakdown matters more than the aggregate — two sites at 62 can have opposite action plans. And the criteria aren't independent: fixing server rendering often unlocks gains on semantic HTML and answer-first because the markup becomes visible to the scorer.
ScoreGeo runs this exact methodology on any URL in about 6 seconds, free. The analysis returns the global score, the 13 sub-scores, the severity tiers and the top 3 prioritized fixes. The full methodology is documented openly at scoregeo.ai/methodology — no black box. For background on why GEO matters beyond the score, the what-is-geo primer covers the underlying mechanics.
Frequently asked questions
What is geo scoring?
Geo scoring is the practice of measuring how well a website is positioned to be cited by generative AI engines (ChatGPT, Claude, Perplexity, Gemini, AI Overviews) and expressing it as a single rating, typically out of 100. The score aggregates weighted technical, editorial and off-page criteria calibrated on empirical AI-citation studies.
What is a technical geo score?
A technical geo score is the subset of a geo audit that focuses on machine-readability criteria: server rendering, semantic HTML, JSON-LD, robots.txt access for AI crawlers, llms.txt and metadata. In the ScoreGeo methodology, the technical block represents 37 of the 100 points (15 + 5 + 4 + 8 + 1 + 4). It is typically fixable in days to weeks because the work is purely on-site.
How is a geo score calculated?
Each of the 13 criteria returns a normalized sub-score between 0 and 1, multiplied by its weight; the sum gives the global rating out of 100. Weights are calibrated on empirical studies (Ahrefs 75k brands, Princeton KDD 2024, Vercel + MERJ 500M fetches, Semrush 150k citations). The full breakdown is public at scoregeo.ai/methodology.
Is there a free geo score calculator?
Yes. ScoreGeo runs the 13-criteria audit on any public URL for free and returns the global score, the per-criterion breakdown and the prioritized fixes in about 6 seconds. No account required.
What's a good geo score in 2026?
80+ is Excellent, 60-79 Good, 40-59 Average, < 40 At risk. Context matters: a 68 in competitive B2B SaaS often beats a 78 in a low-competition vertical. Above 80, marginal cost rises sharply — most sites gain by capping between 80 and 85 and reinvesting elsewhere.
Why does geo scoring weight JSON-LD so low now?
Because the Ahrefs controlled study (March 2026, 1,885 pages) found no statistically significant lift from adding schema markup on ChatGPT or AI Mode, and a small negative effect on AI Overviews. The historical 3× correlation was selection bias. The methodology was recalibrated from 14 to 4 points in May 2026.