What is a GEO checklist?

A GEO checklist is a structured list of the signals and configuration items that determine whether AI answer engines like ChatGPT, Perplexity, Claude, and Google AI Overviews can find, read, and cite your website. It covers technical crawlability (robots.txt, sitemap, rendering), structured data (JSON-LD schema), content signals (title clarity, FAQ sections, original data), and entity authority (brand consistency across the web).

What are the most important GEO checklist items?

The highest-impact GEO checklist items are: (1) Allow GPTBot, PerplexityBot, ClaudeBot, and Google-Extended in robots.txt. (2) Add Sitemap: directive to robots.txt. (3) Verify AI crawlers receive 600+ words of raw HTML — run curl -A 'GPTBot' on your homepage. (4) Add FAQPage and Organization JSON-LD schema in the HTML head. (5) Deploy a static crawler profile page with full schema markup. (6) Publish original data and statistics. (7) Ensure consistent brand name and description across all web surfaces.

How do I check if my site passes the GEO crawlability test?

Run this command: curl -A 'GPTBot' https://yourdomain.com. The HTML returned approximates what GPTBot reads: no JavaScript is executed, although the real crawler sends a longer user-agent string. Count the words. Under 600 means you have a partial render problem. Also visit https://yourdomain.com/robots.txt and check for explicit User-agent: GPTBot / Allow: / stanzas. Then check that https://yourdomain.com/sitemap.xml exists and is referenced via a Sitemap: directive in robots.txt.

What is the difference between the AI content score and crawl score in GEO?

The AI content score measures what AI crawlers actually read and understand — title clarity, meta description, hero text, paragraph density, heading structure, original statistics, FAQ headings, author signals, and OG completeness. The crawl score measures technical accessibility — schema placement, CSR severity, robots.txt permissions, sitemap, llms.txt, Common Crawl indexing, canonical URL, and hreflang. Both scores must be high for strong AI visibility. A high content score with a low crawl score means AI has good content to read but can't find or access your site.

How often should I run a GEO audit?

Run a full GEO audit whenever you make significant changes to your homepage, navigation structure, meta tags, or robots.txt. For ongoing monitoring, Brandioz's Track Changes feature sends score updates by email when your GEO scores change. For retrieval-first platforms like Perplexity, freshness matters — aim to publish or update content at least monthly to stay within the 30-day freshness window that maximizes citation rates.

GEO Checklist 2026 — Complete AI Visibility Checklist

How this checklist is structured

Brandioz measures AI visibility across two independent scores. This checklist covers both — the 12 AI content signals and the 15 crawl signals — plus entity authority and platform-specific items that don't fit neatly into either score but directly affect citation rates.

AI Content Score

12 signals

What AI crawlers actually read and understand. Title, meta, hero, paragraph density, headings, original data, FAQ, author, freshness, tables, OG tags.

Crawl Score

15 signals

Whether AI crawlers can find, access, and parse your site. Schema, CSR severity, robots.txt, sitemap, llms.txt, Common Crawl, canonical, hreflang.

01 Crawlability 6 items

High impact

Allow all major AI crawlers in robots.txt

GPTBot, PerplexityBot, ClaudeBot, Google-Extended, and CCBot should each have an explicit User-agent stanza with Allow: /. A blanket User-agent: * / Disallow: / rule left over from staging blocks every crawler that has no stanza of its own.

Test: visit yourdomain.com/robots.txt and look for each crawler by name.
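The manual check above can also be scripted. The following is a minimal sketch using Python's standard urllib.robotparser; the function name crawler_access and the sample robots.txt are hypothetical, and the standard-library matching rules may differ in detail from how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the checklist item above.
AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended", "CCBot"]


def crawler_access(robots_txt: str, url: str = "https://yourdomain.com/") -> dict:
    """Return {crawler_name: bool} for whether each crawler may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}


# Hypothetical robots.txt: a leftover staging rule plus one explicit stanza.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

access = crawler_access(SAMPLE_ROBOTS)
for bot, allowed in access.items():
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

With this sample, only GPTBot is reported as allowed, because the other crawlers fall through to the blanket Disallow rule.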

High impact

Add Sitemap: directive to robots.txt

Add Sitemap: https://yourdomain.com/sitemap.xml to robots.txt, conventionally as the last line (the directive is valid anywhere in the file). AI crawlers read this to auto-discover your sitemap without relying on Google Search Console submission.

Test: open robots.txt and check the last few lines for a Sitemap: entry.
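As a rough sketch of the same check in code, the standard urllib.robotparser module exposes declared sitemaps through site_maps() (Python 3.8 or later). The helper name sitemap_urls and the sample file are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt: str) -> list:
    """Return every URL declared with a Sitemap: directive (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


# Hypothetical robots.txt content.
SAMPLE_ROBOTS = """\
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""

found = sitemap_urls(SAMPLE_ROBOTS)
print(found if found else "No Sitemap: directive found")
```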

High impact

Verify 600+ words of raw HTML on homepage

The partial render problem — AI crawlers receiving fewer than 600 words from a JS-rendered site — is the single most common GEO failure. Sites built on React, Vue, or Angular without SSR commonly fail this.

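Test: curl -A "GPTBot" https://yourdomain.com | wc -w. Under 600 means a fix is needed. Because wc -w counts markup tokens as well as visible text, treat the result as an upper bound.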
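For a count closer to what a reader would see, the text can be separated from the markup first. This is a minimal sketch with Python's standard html.parser; the class and function names and the sample page are hypothetical, and skipping only script and style content is an assumption about what should count as readable text.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects words found outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.words.extend(data.split())


def visible_word_count(html: str) -> int:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)


# Hypothetical client-side rendered page: almost no text in the raw HTML.
SAMPLE_HTML = """<html><head><title>Acme GEO platform</title>
<script>window.__APP__ = {"lots": "of bundled state"};</script></head>
<body><div id="root"></div><p>Loading your dashboard</p></body></html>"""

count = visible_word_count(SAMPLE_HTML)
print(f"{count} visible words (threshold from the checklist: 600)")
```

To audit a live page, pass in the HTML saved from the curl command above rather than the sample string.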

High impact

Deploy a static crawler profile page

A static HTML page at /crawler-profile.html with full JSON-LD schema — noindex so it doesn't appear in Google, but every AI crawler that finds it via your sitemap gets a complete, JS-free picture of your brand. Solves the partial render problem for brands that can't immediately implement SSR.

Add the file to /public/ in Next.js or your web root. Add its URL to sitemap.xml with a recent lastmod date.

Medium impact

Sitemap.xml exists and is complete

All key pages — homepage, blog posts, about, solutions, methodology — should be in sitemap.xml with accurate lastmod dates. Retrieval-first platforms like Perplexity use lastmod as a freshness signal.

Test: visit yourdomain.com/sitemap.xml and verify your key pages are listed.
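A short script can list each sitemap entry and flag missing or old lastmod values. The sketch below assumes a standard sitemaps.org urlset; the function names, the sample XML, and the use of the 30-day freshness window as the default cutoff are illustrative choices.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_entries(xml_text: str) -> list:
    """Return (loc, lastmod) pairs; lastmod is None when the element is absent."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


def stale_urls(entries: list, today: date, max_age_days: int = 30) -> list:
    """URLs whose lastmod is missing or older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for loc, lastmod in entries:
        # lastmod may be a date or a full timestamp; compare on the date part.
        if lastmod is None or date.fromisoformat(lastmod[:10]) < cutoff:
            stale.append(loc)
    return stale


# Hypothetical sitemap content.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourdomain.com/</loc><lastmod>2026-01-20</lastmod></url>
  <url><loc>https://yourdomain.com/about</loc><lastmod>2025-06-01T09:00:00+00:00</lastmod></url>
  <url><loc>https://yourdomain.com/methodology</loc></url>
</urlset>
"""

entries = sitemap_entries(SAMPLE_SITEMAP)
stale = stale_urls(entries, today=date(2026, 2, 1))
print("Stale or undated:", stale)
```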

Medium impact

Verify Common Crawl indexing (CCBot allowed)

Common Crawl feeds training data to OpenAI, Anthropic, and multiple other AI companies simultaneously. Allowing CCBot is one of the highest-leverage moves for long-term parametric AI citation rates.

Add User-agent: CCBot / Allow: / to robots.txt, then query the Common Crawl index API for your domain to confirm it has been captured.

02 Structured Data (Schema) 5 items

High impact

FAQPage schema in HTML head

FAQPage JSON-LD is the highest-citation-rate schema type. AI engines extract question-answer pairs verbatim for direct answers. Must be in <head> — body-only schema is often ignored by AI crawlers. Include at minimum 3–5 question-answer pairs covering what you are, who you serve, and how you work.

Test: curl https://yourdomain.com | grep -o 'FAQPage' — should return a hit.
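To illustrate the shape of the markup, here is a minimal sketch that builds a FAQPage JSON-LD script tag from question-answer pairs. The property names follow schema.org's FAQPage, Question, and Answer types; the function name, the sample pairs, and the Acme brand are placeholders.

```python
import json


def faq_jsonld(pairs: list) -> str:
    """Build a <script> tag containing FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    # Escape "</" so answer text can never close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder content covering what you are, who you serve, and how you work.
SAMPLE_PAIRS = [
    ("What is Acme?", "Acme is a hypothetical example company."),
    ("Who does Acme serve?", "Placeholder answer describing the audience."),
    ("How does Acme work?", "Placeholder answer describing the process."),
]

snippet = faq_jsonld(SAMPLE_PAIRS)
print(snippet)
```

The printed tag is meant to be pasted into the page's head, in line with the placement advice above.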

High impact

Organization schema in HTML head

Organization JSON-LD gives AI crawlers a machine-readable identity anchor — name, url, description, and optionally logo, foundingDate, and knowsAbout. This is the primary entity signal that feeds parametric-first AI platforms. Must be in <head>.

Test: curl https://yourdomain.com | grep -o 'Organization' confirms the schema is present. Then view the page source to verify it sits in the head section, not the body.
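Since grep alone cannot tell where a match sits, a small parser can report which schema types appear in the head and which in the body. This is a sketch under simplifying assumptions: the page has an explicit head element, and the class name, function name, and sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdLocator(HTMLParser):
    """Records the @type of each JSON-LD block, grouped by head or body."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_jsonld = False
        self.buffer = []
        self.found = {"head": [], "body": []}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "script" and self.in_jsonld:
            self.in_jsonld = False
            try:
                doc = json.loads("".join(self.buffer))
            except json.JSONDecodeError:
                return  # malformed JSON-LD is ignored in this sketch
            if isinstance(doc, dict):
                items = doc.get("@graph", [doc])
            else:
                items = doc if isinstance(doc, list) else []
            where = "head" if self.in_head else "body"
            self.found[where] += [i.get("@type") for i in items if isinstance(i, dict)]


def locate_jsonld(html: str) -> dict:
    parser = JsonLdLocator()
    parser.feed(html)
    parser.close()
    return parser.found


# Hypothetical page with one schema block in each location.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Organization", "name": "Acme"}</script>
</head><body>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": []}</script>
</body></html>"""

locations = locate_jsonld(SAMPLE_PAGE)
print(locations)
```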

Medium impact

SoftwareApplication or Product schema (if applicable)

For SaaS platforms, tools, and apps — SoftwareApplication schema with applicationCategory, featureList, and offers helps AI systems categorize and describe your product accurately. Syncs with Organization schema to give AI a complete picture.

Add only if your site is a software tool, app, or product. Category-appropriate schema outperforms generic WebSite schema for AI extraction.

Medium impact

WebPage schema with speakable selector

WebPage JSON-LD with a speakable cssSelector pointing to your key content sections tells AI systems exactly which parts of your page are most citable. Target your H1, introductory summary, and FAQ answers.

Add "speakable": { "@type": "SpeakableSpecification", "cssSelector": [".your-summary", "h1", ".faq-answers"] } to your WebPage schema.

Medium impact

BreadcrumbList schema on key pages

BreadcrumbList schema gives AI crawlers structural context — where this page sits in your site hierarchy. Particularly useful for blog posts and nested pages. Aids extraction confidence.

Add on all blog posts and deep pages: position 1 = homepage, position 2 = section, position 3 = current page.
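The position numbering described above could be generated like this. It is a minimal sketch: the property names follow schema.org's BreadcrumbList and ListItem types, while the function name and the sample trail are hypothetical.

```python
import json


def breadcrumb_jsonld(trail: list) -> dict:
    """trail is an ordered list of (name, url) pairs from homepage to current page."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


# Hypothetical trail for a blog post.
SAMPLE_TRAIL = [
    ("Home", "https://yourdomain.com/"),
    ("Blog", "https://yourdomain.com/blog"),
    ("GEO Checklist", "https://yourdomain.com/blog/geo-checklist"),
]

breadcrumbs = breadcrumb_jsonld(SAMPLE_TRAIL)
print(json.dumps(breadcrumbs, indent=2))
```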

03 Content Signals 8 items

High impact

Title tag: 6+ descriptive words

One of the most heavily weighted AI content signals (10/100). "Welcome to Acme" scores near zero. "AI-powered GEO platform for website AI visibility" scores full marks. The title is the first thing AI reads and anchors everything else.

Count the words. If under 6, rewrite to include what you do, who for, and the key differentiator.

High impact

Meta description: 12+ specific words

For JS-heavy sites, the meta description may be the only text AI reliably reads — it must work standalone. Write it as a one-sentence company profile: who you are, what you do, who you serve, what problem you solve. Avoid vague superlatives.

Test: read your meta description to a stranger with no context. Can they explain what you do in one sentence? If not, rewrite.
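The word-count thresholds for the title (6+) and meta description (12+) are easy to check mechanically, though a count says nothing about how descriptive the words are. The sketch below uses Python's standard html.parser; the class name, function name, and sample markup are assumptions.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Captures the <title> text and the meta description content."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def head_signals(html: str) -> dict:
    parser = HeadSignals()
    parser.feed(html)
    parser.close()
    title_words = len(parser.title.split())
    description_words = len(parser.description.split())
    return {
        "title_words": title_words,
        "title_ok": title_words >= 6,  # threshold from the checklist
        "description_words": description_words,
        "description_ok": description_words >= 12,  # threshold from the checklist
    }


# Hypothetical head: a descriptive title but a thin meta description.
SAMPLE_HEAD = """<head>
<title>AI-powered GEO platform for website AI visibility</title>
<meta name="description" content="We build great software.">
</head>"""

signals = head_signals(SAMPLE_HEAD)
print(signals)
```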

High impact

Hero section: 40+ words in static HTML

Hero text is the highest-weighted content signal (12/100). If your hero is JS-rendered, AI sees zero here. Test with curl -A "GPTBot" to verify hero text appears in raw HTML. Aim for 40+ words covering what you do, who it's for, and a concrete value claim.

If hero is JS-rendered: implement SSR, or ensure the meta description and title carry the full message alone.

High impact

Original statistics: 5+ quantified claims

Original data is the single highest-leverage citation signal across all AI platforms. Publish numbers nobody else has — your own benchmarks, analysis, or research. Even small-scale data ("we analyzed 50 sites and found X") gets cited because AI can't get it elsewhere.

Count the specific numbers on your page (percentages, counts, measurements). Under 5 means add more data-backed claims.
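Counting numbers by hand gets tedious on long pages. A crude approximation is a regular expression that matches integers, decimals, and percentages. The pattern and function name below are assumptions, and the pattern will also count years, prices, and version numbers, so treat the result as a starting point for a manual review.

```python
import re

# Matches 50, 1,240, 3.5, and 38% (a deliberately simple, hypothetical pattern).
NUMBER_PATTERN = re.compile(r"\b\d+(?:,\d{3})*(?:\.\d+)?%?")


def quantified_claims(text: str) -> list:
    """Return every number-like token found in the text."""
    return NUMBER_PATTERN.findall(text)


# Hypothetical page copy.
SAMPLE_TEXT = (
    "We analyzed 50 sites and found that 38% lacked schema; "
    "the median page had 1,240 words."
)

numbers = quantified_claims(SAMPLE_TEXT)
meets_threshold = len(numbers) >= 5  # threshold from the checklist
print(numbers, "meets threshold:", meets_threshold)
```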

Medium impact

FAQ headings: 3+ question-format H2/H3s

Question-format headings (H2/H3s that start with "What", "How", "Why", "Does", "Can") map directly to how AI synthesizes answers. Pages with 3+ FAQ headings get substantially more AI extraction attempts than equivalent pages with statement headings.

Audit your page headings. Convert at least 3 statement headings to question format on key pages.

Medium impact

Heading hierarchy: H1 → H2 → H3

Clear heading hierarchy (one H1, multiple H2s, H3s under H2s) signals organized, authoritative content. Brandioz measures H2 count per 100 words of paragraph content — sweet spot is 1–4 H2s per 100 words.

Check for multiple H1s (common on JS sites), skipped heading levels, and heading count ratio.
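Both heading checks, question-format H2/H3s and hierarchy problems, can be sketched in one pass over the HTML. The names below are hypothetical, the question-word list comes from the FAQ headings item, and the prefix match is intentionally naive (it would also accept a heading starting with "Whatever").

```python
from html.parser import HTMLParser

QUESTION_WORDS = ("what", "how", "why", "does", "can")
HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class Headings(HTMLParser):
    """Collects (level, text) for every heading in document order."""

    def __init__(self):
        super().__init__()
        self.current = None  # level of the heading being read, or None
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.current = int(tag[1])
            self.headings.append((self.current, ""))

    def handle_endtag(self, tag):
        if self.current and tag == f"h{self.current}":
            self.current = None

    def handle_data(self, data):
        if self.current:
            level, text = self.headings[-1]
            self.headings[-1] = (level, text + data)


def audit_headings(html: str) -> dict:
    parser = Headings()
    parser.feed(html)
    parser.close()
    levels = [level for level, _ in parser.headings]
    return {
        "h1_count": levels.count(1),
        # True when a heading jumps down more than one level, e.g. H2 to H4.
        "skipped_levels": any(b - a > 1 for a, b in zip(levels, levels[1:])),
        "question_headings": sum(
            1
            for level, text in parser.headings
            if level in (2, 3) and text.strip().lower().startswith(QUESTION_WORDS)
        ),
    }


# Hypothetical page outline.
SAMPLE_OUTLINE = """<h1>GEO Checklist</h1>
<h2>What is GEO?</h2><h3>How does it work?</h3>
<h2>Pricing</h2><h4>Why us?</h4>"""

report = audit_headings(SAMPLE_OUTLINE)
print(report)
```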

Medium impact

Author and freshness signals

Author signals (bylines, author schema, author page links) and freshness signals (<time> tags, article dates) each contribute 4/100 to the AI content score. Both are binary checks — present or not. Freshness is particularly important for retrieval-first platforms.

Add a byline and <time datetime="YYYY-MM-DD"> to all blog posts. Add author schema to your site.

Medium impact

OG tags: complete set

og:title, og:description, og:type, og:url, og:image — all five. OG completeness contributes 10/100 to the AI content score. These are static HTML tags parsed regardless of JS rendering status, making them particularly important for client-side rendered sites.

Check with a social preview tool or curl and look for og: meta tags in the raw HTML head.
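As an alternative to eyeballing the raw HTML, the five tags can be checked programmatically. A minimal sketch, with hypothetical class and function names and sample markup:

```python
from html.parser import HTMLParser

# The five tags named in the checklist item above.
REQUIRED_OG = ("og:title", "og:description", "og:type", "og:url", "og:image")


class OgTags(HTMLParser):
    """Collects og:* meta tags that have a non-empty content attribute."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:") and attrs.get("content"):
            self.tags[prop] = attrs["content"]


def missing_og(html: str) -> list:
    """Return the required OG tags that are absent or empty."""
    parser = OgTags()
    parser.feed(html)
    parser.close()
    return [name for name in REQUIRED_OG if name not in parser.tags]


# Hypothetical head with three of the five tags.
SAMPLE_META = """<head>
<meta property="og:title" content="Acme GEO platform">
<meta property="og:type" content="website">
<meta property="og:url" content="https://yourdomain.com/">
</head>"""

missing = missing_og(SAMPLE_META)
print("Missing OG tags:", missing)
```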

04 Discovery & Navigation 3 items

Medium impact

llms.txt at domain root

A plain text file at /llms.txt listing your most important pages for AI navigation. Tells AI systems where to start rather than making them explore hundreds of pages. Low effort, low risk, increasingly adopted. Think of it as a sitemap for AI systems rather than search engines.

Create /public/llms.txt with sections for core product pages, blog, reference pages, and key facts about your brand.
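One way to produce the file is to generate it from a small data structure. The layout below (a title line, a one-line summary, then sections of links) follows the publicly proposed llms.txt convention as best understood here. The function name, the section names, and all URLs are placeholders.

```python
def build_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Render a simple llms.txt: title, summary, then one link list per section."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines += [f"- [{title}]({url})" for title, url in links]
        lines.append("")
    return "\n".join(lines)


# Placeholder sections mirroring the ones suggested above.
SAMPLE_SECTIONS = {
    "Core product pages": [("Home", "https://yourdomain.com/")],
    "Blog": [("GEO Checklist", "https://yourdomain.com/blog/geo-checklist")],
    "Reference pages": [("Methodology", "https://yourdomain.com/methodology")],
}

llms_txt = build_llms_txt(
    "Acme", "Hypothetical one-sentence description of the brand.", SAMPLE_SECTIONS
)
print(llms_txt)
```

Writing the returned string to /public/llms.txt (or your web root) completes the step described above.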

Lower impact

Canonical URL on all pages

Canonical tags prevent AI crawlers from indexing duplicate content (www vs non-www, trailing slash variants, parameter URLs). Contributes to crawl score and prevents dilution of AI citation signals across URL variants.

Verify <link rel="canonical" href="https://yourdomain.com/page"> appears in the head of all indexable pages.

Lower impact

Hreflang for multilingual sites

If your site targets multiple languages or regions, hreflang attributes help AI crawlers understand which version of a page to cite for which audience. Contributes 3/100 to the crawl score.

Only relevant for multilingual sites. Skip if you serve one language/region.

05 Entity Authority 5 items

High impact (parametric AI)

Consistent brand name and description everywhere

Your brand name, category description, and key claims must be identical across your website, social profiles, press mentions, and third-party directories. Inconsistency creates uncertainty in AI systems — uncertain sources get skipped. This is the foundational entity authority requirement.

Audit your LinkedIn About, Twitter/X bio, Crunchbase description, and website meta description for consistency.

High impact (parametric AI)

Third-party brand mentions on authoritative sites

Being discussed on Reddit, industry blogs, press outlets, and authoritative directories builds the training data presence that feeds parametric-first AI citation rates. This is what ChatGPT uses when web browsing is off — it cites from memory, and memory comes from training data.

Track brand mentions. Pursue press coverage, community discussions, and directory listings. Each mention in training data is a future parametric citation.

Medium impact

Wikipedia presence (if applicable)

Wikipedia accounts for 4.8% of citations on parametric-first platforms, higher than most domains. If your brand is notable enough for a Wikipedia article, it is one of the strongest single entity authority signals available.

Check if a Wikipedia article exists. If not and you meet notability guidelines, create one with accurate, sourced content.

Medium impact

Publishing cadence for retrieval-first platforms

Retrieval-first platforms like Perplexity heavily weight freshness. Content updated within 30 days gets substantially more citations than older content. A publishing cadence (weekly or bi-weekly) consistently outperforms a static site even if the static content is higher quality.

Aim for at minimum one new or substantially updated piece of content per month. Update lastmod dates in sitemap.xml when content changes.

Medium impact

Answer-first content formatting

Content that leads with the direct answer — before context, background, or caveats — gets cited significantly more often. This is contrary to traditional editorial structure but essential for GEO. The opening paragraph of every page and post should directly answer the question the title implies.

Audit your blog posts. Does the first paragraph answer the headline question directly? If not, restructure to answer-first.

Quick test commands

Run these before and after making GEO changes to measure impact:

Check what GPTBot reads

curl -A "GPTBot" https://yourdomain.com | wc -w # Should return 600+. Under 600 = partial render problem.

Check for JSON-LD schema

curl https://yourdomain.com | grep -o 'application/ld+json' # Should return at least 2 hits (Organization + FAQPage minimum)

Check robots.txt crawler permissions

curl https://yourdomain.com/robots.txt | grep -E "GPTBot|PerplexityBot|ClaudeBot" # Should return User-agent lines for each crawler

Check llms.txt

curl https://yourdomain.com/llms.txt # Should return your structured navigation file

Scoring your checklist

Items passing	Estimated AI readability	Priority action
0–5 items	0–30 / 100	Start with crawlability — robots.txt, sitemap, partial render fix
6–12 items	30–55 / 100	Add FAQPage + Organization schema, fix title and meta description
13–19 items	55–75 / 100	Add original statistics, improve FAQ headings, deploy llms.txt
20–27 items	75–100 / 100	Focus on entity authority and publishing cadence
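The table can be expressed as a simple lookup. This sketch copies the bands and priority actions from the rows above; the function name and the tuple layout are illustrative, and the score ranges remain estimates.

```python
# (max items passing, (estimated score low, high), priority action), from the table.
BANDS = [
    (5, (0, 30), "Start with crawlability: robots.txt, sitemap, partial render fix"),
    (12, (30, 55), "Add FAQPage + Organization schema, fix title and meta description"),
    (19, (55, 75), "Add original statistics, improve FAQ headings, deploy llms.txt"),
    (27, (75, 100), "Focus on entity authority and publishing cadence"),
]


def priority_for(items_passing: int) -> tuple:
    """Return ((low, high), action) for the number of checklist items passing."""
    if not 0 <= items_passing <= 27:
        raise ValueError("items_passing must be between 0 and 27")
    for upper, score_range, action in BANDS:
        if items_passing <= upper:
            return score_range, action


score_range, action = priority_for(14)
print(f"Estimated AI readability {score_range[0]} to {score_range[1]} / 100: {action}")
```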

Frequently asked questions

How often should I run a GEO audit?

After any significant change to your homepage, meta tags, robots.txt, or site structure. For ongoing monitoring, use Brandioz's Track Changes feature, which sends email alerts when your AI scores change meaningfully. For Perplexity specifically, check freshness monthly.

Which checklist item has the biggest immediate impact?

For most sites: verifying that AI crawlers are allowed in robots.txt and that your homepage returns 600+ words of raw HTML. These two items fix the most common GEO failure mode: being technically blocked or invisible to AI crawlers despite having excellent content.

Can I automate this checklist?

Yes. Brandioz runs all 27 signal checks automatically and returns a scored breakdown with specific fixes ranked by impact. Run it at brandioz.com/dashboard. It takes 10 seconds, and a free tier is available.

Does fixing GEO hurt Google rankings?

No. GEO fixes are either neutral for Google (llms.txt, crawler profile pages) or actively beneficial for both (clear headings, FAQ sections, original data, fast server responses, complete meta tags). The only potential conflict is client-side rendering, but the GEO fix (SSR or static pages) is also a Google PageSpeed improvement.