
GEO Checklist 2026 — Complete AI Visibility Checklist

27 signals across 5 categories. Check each one to know exactly where your AI visibility gaps are and what to fix first. Updated June 2026.

How this checklist is structured

Brandioz measures AI visibility across two independent scores. This checklist covers both — the 12 AI content signals and the 15 crawl signals — plus entity authority and platform-specific items that don't fit neatly into either score but directly affect citation rates.

AI Content Score
12 signals
What AI crawlers actually read and understand. Title, meta, hero, paragraph density, headings, original data, FAQ, author, freshness, tables, OG tags.
Crawl Score
15 signals
Whether AI crawlers can find, access, and parse your site. Schema, CSR severity, robots.txt, sitemap, llms.txt, Common Crawl, canonical, hreflang.
01 Crawlability 6 items
High impact
Allow all major AI crawlers in robots.txt
GPTBot, PerplexityBot, ClaudeBot, Google-Extended, and CCBot should each have an explicit User-agent stanza with Allow: /. A blanket User-agent: * plus Disallow: / rule left over from staging blocks every crawler that lacks its own stanza.
Test: visit yourdomain.com/robots.txt and look for each crawler by name.
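To run this check offline against a robots.txt you have already downloaded, Python's standard-library urllib.robotparser applies a simplified version of the group-matching rules crawlers use. It evaluates rules in file order (first match wins) instead of longest match, so treat the result as an approximation. The helper name blocked_ai_crawlers is hypothetical.

```python
# Sketch: blocked_ai_crawlers is a hypothetical helper name.
from urllib.robotparser import RobotFileParser

# Crawler tokens named in this checklist.
AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended", "CCBot"]


def blocked_ai_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the crawler tokens that robots_txt does not allow to fetch `path`.

    The stdlib parser applies rules in file order (first match wins), which
    simplifies the longest-match precedence some crawlers use.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, path)]


# A staging leftover: only GPTBot has its own stanza, so it escapes the blanket rule.
staging_leftover = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""
print(blocked_ai_crawlers(staging_leftover))
# ['PerplexityBot', 'ClaudeBot', 'Google-Extended', 'CCBot']
```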
High impact
Add Sitemap: directive to robots.txt
Add Sitemap: https://yourdomain.com/sitemap.xml as the last line of robots.txt. AI crawlers read this to auto-discover your sitemap without relying on Google Search Console submission.
Test: open robots.txt and check the last few lines for a Sitemap: entry.
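The same standard-library parser exposes Sitemap: lines through site_maps() (Python 3.8+), which allows a quick offline check. This is a minimal sketch with a hypothetical helper name. It confirms only that the directive exists and uses a full URL, as the sitemaps.org protocol expects.

```python
# Sketch: sitemap_directives is a hypothetical helper name.
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def sitemap_directives(robots_txt: str) -> list[str]:
    """Return absolute sitemap URLs declared in robots_txt (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps() or []  # site_maps() returns None when absent
    # The protocol expects a full URL, so drop relative entries like "/sitemap.xml".
    return [url for url in declared if urlparse(url).scheme in ("http", "https")]


robots = "User-agent: *\nAllow: /\n\nSitemap: https://yourdomain.com/sitemap.xml\n"
print(sitemap_directives(robots))  # ['https://yourdomain.com/sitemap.xml']
```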
High impact
Verify 600+ words of raw HTML on homepage
The partial render problem — AI crawlers receiving fewer than 600 words from a JS-rendered site — is the single most common GEO failure. Sites built on React, Vue, or Angular without SSR commonly fail this.
Test: curl -sL -A "GPTBot" https://yourdomain.com | wc -w. Under 600 means a fix is needed.
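wc -w counts HTML tags and inline script tokens along with readable text, so it usually overstates what a crawler gets. A closer approximation counts only text outside script, style, and template elements. This is a minimal sketch using the standard-library HTMLParser. fetch_as and visible_word_count are hypothetical names. Sending "GPTBot" as the User-Agent only imitates the bot's name, not its full header string or IP range.

```python
# Sketch: VisibleWordCounter, visible_word_count and fetch_as are hypothetical names.
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class VisibleWordCounter(HTMLParser):
    """Counts whitespace-separated words outside script/style/template elements."""

    SKIP = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.words += len(data.split())


def visible_word_count(html: str) -> int:
    counter = VisibleWordCounter()
    counter.feed(html)
    counter.close()
    return counter.words


def fetch_as(url: str, user_agent: str = "GPTBot", timeout: float = 15) -> str:
    """Fetch raw HTML without running JavaScript; urlopen follows redirects."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (network): visible_word_count(fetch_as("https://yourdomain.com")) >= 600
spa_shell = '<div id="root"></div><script>window.__APP__ = {"copy": "not counted"}</script>'
print(visible_word_count(spa_shell))  # 0
```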
High impact
Deploy a static crawler profile page
A static HTML page at /crawler-profile.html with full JSON-LD schema, marked noindex so it stays out of Google's results. Every AI crawler that finds it via your sitemap still gets a complete, JS-free picture of your brand. This solves the partial render problem for brands that can't implement SSR right away.
Add the file to /public/ in Next.js or your web root. Add its URL to sitemap.xml with a recent lastmod date.
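Here is a minimal sketch of generating such a page, assuming you keep your Organization data as a Python dict. The function name, page wording, and layout are hypothetical. What matters is the robots noindex meta tag, a JSON-LD script inside head, and plain copy that needs no JavaScript.

```python
# Sketch: crawler_profile_html and the page layout are hypothetical.
import json
from html import escape


def crawler_profile_html(org: dict, summary: str) -> str:
    """Build a static, JS-free profile page with Organization JSON-LD in <head>."""
    # "<\/" is a legal JSON escape and keeps "</script>" inside a value from closing the tag.
    json_ld = json.dumps(org, indent=2, ensure_ascii=False).replace("</", "<\\/")
    name = escape(org["name"])
    return f"""<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="robots" content="noindex">
<title>{name}: company profile</title>
<script type="application/ld+json">
{json_ld}
</script>
</head>
<body>
<h1>{name}</h1>
<p>{escape(summary)}</p>
</body>
</html>
"""


org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Acme",
    "url": "https://yourdomain.com",
    "description": "Acme is an AI visibility platform for B2B marketing teams.",
}
page = crawler_profile_html(org, org["description"])
# Write `page` to public/crawler-profile.html (Next.js) or your web root.
```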
Medium impact
Sitemap.xml exists and is complete
All key pages — homepage, blog posts, about, solutions, methodology — should be in sitemap.xml with accurate lastmod dates. Retrieval-first platforms like Perplexity use lastmod as a freshness signal.
Test: visit yourdomain.com/sitemap.xml and verify your key pages are listed.
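To check coverage programmatically, parse the sitemap with the standard-library ElementTree and compare it against the pages you expect. This is a minimal sketch that handles a single urlset file, not sitemap index files. The function names and the required-page list are hypothetical.

```python
# Sketch: sitemap_entries, audit_sitemap and the page list are hypothetical.
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_entries(xml_text: str) -> dict:
    """Map each <loc> in a <urlset> sitemap to its <lastmod> text (None if absent)."""
    root = ET.fromstring(xml_text)
    entries = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries[loc] = lastmod.strip() if lastmod else None
    return entries


def audit_sitemap(xml_text: str, required: list) -> dict:
    entries = sitemap_entries(xml_text)
    return {
        "missing": [page for page in required if page not in entries],
        "no_lastmod": [page for page, lastmod in entries.items() if lastmod is None],
    }


sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://yourdomain.com/</loc><lastmod>2026-06-01</lastmod></url>"
    "<url><loc>https://yourdomain.com/blog</loc></url>"
    "</urlset>"
)
print(audit_sitemap(sitemap, ["https://yourdomain.com/", "https://yourdomain.com/about"]))
# {'missing': ['https://yourdomain.com/about'], 'no_lastmod': ['https://yourdomain.com/blog']}
```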
Medium impact
Verify Common Crawl indexing (CCBot allowed)
Common Crawl's open web archive is widely used in the training data of OpenAI, Anthropic, and many other AI companies. Allowing CCBot is one of the highest-leverage moves for long-term parametric AI citation rates.
Add User-agent: CCBot with Allow: / to robots.txt, then check the Common Crawl index API (index.commoncrawl.org) for your domain.
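The Common Crawl index lists one CDX query endpoint per crawl in collinfo.json. This is a minimal sketch that picks the newest crawl and counts captures for your domain. It rests on several assumptions to verify against the live service: each collinfo entry has a "cdx-api" field, the newest crawl is listed first, and the server answers HTTP 404 when a crawl has no captures. The function names are hypothetical.

```python
# Sketch: function names are hypothetical; collinfo ordering and 404 behavior are assumptions.
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

COLLINFO_URL = "https://index.commoncrawl.org/collinfo.json"


def cdx_query_url(cdx_api: str, domain: str, limit: int = 5) -> str:
    """Build a CDX query for captured URLs under `domain`."""
    params = {"url": f"{domain}/*", "output": "json", "limit": limit}
    return f"{cdx_api}?{urlencode(params)}"


def latest_cdx_api(timeout: float = 30) -> str:
    with urlopen(COLLINFO_URL, timeout=timeout) as response:
        collections = json.load(response)
    return collections[0]["cdx-api"]  # assumption: newest crawl is listed first


def capture_count(domain: str, timeout: float = 60) -> int:
    """Captures returned (capped by `limit`) for `domain` in the newest crawl."""
    try:
        with urlopen(cdx_query_url(latest_cdx_api(), domain), timeout=timeout) as response:
            # With output=json the server returns one JSON record per line.
            return sum(1 for line in response if line.strip())
    except HTTPError as err:
        if err.code == 404:  # assumption: 404 means "no captures found"
            return 0
        raise


# Usage (network): capture_count("yourdomain.com") > 0 suggests the newest crawl includes your site.
```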
02 Structured Data (Schema) 5 items
High impact
FAQPage schema in HTML head
FAQPage JSON-LD is the highest-citation-rate schema type. AI engines extract question-answer pairs verbatim for direct answers. Must be in <head> — body-only schema is often ignored by AI crawlers. Include at minimum 3–5 question-answer pairs covering what you are, who you serve, and how you work.
Test: curl -sL https://yourdomain.com | grep -o 'FAQPage' should return at least one hit.
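If your pages are generated from code, building the JSON-LD from question-answer pairs keeps it valid and in sync with the visible FAQ. This is a minimal sketch using schema.org's FAQPage, Question, and Answer types. The function name is hypothetical, and your template still has to place the returned script tag inside head.

```python
# Sketch: faq_page_jsonld is a hypothetical helper name.
import json


def faq_page_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Return a <script> tag holding FAQPage JSON-LD for (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    # Escape "</" so answer text can never close the script element early.
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = faq_page_jsonld([
    ("What is Acme?", "Acme is an AI visibility platform for B2B marketing teams."),
    ("Who is Acme for?", "Marketing and SEO teams at software companies."),
    ("How does Acme work?", "It fetches your site the way AI crawlers do and scores what they can read."),
])
```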
High impact
Organization schema in HTML head
Organization JSON-LD gives AI crawlers a machine-readable identity anchor — name, url, description, and optionally logo, foundingDate, and knowsAbout. This is the primary entity signal that feeds parametric-first AI platforms. Must be in <head>.
Test: curl -sL https://yourdomain.com | grep -o 'Organization' confirms the type is present. grep can't tell head from body, so check view-source to confirm the JSON-LD block sits inside <head>.
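To automate the head-versus-body check, walk the HTML with the standard-library HTMLParser and record where each JSON-LD @type appears. This is a minimal sketch with hypothetical class and function names. Pages that omit an explicit head tag report None as the section.

```python
# Sketch: JsonLdLocator and json_ld_locations are hypothetical names.
import json
from html.parser import HTMLParser


class JsonLdLocator(HTMLParser):
    """Records (section, @type) for every application/ld+json script."""

    def __init__(self) -> None:
        super().__init__()
        self.section = None  # becomes "head" or "body" once those tags are seen
        self.in_json_ld = False
        self.buffer: list[str] = []
        self.found: list[tuple] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self.in_json_ld, self.buffer = True, []

    def handle_data(self, data):
        if self.in_json_ld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "script" or not self.in_json_ld:
            return
        self.in_json_ld = False
        try:
            doc = json.loads("".join(self.buffer))
        except json.JSONDecodeError:
            self.found.append((self.section, "invalid JSON"))
            return
        if isinstance(doc, dict):
            items = doc.get("@graph", [doc])  # unwrap @graph containers
        else:
            items = doc if isinstance(doc, list) else []
        for item in items:
            if isinstance(item, dict):
                self.found.append((self.section, item.get("@type")))


def json_ld_locations(html: str) -> list:
    locator = JsonLdLocator()
    locator.feed(html)
    locator.close()
    return locator.found


page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Organization", "name": "Acme"}'
    '</script></head><body><script type="application/ld+json">{"@type": "FAQPage"}</script></body></html>'
)
print(json_ld_locations(page))  # [('head', 'Organization'), ('body', 'FAQPage')]
```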
Medium impact
SoftwareApplication or Product schema (if applicable)
For SaaS platforms, tools, and apps, SoftwareApplication schema with applicationCategory, featureList, and offers helps AI systems categorize and describe your product accurately. Point its publisher property at your Organization so AI sees company and product as connected entities.
Add only if your site is a software tool, app, or product. Category-appropriate schema outperforms generic WebSite schema for AI extraction.
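Below is a minimal sketch of the object, built as a Python dict so json.dumps can serialize it into a script type="application/ld+json" tag. All values are placeholders. applicationCategory takes text, and "BusinessApplication" is one commonly used value. Linking publisher to your Organization by name is one way to connect the two entities; an @id reference is another.

```python
# Sketch: software_application_jsonld is hypothetical; all values are placeholders.
import json


def software_application_jsonld(name, url, category, features, price, currency, publisher_name):
    """Return SoftwareApplication JSON-LD as a dict."""
    return {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": name,
        "url": url,
        "applicationCategory": category,
        "featureList": features,
        "offers": {"@type": "Offer", "price": price, "priceCurrency": currency},
        "publisher": {"@type": "Organization", "name": publisher_name},
    }


app = software_application_jsonld(
    name="Acme",
    url="https://yourdomain.com",
    category="BusinessApplication",
    features=["AI visibility score", "Crawl audit", "Change alerts"],
    price="0",
    currency="USD",
    publisher_name="Acme",
)
print(json.dumps(app, indent=2))
```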
Medium impact
WebPage schema with speakable selector
WebPage JSON-LD with a speakable cssSelector pointing to your key content sections tells AI systems exactly which parts of your page are most citable. Target your H1, introductory summary, and FAQ answers.
Add "speakable": { "@type": "SpeakableSpecification", "cssSelector": [".your-summary", "h1", ".faq-answers"] } to your WebPage schema.
Medium impact
BreadcrumbList schema on key pages
BreadcrumbList schema gives AI crawlers structural context — where this page sits in your site hierarchy. Particularly useful for blog posts and nested pages. Aids extraction confidence.
Add on all blog posts and deep pages: position 1 = homepage, position 2 = section, position 3 = current page.
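The position rule above maps directly onto schema.org's BreadcrumbList and ListItem types. This is a minimal sketch; the function name and URLs are placeholders.

```python
# Sketch: breadcrumb_jsonld is a hypothetical helper; URLs are placeholders.
import json


def breadcrumb_jsonld(trail: list[tuple[str, str]]) -> dict:
    """trail runs from homepage to current page as (name, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


crumbs = breadcrumb_jsonld([
    ("Home", "https://yourdomain.com/"),
    ("Blog", "https://yourdomain.com/blog"),
    ("GEO Checklist 2026", "https://yourdomain.com/blog/geo-checklist"),
])
print(json.dumps(crumbs, indent=2))
```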
03 Content Signals 8 items
High impact
Title tag: 6+ descriptive words
One of the most heavily weighted AI content signals (10/100). "Welcome to Acme" scores near zero; "AI-powered GEO platform for website AI visibility" scores full marks. The title is the first thing AI reads and anchors everything else.
Count the words. If under 6, rewrite to include what you do, who for, and the key differentiator.
High impact
Meta description: 12+ specific words
For JS-heavy sites, the meta description may be the only text AI reliably reads — it must work standalone. Write it as a one-sentence company profile: who you are, what you do, who you serve, what problem you solve. Avoid vague superlatives.
Test: read your meta description to a stranger with no context. Can they explain what you do in one sentence? If not, rewrite.
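You can read both word counts from the raw HTML in one pass. This is a minimal sketch using the standard-library HTMLParser. The 6- and 12-word thresholds come from this checklist; the class and function names are hypothetical.

```python
# Sketch: TitleMetaReader and title_meta_check are hypothetical names.
from html.parser import HTMLParser


class TitleMetaReader(HTMLParser):
    """Collects <title> text and the meta description from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (a.get("name") or "").lower() == "description":
            self.description = a.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def title_meta_check(html: str, min_title: int = 6, min_meta: int = 12) -> dict:
    reader = TitleMetaReader()
    reader.feed(html)
    reader.close()
    title_words = len(reader.title.split())
    meta_words = len(reader.description.split())
    return {
        "title_words": title_words,
        "meta_words": meta_words,
        "title_ok": title_words >= min_title,
        "meta_ok": meta_words >= min_meta,
    }


head = (
    "<head><title>Welcome to Acme</title>"
    '<meta name="description" content="Acme is a GEO platform that helps B2B marketing '
    'teams measure and fix how AI assistants describe their brand."></head>'
)
print(title_meta_check(head))
# {'title_words': 3, 'meta_words': 19, 'title_ok': False, 'meta_ok': True}
```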
High impact
Hero section: 40+ words in static HTML
Hero text is the highest-weighted content signal (12/100). If your hero is JS-rendered, AI sees zero here. Test with curl -A "GPTBot" to verify hero text appears in raw HTML. Aim for 40+ words covering what you do, who it's for, and a concrete value claim.
If hero is JS-rendered: implement SSR, or ensure the meta description and title carry the full message alone.
High impact
Original statistics: 5+ quantified claims
Original data is the single highest-leverage citation signal across all AI platforms. Publish numbers nobody else has — your own benchmarks, analysis, or research. Even small-scale data ("we analyzed 50 sites and found X") gets cited because AI can't get it elsewhere.
Count the specific numbers on your page (percentages, counts, measurements). Under 5 means add more data-backed claims.
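Counting by hand gets tedious on long pages. Below is a rough heuristic sketch that counts numerals, including thousands separators, decimals, and a trailing %, and skips bare four-digit years. It cannot tell an original statistic from a phone number or version string, so treat the count as a first pass, not a verdict. The function name is hypothetical.

```python
# Sketch: quantified_claims is a hypothetical heuristic, not a claim detector.
import re

# Numerals such as 50, 1,200, 4.8, or 38%.
NUMBER = re.compile(r"\b(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?%?")
YEAR = re.compile(r"(?:19|20)\d{2}")


def quantified_claims(text: str) -> list[str]:
    """Return numeric tokens in text, excluding bare years like 2026."""
    return [token for token in NUMBER.findall(text) if not YEAR.fullmatch(token)]


sample = "We analyzed 50 sites in 2026: 38% failed, and 1,200 pages returned under 600 words."
print(quantified_claims(sample))  # ['50', '38%', '1,200', '600']
```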
Medium impact
FAQ headings: 3+ question-format H2/H3s
Question-format headings (H2/H3s that start with "What", "How", "Why", "Does", "Can") map directly to how AI synthesizes answers. Pages with 3+ FAQ headings get substantially more AI extraction attempts than equivalent pages with statement headings.
Audit your page headings. Convert at least 3 statement headings to question format on key pages.
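Once you have the H2/H3 text, whether from your CMS or by parsing the page, classifying it takes a few lines. This is a minimal sketch. It counts a heading as question-format if it starts with one of the five words this checklist names or ends with a question mark; extend the word list to suit your content. The function names are hypothetical.

```python
# Sketch: is_question_heading and question_heading_count are hypothetical names.
import re

# \b stops "However" from matching "How".
QUESTION_START = re.compile(r"^(?:what|how|why|does|can)\b", re.IGNORECASE)


def is_question_heading(text: str) -> bool:
    text = text.strip()
    return text.endswith("?") or bool(QUESTION_START.match(text))


def question_heading_count(headings: list[str]) -> int:
    return sum(is_question_heading(h) for h in headings)


headings = ["What is GEO?", "Our pricing", "How crawlers read your site", "Can I cancel anytime"]
print(question_heading_count(headings))  # 3
```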
Medium impact
Heading hierarchy: H1 → H2 → H3
Clear heading hierarchy (one H1, multiple H2s, H3s under H2s) signals organized, authoritative content. Brandioz measures H2 count per 100 words of paragraph content — sweet spot is 1–4 H2s per 100 words.
Check for multiple H1s (common on JS sites), skipped heading levels, and heading count ratio.
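Below is a minimal sketch that reports the three things this item asks you to check: the H1 count, skipped levels (for example an H2 followed directly by an H4), and H2s per 100 words of paragraph text. As an approximation of the ratio described above, it counts only words inside p elements. The class and function names are hypothetical.

```python
# Sketch: HeadingCollector and heading_report are hypothetical names.
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []  # heading levels in document order
        self.paragraph_words = 0
        self.in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.levels.append(int(tag[1]))
        elif tag == "p":
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraph_words += len(data.split())


def heading_report(html: str) -> dict:
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels, words = collector.levels, collector.paragraph_words
    return {
        "h1_count": levels.count(1),
        # A jump of more than one level down, e.g. H2 -> H4.
        "skipped": [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1],
        "h2_per_100_words": round(100 * levels.count(2) / words, 1) if words else 0.0,
    }


sample = (
    "<h1>GEO checklist</h1><p>one two three four five six seven eight nine ten</p>"
    "<h2>Crawlability</h2><h4>robots.txt</h4>"
)
print(heading_report(sample))
# {'h1_count': 1, 'skipped': [(2, 4)], 'h2_per_100_words': 10.0}
```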
Medium impact
Author and freshness signals
Author signals (bylines, author schema, author page links) and freshness signals (<time> tags, article dates) each contribute 4/100 to the AI content score. Both are binary checks — present or not. Freshness is particularly important for retrieval-first platforms.
Add a byline and <time datetime="YYYY-MM-DD"> to all blog posts. Add author schema to your site.
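In the same binary spirit, a quick presence check asks whether the page carries a machine-readable date and an author marker. This is a minimal sketch. It looks for a time element whose datetime starts with YYYY-MM-DD, and for either a meta name="author" tag or a rel="author" link. It does not read JSON-LD author properties, and the names are hypothetical.

```python
# Sketch: AuthorFreshnessCheck and author_and_freshness are hypothetical names.
import re
from html.parser import HTMLParser

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


class AuthorFreshnessCheck(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.has_date = False
        self.has_author = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "time" and ISO_DATE.match(a.get("datetime") or ""):
            self.has_date = True
        if tag == "meta" and (a.get("name") or "").lower() == "author":
            self.has_author = True
        if "author" in (a.get("rel") or "").lower().split():
            self.has_author = True


def author_and_freshness(html: str) -> tuple[bool, bool]:
    """Return (has_author, has_date) for the given HTML."""
    check = AuthorFreshnessCheck()
    check.feed(html)
    check.close()
    return check.has_author, check.has_date


post = (
    '<article><p>By <a rel="author" href="/team/jane">Jane Doe</a></p>'
    '<time datetime="2026-06-01">June 1, 2026</time></article>'
)
print(author_and_freshness(post))  # (True, True)
```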
Medium impact
OG tags: complete set
og:title, og:description, og:type, og:url, og:image: all five. OG completeness contributes 10/100 to the AI content score. When these tags are in the HTML your server sends, crawlers parse them without executing any JavaScript, which makes them particularly important for client-side rendered sites. Tags injected at runtime by JavaScript are invisible to crawlers that don't render.
Check with a social preview tool or curl and look for og: meta tags in the raw HTML head.
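To check the full set in one step, collect the og: meta tags from the raw HTML and list which of the five are missing. This is a minimal sketch with hypothetical names. It accepts name= as well as property=, since some sites use the former.

```python
# Sketch: OpenGraphCollector and missing_og_tags are hypothetical names.
from html.parser import HTMLParser

REQUIRED_OG = ("og:title", "og:description", "og:type", "og:url", "og:image")


class OpenGraphCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.properties: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        key = (a.get("property") or a.get("name") or "").lower()
        if key.startswith("og:") and a.get("content"):  # ignore empty values
            self.properties[key] = a["content"]


def missing_og_tags(html: str) -> list[str]:
    collector = OpenGraphCollector()
    collector.feed(html)
    collector.close()
    return [tag for tag in REQUIRED_OG if tag not in collector.properties]


head = (
    '<meta property="og:title" content="Acme: AI visibility platform">'
    '<meta property="og:description" content="Measure how AI assistants see your brand.">'
)
print(missing_og_tags(head))  # ['og:type', 'og:url', 'og:image']
```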
04 Discovery & Navigation 3 items
Medium impact
llms.txt at domain root
A plain text file at /llms.txt listing your most important pages for AI navigation. Tells AI systems where to start rather than making them explore hundreds of pages. Low effort, low risk, increasingly adopted. Think of it as a sitemap for AI systems rather than search engines.
Create /public/llms.txt with sections for core product pages, blog, reference pages, and key facts about your brand.
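The llms.txt proposal (llmstxt.org) uses Markdown: an H1 with the site name, a blockquote summary, then H2 sections that contain link lists. This is a minimal sketch that generates that shape. The section names, URLs, and function name are placeholders.

```python
# Sketch: build_llms_txt is hypothetical; sections and URLs are placeholders.
def build_llms_txt(name: str, summary: str, sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render llms.txt: H1 name, blockquote summary, then H2 sections of links."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            # Each entry is "- [title](url)", with ": note" appended when a note is given.
            lines.append(f"- [{title}]({url})" + (f": {note}" if note else ""))
        lines.append("")
    return "\n".join(lines)


llms_txt = build_llms_txt(
    "Acme",
    "Acme is an AI visibility platform for B2B marketing teams.",
    {
        "Core pages": [
            ("Product overview", "https://yourdomain.com/", "What Acme does and who it serves"),
            ("Methodology", "https://yourdomain.com/methodology", ""),
        ],
        "Reference": [("GEO checklist", "https://yourdomain.com/reference/geo-checklist", "")],
    },
)
print(llms_txt)  # save as public/llms.txt
```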
Lower impact
Canonical URL on all pages
Canonical tags prevent AI crawlers from indexing duplicate content (www vs non-www, trailing slash variants, parameter URLs). Contributes to crawl score and prevents dilution of AI citation signals across URL variants.
Verify <link rel="canonical" href="https://yourdomain.com/page"> appears in the head of all indexable pages.
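Below is a minimal sketch that finds every rel="canonical" link on a page and reports problems. The healthy result is exactly one canonical with an absolute URL, the form shown in the example above. The class and function names are hypothetical.

```python
# Sketch: CanonicalFinder and canonical_problems are hypothetical names.
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # rel can hold several space-separated tokens.
        if tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.canonicals.append(a.get("href") or "")


def canonical_problems(html: str) -> list[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    found = finder.canonicals
    if not found:
        return ["missing canonical"]
    problems = ["multiple canonicals"] if len(found) > 1 else []
    problems += [f"not absolute: {href}" for href in found if not href.startswith(("http://", "https://"))]
    return problems


print(canonical_problems('<head><link rel="canonical" href="https://yourdomain.com/page"></head>'))  # []
```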
Lower impact
Hreflang for multilingual sites
If your site targets multiple languages or regions, hreflang attributes help AI crawlers understand which version of a page to cite for which audience. Contributes 3/100 to the crawl score.
Only relevant for multilingual sites. Skip if you serve one language/region.
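hreflang annotations are usually emitted as link rel="alternate" tags in the head. Each language version should carry the same full set, including a link to itself, plus an optional x-default fallback. This is a minimal sketch; the language codes, URLs, and function name are placeholders.

```python
# Sketch: hreflang_links is hypothetical; codes and URLs are placeholders.
from html import escape


def hreflang_links(variants: dict[str, str], x_default: str = "") -> str:
    """Return <link rel="alternate"> tags; emit the same set on every language version."""
    tags = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}">'
        for code, url in variants.items()
    ]
    if x_default:  # fallback for visitors whose language matches no variant
        tags.append(f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}">')
    return "\n".join(tags)


print(hreflang_links(
    {"en-us": "https://yourdomain.com/", "de-de": "https://yourdomain.com/de/"},
    x_default="https://yourdomain.com/",
))
```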
05 Entity Authority 5 items
High impact (parametric AI)
Consistent brand name and description everywhere
Your brand name, category description, and key claims must be identical across your website, social profiles, press mentions, and third-party directories. Inconsistency creates uncertainty in AI systems — uncertain sources get skipped. This is the foundational entity authority requirement.
Audit your LinkedIn About, Twitter/X bio, Crunchbase description, and website meta description for consistency.
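Once you have collected each description in one place, a quick similarity pass can flag the ones that have drifted. This is a minimal sketch using difflib.SequenceMatcher from the standard library. It measures character-level overlap, not meaning, and the 0.8 threshold is an arbitrary starting point rather than a known cutoff. The function name is hypothetical.

```python
# Sketch: drifted_descriptions is hypothetical; the 0.8 threshold is arbitrary.
from difflib import SequenceMatcher
from itertools import combinations


def drifted_descriptions(descriptions: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Flag pairs of descriptions whose character-level similarity is below threshold."""
    # Lowercase and collapse whitespace so formatting differences don't count.
    normalized = {source: " ".join(text.lower().split()) for source, text in descriptions.items()}
    flagged = []
    for a, b in combinations(normalized, 2):
        ratio = SequenceMatcher(None, normalized[a], normalized[b]).ratio()
        if ratio < threshold:
            flagged.append((a, b, round(ratio, 2)))
    return flagged


profiles = {
    "website meta": "Acme is an AI visibility platform for B2B marketing teams.",
    "LinkedIn About": "Acme is an AI visibility platform for B2B marketing teams.",
    "Crunchbase": "Acme builds analytics dashboards for e-commerce stores.",
}
for a, b, ratio in drifted_descriptions(profiles):
    print(f"{a} vs {b}: {ratio}")
```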
High impact (parametric AI)
Third-party brand mentions on authoritative sites
Being discussed on Reddit, industry blogs, press outlets, and authoritative directories builds the training data presence that feeds parametric-first AI citation rates. This is what ChatGPT uses when web browsing is off — it cites from memory, and memory comes from training data.
Track brand mentions. Pursue press coverage, community discussions, and directory listings. Each mention in training data is a future parametric citation.
Medium impact
Wikipedia presence (if applicable)
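Wikipedia accounts for 4.8% of citations on parametric-first platforms, a higher share than most individual domains. If your brand is notable enough for a Wikipedia article, that article is one of the strongest single entity authority signals available.
Check whether a Wikipedia article exists. If not, and your brand meets Wikipedia's notability guidelines, keep in mind that its conflict-of-interest guideline discourages writing about your own company. Submitting a draft with accurate, independently sourced content through Articles for Creation is the accepted route.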
Medium impact
Publishing cadence for retrieval-first platforms
Retrieval-first platforms like Perplexity heavily weight freshness. Content updated within 30 days gets substantially more citations than older content. A publishing cadence (weekly or bi-weekly) consistently outperforms a static site even if the static content is higher quality.
Aim for at minimum one new or substantially updated piece of content per month. Update lastmod dates in sitemap.xml when content changes.
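To see which sitemap entries fall outside the 30-day window, compare each lastmod date against today. This is a minimal sketch that takes a {url: lastmod} mapping, for example one parsed from sitemap.xml, and treats missing dates as stale. It assumes lastmod values start with YYYY-MM-DD, which covers both plain dates and full W3C datetimes. The function name is hypothetical.

```python
# Sketch: stale_pages is a hypothetical helper; assumes lastmod starts with YYYY-MM-DD.
from datetime import date, timedelta


def stale_pages(lastmods: dict, today: date, max_age_days: int = 30) -> list[str]:
    """URLs whose lastmod is missing or older than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for url, lastmod in lastmods.items():
        # Keep only the date part of values like "2026-04-01T09:00:00+00:00".
        if not lastmod or date.fromisoformat(lastmod[:10]) < cutoff:
            stale.append(url)
    return stale


lastmods = {
    "https://yourdomain.com/": "2026-06-01",
    "https://yourdomain.com/blog/geo-basics": "2026-04-01T09:00:00+00:00",
    "https://yourdomain.com/about": None,
}
print(stale_pages(lastmods, today=date(2026, 6, 15)))
# ['https://yourdomain.com/blog/geo-basics', 'https://yourdomain.com/about']
```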
Medium impact
Answer-first content formatting
Content that leads with the direct answer — before context, background, or caveats — gets cited significantly more often. This is contrary to traditional editorial structure but essential for GEO. The opening paragraph of every page and post should directly answer the question the title implies.
Audit your blog posts. Does the first paragraph answer the headline question directly? If not, restructure to answer-first.

Quick test commands

Run these before and after making GEO changes to measure impact:

Check what GPTBot reads

curl -sL -A "GPTBot" https://yourdomain.com | wc -w # Should return 600+. Under 600 = partial render problem. wc -w also counts markup, so it usually overstates readable text.

Check for JSON-LD schema

curl -sL https://yourdomain.com | grep -oF 'application/ld+json' | wc -l # Should print 2 or more (Organization + FAQPage minimum)

Check robots.txt crawler permissions

curl -sL https://yourdomain.com/robots.txt | grep -iE "GPTBot|PerplexityBot|ClaudeBot|Google-Extended|CCBot" # Should return a User-agent line for each crawler

Check llms.txt

curl https://yourdomain.com/llms.txt # Should return your structured navigation file

Scoring your checklist

| Items passing | Estimated AI readability | Priority action |
|---|---|---|
| 0–5 | 0–30 / 100 | Start with crawlability: robots.txt, sitemap, partial render fix |
| 6–12 | 30–55 / 100 | Add FAQPage + Organization schema, fix title and meta description |
| 13–19 | 55–75 / 100 | Add original statistics, improve FAQ headings, deploy llms.txt |
| 20–27 | 75–100 / 100 | Focus on entity authority and publishing cadence |

Frequently asked questions

How often should I run a GEO audit?
After any significant change to your homepage, meta tags, robots.txt, or site structure. For ongoing monitoring, use Brandioz's Track Changes feature — it sends email alerts when your AI scores change meaningfully. For Perplexity specifically, check freshness monthly.
Which checklist item has the biggest immediate impact?
For most sites: verifying that AI crawlers are allowed in robots.txt and that your homepage returns 600+ words of raw HTML. These two items fix the most common GEO failure mode — being technically blocked or invisible to AI crawlers despite having excellent content.
Can I automate this checklist?
Yes. Brandioz runs all 27 signal checks automatically and returns a scored breakdown with specific fixes ranked by impact. Run it at brandioz.com/dashboard — takes 10 seconds, free tier available.
Does fixing GEO hurt Google rankings?
No. GEO fixes are either neutral for Google (llms.txt, crawler profile pages) or actively beneficial for both (clear headings, FAQ sections, original data, fast server responses, complete meta tags). The only potential conflict is client-side rendering — but the GEO fix (SSR or static pages) is also a Google PageSpeed improvement.