How the analysis works
Every Brandioz score is the weighted sum of measurable signals extracted from your site's HTML. No black boxes. No ML guesswork. Every point is traceable to a specific signal.
Component weights — max points per dimension, totalling 100
Signal sub-weights — contribution of each signal to the total score
Client-rendered detection
SPA · CSR · Framework identification
Framework detection output:
- is_client_rendered: boolean
- framework_detected: "Next.js" | "Framer" | null

Why this matters: client-rendered sites get a modified extraction strategy to preserve content, and the framework name tells you exactly what we detected.
Sites built with Next.js, Framer, or other SPAs require special handling — the raw HTML often contains minimal content.
Geo Signals Extraction
Localization & regional indicators
| Signal | What it is | Examples / details | Interpretation |
|---|---|---|---|
| tld | Top-level domain | .com, .uk, .de, etc. | .com → global, .co.uk → UK |
| language | Primary language | en, es, fr, de, etc. | from the html lang attribute |
| hreflang | Alternate language/region URLs | links to localized versions | indicates international targeting |
| localized_content | Region-specific terms | pricing, addresses, phone numbers | GBP, €, local office addresses |
Geo signals help determine if a site is region-specific or global — affecting benchmark comparisons.
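A minimal extraction sketch for three of these signals, using regexes over the raw HTML (the compound-TLD list and function shape are illustrative assumptions, not the production extractor):

```python
import re

def extract_geo_signals(url: str, html: str) -> dict:
    """Pull TLD, primary language, and hreflang alternates from a page."""
    host = re.sub(r"^https?://", "", url).split("/")[0]
    parts = host.split(".")
    # Keep compound country TLDs like .co.uk intact (list is illustrative).
    if len(parts) > 2 and parts[-2] in ("co", "com", "org", "ac"):
        tld = ".".join(parts[-2:])
    else:
        tld = parts[-1]
    lang = re.search(r'<html[^>]*\blang="([^"]+)"', html, re.I)
    hreflangs = re.findall(r'hreflang="([^"]+)"', html, re.I)
    return {
        "tld": tld,
        "language": lang.group(1) if lang else None,
        "hreflang": hreflangs,
    }
```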
The heuristic scorer computes every point directly from HTML signals — title confidence, word count, semantic density, value proposition patterns, and understanding curve scores. No model, no approximation. The formula is deterministic and fully auditable.
Pipeline
1. Render page → extract signals
2. Run heuristic formula → raw score
3. Apply coherence rules → final score
4. PCA transform → dominant weakness
- 5 score components (structure to hierarchy)
- 9 coherence rules (internal consistency)
- 4 PCA components (71.3% variance)
- 4 extraction caps (high / medium / low / failed)
Signal importance in PCA
Which signals drive the most variance across sites
Understanding curve scores (deep/scroll/imm) account for 49.7% of PCA variance. Sites that improve progressive clarity gain the most across all 4 dimensions.
Content depth curve
Points earned vs word count — climbs fast at first, then slows down
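The curve's shape can be modelled as a capped logarithm: fast gains at low word counts, flat near saturation. The 22-point cap matches the content-depth weight in the production formula; the 2000-word saturation point is an assumed parameter:

```python
import math

def word_count_points(words: int, cap: float = 22.0, saturation: int = 2000) -> float:
    """Diminishing-returns curve for the content-depth term.
    Saturation word count is an assumption, not the production value."""
    return cap * min(1.0, math.log1p(max(words, 0)) / math.log1p(saturation))
```

Going from 100 to 200 words earns more points than going from 1900 to 2000, which is exactly the "climbs fast, then slows" behaviour the chart shows.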
understanding_depth
Does the site get clearer the more you read?
How deeply an AI can understand the site across progressive reading depths.
↑ deep_score_norm, scroll_score_norm, imm_score_norm
signal_imbalance
Are the site's strongest signals balanced or lopsided?
Tension between title clarity and hero/value-prop quality.
↑ title_confidence, hero_confidence, value_prop_confidence
density_vs_description
Is the site dense with facts, or clear in its descriptions?
Trade-off between information density and descriptive meta content.
↑ semantic_density_norm, description_confidence, word_count_norm
value_prop_speed
How fast does an AI grasp what the site offers?
How quickly value proposition signals appear.
↑ value_prop_confidence, high_info_ratio, title_confidence
4 components → 71.3% variance explained
remaining 28.7% = site-specific noise

Scree plot — how much each component adds
We chose 4 components because the 5th barely adds anything (7.1%). The elbow is at 4.
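The same elbow procedure can be reproduced on any signal matrix. A numpy-only sketch (synthetic random data stands in for the real site × signal matrix, and the 0.05 cutoff is an illustrative threshold):

```python
import numpy as np

# Synthetic stand-in for a sites × 11-signal matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 11))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)   # correlate a few columns
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)   # so one component dominates

# PCA via SVD of the centered matrix.
Xc = X - X.mean(axis=0)
_, S, _ = np.linalg.svd(Xc, full_matrices=False)
ratios = S**2 / (S**2).sum()          # explained-variance ratio, descending

# Elbow rule: keep components until the next one barely adds anything.
k = next((i for i, r in enumerate(ratios) if r < 0.05), len(ratios))
explained = ratios[:k].sum()
```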
Feature loadings matrix
How much each signal pushes each dimension up (+) or down (−)
| Signal | PC0 (understanding_depth) | PC1 (signal_imbalance) | PC2 (density_vs_description) | PC3 (value_prop_speed) |
|---|---|---|---|---|
| deep_score_norm | +0.460 | -0.180 | -0.140 | -0.294 |
| scroll_score_norm | +0.441 | -0.190 | -0.160 | -0.270 |
| imm_score_norm | +0.400 | -0.200 | -0.170 | -0.294 |
| hero_confidence | +0.338 | -0.464 | +0.120 | +0.100 |
| heading_confidence | +0.313 | +0.200 | -0.080 | +0.150 |
| title_confidence | +0.200 | +0.528 | -0.246 | +0.343 |
| value_prop_confidence | +0.180 | -0.430 | +0.157 | +0.639 |
| high_info_ratio | +0.150 | +0.120 | +0.100 | +0.380 |
| word_count_norm | +0.100 | +0.200 | -0.398 | -0.150 |
| semantic_density_norm | +0.090 | +0.300 | +0.625 | +0.100 |
| description_confidence | +0.080 | +0.150 | -0.567 | +0.298 |
Understanding curve scores (imm/scroll/deep) dominate PC0 with loadings ≥ 0.40 — the strongest PCA dimension is driven entirely by progressive AI comprehension, not metadata.
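A site's component scores fall out of a dot product with these loading vectors. The sketch below uses the first two loading vectors from the table, truncated to five signals, and mean-centers the signal vector as a stand-in for the per-signal standardization the real pipeline would apply:

```python
import numpy as np

# PC0 / PC1 loadings from the table, truncated to five signals:
# deep_score_norm, scroll_score_norm, imm_score_norm, hero_confidence, title_confidence.
pc0 = np.array([0.460, 0.441, 0.400, 0.338, 0.200])      # understanding_depth
pc1 = np.array([-0.180, -0.190, -0.200, -0.464, 0.528])  # signal_imbalance

# A hypothetical site: strong title, weak progressive understanding.
x = np.array([0.20, 0.25, 0.30, 0.40, 0.90])
xc = x - x.mean()        # stand-in for per-signal standardization
scores = np.array([pc0 @ xc, pc1 @ xc])
# scores[0] < 0: low understanding_depth; scores[1] > 0: title-dominated.
```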
PC0 vs PC1 — site positions
understanding_depth (x-axis) vs signal_imbalance (y-axis)
Low PC0 = AI struggles even after full read. High PC1 = title dominates over hero/value-prop. Both axes matter for strong visibility.
Heuristic scores across the range
15 representative sites — from excellent to poor AI visibility
Scores above 75 typically have high-quality hero sections, clear meta descriptions, and strong progressive understanding. Sites below 50 usually have failed hero extraction or thin content.
Stripe vs Basecamp — signal radar
How two high-scoring sites compare signal by signal
Stripe's meta description and hero are near-perfect. Basecamp compensates with very clear title and heading structure — showing multiple paths to a strong score.
Dominant weakness distribution
Most common PCA weakness across representative sites
understanding_depth is the most common weakness — most sites have decent metadata but AI struggles to build a clear picture through progressive reading.
title_confidence · Structure · Is the page title descriptive enough for an AI to know what you do?
Word count, entity words (platform, tool, ai, clinic, school), benefit descriptors, and H1 compensation if title is brand-only.
description_confidence · Structure · Does the meta description explain your product clearly?
Scored for capability language, audience terms, sentence structure patterns, and optimal length (50–160 chars).
heading_confidence · Structure · Do the headings form a logical hierarchy and cover diverse topics?
H1 presence, no level skips, capability heading ratio, plus semantic diversity via TF-IDF cosine across all headings.
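The semantic-diversity part can be sketched from scratch: TF-IDF vectors per heading, mean pairwise (1 − cosine). This is an illustrative re-implementation with sklearn-style smoothed IDF, not the production code:

```python
import math
from collections import Counter
from itertools import combinations

def heading_diversity(headings: list[str]) -> float:
    """Mean pairwise (1 - cosine similarity) over TF-IDF vectors of headings."""
    docs = [h.lower().split() for h in headings]
    vocab = sorted({w for d in docs for w in d})
    df = Counter(w for d in docs for w in set(d))
    n = len(docs)
    def tfidf(d):
        tf = Counter(d)
        # Smoothed IDF so words shared by every heading are down-weighted.
        return [tf[w] / len(d) * (math.log((1 + n) / (1 + df[w])) + 1) for w in vocab]
    vecs = [tfidf(d) for d in docs]
    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0
    pairs = list(combinations(vecs, 2))
    return sum(1 - cos(a, b) for a, b in pairs) / len(pairs)
```

Identical headings score 0; headings with no overlapping vocabulary score 1.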
hero_confidence · Structure · Does the hero section explain what you do, or is it just a tagline?
Hero word count, entity and benefit language, compositional VP score. Penalises nav-polluted and repetitive text.
value_prop_confidence · Value Proposition · Can an AI identify your product type, who it's for, and why it's useful?
Pattern-matches product type, target audience, and benefits. Hero and title carry 3× weight vs full page text.
high_info_ratio · Semantic Quality · What fraction of sentences actually say something useful vs filler?
Fraction of sentences above an informativeness threshold. Scored on action verbs, numbers, AI phrases, unique word ratio. UI chrome filtered out.
breadth_score · Content Depth · Does the page cover multiple distinct topics, or just repeat the same idea?
KMeans clustering on paragraph TF-IDF vectors. Counts distinct topic clusters, normalised to expected count.
avg_sentence_length · Semantic Quality · Are sentences the right length? Too short = choppy. Too long = hard to parse.
Mean words per sentence. Ideal 15–20 words. Penalties applied above 25 and below 12.
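As a scoring curve, that could look like the sketch below. Only the band (15–20) and the penalty thresholds (12, 25) come from this page; the slopes and the 0.8 shoulder value are assumptions:

```python
def sentence_length_score(avg_len: float) -> float:
    """Score mean sentence length: 1.0 in the ideal band, penalties outside.
    Slopes and the 0.8 shoulder are assumed, not production values."""
    if 15 <= avg_len <= 20:
        return 1.0
    if avg_len < 15:
        # Mild shoulder down to 12 words, harder penalty below that.
        return 0.8 if avg_len >= 12 else max(0.0, 0.8 * avg_len / 12)
    # Mild shoulder up to 25 words, harder penalty past it.
    return 0.8 if avg_len <= 25 else max(0.0, 0.8 - (avg_len - 25) * 0.04)
```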
imm_score · Understanding · First impression: how much does an AI understand from just the hero section?
BFS traversal score at hero/top-level depth. Based on value prop signal presence and section quality.
scroll_score · Understanding · Mid-read: does understanding improve as the AI reads further down the page?
BFS traversal score at mid-depth sections. Monotonically enforced — can only equal or exceed imm_score.
deep_score · Understanding · Full-read: at peak comprehension, how well does an AI understand the site?
BFS traversal score at full depth. Single most important signal (18.4% of PCA variance).
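The monotonic enforcement is a pair of clamps. The page states the rule for scroll_score vs imm_score; extending the same clamp to deep_score vs scroll_score is an assumption:

```python
def understanding_curve(imm: float, scroll: float, deep: float) -> tuple[float, float, float]:
    """Enforce that each deeper reading pass can only equal or exceed
    the shallower one (deep >= scroll is assumed by analogy)."""
    scroll = max(scroll, imm)
    deep = max(deep, scroll)
    return imm, scroll, deep
```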
Pipeline additions
- 11 HTML extraction signals from raw HTML
- 4 geo signals: TLD, language, hreflang
- 2 frameworks detected: Next.js, Framer
- Competitor tracking endpoint (separate router)
Geo signals → benchmark selection · Framework detection → extraction strategy · Competitor tracking → separate endpoint
How competitor analysis works
Separate endpoint · Real-time comparison
POST /analyze/competitors

1. Identifies 5–7 relevant competitors from your domain and category
2. Fetches each competitor's homepage HTML
3. Extracts the same signals — title, meta, hero, headings, content
4. Analyzes each competitor's AI visibility
5. Returns comparison with strengths and relevance reasons
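Consuming the response can be sketched like this. The payload values below are invented and the exact JSON envelope (`competitors` key, `domain` field) is an assumption; only the per-competitor field names come from this page:

```python
import json

# Hypothetical response payload for POST /analyze/competitors.
payload = json.loads("""
{
  "competitors": [
    {"domain": "rival-a.com", "ai_visibility_level": "good", "score": 74,
     "what_they_do_well": ["Descriptive meta description"],
     "key_strengths": ["Strong hero copy"],
     "relevant_because": "Same category and audience"},
    {"domain": "rival-b.com", "ai_visibility_level": "average", "score": 61,
     "what_they_do_well": ["Clear H1 hierarchy"],
     "key_strengths": ["Broad topic coverage"],
     "relevant_because": "Overlapping keywords"}
  ]
}
""")

# Rank competitors by AI-visibility score, strongest first.
ranked = sorted(payload["competitors"], key=lambda c: c["score"], reverse=True)
```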
What you get back
Per-competitor analysis
- ai_visibility_level: excellent / good / average / poor
- score: 0–100 AI visibility estimate
- what_they_do_well: 2–3 specific AI-friendly practices
- key_strengths: 2–3 competitive advantages
- relevant_because: why they're a relevant comparison

No black boxes.
Every number in Brandioz is computable from the raw HTML of your site. The weights on this page are the exact weights in production. No ML model adjusting scores behind the scenes.
If a score is worth trusting, you should be able to understand exactly how it was computed — signal by signal, weight by weight.
Production formula
```
score = (
    title_conf   × 10     # structure
  + desc_conf    × 10
  + hero_conf    × 8
  + heading_conf × 5
  + log(words, 22)        # content depth
  + info_score   × 12     # semantic quality
  + sent_score   × 4
  + diversity    × 6
  + value_prop   × 18     # value prop
  + hierarchy    × 5
  - cross_penalties       # internal consistency
  - curve_penalty         # understanding curve
)
# coherence engine applies up to 9 caps
# PCA identifies dominant_weakness
# Geo signals → benchmark selection
# Framework detection → extraction strategy
```
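The formula transcribes directly into code. The weights sum to 100, matching "max points per dimension — total 100". Reading `log(words, 22)` as a log-shaped content-depth term worth up to 22 points, with an assumed saturation word count, is an interpretation on my part; coherence caps and the PCA step are applied afterwards and are not reproduced here:

```python
import math

def word_points(words: int, cap: float = 22.0, saturation: int = 2000) -> float:
    # Capped log curve for the content-depth term; saturation is assumed.
    return cap * min(1.0, math.log1p(max(words, 0)) / math.log1p(saturation))

def heuristic_score(s: dict) -> float:
    """Direct transcription of the production formula above."""
    return (
        s["title_conf"] * 10           # structure
        + s["desc_conf"] * 10
        + s["hero_conf"] * 8
        + s["heading_conf"] * 5
        + word_points(s["words"])      # content depth
        + s["info_score"] * 12         # semantic quality
        + s["sent_score"] * 4
        + s["diversity"] * 6
        + s["value_prop"] * 18         # value prop
        + s["hierarchy"] * 5
        - s.get("cross_penalties", 0)  # internal consistency
        - s.get("curve_penalty", 0)    # understanding curve
    )
```

A site with every confidence at 1.0 and 2000+ words lands at exactly 100 before penalties, confirming the weights account for the full scale.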