Analysis Pipeline


How Brandioz scores

Every analysis runs the same deterministic 18-step pipeline — from URL validation to AI narrative. No black boxes.

Input (4) · Extraction (2) · Analysis (5) · Scoring (3) · Output (4)

Input

Parses the URL, strips deep paths and UTM params down to the homepage root, then runs a live DNS lookup to confirm the domain actually exists before any work begins.

input: DNS + normalisation
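The normalisation and DNS check can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not Brandioz's actual code: the function names `normalise_to_root` and `domain_resolves`, and the choice to default to HTTPS when no scheme is given, are assumptions.

```python
import socket
from urllib.parse import urlsplit


def normalise_to_root(url: str) -> str:
    """Reduce any URL to its homepage root (scheme + host only)."""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url  # assumption: default to HTTPS when no scheme is given
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not host:
        raise ValueError(f"no hostname in {url!r}")
    # Path, query string (including UTM params) and fragment are all dropped.
    return f"{parts.scheme}://{host}/"


def domain_resolves(root_url: str) -> bool:
    """Live DNS lookup: True if the hostname resolves to at least one address."""
    host = urlsplit(root_url).hostname
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False
```

A caller would run `normalise_to_root` first and continue only if `domain_resolves` returns `True` for the result.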

Fetches the raw HTML and distinguishes JS-rendered shells from pages with real body content — a critical split that prevents SPA homepages from scoring as if they were fully readable.

input: HTTP fetch + BeautifulSoup

Detects whether the site is client-rendered (Next.js, Framer, etc.). This matters because SPAs often hide content behind JavaScript, so extracting what AI crawlers actually see requires special handling. Sets the `is_client_rendered` and `framework_detected` flags for downstream scoring adjustments.

input: SPA · CSR · Framework
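One way to separate a JS-rendered shell from a page with real body content is to measure how much visible text the raw HTML contains and to look for framework fingerprints. The sketch below uses the standard-library `html.parser` as a stand-in for BeautifulSoup; the 200-character threshold, the marker table, and the function name `detect_client_rendering` are illustrative assumptions, and only the two flag names come from the pipeline description.

```python
from html.parser import HTMLParser

# Hypothetical fingerprint table: substrings commonly present in Next.js output.
FRAMEWORK_MARKERS = {"Next.js": ("__NEXT_DATA__", "/_next/static/")}
MIN_VISIBLE_CHARS = 200  # hypothetical threshold for "real body content"


class _VisibleText(HTMLParser):
    """Collects text found outside <script>, <style> and <noscript> elements."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def detect_client_rendering(html: str) -> dict:
    """Flag pages whose raw HTML carries too little visible text to be readable."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    visible_chars = sum(len(chunk) for chunk in parser.chunks)
    framework = next(
        (name for name, markers in FRAMEWORK_MARKERS.items()
         if any(marker in html for marker in markers)),
        None,
    )
    return {
        "is_client_rendered": visible_chars < MIN_VISIBLE_CHARS,
        "framework_detected": framework,
        "visible_chars": visible_chars,
    }
```

Note that a server-rendered Next.js page would still report `framework_detected` but not `is_client_rendered`, because its body text is already present in the HTML.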

Extracts geographic signals from the page: TLD, language, hreflang tags, and localized content indicators. These signals determine whether the site is region-specific or global, which affects benchmark comparisons and recommendation relevance.

input: extract_geo_signals
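As a rough illustration of the geo step, the sketch below pulls the TLD, the `<html lang>` attribute, and hreflang alternates using only the standard library. The function name `geo_signals_sketch` and the regional-versus-global rule (a two-letter TLD with at most one hreflang counts as regional) are assumptions made for the example; localized content indicators are not modelled.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _GeoParser(HTMLParser):
    """Records the <html lang> value and every <link rel="alternate" hreflang>."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.hreflangs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.lang = attrs["lang"].lower()
        if tag == "link" and attrs.get("rel") == "alternate" and attrs.get("hreflang"):
            self.hreflangs.append(attrs["hreflang"].lower())


def geo_signals_sketch(url: str, html: str) -> dict:
    parser = _GeoParser()
    parser.feed(html)
    parser.close()
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1] if "." in host else ""  # last DNS label only
    hreflangs = sorted(set(parser.hreflangs))
    # Hypothetical rule: a country-code TLD with no multi-language alternates
    # is treated as region-specific; everything else as global.
    scope = "regional" if len(tld) == 2 and len(hreflangs) <= 1 else "global"
    return {"tld": tld, "lang": parser.lang, "hreflangs": hreflangs, "scope": scope}
```

Two-letter TLDs such as `.io` or `.ai` are often used by global products, so a production rule would likely need an exception list.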

Extraction

Extracts exactly what an AI crawler sees: title, meta description, hero section, headings, paragraphs, OG tags, structured data, and internal links. For SPAs, uses a modified extraction strategy (skip_clean) to preserve client-rendered content.

extraction: extract_ai_view

Builds a hierarchical map of the page and traverses it breadth-first to find where value proposition signals appear relative to the fold — key for the understanding curve.

extraction: build_section_tree · bfs_traverse
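The section tree and breadth-first traversal might look something like the sketch below. It assumes the page has already been reduced to a flat list of `(heading level, title, text)` tuples in document order; the `Section` class, the helper names, and the keyword-matching rule for "value proposition signals" are hypothetical, and the position of the fold is not modelled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    level: int
    text: str = ""
    children: list = field(default_factory=list)


def build_tree(headings):
    """Nest (level, title, text) tuples under a synthetic level-0 root."""
    root = Section("root", 0)
    stack = [root]
    for level, title, text in headings:
        node = Section(title, level, text)
        # Pop until the top of the stack is a strictly shallower heading.
        while stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def bfs(root):
    """Yield (depth, section) breadth-first, skipping the synthetic root."""
    queue = deque((1, child) for child in root.children)
    while queue:
        depth, node = queue.popleft()
        yield depth, node
        queue.extend((depth + 1, child) for child in node.children)


def first_signal(root, keywords):
    """Return where the first keyword match appears in BFS order, or None."""
    for order, (depth, node) in enumerate(bfs(root)):
        haystack = f"{node.title} {node.text}".lower()
        if any(keyword in haystack for keyword in keywords):
            return {"visit_order": order, "depth": depth, "title": node.title}
    return None
```

Breadth-first order visits all top-level sections before any nested detail, so a signal buried in a deep subsection is reached late even if it sits near the top of the markup.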

Analysis

A Groq LLM call classifies the site's primary intent (e-commerce, SaaS, media, blog, etc.) with a confidence score. The intent label drives category-specific scoring weights downstream.

analysis: infer_site_intent_llm · Groq

Runs asynchronously to build what AI 'believes' about the site (category, capabilities, target audience, and a confidence score) by cross-referencing title, meta, hero, and intent signals.

analysis: build_heuristic_belief

Measures how fast AI clarity builds as it reads deeper: first impression, after scrolling, full page. Produces a named shape — fast_clear, partial, thin, or flat — that feeds directly into scoring.

analysis: compute_understanding_curve

Computes sentence length, semantic density, high- and low-information ratios, and concept count. Also scores signal confidence for title, meta description, headings, and value proposition.

analysis: compute_metrics · analyze_clarity
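The page does not define these metrics precisely, so the following is only a sketch of two of them under stated assumptions: average sentence length is counted in words per sentence, and the high- and low-information ratios are approximated by the share of tokens outside and inside a small stopword list. The stopword list and the function name `text_metrics` are hypothetical; semantic density and concept count are not attempted.

```python
import re

# Hypothetical stopword list used as a stand-in for "low-information" tokens.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "for", "with", "that", "this", "it", "we", "our", "you", "your",
}


def text_metrics(text: str) -> dict:
    """Average sentence length plus rough high/low-information token ratios."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    if not sentences or not words:
        return {"avg_sentence_length": 0.0, "high_info_ratio": 0.0, "low_info_ratio": 0.0}
    high = sum(1 for word in words if word not in STOPWORDS)
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "high_info_ratio": high / len(words),
        "low_info_ratio": 1 - high / len(words),
    }
```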

Determines presence tier (high / medium / low / unknown) from web signals and training data indicators. Must run before coherence — it feeds the known-brand floor rule that prevents globally recognised brands from scoring too low.

analysis: compute_brand_presence

Scoring

Calculates the raw score across five components: structure (33pts), content depth (22pts), semantic quality (22pts), value proposition (18pts), and hierarchy (5pts) — weighted by page type and extraction quality.

scoring: calculate_score_by_mode
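The five component maxima add up to 100 points (33 + 22 + 22 + 18 + 5). A minimal sketch of the arithmetic, assuming each component is first rated on a 0 to 1 scale and then multiplied by its maximum; the function name `raw_score` and the 0 to 1 input scale are assumptions, and the page-type and extraction-quality weighting is left out:

```python
# Maximum points per component, as listed in the scoring step.
MAX_POINTS = {
    "structure": 33,
    "content_depth": 22,
    "semantic_quality": 22,
    "value_proposition": 18,
    "hierarchy": 5,
}


def raw_score(component_ratios: dict) -> float:
    """Sum each component's 0..1 rating times its maximum points."""
    total = 0.0
    for name, max_points in MAX_POINTS.items():
        ratio = min(max(component_ratios.get(name, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        total += ratio * max_points
    return round(total, 1)
```

For example, a page with perfect structure, half marks on content depth, semantic quality and value proposition, and no usable hierarchy would score 33 + 11 + 11 + 9 + 0 = 64.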

Projects the full feature vector into four PCA dimensions: understanding depth, signal balance, content density, and value prop speed. Surfaces the dominant weakness — the dimension with the highest improvement leverage.

scoring: PCAModel · build_feature_vector
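A PCA projection is a centred feature vector multiplied by a matrix of component loadings. The sketch below shows that step in plain Python with the four dimension names from the description. The loadings and means passed in are placeholders supplied by the caller, and treating the lowest-scoring dimension as the "dominant weakness" is an assumption, since the page does not say how improvement leverage is measured.

```python
DIMENSIONS = ("understanding_depth", "signal_balance", "content_density", "value_prop_speed")


def project(features, mean, components):
    """Centre the feature vector, then take one dot product per component row."""
    centered = [x - m for x, m in zip(features, mean)]
    return [sum(w * c for w, c in zip(row, centered)) for row in components]


def dominant_weakness(features, mean, components):
    """Return (weakest dimension name, all dimension scores)."""
    scores = dict(zip(DIMENSIONS, project(features, mean, components)))
    # Assumption: the lowest projected score marks the dominant weakness.
    weakest = min(scores, key=scores.get)
    return weakest, scores
```

In a fitted model the `mean` and `components` arguments would come from training on the analysed corpus rather than being written by hand.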

Applies correction rules on top of the raw score: known-brand floors for high-presence sites, extraction quality caps, curve penalties, cross-signal consistency checks, and client-rendering adjustments. This produces the final published score.

scoring: apply_coherence
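To make the idea of correction rules concrete, here is a sketch of how penalties, caps, and floors could be layered on a raw score. Every number, the rule order, the `"poor"` quality label, and the function name `apply_coherence_sketch` are hypothetical; the description only states which kinds of rules exist, and cross-signal consistency checks are omitted.

```python
# All values below are illustrative placeholders, not Brandioz's real thresholds.
KNOWN_BRAND_FLOOR = 60.0
EXTRACTION_CAP = 70.0
CURVE_PENALTY = 5.0
CSR_PENALTY = 5.0


def apply_coherence_sketch(raw, *, presence_tier, extraction_quality,
                           curve_shape, is_client_rendered):
    """Apply penalties first, then the cap, then the known-brand floor."""
    score = float(raw)
    if curve_shape == "flat":
        score -= CURVE_PENALTY
    if is_client_rendered:
        score -= CSR_PENALTY
    if extraction_quality == "poor":
        score = min(score, EXTRACTION_CAP)
    if presence_tier == "high":
        score = max(score, KNOWN_BRAND_FLOOR)  # floor wins over earlier reductions
    return max(0.0, min(100.0, score))
```

Applying the floor last means a globally recognised brand can never end below the floor, which matches the stated purpose of the known-brand rule; a different ordering would let caps override it.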

Output

Compares against a static corpus of 91 analyzed sites. If the category is too niche for a meaningful static comparison, falls back to a Groq-generated dynamic peer set. Geo signals influence which benchmarks are most relevant.

output: generate_benchmarks · dynamic fallback
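The static-then-dynamic fallback can be sketched as a simple threshold check. The minimum of three peers, the record shape, and the function name `pick_peers` are assumptions; `dynamic_fallback` stands for whatever callable produces the Groq-generated peer set and is passed in so the sketch stays self-contained.

```python
def pick_peers(category, static_corpus, dynamic_fallback, min_peers=3):
    """Use static peers when enough exist, otherwise call the dynamic fallback."""
    peers = [site for site in static_corpus if site["category"] == category]
    if len(peers) >= min_peers:
        return peers, "static"
    # Too niche for a meaningful static comparison: ask for a generated peer set.
    return dynamic_fallback(category), "dynamic"
```

Returning the source label alongside the peers lets later stages report whether a benchmark came from the analysed corpus or from a generated set.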

Compares the site's AI visibility against competitors in the same category. Shows where the site ranks relative to market leaders and identifies gaps in AI discoverability. Available as a separate endpoint for ongoing monitoring.

output: competitors router

Generates targeted fixes from heuristic rules and PCA weakness overrides, then filters out irrelevant ones based on page type, belief context, and client-rendering detection — so every recommendation is specific, not generic.

output: generate_recommendations · pca_overrides

The final step. Groq receives the complete result (including geo signals, client detection, and competitor context) and writes a plain-English summary — what's working, what isn't, and why — personalised to the site's actual score, category, and signals.

output: generate_ai_narrative · Groq

Why deterministic?

Same input → same output. No model drift, no prompt fragility. Every score is reproducible and auditable.
