Category: Technical SEO

Deep dives into crawlability, indexation, site speed, and technical website health.

  • How to Find Broken Links in Screaming Frog: A Step-by-Step Guide

    Broken links silently erode your site authority, waste crawl budget, and frustrate users. Screaming Frog is the fastest way to find and fix them. Here is exactly how I do it across enterprise sites with tens of thousands of pages.

    Why Broken Links Matter for SEO

    Every broken link on your site is a dead end for both users and search engine crawlers. When Googlebot hits a 404, it wastes crawl budget that could be spent discovering and indexing your valuable content. Broken internal links also leak PageRank into nowhere, weakening the authority distribution across your site. For large eCommerce catalogues and enterprise sites, this compounds quickly — I’ve seen sites with thousands of broken links haemorrhaging authority without anyone noticing. Fixing them is one of the most impactful quick wins in a technical SEO audit.

    Step 1: Configure Screaming Frog for a Broken Link Audit

Open Screaming Frog and, before you crawl, go to Configuration > Spider and ensure “Check External Links” is ticked. This tells the crawler to check both internal and external links for broken responses. For large sites, set the crawl speed to a reasonable rate (2–5 URLs per second) to avoid overloading your server. Set the user agent to Googlebot to see exactly what Google sees.

    Step 2: Run the Crawl

    Enter your domain URL and hit Start. For a site with 10,000 pages, expect the crawl to take 15–30 minutes depending on your settings. Screaming Frog will systematically follow every link on every page, recording the HTTP status code for each URL it encounters. Let it run to completion — partial crawls miss broken links buried deep in your site architecture. If you’re new to the tool, read our beginner’s guide to Screaming Frog first.

    Step 3: Filter for Client Error (4xx) Responses

    Once the crawl completes, go to the Response Codes tab and filter by “Client Error (4xx)”. This shows you every URL that returned a 404 Not Found, 410 Gone, or other 4xx error — these are your broken links. The most important column here is “Inlinks”, which tells you how many pages on your site link to each broken URL. Prioritise fixing broken links with the most inlinks first, as these have the greatest impact on your site architecture and authority flow.

    Step 4: Export and Prioritise

    Right-click the filtered results and export to CSV. Sort by inlinks descending to prioritise the highest-impact fixes first. For each broken link, click on the URL in Screaming Frog and check the “Inlinks” tab at the bottom — this shows you exactly which pages contain the broken link, making it easy to locate and fix them efficiently.
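The prioritisation step can be scripted. Below is a minimal Python sketch that sorts an exported 4xx report by inlink count; the column names and URLs are illustrative and may differ from your own Screaming Frog export, so adjust the headers to match your file.

```python
import csv
import io

# Hypothetical export data -- column names mirror a typical
# "Client Error (4xx)" export but may differ from yours.
SAMPLE_EXPORT = """Address,Status Code,Inlinks
https://example.com/old-page,404,42
https://example.com/gone,410,3
https://example.com/typo,404,117
"""

def prioritise_broken_links(csv_text):
    """Return broken URLs sorted by inlink count, highest-impact first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda r: int(r["Inlinks"]), reverse=True)

for row in prioritise_broken_links(SAMPLE_EXPORT):
    print(row["Address"], row["Inlinks"])
```

The same sort-by-inlinks logic works in a spreadsheet; the script is simply repeatable across monthly crawls.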

    Step 5: Fix or Redirect

    For each broken link, you have three options: update the link to point to the correct URL if the content has moved, set up a 301 redirect from the broken URL to the most relevant live page, or remove the link entirely if the content no longer exists and there’s no suitable replacement. For enterprise sites, I typically set up 301 redirects in bulk using a redirect mapping spreadsheet — it’s faster and more systematic than manually editing hundreds of pages.
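A redirect mapping spreadsheet can be turned into server rules mechanically. Here is a hedged sketch that emits Apache `Redirect 301` lines from source/target pairs; the Apache directive syntax is real, but the URL pairs are invented, and your server (Nginx, a WordPress plugin, a CDN) may want a different format.

```python
# Invented example mapping -- replace with your own spreadsheet rows.
redirect_map = [
    ("/old-category/widget-a", "/widgets/widget-a"),
    ("/old-category/widget-b", "/widgets/widget-b"),
]

def to_htaccess(mapping):
    """Emit one Apache 'Redirect 301' line per source/target pair."""
    return "\n".join(f"Redirect 301 {src} {dst}" for src, dst in mapping)

print(to_htaccess(redirect_map))
```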

    Pro Tips From a Decade of Broken Link Audits

    Schedule monthly crawls. Broken links accumulate constantly as content is updated, products are discontinued, and external sites go offline. Also check for soft 404s — pages that return a 200 status code but display a “page not found” message. These are invisible to standard broken link checks but just as damaging. Screaming Frog can detect these if you configure custom extraction rules to look for common 404 page text patterns. Addressing these issues regularly is part of solid crawl error management in Google Search Console.
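The soft-404 idea above can be sketched in a few lines of Python: flag any page that returns a 200 status but whose body matches common error-page phrases. The patterns below are invented examples, not an exhaustive list; extend them with the wording your own 404 template actually uses.

```python
import re

# Illustrative patterns only -- tune to your site's error template.
SOFT_404_PATTERNS = [
    r"page not found",
    r"sorry, we can.t find",
    r"no longer available",
]

def looks_like_soft_404(status_code, html):
    """Flag pages that return 200 but read like an error page."""
    if status_code != 200:
        return False
    text = html.lower()
    return any(re.search(p, text) for p in SOFT_404_PATTERNS)

print(looks_like_soft_404(200, "<h1>Page Not Found</h1>"))  # True
print(looks_like_soft_404(404, "<h1>Page Not Found</h1>"))  # False: a real 404
```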

  • Beginner’s Guide to Using Screaming Frog for SEO Audits

    Screaming Frog SEO Spider is the single most important tool in any technical SEO toolkit. I’ve used it on every client engagement since 2014. Here is everything you need to know to get started — and the features most beginners miss.

    What Is Screaming Frog?

    Screaming Frog SEO Spider is a desktop website crawler that analyses URLs in real time. It crawls your site the same way Google does — following links, reading HTML, and collecting data on every page it finds. The free version crawls up to 500 URLs. The paid licence (currently around GBP 239 per year) removes that limit and unlocks advanced features like JavaScript rendering, custom extraction, and crawl scheduling.

    Installing and Setting Up

Download Screaming Frog from screamingfrog.co.uk and install it — it runs on Windows, Mac, and Linux. Before your first crawl, I recommend adjusting a few settings. Go to Configuration > Spider and tick “Check External Links”; tick “Crawl Outside of Start Folder” only if you started the crawl from a subfolder and want the rest of the site included. Under Configuration > Speed, set the max threads to 2–5 for your first crawl to prevent overloading your server.

    Running Your First Crawl

    Enter your homepage URL in the top bar and click Start. Screaming Frog will begin crawling your site page by page, following every internal link it finds. As it runs, you’ll see data populating in real time across multiple tabs: Internal, External, Response Codes, Page Titles, Meta Descriptions, Headings, Images, and more. Each tab gives you a different lens into your site’s health.

    The Key Tabs Every Beginner Should Check

    Response Codes — Filter by “Client Error (4xx)” to find broken pages. Any 5xx errors indicate server problems needing immediate attention. This tab is also essential for a broken link audit.

    Page Titles — Filter by “Missing”, “Duplicate”, or “Over 60 Characters” to quickly identify title tag issues across your entire site. Title tags remain one of the strongest on-page ranking signals.

    Meta Descriptions — Filter by “Missing” or “Duplicate”. While meta descriptions don’t directly affect rankings, they heavily influence click-through rates from search results.

    H1 Headings — Filter by “Missing” or “Duplicate”. Every indexable page should have exactly one unique H1 that includes your target keyword.

    Images — Filter by “Missing Alt Text” to find images without alt attributes, which are important for accessibility and image SEO.
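The filters above amount to a small set of audit rules, which you can sketch in Python if you want to rerun the same checks against exported crawl data. The field names here are illustrative records, not Screaming Frog’s actual export schema, and the rules mirror the filters described above.

```python
from collections import Counter

# Invented page records standing in for exported crawl data.
pages = [
    {"url": "/a", "title": "Home", "h1": "Welcome"},
    {"url": "/b", "title": "Home", "h1": ""},
    {"url": "/c", "title": "A very long title " * 5, "h1": "Services"},
    {"url": "/d", "title": "", "h1": "Contact"},
]

def audit(pages):
    """Flag missing, duplicate, and over-length titles, plus missing H1s."""
    title_counts = Counter(p["title"] for p in pages if p["title"])
    issues = []
    for p in pages:
        if not p["title"]:
            issues.append((p["url"], "missing title"))
        elif len(p["title"]) > 60:
            issues.append((p["url"], "title over 60 characters"))
        elif title_counts[p["title"]] > 1:
            issues.append((p["url"], "duplicate title"))
        if not p["h1"]:
            issues.append((p["url"], "missing H1"))
    return issues

for url, problem in audit(pages):
    print(url, "-", problem)
```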

    Advanced Features Worth Learning Early

    Custom Extraction — Under Configuration > Custom, you can set up XPath or CSS selectors to extract specific data from every page. I use this to check for structured data, verify tracking scripts, or confirm that certain elements exist across thousands of pages.

    JavaScript Rendering — If your site uses JavaScript frameworks (React, Angular, Vue), enable JavaScript rendering under Configuration > Spider > Rendering. This tells Screaming Frog to render pages like a browser rather than reading raw HTML — critical for auditing modern web applications.

    Crawl Comparison — Save your crawl data and compare it against future crawls to track changes over time. This is invaluable for monitoring the impact of site migrations, redesigns, or large-scale SEO changes.

    How I Use Screaming Frog on Client Engagements

    Screaming Frog is the first tool I open on every new engagement. A full crawl gives me a comprehensive baseline of site health within minutes. I combine crawl data with Google Search Console, Google Analytics, and backlink data to build a complete picture of where the site stands and where the biggest opportunities lie. Monthly re-crawls then track progress and catch new issues before they compound. If you want to understand how this fits into a full audit process, read my guide on how to conduct a technical SEO audit.

  • Structured Data and Schema Markup: A Beginner’s Guide

    Structured data is one of the least understood areas of technical SEO — and one of the most rewarding when you get it right. It doesn’t directly boost your rankings, but it gives Google the context it needs to display your content in enhanced, visually rich formats in the search results. Done well, it can significantly increase your click-through rate without moving a single position.

    What Is Structured Data?

    Structured data is a standardised format for providing information about a page and classifying the content on it. In plain terms, it’s a way of adding extra information to your webpage in a format that search engines can clearly understand — not just the text on the page, but what that text means and what type of content it represents. The most common vocabulary is Schema.org — a collaborative project supported by Google, Bing, Yahoo, and Yandex.

    What Is Schema Markup?

    Schema markup is structured data implemented using Schema.org vocabulary. It’s typically added in one of three formats: JSON-LD, Microdata, or RDFa. Google strongly recommends JSON-LD — it’s a separate block of code that sits in the <head> or <body> without being mixed into your HTML, making it easier to add, modify, and troubleshoot.
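To make the JSON-LD format concrete, here is a short Python sketch that builds a LocalBusiness block and wraps it in the `<script type="application/ld+json">` tag that JSON-LD uses. The business details are invented; in practice you would template this from your own data.

```python
import json

# Invented example business -- replace with your real details.
schema = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Bakery",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 High Street",
        "addressLocality": "London",
    },
    "telephone": "+44 20 0000 0000",
}

def jsonld_script(data):
    """Serialise structured data into a JSON-LD script block."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'

print(jsonld_script(schema))
```

Because the block is self-contained, it can sit anywhere in the head or body without touching the rest of your HTML, which is exactly why Google recommends the format.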

    What Are Rich Results?

    Rich results are enhanced search results that display additional information beyond the standard blue link and meta description: star ratings and review counts, product pricing and availability, FAQ accordions that expand in the SERP, recipe cards, event listings with dates and venues, and breadcrumb trails. Rich results attract significantly higher click-through rates — they take up more visual space, build trust at a glance, and stand out from competing results.

    The Most Useful Schema Types for Most Businesses

    LocalBusiness — Explicitly tells Google your business name, address, phone number, and opening hours. Reinforces your local SEO signals and helps ensure your information is displayed correctly in local search results and Google Maps. Pairs naturally with a strong local SEO strategy.

FAQPage — Can cause your result to expand in the SERP to show questions and answers without the user even visiting your site, dramatically increasing your SERP real estate. Be aware that since August 2023 Google has restricted FAQ rich results to a small set of authoritative government and health sites, so treat the rich result itself as a long shot; the markup still helps search engines understand your content, which is increasingly relevant as Google AI Overviews become more prominent.

    Article / BlogPosting — Helps Google understand the structure and authorship of your content. Particularly useful for building E-E-A-T signals by explicitly identifying the author, their credentials, and the article’s subject matter.

    Product — For e-commerce sites, Product schema enables price, availability, and review information to appear directly in search results. One of the most commercially significant rich result types available.

    Person — For consultants, coaches, authors, and professionals. Explicitly identifies you as an individual with a specific expertise and job title, contributing to the knowledge graph signals Google uses to assess authority.

    How to Implement Schema Markup

    On WordPress, the simplest approach is an SEO plugin. Rank Math and Yoast SEO both offer built-in schema functionality. For more control, write JSON-LD directly and test using Google’s Rich Results Test to verify your markup is valid and eligible for rich results. Structured data issues will often surface in a technical SEO audit — so if you haven’t done one recently, that’s a good starting point.

    What Structured Data Won’t Do

    Structured data does not directly improve your rankings. It helps Google understand your content and makes you eligible for rich results, but it doesn’t add ranking signals the way backlinks or content quality do. Rich results are also not guaranteed — even with valid structured data, Google makes its own decision about whether to show them. Eligibility is necessary but not sufficient.

    Bringing It All Together

    Structured data is the finishing layer of technical SEO — it sits on top of a well-crawled, well-indexed, fast, mobile-friendly site. If those fundamentals aren’t in place, schema markup won’t move the needle. But once your technical foundation is solid, it’s one of the highest-leverage improvements you can make to increase visibility and click-through rates from organic search. For the full picture, start with The Complete Guide to Technical SEO.

  • Mobile-First Indexing: What It Means for Your Website

    If you’ve built your website primarily with desktop users in mind, you may be unknowingly putting yourself at a significant disadvantage in Google’s search results. Since completing the rollout of mobile-first indexing across all websites, Google now uses the mobile version of your website as its primary source for indexing and ranking — not the desktop version. How your site performs on a small screen isn’t just a user experience concern. It’s a ranking concern.

    What Is Mobile-First Indexing?

    Mobile-first indexing means that when Google’s crawler visits your website, it primarily uses a smartphone user agent rather than a desktop one. Google began rolling this out in 2018 and completed the full rollout for all websites in 2023. If your mobile site has the same content, structure, and signals as your desktop site, mobile-first indexing makes no difference. The problem arises when there’s a gap between what mobile and desktop users see.

    What Mobile-First Indexing Means in Practice

    The most important implication: if content, structured data, or other signals exist only on your desktop site and not on your mobile site, Google may not see or credit them. For sites using responsive design, this is typically not a problem. The risk is primarily for sites with separate mobile subdomains or dynamic serving where different HTML is delivered based on user agent.

    Common Mobile-First Indexing Issues

    Content hidden on mobile — Google can access content hidden behind tabs or accordions. However, if entire sections of content are simply absent from the mobile HTML, that content won’t be indexed. Use the URL Inspection tool in Search Console and select “Test Live URL” to see how Googlebot views your page on mobile.

    Mobile site has less content than desktop — On sites with separate mobile URLs, it’s common for the mobile version to be stripped down. Since Google is indexing the mobile version, leaner content is what’s being evaluated for ranking purposes. If your desktop pages have rich content and your mobile pages have half that, you’re only getting credit for the mobile version.

    Structured data only on desktop — If you’ve implemented schema markup on your desktop pages but not on your mobile pages, Google won’t see it. This affects your eligibility for rich results — star ratings, product information, FAQs.

    Poor mobile page speed — A page that loads in 2 seconds on desktop might take 6 seconds on a mobile device on 4G. Since Google assesses the mobile experience, poor mobile speed directly affects your Core Web Vitals scores. Always test using PageSpeed Insights with a focus on the mobile score. For a deeper guide, read our post on how to improve page speed for SEO.

    Intrusive interstitials — Google has an explicit policy against intrusive interstitials on mobile — pop-ups that cover the main content shortly after landing. These are treated as a demotion signal.
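The content-parity risks above can be spot-checked with a crude script: strip the tags from the mobile and desktop HTML and compare word counts. This is only a rough proxy (a real audit would diff headings, links, and structured data too), and the 0.8 threshold below is my illustrative choice, not a Google figure.

```python
import re

def visible_word_count(html):
    """Rough word count after stripping scripts, styles, and tags --
    a crude proxy for how much indexable text a page variant carries."""
    html = re.sub(r"(?s)<(script|style).*?</\1>", " ", html)
    text = re.sub(r"<[^>]+>", " ", html)
    return len(text.split())

# Synthetic pages standing in for fetched mobile and desktop HTML.
desktop_html = "<p>" + "word " * 800 + "</p>"
mobile_html = "<p>" + "word " * 300 + "</p>"

ratio = visible_word_count(mobile_html) / visible_word_count(desktop_html)
print(f"mobile/desktop content ratio: {ratio:.2f}")
if ratio < 0.8:  # illustrative threshold
    print("warning: mobile version is significantly thinner than desktop")
```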

    How to Check Your Mobile SEO Health

Open Google Search Console and use the URL Inspection tool to see how Google’s mobile crawler views individual pages. (Google retired its standalone Mobile Usability report and Mobile-Friendly Test tool in late 2023; URL Inspection and Lighthouse now cover the same ground, flagging issues like text too small to read or clickable elements too close together.) Running a full technical SEO audit will surface any mobile issues alongside every other factor affecting your visibility.

    The Best Approach: Responsive Design

    If you’re building or redesigning a website today, responsive design is the right approach — one set of HTML delivered to all devices, with CSS handling the layout. This eliminates the content parity problem entirely, ensures consistent structured data, and simplifies maintenance. Google explicitly recommends responsive design as the preferred implementation for mobile-first indexing.

    Continue with the Technical SEO Series

    Mobile-first indexing connects directly to Core Web Vitals, page speed, and structured data. Start with The Complete Guide to Technical SEO for the complete picture, or explore the individual guides on Core Web Vitals, page speed, and structured data.

  • How to Improve Page Speed for SEO

    Page speed has been a confirmed Google ranking factor since 2010 for desktop and 2018 for mobile. But its importance goes far beyond rankings. A slow website loses visitors — often before they’ve read a single word. And visitors who leave immediately send negative engagement signals back to Google, which can suppress your rankings further. It’s a cycle that’s hard to escape once you’re in it.

    Why Page Speed Matters for SEO

    Speed is part of the broader “page experience” signal. Google has been clear that content relevance will outweigh page experience in most cases — but in competitive niches where several pages are closely matched on quality, speed becomes a tiebreaker. More significantly, speed affects user behaviour. Even a one-second delay increases bounce rates meaningfully. Users on mobile — the majority of web traffic — are particularly sensitive to load times, which is directly tied to mobile-first indexing.

    How to Measure Your Page Speed

    Start with Google PageSpeed Insights (pagespeed.web.dev). It shows Core Web Vitals scores for both mobile and desktop, plus specific recommendations. Crucially, it shows both lab data and field data from real Chrome users. Google Search Console’s Core Web Vitals report shows which pages are rated Good, Needs Improvement, or Poor across your site. WebPageTest.org gives you a detailed waterfall chart of every request — useful for identifying exactly what’s causing delays.

    The Most Impactful Page Speed Improvements

    Optimise Your Images

Images are almost always the largest contributor to page weight. Use modern formats like WebP or AVIF (25–35% smaller than JPEG/PNG). Compress images without visible quality loss. Serve images at the dimensions they’ll actually be displayed — don’t upload 3000px-wide images and scale them down in the browser. Use loading="lazy" for below-fold images, but don’t lazy-load your hero image — that will hurt your LCP score.
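As a sketch of the lazy-loading rule, here is a Python snippet that adds loading="lazy" to every img tag except the first, on the assumption that the first image is the hero. The regex approach is for illustration only; on a real site you would do this in your templates or with an HTML parser.

```python
import re

def lazy_load_below_fold(html):
    """Add loading="lazy" to every <img> except the first (assumed hero),
    leaving images that already declare a loading attribute alone."""
    seen = 0
    def repl(m):
        nonlocal seen
        tag = m.group(0)
        seen += 1
        if seen == 1 or "loading=" in tag:
            return tag
        return tag[:-1] + ' loading="lazy">'
    return re.sub(r"<img\b[^>]*>", repl, html)

html = '<img src="hero.webp"><img src="a.webp"><img src="b.webp">'
print(lazy_load_below_fold(html))
```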

    Improve Server Response Time

    Time to First Byte (TTFB) is how long your server takes to respond. Google’s threshold is under 800ms, with under 200ms being ideal. Common causes of slow TTFB: poor hosting infrastructure, slow database queries, no server-side caching. Moving to better hosting, implementing caching, and optimising database queries are the usual fixes.

    Implement Caching

    Caching stores a pre-built version of your pages and serves them directly to users without rebuilding from scratch each time. For WordPress, use WP Rocket, W3 Total Cache, or LiteSpeed Cache. Browser caching tells returning visitors’ browsers to store static assets locally so they don’t re-download on subsequent visits.

    Eliminate or Defer Render-Blocking Resources

    Render-blocking resources are JavaScript and CSS the browser must download and process before displaying any content. Non-critical CSS can be deferred or loaded asynchronously. JavaScript that doesn’t need to run before the page displays can be given the defer or async attribute. Critical CSS can be inlined directly in the HTML to avoid an additional request.

    Use a Content Delivery Network (CDN)

    A CDN stores copies of your static assets on servers distributed around the world, serving them from the location closest to each user. Cloudflare has a free tier that handles the basics well. For businesses with national or international audiences, a CDN can make a meaningful difference to load times.

    Minimise Third-Party Scripts

    Third-party scripts — analytics, chat widgets, social sharing buttons, ad scripts — are a common cause of slow pages. Each is an additional request to an external server you have no control over. Audit what’s on your site and ask whether each script delivers enough value to justify its performance cost. Load non-critical scripts asynchronously and review what’s active regularly.

    A Note on Mobile Speed

    Always test and optimise for mobile first. A page that loads in 2 seconds on desktop broadband might take 6 seconds on a mid-range mobile on 4G. Always check your PageSpeed Insights scores on mobile — they’re typically significantly lower and tell a more realistic story about what most users experience.

    Continue with the Technical SEO Series

    Page speed is closely tied to Core Web Vitals and mobile usability — all within the broader framework of technical SEO. Explore the other guides covering crawl errors, XML sitemaps, canonicalisation, and structured data.

  • Core Web Vitals Explained: What They Are and Why They Matter

    If you’ve been doing any reading about SEO in the last few years, you’ve almost certainly come across the term Core Web Vitals. Google has made it clear that page experience matters — and Core Web Vitals are the specific, measurable metrics it uses to assess it. Understanding what they are, how they’re measured, and what you can do about them is now a fundamental part of any serious technical SEO strategy.

    What Are Core Web Vitals?

    Core Web Vitals are a set of real-world performance metrics that Google uses to measure the quality of a user’s experience on a web page. They focus on three things: how fast the main content loads, how quickly the page responds to user interaction, and how stable the page is visually as it loads. Google introduced Core Web Vitals as an official ranking factor in 2021 as part of its “page experience” signal.

    The Three Core Web Vitals Metrics

    1. Largest Contentful Paint (LCP)

    LCP measures how long it takes for the largest visible element on the page to load — typically the hero image, a large header, or a significant block of text. Good: 2.5 seconds or faster. Needs Improvement: 2.5 to 4 seconds. Poor: more than 4 seconds. The most common causes of poor LCP are large unoptimised images, slow server response times, and render-blocking JavaScript and CSS. Our guide to improving page speed for SEO covers the specific fixes in detail.

    2. Interaction to Next Paint (INP)

    INP replaced First Input Delay (FID) as a Core Web Vital in March 2024. It measures the responsiveness of a page throughout the entire visit — the time between a user interacting with the page and the browser visually responding. Good: 200 milliseconds or less. Needs Improvement: 200–500ms. Poor: over 500ms. Usually caused by heavy JavaScript blocking the browser’s main thread.

    3. Cumulative Layout Shift (CLS)

    CLS measures visual stability — how much elements on a page move around unexpectedly as it loads. Good: 0.1 or less. Needs Improvement: 0.1–0.25. Poor: over 0.25. CLS is caused by elements loading without reserved space — images without defined dimensions, web fonts that swap after initial render, or dynamically injected content that pushes content down. Fix it by specifying sizes for images and embeds in advance.
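The three thresholds above are the same Good / Needs Improvement / Poor bands Google publishes, so they can be captured in a small classifier. This sketch is just a convenience for bulk-checking exported field data.

```python
# Google's published thresholds for the three Core Web Vitals:
# (good boundary, poor boundary) per metric.
THRESHOLDS = {
    "LCP": (2.5, 4.0),    # seconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless score
}

def rate(metric, value):
    """Classify a metric value as Good / Needs Improvement / Poor."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "Good"
    if value <= poor:
        return "Needs Improvement"
    return "Poor"

print(rate("LCP", 2.1))   # Good
print(rate("INP", 350))   # Needs Improvement
print(rate("CLS", 0.3))   # Poor
```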

    How Does Google Measure Core Web Vitals?

    Google primarily collects real-world data from actual Chrome users visiting your site — called field data or Chrome User Experience Report (CrUX) data. This means your score in Search Console reflects how real users have actually experienced your pages. A page can score well in Lighthouse but underperform in the field if real users are on slower devices or connections.

    Do Core Web Vitals Actually Affect Rankings?

    Yes — but not in the way many people imagine. Core Web Vitals are a tiebreaker signal, not a primary ranking factor. A highly relevant page with mediocre Core Web Vitals will still rank above a fast page with poor content. However, in competitive niches where multiple pages have similar quality, Core Web Vitals can tip the balance. The indirect effect is also significant: poor Core Web Vitals cause higher bounce rates, sending negative engagement signals back to Google.

    How to Improve Your Core Web Vitals

Improving LCP: Optimise your images — use WebP or AVIF format, compress without quality loss, and serve images at the right display dimensions. Use a CDN. Prioritise your hero image with fetchpriority="high". Improve your server response time (TTFB).

    Improving INP: Audit your JavaScript for long-running tasks that block the main thread. Defer non-critical scripts and audit third-party scripts (analytics, chat widgets, ad scripts) — these are a common culprit.

    Improving CLS: Always define explicit width and height attributes on images and video elements. Use font-display: optional or preload your fonts to reduce layout instability. Avoid inserting dynamic content above existing content unless you reserve space for it.

    Prioritising What to Fix

    If you have limited developer resources, prioritise anything scoring “Poor” first — these have the most negative impact. Focus on your highest-traffic and most commercially important pages. Look for patterns across the site — template-level changes often fix many pages at once. For the full context on where Core Web Vitals sit within your technical health, a technical SEO audit will help you prioritise effectively.

    Continue with the Technical SEO Series

    Core Web Vitals sit within the broader discipline of technical SEO. Read The Complete Guide to Technical SEO and explore the other guides on crawl errors, sitemaps, canonicalisation, mobile-first indexing, and structured data.

  • What is Canonicalisation and How Does it Affect SEO?

    Canonicalisation is one of those technical SEO concepts that sounds intimidating until you understand what problem it’s actually solving. Once you get it, it becomes one of the most useful tools in your technical toolkit. This guide explains what canonicalisation is, why it matters, and exactly how to implement it correctly.

    What Is Canonicalisation?

Canonicalisation is the process of telling search engines which version of a URL is the definitive, preferred version — the “canonical” version. It solves the problem of duplicate content: situations where the same or very similar content is accessible at multiple different URLs. The canonical tag — a <link rel="canonical"> element in the HTML head — is the primary tool. When you add it to a page, you’re saying: “This content exists at multiple URLs, but this is the one I want you to index and rank.”

    Why Does Duplicate Content Happen?

    Duplicate content is far more common than most website owners realise. Common causes include: HTTP and HTTPS versions both accessible, www and non-www versions both accessible, URL parameters generating near-identical product pages, paginated content without proper handling, printer-friendly versions of pages, and trailing slash variations (yourdomain.com/page vs yourdomain.com/page/). All of these are standard issues covered in a technical SEO audit.

    Why Duplicate Content Is a Problem

    When Google finds multiple versions of the same content, it has to decide which one to index and rank. If you don’t tell it, Google makes its own choice — and it might not choose the version you prefer. More significantly, if external links point to different versions of the same page, the authority is split rather than concentrated on one URL. Canonicalisation consolidates that authority on the version you want to rank.

    How to Implement Canonical Tags

The canonical tag goes in the <head> section of your HTML: <link rel="canonical" href="https://www.yourdomain.com/your-page/" />. Every page on your site should have a canonical tag — even pages with no duplicate versions. Self-referencing canonical tags are best practice. For duplicate pages, all duplicate versions should point to the preferred canonical URL. WordPress SEO plugins like Yoast or Rank Math handle canonical tags automatically, but it’s worth verifying they’re configured correctly.
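Verifying self-referencing canonicals is easy to automate. The sketch below pulls the canonical URL out of a page's HTML with a regex and compares it to the expected address; it assumes the rel attribute comes before href and is for illustration only, since a real audit tool would use a proper HTML parser (or Screaming Frog's own Canonicals tab).

```python
import re

def extract_canonical(html):
    """Return the canonical URL from a page's HTML, or None if absent.
    Regex sketch: assumes rel="canonical" appears before href."""
    m = re.search(
        r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']',
        html,
    )
    return m.group(1) if m else None

page = '<head><link rel="canonical" href="https://www.example.com/page/" /></head>'
canonical = extract_canonical(page)
print(canonical)
print("self-referencing:", canonical == "https://www.example.com/page/")
```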

    Canonical Tags vs. 301 Redirects

    A 301 redirect is the stronger signal — it permanently redirects one URL to another, and the old URL becomes inaccessible to users. Use it when you want to permanently consolidate two URLs. A canonical tag is a softer signal — a recommendation to Google, not a command. Use canonical tags when both URLs need to remain technically accessible (e.g., filter URLs for product pages) but you want one version indexed. Also use them when syndicating content to other sites. For more on how redirects work in practice, see our guide on fixing crawl errors in Google Search Console.

    Common Canonicalisation Mistakes

    Canonicalising to the wrong URL — always verify that canonical URLs return a 200 status code and aren’t themselves redirected. Using canonical tags inconsistently — implementation should be consistent across the entire site, ideally at template level. And using canonical tags to consolidate content on entirely different topics — canonical is a duplicate management tool, not a content consolidation strategy.

    Continue with the Technical SEO Series

    Canonicalisation sits alongside crawl management, page speed, sitemaps, and other technical disciplines as part of a complete SEO strategy. Read The Complete Guide to Technical SEO for the full overview, or explore the specific guides on Core Web Vitals, crawl errors, mobile-first indexing, and structured data.

  • XML Sitemaps: What They Are and How to Optimise Them

    An XML sitemap is one of the simplest technical SEO tools available to you — but also one of the most commonly misunderstood. Used correctly, it helps Google discover and understand your content faster. Used incorrectly, or ignored entirely, it can actively mislead search engines and undermine your technical SEO efforts.

    What Is an XML Sitemap?

    An XML sitemap is a file — usually at yourdomain.com/sitemap.xml — that lists the URLs on your website along with metadata: when each page was last modified, how frequently it changes, and its relative priority. Think of it as a map you hand directly to search engines. Rather than relying entirely on Googlebot to discover your pages by following links, the sitemap provides a direct inventory of what exists on your site.

    Important: submitting a sitemap doesn’t make pages rank. It helps Google know your pages exist and consider them for indexing. Whether they rank depends on their quality, relevance, and the authority of your site.

    Why XML Sitemaps Matter for SEO

    New websites benefit particularly from sitemaps because they have few external links pointing at them. Without a sitemap, Googlebot might not discover all their pages quickly. Large websites — e-commerce stores, news sites, databases — have too many pages for Google to reliably discover through link-following alone. Sites with isolated content — pages not well-linked internally — rely on the sitemap to get those pages found. Good internal linking and a clean sitemap work hand in hand; think of sitemaps as a complement to strong site architecture, not a substitute for it.

    What to Include in Your XML Sitemap

    Include only the canonical, indexable versions of pages you want Google to consider for ranking: key landing pages, service or product pages, blog posts, and category pages. Exclude pages with a noindex tag, pages returning non-200 status codes, duplicate or parameter-driven URLs, admin pages, and anything not intended for Google’s index. Including low-quality or redirected pages wastes Google’s time and signals poor site hygiene. This is one of the areas a technical SEO audit will typically flag quickly.

    How to Create and Submit Your XML Sitemap

    If you’re running WordPress, a sitemap is almost certainly already being generated automatically. Plugins like Yoast SEO, Rank Math, and All in One SEO all generate XML sitemaps. Check yours at yourdomain.com/sitemap.xml or yourdomain.com/sitemap_index.xml. For large sites with thousands of URLs, use a sitemap index file — a master file linking to multiple individual sitemaps, each limited to 50,000 URLs or 50 MB uncompressed.

    Submit your sitemap via Google Search Console: go to Indexing > Sitemaps and enter the URL. Google will show you how many URLs were submitted versus indexed, and flag any errors. Also reference your sitemap in your robots.txt file so other search engines like Bing can find it automatically.
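The robots.txt reference is a single Sitemap line, which sits independently of any user-agent rules. A minimal example with a placeholder domain:

```
User-agent: *
Disallow:

Sitemap: https://yourdomain.com/sitemap.xml
```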

    How to Optimise Your XML Sitemap

    Keep lastmod dates accurate — only update lastmod when page content has genuinely changed. If Google sees the date changed but content is identical, it’ll start ignoring your lastmod values. Don’t over-rely on priority — Google largely ignores it. Keep your sitemap current as you add or delete pages. Run it through a checker periodically to ensure all URLs return a 200 status code. Any URLs returning errors or redirects should be removed and addressed — see our guide on fixing crawl errors in Google Search Console.
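That periodic check is easy to script. Here is a minimal sketch using only the Python standard library; the sitemap URL in the usage comment is a placeholder, and sites using a sitemap index would need one extra level of parsing:

```python
# Sketch: pull every <loc> from a standard sitemap.org <urlset> and
# report any URL that doesn't return a 200.
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value from a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

def check_url(url: str, timeout: float = 10) -> int:
    """Return the final HTTP status for a URL (urllib follows redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

# Usage sketch (requires network access):
#   xml_text = urllib.request.urlopen("https://yourdomain.com/sitemap.xml").read().decode()
#   for url in extract_sitemap_urls(xml_text):
#       status = check_url(url)
#       if status != 200:
#           print(status, url)
```

Anything this flags as a redirect or error is a candidate for removal from the sitemap.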

    Common XML Sitemap Mistakes

    Including redirected URLs. Including noindex pages (which contradicts your own directives). Having an outdated sitemap referencing deleted pages. Forgetting to submit to Search Console. All of these send poor signals about site quality and waste Google’s crawl budget.

    Continue with the Technical SEO Series

    XML sitemaps are one piece of a comprehensive technical SEO strategy. Head back to The Complete Guide to Technical SEO to see how they fit into the bigger picture, and explore the other guides on crawl errors, Core Web Vitals, canonicalisation, page speed, mobile-first indexing, and structured data.

  • How to Fix Crawl Errors in Google Search Console

    Crawl errors are one of those technical SEO issues that can quietly suppress your rankings for months without you realising it. They happen when Google tries to visit a page on your website and gets back an unexpected or unhelpful response. The result is that pages you want ranking either don’t get indexed, or get indexed inconsistently — and pages with errors dilute your overall site quality signals.

    What Is a Crawl Error?

    A crawl error occurs when Googlebot attempts to access a URL on your website and encounters a problem — a page that no longer exists, a server that’s temporarily unavailable, a URL that redirects incorrectly, or a page blocked by robots.txt. Not all crawl errors are equally serious. A handful of 404s on old pages nobody links to is completely normal. But systematic errors affecting important pages, or a high volume of errors across the site, needs prompt attention.

    Where to Find Crawl Errors

    The primary place to find and monitor crawl errors is Google Search Console. Navigate to the Indexing section, then look at the Pages report. This shows which pages are indexed and groups non-indexed pages into categories explaining why. Also use a crawl tool like Screaming Frog or Sitebulb for a more granular view — broken internal links, redirect chains, unexpected status codes. These tools can also help you find and fix broken links across your entire site.

    The Most Common Crawl Errors and How to Fix Them

    404 Errors (Page Not Found)

    A 404 error means the server couldn’t find the page at that URL. 404s become a problem when they appear on URLs that other sites link to, that users have bookmarked, or that internal pages still link to — inbound links to 404 pages are wasted link equity. Fix: if the page has moved, set up a 301 redirect from the old URL to the new one. A 301 tells Google the page has permanently moved and passes the link equity along. Don’t redirect every 404 to your homepage — this creates a “soft 404.”
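On an Apache server, a 301 for a single moved page is one line in .htaccess (both paths below are hypothetical):

```
# Permanently redirect a moved page to its new home
Redirect 301 /old-service-page/ https://yourdomain.com/new-service-page/
```

On Nginx the equivalent is a `return 301` rule in the relevant location block, and WordPress users can manage the same thing with a redirect plugin.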

    Server Errors (5xx Status Codes)

    5xx errors indicate a server-side problem. Consistent 5xx errors can cause Google to reduce its crawl rate for your site, slowing how quickly new content gets indexed. Check your server logs to identify what’s triggering the errors. The culprit is usually hosting resources being maxed out, database connection limits, or a plugin causing server crashes. This typically requires your developer or hosting provider.
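If you have shell access, a short script can surface patterns in the logs before you escalate to your developer or host. A sketch that tallies 5xx responses per URL from an access log in the common/combined format; the regex is an assumption, so adjust it to your server's actual log format:

```python
# Sketch: count 5xx responses per request path from access-log lines
# shaped like: ... "GET /path HTTP/1.1" 500 1234
import re
from collections import Counter

LOG_LINE = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})')

def count_5xx(log_lines):
    """Return a Counter of request paths that produced 5xx responses."""
    errors = Counter()
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m and m.group("status").startswith("5"):
            errors[m.group("path")] += 1
    return errors
```

If one URL or template dominates the counts, that points to the faulty script or plugin; if the errors are spread evenly, suspect exhausted hosting resources.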

    Redirect Errors

    A redirect chain is when URL A redirects to URL B, which redirects to URL C. Each hop dilutes the link equity being passed — best practice is to redirect directly to the final destination. A redirect loop is when a chain circles back on itself, for example URL A redirects to URL B, which redirects back to URL A, trapping the crawler in an infinite cycle. Fix both by identifying them in Screaming Frog, then updating your .htaccess or CMS redirect settings.
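Screaming Frog can export redirects as a source-to-target list, and a few lines of Python will resolve each chain and flag loops. The mapping format here is an assumption about how you've structured the export:

```python
# Sketch: given a redirect map {source: target}, follow each URL to its
# final destination, count the hops, and detect loops.

def resolve_redirect(url, redirects):
    """Follow a redirect map; return (final_url, hops, is_loop)."""
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen:  # arrived back at a URL already visited: a loop
            return url, hops, True
        seen.add(url)
    return url, hops, False
```

Any result with more than one hop is a chain: update internal links and redirect rules to point straight at the final URL.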

    Blocked by Robots.txt

    If your robots.txt has a Disallow rule that matches a URL Googlebot is trying to crawl, it stops and records it as blocked. Robots.txt blocks are often intentional — but the problem is when they accidentally block pages you want indexed. This is most common after site migrations where old robots.txt rules are copied across without review. A technical SEO audit always includes a full robots.txt review.
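You can test suspect URLs against a robots.txt offline with Python's standard-library parser before deploying any changes. The rules and paths below are illustrative:

```python
# Sketch: which of these paths does this robots.txt block for Googlebot?
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, paths, agent="Googlebot"):
    """Return the subset of paths that robots_txt disallows for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]
```

Running your important URLs through a check like this after a migration catches copied-across Disallow rules before they cost you traffic.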

    Soft 404s

    A soft 404 is when a page returns a 200 status code but the content signals the page doesn’t exist — for example, a search results page returning “no results found.” Google detects these and excludes them from the index. Having lots of them wastes crawl budget and signals poor site quality. Fix: redirect to a relevant alternative, or return a proper 404 status.

    How to Prioritise Crawl Error Fixes

    Focus first on errors affecting important business pages. Fix errors on pages that have inbound links — a 404 with backlinks is wasted authority that a redirect can recover. Investigate patterns of 5xx errors as these indicate server problems that worsen over time. Then clean up redirect chains and loops systematically.

    Keeping on Top of Crawl Errors Ongoing

    Check Google Search Console’s Pages report at least once a month and run a fresh crawl whenever you make significant changes. Set up email alerts in Search Console so you’re notified of new crawl issues. That way, you catch problems early rather than discovering months later that a key page has been returning a 404 since your last site update.

    More from the Technical SEO Series

    Crawl errors are one piece of the technical SEO puzzle. For a complete picture, read The Complete Guide to Technical SEO and explore the rest of the series covering Core Web Vitals, sitemaps, canonicalisation, page speed, mobile-first indexing, and structured data.

  • What is a Technical SEO Audit? (And How to Do One)

    A technical SEO audit is the diagnostic process every good SEO strategy should start with. Before you write a single word of new content or chase a single backlink, you need to know whether Google can actually access, understand, and rank your website properly. Without an audit, you’re guessing. With one, you have a clear picture of exactly what’s working, what isn’t, and where to focus your energy first.

    What Is a Technical SEO Audit?

    A technical SEO audit is a structured review of your website’s technical health from a search engine’s perspective. It looks at how crawlers interact with your site — whether they can find your pages, whether those pages are being indexed correctly, how fast they load, how they perform on mobile, and dozens of other factors that influence where your site appears in search results. It’s specifically about the infrastructure of your website — the stuff underneath the content. Done properly, it surfaces issues that might be silently suppressing your rankings.

    Why You Need One Before Anything Else

    Here’s a scenario that comes up more than you’d think. A business invests months producing excellent blog content. They’ve targeted good keywords, written genuinely useful articles, and earned quality backlinks. But rankings barely move. Nine times out of ten, the culprit is a technical issue. A robots.txt misconfiguration quietly blocking sections of the site. A canonical tag pointing in the wrong direction, splitting authority across duplicate pages. A site scoring so poorly on Core Web Vitals that it’s deprioritised before it gets the chance to compete. Think of a technical audit as the structural survey you’d commission before renovating a house — not glamorous, but essential.

    How to Do a Technical SEO Audit: Step by Step

    Step 1: Start with Google Search Console

    Google Search Console is free and gives you information straight from Google about how it sees your website. In the Pages report under Indexing (formerly the Coverage report), you’ll find which pages Google has indexed, which it’s excluded, and which have errors. Also check the Core Web Vitals report — any pages in the “Poor” category should be treated as a priority. For issues with crawling, the crawl errors guide walks you through how to address each type.

    Step 2: Crawl Your Site

    Run a full crawl using a dedicated tool. The most widely used are Screaming Frog (free up to 500 URLs), Sitebulb, or the site audit tools built into Ahrefs or Semrush. A crawl tool visits every page on your site — just like Googlebot would — and returns a detailed report: broken links, duplicate title tags, redirect chains, thin content, images without alt text. Screaming Frog is also invaluable for finding broken links across your entire site.
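Under the hood, every crawl tool repeats the same loop: fetch a page, extract its links, queue the internal ones it hasn't seen yet. A stripped-down sketch of the link-extraction step using only the Python standard library:

```python
# Sketch: collect every <a href> from a page's HTML, the core of what
# a crawler does before queueing URLs to visit next.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from every <a> tag in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

A real crawler also resolves relative URLs with urljoin, respects robots.txt, records status codes, and caps its request rate, which is exactly what the dedicated tools handle for you.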

    Step 3: Audit Your Indexation

    Do a sanity check on how many of your pages Google is actually indexing. Type site:yourdomain.com into Google; the result count is only a rough estimate, but compare it to the number of indexable pages your crawl tool found. A significant mismatch is a red flag that pages are being blocked, or that Google is finding many pages it doesn’t think are worth indexing.

    Step 4: Check Your Robots.txt and Sitemap

    Your robots.txt file lives at yourdomain.com/robots.txt. Read through it carefully for any Disallow rules that might accidentally block pages you want Google to crawl. After a site migration, outdated robots.txt rules are one of the most common causes of sudden traffic drops. For sitemaps, read our detailed guide on XML sitemaps.

    Step 5: Assess Page Speed and Core Web Vitals

    Run your key pages through Google’s PageSpeed Insights. This gives you Core Web Vitals scores along with specific recommendations. Focus on your most important pages first: homepage, main service pages, and highest-traffic blog posts. See our full guide on how to improve page speed for SEO for actionable fixes.

    Step 6: Check Mobile Usability

    Google has retired the dedicated Mobile Usability report in Search Console, but mobile-friendliness matters as much as ever: audit your key pages with Lighthouse in Chrome DevTools, which flags issues like small tap targets, unreadable font sizes, and content wider than the viewport. Given that Google uses the mobile version of your site as its primary version for indexing, these issues directly affect your rankings.

    Step 7: Review Site Structure and Internal Linking

    Map out your site structure. Are your most important pages reachable within two or three clicks from the homepage? Are there orphaned pages with no internal links pointing to them? Internal links are how PageRank flows through your site — a deliberate internal linking strategy is one of the highest-leverage actions available to you.
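With an internal-link export from your crawl tool, click depth is a simple breadth-first search. The graph format below (page mapped to the list of pages it links to) is an assumption about how you've shaped the export:

```python
# Sketch: compute click depth from the homepage over an internal-link
# graph. Pages missing from the result are orphaned: no internal path
# from the homepage reaches them.
from collections import deque

def click_depths(graph, start="/"):
    """Breadth-first search returning {page: clicks from start}."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths
```

Any important page with a depth above three, or absent from the result entirely, needs new internal links pointing at it.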

    Step 8: Look for Duplicate Content Issues

    Duplicate content means Google is finding multiple versions of the same content and doesn’t know which version to rank. Common causes: HTTP vs HTTPS, www vs non-www, URL parameters creating multiple versions, paginated content without proper handling. The fix is usually canonical tags or 301 redirects; Google has retired Search Console’s URL Parameters tool, so parameter duplication is now best handled with canonicals and consistent internal linking.
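The same logic your canonical tags encode can be applied when deduplicating a URL list from a crawl export. A sketch that normalises the common variants; the tracking-parameter list is an assumption, and you should keep any parameters that genuinely change page content:

```python
# Sketch: collapse URL variants (http/https, www/non-www, tracking
# parameters) to one canonical form for comparison.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}

def canonicalise(url):
    """Return a normalised https, non-www, tracking-free form of a URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query)
                       if k not in TRACKING_PARAMS])
    return urlunsplit(("https", host, parts.path or "/", query, ""))
```

Two crawled URLs that canonicalise to the same string are duplicates to resolve with a canonical tag or redirect.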

    What to Do With Your Audit Findings

    An audit is only useful if it leads to action. Create a prioritised list of issues grouped by impact. Critical issues affecting crawlability or indexation go at the top. Quick wins with minimal development effort come next. More complex improvements are planned in phases. Document everything with recommended fixes and who is responsible.

    How Often Should You Audit Your Site?

    Technical SEO is not a one-time exercise. Run a full technical audit at least once a year, with lighter monthly checks using Search Console and your crawl tool. After any significant site change — a migration, a domain change, a major redesign — run an audit immediately. These events are the most common triggers for sudden traffic drops.

    Ready to Go Deeper?

    This guide gives you the framework. For a fuller understanding of each area, work through the rest of the technical SEO series. If you’d rather have a professional do this for you, our SEO consulting service includes a thorough technical audit as the starting point for every engagement.