The cost of indexing a rapidly expanding web

In March 2022, Gary Illyes shared that 60% of the internet is duplicate content.

Crawling the internet is expensive. Google has 23 data centers across the globe. Together Amazon, Microsoft and Google now account for more than 50% of the world’s largest data centers.

And the portions allocated to Google Search are filling up. While other search engines have joined forces for IndexNow, Google experimented with joining the shared data set back in January 2023 but has since rejected the sustainable effort.

As of November 2022, Google is not listed as an IndexNow participant.

SEOs across the internet have been grumbling for well over a year about indexation delays and wild fluctuations.

Fundamentally, SEOs need to understand that search engines are businesses. Google doesn't crawl your site to be altruistic.

They want to find pages that answer user intents to display in the SERPs. It costs very real resources to crawl your site.

Think of it like a vending machine.

Google puts a dollar in and expects a candy bar out. If that only happens one time in five, they're going to find a more reliable vending machine, one that provides more consistently useful results.

The Helpful Content Update

The Helpful Content Update started rolling out in late August 2022 with a peppy name and a sitewide impact:

“This update introduces a new site-wide signal that we consider among many other signals for ranking web pages. Our systems automatically identify content that seems to have little value, low-added value or is otherwise not particularly helpful to those doing searches.”

The sunshine and buzzsaws algorithm is aiming to carve away crawl budget (and Google WRS resources) from unhelpful sites.

"Any content — not just unhelpful content — on sites determined to have relatively high amounts of unhelpful content overall is less likely to perform well in Search, assuming there is other content elsewhere from the web that's better to display. For this reason, removing unhelpful content could help the rankings of your other content."

Update Mechanics

Lily Ray pulled out 4 key takeaways about how the update works from a conversation with Google Search Liaison Danny Sullivan.

Mechanics disclosed include:

  1. The update applies a classifier to sites with a disproportionate amount of unhelpful content (we knew this part). This causes sitewide negative rankings.

  2. If the update didn't impact your site, it means your content is helpful - no classifier was applied.

  3. The classifier is using machine learning, so it's going to get smarter and better all the time. This could potentially lead to broader impacts in the future.

  4. Danny noted that the classifier can be ramped up or down as a signal during future core updates. This will be interesting to pay attention to.

Indexing Implications of Helpful Content Update

The newly added documentation on technical requirements lays out the low bar for content to be indexed in Google Search:

  1. Googlebot isn't blocked.

  2. The page works, meaning that Google receives an HTTP 200 (success) status code.

  3. The page has indexable content.

The article stars the takeaway:

"Just because a page meets these requirements doesn't mean that a page will be indexed; indexing isn't guaranteed."
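The three requirements are simple enough to sketch as a pre-publication check. The snippet below is only an illustration, not Google's logic: the function name `meets_minimum_requirements` is hypothetical, and "indexable content" is simplified here to "has some visible text and no noindex robots meta tag", which is an assumption rather than Google's definition. It takes the robots.txt body, status code and HTML as inputs, so you can plug in whatever fetching you already do.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _PageScan(HTMLParser):
    """Collects robots meta directives and counts visible text characters."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.text_chars = 0
        self._skip = 0  # depth inside <script>/<style>, whose text is not content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("robots", "googlebot"):
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_chars += len(data.strip())


def meets_minimum_requirements(url, robots_txt, status_code, html):
    """Hypothetical helper: report each of the three requirements separately."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    scan = _PageScan()
    scan.feed(html)
    return {
        "googlebot_not_blocked": robots.can_fetch("Googlebot", url),
        "http_200": status_code == 200,
        # Simplifying assumption: some visible text and no noindex directive.
        "indexable_content": scan.text_chars > 0 and not scan.noindex,
    }
```

A page that passes all three checks has only cleared the low bar. As the quote above says, that does not mean it will be indexed.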

With resources scarce and Google's algorithm designed to pull crawl budget away from low ROI resources, the question is clear:

Will getting content indexed be the next big challenge?

We know the search engine is already making judgment calls on which URLs it includes in the index based on:

  1. Direct duplication of the page content

  2. Duplication of the value proposition

  3. Duplication of user satisfaction, e.g., the content serves the same purpose as other content on the website on a different URI when it comes to answering a user query
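Only the first item on that list, direct duplication, is easy to check mechanically. A minimal sketch, assuming you already have each URL's main content extracted as plain text (the `pages` dict and both function names are hypothetical): hash the normalized text and group URLs that share a hash. Duplicated value propositions and duplicated user satisfaction will not show up in a hash and still need a human review.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(text):
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_direct_duplicates(pages):
    """Return lists of URLs whose content is identical after normalization.

    `pages` is a dict of URL -> extracted main content (hypothetical input).
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[content_fingerprint(text)].append(url)
    # Keep only fingerprints shared by two or more URLs.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

Exact hashing only catches pages that match after normalization. Near-duplicates with small template differences would need a fuzzier comparison.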

SEOs began reporting issues with site indexing in November 2021. In December, Tomek Rudzki published an article detailing how Google appeared to be forgetting about URLs in its indexing queue.

Indexing issues reemerged in July 2022, when a large number of sites experienced delayed indexing.

This indexing issue occurred at the same time Google launched the Video Index. While the link between these events is only correlative, the inference that resources are limited is reasonable.

Sites are now competing for crawling and WRS resources as Google prunes its index.

You can see the judgment calls Google is already making about which URLs to include in the index by looking at the Index Coverage Report.

Looking toward the future of indexing

In order to thrive in an index coverage drought, sites must be intentional with the resources they present for indexing. This applies not only at the page level but also site-wide.

If a new dynamic functionality will generate 100 new landing pages but only 4 of them are unique and valuable to human users, then the overall balance of the site is negatively impacted.
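To put rough numbers on that balance, here is a back-of-the-envelope sketch. The 100 new pages and 4 valuable ones come from the example above. The existing site of 200 pages with 150 helpful ones is purely hypothetical, and a simple ratio like this is only an illustration, not how Google's classifier scores a site.

```python
def helpful_share(helpful_pages, total_pages):
    """Fraction of a site's pages that are unique and valuable to users."""
    return helpful_pages / total_pages


# Hypothetical existing site: 200 pages, 150 of them helpful.
before = helpful_share(150, 200)  # 0.75

# Launch the feature: 100 new landing pages, only 4 unique and valuable.
after = helpful_share(150 + 4, 200 + 100)  # about 0.51

print(f"Helpful share before launch: {before:.0%}")
print(f"Helpful share after launch:  {after:.0%}")
```

In this made-up case a single feature launch drags the site from three-quarters helpful to roughly half.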

Pages should meet the minimum requirements for publication. These will be unique to the site and its value proposition.

Clearing out cruft (non-useful content) now can give you better visibility into how Google is crawling and indexing your site. Properly segmented sitemaps can be a useful tool in identifying which areas of the site are experiencing indexing issues.
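One way to segment sitemaps is by site section. The sketch below assumes the first path segment of a URL is a reasonable proxy for a section (`/blog/`, `/products/` and so on), which will not hold for every site. The function names are hypothetical, and the output is a bare-bones sitemap with only `<loc>` entries.

```python
from collections import defaultdict
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def segment_urls(urls):
    """Group URLs by their first path segment ("root" for the homepage)."""
    segments = defaultdict(list)
    for url in urls:
        parts = [p for p in urlparse(url).path.split("/") if p]
        segments[parts[0] if parts else "root"].append(url)
    return dict(segments)


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(urls):
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


urls = [
    "https://example.com/",
    "https://example.com/blog/a",
    "https://example.com/blog/b",
    "https://example.com/products/1",
]
for name, members in segment_urls(urls).items():
    print(name, len(members))
```

Submitting each segment as its own sitemap file lets you compare indexing by section rather than looking at one site-wide total.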

Sites with high amounts of dynamically generated content and those writing pages/primary content blocks for search engines will be most impacted.

Published on 1/2/2026 by Jamie Indigo