
Why Google Doesn't Index Some Pages on Your Site

If you work with a website, especially a large one, you've probably noticed that not all of its pages end up in Google's index. There may be several reasons for this.

Many SEOs still believe that pages stay out of the index only because of technical problems, but that's a myth. It is true, though, that Google may not index your pages if you don't send consistent technical signals about which pages you want indexed.

As for other technical issues: while things like JavaScript really do make indexing harder, your site can suffer from serious indexing issues even if it's written in pure HTML.

Reasons Why Google Doesn't Index Your Pages

An analysis of the world's most popular online stores found that, on average, 15% of their product pages cannot be found on Google.

That is a striking result. Why does Google decide not to index pages that, from a technical standpoint, could be indexed?

Google Search Console reports several statuses for unindexed pages, such as "Crawled – currently not indexed" or "Discovered – currently not indexed." While this information doesn't solve the problem by itself, it's a good starting point for diagnosis.

Main problems with indexing

The most common indexing issues reported by Google Search Console are:

1. "Crawled – currently not indexed"

In this case, Google visited the page but didn't index it.

In my experience, this is usually a content quality issue. Given the ongoing e-commerce boom, it's no surprise that Google has become more demanding about site quality. So if you notice that your pages are "Crawled – currently not indexed," make sure the content on those pages offers unique value:

  • Use unique titles, descriptions, and body text on every page you want indexed (a quick way to spot duplicates is sketched after this list).
  • Avoid copying product descriptions from external sources.
  • Use canonical tags to consolidate duplicate content.
  • Keep Google away from low-quality sections of your site: block crawling with a robots.txt file or block indexing with a noindex tag.
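
To act on the first two points at scale, it helps to check your pages programmatically. Below is a minimal Python sketch, assuming the requests and beautifulsoup4 packages are installed; urls.txt is a hypothetical file with one URL per line, and the script simply flags titles and meta descriptions that appear on more than one page.

  # find_duplicate_meta.py -- a rough sketch, not production code
  from collections import defaultdict

  import requests
  from bs4 import BeautifulSoup

  def extract_title_and_description(html):
      """Return the <title> text and meta description of a page, if present."""
      soup = BeautifulSoup(html, "html.parser")
      title = soup.title.get_text(strip=True) if soup.title else ""
      meta = soup.find("meta", attrs={"name": "description"})
      description = meta.get("content", "").strip() if meta else ""
      return title, description

  def main():
      titles = defaultdict(list)        # title text -> URLs that use it
      descriptions = defaultdict(list)  # description text -> URLs that use it

      with open("urls.txt") as f:       # hypothetical file, one URL per line
          urls = [line.strip() for line in f if line.strip()]

      for url in urls:
          response = requests.get(url, timeout=10)
          title, description = extract_title_and_description(response.text)
          if title:
              titles[title].append(url)
          if description:
              descriptions[description].append(url)

      for title, pages in titles.items():
          if len(pages) > 1:
              print(f"Duplicate title on {len(pages)} pages: {title}")
      for description, pages in descriptions.items():
          if len(pages) > 1:
              print(f"Duplicate description on {len(pages)} pages: {description}")

  if __name__ == "__main__":
      main()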

2. "Discovered – currently not indexed"

This status can cover everything from crawling issues to insufficient content quality. It's a serious problem, especially for large online stores, and it can affect tens of millions of URLs on a single site.

Google may report that e-commerce product pages are "Discovered – currently not indexed" for the following reasons:

  • Crawl budget issues: there may be too many URLs in the crawl queue, so some of them will only be crawled and indexed later.
  • Quality issues: Google may decide that some pages on the domain aren't worth crawling, based on patterns it recognizes in their URLs, and choose not to visit them.

It takes some experience to deal with this problem. If you find that your pages are "Discovered – currently not indexed," do the following:

  1. Determine whether there are page templates that fall into this category (a sketch for grouping affected URLs by template follows this list). Maybe the problem is tied to a specific product category that gets no internal links? Or maybe a huge portion of your product pages are simply waiting in the queue to be indexed?
  2. Optimize your crawl budget. Focus on detecting low-quality pages that Google spends a lot of time crawling. The usual suspects are filtered category pages and internal search pages – they can easily run into tens of millions of URLs on a typical e-commerce site. If Googlebot is free to crawl them, it may not have the resources left to reach the valuable content on your site and get it indexed.
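
Here is a minimal Python sketch for the template check mentioned in point 1. It assumes you've exported the affected URLs from Google Search Console into a hypothetical file called discovered_not_indexed.csv, one URL per line, and it groups them by their first path segment to show which sections of the site are hit hardest.

  # group_urls_by_template.py -- a rough sketch for spotting URL patterns
  from collections import Counter
  from urllib.parse import urlparse

  def template_of(url):
      """Reduce a URL to a coarse template: its first path segment."""
      segments = [s for s in urlparse(url).path.split("/") if s]
      if not segments:
          return "/"
      return "/" + segments[0] + "/*"

  # Hypothetical export from Google Search Console, one URL per line.
  with open("discovered_not_indexed.csv") as f:
      urls = [line.strip() for line in f if line.strip()]

  counts = Counter(template_of(url) for url in urls)

  # Show the most affected sections of the site first.
  for template, count in counts.most_common(20):
      print(f"{count:>8}  {template}")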

3. Duplicate content

Duplicate content can have various causes, for example:

  • Language variants (for example, English in the United Kingdom, the United States, or Canada). If you have multiple versions of the same page targeting different countries, some of those pages may not be indexed.
  • Duplicate content used by your competitors. This often happens in e-commerce, when multiple sites use the same product description provided by the manufacturer.

Aside from using rel=canonical, 301 redirects, or writing unique content, I would focus on providing unique value to users. Fast-growing-trees.com is a good example: instead of boring descriptions and generic planting and watering tips, the site offers a detailed FAQ for many products.

In addition, you can easily compare similar products.

How to Check Your Website's Indexing

You can easily check how many pages of your site aren't indexed by opening the indexing report in Google Search Console.

The first thing to pay attention to is the number of excluded pages. Then try to find a pattern: what types of pages aren't indexed?

If you run an online store, you'll most likely see unindexed product pages. While this should always be a warning sign, you can't expect all of your product pages to be indexed, especially on a large site. A large online store will inevitably contain duplicate pages, as well as products that have expired or gone out of stock. These pages may lack the quality that would put them at the front of Google's indexing queue (and that's if Google decides to crawl them at all).

In addition, large online stores tend to have crawl budget problems. I've seen stores with over a million products where 90% of them were classified as "Discovered – currently not indexed." But if you see important pages being excluded from Google's index, you should be seriously concerned.

How to Make Your Pages More Likely to Be Indexed by Google

Each site is different and can have different indexing issues. However, here are tips to help your pages get indexed:

1. Avoid "Soft 404" errors.

Make sure there's nothing on your pages that might falsely suggest a soft 404 status. This includes anything from using "Not Found" or "Not Available" in the copy to having "404" in the URL.
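
To find likely candidates, you can look for pages that return a 200 status code but contain "not found" style wording. A minimal Python sketch, assuming requests is installed and urls.txt is a hypothetical file with one URL per line:

  # soft_404_check.py -- a rough heuristic, not a definitive test
  import requests

  SUSPICIOUS_PHRASES = ["not found", "not available", "no longer exists"]

  with open("urls.txt") as f:  # hypothetical file, one URL per line
      urls = [line.strip() for line in f if line.strip()]

  for url in urls:
      response = requests.get(url, timeout=10)
      body = response.text.lower()
      # A real 404 or 410 status is fine; the risk is a 200 page that looks like an error.
      if response.status_code == 200 and (
          any(phrase in body for phrase in SUSPICIOUS_PHRASES) or "404" in url
      ):
          print(f"Possible soft 404: {url}")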

2. Use internal links.

Internal links are one of the key signals telling Google that a page is an important part of the site and deserves to be indexed. Don't leave orphan pages in your site structure, and don't forget to include every page you want indexed in your sitemaps. Internal linking is also one of the elements of continuous, kaizen-style site improvement.
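
One way to find pages with no internal links is to crawl the URLs listed in your sitemap and record which of them are never linked from another page. The Python sketch below makes simplifying assumptions: the sitemap lives at the placeholder address https://example.com/sitemap.xml, it is a flat sitemap rather than a sitemap index, URL normalization is minimal, and the requests, beautifulsoup4, and lxml packages are installed.

  # find_orphan_pages.py -- a rough sketch; a real crawl needs politeness, retries, etc.
  from urllib.parse import urljoin, urldefrag

  import requests
  from bs4 import BeautifulSoup

  SITEMAP_URL = "https://example.com/sitemap.xml"  # placeholder sitemap location

  # 1. Collect the URLs listed in the sitemap (assumes a flat sitemap, not an index file).
  sitemap = BeautifulSoup(requests.get(SITEMAP_URL, timeout=10).text, "xml")
  sitemap_urls = {loc.get_text(strip=True) for loc in sitemap.find_all("loc")}

  # 2. Fetch each page and record every link it contains, resolved to an absolute URL.
  linked_urls = set()
  for url in sitemap_urls:
      soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
      for a in soup.find_all("a", href=True):
          linked_urls.add(urldefrag(urljoin(url, a["href"])).url)

  # 3. Sitemap URLs that no crawled page links to are orphan candidates.
  # (Trailing slashes and other normalization quirks are ignored in this sketch.)
  for url in sorted(sitemap_urls - linked_urls):
      print(f"Possible orphan page: {url}")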

3. Implement a sound crawling strategy.

Don't let Google crawl your site however it wants. If too many resources are spent crawling the less valuable parts of your domain, it may take too long for Google to reach the pages that matter. Analyzing your server logs can give you a complete picture of what Googlebot is crawling and how to optimize it.
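
A quick way to see where Googlebot spends its time is to count its requests per site section in your access logs. Below is a minimal Python sketch; the file name access.log and the common/combined log format are assumptions, and verifying that requests really come from Google (for example via reverse DNS) is left out for brevity.

  # googlebot_log_report.py -- a rough sketch for a common/combined log format
  import re
  from collections import Counter

  LOG_FILE = "access.log"  # hypothetical log file name
  # Matches the request part of a log line, e.g. "GET /some/path HTTP/1.1"
  REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[^"]*"')

  hits = Counter()
  with open(LOG_FILE, encoding="utf-8", errors="replace") as f:
      for line in f:
          if "Googlebot" not in line:  # crude filter; a real check should verify the IP
              continue
          match = REQUEST_RE.search(line)
          if not match:
              continue
          # Group by the first path segment to see which site sections get crawled most.
          path = match.group("path").split("?")[0]
          segments = [s for s in path.split("/") if s]
          section = "/" + segments[0] if segments else "/"
          hits[section] += 1

  for section, count in hits.most_common(20):
      print(f"{count:>8}  {section}")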

4. Eliminate low-quality and duplicate content.

Every big site ends up with pages that shouldn't be indexed. Make sure these pages don't make it into your sitemaps, and use the noindex tag and robots.txt file if necessary. If you allow Google to spend too much time in the worst parts of your site, it may underestimate the overall quality of your domain.
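
If you keep a list of URLs that should stay out of the index, you can verify that each one is actually blocked from crawling by robots.txt or carries a noindex directive. A minimal Python sketch, assuming requests and beautifulsoup4 are installed and do_not_index.txt is a hypothetical file with one URL per line:

  # check_exclusions.py -- a rough sketch: are these URLs blocked or noindexed?
  from urllib import robotparser
  from urllib.parse import urlparse

  import requests
  from bs4 import BeautifulSoup

  def has_noindex(url):
      """Return True if the page sends a noindex signal via header or meta tag."""
      response = requests.get(url, timeout=10)
      if "noindex" in response.headers.get("X-Robots-Tag", "").lower():
          return True
      soup = BeautifulSoup(response.text, "html.parser")
      meta = soup.find("meta", attrs={"name": "robots"})
      return bool(meta and "noindex" in meta.get("content", "").lower())

  with open("do_not_index.txt") as f:  # hypothetical list of URLs to keep out of the index
      urls = [line.strip() for line in f if line.strip()]

  robots = {}  # one robots.txt parser per host
  for url in urls:
      parsed = urlparse(url)
      host = f"{parsed.scheme}://{parsed.netloc}"
      if host not in robots:
          parser = robotparser.RobotFileParser(host + "/robots.txt")
          parser.read()
          robots[host] = parser

      # robots.txt controls crawling, noindex controls indexing; a page with neither
      # is a candidate for ending up in the index.
      crawl_blocked = not robots[host].can_fetch("Googlebot", url)
      if not crawl_blocked and not has_noindex(url):
          print(f"Neither blocked nor noindexed: {url}")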

5. Send consistent SEO signals.

One common example of sending inconsistent SEO signals to Google is changing canonical tags with JavaScript. As Google's Martin Splitt noted during a JavaScript SEO Office Hours session, you can never be sure what Google will do if you have one canonical tag in the source HTML and a different one after JavaScript rendering.
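
One way to catch this particular inconsistency is to compare the canonical tag in the raw HTML with the one present after rendering. The sketch below assumes the requests, beautifulsoup4, and playwright packages are installed (with Chromium set up via "playwright install"), and the URL is a placeholder.

  # compare_canonicals.py -- a rough sketch comparing source vs. rendered canonicals
  import requests
  from bs4 import BeautifulSoup
  from playwright.sync_api import sync_playwright

  URL = "https://example.com/some-page"  # placeholder URL

  def canonical_from_html(html):
      """Return the href of the first rel=canonical link tag, if any."""
      soup = BeautifulSoup(html, "html.parser")
      for link in soup.find_all("link"):
          rel = link.get("rel") or []
          rel = [rel] if isinstance(rel, str) else rel
          if "canonical" in [value.lower() for value in rel]:
              return link.get("href")
      return None

  # Canonical as it appears in the raw HTML response.
  source_canonical = canonical_from_html(requests.get(URL, timeout=10).text)

  # Canonical after JavaScript has run in a headless browser.
  with sync_playwright() as p:
      browser = p.chromium.launch()
      page = browser.new_page()
      page.goto(URL)
      rendered_canonical = canonical_from_html(page.content())
      browser.close()

  if source_canonical != rendered_canonical:
      print(f"Inconsistent canonicals: source={source_canonical}, rendered={rendered_canonical}")
  else:
      print(f"Consistent canonical: {source_canonical}")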

The Internet is getting too big

Over the past couple of years, Google has made a giant leap in how it handles JavaScript, making life easier for SEOs. These days, it's less common to see JavaScript-based sites that aren't indexed simply because of the technical stack they use.

But can we expect the same for indexing issues that have nothing to do with JavaScript? I don't think so. The Internet is constantly growing: every day new sites appear, and existing ones keep growing. Will Google be able to keep up?

This question pops up from time to time. Here is how Google itself puts it:

"Google has a limited number of resources, so when it encounters the almost infinite amount of content available on the web, Googlebot can only find and crawl a portion of that content. Then, of the content we've crawled, we can only index a portion of it."

In other words, Google can only visit a fraction of all the pages on the web and index an even smaller portion. And even if your website is great, you have to keep that in mind.

Google probably won't visit all the pages of your site, even if it's relatively small. Your job is to make sure that Google can discover and index pages that are important to your business.