In short, if you are building doorway-type pages without unique content on them, Google won’t index them all properly. And if you are sloppy and also produce thin pages on the site, Google won’t exactly reward that behaviour either.
A content performance analysis will gauge how well each section of the site performs. If Google is de-indexing large swaths of content that you have actually submitted in an XML sitemap, then a problem is often afoot. Some URLs are simply no longer welcome in Google’s index as part of your website content. You can use a noindex directive on low-quality pages if page quality cannot be improved.
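The noindex mentioned above is a robots meta tag placed in the page’s `<head>`. A minimal example (the page URL is hypothetical):

```html
<!-- On a low-quality page you want removed from the index,
     but still crawled so its internal links are followed: -->
<meta name="robots" content="noindex,follow">
```

Note that Google must be able to crawl the page to see this tag, so do not also block the URL in robots.txt.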
I usually start with a performance analysis that involves merging data from a physical crawl of a website with analytics data and Google Search Console data. A content type analysis will identify the types of pages the CMS generates.
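The merge itself is straightforward once both datasets are keyed on URL. A minimal sketch in plain Python, with entirely hypothetical URLs and click counts standing in for a real crawl export and a real Search Console export:

```python
# Hypothetical data: HTTP statuses from a site crawl, and organic
# clicks from a Google Search Console performance export.
crawl = {"/a": 200, "/b": 200, "/c": 200, "/d": 404}   # url -> HTTP status
gsc_clicks = {"/a": 120, "/b": 3}                      # url -> organic clicks

# Merge on URL: every crawled URL survives, with 0 clicks
# when Search Console has no row for it.
merged = {url: (status, gsc_clicks.get(url, 0)) for url, status in crawl.items()}

# Candidate "dead" pages: crawlable (200) but earning zero organic clicks.
dead = [url for url, (status, clicks) in merged.items()
        if status == 200 and clicks == 0]
print(dead)  # ['/c']
```

In practice you would load both sides from CSV exports, but the join logic is the same: a left join from the crawl, so pages Google ignores stand out.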
An XML sitemap is a file on your server with which you can help Google easily crawl and index all the pages on your site. This is evidently useful for very large sites that publish lots of new content or update content regularly.
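For reference, a minimal sitemap file looks like this (the domain and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/some-page/</loc>
    <lastmod>2024-01-10</lastmod>
  </url>
</urlset>
```

The `<lastmod>` dates are what make a sitemap useful as a discovery method for recently updated content.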
Ultimately the recommendation is to focus on “improving content”, as “you have the potential to go further down if you remove that content”. REMEMBER – DEAD PAGES are only one aspect of a site review.
Google has said very recently that XML sitemaps and RSS feeds are still a very useful discovery method for picking out recently updated content on your site. You should check that the pages you want indexed are included in your sitemap’s list of URLs.
There’s going to be a large percentage of any site that gets a little organic traffic but still severely underperforms, too – tomorrow’s DEAD pages. Judicious use of the ‘noindex,follow’ directive in robots meta tags, and sensible use of the canonical link element, are required on most sites I see these days. False positives aside, once you identify the pages receiving no traffic, you very largely isolate the type of pages on your site that Google doesn’t rate – for whatever reason. The thinking is that if the pages were high-quality, they would be getting some kind of organic traffic. If you have 100,000 pages on a site, and only 1,000 pages get organic traffic from Google over a 3-6 month period, you can make the argument that 99% of the site is rated as ‘crap’.
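The arithmetic behind that 99% figure is simply the share of URLs earning no organic clicks over the measurement window:

```python
# Worked example from the paragraph above: 100,000 pages on the site,
# only 1,000 of them getting any organic traffic over 3-6 months.
total_pages = 100_000
pages_with_traffic = 1_000

dead_share = (total_pages - pages_with_traffic) / total_pages
print(f"{dead_share:.0%} of the site gets no organic traffic")  # 99%
```

Run the same calculation on your own crawl and Search Console numbers to see what proportion of your site Google is effectively ignoring.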
A good 404 page and proper setup prevent a lot of this from happening in the first place. A poor 404 page, and user interaction with it, can only lead to a ‘poor user experience’ signal at Google’s end, for a number of reasons. I will highlight a poor 404 page in my audits and actually programmatically look for signs of this issue when I scan a site. I think, rather, that any rating would be a second-order scoring that includes data from user activity on the SERPs – stuff we as SEOs can’t see.
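One programmatic sign to look for is the ‘soft 404’: a missing page that answers HTTP 200 instead of a proper 404 status. A minimal sketch of the classification step – the heuristic and the title strings here are my own assumptions, not Google’s actual logic:

```python
# Crude "soft 404" heuristic: a URL that plainly should not exist
# answers 200, and/or the page title reads like an error page.
ERROR_HINTS = ("not found", "page not found", "404", "error")

def looks_like_soft_404(status_code: int, title: str) -> bool:
    """True when a 'missing' URL answers 200 but reads like an error page."""
    return status_code == 200 and any(hint in title.lower() for hint in ERROR_HINTS)

print(looks_like_soft_404(200, "Page Not Found | Example"))  # True
print(looks_like_soft_404(404, "Page Not Found | Example"))  # False - a real 404 is correct
```

In a real scan you would request a deliberately nonsense URL on the site and feed the response status and title through a check like this; a true 404 status is the correct, healthy answer.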
If they are, this could be indicative of a page quality issue. Review how you use canonical link elements throughout the site. Identify your primary content assets and improve them instead of optimising low-quality pages. What you do to handle paginated content will depend on your circumstances. If you are making websites and want them to rank, the Quality Raters’ Guidelines document is a great guide for webmasters to avoid low-quality ratings and potentially avoid punishment algorithms. If you properly deal with mishandled 404 errors that still have some link equity, you reconnect equity that was once lost – and this ‘backlink reclamation’ evidently has value. Google doesn’t want to index pages without a specific purpose or sufficient main content.
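The backlink reclamation mentioned above usually comes down to 301-redirecting dead URLs that still have external links pointing at them to the closest live equivalent. A minimal sketch of the mapping logic – the URL pairs are hypothetical, and in production this would live in your server or CMS redirect configuration:

```python
# Hypothetical map: 404 URLs with backlinks -> their closest live equivalent.
redirects = {
    "/old-seo-guide/": "/seo-guide/",        # renamed page
    "/services/audits.html": "/audits/",     # restructured section
}

def resolve(url: str) -> tuple[int, str]:
    """Return the (status, location) the server should answer for a dead URL."""
    if url in redirects:
        return 301, redirects[url]   # permanent redirect reconnects the lost equity
    return 404, url                  # genuinely gone, with no links worth saving: let it 404

print(resolve("/old-seo-guide/"))  # (301, '/seo-guide/')
print(resolve("/never-existed/"))  # (404, '/never-existed/')
```

Only redirect where a genuinely equivalent page exists; mass-redirecting everything to the homepage tends to be treated as a soft 404 anyway.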