How to Avoid Google’s Doorway Page Spam Penalty by Finding and Fixing Orphan Pages

Doorway pages invite Google penalties and algorithmic downgrades that can hurt your visibility in search results or, in the worst case, get your site removed from the index entirely.

Avoiding the intentional use of doorway pages to manipulate search engines is easy enough, of course. However, while pursuing a very different goal, it’s possible to inadvertently create URLs that look like doorway pages to search engines.

Read on to learn how to find and fix pages that might unintentionally fall into this category.

Doorway pages: a form of “spamdexing”

Doorway pages, also known as gateway pages, are pages created purely to capture search engine traffic without adding value for users.

Examples of these types of pages, as listed in Google’s spam guidelines, include:

  • Having multiple domain names or pages targeted at specific regions or cities that funnel users to one page.
  • Pages generated to funnel visitors into the actual usable or relevant portion of your site(s).
  • Substantially similar pages that are closer to search results than a clearly defined, browsable hierarchy.

Orphan pages as unintentional doorways

One of the most common features of a doorway page is that it links back to the main site but receives no links from anywhere else on the site. This alone doesn’t make a page a doorway, but it is a warning sign to search engines. It suggests that you may be using the page to funnel visitors from search results toward less relevant pages, and that you don’t want visitors who have already landed on your more relevant pages to find it.
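To make the pattern concrete, here is a minimal Python sketch that models a site as a link graph (each page mapped to the internal pages it links to) and lists the pages that nothing else links to. The `site` mapping and its URLs are hypothetical; in practice the graph would come from a crawl of your own site.

```python
def find_orphan_pages(links: dict[str, set[str]], homepage: str) -> list[str]:
    """Return pages that no other known page links to (the homepage is excluded)."""
    inbound = {page: 0 for page in links}
    for source, targets in links.items():
        for target in targets:
            # Ignore self-links and links to pages outside the known set
            if target != source and target in inbound:
                inbound[target] += 1
    return sorted(page for page, count in inbound.items() if count == 0 and page != homepage)


# Hypothetical site: each key links to the pages in its set
site = {
    "/": {"/about", "/blog"},
    "/about": {"/"},
    "/blog": {"/", "/about"},
    "/webinar-signup": {"/"},  # links out to the homepage, but nothing links in
}
orphans = find_orphan_pages(site, "/")
# Orphans that also link back to the homepage match the pattern described above
links_home = [page for page in orphans if "/" in site[page]]
print(orphans)  # ['/webinar-signup']
```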

But orphan pages can occur for a number of legitimate reasons, too:

  • Landing pages set up for specific events, such as webinars, may not be linked publicly from your site because they are intended for a limited audience.
  • Landing pages intended for use as part of ad campaigns are typically isolated and not linked to from the main site, since they are designed for very specific audiences and there may be multiple versions for different audiences or for split testing.
  • Landing pages for past campaigns that are no longer relevant.
  • Current pages that were added to the site but never properly promoted with navigational links for one reason or another.

While these are all legitimate reasons to have orphan pages, search engines may still flag such pages as doorways. Even if they don’t, the lack of inbound navigational links makes it hard for these pages to perform well in search results.

5 ways to locate orphan pages

For the fastest version, skip to “3. Google Analytics method” and proceed from there. Provided Google Analytics has been installed for long enough, you can use it to find every page on your site that has ever been visited. That list won’t necessarily include all of your orphans, but it should come close. If your site has more than 5,000 pages, you will only be able to export the top 5,000 from Google Analytics.

1. WordPress method

If your site runs on WordPress, start with this method. WordPress typically makes every URL in its database reachable via links on the site, but some pages and some themes may be exceptions.

  1. Log in to WordPress.
  2. Back up WordPress, since you are about to install a new plugin.
  3. In the left navigation, go to Plugins > Add New.
  4. In the “Search” bar enter “export all urls” and press “Search Plugins.”
  5. The first result should be “Export All URLs” by Atlas Gondal. Click the “Details” link and check the pop-up window to verify that it says “Author: Atlas Gondal.”
  6. Click “Install Now.”
  7. Click “Activate Plugin” after installation completes.
  8. Navigate to Settings > Export All URLs.
  9. Under “Select a Post Type to Extract Data,” select “All Types.”
  10. Under “Additional Data” make sure that “URLs” is selected. You may also select “Titles” and “Categories” for your convenience, although this is not strictly necessary.
  11. Choose “CSV” under “Export Type.”
  12. Press “Export.”
  13. Save the CSV in a memorable location for step 5.

2. Ahrefs method

This step is optional and requires a paid Ahrefs subscription. If you aren’t already paying for Ahrefs, feel free to skip ahead to the next method.

  1. Go to ahrefs.com and log in.
  2. Type your homepage URL into the search bar and press ENTER.
  3. In the left navigation, go to Organic search > Top pages.
  4. Click the “Export” link at the top right of the Top Pages chart.
  5. Select “Full Export” and press the “Start Export” button.
  6. Within a few minutes, a notification should appear in the top right of Ahrefs letting you know that the spreadsheet is available for download. Download it and save it in a memorable location for step 5.
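If you would rather work outside a spreadsheet, a small Python helper can pull the URL column out of either the WordPress or the Ahrefs export. This is a sketch under assumptions: the header names used below (`URLs` for the WordPress plugin export, `URL` for the Ahrefs export) are hypothetical, so open your file and pass whatever header it actually uses. Some exports use tabs rather than commas, which is what the `delimiter` parameter is for, and you may need to decode the file with a different encoding before passing in its text.

```python
import csv
import io


def read_url_column(csv_text: str, column: str, delimiter: str = ",") -> list[str]:
    """Return the non-empty values of one named column from a CSV/TSV export."""
    text = csv_text.lstrip("\ufeff")  # drop a byte-order mark if present
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found; headers are {reader.fieldnames}")
    # Short rows yield None for missing fields, so check before stripping
    return [row[column].strip() for row in reader if row[column] and row[column].strip()]


# Hypothetical WordPress-style export with a "URLs" header
wp_export = "\ufeffURLs,Title\nhttps://example.com/a,A\nhttps://example.com/b,B\n"
print(read_url_column(wp_export, "URLs"))
```

For a file on disk, read it first, for example `read_url_column(open("export.csv", encoding="utf-8").read(), "URLs")`.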

3. Google Analytics method

Here’s how to use Google Analytics to capture all pages on your site that have ever received traffic since Google Analytics was installed.

  1. Log in to Google Analytics.
  2. In the top right corner, adjust the date range to cover the entire period during which Analytics has been collecting data.
  3. In the left navigation, go to BEHAVIOR > Site Content > All Pages.
  4. In the bottom right corner of the Pages screen, click the “Show rows” drop down and select the highest number of pages possible (which will likely be 5,000).
  5. In the top menu, select Export > CSV.
  6. The CSV will download. Save it in a memorable location for step 5.
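The downloaded file can also be read programmatically. The sketch below assumes the layout Universal Analytics “All Pages” exports typically use: a few `#`-prefixed comment lines, a blank line, a header row whose first column is `Page`, then one row per page path, with any further tables following after a blank line. Check your own file against that assumption before relying on it; the function name is hypothetical.

```python
import csv


def read_ga_page_paths(csv_text: str) -> list[str]:
    """Return the page paths from an "All Pages" CSV export (layout assumed, see above)."""
    lines = csv_text.lstrip("\ufeff").splitlines()
    # Skip the '#' comment header and any blank lines before the table
    lines = [line for line in lines if not line.startswith("#")]
    while lines and not lines[0].strip():
        lines.pop(0)
    reader = csv.reader(lines)
    header = next(reader, None)
    if not header or header[0].strip() != "Page":
        raise ValueError("expected a header row whose first column is 'Page'")
    paths = []
    for row in reader:
        if not row or not row[0].strip():
            break  # a blank line ends the page table
        paths.append(row[0].strip())
    return paths


sample = (
    "# ----------------------------------------\n"
    "# All Web Site Data\n"
    "# Pages\n"
    "# ----------------------------------------\n"
    "\n"
    "Page,Pageviews\n"
    '/,"1,200"\n'
    "/about/,300\n"
    "/webinar-signup/,12\n"
    "\n"
    "Day Index,Pageviews\n"
    "1/1/20,40\n"
)
print(read_ga_page_paths(sample))  # ['/', '/about/', '/webinar-signup/']
```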

4. Screaming Frog method

You will need to compare the URLs discoverable on the web with the URLs discoverable through links on your site. You can use the free version of Screaming Frog to collect the URLs discoverable through links on your site.

  1. After downloading and installing Screaming Frog, open the program.
  2. Type your site’s homepage URL into the bar at the top of the window and either press ENTER or click the “Start” button. Wait for the progress bar to reach 100%, which indicates that the site has been fully crawled.
  3. Verify that you are in the “Internal” tab and select “HTML” from the “Filter” drop down menu.
  4. Select the entire “Address” column, and copy these URLs into a spreadsheet for the next step.
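Screaming Frog is the tool this guide relies on, but the idea behind it is simple enough to sketch: start at the homepage, follow every internal link, and record each HTML page reached. Any page that is never reached this way is a candidate orphan. The Python sketch below is a rough illustration only; it ignores robots.txt, JavaScript-rendered links, canonical tags, and rate limiting, and the `crawl`, `fetch_html`, and `LinkCollector` names are hypothetical. The `fetch` parameter lets you swap in a stub for testing.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url: str) -> Optional[str]:
    """Download a page and return its body, or None if it is not HTML."""
    with urlopen(url, timeout=10) as resp:
        if "text/html" not in resp.headers.get("Content-Type", ""):
            return None
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]] = fetch_html,
          max_pages: int = 500) -> list[str]:
    """Breadth-first crawl of same-host links; returns the HTML pages reached."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found: list[str] = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # network errors, HTTP errors, and timeouts
            continue
        if html is None:
            continue
        found.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop #fragments so /page#top == /page
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return found
```

Against a live site you would call something like `crawl("https://www.example.com/")`; `max_pages` simply caps how many pages are collected.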

5. Use Excel or Google Sheets to identify your orphan pages

Use your spreadsheet software of choice to identify orphan pages using these steps.

  1. In cell A1, enter “Screaming Frog” as the heading for column A.
  2. Paste the URLs from Screaming Frog into column A of your spreadsheet, starting in cell A2.
  3. In cell B1, enter “Web URLs” as the heading for column B.
  4. Paste the URLs from Google Analytics (from step 3) into column B, starting in cell B2.
  5. The URLs from Google Analytics don’t include the parent domain, only the subfolders. In column C, type the parent domain, and drag it down column C so that it is listed next to every URL in column B. (Make sure to include the http, https, www, or anything else at the beginning of your parent URL. Do not include a trailing slash “/” at the end of your parent URL.)
  6. In cell D2, type the formula =concat(C2,B2). This should populate cell D2 with the full URL. (The most common error here is ending up with two slashes rather than one between the parent domain name and the subfolder.)
  7. Drag the formula from cell D2 down to the last row of URLs in column B.
  8. Select and copy the full URLs from column D.
  9. Right-click cell B2 and select Paste Special > Values Only.
  10. Column B should now be populated with the full URLs from Google Analytics, rather than the subfolders.
  11. Clear columns C and D.
  12. If you collected URLs from WordPress (step 1) or Ahrefs (step 2), paste these into column B as well, below the final cell currently populated with URLs from Google Analytics.
  13. Type “Match?” into cell C1 as the heading for column C.
  14. Type the formula =match(B2,A:A,0) into cell C2, then drag it down to the last row of URLs in column B. This formula returns the position of the URL from column B within column A. The important part is that the result will be “#N/A” if the URL does not exist in column A, meaning that Screaming Frog couldn’t find it by following links on your site, which makes it an orphan page.
  15. For convenience, you can select all of column C, copy it, right-click cell C1 and select Paste Special > Values Only, then sort columns B and C together by column C to group all of the “#N/A” results in one place. The corresponding URLs in column B are your orphan pages.
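The spreadsheet procedure can also be expressed as a short Python sketch, which may be easier to manage on large sites. It rebuilds full URLs from the Google Analytics paths with exactly one slash between domain and path (avoiding the double-slash error mentioned in the CONCAT step), appends any WordPress or Ahrefs URLs, and keeps those that never appear in the crawl, which is what the “#N/A” results from MATCH identify. The function names and example URLs are hypothetical.

```python
def to_full_url(domain: str, path: str) -> str:
    """Join a domain and a Google Analytics page path with exactly one slash."""
    return domain.rstrip("/") + "/" + path.lstrip("/")


def find_orphan_urls(crawled_urls, ga_paths, domain, other_urls=()):
    """Return candidate URLs that the crawl never reached, without duplicates."""
    crawled = set(crawled_urls)
    candidates = [to_full_url(domain, p) for p in ga_paths] + list(other_urls)
    orphans = []
    seen = set()
    for url in candidates:
        # Equivalent of a #N/A result from =match(B2,A:A,0)
        if url not in crawled and url not in seen:
            seen.add(url)
            orphans.append(url)
    return orphans


crawled = ["https://example.com/", "https://example.com/about"]
ga_paths = ["/", "/about", "/webinar-2019"]
wordpress_urls = ["https://example.com/about", "https://example.com/old-campaign"]
print(find_orphan_urls(crawled, ga_paths, "https://example.com/", wordpress_urls))
# ['https://example.com/webinar-2019', 'https://example.com/old-campaign']
```

This compares URLs as exact, case-sensitive strings, so differences such as `http` vs `https` or a missing trailing slash will make a crawled page look orphaned. Normalize your sources consistently before comparing.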

After finding orphan pages, you should do one of the following:

  • Add navigational links to the pages, along with editorial links within your content, to promote them, unless there is a legitimate reason not to promote them on your main site.
  • If the pages should remain hidden from your main site’s visitors, they should probably remain hidden from search engines as well. Use a noindex robots meta tag to do this.
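For reference, the tag in question is `<meta name="robots" content="noindex">`, placed in the page’s `<head>`. Once it is added, a quick Python check like the sketch below can confirm it is present in each page’s HTML. It only inspects robots meta tags (including a `googlebot`-specific one) and treats the `none` directive as implying noindex; it does not see a noindex sent via the `X-Robots-Tag` HTTP header. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def has_noindex(html: str) -> bool:
    """True if any robots meta tag on the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    # "none" is shorthand for "noindex, nofollow"
    return "noindex" in finder.directives or "none" in finder.directives


print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))  # True
```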

All set!

There is more where this came from…

The best articles from this blog are available all in one place: our book, now in its 6th edition.

Content Chemistry, The Illustrated Handbook for Content Marketing, is packed with practical tips, real-world examples, and expert insights. A must-read for anyone looking to build a content strategy that drives real business impact. Check out the reviews on Amazon.
