As of today, if you run a Google search to see which pages of my site are in the index, you’ll find quite a few. However, the indexed pages reflect a URL structure that I’m not thrilled with for SEO reasons. To remedy this, I experimented with Google’s Webmaster Tools and my .htaccess file. We’re going to employ some 301 redirects to get the SERPs pointing to the right place, remove some old pages from the index, and modify my robots.txt. Hopefully these changes will focus my search results on the real content of this site and lose some of the fluff. Along the way, we’ll also find out how long it takes for these corrections to take effect.
First things first, here’s the problem. If I do a Google site search like site:www.ericdelabar.com, the results include entries that look something like www.ericdelabar.com/?cat=11. They all work, but they’re not ideal for SEO, so the first fix is a set of 301 redirects in my .htaccess file:
redirect 301 /?cat=1 http://www.ericdelabar.com/
redirect 301 /?cat=5 http://www.ericdelabar.com/category/standards
redirect 301 /?cat=6 http://www.ericdelabar.com/category/seo
redirect 301 /?cat=7 http://www.ericdelabar.com/category/firefox-extensions
redirect 301 /?cat=9 http://www.ericdelabar.com/category/eclipse
redirect 301 /?cat=10 http://www.ericdelabar.com/category/the-hard-way
redirect 301 /?cat=11 http://www.ericdelabar.com/category/css
redirect 301 /?cat=12 http://www.ericdelabar.com/category/view-from-the-trenches
redirect 301 /?cat=13 http://www.ericdelabar.com/category/user-behavior
redirect 301 /?cat=14 http://www.ericdelabar.com/category/usability
redirect 301 /?p=5 http://www.ericdelabar.com/2007/02/in-beginning-there-was-doctype.html
redirect 301 /?p=7 http://www.ericdelabar.com/2007/03/lets-talk-about-tools-part-2.html
redirect 301 /?p=8 http://www.ericdelabar.com/2007/03/css-things-i-learned-hard-way-absolute.html
Each line in the file is a rule: redirect is the directive, the number is the status code (in this case 301, which means “moved permanently”), then comes the original URL, and finally the destination URL. Looking at my list, there are two kinds of redirects here: ones that convert a category ID to the category name, and ones that convert a post ID to the post slug. These lines must come before the # BEGIN WordPress comment in your .htaccess file; if they don’t, they won’t work.
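One caveat before copying these rules: Apache’s mod_alias Redirect directive matches the URL path only, not the query string, so a source such as /?cat=5 won’t be matched as written. The minimal Python sketch below shows one workaround. It parses lines in the redirect STATUS OLD NEW format described above and prints mod_rewrite RewriteCond/RewriteRule pairs that test %{QUERY_STRING} instead. It assumes the .htaccess file lives in the site root, uses only two of the rules above as sample input, and the helper names parse_redirects and to_mod_rewrite are hypothetical.

```python
import re

# Sample input: two of the redirect rules from the .htaccess file above.
HTACCESS_RULES = """\
redirect 301 /?cat=5 http://www.ericdelabar.com/category/standards
redirect 301 /?p=5 http://www.ericdelabar.com/2007/02/in-beginning-there-was-doctype.html
"""


def parse_redirects(text):
    """Split each 'redirect STATUS OLD NEW' line into (status, old, new)."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0].lower() == "redirect":
            _, status, old, new = parts
            rules.append((int(status), old, new))
    return rules


def to_mod_rewrite(status, old, new):
    """Return a RewriteCond/RewriteRule pair that matches OLD's query string."""
    path, _, query = old.partition("?")
    # In a root .htaccess, mod_rewrite strips the leading slash, so "/" is "".
    pattern = "^" + re.escape(path.lstrip("/")) + "$"
    return [
        "RewriteCond %{QUERY_STRING} ^" + re.escape(query) + "$",
        # A trailing "?" on the target discards the original query string.
        f"RewriteRule {pattern} {new}? [R={status},L]",
    ]


if __name__ == "__main__":
    print("RewriteEngine On")
    for rule in parse_redirects(HTACCESS_RULES):
        for line in to_mod_rewrite(*rule):
            print(line)
```

The generated lines would go in the same place as the originals, before the # BEGIN WordPress block. The trailing ? on each target keeps the old ?cat= or ?p= parameter from being carried over to the clean destination URL.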
Now that categories and posts are redirecting, there is one final problem: the last few pages of results contain quite a few search result pages, with URLs like http://www.ericdelabar.com/index.php?s=firebug. I don’t really want my search result pages in the SERPs, but I’m not ready to get rid of them all yet. To remedy this, I installed the Search Permalink plug-in, which redirects the ?s=keyword pattern to /search/keyword/. That’s not perfect, but it looks a little nicer than the query string. In the long run I will probably remove them from the SERPs altogether, but I haven’t decided on a search strategy for this site yet, so I’ll leave that for another day.
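To make the URL change concrete, here is a minimal Python sketch of the mapping from ?s=keyword to /search/keyword/. This is not the Search Permalink plug-in’s code. The function name search_permalink is hypothetical, and percent-encoding the keyword into a single path segment is an assumption about how such a rewrite would handle spaces.

```python
from urllib.parse import parse_qs, quote, urlsplit


def search_permalink(url):
    """Map an ?s=keyword URL to /search/keyword/; return None otherwise."""
    query = urlsplit(url).query
    terms = parse_qs(query).get("s")  # parse_qs decodes "+" and %XX escapes
    if not terms:
        return None
    # Re-encode the keyword so spaces or slashes stay inside one path segment.
    return "/search/" + quote(terms[0], safe="") + "/"


if __name__ == "__main__":
    print(search_permalink("http://www.ericdelabar.com/index.php?s=firebug"))
```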
Next, we want to keep Google out of the site admin, so we disallow the WordPress login page and admin section in the robots.txt file.
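A minimal sketch, assuming the rules are the usual ones for a WordPress install at the site root (Disallow: /wp-admin/ and Disallow: /wp-login.php under User-agent: *). It uses Python’s standard urllib.robotparser to confirm that those paths are blocked for Googlebot while regular content stays crawlable.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents for the WordPress admin and login pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
"""


def build_parser(text):
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for path in ("/wp-login.php", "/wp-admin/options.php", "/category/css"):
        url = "http://www.ericdelabar.com" + path
        print(path, "allowed" if rp.can_fetch("Googlebot", url) else "blocked")
```

Python’s robotparser does simple prefix matching, so this is a sanity check on the rules rather than a replica of how Googlebot itself parses robots.txt.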
Next, we’re going to use Google Webmaster Tools to remove the wp-login and wp-admin pages from the index. This requires that the pages either be disallowed in robots.txt or carry a robots meta tag with noindex specified. Since our robots.txt file should handle this, we can submit the removal request for those pages.
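The removal prerequisite just described (blocked by robots.txt, or a robots meta tag containing noindex) can be sketched as a quick local check before filing a request. This is a hypothetical Python helper, not anything Webmaster Tools runs: removal_eligible and RobotsMetaFinder are illustrative names, Googlebot is assumed as the user agent, and the robots.txt text and page HTML are passed in as strings rather than fetched.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def removal_eligible(url, robots_txt, html):
    """True if robots.txt blocks the URL or the page declares noindex."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch("Googlebot", url):
        return True
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```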
Finally, we resubmit the site for crawling and hope that this gets cleared up within a few days.
This methodology only addresses Google specifically. My site has also been indexed by MSN, Yahoo, and Ask, and I’ll have to take steps to resolve those as well, but fixing Google is definitely a good start!