npeterson wrote: Great post, Sherri, and very timely! I was just looking into doing the same thing for my blog and never thought to draw on the experience here on Betternetworker.
I've come up with this list so far:
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /tag
Disallow: /author
Disallow: /wget/
Disallow: /httpd/
Disallow: */feed
Disallow: /i/
Disallow: /f/
Disallow: /t/
Disallow: /c/
Disallow: /j/
User-agent: Mediapartners-Google
Allow: /
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Image
Allow: /
User-agent: Googlebot-Mobile
Allow: /
User-agent: ia_archiver-web.archive.org
Disallow: /
Sitemap: http://nicolerpeterson.com/sitemap.xml.gz
I believe I will be adding some that were mentioned above by Kerry. In fact, Kerry seems to have quite a bit of knowledge on this subject. Do I even need everything I've listed here?
Thanks for any input!
Nicole, this is my robots.txt file that I currently use with my blogs.
-----------------------------------------------------------------------------------------------------
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /tags
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
# Note: you need to find out why these URLs are being generated in the first place. Until you solve the issue, keep them excluded, so add them to the file.
Disallow: /i/
Disallow: /f/
Disallow: /t/
Disallow: /c/
Disallow: /j/
Allow: /wp-content/uploads
# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*
# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
# digg mirror
User-agent: duggmirror
Disallow: /
Sitemap: http://put-your-url-here/sitemap.xml
Give this one a try, then keep an eye out for pages that get indexed that you don't want indexed. If a bad URL does get into the index, you can remove it via Google Webmaster Tools. If you need help, let me know.
Kerry
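If you want to check a rule set before uploading it, here is a rough Python sketch using the standard library's urllib.robotparser. It compares a broad "Disallow: /wp-content" rule with the narrower plugins/cache/themes rules plus "Allow: /wp-content/uploads". The host example.com, the helper name blocked_paths, and the sample paths are made up for illustration. As far as I know this parser only does plain prefix matching, so wildcard lines such as "*/feed" are left out, and a real crawler may interpret some rules differently.

```python
from urllib.robotparser import RobotFileParser

# Trimmed-down rule sets for illustration only (prefix rules, no wildcards).
BROAD = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-content
"""

NARROW = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Allow: /wp-content/uploads
"""

# Hypothetical paths you might care about on a WordPress blog.
SAMPLE = [
    "/wp-admin/options.php",
    "/wp-content/uploads/photo.jpg",
    "/wp-content/plugins/x.js",
    "/my-post/",
]


def blocked_paths(robots_txt, paths, agent="*"):
    """Return the paths that the given user agent may NOT fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path part is matched.
    return [p for p in paths if not parser.can_fetch(agent, "http://example.com" + p)]


if __name__ == "__main__":
    print("broad :", blocked_paths(BROAD, SAMPLE))
    print("narrow:", blocked_paths(NARROW, SAMPLE))
```

With the broad rule the uploaded image is blocked along with the plugin file. With the narrower rules only the admin page and the plugin file are blocked, so images under /wp-content/uploads stay crawlable.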