
Best (Darn) robot.txt File Ever...

Learn how to promote your business opportunity on Google with Pay-per-click and Search Engine Optimization via a LIVE discussion with top industry leaders.

Moderator: admin

Best (Darn) robot.txt File Ever...

Postby Sherri Beauchamp on Wed Mar 16, 2011 9:31 pm

I wanted to check out what other people are doing with the robots.txt files for their websites. I would like to create the Best (Darn) robots.txt file ever (with your ridiculous overwhelming brilliance, of course ;) ).

I realize you can get penalized for duplicate content on your site, so I want to make sure I have all the bases covered.

For now, I have a pretty simple file:

User-agent:*
Disallow: /wp-
Disallow: /feed/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /archives/
Disallow: /thank-you/
Disallow: /category/
Disallow: /tag/
Disallow: /rss/
Disallow: /trackback/
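A quick way to sanity-check a file like this is to run it through Python's standard-library `urllib.robotparser`. The sketch below parses the rules exactly as posted and reports which of a few sample URLs they block. The `example.com` URLs are hypothetical stand-ins for pages on a WordPress blog. This parser only does plain prefix matching and does not understand `*` wildcards, which is fine here because none of the rules above use them.

```python
from urllib.robotparser import RobotFileParser

# The rules from Sherri's post, copied as-is.
ROBOTS_TXT = """\
User-agent:*
Disallow: /wp-
Disallow: /feed/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /archives/
Disallow: /thank-you/
Disallow: /category/
Disallow: /tag/
Disallow: /rss/
Disallow: /trackback/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs on a WordPress blog; substitute paths from your own site.
SAMPLE_URLS = [
    "http://example.com/my-first-post/",
    "http://example.com/tag/seo/",
    "http://example.com/page/2/",
    "http://example.com/wp-admin/",
    "http://example.com/wp-content/uploads/photo.jpg",
]

for url in SAMPLE_URLS:
    # can_fetch() answers "may this user-agent crawl this URL?"
    status = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(f"{status:8} {url}")
```

Because `Disallow: /wp-` is a prefix rule, it also blocks `/wp-content/uploads/`, where WordPress keeps uploaded images. If you want those crawled, you would need an `Allow` line for that folder. Google picks the most specific matching rule, but some simpler parsers (Python's included) use the first match, so putting the `Allow` line first is the safer order.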

Is there anything else that you would suggest?

I would appreciate any feedback you have... thanks :)

See you at the top,
Sherri
How To 'Marry' On & Offline
Go Get Your 7 Day

FREE 'Blueprint To Freedom' Course
Sherri Beauchamp
Company: Magnetic Sponsoring
Contribution Level: 2
 
Posts: 11
Joined: Mon Aug 10, 2009 8:52 am

Re: Best (Darn) robot.txt File Ever...

Postby Kerry Thomas on Wed Mar 16, 2011 10:33 pm

Sherri Beauchamp wrote: Is there anything else that you would suggest?


Hello Sherri, I see a couple of things that I would add.
Disallow: /privacy/
Disallow: /disclaimer/
Disallow: /sign-up/

I don't think you want those pages indexed, since visitors can already reach them through your site's own links.
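Folding suggestions like these into an existing file is a small text edit. The sketch below uses a hypothetical helper, `merge_disallows`, that appends only the paths not already listed and then re-parses the result with Python's `urllib.robotparser` to confirm the new pages are blocked. It assumes a file with a single `User-agent: *` group, like Sherri's, and uses an abbreviated copy of her rules.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of Sherri's file (a single "User-agent: *" group).
CURRENT = """\
User-agent: *
Disallow: /wp-
Disallow: /thank-you/
Disallow: /tag/
"""

def merge_disallows(robots_txt, new_paths):
    """Append a Disallow line for each path not already disallowed (exact text match).

    Assumes the file holds one group, so new lines can simply go at the end.
    """
    existing = {
        line.split(":", 1)[1].strip()
        for line in robots_txt.splitlines()
        if line.lower().startswith("disallow:")
    }
    additions = [f"Disallow: {p}" for p in new_paths if p not in existing]
    if not additions:
        return robots_txt
    return robots_txt.rstrip("\n") + "\n" + "\n".join(additions) + "\n"

# Kerry's three suggestions, plus "/tag/", which is already present and gets skipped.
merged = merge_disallows(CURRENT, ["/privacy/", "/disclaimer/", "/sign-up/", "/tag/"])
print(merged)

# Re-parse the merged text to confirm the new pages are blocked.
parser = RobotFileParser()
parser.parse(merged.splitlines())
for path in ["/privacy/", "/disclaimer/", "/sign-up/", "/about/"]:
    allowed = parser.can_fetch("*", "http://example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```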

Here is a tip that will help you find pages that slip past you and get indexed: I always do a site: operator search with Google. In your case, you would simply enter site:definedestiny.com/ into the Google search bar to see which pages are getting indexed.

Just look through the results for any duplicate or unimportant pages. If you find some, edit the robots.txt file to exclude them, and you will keep your site's index very clean.
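The review itself happens by eye in the search results, but once you have copied the result URLs into a list, a short script can flag the ones worth a second look. In this sketch the URL list is made up, and the "suspect" path prefixes are an assumption based on common WordPress archive paths, so adjust them to your own permalink setup.

```python
from urllib.parse import urlsplit

# Hypothetical URLs copied by hand from a "site:definedestiny.com/" search.
INDEXED = [
    "http://definedestiny.com/",
    "http://definedestiny.com/my-first-post/",
    "http://definedestiny.com/tag/seo/",
    "http://definedestiny.com/page/2/",
    "http://definedestiny.com/?s=network+marketing",
]

# Path prefixes that often produce duplicate or low-value pages on a WordPress
# blog (an assumption; adjust for your own permalink settings).
SUSPECT_PREFIXES = ("/tag/", "/category/", "/page/", "/author/", "/date/", "/feed/")

def suspects(urls):
    """Return URLs whose path starts with a suspect prefix or that carry a query string."""
    flagged = []
    for url in urls:
        parts = urlsplit(url)
        if parts.query or parts.path.startswith(SUSPECT_PREFIXES):
            flagged.append(url)
    return flagged

for url in suspects(INDEXED):
    print("review:", url)
```

Anything flagged is a candidate for a new `Disallow` line. Blocking a URL in robots.txt stops crawlers from fetching it but does not by itself remove a page that is already indexed, which is why Kerry later points to the removal tool in Google Webmaster Tools.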

I also want to commend you for handling your tags via the robots.txt file and excluding them. I see lots of blogs with numerous URLs (one from each tag) pointing to the same page. Your setup lets visitors use the tags while keeping the search engines from picking up all those duplicate URLs. Great job! Kerry
Read my latest BN article on Branding Yourself
Kerry Thomas
Contribution Level: 2
 
Posts: 88
Joined: Sun Jan 02, 2011 11:23 am

Re: Best (Darn) robot.txt File Ever...

Postby Sherri Beauchamp on Thu Mar 17, 2011 9:29 am

Hey Kerry,

Thanks for the input! I couldn't remember the exact site: operator search for Google, so thanks for the reminder :D

There are a couple of other pages I have listed in my robots.txt as well, but I will have to go back and edit it again. I just wanted to see what others are doing, so thanks for sharing your wisdom.

Have a great day Kerry! (maybe a green beer or 2??)

Sherri

Re: Best (Darn) robot.txt File Ever...

Postby Ben Fitts on Thu Mar 17, 2011 9:59 am

You're trying to set up a robots.txt for a blog to control what the robots index.

You're not simply trying to set up a robots.txt for a standard website. That distinction is very important because a blog is set up differently: it is a dynamic website, so controlling indexing through your robots.txt file will only give you limited results.

Get rid of your robots.txt file.

Instead, download the WordPress SEO plugin from Joost De Valk (yoast.com):
http://wordpress.org/extend/plugins/wordpress-seo/

It is a better solution because you simply answer questions about what to allow robots to index. It guides you through the process so you allow the right things to be indexed and disallow the right things. It runs you through questions such as: Is this a single-author blog? (If so, there is no reason to allow author-based indexing.) Do you use tags? Do you use categories? Do you want your search page to be indexed? And so on.

For example, your robots.txt allows your search page to be indexed, which could lead to some duplicate content issues. ;)
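This point is easy to check. The default WordPress search form sends visitors to URLs of the form `/?s=<query>`, and none of the rules in Sherri's file match that. The sketch below rebuilds her rules with Python's `urllib.robotparser`, tests a hypothetical search URL, and then adds `Disallow: /?s=` to show the gap closing. The `make_parser` helper and the `example.com` URLs are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Paths from Sherri's posted file.
SHERRI_PATHS = ["/wp-", "/feed/", "/comments/feed/", "/page/", "/date/",
                "/archives/", "/thank-you/", "/category/", "/tag/", "/rss/",
                "/trackback/"]

def make_parser(paths):
    """Parse a one-group robots.txt ("User-agent: *") that disallows each path."""
    lines = ["User-agent: *"] + [f"Disallow: {p}" for p in paths]
    parser = RobotFileParser()
    parser.parse(lines)
    return parser

# Hypothetical search-results URL in the form the default WordPress search form uses.
SEARCH_URL = "http://example.com/?s=network+marketing"

before = make_parser(SHERRI_PATHS)
after = make_parser(SHERRI_PATHS + ["/?s="])

print("search page allowed by current file:", before.can_fetch("*", SEARCH_URL))
print("search page allowed after adding 'Disallow: /?s=':", after.can_fetch("*", SEARCH_URL))
```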
Benjamin Fitts
Contact me at: 877-BEN-FITTS or Skype: BenjaminFitts
Send Out Cards - One of the top 19 Distributors in the company WORLD WIDE
Ben Fitts
Company: SendOutCards
Contribution Level: 3
 
Posts: 257
Joined: Wed Oct 24, 2007 10:24 am

Re: Best (Darn) robot.txt File Ever...

Postby Kerry Thomas on Thu Mar 17, 2011 11:51 am

benfitts wrote: Get rid of your robots.txt file. Instead download the Wordpress SEO plugin from Joost De Valk (yoast.com)


Ben, I believe the SEO plugin you are talking about is essentially a wizard that helps create what she has already done. It edits the robots.txt file itself, so she wouldn't be getting rid of it.

She also has the option of using an .htaccess file to help control her content. It looks as though the plugin does this too, but if she wants even more flexibility than the plugin offers, she would be best off doing all this manually.

As for indexing her search page, I didn't actually see a search feature on her site, but if one exists, it can be excluded via the robots.txt file anyway.

The plugin would be great for those that don't fully understand the concepts here. Kerry

Re: Best (Darn) robot.txt File Ever...

Postby Nicole R. Peterson on Thu Mar 17, 2011 12:09 pm

Great post Sherri and very timely! I was just looking into doing the same thing for my blog and never thought to use the experienced knowledge here on Betternetworker.

I've come up with this list so far:
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /tag
Disallow: /author
Disallow: /wget/
Disallow: /httpd/
Disallow: */feed
Disallow: /i/
Disallow: /f/
Disallow: /t/
Disallow: /c/
Disallow: /j/
 
User-agent: Mediapartners-Google
Allow: /
 
User-agent: Adsbot-Google
Allow: /
 
User-agent: Googlebot-Image
Allow: /
 
User-agent: Googlebot-Mobile
Allow: /
 
User-agent: ia_archiver-web.archive.org
Disallow: /
 
Sitemap: http://nicolerpeterson.com/sitemap.xml.gz
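With several `User-agent` groups, it helps to ask which group applies to each crawler. The sketch below feeds the file above to Python's `urllib.robotparser` and checks a few hypothetical URLs for different crawlers (`site_maps()` needs Python 3.8 or later). There are two caveats. First, this parser matches paths by plain prefix and treats `*` literally, so it reports `*/feed` URLs as allowed even though Google documents `*` as a wildcard. Second, it picks a group when the group's name appears inside the crawler's name, which is a simplification of how real crawlers choose a group.

```python
from urllib.robotparser import RobotFileParser

# Nicole's file from the post above (blank lines separate the groups).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /tag
Disallow: /author
Disallow: /wget/
Disallow: /httpd/
Disallow: */feed
Disallow: /i/
Disallow: /f/
Disallow: /t/
Disallow: /c/
Disallow: /j/

User-agent: Mediapartners-Google
Allow: /

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Googlebot-Mobile
Allow: /

User-agent: ia_archiver-web.archive.org
Disallow: /

Sitemap: http://nicolerpeterson.com/sitemap.xml.gz
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

SITE = "http://nicolerpeterson.com"
# (crawler, path) pairs; the paths are hypothetical examples.
CHECKS = [
    ("Googlebot", "/wp-content/uploads/photo.jpg"),        # falls under "*"
    ("Googlebot-Image", "/wp-content/uploads/photo.jpg"),  # has its own group
    ("Googlebot", "/tag/seo/"),
    ("Mediapartners-Google", "/tag/seo/"),
    ("Googlebot", "/tagline-tips/"),        # "/tag" is a prefix, so this is blocked too
    ("Googlebot", "/my-first-post/feed/"),  # "*/feed" is not a wildcard to this parser
]

for agent, path in CHECKS:
    verdict = "allowed" if parser.can_fetch(agent, SITE + path) else "blocked"
    print(f"{agent:22} {path:32} {verdict}")

print("sitemaps:", parser.site_maps())
```

Note the `/tagline-tips/` case: `Disallow: /tag` without a trailing slash blocks every path that merely starts with those characters, whereas `Disallow: /tag/` would limit the rule to the tag archive folder.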

I believe I will be adding some of the rules Kerry mentioned above. In fact, Kerry seems to have quite a bit of knowledge on this subject. Do I even need everything I've listed here?

Thanks for any input!
Nicole R. Peterson
http://NicoleRPeterson.com
Nicole R. Peterson
Company: KB Gold
Contribution Level: 2
 
Posts: 1
Joined: Tue Mar 17, 2009 9:26 am

Re: Best (Darn) robot.txt File Ever...

Postby Ben Fitts on Thu Mar 17, 2011 12:49 pm

I would suggest that the question isn't why you should use the plugin. The question is why shouldn't you?

I can't think of a reason not to use the plugin.

The plugin is written by Joost De Valk. He is the co-host of the WordPress podcast, and his posts show up in your WordPress dashboard ;) He is the author of at least a half dozen (maybe more like a dozen) awesome WordPress plugins, and he knows more about WordPress than 99.9% of us.

This one plugin lets you replace a half dozen other WordPress plugins (sitemap, HeadSpace, All in One SEO, breadcrumbs, etc.).

Re: Best (Darn) robot.txt File Ever...

Postby Kerry Thomas on Thu Mar 17, 2011 1:18 pm

benfitts wrote: I would suggest that the question isn't why you should use the plugin. The question is why shouldn't you?


Hello Ben, that is a good question regarding the plugin. As I stated, if someone out there needs help with these tasks, then by all means go for it and use the plugin.

As for my personal preference, I really prefer to do these types of things myself, because it gives me more granular control over my sites. Hope this makes sense. Kerry

Re: Best (Darn) robot.txt File Ever...

Postby Kerry Thomas on Thu Mar 17, 2011 3:44 pm

npeterson wrote: Do I even need everything I've mentioned here?


Nicole, this is the robots.txt file I currently use on my blogs:
-----------------------------------------------------------------------------------------------------
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /tags
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
# The next five lines block URLs you need to track down: find out why they are being generated in the first place, and keep them excluded until you solve the issue.
Disallow: /i/
Disallow: /f/
Disallow: /t/
Disallow: /c/
Disallow: /j/
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# digg mirror
User-agent: duggmirror
Disallow: /

Sitemap: http://put-your-url-here/sitemap.xml

Give this one a try, and then keep an eye out for pages that get indexed when you don't want them to be. If a bad URL does get into the index, you can remove it via Google Webmaster Tools. If you need help, let me know. Kerry
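This file leans on wildcard rules such as `/*?` and `*/feed`, which Python's built-in `urllib.robotparser` does not interpret. The sketch below is a simplified, hypothetical re-implementation of the matching rules Google documents for robots.txt: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and when several rules match, the longest pattern wins, with `Allow` winning a tie. It is not Google's code. It is only meant for spot-checking the `User-agent: *` group above against a few made-up paths.

```python
import re

# The "User-agent: *" group from Kerry's file as (directive, pattern) pairs.
KERRY_STAR = [
    ("Disallow", "/cgi-bin"), ("Disallow", "/wp-admin"),
    ("Disallow", "/wp-includes"), ("Disallow", "/wp-content/plugins"),
    ("Disallow", "/wp-content/cache"), ("Disallow", "/wp-content/themes"),
    ("Disallow", "/trackback"), ("Disallow", "/feed"),
    ("Disallow", "/comments"), ("Disallow", "/tags"),
    ("Disallow", "/category/*/*"), ("Disallow", "*/trackback"),
    ("Disallow", "*/feed"), ("Disallow", "*/comments"),
    ("Disallow", "/*?*"), ("Disallow", "/*?"),
    ("Disallow", "/i/"), ("Disallow", "/f/"), ("Disallow", "/t/"),
    ("Disallow", "/c/"), ("Disallow", "/j/"),
    ("Allow", "/wp-content/uploads"),
]

def rule_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the end; all else is literal."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path):
    """Longest matching pattern decides; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if rule_to_regex(pattern).match(path):  # match() anchors at the start of the path
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# Hypothetical paths on a WordPress blog (path plus any query string).
for path in ["/my-first-post/", "/my-first-post/feed/", "/?s=seo",
             "/wp-content/uploads/photo.jpg", "/wp-content/plugins/x.js",
             "/tag/seo/"]:
    print(f"{path:32} {'allowed' if is_allowed(KERRY_STAR, path) else 'blocked'}")
```

One thing this turns up, assuming the default WordPress tag base of `/tag/`: the rule `Disallow: /tags` does not match `/tag/seo/`, so tag archives stay crawlable under this file, while `Disallow: /tag/` would cover them.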

Re: Best (Darn) robot.txt File Ever...

Postby Better Networker Support on Fri Mar 18, 2011 6:51 am

UPDATE: Spam Account removed and blocked from this Forum topic.

For more information on Betternetworker Site Rules and Guidelines:
http://www.betternetworker.com/page/bet ... guidelines
Thanks,

Better Networker Support
http://www.betternetworker.com/help
Better Networker Support
Contribution Level: 3
 
Posts: 273
Joined: Thu Oct 07, 2010 7:28 am
