I'm having an issue with my robots.txt that I believe is hurting my indexing. In short, we were operating for a couple months without one.
The Personal Development Company
About 6 months ago we switched from an X-cart store to a stomper commerce store. In the process, multiple category URLs were created and indexed that were ultimately not used for the site.
We also used the default product pages with stomper commerce, which were different from the X-cart URLs, again resulting in hundreds of URLs in the index that are no longer being used.
We did not redirect these pages to the new site because we did not have many links to them, and we controlled most of those links, so we were able to change the URLs.
In addition to the problems above, we also did not include a robots.txt file excluding all of the duplicate pages generated by the sort features of stomper commerce.
Now we are having significant issues getting indexed in Google.
I added a robots.txt file about 4 months ago.
In our Google Webmaster Tools, the crawl errors under "restricted by robots.txt" started to take off and peaked about two months ago at 4000. The count has now been sitting at 2000 for about a month without movement. I tried removing some of those URLs to see if that would speed up the process, but it did not seem to. Some pages on the restricted list show as last crawled on November 8. When I first put the robots.txt file up, it took about 3 weeks after a file was crawled for it to be located and removed; now it is taking months.
When I do a site:www.thepersonaldevelopmentcompany.com search, it shows about 1370 results. When you click through, it comes up with 232 indexed pages, which mostly consist of our category pages and a couple of product pages with a large number of links. Our site actually has about 750 pages (not including duplicate pages from the sort functions). Are these results relevant, and should the 1370 ideally read 750?
Has anyone else had this issue? Is there anything I can do to speed this up? We've added some redirects to the htaccess file for pages that had links. If I go through and redirect everything, will that make a difference?
Thanks in advance for your help.
Robots.txt Question
Re: Robots.txt Question
Hello Ben, Looking at your robots.txt file, I see that you included a Disallow: /*? in the file. That rule blocks every URL containing a "?" (a query string) from being crawled, which on a store with dynamic URLs can mean most, if not all, of your pages.
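To see why that one line is so broad, here is a rough Python sketch of how a Google-style wildcard pattern can be matched against a URL path. The helper names and the sample URLs are made up for illustration, and this is only an approximation of the matching rules, not Google's actual code.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a robots.txt path pattern with Google-style wildcards into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(pattern: str, path_and_query: str) -> bool:
    """Return True if the Disallow pattern matches the start of the URL path."""
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None

# Hypothetical store URLs, for illustration only.
samples = [
    "/",
    "/category/books",
    "/category/books?sort=price",
    "/product.php?id=42",
]

for url in samples:
    status = "blocked" if is_blocked("/*?", url) else "allowed"
    print(f"{url}: {status}")
```

Any path that contains a "?" anywhere after the leading slash is caught, so if the product pages themselves use query strings they are blocked along with the sort duplicates.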
Here's what I would do to fix this:
First edit your robots.txt file to look like this:
User-agent: *
Disallow:
This version allows all pages to be crawled by all user-agents. At this point you are giving Google access to all your pages. Give them some time to index the site and start monitoring what is getting indexed via the site: search operator.
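If you want to double-check the file before uploading it, a small sketch like the one below uses Python's standard urllib.robotparser to confirm that an empty Disallow leaves everything crawlable. The can_crawl helper and the example.com URL are placeholders I made up; swap in your own pages.

```python
from urllib.robotparser import RobotFileParser

# The allow-all file from above: an empty Disallow blocks nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# For contrast, a file that blocks the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Hypothetical URL, for illustration only.
test_url = "http://www.example.com/category/books?sort=price"
print("allow-all file:", can_crawl(ALLOW_ALL, test_url))
print("block-all file:", can_crawl(BLOCK_ALL, test_url))
```

One caveat: the standard-library parser does simple prefix matching and does not understand the "*" wildcard extension, so it is only useful for checking plain rules like these.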
Once you see URLs that you don't want indexed, you can edit your robots.txt file to exclude those specifically. Instead of using the wildcard "*", use more targeted directives to block the unwanted URLs.
If you start getting dynamic URLs indexed, you can use Webmaster Tools to help you handle parameters such as the sort functions, searches, etc.
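To decide which parameters are worth handling, it can help to tally the parameter names that show up in the URLs you find indexed. This is just a sketch: the tally_parameters helper is my own, and the URLs and parameter names ("sort", "page") are invented, so paste in whatever the site: search actually returns.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

def tally_parameters(urls):
    """Count how often each query-string parameter name appears across URLs."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so that 'sort=' with no value is still counted.
        for name, _value in parse_qsl(query, keep_blank_values=True):
            counts[name] += 1
    return counts

# Hypothetical indexed URLs, for illustration only.
indexed = [
    "http://www.example.com/category/books",
    "http://www.example.com/category/books?sort=price",
    "http://www.example.com/category/books?sort=name&page=2",
]

for name, count in tally_parameters(indexed).most_common():
    print(name, count)
```

The parameters that appear most often on duplicate pages are the ones to set up in parameter handling, or to block with a narrow rule, first.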
As for whether the 1370 results should read 750: the best approach is going to be to let Google reindex your site and then exclude any of those duplicates. You can probably use the parameter handling in Webmaster Tools to take care of the "sort" duplicates.
I would hold off on the redirects and not try to remove any pages until you have some new results from the indexing.
I'm going to make a note to start watching your site and monitoring your robots.txt file, and I will PM you to help you get this resolved. Kerry
Kerry Thomas
Re: Robots.txt Question
Kerry,
Thanks for your input. I did a little research based on it, and I think you are probably right. I think I was confusing the problem I had with all of the old URLs from the site changeover with this other potential canonical issue. The one thing that concerns me, though, is that I was already set up to exclude the duplicate pages in my parameters, but some were still showing up. I'm going to go ahead and try it and see what happens.
Thanks for your help.
Ben Sanderson