#1
12-06-2006, 12:44 AM
harry (Moderator)
Robots.txt or .htaccess file

Which would you prefer for keeping unwanted robots from crawling your site: a robots.txt file or a .htaccess file?

#2
12-06-2006, 08:50 AM
kev woodman (Premium Member)

Robots.txt if you're just interested in keeping out spiders. You can do all sorts of clever stuff with .htaccess, but it's probably over the top for blocking search engines. If you're talking about securing an area of the site, then a proper user authentication/management system is a better solution than either of the above.
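
For readers who have not seen how a well-behaved spider treats robots.txt, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content, the bot name "BadBot" and the paths are made up for illustration and are not taken from any site in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut one named crawler out entirely and keep
# every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a polite crawler would before fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("BadBot", "/index.html"))             # False
print(parser.can_fetch("Googlebot", "/index.html"))          # True
print(parser.can_fetch("Googlebot", "/private/report.html")) # False
```

Note that the check runs inside the crawler, not on the server, so the file only works for spiders that choose to consult it.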
__________________
homo sum: humani nil a me alienum puto ... ( just Google it )

#3
12-06-2006, 09:50 AM
Rodney (Senior Member)

IMO, using a robots.txt file is a good option for stopping unwanted spiders from crawling your website.

#4
12-06-2006, 10:52 AM
Moderator

I would probably go with a robots.txt file to prevent robots from crawling the site. I think Kev has explained it in a nutshell.

#5
12-06-2006, 09:14 PM
harry (Moderator)

Thanks for your suggestions.
I don't think robots.txt alone will solve the main problem, because some bad robots don't obey the robots.txt file. It would be better to block those in the .htaccess file, since that stops them from entering the site at all.
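
As a rough illustration of the .htaccess approach, the Python sketch below generates a user-agent block list. The bot names are examples only, the helper name user_agent_block is hypothetical, and the emitted directives assume Apache 2.2-style access control (SetEnvIfNoCase from mod_setenvif plus Order/Allow/Deny); Apache 2.4 would need mod_access_compat or the newer Require syntax.

```python
import re

# Assumption: bot names are plain tokens. Anything else is rejected so the
# generated SetEnvIfNoCase pattern cannot contain stray regex or quote characters.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+$")

def user_agent_block(bad_agents):
    """Return .htaccess lines denying requests whose User-Agent contains a listed name."""
    lines = []
    for name in bad_agents:
        if not SAFE_NAME.match(name):
            raise ValueError(f"unsupported bot name: {name!r}")
        # Flag the request with the bad_bot environment variable on a match.
        lines.append(f'SetEnvIfNoCase User-Agent "{name}" bad_bot')
    # Allow everyone, then deny any request that was flagged above.
    lines += ["Order Allow,Deny", "Allow from all", "Deny from env=bad_bot"]
    return lines

RULES = user_agent_block(["EmailSiphon", "WebCopier"])
print("\n".join(RULES))
```

Unlike robots.txt, these rules are enforced by the server, but they still depend on the bot reporting its real name.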

#6
12-06-2006, 10:56 PM
kev woodman (Premium Member)

You might find that using .htaccess is no more effective against bad bots, I'm afraid, Harry. If you were going to the trouble of harvesting with a bot, you'd probably code it to identify itself as a 'good' bot anyway. Still, every little helps.
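
To show why a user-agent rule is easy to sidestep, here is a small Python sketch. It only builds a request object and sends nothing; the URL and the claimed agent string are placeholders chosen for illustration.

```python
import urllib.request

def build_request(url: str, claimed_agent: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying whatever User-Agent the caller picks."""
    return urllib.request.Request(url, headers={"User-Agent": claimed_agent})

# The client chooses this string freely; the server has no way to verify it
# from the header alone.
CLAIMED = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
request = build_request("http://example.com/", CLAIMED)

# urllib stores header names capitalised as "User-agent".
print(request.get_header("User-agent"))
```

Because the header is self-reported, a rule keyed on it mostly filters out bots that are honest about who they are.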

#7
12-07-2006, 08:45 PM
harry (Moderator)

Kev, once you have found the culprits in your log files, you can use the .htaccess file to block the IP addresses these bad robots are coming from.
Then the bad robots would have no way of peeping into your site.
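
A minimal Python sketch of that workflow: count requests per client address in an access log, then turn the heaviest clients into Deny lines. The sample log lines, the threshold and the helper names are assumptions for illustration; the log format is assumed to be Apache's Common/Combined Log Format, and the output uses Apache 2.2-style Order/Allow/Deny directives.

```python
import re
from collections import Counter

# Start of a Common/Combined Log Format line; the client address comes first.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+')

# Made-up sample entries using documentation-only IP ranges.
SAMPLE_LOG = [
    '203.0.113.9 - - [07/Dec/2006:20:40:01 +0000] "GET /contact.html HTTP/1.1" 200 512',
    '203.0.113.9 - - [07/Dec/2006:20:40:01 +0000] "GET /about.html HTTP/1.1" 200 734',
    '198.51.100.7 - - [07/Dec/2006:20:40:02 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '203.0.113.9 - - [07/Dec/2006:20:40:02 +0000] "GET /staff.html HTTP/1.1" 200 690',
]

def heavy_hitters(log_lines, threshold):
    """Return the sorted client addresses with at least `threshold` requests."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(ip for ip, hits in counts.items() if hits >= threshold)

def ip_block(ips):
    """Return .htaccess lines that allow everyone except the listed addresses."""
    return ["Order Allow,Deny", "Allow from all"] + [f"Deny from {ip}" for ip in ips]

CULPRITS = heavy_hitters(SAMPLE_LOG, threshold=3)
print("\n".join(ip_block(CULPRITS)))
```

A raw request count is a crude signal, so in practice the candidate list would want a manual look before anything is blocked.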

#8
12-07-2006, 08:59 PM
kev woodman (Premium Member)

Well, if it was me doing the harvesting, I'd just fake the IP and the agent identification, and I'd get the script to randomise these as well... and just for good measure I'd route through proxies too. If you're trying to stop bots harvesting emails and the like, you're going to need some sort of server-side scripting, as .htaccess or a robots file just won't cut it.

#9
12-07-2006, 09:28 PM
harry (Moderator)

Kev, I think that's a programmer's view. I know you like to go for scripts.
Your suggestion is worthwhile, though.

Last edited by harry; 12-07-2006 at 09:40 PM.

#10
12-07-2006, 09:38 PM
kev woodman (Premium Member)

Scripting web bots is a really interesting topic: they are effectively autonomous or semi-autonomous software agents, and the good ones include a fair amount of AI. You could make a fair amount of money scripting them if you don't mind walking on the dark side for a bit.

#11
12-07-2006, 10:20 PM
harry (Moderator)

I hope we stay on the constructive side when scripting these web bots and take care that they do not hurt anybody.

All times are GMT.
Copyright 2002-2007 WebHosting.uk.com. All rights reserved.
Web Hosting UK Forum