Thanks for your suggestions.
I don't think robots.txt will solve the main problem, because some bad robots simply don't obey the robots.txt file. It would be better to block them in the .htaccess file, since that stops them from entering the site at all. |
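To illustrate why robots.txt is only advisory, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents, the bot name and the URLs are made up for the example. A polite crawler asks the parser before fetching; a bad one just skips that step, and nothing in the file can stop it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""


def polite_bot_may_fetch(user_agent: str, url: str) -> bool:
    """Return True if a crawler that honours robots.txt would fetch this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def rude_bot_may_fetch(user_agent: str, url: str) -> bool:
    """A bad bot never consults robots.txt, so the file cannot refuse it."""
    return True


if __name__ == "__main__":
    url = "http://example.com/members/list.php"
    print("polite bot fetches:", polite_bot_may_fetch("ExampleBot", url))
    print("rude bot fetches:  ", rude_bot_may_fetch("ExampleBot", url))
```

The check lives entirely in the client, which is why anything meant to keep a misbehaving bot out has to be enforced on the server side.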
|
||||
|
I'm afraid you might find that .htaccess is no more effective against bad bots, Harry. If you were going to the trouble of harvesting with a bot, you'd probably code it to identify itself as a 'good' bot anyway. Still, every little helps.
__________________
homo sum: humani nil a me alienum puto ... ( just Google it ) |
|
|||
|
kev, once you have used the log files to find the culprits, you can block the IP addresses these bad robots are coming from in the .htaccess file.
Then there would be no way for the bad robots to peep into your site. |
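The log-file approach could be sketched roughly like this in Python. It assumes the access log starts each line in Common Log Format, and the sample lines, the threshold of 3 requests and the helper names are all invented for the example. The `Deny from` lines it prints are the Apache 2.2 style; Apache 2.4 uses `Require not ip` instead, so treat the output as a starting point rather than something to paste in blindly.

```python
import re
from collections import Counter

# Matches the start of a Common Log Format line and captures the client address.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+')

# Made-up sample log lines using documentation-only IP ranges.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2008:13:55:36 +0000] "GET /a.html HTTP/1.0" 200 2326',
    '203.0.113.5 - - [10/Oct/2008:13:55:37 +0000] "GET /b.html HTTP/1.0" 200 1200',
    '203.0.113.5 - - [10/Oct/2008:13:55:38 +0000] "GET /c.html HTTP/1.0" 200 981',
    '198.51.100.7 - - [10/Oct/2008:14:02:11 +0000] "GET /a.html HTTP/1.0" 200 2326',
]


def count_requests(log_lines):
    """Count requests per client address, skipping lines that do not parse."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def deny_lines(counts, threshold):
    """Build one 'Deny from' line per address with at least `threshold` requests."""
    return [f"Deny from {ip}" for ip, n in sorted(counts.items()) if n >= threshold]


if __name__ == "__main__":
    for directive in deny_lines(count_requests(SAMPLE_LOG), threshold=3):
        print(directive)
```

A raw request count is a crude signal, so it is worth eyeballing the addresses before blocking them, in case a busy but legitimate visitor ends up on the list.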
|
||||
|
Well, if it was me doing the harvesting, I'd just fake the IP and the agent identification, and I'd get the script to randomise these as well... and just for good measure I'd route through proxies too. If you're trying to stop bots harvesting emails and the like, you're going to need some sort of server-side scripting, as .htaccess or a robots.txt file just won't cut it.
|
||||
|
Scripting Web bots is a really interesting topic - they are really autonomous or semi-autonomous software agents and the good ones include a fair amount of AI. You could make a fair amount of money scripting them if you don't mind walking on the dark side for a bit.