A robots.txt file is a simple text file that tells search engine crawlers which pages of your site they should not crawl. Suppose you have pages that are not search-engine friendly, such as pages with repeated keywords, hidden tags stuffed with keywords, or other problems that could get your site banned from search engines. You can use robots.txt to keep crawlers away from those pages.
You can create the robots.txt file with any plain text editor, such as Notepad.
It must be saved in the root directory of your site, e.g. http://www.yoursitename.com/robots.txt
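Because the file always sits at the root, its address can be worked out from any page URL on the same site. A minimal Python sketch using only the standard library; the function name robots_txt_url and the sample page path are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Illustrative page URL on the article's placeholder domain.
print(robots_txt_url("http://www.yoursitename.com/directoryname1/filename.html"))
# http://www.yoursitename.com/robots.txt
```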
The basic syntax is:
User-agent: robot or spider name
Disallow: files or directories
If you want to exclude all search engine spiders from your entire domain, use this tiny snippet, but be sure you really need it:
User-agent: *
Disallow: /
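Before uploading a rule this drastic, it can help to check what it does. The sketch below feeds the block-everything rule (User-agent: * with Disallow: /) to Python's standard urllib.robotparser module. The bot name and the page URLs are only examples, and a real spider may interpret rules slightly differently from this parser:

```python
from urllib.robotparser import RobotFileParser

# Rules that shut every spider out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())

# Example bot name and URLs, chosen only for illustration.
home_allowed = parser.can_fetch("Googlebot", "http://www.yoursitename.com/")
page_allowed = parser.can_fetch(
    "Googlebot", "http://www.yoursitename.com/directoryname1/filename.html"
)

print(home_allowed, page_allowed)  # False False
```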
If you want to keep spiders out of certain directories, list them in the Disallow field:
User-agent: *
Disallow: /directoryname1/
Disallow: /directoryname2/
Similarly, if you want to restrict specific files, type in the path of each file:
Disallow: /directoryname1/filename.html
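To see how directory and file rules combine, here is a small Python sketch that tests a few paths against such rules with the standard urllib.robotparser module. The helper is_allowed, the bot name ExampleBot and the sample paths are assumptions made for this example:

```python
from urllib.robotparser import RobotFileParser

# One blocked file and one blocked directory, for every spider.
RULES = """\
User-agent: *
Disallow: /directoryname1/filename.html
Disallow: /directoryname2/
"""

def is_allowed(path, bot="ExampleBot"):
    """Return True if `bot` may crawl `path` under RULES."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(bot, "http://www.yoursitename.com" + path)

for path in (
    "/index.html",
    "/directoryname1/other.html",
    "/directoryname1/filename.html",
    "/directoryname2/page.html",
):
    print(path, is_allowed(path))
```

The first two paths come back as allowed and the last two as blocked: a file rule blocks only that file, while a directory rule blocks everything under the directory.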
If you don't want certain spiders crawling your site because they are of no use to you or are just eating up your bandwidth, you can name them in User-agent. For example, to stop the AltaVista bot (Scooter) from crawling your whole site, use the following code:
User-agent: Scooter
Disallow: /
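A rule aimed at one spider should leave the others alone. The Python sketch below checks this for a Scooter-only rule with the standard urllib.robotparser module; Googlebot stands in for any other spider, and the page URL is just an example:

```python
from urllib.robotparser import RobotFileParser

# Rules that block only the spider named Scooter.
BLOCK_SCOOTER = """\
User-agent: Scooter
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_SCOOTER.splitlines())

url = "http://www.yoursitename.com/index.html"  # illustrative page
scooter_allowed = parser.can_fetch("Scooter", url)
other_allowed = parser.can_fetch("Googlebot", url)

print(scooter_allowed, other_allowed)  # False True
```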
Be careful when using a robots.txt file: it can stop the pages you specify from appearing in search engines. There are many hundreds of bots and spiders crawling the Internet; most of them respect your robots.txt file, but some may not.
(*) You can include comments in your robots.txt file by putting a pound sign (#) at the front of the line to be commented.
(*) You can find the names of the robots that are crawling your site in your server log file.
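As a rough illustration of the log file tip, the Python sketch below counts requests per user-agent. It assumes the log uses the common Apache "combined" layout, where the user-agent is the last quoted field on each line. The sample lines, IP addresses and version numbers are invented, and your server may log in a different format:

```python
import re
from collections import Counter

# Hypothetical log lines in the Apache "combined" format, where the
# user-agent is the last quoted field. Real logs may be laid out differently.
SAMPLE_LOG = """\
192.0.2.10 - - [01/Jan/2024:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 45 "-" "Scooter/3.2"
192.0.2.10 - - [01/Jan/2024:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Scooter/3.2"
198.51.100.7 - - [01/Jan/2024:10:05:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1"
"""

# Matches the final double-quoted field on a line.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(lines):
    """Count how many requests each user-agent string made."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

for agent, hits in count_user_agents(SAMPLE_LOG.splitlines()).most_common():
    print(hits, agent)
```

The part of each user-agent string before the slash (Scooter, Googlebot) is usually the name to put in the User-agent field.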