
robot.txt


  • robot.txt

    What is the importance of robots.txt, how does it work, and what does it help with?
    Leif Nisaan

  • #2
    Originally posted by LeifNisaan
    What is the importance of robots.txt, how does it work, and what does it help with?
    The robots.txt file gives instructions to robots (spiders, crawlers, etc.), i.e. which parts of the site you don't want them to crawl, via the Robots Exclusion Protocol.

    Not all robots will obey your robots.txt, especially malicious ones. Since the contents of the file are public, it's not recommended as a way to keep specific URLs private; it's more a way to stop the 'legitimate' bots from crawling parts of the website they don't need to. Take a forum, for example: there are many parts of it that a spider doesn't need to crawl, such as newthread or newreply. Limiting what the bots request saves both bandwidth and server resources. I remember a while back seeing on my forum what the Googlebot gets up to when he's having a five-minute break: playing games! lol
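
    As a rough sketch of the forum case above, here is how a well-behaved bot could check rules like these using Python's standard-library urllib.robotparser. The file contents, the script names (newthread.php, newreply.php, showthread.php), the host forum.example.com and the bot name ExampleBot are all assumptions made up for illustration; a real forum's paths may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a forum; the script names are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /newthread.php
Disallow: /newreply.php
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Reading a thread is not covered by any Disallow rule.
allowed_thread = is_allowed(
    ROBOTS_TXT, "ExampleBot", "https://forum.example.com/showthread.php?t=1"
)
# The reply form is disallowed for every bot that honors the file.
blocked_reply = is_allowed(
    ROBOTS_TXT, "ExampleBot", "https://forum.example.com/newreply.php?t=1"
)
```

    Note that this check happens on the bot's side: the file only asks, and nothing in it stops a bot that never runs such a check.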
    Hexo
    -------
    The man that knows how, is always working for the man that knows why



    • #3
      A good example of a robots.txt file is Google's:

      http://www.google.com/robots.txt
      Webhosting.UK.com || cPanel VPS Hosting || Reseller Hosting

      Sales: 0808-262-0855
      Support: 0800-612-8725
      International: +44 191 303 8191



      • #4
        robots.txt is just a simple text file that contains instructions for the robots/spiders/bots belonging to search engines. You can use it to keep certain bots away entirely, or to keep all bots away from specific pages.
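
        Both uses can be combined in one file. The sketch below is only an illustration: the bot names BadBot and OtherBot, the /private/ path and the example.com host are invented, and the rules are checked with Python's standard-library urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one named bot is shut out completely,
# every other bot is only kept away from /private/.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def check(agent: str, path: str) -> bool:
    """Return True if agent may fetch path on the (made-up) example.com site."""
    return parser.can_fetch(agent, "https://example.com" + path)


badbot_home = check("BadBot", "/index.html")           # blocked everywhere
otherbot_home = check("OtherBot", "/index.html")       # allowed
otherbot_private = check("OtherBot", "/private/a.html")  # blocked by the * group
```

        A bot is matched against its own User-agent group first and falls back to the * group only when no named group applies to it.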
        Grover



        • #5
          Yes, robots.txt is a file that restricts search engine bots from crawling certain pages. But I have also heard that Google can sometimes still index pages that are disallowed in the file, if it finds links to those pages elsewhere on the web.



          • #6
            How closely do robots follow the instructions in robots.txt? I mean, they follow some rules of their own when crawling a particular page. Also, what if a website does not have a robots.txt at all, is that a major fault?



            • #7
              robots.txt is a really important file if you care about how your site appears in search engines. It is a simple text file (no HTML) that must be placed in your root directory: yourwebsite.com/robots.txt. When a search engine crawler visits your site, it looks for this file, which tells the spider which pages of your site it may crawl and which it shouldn't.
              A robots.txt file can also help keep out unwanted spiders such as email harvesters and image strippers, as long as they choose to honor it.

              This can be useful for you
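
              Because the file has to sit at the root, its address can be worked out from any page on the same site. A minimal sketch in Python, where robots_txt_url is a made-up helper name and the forum path is only an example:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt address for the site that page_url belongs to."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


location = robots_txt_url("http://yourwebsite.com/forum/showthread.php?t=42")
# location == "http://yourwebsite.com/robots.txt"
```

              A file placed anywhere else, for example yourwebsite.com/forum/robots.txt, is not where crawlers look.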



              • #8
                You can also use a robots.txt file to stop search engine spiders from crawling content that isn't useful to people searching for your site but would still use up bandwidth when crawled.

                The best place for information about robots.txt is Google Webmaster Tools, as said above.



                • #9
                  Originally posted by Welford
                  How closely do robots follow the instructions in robots.txt? I mean, they follow some rules of their own when crawling a particular page.
                  Almost all search engine robots look for robots.txt before crawling a website. But as Hexosphere mentioned above, malicious bots won't obey your robots.txt file.

                  Originally posted by Welford
                  Also, what if a website does not have a robots.txt at all, is that a major fault?
                  If there is no robots.txt on your website, a visiting robot will look for the file and not find it. It then feels free to visit all your pages and content, because that is what it is programmed to do in this situation. As a result, even the links you don't want indexed by a search engine may get indexed.
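
                  The "no rules means everything is allowed" behaviour can be seen with Python's standard-library urllib.robotparser. This is only a sketch: an empty rule set stands in for the missing file, and the bot name and URL are invented. As far as I know, the same module's read() method also treats a 404 response for robots.txt as "allow everything", but other crawlers may behave differently.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# An empty rule set, standing in for a site that has no robots.txt.
parser.parse([])

# With no Disallow rules at all, any path is considered fetchable.
everything_allowed = parser.can_fetch(
    "ExampleBot", "https://example.com/any/page.html"
)
```

                  So a missing file is not an error in itself; it just means you have not asked bots to stay away from anything.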

