1.34 A robot (also known as a bot, spider, or crawler) is a program that accesses Web documents automatically rather than in direct response to user input. For example, the Google search engine uses a program called Googlebot to crawl the World Wide Web automatically and build its searchable index of Web pages. An indexing robot such as Googlebot begins by reading some Web document, then reads the documents linked to by that initial document, and continues this process recursively on documents it has not yet read. Some informal standards have been developed that allow Web site administrators and document authors to ask robots not to read certain documents.

(a) Read the first part of Section 4.1 of Appendix B of the HTML 4.01 Recommendation [W3C-HTML-4.01], and explain what you would do to request that robots not crawl the documents accessible from your Tomcat Web server. (See http://www.robotstxt.org/wc/norobots.html for more information on the Robot Exclusion Standard.)

(b) For one or more Web sites, as directed by your instructor, list for each site the robots (if any) that are explicitly excluded from crawling one or more of its files.
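The crawling process described in the exercise can be sketched as an iterative, breadth-first traversal: a queue of documents still to read plus a set of URLs already seen, so that each document is read at most once. In this minimal Python sketch the link graph and the allowed() check are hypothetical stand-ins supplied by the caller; a real robot would fetch and parse each page and would consult the site's robots.txt at the point where allowed() is called.

```python
from collections import deque
from typing import Callable, Dict, List


def crawl(start: str,
          links: Callable[[str], List[str]],
          allowed: Callable[[str], bool]) -> List[str]:
    """Visit documents breadth-first from `start`, never reading one twice.

    `links(url)` returns the URLs a document links to; `allowed(url)` is where
    a polite robot would apply robots.txt rules. Both are caller-supplied.
    """
    visited: List[str] = []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if not allowed(url):
            continue  # excluded: neither read it nor follow its links
        visited.append(url)  # "read" (index) the document here
        for target in links(url):
            if target not in seen:  # only documents not yet queued or read
                seen.add(target)
                queue.append(target)
    return visited


if __name__ == "__main__":
    # Hypothetical four-page site; /private/ is assumed to be excluded.
    site: Dict[str, List[str]] = {
        "/index.html": ["/about.html", "/private/notes.html"],
        "/about.html": ["/index.html", "/contact.html"],
        "/contact.html": [],
        "/private/notes.html": ["/contact.html"],
    }
    order = crawl("/index.html",
                  links=lambda u: site.get(u, []),
                  allowed=lambda u: not u.startswith("/private/"))
    print(order)
```

Run as a script, it prints ['/index.html', '/about.html', '/contact.html']: the private page is skipped, but /contact.html is still reached through /about.html.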
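One common answer to part (a), sketched here under assumptions, is to serve a robots.txt file from the top level of the server. On a default Tomcat installation, files in the webapps/ROOT directory are served from the server root, so a robots.txt placed there would be reachable at a URL such as http://localhost:8080/robots.txt (the host and port are assumptions). The following Python snippet holds an exclude-everything robots.txt as a string and uses the standard library's urllib.robotparser to confirm that every user-agent is refused; the agent name "SomeOtherBot" and the page URL are illustrative only. Individual HTML documents can also carry a robots META element such as <META name="ROBOTS" content="NOINDEX, NOFOLLOW">, which HTML 4.01 also describes; robots are not obliged to honor either mechanism.

```python
from urllib.robotparser import RobotFileParser

# An exclude-everything robots.txt: the "*" record applies to every robot,
# and "Disallow: /" covers every path on the server. It would be served from
# the top level of the server, e.g. http://localhost:8080/robots.txt
# (host and port are assumptions for a default local Tomcat install).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "SomeOtherBot" is a made-up agent name used only for illustration.
    for agent in ("Googlebot", "SomeOtherBot"):
        ok = allowed(ROBOTS_TXT, agent, "http://localhost:8080/index.html")
        print(f"{agent}: {'allowed' if ok else 'excluded'}")
```

Running it prints "excluded" for both agents. An empty "Disallow:" line would instead permit everything.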
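Part (b) amounts to downloading each site's /robots.txt and collecting the user-agents named in records that contain at least one non-empty Disallow line. This Python sketch does that with only the standard library. Two choices are assumptions rather than requirements of the Robot Exclusion Standard: the wildcard agent "*" is not reported because it names no particular robot, and a User-agent line that follows a rule line is treated as the start of a new record even without a separating blank line. Site URLs are given on the command line; a site with no robots.txt (for example, one answering 404) is reported as unreadable rather than as excluding nothing.

```python
import sys
import urllib.request
from typing import List


def excluded_robots(robots_txt: str) -> List[str]:
    """Return the named user-agents that have at least one non-empty Disallow.

    Records are runs of User-agent lines followed by rule lines; a User-agent
    line that comes after a rule line starts a new record. The wildcard "*"
    is skipped because it does not name a particular robot.
    """
    excluded: List[str] = []
    agents: List[str] = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank, comment-only, or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # the previous record is finished
                agents, seen_rule = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            seen_rule = True
            if field == "disallow" and value:  # empty Disallow excludes nothing
                for agent in agents:
                    if agent != "*" and agent not in excluded:
                        excluded.append(agent)
    return excluded


def fetch_robots_txt(site: str, timeout: float = 10.0) -> str:
    """Download <site>/robots.txt, where `site` looks like "https://www.example.com"."""
    url = site.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    for site in sys.argv[1:]:
        try:
            robots = excluded_robots(fetch_robots_txt(site))
        except OSError as exc:  # URLError, HTTPError and timeouts are OSErrors
            print(f"{site}: could not read robots.txt ({exc})")
            continue
        print(f"{site}: {', '.join(robots) if robots else '(none)'}")
```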