robots.txt

Would anyone dare to give a recommendation of what to include in a robots.txt file for a typical public EPiServer 4.5 site with MondoSearch installed? I would suggest the following:

User-agent: *
Disallow: /util/
Disallow: /webservices/
Disallow: /mondosearch/

Comments please!!!
#13345
Mar 25, 2008 18:36
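One way to sanity-check rules like these before deploying them is to feed them to Python's standard-library urllib.robotparser and ask which paths a generic crawler may fetch. This is a minimal sketch: the helper names and the sample paths (such as /templates/page.aspx) are hypothetical, chosen only to exercise each rule, and robotparser implements the basic prefix-matching rules, which individual search engines may extend.

```python
from urllib.robotparser import RobotFileParser

# The rules proposed above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /util/
Disallow: /webservices/
Disallow: /mondosearch/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the parsed rules."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Hypothetical paths, one per rule plus an ordinary page that should stay crawlable.
    for path in ("/util/login.aspx", "/webservices/", "/mondosearch/search.aspx", "/templates/page.aspx"):
        print(path, is_allowed(rp, "Googlebot", path))
```

Because Disallow values match by path prefix, /util/ also covers /util/login.aspx and everything else below that folder.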
I guess a good first question is: does MondoSearch log in to the site when it crawls it? If so, you might run into problems with the /util/ folder. MondoSearch crawls over HTTP, and if it logs in to a site it (normally) does so using login.aspx in the util folder. Please note that all robots.txt settings on a site can be overruled in the MondoSearch GrabMap: Crawler setup > hosts > GrabMap. Just change the "index, follow" rules as you like/need... Also, I believe MondoSearch is supposed to support a "mondo" agent name so that it can have its own settings in the robots.txt file... Not sure it works, though... Good luck! :-)
#15623
Mar 25, 2008 18:47
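If MondoSearch does identify itself with its own user-agent token, robots.txt can give it a separate group of rules, for example letting it reach /util/ (where it would log in) while other crawlers stay out. The sketch below assumes the token is "mondo", which is only what is suggested above and is unconfirmed; check the MondoSearch documentation or the crawler's User-Agent header in the web server logs for the real name. It also shows that a crawler matching a named group ignores the "User-agent: *" group entirely, so the named group must repeat any rules that should still apply to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a separate group for the MondoSearch crawler.
# "mondo" is an ASSUMED user-agent token; verify the real one before use.
ROBOTS_TXT = """\
# MondoSearch: may reach /util/ (if it logs in, it normally uses /util/login.aspx)
User-agent: mondo
Disallow: /webservices/
Disallow: /mondosearch/

# Every other crawler
User-agent: *
Disallow: /util/
Disallow: /webservices/
Disallow: /mondosearch/
"""


def can_fetch(agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Parse robots_txt and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Sample paths are hypothetical; compare the named group with the catch-all group.
    for agent in ("mondo", "Googlebot"):
        for path in ("/util/login.aspx", "/mondosearch/search.aspx", "/templates/page.aspx"):
            print(f"{agent:10} {path:28} {can_fetch(agent, path)}")
```

As noted above, the GrabMap settings can override robots.txt anyway, so a named group like this is only needed if you want the rules to live in one place.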
Hmm! I think I misled you by mentioning MondoSearch. My question was really about all the other search engines out there. I presumed that MondoSearch would be configured to crawl the site correctly (I haven't been involved in setting it up, so I don't know whether it logs on while crawling). What I'm mainly seeking approval of is leaving out the Disallow instructions for the edit and admin subdirectories: anyone can read robots.txt, so listing them there would reveal their new names and defeat the purpose of renaming them for security reasons.
#15624
Mar 25, 2008 18:47
Then the robots.txt settings above seem good to go! I guess you could add the paths to the edit and admin tools as well, but they will always require the user to log in, so as long as the util folder is disallowed you should be fine. Michael
#15625
Mar 25, 2008 18:47