Showing posts with label wget. Show all posts

Thursday, November 19, 2009

wget and robots.txt

Well, the webmasters are trying to ward off robots... meanwhile robots are getting smarter and smarter. A natural competition, it seems.

Here's what you do to bypass the "robot police":

So what if you don't want wget to obey the robots.txt file? By default, wget honors robots.txt during recursive downloads. To turn that off, simply add -e robots=off to the command, like this:

wget -r -p -e robots=off http://www.example.com
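Ignoring robots.txt removes the only hint wget had about what the site owner wants left alone, so it seems fair to at least slow the crawl down. Here is a rough sketch of the same command with a pause between requests. The function name polite_mirror and the DRY_RUN switch are my own inventions for illustration, not part of wget; the wget options themselves (--wait, --random-wait) are standard. By default it only prints the command, so you can look it over first:

```shell
#!/bin/sh
# polite_mirror: hypothetical helper name, not a wget feature.
# Builds the robots=off command from the post plus throttling options.
polite_mirror() {
    url="$1"
    # -r             follow links recursively
    # -p             also fetch page requisites (images, CSS)
    # -e robots=off  run the wgetrc command "robots=off"
    # --wait=1       pause one second between requests
    # --random-wait  vary that pause so requests are less regular
    set -- wget -r -p -e robots=off --wait=1 --random-wait "$url"

    # DRY_RUN is an assumption of this sketch: 1 (the default) prints
    # the command, anything else actually runs it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

polite_mirror "http://www.example.com"
```

Set DRY_RUN=0 before calling the function once you are happy with what it prints.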


Using wget To Download Entire Websites
courtesy Jam's Ubuntu Linux Blog.