Thursday, November 19, 2009

wget and robots.txt

Well, the webmasters are trying to ward off robots... meanwhile robots are getting smarter and smarter. A natural competition, it seems.

Here's what you do to bypass the "robot police": if you don't want wget to obey the robots.txt file, simply add -e robots=off to the command, like this:

wget -r -p -e robots=off http://example.com/

(Here example.com just stands in for the site you want to mirror.)
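For context, a robots.txt that shuts out all crawlers is just a two-line file like the one below; by default wget fetches this file from the site root and honors it, and -e robots=off is what tells it to skip that check. A minimal sketch (the /tmp path is illustrative):

```shell
# Write a minimal "deny everything" robots.txt -- the kind of file
# wget would normally fetch from the site root and obey
cat > /tmp/robots.txt <<'EOF'
User-agent: *
Disallow: /
EOF

# One Disallow rule covering the whole site
grep -c '^Disallow' /tmp/robots.txt
```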

Using wget To Download Entire Websites
courtesy of Jam's Ubuntu Linux Blog.

Thursday, October 8, 2009

Solaris the tree killer no more

In one of my earlier posts I complained about being unable to turn off the banner page for print jobs sent from a Solaris print client to a printer controlled by a Linux print server. Soon thereafter we found a solution which I have been forgetting to talk about in this space... But, as they say, better late than never.

The solution is basically to tell the server (Linux) machine to forget that banner page. The way to do that is as follows.

1) Make sure you have cups-lpd installed. Solaris seems to speak only lpd; I am not sure how one could make it talk to a CUPS server directly.

2) On our CentOS machine we had to use the following syntax in /etc/xinetd.d/cups:

service printer
{
	socket_type	= stream
	protocol	= tcp
	wait		= no
	user		= lp
	group		= sys
	passenv		=
	server		= /usr/lib/cups/daemon/cups-lpd
	server_args	= -o document-format=application/octet-stream -o job-sheets=none
}

Emphasis on the "-o job-sheets=none" part.
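For the edited file to take effect, xinetd has to be restarted; a quick sketch of the follow-up steps (CentOS 5-era init script, and the port check is just an assumption about how you might verify it):

```shell
# Restart xinetd so the edited /etc/xinetd.d/cups file is re-read
service xinetd restart

# cups-lpd is spawned per connection by xinetd on the standard lpd
# port, TCP 515 -- check that something is now listening there
netstat -tln | grep ':515'
```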

A good yum tip

Apparently this works on CentOS when you are trying to run an update/upgrade and dependency issues pop up. I don't know whether it works under all circumstances - but it definitely worked for both Shahbaz Javeed here and for yours truly just a few minutes ago. Like Shahbaz, I am also running CentOS 5.3 on the machine in question.

What you do is:

# yum clean all

Yes, it was this simple - at least this time!
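For the curious: "clean all" throws away yum's cached repository metadata, headers and downloaded packages, so the next run rebuilds its picture of the repos from scratch. The natural follow-up (an assumption about the workflow, not something spelled out above) is simply to retry the update:

```shell
# Drop all of yum's cached repo metadata, headers and packages...
yum clean all

# ...then retry; yum re-downloads fresh repo data on the next run,
# which is often enough to clear stale-metadata dependency errors
yum update
```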