Boris Epstein's Technology Blog: Linux


Thursday, January 26, 2017

Fun with custom Ubuntu ISOs

Apparently, when you are making an ISO, the genisoimage command does not like it when you pass a list of files and subdirectories as its arguments; it works fine, however, when you give it a single argument that is a directory name.

So let us say you are in the directory where you have configured your custom distro and you now want to pack it as a bootable ISO. It appears the safest syntax to use is something like this:

sudo genisoimage -no-emul-boot -boot-load-size 4 -boot-info-table -b isolinux/isolinux.bin -c isolinux/boot.cat -V "my image" -o ../my_image.iso .

As a result, you get the image packed into my_image.iso one level above.
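If you script your builds, the same invocation can be assembled as an argument list. Here is a minimal Python sketch of that; the helper name build_iso_command and its parameters are made up for illustration, while the genisoimage options are exactly the ones from the command above. It only builds the list - running it (for instance via subprocess.run, from inside the distro directory and with sufficient privileges) is left to you.

```python
import shlex


def build_iso_command(volume_label, output_path, source_dir="."):
    """Return the genisoimage argument list (hypothetical helper, not part of any tool)."""
    return [
        "genisoimage",
        "-no-emul-boot", "-boot-load-size", "4", "-boot-info-table",
        "-b", "isolinux/isolinux.bin",
        "-c", "isolinux/boot.cat",
        "-V", volume_label,
        "-o", output_path,
        source_dir,  # one directory argument, not a list of files
    ]


if __name__ == "__main__":
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(build_iso_command("my image", "../my_image.iso")))
```

The point of the sketch is the last element: a single directory, which is the form genisoimage seemed happy with.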

Additionally, here is a nice link to a manual on how to modify your distro step by step:

LiveCDCustomization

Playing with it right now. More to follow.

Tuesday, November 29, 2016

Epoch & Unix Timestamp Conversion Tools

Yes, this is just a shameless plug for a utility I happened to like.

Long story short - I was running a ping with a "-D" (timestamp) option and I wanted to see how to turn something like this:

[1480443528.089965] 1208 bytes from 1.2.3.4: icmp_seq=21548 ttl=244 time=781 ms
[1480443529.079493] 1208 bytes from 1.2.3.4: icmp_seq=21549 ttl=244 time=771 ms
[1480443530.119068] 1208 bytes from 1.2.3.4: icmp_seq=21550 ttl=244 time=811 ms

into something where I would know when the ping happened without counting off seconds since the start of 1970.

So I started searching and found this utility site:

http://www.epochconverter.com/

I think it is quite useful and so - have at it!
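For the record, the same conversion can also be done locally. Below is a minimal Python sketch, assuming the bracketed prefix is seconds since the epoch as in the ping output above; the function name humanize is my own, and the output is deliberately in UTC rather than local time.

```python
import re
from datetime import datetime, timezone

# Matches the "[seconds.microseconds]" prefix that ping -D prints.
STAMP = re.compile(r"^\[(\d+\.\d+)\]")


def humanize(line):
    """Replace a leading [epoch] stamp with a UTC date and time."""
    match = STAMP.match(line)
    if not match:
        return line  # leave lines without a stamp untouched
    when = datetime.fromtimestamp(float(match.group(1)), tz=timezone.utc)
    return when.strftime("%Y-%m-%d %H:%M:%S") + " UTC" + line[match.end():]


if __name__ == "__main__":
    sample = "[1480443528.089965] 1208 bytes from 1.2.3.4: icmp_seq=21548 ttl=244 time=781 ms"
    print(humanize(sample))
```

Feeding it the first line above gives a stamp of 2016-11-29 18:18:48 UTC, which matches the day this was posted.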

Saturday, June 30, 2012

MooseFS: continuing progress

As you know, I have been getting multiple MooseFS installations up and running. See here and here. This is just a continuation of the series.

On one of the machines which I was trying to configure as a MooseFS server I decided to put the chunk server's storage into an XFS mounted in a file. The reason I found it necessary to do that was that a chunk server expects its storage area to be of a constant size - or else it gets confused about how much space it has at its disposal. Hence the best option is a dedicated filesystem. I had the "parent" XFS already filled with some content and occupying all the available disk - and at the time filesystem-in-a-file seemed like a reasonable solution. But it was not.

The way you accomplish this is you create a large file - in another XFS, as it were - then run mkfs.xfs on that file and then mount it with the -o loop option.

I did that and did a little testing on that file. The performance for single-file reads and writes was quite good: I don't remember the precise numbers at this point, but it was on par with the sort of performance you would get out of a filesystem sitting directly "on the metal" (i.e., on the RAID, as was the case with this system). What I never suspected was that the latency associated with multi-file I/O in this two-tiered filesystem would become my undoing.

About a week into running in the configuration described above, the first "missing chunks" appeared. In general MooseFS displayed rather uneven performance and seemed only marginally usable, if that. Later, the number of those missing chunks increased. I am now sure it was due to latency-induced timeouts of some kind; however, as this email exchange indicates, even people who work with the code find this odd. That may well be because no one ever envisioned such a twisted configuration as the one I created.

So, finally, even though by then I had already put about 15 TB of data into that MooseFS, it was clearly not acceptable to keep running while losing massive amounts of data in the process. So I scrapped that, did away with the "parent" XFS - and, naturally, with the filesystem-in-a-file - repartitioned the underlying disk in such a way as to give the chunk server a dedicated partition and relaunched the MooseFS. It has been a few days now and at this point all is well: the data is written at over 15 MB/s, it can be read at about 50 MB/s and there has not been a single error message.

The lessons learned thus far appear to be the following: no VM's and no UNFSD server on the same hardware - at least so long as this hardware is in the low-end server class. And it also looks like MooseFS data needs to reside as close to the actual hardware as possible - i.e., no middle layers such as this filesystem-in-a-file.

And while this was a bit time consuming I now have multiple MooseFS installations running and ready for growth.

Tuesday, June 19, 2012

MooseFS taking shape

I am continuing to experiment with MooseFS. However, the final configuration looks somewhat different from what I had originally envisioned. For one thing, the idea of placing the master and metadata servers in VirtualBox VM's didn't quite work out. I guess that created just too many levels of execution, which led to the overall load growing too much and the performance suffering as soon as any serious load was applied.

So I switched to simply running all the processes (master, meta, chunkserver) on the same hardware and got rid of all the VM's. That worked fine. I defined a separate network - currently fully confined to the same host - in order to host the MooseFS installation. And MooseFS clients have started to run just fine. I got a performance of up to 80 MB/s for reading data from the MooseFS over a 1 Gbit/s network.

However, one problem remained. Running UNFSD on the same machine I got very poor performance. As few as 5 clients could drive it down to just 30 KB/s! And that on an 8-core 48 GB RAM machine - while a MooseFS client would read about 3 orders of magnitude faster!

Surprisingly, the fix was simple: if I ran UNFSD on a separate physical machine the performance went back into the tens of MB/s range. So that was what I settled on. That NFS server machine is currently just a CentOS Linux MooseFS client sitting on the "general" network - different from the one hosting MooseFS - and sporting a mere 2 cores and 2 GB of RAM. So I guess for now I have a working solution.

Saturday, June 2, 2012

MooseFS

Just reporting that I started playing with it - and more than playing. So far so good. The architecture is really simple, the executables very lightweight. For more detail see here:

http://www.moosefs.org/

I am running it on several server-class boxes using VirtualBox VM's to emulate a network so as to be able to distribute it to multiple hardware boxes later on. Both the VirtualBox hosts and the VM's are running CentOS 6.

The only problems so far seem to have to do with integrating MooseFS with other technologies. I tried using the UNFS3 user-space NFS server to create an NFS gateway to the MooseFS installation. And so far it looks like the UNFS3 server does not scale well to multiple connections. In other words, you get excellent performance with one NFS client and decent performance with 2-3 connections, but above 5 it seems to go down the drain, and accessing one's home directory over such an NFS connection becomes pretty much unfeasible. So at this point what's lacking is a good NFS gateway for situations where a MooseFS client is for some reason not available or not a workable solution. Or perhaps I will choose a different sharing method. Time will tell.

Wednesday, April 4, 2012

The mysterious evince

evince, otherwise known as "Document Viewer", is a pretty much standard feature of many a Linux distribution. So there I was, trying to use it on a mostly up-to-date Ubuntu 10.04 LTS 64-bit machine and I kept getting messages that looked like the following:

(evince:5691): EggSMClient-WARNING **: Failed to connect to the session manager: Authentication Rejected, reason : None of the authentication protocols specified are supported and host-based authentication failed

There were other messages as well, some stated that evince could not open the display(???) even though other X-applications, such as xterm, for instance, would run just fine from the same command line.

To make things even more bizarre, that would happen for some users on the machine and not others. Attempts to play with Gnome settings - or even evince-specific settings - such as deleting the ~/.gnome2 directory or specifically the ~/.gnome2/evince subdirectory - appeared to make no difference.

After some web searches I found a solution that seems to work. Hat tip to the participants of this discussion on the Ubuntu Forums. The following seemed to actually fix the problem for everybody:

sudo bash
cd /usr/bin
mv evince evince.bin
ln -s evince.bin evince

And don't ask me why!

Tuesday, March 13, 2012

Large-scale matching exercise using MySQL

In my previous post (MySQL: A Few Metrics, 3 March 2012) I mentioned some parameters of a task I recently faced. We are now going to examine that task in more detail.

We have two large text files, File1 and File2. They both contain text entries, one per line, over 400 million lines each. We know almost nothing about the content beyond that; it is definitely unsorted within each of the files, and some lines may be repeated. So for the purpose of this discussion let us say File1 is 430 million lines and File2 is 440 million lines.

To recap: the only machine I had available for this task was a VM that had plenty of disk - about 1 TB unused - but little processing power and only 512 MB RAM. It was running CentOS 6 and MySQL 5.1.52. First word of caution: if you intend to manipulate large tables it is advisable either to make sure that /tmp has plenty of room for MySQL's invisible temporary files or else to change the temporary directory to something else. You can do that by setting the TMPDIR environment variable to the desired location. On CentOS I just inserted the appropriate lines towards the top of /etc/init.d/mysqld and that did the trick:

#
# Alternate temporary storage directory
#
export TMPDIR=/home/mysql/tmp
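Before kicking off a multi-hour query it may be worth checking how much room that directory actually has. A minimal Python sketch follows; the helper names free_gib and has_room are made up for illustration, and it assumes the directory Python reports as its temporary directory (which honors TMPDIR) is the same one MySQL will end up using.

```python
import shutil
import tempfile


def free_gib(path):
    """Free space on the filesystem holding path, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, needed_gib):
    """True if at least needed_gib GiB are free under path."""
    return free_gib(path) >= needed_gib


if __name__ == "__main__":
    tmp = tempfile.gettempdir()  # honors the TMPDIR environment variable
    print(f"{tmp}: {free_gib(tmp):.1f} GiB free")
```

How much room is "plenty" depends on the tables involved; I have no formula for that, so the threshold is left as a parameter.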

My first instinct was to first sort the two lists individually and then, after they are sorted, find matches as well as content exclusive to either list by doing one forward pass through both. I still believe that approach was sensible - however, the sorting phase proved to be more time-consuming than I expected. The most likely reason for that was that, as I already mentioned, inexact comparisons take qualitatively longer than exact ones - and sorting, no matter how you do it, is based upon inexact comparisons.

However, that same fact could be used to our advantage. We could do exact comparisons to determine the intersection of the two lists - and then separate the entries exclusive to either list.
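To make the first idea concrete, here is a minimal Python sketch of the "one forward pass" over two already sorted lists. It is an in-memory illustration with a made-up function name (compare_sorted), not what I ran on the real files, and it assumes repeated lines are paired off one-to-one.

```python
def compare_sorted(a, b):
    """One forward pass over two sorted lists.

    Returns (common, only_in_a, only_in_b).
    """
    common, only_a, only_b = [], [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            only_a.append(a[i])  # a[i] cannot appear later in b
            i += 1
        else:
            only_b.append(b[j])  # b[j] cannot appear later in a
            j += 1
    only_a.extend(a[i:])  # whatever is left over is exclusive
    only_b.extend(b[j:])
    return common, only_a, only_b


if __name__ == "__main__":
    first = sorted(["pear", "apple", "fig"])
    second = sorted(["fig", "kiwi", "apple"])
    print(compare_sorted(first, second))
```

The pass itself is cheap; it is the sorting that has to happen beforehand which hurt in my case.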

Let us now run through a practical example that reflects what I ended up doing after some trial and error. The names have been adjusted from those I used to make this text more readable. My apologies for any possible typos in that syntax.

Alright, let us get going now. We have our files: File1 (430 million lines) and File2 (440 million lines).

So first let us create the necessary tables:

mysql> CREATE TABLE f1_list (f1_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, text_line TEXT NOT NULL, INDEX (text_line(400) ASC));
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE f2_list (f2_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, text_line TEXT NOT NULL, INDEX (text_line(400) ASC));
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE common_list (cl_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, f1_id INT NOT NULL, f2_id INT NOT NULL, text_line TEXT NOT NULL, INDEX (f1_id ASC), INDEX (f2_id ASC), INDEX (text_line(400) ASC));
Query OK, 0 rows affected (0.00 sec)

mysql>

Now let us populate f1_list and f2_list with the contents of File1 and File2 respectively. You will need to run roughly the following commands using full paths to File1 and File2:

mysql> LOAD DATA LOCAL INFILE "/home/my_user/big_files/File1" INTO TABLE f1_list (text_line);
Query OK, 430000000 rows affected (8 hours 13 min 10.01 sec)
Records: 430000000 Deleted: 0 Skipped: 0 Warnings: 0

mysql> LOAD DATA LOCAL INFILE "/home/my_user/big_files/File2" INTO TABLE f2_list (text_line);
Query OK, 440000000 rows affected (10 hours 01 min 15.0 sec)
Records: 440000000 Deleted: 0 Skipped: 0 Warnings: 0

mysql>

The times are roughly consistent with what I saw; I'd estimate the import times as 8-12 hours per file.

Now let us extract the commonalities into common_list:

mysql> INSERT INTO common_list (f1_id, f2_id, text_line) SELECT f1_list.f1_id,
f2_list.f2_id, f1_list.text_line FROM f1_list, f2_list WHERE f1_list.text_line = f2_list.text_line;
Query OK, 427677003 rows affected, 2 warnings (16 hours 12 min 16.58 sec)
Records: 427677003 Duplicates: 0 Warnings: 0

mysql>

Now let us extract the exclusive content:

mysql> DELETE FROM f1_list WHERE f1_id IN (SELECT f1_id FROM common_list);
Query OK, 427001324 rows affected (16 hours 24 min 41.20 sec)

mysql> DELETE FROM f2_list WHERE f2_id IN (SELECT f2_id FROM common_list);
Query OK, 427001129 rows affected (16 hours 39 min 44.47 sec)

mysql>

And so by now it looks like we are done. We have content exclusive to File1 listed in f1_list, content exclusive to File2 in f2_list and the common content in common_list. The total processing time - even if the operations are performed serially as delineated above - can be capped at about 76 hours. I used a little padding there, too. For instance, I counted 20 hours for the processes that in reality took about 16. Thus it appears realistic to sort through this mass of data within 3-4 days using a machine which in this day and age would be considered substandard in terms of its performance.
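If you want to try the flow on a toy scale first, here is a minimal Python sketch that replays the same steps against an in-memory SQLite database. SQLite is only a stand-in for MySQL here (no prefix indexes, INTEGER PRIMARY KEY instead of AUTO_INCREMENT), and the function name split_lists is invented for the example; the table and column names are the ones used above.

```python
import sqlite3


def split_lists(lines1, lines2):
    """Return (common, only_in_1, only_in_2) using the join-then-delete flow."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE f1_list (f1_id INTEGER PRIMARY KEY, text_line TEXT NOT NULL);
        CREATE TABLE f2_list (f2_id INTEGER PRIMARY KEY, text_line TEXT NOT NULL);
        CREATE TABLE common_list (cl_id INTEGER PRIMARY KEY,
                                  f1_id INTEGER NOT NULL,
                                  f2_id INTEGER NOT NULL,
                                  text_line TEXT NOT NULL);
        CREATE INDEX f1_text ON f1_list (text_line);
        CREATE INDEX f2_text ON f2_list (text_line);
    """)
    db.executemany("INSERT INTO f1_list (text_line) VALUES (?)",
                   [(line,) for line in lines1])
    db.executemany("INSERT INTO f2_list (text_line) VALUES (?)",
                   [(line,) for line in lines2])
    # Exact-match join: every matching pair of rows lands in common_list.
    db.execute("""
        INSERT INTO common_list (f1_id, f2_id, text_line)
        SELECT f1_list.f1_id, f2_list.f2_id, f1_list.text_line
        FROM f1_list JOIN f2_list ON f1_list.text_line = f2_list.text_line
    """)
    # Whatever matched is removed, leaving only the exclusive content.
    db.execute("DELETE FROM f1_list WHERE f1_id IN (SELECT f1_id FROM common_list)")
    db.execute("DELETE FROM f2_list WHERE f2_id IN (SELECT f2_id FROM common_list)")

    def lines_of(table):
        rows = db.execute("SELECT text_line FROM " + table)
        return sorted(row[0] for row in rows)

    result = (lines_of("common_list"), lines_of("f1_list"), lines_of("f2_list"))
    db.close()
    return result


if __name__ == "__main__":
    print(split_lists(["a", "b", "b", "c"], ["b", "c", "d"]))
```

Note that a line repeated in one file produces one common_list row per matching pair, which is why the row counts in the join and in the two deletes above need not agree.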

Wednesday, January 18, 2012

Sparse Files

OK, I had heard the term before but never really had to delve into it. That's an interesting concept, though. I first ran into it when I needed a large (multi-terabyte) file to house a file system. See here.

Then came across this article:

Sparse files – what, why, and how

I like the concept - though it does come with a few pitfalls, it seems. More on that later.
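For a quick feel of what "sparse" means in practice, here is a minimal Python sketch that creates a file with a large apparent size and then compares it with the space actually allocated. The function names are my own; the sketch assumes a Unix-like system where st_blocks is reported in 512-byte units, and whether the file really ends up sparse depends on the filesystem.

```python
import os
import tempfile


def make_sparse(path, size_bytes):
    """Create a file of the given apparent size without writing any data."""
    with open(path, "wb") as handle:
        handle.truncate(size_bytes)


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for path."""
    info = os.stat(path)
    return info.st_size, info.st_blocks * 512  # st_blocks: 512-byte units


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        image = os.path.join(workdir, "sparse.img")
        make_sparse(image, 1 << 30)  # 1 GiB apparent size
        apparent, allocated = apparent_and_allocated(image)
        print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On a filesystem that supports sparse files the allocated figure should come out far smaller than the apparent one.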

Friday, December 30, 2011

SSHFS and AutoFS

Mounting an SSH-accessible remote directory automatically is a nifty capability. Here's a nice description of how one can do that - specifically under Ubuntu, but it should work just as well under most other Linux distributions:

Automatically mounting a remote directory in Ubuntu using autofs + sshfs

One thing not mentioned there - and something I keep forgetting about between the instances I need to recall it - is that the passwordless SSH login will fail unless the user directory on the SSH server is writable by the owner only! So, using the same terms as in the example above one should do the following:

1) Log into example.com as remoteusername

2) Execute the following command:
chmod g-w,o-w ~
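Since I keep forgetting this, a quick check may help. Below is a minimal Python sketch that reports whether a directory is writable by group or others, i.e. the condition the chmod above removes. The function name writable_by_others is made up, and the sketch only looks at the directory itself, not at ~/.ssh or the key files inside it.

```python
import os
import stat
import tempfile


def writable_by_others(path):
    """True if group or other has write permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    home = os.path.expanduser("~")
    print(home, "needs chmod g-w,o-w" if writable_by_others(home) else "looks fine")

    # Self-check on a scratch directory rather than on the real home.
    scratch = tempfile.mkdtemp()
    os.chmod(scratch, 0o775)
    print("0775 writable by others:", writable_by_others(scratch))
    os.chmod(scratch, 0o755)
    print("0755 writable by others:", writable_by_others(scratch))
    os.rmdir(scratch)
```

Run it on the SSH server as remoteusername; if it complains, the chmod above is the fix.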

Monday, December 12, 2011

ClipGrab

Oh, how fast things change!

Just a little while ago it looked like DamnVid was all the rage - but now YouTube has changed something and it is dead in the water. However, it looks like ClipGrab is a viable alternative... for now.

Monday, October 17, 2011

DamnVid

It came time to download some videos off YouTube to migrate them from one account to another... Don't ask me why, that doesn't really matter. So, be that as it may - it is not as easy as it may seem. The FlashPlayer used to save them in your /tmp directory on Linux - but no more. The directory recommended at the link:

.mozilla/firefox/userprofile/cache

does not seem to contain anything resembling the video files one would expect either. However, then I followed a link specified at the above to get DamnVid and this seems to be working. The video ends up right under your ~/Videos directory hierarchy.

Thursday, June 30, 2011

NFS on Mac

It is really strange that NFS on Mac OS X is an absolute mess even though Mac OS X is UNIX-based and UNIX/Linux machines NFS mount to each other easily and conveniently.

I just moved my NFS server from SLES 10 to CentOS 5.5 and guess what - Macs just want nothing to do with it. Oh, well, to be continued.

Saturday, April 30, 2011

SFTP vs SSH

One might be inclined to think that they are one and the same, pretty much. Well, not quite.

We had this problem whereby SSH worked fine but SFTP did not work at all. See the following description:

SFTP seems to fail for NIS accounts under OpenSSH 5.x

Well, it ended up being a little different. What most of those NIS-based accounts had in common was a particular customisation in their BASH startup scripts (.bashrc, etc.). Once the .bashrc was removed, SFTP started working again.

Moral of the story? Whenever anything that could have to do with the login procedure goes wrong it might make sense to just maximally simplify the login procedure - such as remove all customisations, for instance - and try again.

Sunday, April 24, 2011

SANE: fixing an annoying network scan problem

On our network we've got a couple of Canon PIXMA network printer/scanner/FAX devices. Some versions of SANE apparently get confused trying to scan them with "network discovery" activated. The xsane application simply crashes.

After some searching on the web I have discovered what appears to be the fix. At least the fix has worked on a variety of Ubuntu 10.x machines, both 32-bit and 64-bit. I haven't saved the sources I used and can't locate them again so whoever was my inspiration on this one - please accept my apologies for failing to credit you.

So, be that as it may, here's the fix. Go to /etc/sane.d and edit the following file: epson2.conf . Comment out the following line:

net autodiscovery

You are done.
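If you have more than a couple of machines to fix, the edit is easy to script. Here is a minimal Python sketch that comments out a given line in the text of a config file; the function name comment_out is made up for the example, and it works on a string so you can try it without touching /etc/sane.d. Writing the result back (as root, after making a backup) is up to you.

```python
def comment_out(config_text, target):
    """Prefix every line equal to target (ignoring surrounding spaces) with '#'."""
    result = []
    for line in config_text.splitlines():
        if line.strip() == target:
            result.append("#" + line)
        else:
            result.append(line)  # already commented or unrelated lines stay as-is
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    sample = "# epson2.conf\nnet autodiscovery\nusb\n"
    print(comment_out(sample, "net autodiscovery"), end="")
```

Running it twice is harmless, since an already commented line no longer matches the target.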

Final notice: after yet another round of updates on a 64-bit Ubuntu 10.10 machine I noticed that even this fix is not necessary it seems. But on slightly different installations this tip may still be of use.

Thursday, November 19, 2009

wget and robots.txt

Well, the webmasters are trying to ward off robots... meanwhile robots are getting smarter and smarter. A natural competition, it seems.

Here's what you do to bypass the "robot police":

So what if you don't want wget to obey by the robots.txt file? You can simply add -e robots=off to the command like this:

wget -r -p -e robots=off http://www.example.com

Using wget To Download Entire Websites
courtesy Jam's Ubuntu Linux Blog.

Thursday, October 8, 2009

Solaris the tree killer no more

In one of my earlier posts I complained about being unable to turn off the banner page for print jobs sent from a Solaris print client to a printer controlled by a Linux print server. Soon thereafter we found a solution which I have been forgetting to talk about in this space... But, as they say, better late than never.

The solution is basically to tell the server (Linux) machine to forget that banner page. The way to do that is as follows.

1) Make sure you have cups-lpd installed. Solaris seems to only speak lpd, I am not sure how one could enable it to interface to a cups server directly.

2) On our CentOS machine we had to use the following syntax in /etc/xinetd.d/cups:

service printer
{
    socket_type = stream
    protocol = tcp
    wait = no
    user = lp
    group = sys
    passenv =
    server = /usr/lib/cups/daemon/cups-lpd
    server_args = -o document-format=application/octet-stream -o job-sheets=none
}

Emphasis on the "-o job-sheets=none" part.

A good yum tip

Apparently works on CentOS when you are trying to run an update/upgrade and dependency issues pop up. Don't know if it works under all circumstances - but definitely worked for both Shahbaz Javeed here and for yours truly just a few minutes ago. Like Shahbaz, I am also running CentOS 5.3 on the machine in question.

What you do is:

# yum clean all

Yes, it was this simple - at least this time!

Thursday, August 7, 2008

A new mysterious strain of PDF

Just to spice up my life and possibly yours too... here's this out-of-the-blue surprise.

Surprise

This looks, at first glance, like a regular PDF file. The file command on a Linux box identifies it as "PDF document, version 1.4". It opens fine in pretty much any PDF reader (Acroread, KPDF, evince, whatever). However, if you decide to print it then it becomes a whole different ballgame.

It prints very slowly when it does. From Acroread 8 it does not print at all. That was checked on both OpenSuSE Linux 10.3 and MS Windows Vista so I have reason to believe that the problem at hand is most likely OS-agnostic.

It does print using the system print (lp) on OpenSuSE Linux 10.3. It prints on OpenSuSE Linux in Acroread 7, CentOS 5 Linux under Acroread 5 as well as on MacOS 10 under Preview. When converted to Postscript via pdftops it yields a humongous (100+ MB) Postscript file which is quite impressive given that the PDF file being converted is only a less-than-a-megabyte 13-page document.

If you know what this mystery PDF file is about or have encountered this mutation of PDF yourself - shout, and together we shall prevail!

Friday, June 27, 2008

Yes, it does look like ReiserFS may at times be losing data

A little while ago we had an experience that made me suspect that ReiserFS may be "forgetful" - in other words, that files under it might at times go missing. Based on a few people's experiences that I have recently read about it appears that my suspicions may have been not entirely without merit.

So the practical advice to be derived from this appears to be the following: unless there is a compelling reason to use ReiserFS use ext3 instead.

Monday, February 4, 2008

Extra! Extra! Expert Help Wanted!

Yes, this is for real! If you are an expert in SuSE Linux there is a potential part-time consulting opportunity for you in Cambridge, MA. And at a unique Harvard-affiliated lab at that!

Basically, we are pretty technically savvy as it is but sometimes we hit things in SuSE which we just don't know how to deal with. And SuSE (or OpenSuSE, to be precise) being our platform of choice, it is important that we deal with issues we discover. And that is where your expertise will come in. So in short I would guess we are looking for someone who not only is well-versed in Linux/UNIX, networking and IT in general but also has some serious experience specifically with SuSE Linux under their belt.

This will be consulting on an as-needed basis. We will do our best not to bore you with trivial problems!

So if you think this is something you are interested in please respond in the comments section or e-mail me directly.
