Saturday, June 30, 2012

MooseFS: continuing progress

As you know, I have been getting multiple MooseFS installations up and running. See here and here. This is a continuation of that series.

On one of the machines I was configuring as a MooseFS server, I decided to put the chunk server's storage into an XFS filesystem mounted from a file. I found this necessary because a chunk server expects its storage area to be of a constant size - otherwise it gets confused about how much space it has at its disposal. Hence the best option is a dedicated filesystem. The "parent" XFS was already filled with content and occupied all the available disk, and at the time a filesystem-in-a-file seemed like a reasonable solution. It was not.
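For reference, the chunk server learns where its storage lives from mfshdd.cfg, which simply lists one storage directory per line. The path below is illustrative, not the one from my setup, and the config file's location can vary by build:

    # /etc/mfs/mfshdd.cfg - one chunk storage directory per line
    # (hypothetical mount point; use whatever you gave the chunk server)
    /mnt/mfschunks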

The way you accomplish this is to create a large file - inside another XFS filesystem, as it happens - then run mkfs.xfs on that file and mount it with the -o loop option.
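A minimal sketch of that setup, with hypothetical paths and sizes:

    # create a large sparse backing file (the XFS inside will still
    # report a fixed size to the chunk server)
    dd if=/dev/zero of=/storage/chunks.img bs=1M count=0 seek=512000
    # put an XFS filesystem on the file
    mkfs.xfs /storage/chunks.img
    # mount it via the loop device
    mkdir -p /mnt/mfschunks
    mount -o loop /storage/chunks.img /mnt/mfschunks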

I did that and ran some quick tests on the file. The performance for single-file reads and writes was quite good - though I no longer remember the precise numbers, it was on par with what you would get out of a filesystem sitting directly "on the metal" (i.e., on the RAID array, as was the case with this system). What I never suspected was that the latency of multi-file I/O in this two-tiered filesystem would become my undoing.
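The testing amounted to simple sequential transfers; a rough sketch of the kind of thing I mean (file names and sizes are made up):

    # sequential write test into the loop-mounted filesystem
    dd if=/dev/zero of=/mnt/mfschunks/testfile bs=1M count=4096 conv=fsync
    # drop caches so the read test actually touches the disk
    echo 3 > /proc/sys/vm/drop_caches
    # sequential read test
    dd if=/mnt/mfschunks/testfile of=/dev/null bs=1M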

About a week into running in this configuration the first "missing chunks" appeared. In general, MooseFS displayed rather uneven performance and seemed only marginally usable, if that. Later the number of missing chunks grew. I am now sure it was due to latency-induced timeouts of some kind; however, as this email exchange indicates, even people who work with the code find this odd. That may well be because no one ever envisioned a configuration as twisted as the one I created.
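For what it's worth, the client-side tools that ship with MooseFS can show the state of individual files; this is how one would go looking for under-replicated or missing chunks (the path is hypothetical):

    # list each chunk of a file and the chunk servers holding copies;
    # a chunk with no copies listed is a missing chunk
    mfsfileinfo /mnt/mfs/some/file
    # quicker summary: chunk counts grouped by number of valid copies
    mfscheckfile /mnt/mfs/some/file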

So, finally, even though by then I had already put about 15 TB of data into that MooseFS, it was clearly not acceptable to keep running it while losing massive amounts of data in the process. So I scrapped that setup, did away with the "parent" XFS - and, naturally, with the filesystem-in-a-file - repartitioned the underlying disk so as to give the chunk server a dedicated partition, and relaunched MooseFS. It has been a few days now and so far all is well: data is being written at over 15 MB/s and read at about 50 MB/s, and there has not been a single error message.
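The replacement layout is the straightforward one; roughly this (device names are illustrative):

    # XFS directly on a dedicated partition - no loop device in between
    mkfs.xfs /dev/sdb2
    mkdir -p /mnt/mfschunks
    mount /dev/sdb2 /mnt/mfschunks
    # mount it at boot as well
    echo '/dev/sdb2 /mnt/mfschunks xfs defaults 0 0' >> /etc/fstab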

The lessons learned thus far appear to be the following: no VMs and no UNFSD server on the same hardware - at least as long as that hardware is in the low-end server class. It also looks like MooseFS data needs to reside as close to the actual hardware as possible - i.e., no middle layers such as the filesystem-in-a-file described above.

And while this was a bit time-consuming, I now have multiple MooseFS installations running and ready for growth.
