Saturday, June 30, 2012

MooseFS: continuing progress

As you know, I have been getting multiple MooseFS installations up and running. See here and here. This is just a continuation of the series.

On one of the machines which I was trying to configure as a MooseFS server I decided to put the chunk server's storage into an XFS filesystem mounted from a file. The reason I found it necessary to do that was that a chunk server expects its storage area to be of a constant size - or else it gets confused about how much space it has at its disposal. Hence the best option is a dedicated filesystem. I had the "parent" XFS already filled with some content and occupying all the available disk - and at the time filesystem-in-a-file seemed like a reasonable solution. But it was not.

The way you accomplish this is you create a large file - in another XFS, as it were - then run mkfs.xfs on that file and mount it with the -o loop option.
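For illustration only, here is a minimal Python sketch of the three steps just described. It only builds and prints the commands and does not run them. The image path, size and mount point are made-up placeholders, and using truncate to create the file is my assumption here (it produces a sparse file; any other way of creating a large file would do).

```python
import shlex


def loopback_xfs_commands(image_path, size, mount_point):
    """Return the commands (as argv lists) for an XFS filesystem-in-a-file."""
    return [
        # Step 1: create a large (sparse) file inside the "parent" filesystem.
        ["truncate", "-s", size, image_path],
        # Step 2: put an XFS filesystem into that file.
        ["mkfs.xfs", image_path],
        # Step 3: mount the file through a loop device.
        ["mount", "-o", "loop", image_path, mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical paths and size, for illustration only.
    for argv in loopback_xfs_commands("/data/chunks.img", "100G", "/mnt/mfschunks"):
        print(shlex.join(argv))
```

The commands would have to be run as root, and as the rest of this post explains, this setup did not work out well as chunk server storage.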

I did that and did a little testing on that filesystem. The performance for single-file reads and writes was quite good: I don't remember the precise numbers at this point, but it was on par with the sort of performance you would get out of a filesystem sitting directly "on the metal" (i.e., on the RAID, as was the case with this system). What I never suspected was that the latency associated with multi-file I/O in this two-tiered filesystem would become my undoing.

About a week into running in the configuration described above the first "missing chunks" appeared. In general MooseFS displayed rather uneven performance and seemed only marginally usable, if that. Later, the number of those missing chunks increased. I am now sure it was due to latency-induced timeouts of some kind; however, as this email exchange indicates, even people who work with the code find this odd. That may well be because no one ever envisioned such a twisted configuration as the one I created.

So, finally, even though by then I had already put about 15 TB of data into that MooseFS, it was clearly not acceptable to continue running while losing massive amounts of data in the process. So I scrapped that, did away with the "parent" XFS - and, naturally, with the filesystem-in-a-file - repartitioned the underlying disk in such a way as to give the chunk server a dedicated partition and relaunched the MooseFS. It has been a few days now and at this point all is well: the data is written at over 15 MB/s, it can be read at about 50 MB/s and there has not been a single error message.

The lessons learned thus far appear to be the following: no VM's and no UNFSD server on the same hardware - at least so long as this hardware is in the low-end server class. And it also looks like MooseFS data needs to reside as close to the actual hardware as possible - i.e., no middle layers such as this filesystem-in-a-file.

And while this was a bit time consuming I now have multiple MooseFS installations running and ready for growth.

Monday, June 25, 2012


Boris Epstein
Malden (Metro Boston), Massachusetts, US (map)

Phone: 617.816.9654


Extensive experience in IT and programming and good practical ability to apply that experience to real-world problems. Have on several occasions successfully suggested and introduced solutions that have made a qualitative difference to the efficiency and reliability of the system in question. Capable of thinking outside the box and showing initiative. Good ability to see the big picture as well as the details of the immediate task.
Quick learner, capable of rapidly coming up to speed on new technologies. Have a good understanding of logistics, process and technology. Have experience working in small teams where being a jack of all trades is pretty much required and learning new technologies is almost an everyday constant.

Operating Systems: Linux/UNIX, Mac OS X, MS Windows, MS DOS, Apollo Aegis, VMS
Programming Languages: C/C++, Pascal, PERL, Ruby, JAVA, UNIX (C-Shell, Bourne, Korn), Tcl/Tk, PHP, Lisp/Scheme
Tools: Netbeans, Eclipse, TogetherJ, Emacs, MS Office, OpenOffice, LibreOffice, FrameMaker, VI
Technologies: MySQL, PostgreSQL, SYBASE, Xen, VirtualBox, OpenVPN, Oracle RDBMS, CVS, SVN, ClearCase, DSEE, NFS, NIS, LDAP, DHCP, Webmin, TCP/IP, routers/firewalls, etc.
Hardware: IBM-compatible PC, HP, SUN, Mac


May 2007 - present
Cambridge, MA
Work as something akin to a one-person IT department for a small cutting-edge research lab. Support communication and integration needs between the lab and various collaborators. Plan and implement the lab's hardware and software infrastructure while at the same time providing everyday support to the lab's users. Participate in coding the lab's internal utilities as well as the publicly available OpenMIMS Analysis Module. Integrate various software and hardware systems for the lab's needs. When necessary fulfill general office management duties such as scheduling meetings, placing orders, etc.
During my over 5 year tenure have played a pivotal role in the lab's transition to an enterprise-level configuration featuring MooseFS-based distributed file storage, functionally segmented network architecture, Subversion-based code control, Bugzilla-based bug tracking, etc. Have a track record of setting up systems and services that sometimes ran for over two years with no downtime.
Beyond officially delineated responsibilities maintain my own privately-run VPN solution to provide connectivity to the lab's employees and collaborators outside the lab. Due to the distributed nature of the lab's activities this is a function critical to the lab's success.

April 2004 - May 2007
Travel. Educational activities. Occasional freelance projects. Provided technical assistance to several online projects including Cooperative Research and New England JAVA Users Group. In 2005, in the wake of Hurricane Katrina, helped coordinate volunteer relief activities in the Gulf and personally participated in those activities.

July 2001 - April 2004
Waltham, MA
Worked as a programmer involved in development and support of the ETMS air traffic management software for the FAA at the Volpe National Transportation Systems Center in Cambridge, MA. Specific area of concentration was the CDM component of the ETMS. The tasks included design and development of new functionality, as well as supporting legacy code. A large codebase together with the need for system reliability made for a challenging task. The code was originally written in Pascal and later migrated to C.
The system was in essence a large transportation management system receiving and reflecting frequent (mostly once-a-minute) updates regarding the status of flights operating over the US airspace. Near realtime requirements were in place for processing and analyzing data which made for an exciting and challenging task.

December 1999 - April 2001
Boston, MA
Performed multiple roles on a daily basis, including those of a senior system designer and developer, Windows NT and UNIX system administrator and web hosting support engineer. Key player in the technology planning and implementation area. As the sole expert in a number of areas, including object-oriented design and development, networking, network security and systems management, advised other team members on various technology issues.
Main tasks included web site backend implementation for clients, in-house product design and implementation as well as day-to-day activities mentioned above. Most of the coding was done using object-oriented technology, with JAVA as the programming language of choice.
Provided critical insight which made it possible to greatly improve the stability and efficiency of the internal systems.
Clients included Davox Corporation (now part of Aspect Software), Pilates Store, Lobsters-Online, Inc. and others.

September 1999 - January 2000
Medford, MA
Worked as a UNIX administrator in a large-scale web and data server hosting facility. Was responsible for maintenance and troubleshooting of multiple industrial-scale UNIX servers on SUN, HP and IBM platforms. Performed database maintenance of SYBASE and ORACLE databases as well as data recovery and general server troubleshooting.

May 1999 - July 1999
GTE International (currently part of Level3 Communications)
Cambridge, MA
As part of the Y2K team worked on the remediation of custom SUN Solaris machines hosting clients' mission-critical WWW sites and applications. Complex upgrades and modifications had to be accomplished requiring an absolute minimum in customers' downtime. Tasks included upgrades of the OS, DB servers and various other third-party software. Custom scripting and coding was often required to facilitate the necessary transition.

November 1995 - May 1999
Cambridge, MA
Originally hired as an outside contractor. Accepted a permanent staff position in three months' time. Throughout my whole tenure was a critical part of a small and continually overtasked team. Responsibilities included day-to-day maintenance and support of a network of UNIX hosts. Maintained code control systems. Designed and implemented a data backup/archival system for in-house use. Created web pages using HTML, PERL and JAVA. With the emphasis on publicly available software restructured the environment to optimize and economize the development and production process. Modified publicly available software for local needs using C, Tcl/Tk, PERL, JAVA, etc. Was also involved in equipment and software installation and support at client sites. Worked with medical applications and protocols including DICOM, ISG's VRS graphical application and AWARE wavelet compression. Wrote system installation and maintenance scripts. Modified and integrated various third party software packages. Worked with multimedia devices in medical data capture/processing systems. Set up and supported LANs and WANs.

July 1992 - November 1995
Kenan Systems Corporation (currently part of Alcatel-Lucent)
Denver, CO - Cambridge, MA
Participated in database design of SYBASE databases. As part of a product team supported development effort in a heterogeneous UNIX environment. Oversaw the operation of a distributed development environment which included multiple geographically disjoint locations. Wrote a suite of PERL scripts that encompassed the local customizations to ClearCase as required by the project. Other responsibilities included release/code management using ClearCase code control system, SYBASE database administration, system design activities.

Summers of 1990 and 1991
Cambridge, MA
Participated in the development of ATMS (Automated Traffic Management System), an air traffic management and control system for the FAA. Coding was done in Pascal against a proprietary database. The network consisted of a multitude of Aegis hosts on the Apollo platform. The development was done under DSEE as an integrated code control and management environment.

Born in St. Petersburg, Russia in 1969. Have lived in the US since 18 years of age. Attended Tver University (website in Russian) in Tver, Russia; Boston University and University of Massachusetts at Amherst graduating in 1992 with a Bachelor's Degree in Mathematics/Computer Science.

Available upon request.

Tuesday, June 19, 2012

MooseFS taking shape

I am continuing to experiment with MooseFS. However, the final configuration looks somewhat different from what I had originally envisioned. For one thing, the idea of placing the master and metadata servers in VirtualBox VM's didn't quite work out. I guess that created just too many levels of execution, and as a result the overall load grew too much and the performance suffered as soon as any serious load was applied.

So I switched to simply running all the processes (master, meta, chunkserver) on the same hardware and got rid of all the VM's. That worked fine. I defined a separate network - currently fully confined to the same host - in order to host the MooseFS installation. And MooseFS clients have started to run just fine. I got a performance of up to 80 MB/s for reading data from the MooseFS over a 1 Gbit/s network.

However, one problem remained. Running UNFSD on the same machine I got very poor performance. As few as 5 clients could drive it down to just 30 KB/s! And that on an 8-core, 48 GB RAM machine - while a MooseFS client would read about 3 orders of magnitude faster!
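As a quick sanity check on that comparison, here is the arithmetic in Python. It assumes the MooseFS client figure is the "up to 80 MB/s" read rate mentioned above and that KB and MB are decimal units.

```python
import math

# Figures quoted in this post; decimal units are an assumption.
unfsd_rate = 30e3     # 30 KB/s through UNFSD with about 5 clients
moosefs_rate = 80e6   # up to 80 MB/s through a native MooseFS client

ratio = moosefs_rate / unfsd_rate
orders = math.log10(ratio)

print(f"ratio: {ratio:.0f}x, about {orders:.1f} orders of magnitude")
```

So the gap is a bit over three orders of magnitude.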

Surprisingly, the fix was simple: if I ran UNFSD on a separate physical machine the performance went back into the tens of MB/s range. So that was what I settled for. That NFS server machine is currently just a CentOS Linux MooseFS client sitting on the "general" network - different from the one hosting MooseFS - and sporting a mere 2 cores and 2 GB of RAM. So I guess for now I have a working solution.

Saturday, June 2, 2012


Just reporting that I started playing with MooseFS - and more than playing. So far so good. The architecture is really simple, the executables very lightweight. For more detail see here:

I am running it on several server-class boxes using VirtualBox VM's to emulate a network so as to be able to distribute it to multiple hardware boxes later on. Both the VirtualBox hosts and the VM's are running CentOS 6.

The only problems so far seem to have to do with integrating MooseFS with other technologies. I tried using the UNFS3 user-space NFS server to create an NFS gateway to the MooseFS installation. And so far it looks like the UNFS3 server does not scale well to multiple connections. In other words, you get excellent performance with one NFS client and decent performance with 2-3 connections, but above 5 it seems to go down the drain and accessing one's home directory over such an NFS connection becomes pretty much unfeasible. So at this point what's lacking is a good NFS gateway for situations where a MooseFS client is for some reason not available or not a workable solution. Or perhaps I will choose a different sharing method. Time will tell.
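One rough way to put numbers on this kind of scaling behavior is to read several files at once and watch the aggregate rate as the number of readers grows. The Python sketch below is not what I actually ran; it is a minimal illustration under some loud assumptions: the readers are threads on a single client rather than separate NFS clients, the file names and sizes are made up, and it uses a temporary directory that you would replace with a directory on the mount under test. Freshly written files will also mostly come from the page cache, so real measurements need larger files or a cold cache.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # read in 1 MiB pieces


def read_file(path):
    """Read a file sequentially and return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            total += len(block)
    return total


def aggregate_read_rate(paths):
    """Read all paths concurrently, one thread each; return bytes per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        total = sum(pool.map(read_file, paths))
    elapsed = time.perf_counter() - start
    return total / elapsed


def make_test_files(directory, count, size_bytes):
    """Create `count` files of random bytes; return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"reader_{i}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size_bytes))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical setup: swap the temp dir for a directory on the NFS mount.
    with tempfile.TemporaryDirectory() as directory:
        files = make_test_files(directory, count=8, size_bytes=4 * CHUNK)
        for readers in (1, 2, 3, 5, 8):
            rate = aggregate_read_rate(files[:readers])
            print(f"{readers} readers: {rate / 1e6:.1f} MB/s aggregate")
```

If the aggregate rate collapses rather than levels off as readers are added, that matches the behavior described above.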

The Ultimate Boot CD

Just a boot CD distro with lots and lots of utilities. See here. It came in real handy for me when I was trying to test RAM in this server-class box which for some reason decided not to play with a regular memtest CD.