Boris Epstein's Technology Blog: CentOS


Wednesday, October 26, 2016

Nginx, Drupal and mbstring: a major mess, and how to avoid it

This is not a particularly interesting issue, merely a particularly confusing one, so I decided to write it up for future reference. It took me quite some time to sort out, and if this post saves anybody time in the same situation, documenting it will have been worth the effort.

I was trying to get Drupal 7 to run under NGINX 1.10 on my CentOS 7 VM. The first major requirement that called for custom work was PHP v5.2.4 or higher (full list of requirements available here). I chose PHP v5.5 (I forget the exact reason, and judging by the Wiki article I may now be due for another update - at least to PHP 5.6 or PHP 7). I don't remember the exact installation sequence, but it was something along the lines outlined here. At any rate, that was an easy and uneventful phase of the process.

With that out of the way I thought things were on track to get better - but fate would have it otherwise. As I went through the motions of initializing the Drupal-based website I was working on, I got the following:

Multibyte string input conversion in PHP is active and must be disabled. Check the php.ini mbstring.http_input setting. Please refer to the PHP mbstring documentation for more information.

While trying to resolve the issue I asked for advice on the NGINX user forum (the thread is available here, in Russian). Skipping the twists and turns of this rather protracted affair, let me just summarize it by saying that nearly everyone, myself included, was looking for ways to turn off multibyte string processing without realizing that the relevant module first had to be installed, which it was not.

So the solution turned out to be simple. But like I said - it took time. The solution was this:

1) Install the relevant module:

yum install php55w-mbstring

2) Change the relevant lines in the config files. The lines to alter need to look as follows:

ini_set('mbstring.http_input', 'pass');
ini_set('mbstring.http_output', 'pass');

as detailed here.

So yes - Drupal wants you to install functionality it does not need (multibyte string processing) and then wants you to disable it. Go figure.

Once again - thanks to everyone who helped me in the process, and if that helps somebody avoid the pain I had to go through, then this post is of some use.

Saturday, June 2, 2012

MooseFS

Just reporting that I started playing with it - and more than playing. So far so good. The architecture is really simple, the executables very lightweight. For more detail see here:

http://www.moosefs.org/

I am running it on several server-class boxes, using VirtualBox VMs to emulate a network so as to be able to distribute it to multiple hardware boxes later on. Both the VirtualBox hosts and the VMs are running CentOS 6.

The only problems so far seem to have to do with integrating MooseFS with other technologies. I tried using the UNFS3 user-space NFS server to create an NFS gateway to the MooseFS installation, and so far it looks like the UNFS3 server does not scale well to multiple connections. In other words, you get excellent performance with one NFS client and decent performance with 2-3 connections, but above 5 it seems to go down the drain, and accessing one's home directory over such an NFS connection becomes pretty much infeasible. So at this point what's lacking is a good NFS gateway for situations where a MooseFS client is for some reason not available or not a workable solution. Or perhaps I will choose a different sharing method. Time will tell.

Tuesday, March 13, 2012

Large-scale matching exercise using MySQL

In my previous post (MySQL: A Few Metrics, 3 March 2012) I mentioned some parameters of a task I recently faced. We are now going to examine that task in more detail.

We have two large text files, File1 and File2. They both contain text entries, one per line, over 400 million lines each. We know almost nothing about the content beyond that; it is definitely unsorted within each of the files, and some lines may be repeated. For the purpose of this discussion let us say File1 is 430 million lines and File2 is 440 million lines.

To recap: the only machine I had available for this task was a VM that had plenty of disk - about 1 TB unused - but little processing power and only 512 MB RAM. It was running CentOS 6 and MySQL 5.1.52. First word of caution: if you intend to manipulate large tables, make sure that /tmp has plenty of room for MySQL's invisible temporary files, or else change the temporary directory to something else. You can do that by setting the TMPDIR environment variable to the desired location. On CentOS I just inserted the appropriate lines towards the top of /etc/init.d/mysqld and that did the trick:

#
# Alternate temporary storage directory
#
export TMPDIR=/home/mysql/tmp
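Before kicking off a multi-hour job it may be worth checking how much room the temporary directory actually has. The following is only a minimal Python sketch of such a check, not something I ran at the time; the function name and the 100 GiB figure are made up for illustration.

```python
import shutil
import tempfile


def temp_dir_has_room(required_bytes, tmp_dir=None):
    """Return True if the filesystem holding tmp_dir has at least required_bytes free."""
    # If no directory is given, use the one Python resolves from TMPDIR
    # (falling back to its platform defaults such as /tmp).
    path = tmp_dir or tempfile.gettempdir()
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


if __name__ == "__main__":
    needed = 100 * 1024**3  # hypothetical headroom: 100 GiB
    print("enough room:", temp_dir_has_room(needed))
```

Passing the directory explicitly, e.g. temp_dir_has_room(needed, "/home/mysql/tmp"), checks the alternate location set above.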

My first instinct was to sort the two lists individually and then, once they were sorted, find the matches as well as the content exclusive to either list in one forward pass through both. I still believe that approach was sensible - however, the sorting phase proved to be more time-consuming than I expected. The most likely reason is that, as I already mentioned, inexact comparisons (less-than or greater-than rather than equality) take qualitatively longer than exact ones - and sorting, no matter how you do it, is based upon inexact comparisons.
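For reference, the single forward pass over two already-sorted lists can be sketched in Python as follows. This illustrates the idea only and is not the code I ran; the function name is hypothetical, and repeated lines are paired off one-to-one.

```python
def merge_pass(sorted_a, sorted_b):
    """Walk two sorted lists once; return (only_a, common, only_b)."""
    only_a, common, only_b = [], [], []
    i = j = 0
    while i < len(sorted_a) and j < len(sorted_b):
        if sorted_a[i] == sorted_b[j]:
            # Same line in both lists: record it and advance both cursors.
            common.append(sorted_a[i])
            i += 1
            j += 1
        elif sorted_a[i] < sorted_b[j]:
            # The line in A sorts first, so it cannot appear later in B.
            only_a.append(sorted_a[i])
            i += 1
        else:
            # The line in B sorts first, so it cannot appear later in A.
            only_b.append(sorted_b[j])
            j += 1
    # Whatever is left in either list has no partner in the other.
    only_a.extend(sorted_a[i:])
    only_b.extend(sorted_b[j:])
    return only_a, common, only_b


if __name__ == "__main__":
    print(merge_pass(["a", "b", "d"], ["b", "c", "d", "e"]))
```

The pass itself is linear in the combined length of the two lists; the expensive part is getting them sorted in the first place.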

However, that same fact could be used to our advantage. We could do exact comparisons to determine the intersection of the two lists - and then separate the entries exclusive to either list.
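The logic of that exact-match approach looks roughly like the Python sketch below. It is an in-memory illustration with a hypothetical function name; files of this size would never fit into 512 MB of RAM, which is why the real work was done with indexed MySQL tables.

```python
def split_by_exact_match(lines_a, lines_b):
    """Return (only_a, common, only_b) using equality tests only, no ordering."""
    # Set membership relies on hashing and equality, never on less-than.
    common = set(lines_a) & set(lines_b)
    only_a = [line for line in lines_a if line not in common]
    only_b = [line for line in lines_b if line not in common]
    return only_a, common, only_b


if __name__ == "__main__":
    only_a, common, only_b = split_by_exact_match(["x", "y", "y", "z"], ["y", "w"])
    print(only_a, sorted(common), only_b)
```

Neither input needs to be sorted, and repeated lines are harmless: every copy of a common line is dropped from the exclusive lists.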

Let us now run through a practical example that reflects what I ended up doing after some trial and error. The names have been changed from those I actually used to make this text more readable. My apologies for any typos in the syntax.

Alright, let us get going now. We have our files: File1 (430 million lines) and File2 (440 million lines).

So first let us create the necessary tables:

mysql> CREATE TABLE f1_list (f1_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, text_line TEXT NOT NULL, INDEX (text_line(400) ASC));
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE f2_list (f2_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, text_line TEXT NOT NULL, INDEX (text_line(400) ASC));
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE common_list (cl_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, f1_id INT NOT NULL, f2_id INT NOT NULL, text_line TEXT NOT NULL, INDEX (f1_id ASC), INDEX (f2_id ASC), INDEX (text_line(400) ASC));
Query OK, 0 rows affected (0.00 sec)

mysql>

Now let us populate f1_list and f2_list with the contents of File1 and File2 respectively. You will need to run roughly the following commands using full paths to File1 and File2:

mysql> LOAD DATA LOCAL INFILE "/home/my_user/big_files/File1" INTO TABLE f1_list (text_line);
Query OK, 430000000 rows affected (8 hours 13 min 10.01 sec)
Records: 430000000 Deleted: 0 Skipped: 0 Warnings: 0

mysql> LOAD DATA LOCAL INFILE "/home/my_user/big_files/File2" INTO TABLE f2_list (text_line);
Query OK, 440000000 rows affected (10 hours 01 min 15.0 sec)
Records: 440000000 Deleted: 0 Skipped: 0 Warnings: 0

mysql>

The times are roughly consistent with what I saw; I'd estimate the import times as 8-12 hours per file.

Now let us extract the commonalities into common_list:

mysql> INSERT INTO common_list (f1_id, f2_id, text_line) SELECT f1_list.f1_id,
f2_list.f2_id, f1_list.text_line FROM f1_list, f2_list WHERE f1_list.text_line = f2_list.text_line;
Query OK, 427677003 rows affected, 2 warnings (16 hours 12 min 16.58 sec)
Records: 427677003 Duplicates: 0 Warnings: 0

mysql>

Now let us extract the exclusive content:

mysql> DELETE FROM f1_list WHERE f1_id IN (SELECT f1_id FROM common_list);
Query OK, 427001324 rows affected (16 hours 24 min 41.20 sec)

mysql> DELETE FROM f2_list WHERE f2_id IN (SELECT f2_id FROM common_list);
Query OK, 427001129 rows affected (16 hours 39 min 44.47 sec)

mysql>

And so by now it looks like we are done. We have the content exclusive to File1 in f1_list, the content exclusive to File2 in f2_list and the common content in common_list. The total processing time - even if the operations are performed serially as delineated above - can be capped at about 76 hours. I used a little padding there, too: for instance, I counted 20 hours for the processes that in reality took about 16. Thus it appears realistic to sort out this mass of data within 3-4 days using a machine which in this day and age would be considered substandard in terms of its performance.
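As a sanity check on that estimate, the elapsed times shown in the transcripts above can simply be added up. Here is a small Python sketch; keep in mind that those transcript times are themselves only roughly what I saw, and the step labels are mine.

```python
def to_hours(hours, minutes, seconds):
    """Convert an elapsed time given as h/m/s into fractional hours."""
    return hours + minutes / 60 + seconds / 3600


# Elapsed times copied from the MySQL transcripts above.
steps = {
    "load File1": to_hours(8, 13, 10.01),
    "load File2": to_hours(10, 1, 15.0),
    "build common_list": to_hours(16, 12, 16.58),
    "delete from f1_list": to_hours(16, 24, 41.20),
    "delete from f2_list": to_hours(16, 39, 44.47),
}

total_hours = sum(steps.values())
print(f"total: {total_hours:.1f} hours, {total_hours / 24:.1f} days")
```

Run serially, the five steps come to roughly 67.5 hours, comfortably under the padded 76-hour cap.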

Thursday, June 30, 2011

NFS on Mac

It is really strange that NFS on Mac OS X is an absolute mess even though Mac OS X is UNIX-based, and UNIX/Linux machines NFS-mount each other easily and conveniently.

I just moved my NFS server from SLES 10 to CentOS 5.5 and guess what - Macs just want nothing to do with it. Oh, well, to be continued.

Thursday, October 8, 2009

Solaris the tree killer no more

In one of my earlier posts I complained about being unable to turn off the banner page for print jobs sent from a Solaris print client to a printer controlled by a Linux print server. Soon thereafter we found a solution which I have been forgetting to talk about in this space... But, as they say, better late than never.

The solution is basically to tell the server (Linux) machine to forget that banner page. The way to do that is as follows.

1) Make sure you have cups-lpd installed. Solaris seems to only speak lpd; I am not sure how one could enable it to interface to a CUPS server directly.

2) On our CentOS machine we had to use the following syntax in /etc/xinetd.d/cups:

service printer
{
    socket_type = stream
    protocol    = tcp
    wait        = no
    user        = lp
    group       = sys
    passenv     =
    server      = /usr/lib/cups/daemon/cups-lpd
    server_args = -o document-format=application/octet-stream -o job-sheets=none
}

Emphasis on the "-o job-sheets=none" part.

A good yum tip

This apparently works on CentOS when you are trying to run an update/upgrade and dependency issues pop up. I don't know if it works under all circumstances - but it definitely worked both for Shahbaz Javeed here and for yours truly just a few minutes ago. Like Shahbaz, I am also running CentOS 5.3 on the machine in question.

What you do is:

# yum clean all

Yes, it was this simple - at least this time!
