Archive for January, 2008

HOWTO: Flash in x86_64 Linux

Tuesday, January 15th, 2008

Adobe has, for whatever reason, never released a 64-bit version of Flash. Thankfully, the 32-bit Firefox works just fine in 64-bit Linux – which means we can use the Flash player in spite of their short-sighted decision.

I know there are dozens of ways to do this. Here’s what works for me quick-and-easy in Fedora 8 (although it should work with most other flavors of Linux too):

1. Download the latest i686 (32-bit) version from http://www.mozilla.com/en-US/

2. “Install” Firefox as follows:

# uncompress and untar it...
tar zxvf firefox-2.0.0.11.tar.gz

# move the resulting firefox directory to /usr/local
mv firefox /usr/local/

# make sure it works...
cd /usr/local/firefox
./firefox

# close the browser for now...

On one of my workstations I didn’t already have the i386 version of compat-libstdc++ installed (which caused Firefox to complain about a missing libstdc++.so.5). Easy fix for that:

yum install compat-libstdc++-33.i386

3. Download the latest Flash player in tar.gz format from: http://www.adobe.com

4. “Install” Flash as follows:

# uncompress and untar it...
tar zxvf install_flash_player_9_linux.tar.gz

# give the resulting directory a shorter, more proper name...
mv install_flash_player_9_linux flash

# move it to /usr/local
mv flash /usr/local

# set up the needed symbolic link for firefox to find and use flash...
cd /usr/local/firefox/plugins/

ln -s /usr/local/flash/libflashplayer.so libflashplayer.so

5. Use your menu editor (in my case the KDE Menu Editor) to make sure the correct firefox gets launched from the menu. If you already had a version of firefox installed, it’s probably already set to use:

/usr/bin/firefox

Just change that to be:

/usr/local/firefox/firefox

6. Done! Launch Firefox and visit a site that uses Flash, such as youtube.com and enjoy.

Note: It is likely that you will need elevated privileges for the directory moving and symbolic linking operations. I simply do the whole thing as root to make it easy.

NetBeans 6.0 IDE ?

Saturday, January 12th, 2008

Did Jerry actually just mention an IDE? WTF? Surely the Four Horsemen of the Apocalypse have arrived. Anyone that knows me is probably already working to figure out the Baker Act procedure to have me committed before I hurt myself. I’ve been a died-in-the-wool-hard-core user of the vi editor for many years and the mere mention of me considering an IDE will have many people looking for safe cover. I must confess to having been a heavy IDE user many years ago when I was writing a lot of software for the Windows platform – but not many people know that about me. Since then I’ve worked on many non-Windows projects ranging in size from tiny to huge and written every single line of code for them using vi.

After being submerged down in the code soup, server administration, networking, and many other things for so long – I haven’t had time until now to check out the state of the various IDE tools to see if any of them sucked a lot less than the last time I looked at them. Even now I really don’t have the time for this, but the mental distraction is welcome plus the hope of finding something that truly works well and maybe improve productivity is worth it.

First stop – Eclipse. I’ve tried it several times in the distant past, didn’t like it then, and found I still don’t like it now. Several things that should have worked just didn’t work and I didn’t feel like doing the legwork to fix them. I know it’s a fine tool that is used and respected by many – but not for me today.

I suppose I should point out that I’m currently running Fedora 8 as my primary workstation OS.

Next I tried the new Netbeans 6.0. Not really all that new considering 6.0 has been out for a while, but still new to me. Perhaps it was luck, but the initial installation went very smoothly without any of the Java-related dependency problems I somewhat expected to pop up. In only a few minutes I installed it and had the code for my existing Ruby-on-Rails project loaded into it and was exploring the UI. My friend Adam had mentioned it supported vi key bindings so I searched for a plug-in and found jVi – a vi editor clone and installed it. WOW. There I was – using the most commonly needed vi commands while inside the very attractive NetBeans UI.

Ok – I needed to get back on track here and start figuring out why and how this thing sucks. I was looking forward to some disappointment with either the SVN integration or the debugging. Surely one of them would fail miserably!

Our SVN repository access is via svn+ssh with the magic of public key authentication – which makes getting to the secure repository just about as easy as it possibly could be. A half-hour later I had to conclude that NetBeans’ built-in SVN tool set was very nice indeed. I loved how the diff tool works in that I can actually edit and save the current file right there in the diff window (using standard vi commands too!). This would be great for adding those last few code comments just before committing the code back to the repository.

Looking forward to some disappointment with the debugger, I first needed to see what the work-flow would be like if I used the “Run Project” (F6) functionality. Since it wanted to run Webrick (which doesn’t support SSL) I had to make a few very minor tweaks to my code such that no redirects to an SSL page happen when RAILS_ENV == ‘development’. All was well. Very well actually.

Somewhere along the way I got distracted and watched the debugging example video. I clearly remember thinking – yeah right – it would take me all day to solve dozens of dependency problems before _I_ would be allowed to debug like they showed in that video. In reality, it took me maybe 5 minutes to deal with a few minor problems (totally unrelated to Netbeans) installing the ruby-debug-ide gem.

A few minutes later I was setting breakpoints in my code, stepping into the abyss that is the Rails stack, stepping back out into my own code, watching variables – and it wasn’t sucking. Matter of fact, it was a lot nicer than doing it the “old way” where I might be using PrettyPrint to view some big ugly hash variable and getting a headache trying to find the hash element I was looking for in the console output.

I later discovered many other useful things that just work as expected. Code completion works well – even to the point where it automatically constructs all the find_by_somefieldname method names for you. Some things will always be much faster for me at the command line – but for now I’ll be sticking with NetBeans 6.0.

Be sure to check out the tutorials and demos at: http://www.netbeans.org/kb/trails/ruby.html

Lighttpd, 500 errors, and FCGI load balancing… (part 2)

Friday, January 11th, 2008

So I very recently upgraded Lighttpd on 3 production application servers to see what, if any, effect it would have on users seeing error 500 and/or the FCGI processes dying. I would be looking for lines similar to this in the lighttpd error logs:

2008-01-10 22:26:56: (mod_fastcgi.c.2462) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7012

Time for some command line magic to figure out the death count on the FCGI processes in the recent past. I ran this command on each of the servers:

for d in 01 02 03 04 05 06 07 08 09 10; do echo -n "2008-01-$d   "; tail -20000 lighttpd.error.log | grep died | grep -c "2008-01-$d"; done

For the 3 servers, results were:

Server: app01
2008-01-01   39
2008-01-02   115
2008-01-03   571
2008-01-04   119
2008-01-05   98
2008-01-06   101
2008-01-07   882
2008-01-08   100
2008-01-09   52
2008-01-10   4
2008-01-11   7

Server: app02
2008-01-01   27
2008-01-02   21
2008-01-03   38
2008-01-04   136
2008-01-05   114
2008-01-06   48
2008-01-07   62
2008-01-08   138
2008-01-09   29
2008-01-10   55
2008-01-11   7

Server: app03
2008-01-01   19
2008-01-02   26
2008-01-03   236
2008-01-04   51
2008-01-05   28
2008-01-06   38
2008-01-07   49
2008-01-08   300
2008-01-09   59
2008-01-10   6
2008-01-11   3

When I saw the result from app01, I got excited until I looked at 55 failures for app02 on 2008-01-10. However, early indications are that a substantial reduction was made in the failures as a by-product of fully utilizing all the available FCGI processes (even though I haven’t done a thing _yet_ toward figuring out why they’re dying in the first place).

I will keep monitoring this as I continue the search for a permanent solution. I am considering an architecture change to eliminate FCGI and start using the new mod_proxy_core in Lighttpd 1.5 in front of multiple Mongrels or maybe even try out Thin.

Lighttpd, 500 errors, and FCGI load balancing… (part 1)

Wednesday, January 9th, 2008

Today’s challenge is to figure out why some of our site’s visitors are occasionally seeing status code 500 (Internal Server Error). First stop is the Lighttpd error logs from one of the application servers where I found entries like this:

2008-01-09 11:21:43: (mod_fastcgi.c.2396) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7000
2008-01-09 11:21:42: (mod_fastcgi.c.2396) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7000
2008-01-09 11:21:43: (mod_fastcgi.c.3181) response not received, request sent: 1386 on socket: tcp:127.0.0.1:7000 for /dispatch.fcgi , closing connection
2008-01-09 11:21:42: (mod_fastcgi.c.3181) response not received, request sent: 1497 on socket: tcp:127.0.0.1:7000 for /dispatch.fcgi , closing connection
2008-01-09 11:47:11: (mod_fastcgi.c.2396) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7000
2008-01-09 11:47:11: (mod_fastcgi.c.3181) response not received, request sent: 1336 on socket: tcp:127.0.0.1:7000 for /dispatch.fcgi , closing connection
2008-01-09 11:47:11: (mod_fastcgi.c.2396) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7000
2008-01-09 11:47:11: (mod_fastcgi.c.3181) response not received, request sent: 1295 on socket: tcp:127.0.0.1:7000 for /dispatch.fcgi , closing connection
2008-01-09 11:47:12: (mod_fastcgi.c.2689) establishing connection failed: Connection refused socket: tcp:127.0.0.1:7000
2008-01-09 11:47:12: (mod_fastcgi.c.2689) establishing connection failed: Connection refused socket: tcp:127.0.0.1:7000
2008-01-09 11:47:18: (mod_fastcgi.c.2635) fcgi-server re-enabled: tcp:127.0.0.1:7000
2008-01-09 11:47:18: (mod_fastcgi.c.2635) fcgi-server re-enabled: tcp:127.0.0.1:7000

Looks like Lighttpd is occasionally handing requests off to fcgi processes that have died. Simple enough…. but why are they dying?

Next stop is a look at some output from this command:

ps -eFH | grep fcgi | grep -v grep

as shown here:

UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
lighttpd 30189     1  0 20492 23320   3 Jan08 ?        00:00:02   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/script/process/spawner fcgi -p 7000 -i 20 -r 5
lighttpd 30200     1  0 101967 219912 2 Jan08 ?        00:01:01   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30202     1  0 101364 217368 2 Jan08 ?        00:00:20   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30204     1  0 58803 57416   0 Jan08 ?        00:00:02   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30206     1  0 58805 57416   3 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30208     1  0 58805 57420   0 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30210     1  0 58803 57416   2 Jan08 ?        00:00:02   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30212     1  0 58802 57412   2 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30214     1  0 58803 57416   3 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30216     1  0 58805 57420   1 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30218     1  0 58803 57416   0 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30222     1  0 58805 57420   3 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30225     1  0 58805 57420   2 Jan08 ?        00:00:02   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30227     1  0 58804 57416   2 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30229     1  0 58805 57420   1 Jan08 ?        00:00:02   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30231     1  0 58803 57416   3 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30233     1  0 58805 57416   1 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 32106     1  1 103389 225564 3 11:08 ?        00:01:15   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 32164     1  0 100314 213196 3 11:08 ?        00:00:16   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd  5059     1 44 106262 237152 0 11:47 ?        00:17:53   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd  8606     1 14 98805 207184  3 12:26 ?        00:00:08   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi

Our current production deployment stack is a simple FCGI configuration. Each application server has 20 instances of dispatch.fcgi waiting for Lighttpd to hand off requests to them. They service the request, return the result to Lighttpd, then Lighttpd returns the request to the user’s browser.

In our case, more than half of the dispatch.fcgi processes appear to be untouched according to the memory usage indicated in the RSS column (meaning they have not yet “bloated” their memory consumption beyond the initial amount). Since first deploying the application many months ago I’ve always known this was the case and I always assumed it was due to Lighttpd not getting enough concurrent requests to need to call upon those idle processes.

Timestamp values in the STIME column indicate we have 4 fcgi processes which were recently started by spawner. Spawner (the first process shown in the above list) simply checks to make sure the specified number of processes exist and spawns new processes if needed. In an ideal world, we’d start 20 processes and they would faithfully handle requests forever…. but such is not the case.

The astute observer will have noticed the processes which are being used are all consuming 200+ MB of RAM each! We can probably attribute much of the heavy RAM usage to the typically inefficient manner in which in-memory objects are created when ActiveRecord, for the sake of convenience, loads many columns of data into memory we aren’t going to need at that moment. This is another problem to be discussed and debated (at length) at some other time.

I was running an older version of Lighttpd (1.4.13) on these servers and hadn’t bothered to upgrade because 1) it has been working pretty well, and 2) I’ve had plenty of other things to do. On the off-chance that Lighttpd might be at fault here I decided to have a look at the change logs for the newer versions of Lighttpd. A few things in the change logs for the newer versions caught my eye:

from the 1.4.14 change log:
  * fix http 500 errors (colin.stephen/at/o2.com) #1041 [1663]
  * Added round-robin support to mod_fastcgi [1500]
  * fix cpu hog in certain requests [1473] CVE-2007-1869from the 1.4.16 change log:
  * fixed check on stale errno values, which broke handling of broken fastcgi applications. (#1245)

from the 1.4.18 change log:
  * fixed FastCGI header overrun in mod_fastcgi (reported by mattias@secweb.se)

The first two I listed from 1.4.14 especially interested me…

So I downloaded the latest Lighttpd source (currently version 1.4.18) and compiled/installed it on one of the application servers to see if anything changes for the better. I immediately noticed something very interesting from the ps output as shown here:

UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
lighttpd 11946 1 0 20444 23248 3 Jan08 ? 00:00:02 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/script/process/spawner fcgi -p 7000 -i 20 -r 5
lighttpd 11959 1 0 126376 317356 3 Jan08 ? 00:06:43 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11961 1 0 106143 236516 1 Jan08 ? 00:06:19 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11965 1 0 103744 227076 2 Jan08 ? 00:05:36 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11967 1 0 103759 227156 1 Jan08 ? 00:05:14 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11969 1 0 102879 223588 1 Jan08 ? 00:04:23 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11971 1 0 103456 225780 2 Jan08 ? 00:05:15 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11979 1 0 105246 233092 3 Jan08 ? 00:05:40 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11981 1 0 103516 226188 3 Jan08 ? 00:04:55 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11986 1 0 103541 226244 1 Jan08 ? 00:04:21 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11988 1 0 103077 224392 2 Jan08 ? 00:04:27 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11990 1 0 102421 221764 2 Jan08 ? 00:04:45 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 18924 1 0 122312 301352 1 Jan08 ? 00:11:05 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 12774 1 9 104090 228480 3 10:03 ? 00:18:28 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 12776 1 4 103632 226648 3 10:03 ? 00:07:28 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11144 1 4 102462 221940 1 11:25 ? 00:04:45 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 14011 1 4 102638 222600 1 11:44 ? 00:03:34 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 14515 1 4 104432 229508 3 11:49 ? 00:03:30 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 17360 1 4 102608 222544 3 12:16 ? 00:02:36 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 21006 1 7 103863 227544 1 12:46 ? 00:01:37 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 23795 1 8 101678 218672 1 12:59 ? 00:00:49 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi

Based on values in the RSS column it seems that Lighttpd is now getting all of the fcgi processes involved. This is what the change log meant by “Added round-robin support to mod_fastcgi”. Now I’ve got 20 bloated processes in memory eating up almost 5 GB of RAM (around 230MB each). It’s a good thing I have 8GB in these servers!

Let’s have another look at the Lighttpd error log:

2008-01-09 11:05:13: (mod_fastcgi.c.2635) fcgi-server re-enabled: tcp:127.0.0.1:7000
2008-01-09 11:05:14: (mod_fastcgi.c.2635) fcgi-server re-enabled: tcp:127.0.0.1:7000
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:17: (log.c.75) server started
2008-01-09 11:33:15: (network_openssl.c.133) SSL: 5 -1 104 Connection reset by peer
2008-01-09 11:33:15: (connections.c.603) connection closed: write failed on fd 8
2008-01-09 11:44:38: (mod_fastcgi.c.2462) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7012
2008-01-09 11:44:38: (mod_fastcgi.c.3254) response not received, request sent: 855 on socket: tcp:127.0.0.1:7012 for /dispatch.fcgi , closing connection
2008-01-09 13:11:15: (server.c.1253) NOTE: a request for /dispatch.fcgi timed out after writing 34776 bytes. We waited 360 seconds. If this a problem increase server.max-write-idle
2008-01-09 13:11:27: (server.c.1253) NOTE: a request for /dispatch.fcgi timed out after writing 36064 bytes. We waited 360 seconds. If this a problem increase server.max-write-idle
2008-01-09 13:27:00: (mod_fastcgi.c.2462) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7018
2008-01-09 13:27:00: (mod_fastcgi.c.3254) response not received, request sent: 1226 on socket: tcp:127.0.0.1:7018 for /dispatch.fcgi , closing connection
2008-01-09 13:28:56: (connections.c.279) SSL: 1 error:140940E5:SSL routines:SSL3_READ_BYTES:ssl handshake failure

We can see where Lighttpd was restarted at 11:20:17, and unfortunately with some continuing instances of “unexpected end-of-file (perhaps the fastcgi process died)”.

Early indications are that we’re probably better off than before due to the now functional round-robin distribution of requests. As a by-product of the improved round-robin request handling I suspect I may have reduced the frequency of 500 errors seen by the users because Lighttpd will now send the request to the next fcgi process in the round-robin pool whereas before it was not using all 20 of the available processes. I will go ahead and install this latest version of Lighttpd on all the application servers.

However, the fcgi processes are still dying….. which means I get to figure out why.

The fun never ends.