Lighttpd, 500 errors, and FCGI load balancing… (part 2)

January 11th, 2008

I recently upgraded Lighttpd on three production application servers to see what effect, if any, it would have on users getting 500 errors and on the FCGI processes dying. The symptom I'm tracking is lines like this one in the lighttpd error log:

2008-01-10 22:26:56: (mod_fastcgi.c.2462) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7012

Time for some command-line magic to count how many times the FCGI processes have died recently. I ran this command on each of the servers:

for d in 01 02 03 04 05 06 07 08 09 10 11; do printf '2008-01-%s   ' "$d"; tail -20000 lighttpd.error.log | grep died | grep -c "2008-01-$d"; done
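The loop rescans the last 20,000 lines once per day and only counts the dates it lists, so if those lines don't reach back to January 1st the early counts come out low. A single-pass alternative is sketched here, assuming every entry starts with the "YYYY-MM-DD HH:MM:SS:" timestamp seen in the sample line and that matching on the "perhaps the fastcgi process died" text is specific enough. The function name deaths_per_day is my own, not anything lighttpd provides.

```shell
#!/bin/sh
# Sketch: count FastCGI "died" entries per day in one pass over a lighttpd error log.
# Assumes each line begins with a "YYYY-MM-DD HH:MM:SS:" timestamp, so awk's
# first whitespace-separated field ($1) is the date.
deaths_per_day() {
    # $1 (shell): path to the error log
    awk '/perhaps the fastcgi process died/ { count[$1]++ }
         END { for (day in count) printf "%s   %d\n", day, count[day] }' "$1" | sort
}
```

Run it as deaths_per_day lighttpd.error.log. It prints one "date   count" line per day with at least one death, so days with zero deaths simply don't appear.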

The results for the three servers (2008-01-11 was still in progress when I ran this):

Server: app01
2008-01-01   39
2008-01-02   115
2008-01-03   571
2008-01-04   119
2008-01-05   98
2008-01-06   101
2008-01-07   882
2008-01-08   100
2008-01-09   52
2008-01-10   4
2008-01-11   7

Server: app02
2008-01-01   27
2008-01-02   21
2008-01-03   38
2008-01-04   136
2008-01-05   114
2008-01-06   48
2008-01-07   62
2008-01-08   138
2008-01-09   29
2008-01-10   55
2008-01-11   7

Server: app03
2008-01-01   19
2008-01-02   26
2008-01-03   236
2008-01-04   51
2008-01-05   28
2008-01-06   38
2008-01-07   49
2008-01-08   300
2008-01-09   59
2008-01-10   6
2008-01-11   3

When I saw app01 drop to 4 deaths on 2008-01-10 (and app03 to 6), I got excited, until I saw 55 failures on app02 that same day. Still, early indications are that fully utilizing all the available FCGI processes has substantially reduced the failures as a by-product, even though I haven't done a thing _yet_ toward figuring out why the processes are dying in the first place.
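Since the improvement seems to come from spreading requests across all the FCGI processes, it may also be worth checking whether the deaths cluster on particular backends. Each error line ends with the backend socket, so a per-socket tally is a quick check. This is a sketch assuming the "socket: ..." field format from the sample line; deaths_per_socket is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: tally FastCGI deaths per backend socket, most failures first.
# Assumes lines end with "socket: <address>" as in the sample log line,
# e.g. "socket: tcp:127.0.0.1:7012".
deaths_per_socket() {
    # $1: path to the error log
    grep 'perhaps the fastcgi process died' "$1" \
        | sed -n 's/.*socket: \([^ ]*\).*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

The output is "count socket" per backend (uniq -c pads the count with leading spaces). A roughly even spread would point at something common to all processes, while one hot port would point at a specific backend or at uneven balancing.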

I'll keep monitoring this while I look for a permanent fix. One option I'm considering is an architecture change: eliminate FCGI and put the new mod_proxy_core in Lighttpd 1.5 in front of multiple Mongrels, or maybe even try out Thin.
