Archive for the 'Ruby/Rails' Category

NetBeans 6.0 IDE ?

Saturday, January 12th, 2008

Did Jerry actually just mention an IDE? WTF? Surely the Four Horsemen of the Apocalypse have arrived. Anyone that knows me is probably already working to figure out the Baker Act procedure to have me committed before I hurt myself. I’ve been a died-in-the-wool-hard-core user of the vi editor for many years and the mere mention of me considering an IDE will have many people looking for safe cover. I must confess to having been a heavy IDE user many years ago when I was writing a lot of software for the Windows platform – but not many people know that about me. Since then I’ve worked on many non-Windows projects ranging in size from tiny to huge and written every single line of code for them using vi.

After being submerged down in the code soup, server administration, networking, and many other things for so long – I haven’t had time until now to check out the state of the various IDE tools to see if any of them sucked a lot less than the last time I looked at them. Even now I really don’t have the time for this, but the mental distraction is welcome plus the hope of finding something that truly works well and maybe improve productivity is worth it.

First stop – Eclipse. I’ve tried it several times in the distant past, didn’t like it then, and found I still don’t like it now. Several things that should have worked just didn’t work and I didn’t feel like doing the legwork to fix them. I know it’s a fine tool that is used and respected by many – but not for me today.

I suppose I should point out that I’m currently running Fedora 8 as my primary workstation OS.

Next I tried the new Netbeans 6.0. Not really all that new considering 6.0 has been out for a while, but still new to me. Perhaps it was luck, but the initial installation went very smoothly without any of the Java-related dependency problems I somewhat expected to pop up. In only a few minutes I installed it and had the code for my existing Ruby-on-Rails project loaded into it and was exploring the UI. My friend Adam had mentioned it supported vi key bindings so I searched for a plug-in and found jVi – a vi editor clone and installed it. WOW. There I was – using the most commonly needed vi commands while inside the very attractive NetBeans UI.

Ok – I needed to get back on track here and start figuring out why and how this thing sucks. I was looking forward to some disappointment with either the SVN integration or the debugging. Surely one of them would fail miserably!

Our SVN repository access is via svn+ssh with the magic of public key authentication – which makes getting to the secure repository just about as easy as it possibly could be. A half-hour later I had to conclude that NetBeans’ built-in SVN tool set was very nice indeed. I loved how the diff tool works in that I can actually edit and save the current file right there in the diff window (using standard vi commands too!). This would be great for adding those last few code comments just before committing the code back to the repository.

Looking forward to some disappointment with the debugger, I first needed to see what the work-flow would be like if I used the “Run Project” (F6) functionality. Since it wanted to run Webrick (which doesn’t support SSL) I had to make a few very minor tweaks to my code such that no redirects to an SSL page happen when RAILS_ENV == ‘development’. All was well. Very well actually.

Somewhere along the way I got distracted and watched the debugging example video. I clearly remember thinking – yeah right – it would take me all day to solve dozens of dependency problems before _I_ would be allowed to debug like they showed in that video. In reality, it took me maybe 5 minutes to deal with a few minor problems (totally unrelated to Netbeans) installing the ruby-debug-ide gem.

A few minutes later I was setting breakpoints in my code, stepping into the abyss that is the Rails stack, stepping back out into my own code, watching variables – and it wasn’t sucking. Matter of fact, it was a lot nicer than doing it the “old way” where I might be using PrettyPrint to view some big ugly hash variable and getting a headache trying to find the hash element I was looking for in the console output.

I later discovered many other useful things that just work as expected. Code completion works well – even to the point where it automatically constructs all the find_by_somefieldname method names for you. Some things will always be much faster for me at the command line – but for now I’ll be sticking with NetBeans 6.0.

Be sure to check out the tutorials and demos at: http://www.netbeans.org/kb/trails/ruby.html

Lighttpd, 500 errors, and FCGI load balancing… (part 2)

Friday, January 11th, 2008

So I very recently upgraded Lighttpd on 3 production application servers to see what, if any, effect it would have on users seeing error 500 and/or the FCGI processes dying. I would be looking for lines similar to this in the lighttpd error logs:

2008-01-10 22:26:56: (mod_fastcgi.c.2462) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7012

Time for some command line magic to figure out the death count on the FCGI processes in the recent past. I ran this command on each of the servers:

for d in 01 02 03 04 05 06 07 08 09 10; do echo -n "2008-01-$d   "; tail -20000 lighttpd.error.log | grep died | grep -c "2008-01-$d"; done

For the 3 servers, results were:

Server: app01
2008-01-01   39
2008-01-02   115
2008-01-03   571
2008-01-04   119
2008-01-05   98
2008-01-06   101
2008-01-07   882
2008-01-08   100
2008-01-09   52
2008-01-10   4
2008-01-11   7

Server: app02
2008-01-01   27
2008-01-02   21
2008-01-03   38
2008-01-04   136
2008-01-05   114
2008-01-06   48
2008-01-07   62
2008-01-08   138
2008-01-09   29
2008-01-10   55
2008-01-11   7

Server: app03
2008-01-01   19
2008-01-02   26
2008-01-03   236
2008-01-04   51
2008-01-05   28
2008-01-06   38
2008-01-07   49
2008-01-08   300
2008-01-09   59
2008-01-10   6
2008-01-11   3

When I saw the result from app01, I got excited until I looked at 55 failures for app02 on 2008-01-10. However, early indications are that a substantial reduction was made in the failures as a by-product of fully utilizing all the available FCGI processes (even though I haven’t done a thing _yet_ toward figuring out why they’re dying in the first place).

I will keep monitoring this as I continue the search for a permanent solution. I am considering an architecture change to eliminate FCGI and start using the new mod_proxy_core in Lighttpd 1.5 in front of multiple Mongrels or maybe even try out Thin.

Lighttpd, 500 errors, and FCGI load balancing… (part 1)

Wednesday, January 9th, 2008

Today’s challenge is to figure out why some of our site’s visitors are occasionally seeing status code 500 (Internal Server Error). First stop is the Lighttpd error logs from one of the application servers where I found entries like this:

2008-01-09 11:21:43: (mod_fastcgi.c.2396) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7000
2008-01-09 11:21:42: (mod_fastcgi.c.2396) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7000
2008-01-09 11:21:43: (mod_fastcgi.c.3181) response not received, request sent: 1386 on socket: tcp:127.0.0.1:7000 for /dispatch.fcgi , closing connection
2008-01-09 11:21:42: (mod_fastcgi.c.3181) response not received, request sent: 1497 on socket: tcp:127.0.0.1:7000 for /dispatch.fcgi , closing connection
2008-01-09 11:47:11: (mod_fastcgi.c.2396) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7000
2008-01-09 11:47:11: (mod_fastcgi.c.3181) response not received, request sent: 1336 on socket: tcp:127.0.0.1:7000 for /dispatch.fcgi , closing connection
2008-01-09 11:47:11: (mod_fastcgi.c.2396) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7000
2008-01-09 11:47:11: (mod_fastcgi.c.3181) response not received, request sent: 1295 on socket: tcp:127.0.0.1:7000 for /dispatch.fcgi , closing connection
2008-01-09 11:47:12: (mod_fastcgi.c.2689) establishing connection failed: Connection refused socket: tcp:127.0.0.1:7000
2008-01-09 11:47:12: (mod_fastcgi.c.2689) establishing connection failed: Connection refused socket: tcp:127.0.0.1:7000
2008-01-09 11:47:18: (mod_fastcgi.c.2635) fcgi-server re-enabled: tcp:127.0.0.1:7000
2008-01-09 11:47:18: (mod_fastcgi.c.2635) fcgi-server re-enabled: tcp:127.0.0.1:7000

Looks like Lighttpd is occasionally handing requests off to fcgi processes that have died. Simple enough…. but why are they dying?

Next stop is a look at some output from this command:

ps -eFH | grep fcgi | grep -v grep

as shown here:

UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
lighttpd 30189     1  0 20492 23320   3 Jan08 ?        00:00:02   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/script/process/spawner fcgi -p 7000 -i 20 -r 5
lighttpd 30200     1  0 101967 219912 2 Jan08 ?        00:01:01   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30202     1  0 101364 217368 2 Jan08 ?        00:00:20   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30204     1  0 58803 57416   0 Jan08 ?        00:00:02   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30206     1  0 58805 57416   3 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30208     1  0 58805 57420   0 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30210     1  0 58803 57416   2 Jan08 ?        00:00:02   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30212     1  0 58802 57412   2 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30214     1  0 58803 57416   3 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30216     1  0 58805 57420   1 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30218     1  0 58803 57416   0 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30222     1  0 58805 57420   3 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30225     1  0 58805 57420   2 Jan08 ?        00:00:02   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30227     1  0 58804 57416   2 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30229     1  0 58805 57420   1 Jan08 ?        00:00:02   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30231     1  0 58803 57416   3 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 30233     1  0 58805 57416   1 Jan08 ?        00:00:03   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 32106     1  1 103389 225564 3 11:08 ?        00:01:15   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 32164     1  0 100314 213196 3 11:08 ?        00:00:16   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd  5059     1 44 106262 237152 0 11:47 ?        00:17:53   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd  8606     1 14 98805 207184  3 12:26 ?        00:00:08   /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi

Our current production deployment stack is a simple FCGI configuration. Each application server has 20 instances of dispatch.fcgi waiting for Lighttpd to hand off requests to them. They service the request, return the result to Lighttpd, then Lighttpd returns the request to the user’s browser.

In our case, more than half of the dispatch.fcgi processes appear to be untouched according to the memory usage indicated in the RSS column (meaning they have not yet “bloated” their memory consumption beyond the initial amount). Since first deploying the application many months ago I’ve always known this was the case and I always assumed it was due to Lighttpd not getting enough concurrent requests to need to call upon those idle processes.

Timestamp values in the STIME column indicate we have 4 fcgi processes which were recently started by spawner. Spawner (the first process shown in the above list) simply checks to make sure the specified number of processes exist and spawns new processes if needed. In an ideal world, we’d start 20 processes and they would faithfully handle requests forever…. but such is not the case.

The astute observer will have noticed the processes which are being used are all consuming 200+ MB of RAM each! We can probably attribute much of the heavy RAM usage to the typically inefficient manner in which in-memory objects are created when ActiveRecord, for the sake of convenience, loads many columns of data into memory we aren’t going to need at that moment. This is another problem to be discussed and debated (at length) at some other time.

I was running an older version of Lighttpd (1.4.13) on these servers and hadn’t bothered to upgrade because 1) it has been working pretty well, and 2) I’ve had plenty of other things to do. On the off-chance that Lighttpd might be at fault here I decided to have a look at the change logs for the newer versions of Lighttpd. A few things in the change logs for the newer versions caught my eye:

from the 1.4.14 change log:
  * fix http 500 errors (colin.stephen/at/o2.com) #1041 [1663]
  * Added round-robin support to mod_fastcgi [1500]
  * fix cpu hog in certain requests [1473] CVE-2007-1869from the 1.4.16 change log:
  * fixed check on stale errno values, which broke handling of broken fastcgi applications. (#1245)

from the 1.4.18 change log:
  * fixed FastCGI header overrun in mod_fastcgi (reported by mattias@secweb.se)

The first two I listed from 1.4.14 especially interested me…

So I downloaded the latest Lighttpd source (currently version 1.4.18) and compiled/installed it on one of the application servers to see if anything changes for the better. I immediately noticed something very interesting from the ps output as shown here:

UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
lighttpd 11946 1 0 20444 23248 3 Jan08 ? 00:00:02 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/script/process/spawner fcgi -p 7000 -i 20 -r 5
lighttpd 11959 1 0 126376 317356 3 Jan08 ? 00:06:43 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11961 1 0 106143 236516 1 Jan08 ? 00:06:19 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11965 1 0 103744 227076 2 Jan08 ? 00:05:36 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11967 1 0 103759 227156 1 Jan08 ? 00:05:14 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11969 1 0 102879 223588 1 Jan08 ? 00:04:23 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11971 1 0 103456 225780 2 Jan08 ? 00:05:15 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11979 1 0 105246 233092 3 Jan08 ? 00:05:40 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11981 1 0 103516 226188 3 Jan08 ? 00:04:55 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11986 1 0 103541 226244 1 Jan08 ? 00:04:21 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11988 1 0 103077 224392 2 Jan08 ? 00:04:27 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11990 1 0 102421 221764 2 Jan08 ? 00:04:45 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 18924 1 0 122312 301352 1 Jan08 ? 00:11:05 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 12774 1 9 104090 228480 3 10:03 ? 00:18:28 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 12776 1 4 103632 226648 3 10:03 ? 00:07:28 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 11144 1 4 102462 221940 1 11:25 ? 00:04:45 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 14011 1 4 102638 222600 1 11:44 ? 00:03:34 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 14515 1 4 104432 229508 3 11:49 ? 00:03:30 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 17360 1 4 102608 222544 3 12:16 ? 00:02:36 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 21006 1 7 103863 227544 1 12:46 ? 00:01:37 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi
lighttpd 23795 1 8 101678 218672 1 12:59 ? 00:00:49 /usr/bin/ruby /home/lighttpd/theinspirit.com/current/public/dispatch.fcgi

Based on values in the RSS column it seems that Lighttpd is now getting all of the fcgi processes involved. This is what the change log meant by “Added round-robin support to mod_fastcgi”. Now I’ve got 20 bloated processes in memory eating up almost 5 GB of RAM (around 230MB each). It’s a good thing I have 8GB in these servers!

Let’s have another look at the Lighttpd error log:

2008-01-09 11:05:13: (mod_fastcgi.c.2635) fcgi-server re-enabled: tcp:127.0.0.1:7000
2008-01-09 11:05:14: (mod_fastcgi.c.2635) fcgi-server re-enabled: tcp:127.0.0.1:7000
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:15: (log.c.135) server stopped
2008-01-09 11:20:17: (log.c.75) server started
2008-01-09 11:33:15: (network_openssl.c.133) SSL: 5 -1 104 Connection reset by peer
2008-01-09 11:33:15: (connections.c.603) connection closed: write failed on fd 8
2008-01-09 11:44:38: (mod_fastcgi.c.2462) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7012
2008-01-09 11:44:38: (mod_fastcgi.c.3254) response not received, request sent: 855 on socket: tcp:127.0.0.1:7012 for /dispatch.fcgi , closing connection
2008-01-09 13:11:15: (server.c.1253) NOTE: a request for /dispatch.fcgi timed out after writing 34776 bytes. We waited 360 seconds. If this a problem increase server.max-write-idle
2008-01-09 13:11:27: (server.c.1253) NOTE: a request for /dispatch.fcgi timed out after writing 36064 bytes. We waited 360 seconds. If this a problem increase server.max-write-idle
2008-01-09 13:27:00: (mod_fastcgi.c.2462) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:7018
2008-01-09 13:27:00: (mod_fastcgi.c.3254) response not received, request sent: 1226 on socket: tcp:127.0.0.1:7018 for /dispatch.fcgi , closing connection
2008-01-09 13:28:56: (connections.c.279) SSL: 1 error:140940E5:SSL routines:SSL3_READ_BYTES:ssl handshake failure

We can see where Lighttpd was restarted at 11:20:17, and unfortunately with some continuing instances of “unexpected end-of-file (perhaps the fastcgi process died)”.

Early indications are that we’re probably better off than before due to the now functional round-robin distribution of requests. As a by-product of the improved round-robin request handling I suspect I may have reduced the frequency of 500 errors seen by the users because Lighttpd will now send the request to the next fcgi process in the round-robin pool whereas before it was not using all 20 of the available processes. I will go ahead and install this latest version of Lighttpd on all the application servers.

However, the fcgi processes are still dying….. which means I get to figure out why.

The fun never ends.

Index your data already!

Thursday, November 29th, 2007

So I’m still toiling away building this Ruby-on-Rails based web application and decided to take a break from coding optimization and have a look at the MySQL slow query log files on the database servers.

Although I found several good examples of slowly executing queries, I’ll just talk about this one:

SELECT profiles.`id` AS t0_r0, profiles.`user_id` AS t0_r1, profiles.`type` AS t0_r2, profiles.`screen_name` AS t0_r3, profiles.`postal_code` AS t0_r4, profiles.`updated_at` AS t0_r5, profiles.`created_at` AS t0_r6, profiles.`real_name` AS t0_r7, profiles.`short_name` AS t0_r8, profiles.`city` AS t0_r9, profiles.`state` AS t0_r10, profiles.`hometown` AS t0_r11, profiles.`quotation` AS t0_r12, profiles.`bio` AS t0_r13, profiles.`statements` AS t0_r14, profiles.`achievements` AS t0_r15, profiles.`specialties` AS t0_r16, profiles.`inspirations` AS t0_r17, profiles.`country` AS t0_r18, profiles.`channel_id` AS t0_r19, profiles.`styles` AS t0_r20, profiles.`featured_at` AS t0_r21, profiles.`featured_by_id` AS t0_r22, profiles.`bookmarked_by_count` AS t0_r23, profiles.`group_members` AS t0_r24, profiles.`looking_for` AS t0_r25, profiles.`work_with` AS t0_r26, profiles.`favorite_events` AS t0_r27, profiles.`fan_of` AS t0_r28, profiles.`favorite_talent` AS t0_r29, profiles.`favorite_music` AS t0_r30, profiles.`favorite_films` AS t0_r31, profiles.`favorite_performances` AS t0_r32, profiles.`membership_criteria` AS t0_r33, profiles.`skills` AS t0_r34, profiles.`materials` AS t0_r35, profiles.`exhibitions` AS t0_r36, profiles.`views` AS t0_r37, profiles.`soapbox` AS t0_r38, profiles.`user_activated` AS t0_r39, profiles.`phone_number` AS t0_r40, profiles.`address` AS t0_r41, profiles.`role_cache` AS t0_r42, profiles.`genre_cache` AS t0_r43, profiles.`upload_total_bytes_cache` AS t0_r44, profiles.`upload_item_count_cache` AS t0_r45, profiles.`popularity_factor` AS t0_r46, profiles.`views_today` AS t0_r47, profiles.`portfolio_views_today` AS t0_r48, profiles.`bookmarkings_today` AS t0_r49, profiles.`acknowledgements_today` AS t0_r50, profiles.`popularity_data` AS t0_r51, profiles.`popularity_factor_raw` AS t0_r52, profiles.`display_admin_info` AS t0_r53, profiles.`places` AS t0_r54, profiles.`where` AS t0_r55, profiles.`who` AS t0_r56, profiles.`craziest` AS t0_r57, profiles.`associations` AS t0_r58, profiles.`primary_image_id` AS t0_r59, profiles.`custom_url_name` AS t0_r60, profiles.`custom_url_hits` AS t0_r61, profiles.`web_presence_data` AS t0_r62, profiles.`has_portfolio_image` AS t0_r63, profiles.`has_portfolio_audio` AS t0_r64, profiles.`has_portfolio_remote_video` AS t0_r65, profiles.`has_portfolio_writing` AS t0_r66, profiles.`last_portfolio_item_created_at` AS t0_r67, profiles.`has_portfolio_video` AS t0_r68, profiles.`spotlighted_at` AS t0_r69, profiles.`clan_name` AS t0_r70, genres.`id` AS t1_r0, genres.`channel_id` AS t1_r1, genres.`name` AS t1_r2, genres.`display` AS t1_r3, roles.`id` AS t2_r0, roles.`name` AS t2_r1, roles.`display` AS t2_r2, channels.`id` AS t3_r0, channels.`name` AS t3_r1, channels.`ad_click_id` AS t3_r2, channels.`ad_zone_id` AS t3_r3, channels.`indieclicks_zone_id` AS t3_r4, portfolios.`id` AS t4_r0, portfolios.`profile_id` AS t4_r1, portfolios.`type` AS t4_r2, portfolios.`created_at` AS t4_r3, portfolios.`updated_at` AS t4_r4
FROM profiles
LEFT OUTER JOIN profile_genres ON ( profiles.`id` = profile_genres.`profile_id` )
LEFT OUTER JOIN genres ON ( genres.`id` = profile_genres.`genre_id` )
LEFT OUTER JOIN profile_roles ON ( profiles.`id` = profile_roles.`profile_id` )
LEFT OUTER JOIN roles ON ( roles.`id` = profile_roles.`role_id` )
LEFT OUTER JOIN channels ON channels.id = profiles.channel_id
LEFT OUTER JOIN portfolios ON portfolios.profile_id = profiles.id
AND portfolios.`type` = 'FreePortfolio'
WHERE ((profiles.channel_id =9)
AND (profiles.user_activated =1))
AND (featured_at IS NOT NULL)
AND profiles.id IN ('1051301', '1047701', '805801', '1016601', '566501', '215501', '605501', '301901', '781401', '768001', '912401', '911301')
ORDER BY featured_at DESC

A Rails developer will immediately recognize that as a typical SQL statement created by ActiveRecord to support our insatiable desire for OOPH (Object Oriented Programming Happiness). There IS a potential dark side to the dot notation we’ve come to love.

Slow query log had this to say about it:

# Query_time: 5  Lock_time: 0  Rows_sent: 2847  Rows_examined: 5645986

Notice the Rows_examined value – 5,645,986 rows! It’s a good thing our database servers have a crap-ton of RAM in them along with super-fast RAID disk arrays :-)

I then manually re-executed that same query several times and it consistently needed 1 to 3 seconds to run. Time to EXPLAIN things a little more or rather have MySQL explain it for me. Running EXPLAIN on that same query revealed the need to index the profile_id field in two tables (profile_genres and profile_roles).

After taking a minute to create the new indexes (including making new migration files and whatnot), I ran EXPLAIN again and was pleased to see MySQL needed to look at only 18 rows of data now instead of 5,645,986. Furthermore, the SAME query that previously needed several seconds was now consistently finishing in 0.007 seconds.

How the hell did we let this happen?!?! Simple: when juggling too many balls some inevitably get dropped. Ok, that’s only part of it. Another part is resisting the temptation to do premature optimizations to your systems. Arguably, in this case, the indexes should have been created when those data associations were set up. This is why we use the slow query logging and must periodically check for surprises like this.

Keep in mind that data indexing comes at a price. Indexing can be a double-edged sword in the sense that those indexes must be maintained when writing data to indexed tables. It is quite possible, even easy, to kill database write performance if you go too far and index too many things trying to speed up read performance.

Checking your slow query logs frequently and making cautious indexing changes will help more than most people realize. Especially when your tables grow beyond a few thousand rows.

Concurrent Image Processing – Part One

Wednesday, November 7th, 2007

I’m building a new web site using the Ruby on Rails framework. Our users will be uploading countless image files (eventually there will be many millions of image files!). After each image file is uploaded, and depending on where that image will later be displayed, we must crop and resize it to make various sized thumbnail versions and display versions of the originally uploaded image file. Sounds simple, right? If only… In a perfect world:

  • The image would be optimized with respect to pixel dimensions and quality/compression. It would be no larger than the largest display size we use in the site and would be suitably compressed for display on a web page. This would reduce upload time, conserve filesystem space on the server, ensure the fastest possible page loading time, and reduce our bandwidth costs.
  • GIF, JPG, and PNG formats would be used appropriately.
  • They would all be RGB instead of CMYK. The CMYK vs RGB issues are complicated, but generally speaking CMYK is for print and RGB is for computer screens.

However our world isn’t so perfect because we see a lot of:

  • 8+ megapixel images straight from the user’s camera, each consuming several megabytes of disk space and taking many minutes for the user to upload.
  • Absolutely no attempt at JPG compression.
  • GIF files that most certainly needed to be JPG and vice versa.
  • the occasional CMYK file which doesn’t survive the conversion to RGB.

In reality, it’s just a whole lot easier if we let the user upload whatever they want and we’ll “fix” it on the server side. Understandably, most users have no clue about resizing, cropping, JPG compression, and all the other things we must deal with.

The real fun begins AFTER the file is uploaded. There are two general approaches for handling the server-side processing of the user’s uploaded image.

  1. Serialized processing: In the same process thread that is serving up the web page, we could do all the cropping, resizing, and compressing that is needed in a step-by-step serialized fashion followed by placing the resulting set of images on the image servers. There are 2 main problems with this approach: 1) The user’s browser session is literally put on hold and the browser will appear to be frozen during this processing and 2) During the processing, that connection to the web server is tied up and cannot service other requests thus making the application less scalable. This also happens to be the easiest way to do it, and in fact is how many sites handle such things.
  2. Concurrent processing: With almost no interruption of the user’s browsing session (other than the time it takes for the initial upload), we can place unprocessed uploads into a queue and have a SEPARATE process do all the work. This allows the user to continue using the site while their upload is being processed in the background on a different server that is dedicated to image processing. This processing approach is going to require more time to architect and implement.

For obvious reasons, I wanted to take the concurrent processing approach. To ensure that our application will scale up to millions of users, we are using a cluster of servers. Each of the servers handles a specific set of tasks related to front-end, image serving, image processing, audio serving, audio processing, database cluster, etc. The initial upload goes to the particular application server a particular user is hitting.

So, those are the issues….. what do we do?

Conceptually, the steps are actually simple and I’ll break it down by server type:

  • Application Servers: User uploads a new image (we call them assets) to whichever application server is handling their session. Behind the scenes, this consists of an application server receiving the uploaded file and placing the uploaded file in a pre-determined location on the application server which makes it available for “pick up” by an asset_processor program. The uploaded item is then inserted into an “asset processing queue”.
  • Asset Processing Servers: We have several servers each running multiple instances of an asset_processor.rb script which frequently looks in the “asset processing queue” for work to do. When a new item is found in the queue that isn’t already being processed, the asset_processor timestamps the started_at field for the item, copies the item to a local working directory and begins the processing of the item. When the processing is finished, the final set of files is copied over to the appropriate Asset Servers such that the world can get to them. After processing, the originally uploaded files are removed from the application server they were uploaded to and the new asset is made “active”. During the time period between the initial upload and completion of processing, the asset is considered to be “inactive” which causes our application to serve up placeholder slug images. As soon as the asset is made active by the processing script, the application magically starts serving up the processed images.
  • Asset Servers: we have images, audio, and (eventually) video asset servers. Fully processed assets are stored on these web servers.

Of course, no article about web development using Ruby on Rails is worth reading unless I include at least a little bit of code…

In “Part Two” of this article I will discuss several sections of the code doing some of the above described work.

Extracting email addresses with Ruby

Tuesday, October 16th, 2007

Today’s challenge was refactoring code we use for extracting email addresses from a contacts export file. The requirements were extraction from CSV, tab delimited, or LDIF files typically exported from your desktop email program’s address book.

The original developer’s approach was to go hunt down some rubygems or plug-ins with which to parse these files “properly” according to their file type. Turns out there’s plenty of ready made code available for this. The original developer’s approach was just fine and would have worked perfectly after we fixed a couple very minor problems.

Taking a step back and thinking about the problem brought me to the realization that all I really cared about was extracting unique email addresses from any file in which those email addresses are human-readable. Regular Expressions are the nearly perfect answer to this problem!

Here’s my solution presented as a one line standalone program that can take stdin (Standard Input). Put this in a file named email_extractor, place that file in your path, and make it executable:

#!/usr/bin/env ruby
# if you don't like my email matching regex, replace it with your own...
STDIN.read.gsub('mail=', '').scan(/[A-Z0-9._=%-]+@[A-Z0-9.-]+.[A-Z]{2,4}/i).uniq.sort.each { |email| puts email }

If you’re paying attention, you may be wondering why I have the gsub(‘mail=’, ”) in there. I found that LDIF export files contain lines like this which would cause my code to mistakenly return ‘mail=name@domain.com’ as an email address:

dn: mail=name@domain.com

Then invoke it like this and watch in amazement as any email address found in the source file is found, sorted, and displayed to stdout:

cat any_file_containing_email_addresses | email_extractor

Of course, you can take my code and wrap it up in a function for inclusion in your program and revise it as needed. For example, in my case today, I was refactoring some already existing code which returned a result in a specific array format. So I ended up with this one line solution to replace the previous 25 lines of code which returns the same array structure as the original function did:

def extract_emails_from_file
  @uploaded_file.read.gsub('mail=', '').scan(/[A-Z0-9._=%-]+@[A-Z0-9.-]+.[A-Z]{2,4}/i).uniq.sort.map{ |email| ['', email] }
rescue
  return []
end

This simple bit of refactoring resulted in a much more flexible email address importing solution because it will work with any file type. Plus it didn’t break any of the tests we’d already written!

Credit goes to my friend K. Adam Christensen for suggesting I apply some “Rubyisms” to condense my original 3 lines of code down to one line.

Linear Interpolation with Ruby

Monday, October 8th, 2007

Who needs curve-fitting when the much simpler Linear Interpolation will suffice?

# Linear interpolation. Takes two known data points, say (xa,ya) and (xb,yb),
# and the interpolant is given by:
#
#      y = ya + ((x - xa) * (yb - ya) / (xb - xa)) at the point (x,y)
#
# Arguments are:
#
#    ** known_data_points is a hash of key => value pairs where key is
#       the "x" values and value is the "y" values. Example hash:
#         {  0 => 10,
#           10 => 90,
#           95 => 280,
#          100 => 300 }
#
#    ** x is the known point somewhere in the range of "x" values
#
# Solves for y, a point somewhere in the range of supplied y values which
# is relative to the position of the x point in the supplied x values.
#
# Example results of calling the function with the above example hash
# and these known x values:
#     x = 5    => 50
#     x = 10   => 90
#     x = 50   => 179
#
# Thusly, you can supply as FEW or as MANY known data points as is needed to approximate
# any imaginable curve fitting equation. Simply keep in mind that linear interpolation is
# being performed between the two closest known points. You may only need 4 or 5 known data
# points to represent a gentle sloping curve WHEREAS you may need 15 or more known points
# to approximate a more aggressive curve shape.
#
def interpolate(known_data_points, x)

  xmin = 0
  xmax = 0

  # find the first known x value at or below the provided x value...
  known_data_points.keys.sort.reverse_each do |k|
    if k <= x
      xmin = k
      break
    end
  end

  # find the first known x value at or above the provided x value...
  known_data_points.keys.sort.each do |k|
    if k >= x
      xmax = k
      break
    end
  end

  # if supplied argument "x" is outside the range of known "x" values, bail out now!
  raise InterpolationError if x > xmax || x < xmin

  # if supplied argument "x" falls exactly on a known x data point, simply return
  # the relative y value now!
  return known_data_points[xmax] if x == xmax
  return known_data_points[xmin] if x == xmin

  # prevent divide by zero errors...
  if (xmax - xmin) == 0
    raise InterpolationError
  end

  # finally, interpolate and return the answer!
  return known_data_points[xmin] + (((x - xmin) * (known_data_points[xmax] - known_data_points[xmin])) / (xmax - xmin))

end

ActiveRecord vs. Me

Thursday, October 4th, 2007

Most of the time AR (ActiveRecord) makes Rails developers happy…. when they’re not thinking about (or suffering from) the depth of the call-stack involved. Don’t believe me? Fire up ruby-debug and start single-stepping down into the AR call-stack for the grand tour. While you’re at it, check out this nice ruby-debug tutorial at railscasts.com

For the past many weeks I have been keeping a close eye on a complicated data processing job that runs on the back-end of our site (TalentDatabase.com). Starting with the very first time it ran, I was never happy about how long it took to finish and less happy about the load it put on the application and database servers while running.

I had done everything “correctly” as far as Rails best-practices and AR were concerned…. but it was still unacceptably sloooowwwww.

In a nutshell, we have a table/model containing around 8,000 records and the program makes several iterative passes through the data while calculating the result for each record. From an Object Oriented Programming perspective, it was damn convenient to simply write code like this:

Foo.find(:all).each do |foo|
  foo.datafield = result
  foo.save
end

and then plow through the data doing what needed to be done while happily letting AR take care of all the database interaction. But this was one of those cases where all the extra baggage AR brings along with it was really slowing things down. A LOT.

Fact of the matter is, there was a combination of factors contributing to this problem.

  • AR does a lot of ugly hard work for you by assembling and executing some very scary looking SQL statements against your data server. In my case, the issue was with the UPDATE statements including every single field in the table…which in turn means MySQL must expend even more CPU cycles maintaining indexes. Index maintenance is a GOOD thing, except for the fact that I really only needed to change the value of a single indexed data field instead of forcing MySQL to do index maintenance on the other dozen or so fields that are also indexed in that table.
  • MySQL is certainly not at fault for doing whatever is needed to maintain my data indexes. However, it can bring an entire database-driven web site to a crawl if it’s too busy to quickly service browsing requests.
  • AR simply isn’t well suited for the kind of bulk data processing I was trying to do.

Finally, I’d had enough of it and decided it was time to try it without using AR. I revised the program to use ‘mysql’ directly and rolled my own SQL statements from top to bottom. The results were very pleasing, as follows:

Before, using ActiveRecord:

Finished pass #1 in 2.498 seconds
Finished pass #2 in 533.655 seconds
Finished pass #3 in 515.770 seconds
Processed 7563 records in 1051.923 seconds (7.190 per second)

After, with my own SQL:

Finished pass #1 in 1.206 seconds
Finished pass #2 in 3.426 seconds
Finished pass #3 in 2.112 seconds
Processed 7658 records in 6.744 seconds (1135.468 per second)

Considering that nothing changed other than how I’m accessing the database in this otherwise complicated sequence of calculations, this is a HUGE improvement in execution time.

Lost in a sea of code…

Monday, June 18th, 2007

After leading the charge for 10 months on our ever-growing Ruby on Rails based application, it has long since reached the point where I can easily waste several minutes looking for something either I or one of the other developers has worked on. Here’s a little bash script I wrote to make things faster and easier to find:

#!/bin/sh
# /usr/local/bin/grepp (note the extra p on the end)
# assuming I'm already in the root of my app directory, I only want to look in these
locations="app bin lib"

for location in $locations; do
  echo "-----------------------------------------------------"
  echo "Searching in $location/*"
  grep -ir "$1" $location/* | grep -v svn
done

Super-Simple and saves me a lot of time when I need to quickly dig into it and modify things!