Gigabit Network Performance ??

Thursday, June 5th, 2008

Are you really seeing the full benefit of your gigabit network infrastructure? Today we’re trying to figure out why the private network interconnect links used by our Oracle RAC seem to be causing problems.

The first step is to verify that the NIC ports are configured properly:

root@backup01
/backup/mysql_snapshots >ethtool eth0
Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
        Link detected: yes
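
If you have more than a couple of ports to check, the same inspection can be scripted. Here is a minimal Python sketch that parses text in the format shown above and flags a port that has not come up at 1000Mb/s full duplex. The function names `parse_ethtool` and `link_problems` are made up for illustration, and `SAMPLE` is just an excerpt of the output above rather than a live call to ethtool.

```python
def parse_ethtool(text):
    """Turn the 'Key: value' lines of ethtool output into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Skip the header line and continuation lines that carry no value.
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


def link_problems(settings, want_speed="1000Mb/s"):
    """Return a list of problems found; an empty list means the link looks fine."""
    problems = []
    if settings.get("Speed") != want_speed:
        problems.append("speed is %s, expected %s" % (settings.get("Speed"), want_speed))
    if settings.get("Duplex") != "Full":
        problems.append("duplex is %s, expected Full" % settings.get("Duplex"))
    if settings.get("Link detected") != "yes":
        problems.append("no link detected")
    return problems


# Excerpt of the ethtool output shown above (not read from a live system).
SAMPLE = """Settings for eth0:
        Speed: 1000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Link detected: yes
"""

print(link_problems(parse_ethtool(SAMPLE)) or "link looks OK")
```

In practice you would feed it the captured output of `ethtool eth0` for each interface instead of the hard-coded sample.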

Then we’ll move on to some basic link testing. My two favorite utilities for testing network throughput at a relatively low level are:

  • ttcp
  • iperf

Here’s a quick example of how both are used. First we’ll use ttcp:

Set up the receiving server…

root@images01
~ >ttcp -r -s -f M

Then run the test from the transmitting server…

root@backup01
~ >ttcp -t -s -l 16384 -n 16384 -f M 10.0.0.7
ttcp-t: buflen=16384, nbuf=16384, align=16384/0, port=5001  tcp  -> 10.0.0.7
ttcp-t: socket
ttcp-t: connect
ttcp-t: 268435456 bytes in 2.26 real seconds = 113.51 MB/sec +++
ttcp-t: 16384 I/O calls, msec/call = 0.14, calls/sec = 7264.61
ttcp-t: 0.0user 0.3sys 0:02real 15% 0i+0d 0maxrss 0+5pf 199+2csw

Now we can run that figure through a bandwidth units conversion tool to see that 113.51 MB/sec (counting 1 MB as 2^20 bytes, as ttcp does) equals 0.9522 Gbps (gigabits/sec). Nice, that is nearly the full 1.0 Gbps we wanted to see! I ran the above test with various values for -l and -n with similar results.
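
The conversion is simple enough to do by hand. A small Python sketch of the arithmetic follows; the function name `mbytes_per_sec_to_gbps` is my own, and it assumes that a MB here is 2^20 bytes and a gigabit is 10^9 bits, which is the combination that reproduces the figure above.

```python
MIB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes


def mbytes_per_sec_to_gbps(mb_per_sec, bytes_per_mb=MIB):
    """Convert a MB/sec figure to gigabits/sec (1 Gbit = 10**9 bits)."""
    bits_per_sec = mb_per_sec * bytes_per_mb * 8
    return bits_per_sec / 1e9


print("%.4f Gbps" % mbytes_per_sec_to_gbps(113.51))  # prints "0.9522 Gbps"
```

If you count a MB as 10^6 bytes instead, pass `bytes_per_mb=10**6` and the same reading comes out lower, so it pays to know which convention your tool uses.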

Let’s test again – this time using iperf:

Set up the receiving server…

root@images01
~ >iperf -s

Then run the test from the transmitting server…

root@backup01
~ >iperf -c 10.0.0.7
------------------------------------------------------------
Client connecting to 10.0.0.7, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.6 port 52323 connected with 10.0.0.7 port 5001
[  3]  0.0-10.0 sec  1.09 GBytes    936 Mbits/sec

Again, nearly the full 1.0 Gbps is being achieved!
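
Why "nearly" and not the full 1.0 Gbps? Part of every frame on the wire is framing and protocol headers rather than payload. Here is a back-of-the-envelope sketch in Python, assuming a standard 1500-byte MTU, IPv4, no VLAN tag, and TCP either with or without the 12-byte timestamp option. None of those assumptions were verified on this particular network.

```python
LINE_RATE_MBPS = 1000.0
MTU = 1500                      # assumed: standard Ethernet MTU, no jumbo frames
ETH_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, MAC header, FCS, inter-frame gap
IP_HEADER = 20                  # IPv4 header without options
TCP_HEADER = 20                 # TCP header without options


def tcp_payload_ceiling_mbps(tcp_options=0):
    """Best-case TCP payload rate in Mbits/sec for full-size frames."""
    payload = MTU - IP_HEADER - TCP_HEADER - tcp_options
    on_wire = MTU + ETH_OVERHEAD
    return LINE_RATE_MBPS * payload / on_wire


print("%.1f" % tcp_payload_ceiling_mbps(12))  # with TCP timestamps: 941.5
print("%.1f" % tcp_payload_ceiling_mbps(0))   # without TCP options: 949.3
```

Under those assumptions the ceiling sits somewhere around 941 to 949 Mbits/sec, so the 936 Mbits/sec reported by iperf is about as good as this link is going to get.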

However, keep in mind that these tests do not account for what happens higher up in the protocol stack, such as when you’re using SCP or FTP to copy files across the network.

For example, I checked the speed of an SCP file transfer between the same two servers that I used above. You may be surprised to see what appears to be terrible performance like this:

root@backup01
/backup/mysql_snapshots >scp mysql_replicated.2008-06-05-040005.tgz images01:/home/
mysql_replicated.2008-06-05-040005.tgz                                    100%  334MB  33.4MB/s   00:10

Hmmm… 33.4MB/s is only 0.2802 Gbps. What happened?! Actually, a lot of extra things happened, all of which eat up time, but the single biggest one is the added overhead of the SSH encryption used during the SCP copy job.
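
To put the gap in perspective, here is a quick Python sketch comparing the SCP figure against the raw ttcp figure from earlier. The helper names are made up for illustration, and it assumes scp and ttcp count a MB the same way.

```python
SCP_MB_PER_SEC = 33.4     # rate reported by scp above
TTCP_MB_PER_SEC = 113.51  # rate reported by ttcp above
FILE_MB = 334             # file size reported by scp above


def fraction_of_raw(observed, raw):
    """How much of the raw link throughput the transfer actually achieved."""
    return observed / raw


def seconds_at(rate_mb_per_sec, size_mb):
    """Time needed to move size_mb at the given rate."""
    return size_mb / rate_mb_per_sec


print("%.0f%% of raw throughput" % (100 * fraction_of_raw(SCP_MB_PER_SEC, TTCP_MB_PER_SEC)))
print("%.1f sec at the ttcp rate vs %.1f sec with scp" % (
    seconds_at(TTCP_MB_PER_SEC, FILE_MB), seconds_at(SCP_MB_PER_SEC, FILE_MB)))
```

In other words, the copy ran at under a third of what the wire could carry, and the same file would have crossed in about 3 seconds instead of 10 at the raw rate.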

The point being – when you’re examining network performance, you must remember to take EVERYTHING into account. Just having a gigabit switch and some el-cheapo gigabit network adapters is no guarantee of anything. Here’s a VERY incomplete list of things to consider:

  • How are your adapters connected to the system? 64-bit PCI slot filled with a high-end NIC? A port built into the main board? A $12 “gigabit NIC” you found at the local wholesale supplier?
  • How fast are your hard drives? This matters when copying files or doing anything else that involves read/write operations to disk storage.
  • What else is happening concurrently?
  • Are you using the correct drivers?
  • Are you certain the patch cables are OK?
  • Are you certain the ports are in fact set at 1000Mbit/sec speed?
  • Is there anything strange happening at the switch level? VLAN misconfigured? Routing problem?
  • Have you eliminated the possibility of DNS trouble? Tried the IP address(es) directly?
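
For the disk item in the list above, one rough way to rule storage in or out is to time a plain sequential read of the file you intend to copy, with no network involved. Here is a minimal Python sketch; the function name `read_throughput_mb_per_sec` is hypothetical, and it counts 1 MB as 2^20 bytes.

```python
import time


def read_throughput_mb_per_sec(path, block_size=1024 * 1024):
    """Read a file sequentially and return the rate in MB/sec (1 MB = 2**20 bytes)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    return total / (1024 * 1024) / max(elapsed, 1e-9)
```

Call it with the path of a large file, such as the snapshot copied above. Keep in mind that a second run on the same file will mostly measure the operating system's cache rather than the disk, so treat the first run as the meaningful one.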

If gigabit still isn’t fast enough, step up to link aggregation per 802.3ad (along with the more costly class of high-end switches you’ll need to take full advantage of it). If your budget is flexible enough, check out the newer 10 gigabit options.

There’s plenty of additional information online about these topics – google.com is your friend.