DFS direct link vs. broker (back of the envelope)

0 views

Skip to first unread message

Doug Judd

unread,

Apr 13, 2008, 11:26:59 PM4/13/08

to hyperta...@googlegroups.com

There has been some discussion recently about how the system would perform if linked directly with libhdfs as opposed to the current broker architecture. See Hypertable and HDFS. The following is an attempt at a "back of the envelope" calculation for the performance penalty incurred by moving data through the DFS broker process.

As a concrete example, let's use the AOL query logs. The raw input file is 1.5 GB. Let's assume that the additional work required to send the data over to the broker is two memory copies. One to copy the data from user space into the kernel and the other to copy the data from the kernel into the broker's user space. Here we're assuming that the TCP/IP protocol stack is smart enough to move buffers directly from a process's send buffer to another process's receive buffer on the same machine without copying. We're also assuming that the ACK overhead is negligible.

On one of our newer Opterons, the L2 cache line size is 128 bytes and the L2 miss latency is ~150 clock cycles (see calibrator). Calibrator also reports a "replace time" of about 150 clock cycles, which, according to the calibrator documentation:

"... is the penalty for a cache-miss on a busy bus, i.e., when each miss is immediately followed by the next one ..."

Assuming that both apply, the cost of reading a 128-byte cache line is ~300 clock cycles. Let's also assume that the cost of writing a 128-byte cache line is also ~300 clock cycles, therefore the total cost to copy a 128-byte cache line is ~600 clock cycles.

Let's also assume that the amount of data copied for a single scan of the data is approximately equal to half the size of the input file (assuming compression ratio of .5). So, the number of 128-byte cache lines that are copied would be equal to:

(1500000000 / 2) / 128 = 5859375

It follows that the cost of the two memory copies is:

2 copies * 600 cycles / copy * 5859375 = 7,031,250,000 cycles

At 2GHz, that's about 3.5 seconds. The "SELECT *" scan test for the AOL query logs ran in approximately 110 seconds, which puts the overhead around 3%.

The impact on the write scenario would be less. Even though there is approximately 4X the amount of writing (commit log write plus compacations), the overhead gets amortimzed over 8 CPUs. So the overall additional time added to the write test would be closer to 1.75 seconds.

- Doug

Reply all

Reply to author

Forward

0 new messages