Hi, I have built a simple Lustre setup. The MDS and OSS are both on a
CentOS 5 machine using the Red Hat Lustre-modified kernel
2.6.18-8.1.14.el5_lustre.1.6.4.1 running OFED-1.3. Lustre is 1.6.4.3.
The client machine is Debian running the same kernel, OFED-1.3, and
Lustre 1.6.4.3.
The MDT and OST are both single partitions on the same disk (yes,
I know this is not optimal...).
The network uses Mellanox ConnectX HCAs running through a Voltaire
ISR2004 switch.
The basic RDMA setup seems to work in either direction:
murray@nasnu3:/slut$ ib_rdma_bw 192.168.3.50 (Lustre server)
5605: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 |
5605: Local address: LID 0x07, QPN 0x22004e, PSN 0x323aa6 RKey 0x1a002800 VAddr 0x002aaaaaad6000
5605: Remote address: LID 0x05, QPN 0x8004f, PSN 0x67c28c, RKey 0x8002800 VAddr 0x002aaaab705000
5605: Bandwidth peak (#0 to #985): 1332.53 MB/sec
5605: Bandwidth average: 1332.47 MB/sec
5605: Service Demand peak (#0 to #985): 1462 cycles/KB
5605: Service Demand Avg : 1462 cycles/KB
[murray@lusty bin]$ ib_rdma_bw 192.168.3.30 (Lustre client)
3845: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 |
3845: Local address: LID 0x05, QPN 0xa004f, PSN 0x4f4712 RKey 0xa002800 VAddr 0x002aaaab705000
3845: Remote address: LID 0x07, QPN 0x24004e, PSN 0xa740c1, RKey 0x1c002800 VAddr 0x002aaaaaad6000
3845: Bandwidth peak (#0 to #956): 1533.5 MB/sec
3845: Bandwidth average: 1533.43 MB/sec
3845: Service Demand peak (#0 to #956): 1146 cycles/KB
3845: Service Demand Avg : 1146 cycles/KB
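(In each case the passive side was started first on the remote host by
running ib_rdma_bw with no host argument; like the other perftest tools,
it runs as a client/server pair.)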
Local disk speed on the Lustre server seems fine, as does the speed when
the server itself writes to the Lustre-mounted drive (50-80 MB/s):
[murray@lusty slut]$ dd if=/dev/zero of=foo bs=1048576 count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 13.5875 seconds, 79.0 MB/s
Performance when the client machine writes to the Lustre drive is poor
(12 MB/s):
murray@nasnu3:/slut$ mount -t lustre -l
192.168.3.50@o2ib:/lusty on /slut type lustre (rw)
murray@nasnu3:/slut$ dd if=/dev/zero of=foo bs=1048576 count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 88.9857 seconds, 12.1 MB/s
I see similar results when testing with Bonnie++.
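(A representative invocation, though not necessarily the exact flags used,
would be something like:

  murray@nasnu3:/slut$ bonnie++ -d /slut -s 4096

where -d sets the test directory and -s the file size in MB.)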
Any ideas as to what might be going on?
Thanks,
murray
What is your Lustre network configuration (i.e., lnet options)? If you're
not sure, what's the output of 'lctl list_nids' on the client and the
server?
Isaac
You could also use LNET selftest to measure the LNET-level bandwidth
between the two nodes:
1. On both client and server: modprobe lnet_selftest
2. On the client:
export LST_SESSION=$$                # the lst commands find the session via this variable
lst new_session --timeo 100000 test
lst add_group s 192.168.3.50@o2ib    # group "s": the server
lst add_group c 192.168.3.30@o2ib    # group "c": the client
lst add_batch bw
lst add_test --batch bw --loop 500 --concurrency 8 --distribute 1:1 --from c --to s brw write size=1M
lst run bw
lst stat c                           # watch the client-side bandwidth
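When you're done, the session can be torn down with:

lst stop bw
lst end_session

and the selftest module unloaded with 'rmmod lnet_selftest' on both nodes.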
Isaac
On Tue, Jun 03, 2008 at 07:39:44AM -0400, Murray Smigel wrote:
> ......
> On server: (I have tried both msi_x=1 and 0 with no difference)
> modprobe.conf -->
> options lnet networks="tcp,o2ib"
> options mlx4_core msi_x=0
> alias ib0 ib_ipoib
> alias ib1 ib_ipoib
> [root@lusty lustre-iokit-1.2]# lctl list_nids
> 192.168.1.94@tcp
> 192.168.3.50@o2ib
> On client:
> /etc/modprobe.d/lustre -->
> options lnet networks="tcp,o2ib"
> options mlx4_core msi_x=1
> alias ib0 ib_ipoib
> alias ib1 ib_ipoib
> nasnu3:/home/murray/lustre-iokit-1.2# lctl list_nids
> 192.168.1.156@tcp
> 192.168.3.30@o2ib
Known bug, and already fixed:
https://bugzilla.lustre.org/show_bug.cgi?id=14300
> ......
> lusty-MDT0000-mdc-ffff810226207000: Connection restored to service
> lusty-MDT0000 using nid 192.168.1.94@tcp.
It seemed to me that Lustre was actually using the @tcp network, which
could be a Lustre configuration issue.
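One quick way to check which NID the client would choose for a peer that
is reachable on both networks is 'lctl which_nid', e.g.:

nasnu3# lctl which_nid 192.168.1.94@tcp 192.168.3.50@o2ib

It prints the NID, from the list given, that LNET would actually use.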
Isaac
I changed the lnet options to use only the o2ib network:

options lnet networks="o2ib"
options mlx4_core msi_x=1
alias ib0 ib_ipoib
alias ib1 ib_ipoib
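(The modules have to be reloaded and the filesystem remounted for the new
lnet option to take effect; something like:

  umount /slut
  lustre_rmmod
  mount -t lustre 192.168.3.50@o2ib:/lusty /slut

lustre_rmmod unloads the whole Lustre/LNET module stack.)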
Now I am seeing write bw > 70 MB/sec.
Thanks,
murray