Socket Direct Protocol: help (2)

Andrea Gozzelino

Apr 12, 2010, 10:40:02 AM
On Apr 12, 2010 10:14 AM, Andrea Gozzelino
<Andrea.G...@lnl.infn.it> wrote:

> Good morning,
>
> I'm testing some Neteffect cards (Intel code E10G81GP - Neteffect
> NE020.LP.1.SSR).
> The PC runs Linux, kernel version 2.6.18-164.15.1.el5, x86_64
> GNU/Linux.
>
> In this phase, I am measuring the bandwidth with ad hoc
> netserver/netperf tests (version netperf-2.4.5).
> They work fine with the TCP protocol, as do the OFED 1.5.1 example
> programs, but they have some problems with SDP.
>
> I'm running the test with the command lines below:
>
> server: LD_PRELOAD=/usr/local/lib64/libsdp.so netserver
>
> client: LD_PRELOAD=/usr/local/lib64/libsdp.so netperf -H server_address -c -C -- -m 65536
>
> The /etc/libsdp.conf file contains the rules below:
> use both listen * *:*
> use both connect * *:*
> log min-level 9 destination file libsdp.log
>
> The client displays "Connection error: Can not allocate memory" and the
> connection fails.
> (Original text in the client log file: libsdp Error connect: failed for SDP
> fd:6 with error:Cannot allocate memory)
>
> The library path is:
> /usr/local/lib64/libsdp.so
>
>
> Could someone explain to me how the LD_PRELOAD environment variable
> must be set?
> I don't understand why the test works with TCP and not with SDP.
> Could I be working with the wrong Linux kernel environment or parameters?
>
> I don't know if there is a specific mailing list for SDP, so I am asking
> you for help.
>
> Thank you very much,
> Andrea
>
> Andrea Gozzelino
>
> INFN - Laboratori Nazionali di Legnaro (LNL)
> Viale dell'Universita' 2
> I-35020 - Legnaro (PD)- ITALIA
> Tel: +39 049 8068346
> Fax: +39 049 641925
> Mail: andrea.g...@lnl.infn.it
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> in
> the body of a message to majo...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
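
On the LD_PRELOAD question above: prefixing a command with the assignment, as in the quoted command lines, sets the variable for that one process only, which is the usual way to use a preload library. The following is a minimal sketch of a wrapper that checks the library is readable before launching the command. `run_with_sdp` is a hypothetical helper name, not part of libsdp or OFED.

```shell
# Hypothetical helper: run a command with a preload library after a sanity check.
# Usage: run_with_sdp /usr/local/lib64/libsdp.so netserver
run_with_sdp() {
    lib="$1"
    shift
    # Fail early if the library path is wrong or unreadable.
    if [ ! -r "$lib" ]; then
        echo "preload library not readable: $lib" >&2
        return 1
    fi
    # The assignment applies only to the command that follows it.
    LD_PRELOAD="$lib" "$@"
}
```

A readable library is a necessary condition only; it does not show that the SDP kernel module accepts the connection, which is where the failure reported in this thread occurs.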

Hi all,

I should add that the kernel-space SDP debug output shows this error:

command line: dmesg
sdp_init_qp:95 sdp_sock( 2100:2 40720:0): recv sge's. capability: 4
needed: 9
sdp_init_qp:95 sdp_sock( 2100:2 41203:0): recv sge's. capability: 4
needed: 9

The function sdp_init_qp() is defined in
/usr/src/ofa_kernel-1.5.1/drivers/infiniband/ulp/sdp/sdp_cma.c (lines 76-141).

Could it be a firmware problem?
This is my setup:
command line: ethtool -i eth2
driver: iw_nes
version: 1.5.0.0
firmware-version: 3.16
bus-info: 0000:03:00.0

Thank you very much,
Andrea

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Tung, Chien Tin

Apr 13, 2010, 12:30:02 PM
>I add that in kernel space SDP debug the error is:
>
>command line: dmesg
>sdp_init_qp:95 sdp_sock( 2100:2 40720:0): recv sge's. capability: 4
>needed: 9
>sdp_init_qp:95 sdp_sock( 2100:2 41203:0): recv sge's. capability: 4
>needed: 9
>
>The structure sdp_init_qp() is defined in
>/usr/src/ofa_kernel-1.5.1/drivers/infiniband/ulp/sdp/sdp_cma.c (lines 76
>- 141)

NE020 supports 4 SGEs. I don't know enough about SDP to know why it is
using this calculation for # of send_sge:

#define SDP_MAX_RECV_SKB_FRAGS (PAGE_SIZE > 0x8000 ? 1 : 0x8000 / PAGE_SIZE)
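
To make the mismatch concrete, the macro can be evaluated by hand. The sketch below assumes a 4096-byte page size (typical for x86_64, but not stated in the thread) and assumes SDP requests one SGE beyond the fragment count. That extra one is inferred only from the "needed: 9" in the log and is not shown in the thread. The function names are hypothetical.

```shell
# Mirrors: (PAGE_SIZE > 0x8000 ? 1 : 0x8000 / PAGE_SIZE)
sdp_max_recv_skb_frags() {
    page_size="$1"
    if [ "$page_size" -gt $((0x8000)) ]; then
        echo 1
    else
        echo $((0x8000 / page_size))
    fi
}

# ASSUMPTION: one additional SGE on top of the fragments (for example a header).
sdp_recv_sges_needed() {
    echo $(( $(sdp_max_recv_skb_frags "$1") + 1 ))
}

sdp_recv_sges_needed 4096    # prints 9 (0x8000 / 4096 = 8, plus 1)
```

Under these assumptions the request exceeds the 4 SGEs the NE020 reports, which matches "capability: 4 needed: 9".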


>Could be a firmware problem?

This is not a firmware problem.

Chien

Steve Wise

Apr 13, 2010, 12:40:02 PM

> NE020 supports 4 SGEs. I don't know enough about SDP to know why it is
> using this calculation for # of send_sge:
>
> #define SDP_MAX_RECV_SKB_FRAGS (PAGE_SIZE > 0x8000 ? 1 : 0x8000 / PAGE_SIZE)
>
>

Chien, does the NE020 support FMRs? I looked at the nes ofed-1.5 code
and it appears to do nothing in the map_phys_fmr functions.

Tung, Chien Tin

Apr 13, 2010, 4:10:01 PM
>> NE020 supports 4 SGEs. I don't know enough about SDP to know why it is
>> using this calculation for # of send_sge:
>>
>> #define SDP_MAX_RECV_SKB_FRAGS (PAGE_SIZE > 0x8000 ? 1 : 0x8000 / PAGE_SIZE)
>>
>>
>
>Chien, does the NE020 support FMRs? I looked at the nes ofed-1.5 code
>and it appears to do nothing in the map_phys_fmr functions.

We never implemented map_phys_fmr. Is it relevant to the # of SGEs?

Chien

Steve Wise

Apr 13, 2010, 4:20:01 PM
Tung, Chien Tin wrote:
>>> NE020 supports 4 SGEs. I don't know enough about SDP to know why it is
>>> using this calculation for # of send_sge:
>>>
>>> #define SDP_MAX_RECV_SKB_FRAGS (PAGE_SIZE > 0x8000 ? 1 : 0x8000 / PAGE_SIZE)
>>>
>>>
>>>
>> Chien, does the NE020 support FMRs? I looked at the nes ofed-1.5 code
>> and it appears to do nothing in the map_phys_fmr functions.
>>
>
> We never implemented map_phys_fmr. Is it relevant to the # of SGEs?
>
No, but SDP uses FMRs. I don't think it will run without FMR support.

Tung, Chien Tin

Apr 13, 2010, 4:30:02 PM
>>> Chien, does the NE020 support FMRs? I looked at the nes ofed-1.5 code
>>> and it appears to do nothing in the map_phys_fmr functions.
>>>
>>
>> We never implemented map_phys_fmr. Is it relevant to the # of SGEs?
>>
>No, but SDP uses FMRs. I don't think it will run without FMR support.
>


Good to know. Thanks.

Chien

Andrea Gozzelino

Apr 14, 2010, 5:00:01 AM
On Apr 13, 2010 10:22 PM, "Tung, Chien Tin" <chien.t...@intel.com>
wrote:

> >>> Chien, does the NE020 support FMRs? I looked at the nes ofed-1.5
> >>> code
> >>> and it appears to do nothing in the map_phys_fmr functions.
> >>>
> >>
> >> We never implemented map_phys_fmr. Is it relevant to the # of SGEs?
> >>
> >No, but SDP uses FMRs. I don't think it will run without FMR support.
> >
>
>
> Good to know. Thanks.
>
> Chien
>
>

Hi Steve and Chien,

I understand that NE020 cards have a problem with SDP related to
map_phys_fmr (FMR stands for Fast Memory Region).
Is it possible to fix this?
If so, do you have an idea of how much time the code development and
build would take?
If not, can you suggest a card that supports the SDP protocol?

I have been working on NE020 cards since February 2010 for an INFN
experimental proposal, called REDIGO (Read out at 10 Gbits/s), about
data acquisition and movement systems. The convergence of storage
protocols around 10 Gigabits/s Ethernet protocols shows that one way
forward could be Remote Direct Memory Access (RDMA). The goals are to
investigate latency, throughput, buffer size schemes and, finally, the
global event building bandwidth.

Do you know if NE020 cards have problems with librdma (RDMA procedures,
in general) and/or with MPI versions?

Thank you very much,
Andrea


Amir Vadai

Apr 14, 2010, 10:40:03 AM
Hi,

FMRs are used only in a special mode called ZCopy.

You could disable this mode by setting the module parameter
sdp_zcopy_thresh to 0, or by issuing:
# echo 0 > /sys/module/ib_sdp/parameters/sdp_zcopy_thresh

This means that you won't get the benefits of Zero-copy.
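
A slightly more defensive version of the same step might look like the sketch below. It writes 0 to the parameter file and reads the value back, and it reports an error if the file is not writable (for example when ib_sdp is not loaded or the shell is not running as root). `sdp_disable_zcopy` is a hypothetical helper name; the default path is the one given above.

```shell
# Hypothetical helper: set sdp_zcopy_thresh to 0 and echo the resulting value.
# An alternative path can be passed as $1, e.g. a scratch file for a dry run.
sdp_disable_zcopy() {
    param="${1:-/sys/module/ib_sdp/parameters/sdp_zcopy_thresh}"
    if [ ! -w "$param" ]; then
        echo "cannot write $param (is ib_sdp loaded, and are you root?)" >&2
        return 1
    fi
    # Write the new threshold, then read it back to confirm.
    echo 0 > "$param" && cat "$param"
}

# On the affected machine, as root:
#   sdp_disable_zcopy
```

A value written through sysfs lasts only until the module is reloaded. Passing the parameter at module load time is the general kernel mechanism for making such a setting persistent.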

- Amir


Amir Vadai

Apr 14, 2010, 10:50:02 AM
One more thing - Please open a bug regarding the num_sge limitation at:
https://bugs.openfabrics.org/

Thanks,
Amir

Steve Wise

Apr 14, 2010, 11:00:02 AM
Hey Amir,

I don't think this helps because sdp_add_device() will not add rdma
devices that fail to create fmr pools.

So I guess you could key off of fmr pool failures and set
sdp_zcopy_thresh to 0 and allow the device to be used?

But what we really need is sdp support for fastreg_mrs as an
alternative to fmrs.


Steve.

Amir Vadai

Apr 14, 2010, 11:10:02 AM
You are right - I missed it.

Andrea, please open a bug in Bugzilla (https://bugs.openfabrics.org) so
that you will be notified as soon as I fix SDP to not use FMR when it is
not supported.

As to fastreg_mrs support - I don't know this mechanism. Do you mean FRWR?

Thanks,
Amir

Tung, Chien Tin

Apr 14, 2010, 11:10:02 AM
>One more thing - Please open a bug regarding the num_sge limitation at:
>https://bugs.openfabrics.org/

Done, Bug 2027.

Chien

Steve Wise

Apr 14, 2010, 11:10:02 AM
Tung, Chien Tin wrote:
>> One more thing - Please open a bug regarding the num_sge limitation at:
>> https://bugs.openfabrics.org/
>>
>
> Done, Bug 2027.
>
> Chien
>

And 2028 opened to request fastreg support.

Steve.

Steve Wise

Apr 14, 2010, 11:10:04 AM
Amir Vadai wrote:
> You are right - I missed it.
>
> Andrea, Please open a bug at bugzilla (https://bugs.openfabrics.org) -
> so that you will be notified as soon as I will fix SDP not use FMR if
> not supported.
>
> As to fastreg_mrs support - I don't know this mechanism. Do you mean FRWR?
>
>

ib_alloc_fast_reg_mr(), ib_alloc_fast_reg_page_list() and friends, plus
the IB_WR_FAST_REG_MR work request.

Amir Vadai

Apr 14, 2010, 11:20:03 AM
ok - actually I used it in an early version of SDP before changing to FMR...

- Amir

Andrea Gozzelino

Apr 14, 2010, 11:20:03 AM
Thank you very much.
I will check the status of bugs 2027 and 2028.

Andrea


On Apr 14, 2010 05:05 PM, Steve Wise <sw...@opengridcomputing.com> wrote:

> Tung, Chien Tin wrote:
> >> One more thing - Please open a bug regarding the num_sge limitation
> >> at:
> >> https://bugs.openfabrics.org/
> >>
> >
> > Done, Bug 2027.
> >
> > Chien
> >
>
> And 2028 opened to request fastreg support.
>
> Steve.

Tung, Chien Tin

Apr 14, 2010, 2:50:01 PM
>Tung, Chien Tin wrote:
>>> One more thing - Please open a bug regarding the num_sge limitation at:
>>> https://bugs.openfabrics.org/
>>>
>>
>> Done, Bug 2027.
>>
>> Chien
>>
>
>And 2028 opened to request fastreg support.
>


I am open to testing fixes for these two bugs.

Chien

Tung, Chien Tin

Apr 14, 2010, 3:30:02 PM
>I work on NE020 cards from February 2010 for an INFN experimental
>proposal, called REDIGO (Read out at 10 Gbits/s), about the data
>acquisition and movement systems. The convergence of storage protocols
>around 10 Gigabits/s Ethernet protocols shows that one way could be the
>Remote Direct Memory Access (RDMA). The goals are the investigations of
>latency time, the throughput, the buffer size schemes and finally the
>global event building bandwidth.
>
>Do you know if NE020 cards have problems with librdma (RDMA procedures,
>in general) and / or with MPI versions?


If you run into any issues, please email me. NE020 supports Mvapich,
Mvapich2, OpenMPI, Intel MPI and HP MPI.

Chien

Amir Vadai

Apr 15, 2010, 2:30:02 AM
I hope to have a fix next week for the first one.

Thanks,
Amir

Andrea Gozzelino

Apr 15, 2010, 3:10:01 AM
On Apr 15, 2010 08:24 AM, Amir Vadai <am...@mellanox.co.il> wrote:

> I hope to have a fix next week for the first one.
>
> Thanks,
> Amir
>
> On 04/14/2010 09:48 PM, Tung, Chien Tin wrote:
> >> Tung, Chien Tin wrote:
> >>
> >>>> One more thing - Please open a bug regarding the num_sge
> >>>> limitation at:
> >>>> https://bugs.openfabrics.org/
> >>>>
> >>>>
> >>> Done, Bug 2027.
> >>>
> >>> Chien
> >>>
> >>>
> >> And 2028 opened to request fastreg support.
> >>
> >>
> >
> > I am open to test fixes for these two bugs.
> >
> > Chien
> >
> >
>

Hi Amir,
Hi Chien,

I understand that bug 2027 could be fixed next week, so that I can then
test SDP protocol performance on NE020 cards.
Is that correct?
If so, could you point out the code changes?

Keep in touch and take care.
Regards,
Andrea



Amir Vadai

Apr 15, 2010, 4:40:02 AM
It should be a simple fix and I plan to do it soon - just add yourself as
CC in bugzilla - that way I won't forget to notify you.

- amir

Andrea Gozzelino

Apr 20, 2010, 10:00:04 AM
Hi Amir,

do you have any news about bugs 2027 "SDP not respecting # SGEs as
reported from HW" and 2028 "SDP should support fastreg mrs"?

Once those bugs are fixed, I will test the NE020 cards' performance
with the SDP protocol and compare SDP with TCP.

Keep in touch,

Andrea Gozzelino


Amir Vadai

Apr 21, 2010, 8:10:02 AM
Hi Andrea,

I am preparing the fix right now.

- Amir


Andrea Gozzelino

Apr 23, 2010, 10:40:02 AM
Hi Amir,

do you have any news about bugs 2027 and 2028 (SDP)?

Take care and keep in touch.
Thank you very much,
Andrea

Amir Vadai

Apr 25, 2010, 3:40:02 AM
I have a fix for 2027 (number of SGE's issue) ready.
I'm busy with other things - but hopefully I will manage to test it and
push it today.

As to 2028 (No FMR support) - I need pushed a fix to only disable ZCopy
when no FMR facility.
I don't have time right now to add support for fast memory registration
when FMR is not available - but Steve said he'll help with implementing it.

- Amir
