lustre-1.8.8: rdma_listen() backlog 0 breaks iWARP
Steve Wise
2014-07-23 15:44:35 UTC
Hello,

I'm trying to get lustre-1.8.8/RHEL6 running over Chelsio iWARP RNICs and connection setup
is failing at the server due to kiblnd_startup() calling rdma_listen() with a backlog of
0. This effectively rejects all incoming connection requests. I looked at lustre-1.8.7,
and the backlog was 256 in that release.

Q: Why was it changed to 0?

Thanks,

Steve.
Dilger, Andreas
2014-07-23 18:52:50 UTC
Since I'm not familiar with the LNET code myself, I'd recommend checking the
commit messages in Git, or the linked Jira/Bugzilla ticket, to see if there
is an explanation.

You may also want to see if this is fixed with the 1.8.9 release.

Cheers, Andreas
--
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division
Steve Wise
2014-07-24 14:22:17 UTC
+ sean hefty
+ Isaac Huang

This commit changed the backlog to 0:

commit 7b442f1a43714455fad06c527b6fbc10f82af857
Author: Isaac Huang <he.h.huang-QHcLZuEGTsvQT0dZR+***@public.gmane.org>
Date:   Wed Nov 17 07:14:46 2010 -0700

    b=20153 add IB bonding failover support to o2iblnd

    O2iblnd changes to support failover events from an IB
    bonding IPoIB interface. Mostly to recreate device
    specific resources, e.g. listener CMID.

    i=isaac
    i=liang

Bug: https://projectlava.xyratex.com/show_bug.cgi?id=20153

I'm not sure why it was changed to 0 though. It definitely breaks iwarp support. I'm not
yet sure what the semantics are for creating a listening cm_id with a backlog of 0. Was
the assumption that 0 means "let the system choose" or "max supported backlog"? The iwarp
CM interprets 0 to mean no connection requests allowed. :)

Isaac, can you explain?

Thanks,

Steve.
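
To illustrate the failure mode described above, here is a minimal toy model in C of a listener that treats the backlog as a strict cap on queued connection requests. It is not the iWARP CM code. The names toy_listener and toy_conn_request are hypothetical and exist only for this sketch. With a strict cap, a backlog of 0 leaves no room for even one pending request, so every request is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT the real iWARP CM. All names here are hypothetical. */
struct toy_listener {
    int backlog;   /* value the caller passed to the listen call */
    int pending;   /* connection requests queued but not yet accepted */
};

/* Returns true if the request is queued, false if it is rejected. */
static bool toy_conn_request(struct toy_listener *l)
{
    assert(l != NULL);
    if (l->pending >= l->backlog)
        return false;      /* no free slot: with backlog == 0 this is always hit */
    l->pending++;
    return true;
}
```

Under this model a listener created with backlog 256 queues the first request, while one created with backlog 0 rejects it, which matches the behavior seen when moving from lustre-1.8.7 to lustre-1.8.8.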
Hefty, Sean
2014-07-24 15:08:25 UTC
Post by Steve Wise
I'm not sure why it was changed to 0 though. It definitely breaks iwarp support. I'm not
yet sure what the semantics are for creating a listening cm_id with a backlog of 0. Was
the assumption that 0 means "let the system choose" or "max supported backlog"? The iwarp
CM interprets 0 to mean no connection requests allowed. :)
0 should mean let the system choose. Interpreting 0 as no connections allowed doesn't really make sense, since the app can get that by not calling listen at all.
Steve Wise
2014-07-24 15:13:50 UTC
OK, then we can fix this in the iwcm. I'll post a patch to linux-rdma soon.

Thanks,

Steve.
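
One way the "0 means let the system choose" semantics could be expressed is sketched below in C. This is an assumption-laden illustration, not the patch that was posted. The helper toy_effective_backlog and the constant TOY_DEFAULT_BACKLOG are hypothetical, and the value 256 is borrowed from the backlog lustre-1.8.7 passed rather than from any CM source.

```c
#include <assert.h>

/* Hypothetical default; 256 is simply the backlog lustre-1.8.7 used. */
#define TOY_DEFAULT_BACKLOG 256

/*
 * Map the caller's requested backlog to the value the listener would use.
 * A request of 0 is read as "let the system choose", not "accept nothing".
 */
static int toy_effective_backlog(int requested)
{
    assert(requested >= 0);
    return requested == 0 ? TOY_DEFAULT_BACKLOG : requested;
}
```

With this mapping, a caller that passes 0 gets the default of 256, and any positive value is used unchanged.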
