[PATCH 00/12] NTB: Performance and bug fixes in ntb drivers

27 views
Skip to first unread message

Allen Hubbe

unread,
Jul 6, 2015, 11:30:59 AM7/6/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang, Allen Hubbe
This series includes the following changes to fix bugs, add new hardware
support, and improve usability and performance of the ntb drivers.

Bug Fixes (issues preexisting the new ntb api):
NTB: Fix ntb_transport out-of-order RX update
NTB: Schedule to receive on QP link up
NTB: Fix zero size or integer overflow in ntb_set_mw
NTB: ntb_netdev not covering all receive errors
NTB: Fix oops in debugfs when transport is half-up

Bug Fixes (regressions since the new ntb api):
NTB: Fix transport stats for multiple devices

Additions and Performance Improvements:
NTB: Remove dma_sync_wait from ntb_async_rx
NTB: Add PCI Device IDs for Broadwell Xeon
NTB: Add flow control to the ntb_netdev
NTB: Make the transport list in order of discovery
NTB: Clean up QP stats info
NTB: Use unique DMA channels for TX and RX


Allen Hubbe (4):
NTB: Fix ntb_transport out-of-order RX update
NTB: Schedule to receive on QP link up
NTB: Remove dma_sync_wait from ntb_async_rx
NTB: Fix zero size or integer overflow in ntb_set_mw

Dave Jiang (8):
NTB: Add flow control to the ntb_netdev
NTB: Fix transport stats for multiple devices
NTB: ntb_netdev not covering all receive errors
NTB: Fix oops in debugfs when transport is half-up
NTB: Add PCI Device IDs for Broadwell Xeon
NTB: Make the transport list in order of discovery
NTB: Clean up QP stats info
NTB: Use unique DMA channels for TX and RX

drivers/net/ntb_netdev.c | 89 +++++++++-
drivers/ntb/hw/intel/ntb_hw_intel.c | 15 ++
drivers/ntb/hw/intel/ntb_hw_intel.h | 3 +
drivers/ntb/ntb_transport.c | 322 ++++++++++++++++++++++++------------
include/linux/ntb_transport.h | 1 +
5 files changed, 320 insertions(+), 110 deletions(-)

--
2.4.0.rc0.43.gcf8a8c6

Allen Hubbe

unread,
Jul 6, 2015, 11:31:17 AM7/6/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

When the remote side is not up, we do not have all the context for the
transport, and that causes NULL ptr access. Have the debugfs reads check
to see if transport is up before we make access.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/ntb/ntb_transport.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 7e2de6764b9f..73bb58bbcaec 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -439,13 +439,17 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
char *buf;
ssize_t ret, out_offset, out_count;

+ qp = filp->private_data;
+
+ if (!qp || !qp->link_is_up)
+ return 0;
+
out_count = 1000;

buf = kmalloc(out_count, GFP_KERNEL);
if (!buf)
return -ENOMEM;

- qp = filp->private_data;
out_offset = 0;
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"NTB QP stats\n");
--
2.4.0.rc0.43.gcf8a8c6

Allen Hubbe

unread,
Jul 6, 2015, 11:31:17 AM7/6/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

ntb_netdev is allowing the link to come up even when -ENOMEM is returned
from ntb_transport_rx_enqueue. Fix to cover all possible errors.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/net/ntb_netdev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
index 84d094df1a63..e7072e544567 100644
--- a/drivers/net/ntb_netdev.c
+++ b/drivers/net/ntb_netdev.c
@@ -266,7 +266,7 @@ static int ntb_netdev_open(struct net_device *ndev)

rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
ndev->mtu + ETH_HLEN);
- if (rc == -EINVAL) {
+ if (rc) {
dev_kfree_skb(skb);
goto err;
}
--
2.4.0.rc0.43.gcf8a8c6

Allen Hubbe

unread,
Jul 6, 2015, 11:31:19 AM7/6/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang, Allen Hubbe
A plain 32 bit integer will overflow for values over 4GiB.

Change the plain integer size to the appropriate size type in
ntb_set_mw. Change the type of the size parameter and two local
variables used for size.

Even if there is no overflow, a size of zero is invalid here.

Reported-by: Juyoung Jung <jj...@micron.com>
Signed-off-by: Allen Hubbe <Allen...@emc.com>
---
drivers/ntb/ntb_transport.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index fbddeab13b4c..42ae5a122412 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -648,13 +648,16 @@ static void ntb_free_mw(struct ntb_transport_ctx *nt, int num_mw)
}

static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
- unsigned int size)
+ resource_size_t size)
{
struct ntb_transport_mw *mw = &nt->mw_vec[num_mw];
struct pci_dev *pdev = nt->ndev->pdev;
- unsigned int xlat_size, buff_size;
+ size_t xlat_size, buff_size;
int rc;

+ if (!size)
+ return -EINVAL;
+
xlat_size = round_up(size, mw->xlat_align_size);
buff_size = round_up(size, mw->xlat_align);

@@ -674,7 +677,7 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
if (!mw->virt_addr) {
mw->xlat_size = 0;
mw->buff_size = 0;
- dev_err(&pdev->dev, "Unable to alloc MW buff of size %d\n",
+ dev_err(&pdev->dev, "Unable to alloc MW buff of size %lu\n",
buff_size);
return -ENOMEM;
}
--
2.4.0.rc0.43.gcf8a8c6

Allen Hubbe

unread,
Jul 6, 2015, 11:31:20 AM7/6/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

Right now if we push the NTB really hard, we start dropping packets due
to not able to process the packets fast enough. We need to stop the
upper layer from flooding us when that happens.

A timer is necessary in order to restart the queue once the resource has
been processed on the receive side. Due to the way NTB is setup, the
resources on the tx side are tied to the processing of the rx side and
there's no async way to know when the rx side has released those
resources.

The following module parameters are added:
tx_time: The time in usecs for us to wait before we check if we restart
transmit queue. (default: 1)
tx_start: The number of descriptors we wait for to free up before resuming
transmit queue. (default: 10)
tx_stop: The low water mark of free descriptors before we stop tx queue.
(default: 5)

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/net/ntb_netdev.c | 80 +++++++++++++++++++++++++++++++++++++++++++
drivers/ntb/ntb_transport.c | 18 +++++++++-
include/linux/ntb_transport.h | 1 +
3 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
index 5f1ee7c05f68..84d094df1a63 100644
--- a/drivers/net/ntb_netdev.c
+++ b/drivers/net/ntb_netdev.c
@@ -61,11 +61,24 @@ MODULE_VERSION(NTB_NETDEV_VER);
MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("Intel Corporation");

+static unsigned int tx_time = 1;
+module_param(tx_time, uint, 0644);
+MODULE_PARM_DESC(tx_time, "Time in usecs for tx resource reaper");
+
+static unsigned int tx_start = 10;
+module_param(tx_start, uint, 0644);
+MODULE_PARM_DESC(tx_start, "Number of descriptors to free before resume tx");
+
+static unsigned int tx_stop = 5;
+module_param(tx_stop, uint, 0644);
+MODULE_PARM_DESC(tx_stop, "Number of descriptors before stop upper layer tx");
+
struct ntb_netdev {
struct list_head list;
struct pci_dev *pdev;
struct net_device *ndev;
struct ntb_transport_qp *qp;
+ struct timer_list tx_timer;
};

#define NTB_TX_TIMEOUT_MS 1000
@@ -136,11 +149,42 @@ enqueue_again:
}
}

+static int __ntb_netdev_maybe_stop_tx(struct net_device *netdev,
+ struct ntb_transport_qp *qp, int size)
+{
+ struct ntb_netdev *dev = netdev_priv(netdev);
+
+ netif_stop_queue(netdev);
+ /* Make sure to see the latest value of ntb_transport_tx_free_entry()
+ * since the queue was last started.
+ */
+ smp_mb();
+
+ if (likely(ntb_transport_tx_free_entry(qp) < size)) {
+ mod_timer(&dev->tx_timer, jiffies + usecs_to_jiffies(tx_time));
+ return -EBUSY;
+ }
+
+ netif_start_queue(netdev);
+ return 0;
+}
+
+static int ntb_netdev_maybe_stop_tx(struct net_device *ndev,
+ struct ntb_transport_qp *qp, int size)
+{
+ if (netif_queue_stopped(ndev) ||
+ (ntb_transport_tx_free_entry(qp) >= size))
+ return 0;
+
+ return __ntb_netdev_maybe_stop_tx(ndev, qp, size);
+}
+
static void ntb_netdev_tx_handler(struct ntb_transport_qp *qp, void *qp_data,
void *data, int len)
{
struct net_device *ndev = qp_data;
struct sk_buff *skb;
+ struct ntb_netdev *dev = netdev_priv(ndev);

skb = data;
if (!skb || !ndev)
@@ -155,6 +199,15 @@ static void ntb_netdev_tx_handler(struct ntb_transport_qp *qp, void *qp_data,
}

dev_kfree_skb(skb);
+
+ if (ntb_transport_tx_free_entry(dev->qp) >= tx_start) {
+ /* Make sure anybody stopping the queue after this sees the new
+ * value of ntb_transport_tx_free_entry()
+ */
+ smp_mb();
+ if (netif_queue_stopped(ndev))
+ netif_wake_queue(ndev);
+ }
}

static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
@@ -163,10 +216,15 @@ static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
struct ntb_netdev *dev = netdev_priv(ndev);
int rc;

+ ntb_netdev_maybe_stop_tx(ndev, dev->qp, tx_stop);
+
rc = ntb_transport_tx_enqueue(dev->qp, skb, skb->data, skb->len);
if (rc)
goto err;

+ /* check for next submit */
+ ntb_netdev_maybe_stop_tx(ndev, dev->qp, tx_stop);
+
return NETDEV_TX_OK;

err:
@@ -175,6 +233,23 @@ err:
return NETDEV_TX_BUSY;
}

+static void ntb_netdev_tx_timer(unsigned long data)
+{
+ struct net_device *ndev = (struct net_device *)data;
+ struct ntb_netdev *dev = netdev_priv(ndev);
+
+ if (ntb_transport_tx_free_entry(dev->qp) < tx_stop) {
+ mod_timer(&dev->tx_timer, jiffies + msecs_to_jiffies(tx_time));
+ } else {
+ /* Make sure anybody stopping the queue after this sees the new
+ * value of ntb_transport_tx_free_entry()
+ */
+ smp_mb();
+ if (netif_queue_stopped(ndev))
+ netif_wake_queue(ndev);
+ }
+}
+
static int ntb_netdev_open(struct net_device *ndev)
{
struct ntb_netdev *dev = netdev_priv(ndev);
@@ -197,8 +272,11 @@ static int ntb_netdev_open(struct net_device *ndev)
}
}

+ setup_timer(&dev->tx_timer, ntb_netdev_tx_timer, (unsigned long)ndev);
+
netif_carrier_off(ndev);
ntb_transport_link_up(dev->qp);
+ netif_start_queue(ndev);

return 0;

@@ -219,6 +297,8 @@ static int ntb_netdev_close(struct net_device *ndev)
while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
dev_kfree_skb(skb);

+ del_timer_sync(&dev->tx_timer);
+
return 0;
}

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 98e58c765f2e..76a03d4070a9 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -488,6 +488,12 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
"tx_index - \t%u\n", qp->tx_index);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"tx_max_entry - \t%u\n", qp->tx_max_entry);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "qp->remote_rx_info->entry - \t%u\n",
+ qp->remote_rx_info->entry);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "free tx - \t%u\n",
+ ntb_transport_tx_free_entry(qp));

out_offset += snprintf(buf + out_offset, out_count - out_offset,
"\nQP Link %s\n",
@@ -529,6 +535,7 @@ static struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock,
}
entry = list_first_entry(list, struct ntb_queue_entry, entry);
list_del(&entry->entry);
+
out:
spin_unlock_irqrestore(lock, flags);

@@ -1827,7 +1834,7 @@ int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q);
if (!entry) {
qp->tx_err_no_buf++;
- return -ENOMEM;
+ return -EBUSY;
}

entry->cb_data = cb;
@@ -1953,6 +1960,15 @@ unsigned int ntb_transport_max_size(struct ntb_transport_qp *qp)
}
EXPORT_SYMBOL_GPL(ntb_transport_max_size);

+unsigned int ntb_transport_tx_free_entry(struct ntb_transport_qp *qp)
+{
+ unsigned int head = qp->tx_index;
+ unsigned int tail = qp->remote_rx_info->entry;
+
+ return tail > head ? tail - head : qp->tx_max_entry + tail - head;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_tx_free_entry);
+
static void ntb_transport_doorbell_callback(void *data, int vector)
{
struct ntb_transport_ctx *nt = data;
diff --git a/include/linux/ntb_transport.h b/include/linux/ntb_transport.h
index 2862861366a5..7243eb98a722 100644
--- a/include/linux/ntb_transport.h
+++ b/include/linux/ntb_transport.h
@@ -83,3 +83,4 @@ void *ntb_transport_rx_remove(struct ntb_transport_qp *qp, unsigned int *len);
void ntb_transport_link_up(struct ntb_transport_qp *qp);
void ntb_transport_link_down(struct ntb_transport_qp *qp);
bool ntb_transport_link_query(struct ntb_transport_qp *qp);
+unsigned int ntb_transport_tx_free_entry(struct ntb_transport_qp *qp);
--
2.4.0.rc0.43.gcf8a8c6

Allen Hubbe

unread,
Jul 6, 2015, 11:31:22 AM7/6/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang, Allen Hubbe
It was possible for a synchronous update of the RX index in the error
case to get ahead of the asynchronous RX index update in the normal
case. Change the RX processing to preserve an RX completion order.

There were two error cases. First, if a buffer is not present to
receive data, there would be no queue entry to preserve the RX
completion order. Instead of dropping the RX frame, leave the RX frame
in the ring. Schedule RX processing when RX entries are enqueued, in
case there are RX frames waiting in the ring to be received.

Second, if a buffer is too small to receive data, drop the frame in the
ring, mark the RX entry as done, and indicate the error in the RX entry
length. Check for a negative length in the receive callback in
ntb_netdev, and count occurrences as rx_length_errors.

Signed-off-by: Allen Hubbe <Allen...@emc.com>
---
drivers/net/ntb_netdev.c | 7 ++
drivers/ntb/ntb_transport.c | 169 ++++++++++++++++++++++++++------------------
2 files changed, 107 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
index 3cc316cb7e6b..5f1ee7c05f68 100644
--- a/drivers/net/ntb_netdev.c
+++ b/drivers/net/ntb_netdev.c
@@ -102,6 +102,12 @@ static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp, void *qp_data,

netdev_dbg(ndev, "%s: %d byte payload received\n", __func__, len);

+ if (len < 0) {
+ ndev->stats.rx_errors++;
+ ndev->stats.rx_length_errors++;
+ goto enqueue_again;
+ }
+
skb_put(skb, len);
skb->protocol = eth_type_trans(skb, ndev);
skb->ip_summed = CHECKSUM_NONE;
@@ -121,6 +127,7 @@ static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp, void *qp_data,
return;
}

+enqueue_again:
rc = ntb_transport_rx_enqueue(qp, skb, skb->data, ndev->mtu + ETH_HLEN);
if (rc) {
dev_kfree_skb(skb);
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index efe3ad4122f2..98e58c765f2e 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -142,10 +142,11 @@ struct ntb_transport_qp {

void (*rx_handler)(struct ntb_transport_qp *qp, void *qp_data,
void *data, int len);
+ struct list_head rx_post_q;
struct list_head rx_pend_q;
struct list_head rx_free_q;
- spinlock_t ntb_rx_pend_q_lock;
- spinlock_t ntb_rx_free_q_lock;
+ /* ntb_rx_q_lock: synchronize access to rx_XXXX_q */
+ spinlock_t ntb_rx_q_lock;
void *rx_buff;
unsigned int rx_index;
unsigned int rx_max_entry;
@@ -534,6 +535,27 @@ out:
return entry;
}

+static struct ntb_queue_entry *ntb_list_mv(spinlock_t *lock,
+ struct list_head *list,
+ struct list_head *to_list)
+{
+ struct ntb_queue_entry *entry;
+ unsigned long flags;
+
+ spin_lock_irqsave(lock, flags);
+
+ if (list_empty(list)) {
+ entry = NULL;
+ } else {
+ entry = list_first_entry(list, struct ntb_queue_entry, entry);
+ list_move_tail(&entry->entry, to_list);
+ }
+
+ spin_unlock_irqrestore(lock, flags);
+
+ return entry;
+}
+
static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
unsigned int qp_num)
{
@@ -941,10 +963,10 @@ static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
INIT_DELAYED_WORK(&qp->link_work, ntb_qp_link_work);
INIT_WORK(&qp->link_cleanup, ntb_qp_link_cleanup_work);

- spin_lock_init(&qp->ntb_rx_pend_q_lock);
- spin_lock_init(&qp->ntb_rx_free_q_lock);
+ spin_lock_init(&qp->ntb_rx_q_lock);
spin_lock_init(&qp->ntb_tx_free_q_lock);

+ INIT_LIST_HEAD(&qp->rx_post_q);
INIT_LIST_HEAD(&qp->rx_pend_q);
INIT_LIST_HEAD(&qp->rx_free_q);
INIT_LIST_HEAD(&qp->tx_free_q);
@@ -1107,22 +1129,47 @@ static void ntb_transport_free(struct ntb_client *self, struct ntb_dev *ndev)
kfree(nt);
}

-static void ntb_rx_copy_callback(void *data)
+static void ntb_complete_rxc(struct ntb_transport_qp *qp)
{
- struct ntb_queue_entry *entry = data;
- struct ntb_transport_qp *qp = entry->qp;
- void *cb_data = entry->cb_data;
- unsigned int len = entry->len;
- struct ntb_payload_header *hdr = entry->rx_hdr;
+ struct ntb_queue_entry *entry;
+ void *cb_data;
+ unsigned int len;
+ unsigned long irqflags;
+
+ spin_lock_irqsave(&qp->ntb_rx_q_lock, irqflags);
+
+ while (!list_empty(&qp->rx_post_q)) {
+ entry = list_first_entry(&qp->rx_post_q,
+ struct ntb_queue_entry, entry);
+ if (!(entry->flags & DESC_DONE_FLAG))
+ break;
+
+ entry->rx_hdr->flags = 0;
+ iowrite32(entry->index, &qp->rx_info->entry);
+
+ cb_data = entry->cb_data;
+ len = entry->len;
+
+ list_move_tail(&entry->entry, &qp->rx_free_q);

- hdr->flags = 0;
+ spin_unlock_irqrestore(&qp->ntb_rx_q_lock, irqflags);

- iowrite32(entry->index, &qp->rx_info->entry);
+ if (qp->rx_handler && qp->client_ready)
+ qp->rx_handler(qp, qp->cb_data, cb_data, len);

- ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry, &qp->rx_free_q);
+ spin_lock_irqsave(&qp->ntb_rx_q_lock, irqflags);
+ }

- if (qp->rx_handler && qp->client_ready)
- qp->rx_handler(qp, qp->cb_data, cb_data, len);
+ spin_unlock_irqrestore(&qp->ntb_rx_q_lock, irqflags);
+}
+
+static void ntb_rx_copy_callback(void *data)
+{
+ struct ntb_queue_entry *entry = data;
+
+ entry->flags |= DESC_DONE_FLAG;
+
+ ntb_complete_rxc(entry->qp);
}

static void ntb_memcpy_rx(struct ntb_queue_entry *entry, void *offset)
@@ -1138,19 +1185,18 @@ static void ntb_memcpy_rx(struct ntb_queue_entry *entry, void *offset)
ntb_rx_copy_callback(entry);
}

-static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset,
- size_t len)
+static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset)
{
struct dma_async_tx_descriptor *txd;
struct ntb_transport_qp *qp = entry->qp;
struct dma_chan *chan = qp->dma_chan;
struct dma_device *device;
- size_t pay_off, buff_off;
+ size_t pay_off, buff_off, len;
struct dmaengine_unmap_data *unmap;
dma_cookie_t cookie;
void *buf = entry->buf;

- entry->len = len;
+ len = entry->len;

if (!chan)
goto err;
@@ -1226,7 +1272,6 @@ static int ntb_process_rxc(struct ntb_transport_qp *qp)
struct ntb_payload_header *hdr;
struct ntb_queue_entry *entry;
void *offset;
- int rc;

offset = qp->rx_buff + qp->rx_max_frame * qp->rx_index;
hdr = offset + qp->rx_max_frame - sizeof(struct ntb_payload_header);
@@ -1255,65 +1300,43 @@ static int ntb_process_rxc(struct ntb_transport_qp *qp)
return -EIO;
}

- entry = ntb_list_rm(&qp->ntb_rx_pend_q_lock, &qp->rx_pend_q);
+ entry = ntb_list_mv(&qp->ntb_rx_q_lock, &qp->rx_pend_q, &qp->rx_post_q);
if (!entry) {
dev_dbg(&qp->ndev->pdev->dev, "no receive buffer\n");
qp->rx_err_no_buf++;
-
- rc = -ENOMEM;
- goto err;
+ return -EAGAIN;
}

+ entry->rx_hdr = hdr;
+ entry->index = qp->rx_index;
+
if (hdr->len > entry->len) {
dev_dbg(&qp->ndev->pdev->dev,
"receive buffer overflow! Wanted %d got %d\n",
hdr->len, entry->len);
qp->rx_err_oflow++;

- rc = -EIO;
- goto err;
- }
+ entry->len = -EIO;
+ entry->flags |= DESC_DONE_FLAG;

- dev_dbg(&qp->ndev->pdev->dev,
- "RX OK index %u ver %u size %d into buf size %d\n",
- qp->rx_index, hdr->ver, hdr->len, entry->len);
+ ntb_complete_rxc(qp);
+ } else {
+ dev_dbg(&qp->ndev->pdev->dev,
+ "RX OK index %u ver %u size %d into buf size %d\n",
+ qp->rx_index, hdr->ver, hdr->len, entry->len);

- qp->rx_bytes += hdr->len;
- qp->rx_pkts++;
+ qp->rx_bytes += hdr->len;
+ qp->rx_pkts++;

- entry->index = qp->rx_index;
- entry->rx_hdr = hdr;
+ entry->len = hdr->len;

- ntb_async_rx(entry, offset, hdr->len);
+ ntb_async_rx(entry, offset);
+ }

qp->rx_index++;
qp->rx_index %= qp->rx_max_entry;

return 0;
-
-err:
- /* FIXME: if this syncrhonous update of the rx_index gets ahead of
- * asyncrhonous ntb_rx_copy_callback of previous entry, there are three
- * scenarios:
- *
- * 1) The peer might miss this update, but observe the update
- * from the memcpy completion callback. In this case, the buffer will
- * not be freed on the peer to be reused for a different packet. The
- * successful rx of a later packet would clear the condition, but the
- * condition could persist if several rx fail in a row.
- *
- * 2) The peer may observe this update before the asyncrhonous copy of
- * prior packets is completed. The peer may overwrite the buffers of
- * the prior packets before they are copied.
- *
- * 3) Both: the peer may observe the update, and then observe the index
- * decrement by the asynchronous completion callback. Who knows what
- * badness that will cause.
- */
- hdr->flags = 0;
- iowrite32(qp->rx_index, &qp->rx_info->entry);
-
- return rc;
}

static void ntb_transport_rxc_db(unsigned long data)
@@ -1333,7 +1356,7 @@ static void ntb_transport_rxc_db(unsigned long data)
break;
}

- if (qp->dma_chan)
+ if (i && qp->dma_chan)
dma_async_issue_pending(qp->dma_chan);

if (i == qp->rx_max_entry) {
@@ -1609,7 +1632,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
goto err1;

entry->qp = qp;
- ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
+ ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
&qp->rx_free_q);
}

@@ -1634,7 +1657,7 @@ err2:
while ((entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q)))
kfree(entry);
err1:
- while ((entry = ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q)))
+ while ((entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_free_q)))
kfree(entry);
if (qp->dma_chan)
dma_release_channel(qp->dma_chan);
@@ -1689,11 +1712,16 @@ void ntb_transport_free_queue(struct ntb_transport_qp *qp)
qp->tx_handler = NULL;
qp->event_handler = NULL;

- while ((entry = ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q)))
+ while ((entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_free_q)))
kfree(entry);

- while ((entry = ntb_list_rm(&qp->ntb_rx_pend_q_lock, &qp->rx_pend_q))) {
- dev_warn(&pdev->dev, "Freeing item from a non-empty queue\n");
+ while ((entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_pend_q))) {
+ dev_warn(&pdev->dev, "Freeing item from non-empty rx_pend_q\n");
+ kfree(entry);
+ }
+
+ while ((entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_post_q))) {
+ dev_warn(&pdev->dev, "Freeing item from non-empty rx_post_q\n");
kfree(entry);
}

@@ -1724,14 +1752,14 @@ void *ntb_transport_rx_remove(struct ntb_transport_qp *qp, unsigned int *len)
if (!qp || qp->client_ready)
return NULL;

- entry = ntb_list_rm(&qp->ntb_rx_pend_q_lock, &qp->rx_pend_q);
+ entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_pend_q);
if (!entry)
return NULL;

buf = entry->cb_data;
*len = entry->len;

- ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry, &qp->rx_free_q);
+ ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);

return buf;
}
@@ -1757,15 +1785,18 @@ int ntb_transport_rx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
if (!qp)
return -EINVAL;

- entry = ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q);
+ entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_free_q);
if (!entry)
return -ENOMEM;

entry->cb_data = cb;
entry->buf = data;
entry->len = len;
+ entry->flags = 0;
+
+ ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);

- ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry, &qp->rx_pend_q);
+ tasklet_schedule(&qp->rxc_db_work);

return 0;
}
--
2.4.0.rc0.43.gcf8a8c6

Allen Hubbe

unread,
Jul 6, 2015, 11:31:22 AM7/6/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

Currently the debugfs does not have files for all NTB transport queue
pairs. When there are multiple NTBs present in a system, the QP names
of the last transport clobber the names of previously added transport
QPs. Only the last added QPs can be observed via debugfs.

Create a directory per NTB transport to associate the QPs with that
transport. Name the directory the same as the PCI device.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/ntb/ntb_transport.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 76a03d4070a9..7e2de6764b9f 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -212,6 +212,8 @@ struct ntb_transport_ctx {
bool link_is_up;
struct delayed_work link_work;
struct work_struct link_cleanup;
+
+ struct dentry *debugfs_node_dir;
};

enum {
@@ -952,12 +954,12 @@ static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
qp->tx_max_frame = min(transport_mtu, tx_size / 2);
qp->tx_max_entry = tx_size / qp->tx_max_frame;

- if (nt_debugfs_dir) {
+ if (nt->debugfs_node_dir) {
char debugfs_name[4];

snprintf(debugfs_name, 4, "qp%d", qp_num);
qp->debugfs_dir = debugfs_create_dir(debugfs_name,
- nt_debugfs_dir);
+ nt->debugfs_node_dir);

qp->debugfs_stats = debugfs_create_file("stats", S_IRUSR,
qp->debugfs_dir, qp,
@@ -1060,6 +1062,12 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
goto err2;
}

+ if (nt_debugfs_dir) {
+ nt->debugfs_node_dir =
+ debugfs_create_dir(pci_name(ndev->pdev),
+ nt_debugfs_dir);
+ }
+
for (i = 0; i < qp_count; i++) {
rc = ntb_transport_init_queue(nt, i);
if (rc)
--
2.4.0.rc0.43.gcf8a8c6

Allen Hubbe

unread,
Jul 6, 2015, 11:31:22 AM7/6/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

Allocate two DMA channels, one for TX operation and one for RX
operation, instead of having one DMA channel for everything. This
provides slightly better performance, and also will make error handling
cleaner later on.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/ntb/ntb_transport.c | 77 ++++++++++++++++++++++++++++++++++-----------
1 file changed, 58 insertions(+), 19 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index dd4e3ab824aa..fbddeab13b4c 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -119,7 +119,8 @@ struct ntb_transport_qp {
struct ntb_transport_ctx *transport;
struct ntb_dev *ndev;
void *cb_data;
- struct dma_chan *dma_chan;
+ struct dma_chan *tx_dma_chan;
+ struct dma_chan *rx_dma_chan;

bool client_ready;
bool link_is_up;
@@ -504,7 +505,11 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"\n");
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "Using DMA - \t%s\n", use_dma ? "Yes" : "No");
+ "Using TX DMA - \t%s\n",
+ qp->tx_dma_chan ? "Yes" : "No");
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "Using RX DMA - \t%s\n",
+ qp->rx_dma_chan ? "Yes" : "No");
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"QP Link - \t%s\n",
qp->link_is_up ? "Up" : "Down");
@@ -1217,7 +1222,7 @@ static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset)
{
struct dma_async_tx_descriptor *txd;
struct ntb_transport_qp *qp = entry->qp;
- struct dma_chan *chan = qp->dma_chan;
+ struct dma_chan *chan = qp->rx_dma_chan;
struct dma_device *device;
size_t pay_off, buff_off, len;
struct dmaengine_unmap_data *unmap;
@@ -1378,8 +1383,8 @@ static void ntb_transport_rxc_db(unsigned long data)
break;
}

- if (i && qp->dma_chan)
- dma_async_issue_pending(qp->dma_chan);
+ if (i && qp->rx_dma_chan)
+ dma_async_issue_pending(qp->rx_dma_chan);

if (i == qp->rx_max_entry) {
/* there is more work to do */
@@ -1446,7 +1451,7 @@ static void ntb_async_tx(struct ntb_transport_qp *qp,
{
struct ntb_payload_header __iomem *hdr;
struct dma_async_tx_descriptor *txd;
- struct dma_chan *chan = qp->dma_chan;
+ struct dma_chan *chan = qp->tx_dma_chan;
struct dma_device *device;
size_t dest_off, buff_off;
struct dmaengine_unmap_data *unmap;
@@ -1639,14 +1644,27 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
dma_cap_set(DMA_MEMCPY, dma_mask);

if (use_dma) {
- qp->dma_chan = dma_request_channel(dma_mask, ntb_dma_filter_fn,
- (void *)(unsigned long)node);
- if (!qp->dma_chan)
- dev_info(&pdev->dev, "Unable to allocate DMA channel\n");
+ qp->tx_dma_chan =
+ dma_request_channel(dma_mask, ntb_dma_filter_fn,
+ (void *)(unsigned long)node);
+ if (!qp->tx_dma_chan)
+ dev_info(&pdev->dev, "Unable to allocate TX DMA channel\n");
+
+ qp->rx_dma_chan =
+ dma_request_channel(dma_mask, ntb_dma_filter_fn,
+ (void *)(unsigned long)node);
+ if (!qp->rx_dma_chan)
+ dev_info(&pdev->dev, "Unable to allocate RX DMA channel\n");
} else {
- qp->dma_chan = NULL;
+ qp->tx_dma_chan = NULL;
+ qp->rx_dma_chan = NULL;
}
- dev_dbg(&pdev->dev, "Using %s memcpy\n", qp->dma_chan ? "DMA" : "CPU");
+
+ dev_dbg(&pdev->dev, "Using %s memcpy for TX\n",
+ qp->tx_dma_chan ? "DMA" : "CPU");
+
+ dev_dbg(&pdev->dev, "Using %s memcpy for RX\n",
+ qp->rx_dma_chan ? "DMA" : "CPU");

for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
entry = kzalloc_node(sizeof(*entry), GFP_ATOMIC, node);
@@ -1681,8 +1699,10 @@ err2:
err1:
while ((entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_free_q)))
kfree(entry);
- if (qp->dma_chan)
- dma_release_channel(qp->dma_chan);
+ if (qp->tx_dma_chan)
+ dma_release_channel(qp->tx_dma_chan);
+ if (qp->rx_dma_chan)
+ dma_release_channel(qp->rx_dma_chan);
nt->qp_bitmap_free |= qp_bit;
err:
return NULL;
@@ -1707,12 +1727,27 @@ void ntb_transport_free_queue(struct ntb_transport_qp *qp)

pdev = qp->ndev->pdev;

- if (qp->dma_chan) {
- struct dma_chan *chan = qp->dma_chan;
+ if (qp->tx_dma_chan) {
+ struct dma_chan *chan = qp->tx_dma_chan;
+ /* Putting the dma_chan to NULL will force any new traffic to be
+ * processed by the CPU instead of the DAM engine
+ */
+ qp->tx_dma_chan = NULL;
+
+ /* Try to be nice and wait for any queued DMA engine
+ * transactions to process before smashing it with a rock
+ */
+ dma_sync_wait(chan, qp->last_cookie);
+ dmaengine_terminate_all(chan);
+ dma_release_channel(chan);
+ }
+
+ if (qp->rx_dma_chan) {
+ struct dma_chan *chan = qp->rx_dma_chan;
/* Putting the dma_chan to NULL will force any new traffic to be
* processed by the CPU instead of the DAM engine
*/
- qp->dma_chan = NULL;
+ qp->rx_dma_chan = NULL;

/* Try to be nice and wait for any queued DMA engine
* transactions to process before smashing it with a rock
@@ -1960,16 +1995,20 @@ EXPORT_SYMBOL_GPL(ntb_transport_qp_num);
unsigned int ntb_transport_max_size(struct ntb_transport_qp *qp)
{
unsigned int max;
+ unsigned int copy_align;

if (!qp)
return 0;

- if (!qp->dma_chan)
+ if (!qp->tx_dma_chan && !qp->rx_dma_chan)
return qp->tx_max_frame - sizeof(struct ntb_payload_header);

+ copy_align = max(qp->tx_dma_chan->device->copy_align,
+ qp->rx_dma_chan->device->copy_align);
+
/* If DMA engine usage is possible, try to find the max size for that */
max = qp->tx_max_frame - sizeof(struct ntb_payload_header);
- max -= max % (1 << qp->dma_chan->device->copy_align);
+ max -= max % (1 << copy_align);

return max;
}
--
2.4.0.rc0.43.gcf8a8c6

Allen Hubbe

unread,
Jul 6, 2015, 11:31:25 AM7/6/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang, Allen Hubbe
Schedule to receive on QP link up, to make sure that the doorbell is
properly cleared for interrupts.

Signed-off-by: Allen Hubbe <Allen...@emc.com>
---
drivers/ntb/ntb_transport.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 73bb58bbcaec..4bba3c4a4a3a 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -902,6 +902,8 @@ static void ntb_qp_link_work(struct work_struct *work)

if (qp->event_handler)
qp->event_handler(qp->cb_data, qp->link_is_up);
+
+ tasklet_schedule(&qp->rxc_db_work);
} else if (nt->link_is_up)
schedule_delayed_work(&qp->link_work,
msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
--
2.4.0.rc0.43.gcf8a8c6

Allen Hubbe

unread,
Jul 6, 2015, 11:31:27 AM7/6/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

Adding PCI Device IDs for B2B (back to back), RP (root port, primary),
and TB (transparent bridge, secondary) devices.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/ntb/hw/intel/ntb_hw_intel.c | 15 +++++++++++++++
drivers/ntb/hw/intel/ntb_hw_intel.h | 3 +++
2 files changed, 18 insertions(+)

diff --git a/drivers/ntb/hw/intel/ntb_hw_intel.c b/drivers/ntb/hw/intel/ntb_hw_intel.c
index 87751cfd6f4f..c2bc56b67e63 100644
--- a/drivers/ntb/hw/intel/ntb_hw_intel.c
+++ b/drivers/ntb/hw/intel/ntb_hw_intel.c
@@ -190,14 +190,17 @@ static inline int pdev_is_xeon(struct pci_dev *pdev)
case PCI_DEVICE_ID_INTEL_NTB_SS_SNB:
case PCI_DEVICE_ID_INTEL_NTB_SS_IVT:
case PCI_DEVICE_ID_INTEL_NTB_SS_HSX:
+ case PCI_DEVICE_ID_INTEL_NTB_SS_BDX:
case PCI_DEVICE_ID_INTEL_NTB_PS_JSF:
case PCI_DEVICE_ID_INTEL_NTB_PS_SNB:
case PCI_DEVICE_ID_INTEL_NTB_PS_IVT:
case PCI_DEVICE_ID_INTEL_NTB_PS_HSX:
+ case PCI_DEVICE_ID_INTEL_NTB_PS_BDX:
case PCI_DEVICE_ID_INTEL_NTB_B2B_JSF:
case PCI_DEVICE_ID_INTEL_NTB_B2B_SNB:
case PCI_DEVICE_ID_INTEL_NTB_B2B_IVT:
case PCI_DEVICE_ID_INTEL_NTB_B2B_HSX:
+ case PCI_DEVICE_ID_INTEL_NTB_B2B_BDX:
return 1;
}
return 0;
@@ -1843,6 +1846,9 @@ static int xeon_init_dev(struct intel_ntb_dev *ndev)
case PCI_DEVICE_ID_INTEL_NTB_SS_HSX:
case PCI_DEVICE_ID_INTEL_NTB_PS_HSX:
case PCI_DEVICE_ID_INTEL_NTB_B2B_HSX:
+ case PCI_DEVICE_ID_INTEL_NTB_SS_BDX:
+ case PCI_DEVICE_ID_INTEL_NTB_PS_BDX:
+ case PCI_DEVICE_ID_INTEL_NTB_B2B_BDX:
ndev->hwerr_flags |= NTB_HWERR_SDOORBELL_LOCKUP;
break;
}
@@ -1857,6 +1863,9 @@ static int xeon_init_dev(struct intel_ntb_dev *ndev)
case PCI_DEVICE_ID_INTEL_NTB_SS_HSX:
case PCI_DEVICE_ID_INTEL_NTB_PS_HSX:
case PCI_DEVICE_ID_INTEL_NTB_B2B_HSX:
+ case PCI_DEVICE_ID_INTEL_NTB_SS_BDX:
+ case PCI_DEVICE_ID_INTEL_NTB_PS_BDX:
+ case PCI_DEVICE_ID_INTEL_NTB_B2B_BDX:
ndev->hwerr_flags |= NTB_HWERR_SB01BASE_LOCKUP;
break;
}
@@ -1878,6 +1887,9 @@ static int xeon_init_dev(struct intel_ntb_dev *ndev)
case PCI_DEVICE_ID_INTEL_NTB_SS_HSX:
case PCI_DEVICE_ID_INTEL_NTB_PS_HSX:
case PCI_DEVICE_ID_INTEL_NTB_B2B_HSX:
+ case PCI_DEVICE_ID_INTEL_NTB_SS_BDX:
+ case PCI_DEVICE_ID_INTEL_NTB_PS_BDX:
+ case PCI_DEVICE_ID_INTEL_NTB_B2B_BDX:
ndev->hwerr_flags |= NTB_HWERR_B2BDOORBELL_BIT14;
break;
}
@@ -2234,14 +2246,17 @@ static const struct pci_device_id intel_ntb_pci_tbl[] = {
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_SNB)},
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_IVT)},
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_HSX)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_BDX)},
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_PS_JSF)},
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_PS_SNB)},
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_PS_IVT)},
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_PS_HSX)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_PS_BDX)},
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_SS_JSF)},
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_SS_SNB)},
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_SS_IVT)},
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_SS_HSX)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_SS_BDX)},
{0}
};
MODULE_DEVICE_TABLE(pci, intel_ntb_pci_tbl);
diff --git a/drivers/ntb/hw/intel/ntb_hw_intel.h b/drivers/ntb/hw/intel/ntb_hw_intel.h
index 7ddaf387b679..ea0612f797df 100644
--- a/drivers/ntb/hw/intel/ntb_hw_intel.h
+++ b/drivers/ntb/hw/intel/ntb_hw_intel.h
@@ -67,6 +67,9 @@
#define PCI_DEVICE_ID_INTEL_NTB_PS_HSX 0x2F0E
#define PCI_DEVICE_ID_INTEL_NTB_SS_HSX 0x2F0F
#define PCI_DEVICE_ID_INTEL_NTB_B2B_BWD 0x0C4E
+#define PCI_DEVICE_ID_INTEL_NTB_B2B_BDX 0x6F0D
+#define PCI_DEVICE_ID_INTEL_NTB_PS_BDX 0x6F0E
+#define PCI_DEVICE_ID_INTEL_NTB_SS_BDX 0x6F0F

/* Intel Xeon hardware */

--
2.4.0.rc0.43.gcf8a8c6

Allen Hubbe

unread,
Jul 6, 2015, 11:31:27 AM7/6/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

The list should be added from the bottom and not the top in order to
ensure the transport is provided in the same order to clients as ntb
devices are discovered.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/ntb/ntb_transport.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 4bba3c4a4a3a..f92239141674 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -297,7 +297,7 @@ static LIST_HEAD(ntb_transport_list);

static int ntb_bus_init(struct ntb_transport_ctx *nt)
{
- list_add(&nt->entry, &ntb_transport_list);
+ list_add_tail(&nt->entry, &ntb_transport_list);
return 0;
}

--
2.4.0.rc0.43.gcf8a8c6

Allen Hubbe

unread,
Jul 6, 2015, 11:31:28 AM7/6/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

Make QP stats info more readable for debugging purposes. Also add an
entry to indicate whether DMA is being used.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/ntb/ntb_transport.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index f92239141674..98ecf6d50621 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -452,7 +452,7 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,

out_offset = 0;
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "NTB QP stats\n");
+ "\nNTB QP stats:\n\n");
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"rx_bytes - \t%llu\n", qp->rx_bytes);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
@@ -470,11 +470,11 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"rx_err_ver - \t%llu\n", qp->rx_err_ver);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "rx_buff - \t%p\n", qp->rx_buff);
+ "rx_buff - \t0x%p\n", qp->rx_buff);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"rx_index - \t%u\n", qp->rx_index);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "rx_max_entry - \t%u\n", qp->rx_max_entry);
+ "rx_max_entry - \t%u\n\n", qp->rx_max_entry);

out_offset += snprintf(buf + out_offset, out_count - out_offset,
"tx_bytes - \t%llu\n", qp->tx_bytes);
@@ -489,21 +489,28 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "tx_mw - \t%p\n", qp->tx_mw);
+ "tx_mw - \t0x%p\n", qp->tx_mw);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "tx_index - \t%u\n", qp->tx_index);
+ "tx_index (H) - \t%u\n", qp->tx_index);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "tx_max_entry - \t%u\n", qp->tx_max_entry);
- out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "qp->remote_rx_info->entry - \t%u\n",
+ "RRI (T) - \t%u\n",
qp->remote_rx_info->entry);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "tx_max_entry - \t%u\n", qp->tx_max_entry);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
"free tx - \t%u\n",
ntb_transport_tx_free_entry(qp));

out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "\nQP Link %s\n",
+ "\n");
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "Using DMA - \t%s\n", use_dma ? "Yes" : "No");
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "QP Link - \t%s\n",
qp->link_is_up ? "Up" : "Down");
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "\n");
+
if (out_offset > out_count)
out_offset = out_count;

--
2.4.0.rc0.43.gcf8a8c6

Allen Hubbe

unread,
Jul 6, 2015, 11:31:30 AM7/6/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang, Allen Hubbe
The dma_sync_wait can hurt the performance of workloads mixed with both
large and small frames. Large frames will be copied using the dma
engine. Small frames will be copied by the cpu. The dma_sync_wait
prevents the cpu and dma engine copying in parallel.

In the period where the cpu is copying, the dma engine is stopped. The
dma engine is not doing any useful work to copy large frames during that
time, and the additional time to restart the dma engine for the next
large frame. This will decrease the throughput for the portion of a
workload with large frames.

In the period where the dma engine is copying, the cpu is held up
waiting for dma to complete. The small frames processing will be
delayed until the dma is complete. The RX frames are completed
in-order, and the processing of small frames takes very little time, so
dma_sync_wait may have an insignificant impact on the respose time of
frames. The more significant impact is to the system, because the delay
in dma_sync_wait is implemented as busy non-blocking wait. This can
prevent the delayed core from doing any useful work, even if it could be
processing work for other drivers, unrelated to transport RX processing.

After applying the earlier patch to fix out-of-order RX acknoledgement,
the dma_sync_wait is no longer necessary. Remove it, so that cpu memcpy
will proceed immediately for small frames, in parallel with ongoing dma
for large frames. Do not hold up the cpu from doing work while dma is
in progress. The prior fix will continue to ensure in-order completion
of the RX frames to the upper layer, and in-order delivery of the RX
acknoledgement.

Signed-off-by: Allen Hubbe <Allen...@emc.com>
---
drivers/ntb/ntb_transport.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 98ecf6d50621..dd4e3ab824aa 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -1230,18 +1230,18 @@ static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset)
goto err;

if (len < copy_bytes)
- goto err_wait;
+ goto err;

device = chan->device;
pay_off = (size_t)offset & ~PAGE_MASK;
buff_off = (size_t)buf & ~PAGE_MASK;

if (!is_dma_copy_aligned(device, pay_off, buff_off, len))
- goto err_wait;
+ goto err;

unmap = dmaengine_get_unmap_data(device->dev, 2, GFP_NOWAIT);
if (!unmap)
- goto err_wait;
+ goto err;

unmap->len = len;
unmap->addr[0] = dma_map_page(device->dev, virt_to_page(offset),
@@ -1284,12 +1284,6 @@ err_set_unmap:
dmaengine_unmap_put(unmap);
err_get_unmap:
dmaengine_unmap_put(unmap);
-err_wait:
- /* If the callbacks come out of order, the writing of the index to the
- * last completed will be out of order. This may result in the
- * receive stalling forever.
- */
- dma_sync_wait(chan, qp->last_cookie);
err:
ntb_memcpy_rx(entry, offset);
qp->rx_memcpy++;
--
2.4.0.rc0.43.gcf8a8c6

Jon Mason

unread,
Jul 12, 2015, 2:28:37 PM7/12/15
to Allen Hubbe, linu...@googlegroups.com, Dave Jiang
Do you really see these being used outside of development for
finetuning performance? If not, the let's remove these and just mod
them by hand when perf tuning. There is no use adding knobs that no
one in the field will ever use (assuming I am correct, and no one will
ever change these from the defaults).

Aside that that, it looks fine.
> --
> You received this message because you are subscribed to the Google Groups "linux-ntb" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to linux-ntb+...@googlegroups.com.
> To post to this group, send email to linu...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/linux-ntb/0bccd8f9ce04c4d21f4c78b5ff4350df8831d284.1436196351.git.Allen.Hubbe%40emc.com.
> For more options, visit https://groups.google.com/d/optout.

Jon Mason

unread,
Jul 12, 2015, 2:31:30 PM7/12/15
to Allen Hubbe, linu...@googlegroups.com, Dave Jiang
On Mon, Jul 6, 2015 at 11:30 AM, Allen Hubbe <Allen...@emc.com> wrote:
> This series includes the following changes to fix bugs, add new hardware
> support, and improve usability and performance of the ntb drivers.
>
> Bug Fixes (issues preexisting the new ntb api):
> NTB: Fix ntb_transport out-of-order RX update
> NTB: Schedule to receive on QP link up
> NTB: Fix zero size or integer overflow in ntb_set_mw
> NTB: ntb_netdev not covering all receive errors
> NTB: Fix oops in debugfs when transport is half-up
>
> Bug Fixes (regressions since the new ntb api):
> NTB: Fix transport stats for multiple devices
>
> Additions and Performance Improvements:
> NTB: Remove dma_sync_wait from ntb_async_rx
> NTB: Add PCI Device IDs for Broadwell Xeon
> NTB: Add flow control to the ntb_netdev
> NTB: Make the transport list in order of discovery
> NTB: Clean up QP stats info
> NTB: Use unique DMA channels for TX and RX
>

Ignoring the outstanding quesiton on patch 2, these patches look acceptable.

For clarity, patches 1, 2, and 4 are bug fixes and can be sent to
Linus now. Patches 3, 5, 6, 9 and 10 look to be features, and should
hit the next merge window. How would you like patches 7, 8, and/or 12
handled?

Thanks,
Jon

Hubbe, Allen

unread,
Jul 13, 2015, 10:21:09 AM7/13/15
to Jon Mason, linu...@googlegroups.com, Dave Jiang
From: Jon Mason
> > +static unsigned int tx_time = 1;
> > +module_param(tx_time, uint, 0644);
> > +MODULE_PARM_DESC(tx_time, "Time in usecs for tx resource reaper");
> > +
> > +static unsigned int tx_start = 10;
> > +module_param(tx_start, uint, 0644);
> > +MODULE_PARM_DESC(tx_start, "Number of descriptors to free before
> resume tx");
> > +
> > +static unsigned int tx_stop = 5;
> > +module_param(tx_stop, uint, 0644);
> > +MODULE_PARM_DESC(tx_stop, "Number of descriptors before stop upper
> layer tx");
>
> Do you really see these being used outside of development for
> finetuning performance? If not, the let's remove these and just mod
> them by hand when perf tuning. There is no use adding knobs that no
> one in the field will ever use (assuming I am correct, and no one will
> ever change these from the defaults).

I think you are right that the parameters will never be changed away from their defaults, and it would be easy enough to add the parameters back if it is requested. I would be ok with removing them, contrary to my request to add them.

I asked to add the parameters, thinking that a different tuning might be desirable depending on the workload and link characteristics. A high frame rate will recoup credits faster than a slow frame rate. The frame rate depends on the frame size (eg, transactional vs throughput workload), and the speed of the link (eg, PCIe generation and width). I think we could observe an order of magnitude difference in frame rate with different workloads on different links.

To determine the default setting, I suggested that we run a small frame workload, and tune for the longest delay that does not affect response time, and then validate the setting by running a large frame workload. If the extra timer expirations don't impact cpu utilization too much with the large frame workload, then the parameters might be entirely unnecessary. I should have also requested to cut the link width in half for the large frame workload, so we could observe a greater effect.

Hubbe, Allen

unread,
Jul 13, 2015, 11:09:00 AM7/13/15
to Jon Mason, linu...@googlegroups.com, Dave Jiang
I rearranged patches in github into bug fixes and features.

Bug fixes are in ntb-abh.
https://github.com/allenbh/linux/commits/ntb-abh

Features are in ntb-abh-next, after the bug fixes, and the patch to add the email list to MAINTAINERS.
https://github.com/allenbh/linux/commits/ntb-abh-next

> Ignoring the outstanding quesiton on patch 2, these patches look
> acceptable.
>
> For clarity, patches 1, 2, and 4 are bug fixes and can be sent to

Patch 2 is _not_ a bug fix.

> Linus now. Patches 3, 5, 6, 9 and 10 look to be features, and should

Patch 3 fixes a regression - affecting debugfs users only.

Patch 5 is a bug fix.

Patch 6 is perhaps a bug fix, though we have not seen any breakage. I suspected this could be the reason for a link not coming up, but there was a different cause for that issue.

Patch 9 is a feature.

Patch 10 is a feature.

> hit the next merge window. How would you like patches 7, 8, and/or 12
> handled?

Patch 7 is a feature.

Patch 8 is a feature; it could be argued a bug fix because it fixes a usability issue.

Patch 12 is a bug fix, and it fixes an issue that we have observed in the wild. The bug is not a regression.

Dave Jiang

unread,
Jul 13, 2015, 12:12:42 PM7/13/15
to Allen...@emc.com, linu...@googlegroups.com, jdm...@kudzu.us
Right now if we push the NTB really hard, we start dropping packets due
to not able to process the packets fast enough. We need to stop the
upper layer from flooding us when that happens.

A timer is necessary in order to restart the queue once the resource has
been processed on the receive side. Due to the way NTB is setup, the
resources on the tx side are tied to the processing of the rx side and
there's no async way to know when the rx side has released those
resources.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/net/ntb_netdev.c | 77 +++++++++++++++++++++++++++++++++++++++++
drivers/ntb/ntb_transport.c | 18 +++++++++-
include/linux/ntb_transport.h | 1 +
3 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
index d8757bf..a9acf71 100644
--- a/drivers/net/ntb_netdev.c
+++ b/drivers/net/ntb_netdev.c
@@ -61,11 +61,21 @@ MODULE_VERSION(NTB_NETDEV_VER);
MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("Intel Corporation");

+/* Time in usecs for tx resource reaper */
+static unsigned int tx_time = 1;
+
+/* Number of descriptors to free before resuming tx */
+static unsigned int tx_start = 10;
+
+/* Number of descriptors still available before stop upper layer tx */
+static unsigned int tx_stop = 5;
+
struct ntb_netdev {
struct list_head list;
struct pci_dev *pdev;
struct net_device *ndev;
struct ntb_transport_qp *qp;
+ struct timer_list tx_timer;
};

#define NTB_TX_TIMEOUT_MS 1000
@@ -136,11 +146,42 @@ enqueue_again:
@@ -155,6 +196,15 @@ static void ntb_netdev_tx_handler(struct ntb_transport_qp *qp, void *qp_data,
}

dev_kfree_skb(skb);
+
+ if (ntb_transport_tx_free_entry(dev->qp) >= tx_start) {
+ /* Make sure anybody stopping the queue after this sees the new
+ * value of ntb_transport_tx_free_entry()
+ */
+ smp_mb();
+ if (netif_queue_stopped(ndev))
+ netif_wake_queue(ndev);
+ }
}

static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
@@ -163,10 +213,15 @@ static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
struct ntb_netdev *dev = netdev_priv(ndev);
int rc;

+ ntb_netdev_maybe_stop_tx(ndev, dev->qp, tx_stop);
+
rc = ntb_transport_tx_enqueue(dev->qp, skb, skb->data, skb->len);
if (rc)
goto err;

+ /* check for next submit */
+ ntb_netdev_maybe_stop_tx(ndev, dev->qp, tx_stop);
+
return NETDEV_TX_OK;

err:
@@ -175,6 +230,23 @@ err:
return NETDEV_TX_BUSY;
}

+static void ntb_netdev_tx_timer(unsigned long data)
+{
+ struct net_device *ndev = (struct net_device *)data;
+ struct ntb_netdev *dev = netdev_priv(ndev);
+
+ if (ntb_transport_tx_free_entry(dev->qp) < tx_stop) {
+ mod_timer(&dev->tx_timer, jiffies + msecs_to_jiffies(tx_time));
+ } else {
+ /* Make sure anybody stopping the queue after this sees the new
+ * value of ntb_transport_tx_free_entry()
+ */
+ smp_mb();
+ if (netif_queue_stopped(ndev))
+ netif_wake_queue(ndev);
+ }
+}
+
static int ntb_netdev_open(struct net_device *ndev)
{
struct ntb_netdev *dev = netdev_priv(ndev);
@@ -197,8 +269,11 @@ static int ntb_netdev_open(struct net_device *ndev)
}
}

+ setup_timer(&dev->tx_timer, ntb_netdev_tx_timer, (unsigned long)ndev);
+
netif_carrier_off(ndev);
ntb_transport_link_up(dev->qp);
+ netif_start_queue(ndev);

return 0;

@@ -219,6 +294,8 @@ static int ntb_netdev_close(struct net_device *ndev)
while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
dev_kfree_skb(skb);

+ del_timer_sync(&dev->tx_timer);
+
return 0;
}

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 3363a89..8c9d50d5 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -494,6 +494,12 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
"tx_index - \t%u\n", qp->tx_index);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"tx_max_entry - \t%u\n", qp->tx_max_entry);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "qp->remote_rx_info->entry - \t%u\n",
+ qp->remote_rx_info->entry);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "free tx - \t%u\n",
+ ntb_transport_tx_free_entry(qp));

out_offset += snprintf(buf + out_offset, out_count - out_offset,
"\nQP Link %s\n",
@@ -535,6 +541,7 @@ static struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock,
}
entry = list_first_entry(list, struct ntb_queue_entry, entry);
list_del(&entry->entry);
+
out:
spin_unlock_irqrestore(lock, flags);

@@ -1843,7 +1850,7 @@ int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q);
if (!entry) {
qp->tx_err_no_buf++;
- return -ENOMEM;
+ return -EBUSY;
}

entry->cb_data = cb;
@@ -1969,6 +1976,15 @@ unsigned int ntb_transport_max_size(struct ntb_transport_qp *qp)
}
EXPORT_SYMBOL_GPL(ntb_transport_max_size);

+unsigned int ntb_transport_tx_free_entry(struct ntb_transport_qp *qp)
+{
+ unsigned int head = qp->tx_index;
+ unsigned int tail = qp->remote_rx_info->entry;
+
+ return tail > head ? tail - head : qp->tx_max_entry + tail - head;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_tx_free_entry);
+
static void ntb_transport_doorbell_callback(void *data, int vector)
{
struct ntb_transport_ctx *nt = data;
diff --git a/include/linux/ntb_transport.h b/include/linux/ntb_transport.h
index 2862861..7243eb9 100644

Allen Hubbe

unread,
Jul 13, 2015, 1:07:33 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

Currently the debugfs does not have files for all NTB transport queue
pairs. When there are multiple NTBs present in a system, the QP names
of the last transport clobber the names of previously added transport
QPs. Only the last added QPs can be observed via debugfs.

Create a directory per NTB transport to associate the QPs with that
transport. Name the directory the same as the PCI device.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/ntb/ntb_transport.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 98e58c765f2e..25e973ff64cf 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -212,6 +212,8 @@ struct ntb_transport_ctx {
bool link_is_up;
struct delayed_work link_work;
struct work_struct link_cleanup;
+
+ struct dentry *debugfs_node_dir;
};

enum {
@@ -945,12 +947,12 @@ static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
qp->tx_max_frame = min(transport_mtu, tx_size / 2);
qp->tx_max_entry = tx_size / qp->tx_max_frame;

- if (nt_debugfs_dir) {
+ if (nt->debugfs_node_dir) {
char debugfs_name[4];

snprintf(debugfs_name, 4, "qp%d", qp_num);
qp->debugfs_dir = debugfs_create_dir(debugfs_name,
- nt_debugfs_dir);
+ nt->debugfs_node_dir);

qp->debugfs_stats = debugfs_create_file("stats", S_IRUSR,
qp->debugfs_dir, qp,
@@ -1053,6 +1055,12 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
goto err2;
}

+ if (nt_debugfs_dir) {
+ nt->debugfs_node_dir =
+ debugfs_create_dir(pci_name(ndev->pdev),
+ nt_debugfs_dir);
+ }
+
for (i = 0; i < qp_count; i++) {
rc = ntb_transport_init_queue(nt, i);
if (rc)
--
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:34 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

ntb_netdev is allowing the link to come up even when -ENOMEM is returned
from ntb_transport_rx_enqueue. Fix to cover all possible errors.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/net/ntb_netdev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
index 5f1ee7c05f68..d8757bf9ad75 100644
--- a/drivers/net/ntb_netdev.c
+++ b/drivers/net/ntb_netdev.c
@@ -191,7 +191,7 @@ static int ntb_netdev_open(struct net_device *ndev)

rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
ndev->mtu + ETH_HLEN);
- if (rc == -EINVAL) {
+ if (rc) {
dev_kfree_skb(skb);
goto err;
}
--
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:35 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang, Allen Hubbe
Remove early dereference of a pointer that is checked later in the code.

Reported-by: Dan Carpenter <dan.ca...@oracle.com>
Signed-off-by: Allen Hubbe <Allen...@emc.com>
---
drivers/ntb/ntb_transport.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 04e6c345419c..3363a8968f9d 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -1692,7 +1692,6 @@ EXPORT_SYMBOL_GPL(ntb_transport_create_queue);
*/
void ntb_transport_free_queue(struct ntb_transport_qp *qp)
{
- struct ntb_transport_ctx *nt = qp->transport;
struct pci_dev *pdev;
struct ntb_queue_entry *entry;
u64 qp_bit;
@@ -1745,7 +1744,7 @@ void ntb_transport_free_queue(struct ntb_transport_qp *qp)
while ((entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q)))
kfree(entry);

- nt->qp_bitmap_free |= qp_bit;
+ qp->transport->qp_bitmap_free |= qp_bit;

dev_info(&pdev->dev, "NTB Transport QP %d freed\n", qp->qp_num);
}
--
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:39 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

Allocate two DMA channels, one for TX operation and one for RX
operation, instead of having one DMA channel for everything. This
provides slightly better performance, and also will make error handling
cleaner later on.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/ntb/ntb_transport.c | 77 ++++++++++++++++++++++++++++++++++-----------
1 file changed, 58 insertions(+), 19 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 1ddf84ca721a..c64de9b99742 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -119,7 +119,8 @@ struct ntb_transport_qp {
struct ntb_transport_ctx *transport;
struct ntb_dev *ndev;
void *cb_data;
- struct dma_chan *dma_chan;
+ struct dma_chan *tx_dma_chan;
+ struct dma_chan *rx_dma_chan;

bool client_ready;
bool link_is_up;
@@ -504,7 +505,11 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"\n");
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "Using DMA - \t%s\n", use_dma ? "Yes" : "No");
+ "Using TX DMA - \t%s\n",
+ qp->tx_dma_chan ? "Yes" : "No");
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "Using RX DMA - \t%s\n",
+ qp->rx_dma_chan ? "Yes" : "No");
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"QP Link - \t%s\n",
qp->link_is_up ? "Up" : "Down");
@@ -1220,7 +1225,7 @@ static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset)
{
struct dma_async_tx_descriptor *txd;
struct ntb_transport_qp *qp = entry->qp;
- struct dma_chan *chan = qp->dma_chan;
+ struct dma_chan *chan = qp->rx_dma_chan;
struct dma_device *device;
size_t pay_off, buff_off, len;
struct dmaengine_unmap_data *unmap;
@@ -1381,8 +1386,8 @@ static void ntb_transport_rxc_db(unsigned long data)
break;
}

- if (i && qp->dma_chan)
- dma_async_issue_pending(qp->dma_chan);
+ if (i && qp->rx_dma_chan)
+ dma_async_issue_pending(qp->rx_dma_chan);

if (i == qp->rx_max_entry) {
/* there is more work to do */
@@ -1449,7 +1454,7 @@ static void ntb_async_tx(struct ntb_transport_qp *qp,
{
struct ntb_payload_header __iomem *hdr;
struct dma_async_tx_descriptor *txd;
- struct dma_chan *chan = qp->dma_chan;
+ struct dma_chan *chan = qp->tx_dma_chan;
struct dma_device *device;
size_t dest_off, buff_off;
struct dmaengine_unmap_data *unmap;
@@ -1642,14 +1647,27 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
@@ -1684,8 +1702,10 @@ err2:
err1:
while ((entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_free_q)))
kfree(entry);
- if (qp->dma_chan)
- dma_release_channel(qp->dma_chan);
+ if (qp->tx_dma_chan)
+ dma_release_channel(qp->tx_dma_chan);
+ if (qp->rx_dma_chan)
+ dma_release_channel(qp->rx_dma_chan);
nt->qp_bitmap_free |= qp_bit;
err:
return NULL;
@@ -1709,12 +1729,27 @@ void ntb_transport_free_queue(struct ntb_transport_qp *qp)
@@ -1962,16 +1997,20 @@ EXPORT_SYMBOL_GPL(ntb_transport_qp_num);
unsigned int ntb_transport_max_size(struct ntb_transport_qp *qp)
{
unsigned int max;
+ unsigned int copy_align;

if (!qp)
return 0;

- if (!qp->dma_chan)
+ if (!qp->tx_dma_chan && !qp->rx_dma_chan)
return qp->tx_max_frame - sizeof(struct ntb_payload_header);

+ copy_align = max(qp->tx_dma_chan->device->copy_align,
+ qp->rx_dma_chan->device->copy_align);
+
/* If DMA engine usage is possible, try to find the max size for that */
max = qp->tx_max_frame - sizeof(struct ntb_payload_header);
- max -= max % (1 << qp->dma_chan->device->copy_align);
+ max -= max % (1 << copy_align);

return max;
}
--
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:39 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

Right now if we push the NTB really hard, we start dropping packets due
to not able to process the packets fast enough. We need to st:qop the
upper layer from flooding us when that happens.

A timer is necessary in order to restart the queue once the resource has
been processed on the receive side. Due to the way NTB is setup, the
resources on the tx side are tied to the processing of the rx side and
there's no async way to know when the rx side has released those
resources.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/net/ntb_netdev.c | 77 +++++++++++++++++++++++++++++++++++++++++++
drivers/ntb/ntb_transport.c | 18 +++++++++-
include/linux/ntb_transport.h | 1 +
3 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
index d8757bf9ad75..a9acf7156855 100644
--- a/drivers/net/ntb_netdev.c
+++ b/drivers/net/ntb_netdev.c
static int ntb_netdev_open(struct net_device *ndev)
{
struct ntb_netdev *dev = netdev_priv(ndev);
@@ -197,8 +269,11 @@ static int ntb_netdev_open(struct net_device *ndev)
}
}

+ setup_timer(&dev->tx_timer, ntb_netdev_tx_timer, (unsigned long)ndev);
+
netif_carrier_off(ndev);
ntb_transport_link_up(dev->qp);
+ netif_start_queue(ndev);

return 0;

@@ -219,6 +294,8 @@ static int ntb_netdev_close(struct net_device *ndev)
while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
dev_kfree_skb(skb);

+ del_timer_sync(&dev->tx_timer);
+
return 0;
}

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 3363a8968f9d..8c9d50d50e92 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -494,6 +494,12 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
"tx_index - \t%u\n", qp->tx_index);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"tx_max_entry - \t%u\n", qp->tx_max_entry);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "qp->remote_rx_info->entry - \t%u\n",
+ qp->remote_rx_info->entry);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "free tx - \t%u\n",
+ ntb_transport_tx_free_entry(qp));

out_offset += snprintf(buf + out_offset, out_count - out_offset,
index 2862861366a5..7243eb98a722 100644
--- a/include/linux/ntb_transport.h
+++ b/include/linux/ntb_transport.h
@@ -83,3 +83,4 @@ void *ntb_transport_rx_remove(struct ntb_transport_qp *qp, unsigned int *len);
void ntb_transport_link_up(struct ntb_transport_qp *qp);
void ntb_transport_link_down(struct ntb_transport_qp *qp);
bool ntb_transport_link_query(struct ntb_transport_qp *qp);
+unsigned int ntb_transport_tx_free_entry(struct ntb_transport_qp *qp);
--
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:40 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

Adding PCI Device IDs for B2B (back to back), RP (root port, primary),
and TB (transparent bridge, secondary) devices.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:40 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Jon Mason <jdm...@kudzu.us>

Add the new NTB mailing list to MAINTAINERS

Signed-off-by: Jon Mason <jdm...@kudzu.us>
---
MAINTAINERS | 2 ++
1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8133cefb6b6e..6a634882b420 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7223,6 +7223,7 @@ NTB DRIVER CORE
M: Jon Mason <jdm...@kudzu.us>
M: Dave Jiang <dave....@intel.com>
M: Allen Hubbe <Allen...@emc.com>
+L: linu...@googlegroups.com
S: Supported
W: https://github.com/jonmason/ntb/wiki
T: git git://github.com/jonmason/ntb.git
@@ -7234,6 +7235,7 @@ F: include/linux/ntb_transport.h
NTB INTEL DRIVER
M: Jon Mason <jdm...@kudzu.us>
M: Dave Jiang <dave....@intel.com>
+L: linu...@googlegroups.com
S: Supported
W: https://github.com/jonmason/ntb/wiki
T: git git://github.com/jonmason/ntb.git
--
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:41 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang, Allen Hubbe
It was possible for a synchronous update of the RX index in the error
case to get ahead of the asynchronous RX index update in the normal
case. Change the RX processing to preserve an RX completion order.

There were two error cases. First, if a buffer is not present to
receive data, there would be no queue entry to preserve the RX
completion order. Instead of dropping the RX frame, leave the RX frame
in the ring. Schedule RX processing when RX entries are enqueued, in
case there are RX frames waiting in the ring to be received.

Second, if a buffer is too small to receive data, drop the frame in the
ring, mark the RX entry as done, and indicate the error in the RX entry
length. Check for a negative length in the receive callback in
ntb_netdev, and count occurrences as rx_length_errors.

Signed-off-by: Allen Hubbe <Allen...@emc.com>
---
drivers/net/ntb_netdev.c | 7 ++
drivers/ntb/ntb_transport.c | 169 ++++++++++++++++++++++++++------------------
2 files changed, 107 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
index 3cc316cb7e6b..5f1ee7c05f68 100644
--- a/drivers/net/ntb_netdev.c
+++ b/drivers/net/ntb_netdev.c
@@ -102,6 +102,12 @@ static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp, void *qp_data,

netdev_dbg(ndev, "%s: %d byte payload received\n", __func__, len);

+ if (len < 0) {
+ ndev->stats.rx_errors++;
+ ndev->stats.rx_length_errors++;
+ goto enqueue_again;
+ }
+
skb_put(skb, len);
skb->protocol = eth_type_trans(skb, ndev);
skb->ip_summed = CHECKSUM_NONE;
@@ -121,6 +127,7 @@ static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp, void *qp_data,
return;
}

+enqueue_again:
rc = ntb_transport_rx_enqueue(qp, skb, skb->data, ndev->mtu + ETH_HLEN);
if (rc) {
dev_kfree_skb(skb);
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index efe3ad4122f2..98e58c765f2e 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
+static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset)
{
struct dma_async_tx_descriptor *txd;
struct ntb_transport_qp *qp = entry->qp;
struct dma_chan *chan = qp->dma_chan;
struct dma_device *device;
- size_t pay_off, buff_off;
+ size_t pay_off, buff_off, len;
struct dmaengine_unmap_data *unmap;
static void ntb_transport_rxc_db(unsigned long data)
@@ -1333,7 +1356,7 @@ static void ntb_transport_rxc_db(unsigned long data)
break;
}

- if (qp->dma_chan)
+ if (i && qp->dma_chan)
dma_async_issue_pending(qp->dma_chan);

if (i == qp->rx_max_entry) {
@@ -1609,7 +1632,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
goto err1;

entry->qp = qp;
- ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
+ ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
&qp->rx_free_q);
}

@@ -1634,7 +1657,7 @@ err2:
while ((entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q)))
kfree(entry);
err1:
- while ((entry = ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q)))
+ while ((entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_free_q)))
kfree(entry);
if (qp->dma_chan)
dma_release_channel(qp->dma_chan);
@@ -1689,11 +1712,16 @@ void ntb_transport_free_queue(struct ntb_transport_qp *qp)
qp->tx_handler = NULL;
qp->event_handler = NULL;

- while ((entry = ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q)))
+ while ((entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_free_q)))
kfree(entry);
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:42 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang, Allen Hubbe
Schedule to receive on QP link up, to make sure that the doorbell is
properly cleared for interrupts.

Signed-off-by: Allen Hubbe <Allen...@emc.com>
---
drivers/ntb/ntb_transport.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index a049f96fab8d..b82171e3e07d 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -895,6 +895,8 @@ static void ntb_qp_link_work(struct work_struct *work)

if (qp->event_handler)
qp->event_handler(qp->cb_data, qp->link_is_up);
+
+ tasklet_schedule(&qp->rxc_db_work);
} else if (nt->link_is_up)
schedule_delayed_work(&qp->link_work,
msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
--
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:42 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang, Allen Hubbe
A plain 32 bit integer will overflow for values over 4GiB.

Change the plain integer size to the appropriate size type in
ntb_set_mw. Change the type of the size parameter and two local
variables used for size.

Even if there is no overflow, a size of zero is invalid here.

Reported-by: Juyoung Jung <jj...@micron.com>
Signed-off-by: Allen Hubbe <Allen...@emc.com>
---
drivers/ntb/ntb_transport.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index b82171e3e07d..04e6c345419c 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -629,13 +629,16 @@ static void ntb_free_mw(struct ntb_transport_ctx *nt, int num_mw)
}

static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
- unsigned int size)
+ resource_size_t size)
{
struct ntb_transport_mw *mw = &nt->mw_vec[num_mw];
struct pci_dev *pdev = nt->ndev->pdev;
- unsigned int xlat_size, buff_size;
+ size_t xlat_size, buff_size;
int rc;

+ if (!size)
+ return -EINVAL;
+
xlat_size = round_up(size, mw->xlat_align_size);
buff_size = round_up(size, mw->xlat_align);

@@ -655,7 +658,7 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
if (!mw->virt_addr) {
mw->xlat_size = 0;
mw->buff_size = 0;
- dev_err(&pdev->dev, "Unable to alloc MW buff of size %d\n",
+ dev_err(&pdev->dev, "Unable to alloc MW buff of size %lu\n",
buff_size);
return -ENOMEM;
}
--
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:42 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

Make QP stats info more readable for debugging purposes. Also add an
entry to indicate whether DMA is being used.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/ntb/ntb_transport.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 55c26c59f4fe..c283d1c2cb72 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -452,7 +452,7 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,

out_offset = 0;
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "NTB QP stats\n");
+ "\nNTB QP stats:\n\n");
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"rx_bytes - \t%llu\n", qp->rx_bytes);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
@@ -470,11 +470,11 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"rx_err_ver - \t%llu\n", qp->rx_err_ver);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "rx_buff - \t%p\n", qp->rx_buff);
+ "rx_buff - \t0x%p\n", qp->rx_buff);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"rx_index - \t%u\n", qp->rx_index);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "rx_max_entry - \t%u\n", qp->rx_max_entry);
+ "rx_max_entry - \t%u\n\n", qp->rx_max_entry);

out_offset += snprintf(buf + out_offset, out_count - out_offset,
"tx_bytes - \t%llu\n", qp->tx_bytes);
@@ -489,21 +489,28 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "tx_mw - \t%p\n", qp->tx_mw);
+ "tx_mw - \t0x%p\n", qp->tx_mw);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "tx_index - \t%u\n", qp->tx_index);
+ "tx_index (H) - \t%u\n", qp->tx_index);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "tx_max_entry - \t%u\n", qp->tx_max_entry);
- out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "qp->remote_rx_info->entry - \t%u\n",
+ "RRI (T) - \t%u\n",
qp->remote_rx_info->entry);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "tx_max_entry - \t%u\n", qp->tx_max_entry);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
"free tx - \t%u\n",
ntb_transport_tx_free_entry(qp));

out_offset += snprintf(buf + out_offset, out_count - out_offset,
- "\nQP Link %s\n",
+ "\n");
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "Using DMA - \t%s\n", use_dma ? "Yes" : "No");
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "QP Link - \t%s\n",
qp->link_is_up ? "Up" : "Down");
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "\n");
+
if (out_offset > out_count)
out_offset = out_count;

--
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:43 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang, Allen Hubbe
I split the v1 series into two series. These patches are _not_ bug fixes.

Numbering in this v2 starts at 8. This is the second half of the patches.

Module parameters are removed from "Add flow control".

These are in github.com/allenbh/linux.git branch ntb/next.

Allen Hubbe (1):
NTB: Remove dma_sync_wait from ntb_async_rx

Dave Jiang (5):
NTB: Add flow control to the ntb_netdev
NTB: Add PCI Device IDs for Broadwell Xeon
NTB: Make the transport list in order of discovery
NTB: Clean up QP stats info
NTB: Use unique DMA channels for TX and RX

Jon Mason (1):
NTB: Add list to MAINTAINERS

MAINTAINERS | 2 +
drivers/net/ntb_netdev.c | 77 ++++++++++++++++++++++
drivers/ntb/hw/intel/ntb_hw_intel.c | 15 +++++
drivers/ntb/hw/intel/ntb_hw_intel.h | 3 +
drivers/ntb/ntb_transport.c | 126 ++++++++++++++++++++++++++----------
include/linux/ntb_transport.h | 1 +
6 files changed, 189 insertions(+), 35 deletions(-)

--
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:43 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

When the remote side is not up, we do not have all the context for the
transport, and that causes NULL ptr access. Have the debugfs reads check
to see if transport is up before we make access.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/ntb/ntb_transport.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 25e973ff64cf..a049f96fab8d 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -439,13 +439,17 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
char *buf;
ssize_t ret, out_offset, out_count;

+ qp = filp->private_data;
+
+ if (!qp || !qp->link_is_up)
+ return 0;
+
out_count = 1000;

buf = kmalloc(out_count, GFP_KERNEL);
if (!buf)
return -ENOMEM;

- qp = filp->private_data;
out_offset = 0;
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"NTB QP stats\n");
--
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:43 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang
From: Dave Jiang <dave....@intel.com>

The list should be added from the bottom and not the top in order to
ensure the transport is provided in the same order to clients as ntb
devices are discovered.

Signed-off-by: Dave Jiang <dave....@intel.com>
---
drivers/ntb/ntb_transport.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 8c9d50d50e92..55c26c59f4fe 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -297,7 +297,7 @@ static LIST_HEAD(ntb_transport_list);

static int ntb_bus_init(struct ntb_transport_ctx *nt)
{
- list_add(&nt->entry, &ntb_transport_list);
+ list_add_tail(&nt->entry, &ntb_transport_list);
return 0;
}

--
2.5.0.rc1

Allen Hubbe

unread,
Jul 13, 2015, 1:07:45 PM7/13/15
to linu...@googlegroups.com, Jon Mason, Dave Jiang, Allen Hubbe
Signed-off-by: Allen Hubbe <Allen...@emc.com>
---
drivers/ntb/ntb_transport.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index c283d1c2cb72..1ddf84ca721a 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -1233,18 +1233,18 @@ static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset)
goto err;

if (len < copy_bytes)
- goto err_wait;
+ goto err;

device = chan->device;
pay_off = (size_t)offset & ~PAGE_MASK;
buff_off = (size_t)buf & ~PAGE_MASK;

if (!is_dma_copy_aligned(device, pay_off, buff_off, len))
- goto err_wait;
+ goto err;

unmap = dmaengine_get_unmap_data(device->dev, 2, GFP_NOWAIT);
if (!unmap)
- goto err_wait;
+ goto err;

unmap->len = len;
unmap->addr[0] = dma_map_page(device->dev, virt_to_page(offset),
@@ -1287,12 +1287,6 @@ err_set_unmap:
dmaengine_unmap_put(unmap);
err_get_unmap:
dmaengine_unmap_put(unmap);
-err_wait:
- /* If the callbacks come out of order, the writing of the index to the
- * last completed will be out of order. This may result in the
- * receive stalling forever.
- */
- dma_sync_wait(chan, qp->last_cookie);
err:
ntb_memcpy_rx(entry, offset);
qp->rx_memcpy++;
--
2.5.0.rc1

Reply all
Reply to author
Forward
0 new messages