GIT b34a93300ab470162b6560d13fffa28b937d090a git+ssh://master.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git#for-mm

commit
Author: Steve Wise
Date:   Mon Nov 26 11:28:46 2007 -0600

    RDMA/cxgb3: Support version 5.0 firmware

    The 5.0 firmware now supports translating SGLs in recv work requests,
    so remove the host driver logic that currently does the translation.

    Note: this change requires 5.0 firmware.

    Signed-off-by: Steve Wise
    Signed-off-by: Roland Dreier

commit 06297d4118b4399467197a7f7cb266fc4d2faf1f
Author: Steve Wise
Date:   Mon Nov 26 11:28:44 2007 -0600

    RDMA/cxgb3: Hold rtnl_lock() around ethtool get_drvinfo call

    Currently the call into cxgb3 to get the driver info is not
    serialized.  The iw_cxgb3 module needs to hold the rtnl_lock around
    the ethtool ops call like dev_ioctl() does.

    Signed-off-by: Steve Wise
    Signed-off-by: Roland Dreier

commit d6695d42c60c3efaeaaa31c8638ce6fdd6814e7f
Author: Joe Perches
Date:   Mon Nov 19 17:48:11 2007 -0800

    drivers/infiniband: Add missing "space"

    Add missing spaces in the middle of format strings.

    Signed-off-by: Joe Perches
    Signed-off-by: Roland Dreier

commit 7bd0a9acc0e1fc1442e8dd536a4dff1311ccd7ec
Author: Matthias Kaehlcke
Date:   Thu Nov 15 15:23:25 2007 -0800

    IB/ipath: Convert ipath_eep_sem semaphore to a mutex

    Signed-off-by: Matthias Kaehlcke
    Acked-by: Michael Albaugh
    Tested-by: Arthur Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Roland Dreier

commit 24f54f4d1a9a5424432732dda698da9e3a550531
Author: Steve Welch
Date:   Tue Oct 23 15:06:10 2007 -0700

    IB/mad: Enable loopback of DR SMP responses from userspace

    The local loopback of an outgoing DR SMP response is limited to those
    that originate at the driver-specific SMA implementation during the
    driver-specific process_mad() function.  This patch enables a
    returning DR SMP originating in userspace (or elsewhere) to be
    delivered to the local management stack.  In this specific case the
    driver process_mad() function does not consume or process the MAD,
    so a response MAD has not been created and the original MAD must be
    copied manually to the MAD buffer that is to be handed off to the
    local agent.

    Signed-off-by: Steve Welch
    Acked-by: Hal Rosenstock
    Signed-off-by: Roland Dreier

commit 1f8956a788c57c6428898559f6f432a4de5588a0
Author: Ralph Campbell
Date:   Tue Oct 23 15:07:41 2007 -0700

    IB/ipath: Enable loopback of DR SMP responses from userspace

    This patch is in response to reviewing a patch to the core MAD
    processing which fixes loopback of directed route packets to/from
    user-level MAD agents.  This change enables the core code to work
    for ib_ipath by fixing the return code from the ipath process_mad
    method.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

commit 0e4017e919aa110d2033e9d573e3798857fd9755
Author: Ralph Campbell
Date:   Tue Oct 23 15:04:15 2007 -0700

    IB/mad: Remove redundant NULL pointer check in ib_mad_recv_done_handler()

    In ib_mad_recv_done_handler(), the response pointer is checked for
    NULL after allocating it.  It is then checked again in the local
    process_mad() path, but there is no possibility of it changing in
    between.

    Signed-off-by: Ralph Campbell
    Acked-by: Hal Rosenstock
    Signed-off-by: Roland Dreier

commit 7506c9306823ffc7ad17a985bd36d35ee9a86270
Author: Steve Wise
Date:   Mon Oct 29 11:34:05 2007 -0500

    RDMA/iwcm: Set initiator depth and responder resources to device max values

    Set the initiator depth and responder resources to the device max
    values for new connect request events in the iWARP connection
    manager.

    Signed-off-by: Steve Wise
    Signed-off-by: Roland Dreier
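As a minimal sketch of what the iwcm change amounts to (the real version
is the iw_conn_req_handler() hunk in cma.c below), assuming the
2.6.24-era verbs API (ib_query_device() filling a struct ib_device_attr);
the helper name fill_conn_limits() is made up for illustration:

	/*
	 * Hypothetical helper, not the actual patch: fill a connect-request
	 * event's RDMA read/atomic limits from the device capabilities
	 * instead of leaving them zero.
	 */
	static int fill_conn_limits(struct ib_device *device,
				    struct rdma_conn_param *param)
	{
		struct ib_device_attr attr;
		int ret;

		ret = ib_query_device(device, &attr);
		if (ret)
			return ret;	/* caller tears down the new cm_id */

		/* Advertise the device maxima rather than hard-coded values */
		param->initiator_depth     = attr.max_qp_init_rd_atom;
		param->responder_resources = attr.max_qp_rd_atom;
		return 0;
	}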
commit 8b0b808be541d363b15784ddebd60af28b7d0bb5
Author: Dave Olson
Date:   Tue Oct 9 22:10:35 2007 -0700

    IB/ipath: Improve interrupt handler cache footprint

    Improve interrupt handler cache footprint by noinline'ing error
    functions that are rarely called.

    Signed-off-by: Dave Olson
    Signed-off-by: Roland Dreier

commit 723fdb20a1538152aaf07f65dccaa08ef12a9fa4
Author: Pradeep Satyanarayana
Date:   Mon Dec 3 14:22:13 2007 -0800

    IPoIB/cm: Add connected mode support for devices without SRQs

    Some IB adapters (notably IBM's eHCA) do not implement SRQs (shared
    receive queues).  The current IPoIB connected mode support only works
    on devices that support SRQs.

    Fix this by adding support for using the receive queue of each
    connected mode receive QP.  The disadvantage of this compared to
    using an SRQ is that it means a full queue of receives must be posted
    for each remote connected mode peer, which means that total memory
    usage is potentially much higher than when using SRQs.  To manage
    this, add a new module parameter "max_nonsrq_conn_qp" that limits the
    number of connections allowed per interface.

    The rest of the changes are fairly straightforward: we use a table of
    struct ipoib_cm_rx to hold all the active connections, and put the
    table index of the connection in the high bits of receive WR IDs (a
    sketch of this encoding follows the change log below).  This is
    needed because we cannot rely on the struct ib_wc.qp field for
    non-SRQ receive completions.

    Most of the rest of the changes just test whether or not an SRQ is
    available, and post receives or find received packets in the right
    place depending on the answer.

    Cleaning up dead connections actually becomes simpler, because we do
    not have to do the "last WQE reached" dance that is required to
    destroy QPs attached to an SRQ.  We just move the QP to the error
    state and wait for all pending receives to be flushed.

    Signed-off-by: Pradeep Satyanarayana
    [ Completely rewritten and split up, based on Pradeep's work.  Several
      bugs fixed and no doubt several bugs introduced. - Roland ]
    Signed-off-by: Roland Dreier

commit eb3c492a1d67c728e321605cdf1bbafede4a5b05
Author: Roland Dreier
Date:   Mon Dec 3 14:22:13 2007 -0800

    IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()

    Factor out the code for going through the rx_reap list of struct
    ipoib_cm_rx and freeing each one.  This consolidates the code
    duplicated between ipoib_cm_dev_stop() and ipoib_cm_rx_reap() and
    reduces the risk of error when adding additional accounting.

    Signed-off-by: Roland Dreier

commit 6a08cc746086c5ec28fb376496298ad8ff597bc3
Author: Roland Dreier
Date:   Mon Dec 3 14:22:12 2007 -0800

    IPoIB/cm: Factor out ipoib_cm_create_srq()

    Factor out the code to create an SRQ and allocate the receive ring in
    ipoib_cm_dev_init() into a new function ipoib_cm_create_srq().  This
    will make the code neater when support for devices that don't
    implement SRQs is added.

    Signed-off-by: Roland Dreier

commit 95672a0625176c486086d95138026771feec4667
Author: Roland Dreier
Date:   Mon Dec 3 14:22:12 2007 -0800

    IPoIB/cm: Factor out ipoib_cm_free_rx_ring()

    Factor out the code to unmap/free skbs and free the receive ring in
    ipoib_cm_dev_cleanup() into a new function ipoib_cm_free_rx_ring().
    This function will be called from a couple of other places when
    support for devices that don't implement SRQs is added.

    Signed-off-by: Roland Dreier

commit 70dd62958e221e102c5a971975a189f18b191581
Author: Roland Dreier
Date:   Tue Oct 23 12:57:54 2007 -0700

    IPoIB: Trivial formatting cleanups

    Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc.

    Signed-off-by: Roland Dreier
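To make the WR ID trick in the non-SRQ patch concrete, here is a
minimal, self-contained sketch.  The bit layout, names, and constants
below are made up for the example (the driver defines its own, such as
IPOIB_OP_CM and IPOIB_OP_RECV); only the packing idea is from the patch:

	#include <linux/types.h>

	/*
	 * Illustrative only: pack a connection-table index into the high
	 * bits of a receive WR ID so the completion handler can find the
	 * right struct ipoib_cm_rx without relying on ib_wc.qp.
	 */
	#define EX_RX_RING_BITS	16	/* low bits: slot in the receive ring */
	#define EX_RX_RING_MASK	((1ULL << EX_RX_RING_BITS) - 1)

	static inline u64 ex_mk_rx_wr_id(u32 conn_index, u32 ring_slot)
	{
		return ((u64) conn_index << EX_RX_RING_BITS) |
		       (ring_slot & EX_RX_RING_MASK);
	}

	static inline u32 ex_rx_wr_id_conn(u64 wr_id)	/* connection-table index */
	{
		return wr_id >> EX_RX_RING_BITS;
	}

	static inline u32 ex_rx_wr_id_slot(u64 wr_id)	/* ring-slot index */
	{
		return wr_id & EX_RX_RING_MASK;
	}

A completion handler would use ex_rx_wr_id_conn() to look up the
connection and ex_rx_wr_id_slot() to index that connection's ring.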
 drivers/infiniband/core/cma.c                  |   10 +
 drivers/infiniband/core/mad.c                  |   15 +-
 drivers/infiniband/core/smi.h                  |   18 ++-
 drivers/infiniband/hw/cxgb3/iwch_provider.c    |    5 +
 drivers/infiniband/hw/cxgb3/iwch_qp.c          |   21 +--
 drivers/infiniband/hw/ehca/ehca_cq.c           |    2 +-
 drivers/infiniband/hw/ehca/ehca_qp.c           |    6 +-
 drivers/infiniband/hw/ipath/ipath_eeprom.c     |   20 +-
 drivers/infiniband/hw/ipath/ipath_init_chip.c  |    2 +-
 drivers/infiniband/hw/ipath/ipath_intr.c       |    4 +-
 drivers/infiniband/hw/ipath/ipath_kernel.h     |    3 +-
 drivers/infiniband/hw/ipath/ipath_mad.c        |    4 +-
 drivers/infiniband/ulp/ipoib/ipoib.h           |  171 +++++++-----
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        |  351 +++++++++++++++++-------
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        |    8 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   40 ++--
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    4 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |   18 +-
 drivers/infiniband/ulp/iser/iser_initiator.c   |    2 +-
 19 files changed, 444 insertions(+), 260 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 0751697..5a80e74 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1262,6 +1262,7 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
 	struct net_device *dev = NULL;
 	struct rdma_cm_event event;
 	int ret;
+	struct ib_device_attr attr;
 
 	listen_id = cm_id->context;
 	if (cma_disable_remove(listen_id, CMA_LISTEN))
@@ -1311,10 +1312,19 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
 	sin = (struct sockaddr_in *) &new_cm_id->route.addr.dst_addr;
 	*sin = iw_event->remote_addr;
 
+	ret = ib_query_device(conn_id->id.device, &attr);
+	if (ret) {
+		cma_enable_remove(conn_id);
+		rdma_destroy_id(new_cm_id);
+		goto out;
+	}
+
 	memset(&event, 0, sizeof event);
 	event.event = RDMA_CM_EVENT_CONNECT_REQUEST;
 	event.param.conn.private_data = iw_event->private_data;
 	event.param.conn.private_data_len = iw_event->private_data_len;
+	event.param.conn.initiator_depth = attr.max_qp_init_rd_atom;
+	event.param.conn.responder_resources = attr.max_qp_rd_atom;
 	ret = conn_id->id.event_handler(&conn_id->id, &event);
 	if (ret) {
 		/* User wants to destroy the CM ID */
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 6f42877..649335a 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -701,7 +701,8 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	}
 
 	/* Check to post send on QP or process locally */
-	if (smi_check_local_smp(smp, device) == IB_SMI_DISCARD)
+	if (smi_check_local_smp(smp, device) == IB_SMI_DISCARD &&
+	    smi_check_local_returning_smp(smp, device) == IB_SMI_DISCARD)
 		goto out;
 
 	local = kmalloc(sizeof *local, GFP_ATOMIC);
@@ -752,8 +753,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 		port_priv = ib_get_mad_port(mad_agent_priv->agent.device,
 					    mad_agent_priv->agent.port_num);
 		if (port_priv) {
-			mad_priv->mad.mad.mad_hdr.tid =
-				((struct ib_mad *)smp)->mad_hdr.tid;
+			memcpy(&mad_priv->mad.mad, smp, sizeof(struct ib_mad));
 			recv_mad_agent = find_mad_agent(port_priv,
 							&mad_priv->mad.mad);
 		}
@@ -1931,15 +1931,6 @@ local:
 	if (port_priv->device->process_mad) {
 		int ret;
 
-		if (!response) {
-			printk(KERN_ERR PFX "No memory for response MAD\n");
-			/*
-			 * Is it better to assume that
-			 * it wouldn't be processed ?
- */ - goto out; - } - ret = port_priv->device->process_mad(port_priv->device, 0, port_priv->port_num, wc, &recv->grh, diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h index 1cfc298..aff96ba 100644 --- a/drivers/infiniband/core/smi.h +++ b/drivers/infiniband/core/smi.h @@ -59,7 +59,8 @@ extern enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp, u8 node_type, int port_num); /* - * Return 1 if the SMP should be handled by the local SMA/SM via process_mad + * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM + * via process_mad */ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp, struct ib_device *device) @@ -71,4 +72,19 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp, (smp->hop_ptr == smp->hop_cnt + 1)) ? IB_SMI_HANDLE : IB_SMI_DISCARD); } + +/* + * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM + * via process_mad + */ +static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp, + struct ib_device *device) +{ + /* C14-13:3 -- We're at the end of the DR segment of path */ + /* C14-13:4 -- Hop Pointer == 0 -> give to SM */ + return ((device->process_mad && + ib_get_smp_direction(smp) && + !smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD); +} + #endif /* __SMI_H_ */ diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index b5436ca..69b1204 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include @@ -1053,7 +1054,9 @@ static ssize_t show_fw_ver(struct class_device *cdev, char *buf) struct net_device *lldev = dev->rdev.t3cdev_p->lldev; PDBG("%s class dev 0x%p\n", __FUNCTION__, cdev); + rtnl_lock(); lldev->ethtool_ops->get_drvinfo(lldev, &info); + rtnl_unlock(); return sprintf(buf, "%s\n", info.fw_version); } @@ -1065,7 +1068,9 @@ static ssize_t show_hca(struct class_device *cdev, char *buf) struct net_device *lldev = dev->rdev.t3cdev_p->lldev; PDBG("%s class dev 0x%p\n", __FUNCTION__, cdev); + rtnl_lock(); lldev->ethtool_ops->get_drvinfo(lldev, &info); + rtnl_unlock(); return sprintf(buf, "%s\n", info.driver); } diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index dd89b6b..9bb8112 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -208,36 +208,19 @@ static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list, static int iwch_build_rdma_recv(struct iwch_dev *rhp, union t3_wr *wqe, struct ib_recv_wr *wr) { - int i, err = 0; - u32 pbl_addr[4]; - u8 page_size[4]; + int i; if (wr->num_sge > T3_MAX_SGE) return -EINVAL; - err = iwch_sgl2pbl_map(rhp, wr->sg_list, wr->num_sge, pbl_addr, - page_size); - if (err) - return err; - wqe->recv.pagesz[0] = page_size[0]; - wqe->recv.pagesz[1] = page_size[1]; - wqe->recv.pagesz[2] = page_size[2]; - wqe->recv.pagesz[3] = page_size[3]; wqe->recv.num_sgle = cpu_to_be32(wr->num_sge); for (i = 0; i < wr->num_sge; i++) { wqe->recv.sgl[i].stag = cpu_to_be32(wr->sg_list[i].lkey); wqe->recv.sgl[i].len = cpu_to_be32(wr->sg_list[i].length); - - /* to in the WQE == the offset into the page */ - wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % - (1UL << (12 + page_size[i]))); - - /* pbl_addr is the adapters address in the PBL */ - wqe->recv.pbl_addr[i] = cpu_to_be32(pbl_addr[i]); + wqe->recv.sgl[i].to = cpu_to_be64(wr->sg_list[i].addr); } for (; i < 
T3_MAX_SGE; i++) { wqe->recv.sgl[i].stag = 0; wqe->recv.sgl[i].len = 0; wqe->recv.sgl[i].to = 0; - wqe->recv.pbl_addr[i] = 0; } return 0; } diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c index 79c25f5..0467c15 100644 --- a/drivers/infiniband/hw/ehca/ehca_cq.c +++ b/drivers/infiniband/hw/ehca/ehca_cq.c @@ -246,7 +246,7 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, } else { if (h_ret != H_PAGE_REGISTERED) { ehca_err(device, "Registration of page failed " - "ehca_cq=%p cq_num=%x h_ret=%li" + "ehca_cq=%p cq_num=%x h_ret=%li " "counter=%i act_pages=%i", my_cq, my_cq->cq_number, h_ret, counter, param.act_pages); diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c index dd12668..04e711f 100644 --- a/drivers/infiniband/hw/ehca/ehca_qp.c +++ b/drivers/infiniband/hw/ehca/ehca_qp.c @@ -858,7 +858,7 @@ struct ib_srq *ehca_create_srq(struct ib_pd *pd, update_mask, mqpcb, my_qp->galpas.kernel); if (hret != H_SUCCESS) { - ehca_err(pd->device, "Could not modify SRQ to INIT" + ehca_err(pd->device, "Could not modify SRQ to INIT " "ehca_qp=%p qp_num=%x h_ret=%li", my_qp, my_qp->real_qp_num, hret); goto create_srq2; @@ -872,7 +872,7 @@ struct ib_srq *ehca_create_srq(struct ib_pd *pd, update_mask, mqpcb, my_qp->galpas.kernel); if (hret != H_SUCCESS) { - ehca_err(pd->device, "Could not enable SRQ" + ehca_err(pd->device, "Could not enable SRQ " "ehca_qp=%p qp_num=%x h_ret=%li", my_qp, my_qp->real_qp_num, hret); goto create_srq2; @@ -886,7 +886,7 @@ struct ib_srq *ehca_create_srq(struct ib_pd *pd, update_mask, mqpcb, my_qp->galpas.kernel); if (hret != H_SUCCESS) { - ehca_err(pd->device, "Could not modify SRQ to RTR" + ehca_err(pd->device, "Could not modify SRQ to RTR " "ehca_qp=%p qp_num=%x h_ret=%li", my_qp, my_qp->real_qp_num, hret); goto create_srq2; diff --git a/drivers/infiniband/hw/ipath/ipath_eeprom.c b/drivers/infiniband/hw/ipath/ipath_eeprom.c index e7c25db..a5b6299 100644 --- a/drivers/infiniband/hw/ipath/ipath_eeprom.c +++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c @@ -510,10 +510,10 @@ int ipath_eeprom_read(struct ipath_devdata *dd, u8 eeprom_offset, { int ret; - ret = down_interruptible(&dd->ipath_eep_sem); + ret = mutex_lock_interruptible(&dd->ipath_eep_lock); if (!ret) { ret = ipath_eeprom_internal_read(dd, eeprom_offset, buff, len); - up(&dd->ipath_eep_sem); + mutex_unlock(&dd->ipath_eep_lock); } return ret; @@ -524,10 +524,10 @@ int ipath_eeprom_write(struct ipath_devdata *dd, u8 eeprom_offset, { int ret; - ret = down_interruptible(&dd->ipath_eep_sem); + ret = mutex_lock_interruptible(&dd->ipath_eep_lock); if (!ret) { ret = ipath_eeprom_internal_write(dd, eeprom_offset, buff, len); - up(&dd->ipath_eep_sem); + mutex_unlock(&dd->ipath_eep_lock); } return ret; @@ -616,9 +616,9 @@ void ipath_get_eeprom_info(struct ipath_devdata *dd) goto bail; } - down(&dd->ipath_eep_sem); + mutex_lock(&dd->ipath_eep_lock); eep_stat = ipath_eeprom_internal_read(dd, 0, buf, len); - up(&dd->ipath_eep_sem); + mutex_unlock(&dd->ipath_eep_lock); if (eep_stat) { ipath_dev_err(dd, "Failed reading GUID from eeprom\n"); @@ -764,14 +764,14 @@ int ipath_update_eeprom_log(struct ipath_devdata *dd) /* Grab semaphore and read current EEPROM. If we get an * error, let go, but if not, keep it until we finish write. 
*/ - ret = down_interruptible(&dd->ipath_eep_sem); + ret = mutex_lock_interruptible(&dd->ipath_eep_lock); if (ret) { ipath_dev_err(dd, "Unable to acquire EEPROM for logging\n"); goto free_bail; } ret = ipath_eeprom_internal_read(dd, 0, buf, len); if (ret) { - up(&dd->ipath_eep_sem); + mutex_unlock(&dd->ipath_eep_lock); ipath_dev_err(dd, "Unable read EEPROM for logging\n"); goto free_bail; } @@ -779,7 +779,7 @@ int ipath_update_eeprom_log(struct ipath_devdata *dd) csum = flash_csum(ifp, 0); if (csum != ifp->if_csum) { - up(&dd->ipath_eep_sem); + mutex_unlock(&dd->ipath_eep_lock); ipath_dev_err(dd, "EEPROM cks err (0x%02X, S/B 0x%02X)\n", csum, ifp->if_csum); ret = 1; @@ -849,7 +849,7 @@ int ipath_update_eeprom_log(struct ipath_devdata *dd) csum = flash_csum(ifp, 1); ret = ipath_eeprom_internal_write(dd, 0, buf, hi_water + 1); } - up(&dd->ipath_eep_sem); + mutex_unlock(&dd->ipath_eep_lock); if (ret) ipath_dev_err(dd, "Failed updating EEPROM\n"); diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 9dd0bac..9e9d6fa 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -348,7 +348,7 @@ static int init_chip_first(struct ipath_devdata *dd, spin_lock_init(&dd->ipath_gpio_lock); spin_lock_init(&dd->ipath_eep_st_lock); - sema_init(&dd->ipath_eep_sem, 1); + mutex_init(&dd->ipath_eep_lock); done: *pdp = pd; diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index c61f9da..8f3718c 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -849,7 +849,7 @@ void ipath_clear_freeze(struct ipath_devdata *dd) /* this is separate to allow for better optimization of ipath_intr() */ -static void ipath_bad_intr(struct ipath_devdata *dd, u32 * unexpectp) +static noinline void ipath_bad_intr(struct ipath_devdata *dd, u32 *unexpectp) { /* * sometimes happen during driver init and unload, don't want @@ -892,7 +892,7 @@ static void ipath_bad_intr(struct ipath_devdata *dd, u32 * unexpectp) "ignoring\n"); } -static void ipath_bad_regread(struct ipath_devdata *dd) +static noinline void ipath_bad_regread(struct ipath_devdata *dd) { static int allbits; diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 8786dd7..a6e7a60 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -41,6 +41,7 @@ #include #include #include +#include #include #include @@ -616,7 +617,7 @@ struct ipath_devdata { /* control access to actual counters, timer */ spinlock_t ipath_eep_st_lock; /* control high-level access to EEPROM */ - struct semaphore ipath_eep_sem; + struct mutex ipath_eep_lock; /* Below inc'd by ipath_snap_cntrs(), locked by ipath_eep_st_lock */ uint64_t ipath_traffic_wds; /* active time is kept in seconds, but logged in hours */ diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c index 3d1432d..1978c34 100644 --- a/drivers/infiniband/hw/ipath/ipath_mad.c +++ b/drivers/infiniband/hw/ipath/ipath_mad.c @@ -1434,7 +1434,7 @@ static int process_subn(struct ib_device *ibdev, int mad_flags, * before checking for other consumers. * Just tell the caller to process it normally. 
*/ - ret = IB_MAD_RESULT_FAILURE; + ret = IB_MAD_RESULT_SUCCESS; goto bail; default: smp->status |= IB_SMP_UNSUP_METHOD; @@ -1516,7 +1516,7 @@ static int process_perf(struct ib_device *ibdev, u8 port_num, * before checking for other consumers. * Just tell the caller to process it normally. */ - ret = IB_MAD_RESULT_FAILURE; + ret = IB_MAD_RESULT_SUCCESS; goto bail; default: pmp->status |= IB_SMP_UNSUP_METHOD; diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index eb7edab..d35025f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -56,42 +56,43 @@ /* constants */ enum { - IPOIB_PACKET_SIZE = 2048, - IPOIB_BUF_SIZE = IPOIB_PACKET_SIZE + IB_GRH_BYTES, + IPOIB_PACKET_SIZE = 2048, + IPOIB_BUF_SIZE = IPOIB_PACKET_SIZE + IB_GRH_BYTES, - IPOIB_ENCAP_LEN = 4, + IPOIB_ENCAP_LEN = 4, - IPOIB_CM_MTU = 0x10000 - 0x10, /* padding to align header to 16 */ - IPOIB_CM_BUF_SIZE = IPOIB_CM_MTU + IPOIB_ENCAP_LEN, - IPOIB_CM_HEAD_SIZE = IPOIB_CM_BUF_SIZE % PAGE_SIZE, - IPOIB_CM_RX_SG = ALIGN(IPOIB_CM_BUF_SIZE, PAGE_SIZE) / PAGE_SIZE, - IPOIB_RX_RING_SIZE = 128, - IPOIB_TX_RING_SIZE = 64, + IPOIB_CM_MTU = 0x10000 - 0x10, /* padding to align header to 16 */ + IPOIB_CM_BUF_SIZE = IPOIB_CM_MTU + IPOIB_ENCAP_LEN, + IPOIB_CM_HEAD_SIZE = IPOIB_CM_BUF_SIZE % PAGE_SIZE, + IPOIB_CM_RX_SG = ALIGN(IPOIB_CM_BUF_SIZE, PAGE_SIZE) / PAGE_SIZE, + IPOIB_RX_RING_SIZE = 128, + IPOIB_TX_RING_SIZE = 64, IPOIB_MAX_QUEUE_SIZE = 8192, IPOIB_MIN_QUEUE_SIZE = 2, + IPOIB_CM_MAX_CONN_QP = 4096, - IPOIB_NUM_WC = 4, + IPOIB_NUM_WC = 4, IPOIB_MAX_PATH_REC_QUEUE = 3, - IPOIB_MAX_MCAST_QUEUE = 3, - - IPOIB_FLAG_OPER_UP = 0, - IPOIB_FLAG_INITIALIZED = 1, - IPOIB_FLAG_ADMIN_UP = 2, - IPOIB_PKEY_ASSIGNED = 3, - IPOIB_PKEY_STOP = 4, - IPOIB_FLAG_SUBINTERFACE = 5, - IPOIB_MCAST_RUN = 6, - IPOIB_STOP_REAPER = 7, - IPOIB_MCAST_STARTED = 8, - IPOIB_FLAG_ADMIN_CM = 9, + IPOIB_MAX_MCAST_QUEUE = 3, + + IPOIB_FLAG_OPER_UP = 0, + IPOIB_FLAG_INITIALIZED = 1, + IPOIB_FLAG_ADMIN_UP = 2, + IPOIB_PKEY_ASSIGNED = 3, + IPOIB_PKEY_STOP = 4, + IPOIB_FLAG_SUBINTERFACE = 5, + IPOIB_MCAST_RUN = 6, + IPOIB_STOP_REAPER = 7, + IPOIB_MCAST_STARTED = 8, + IPOIB_FLAG_ADMIN_CM = 9, IPOIB_FLAG_UMCAST = 10, IPOIB_MAX_BACKOFF_SECONDS = 16, - IPOIB_MCAST_FLAG_FOUND = 0, /* used in set_multicast_list */ + IPOIB_MCAST_FLAG_FOUND = 0, /* used in set_multicast_list */ IPOIB_MCAST_FLAG_SENDONLY = 1, - IPOIB_MCAST_FLAG_BUSY = 2, /* joining or already joined */ + IPOIB_MCAST_FLAG_BUSY = 2, /* joining or already joined */ IPOIB_MCAST_FLAG_ATTACHED = 3, }; @@ -117,7 +118,7 @@ struct ipoib_pseudoheader { struct ipoib_mcast { struct ib_sa_mcmember_rec mcmember; struct ib_sa_multicast *mc; - struct ipoib_ah *ah; + struct ipoib_ah *ah; struct rb_node rb_node; struct list_head list; @@ -186,27 +187,29 @@ enum ipoib_cm_state { }; struct ipoib_cm_rx { - struct ib_cm_id *id; - struct ib_qp *qp; - struct list_head list; - struct net_device *dev; - unsigned long jiffies; - enum ipoib_cm_state state; + struct ib_cm_id *id; + struct ib_qp *qp; + struct ipoib_cm_rx_buf *rx_ring; + struct list_head list; + struct net_device *dev; + unsigned long jiffies; + enum ipoib_cm_state state; + int recv_count; }; struct ipoib_cm_tx { - struct ib_cm_id *id; - struct ib_qp *qp; + struct ib_cm_id *id; + struct ib_qp *qp; struct list_head list; struct net_device *dev; struct ipoib_neigh *neigh; struct ipoib_path *path; struct ipoib_tx_buf *tx_ring; - unsigned tx_head; - unsigned tx_tail; - unsigned long flags; - u32 mtu; - struct 
ib_wc ibwc[IPOIB_NUM_WC]; + unsigned tx_head; + unsigned tx_tail; + unsigned long flags; + u32 mtu; + struct ib_wc ibwc[IPOIB_NUM_WC]; }; struct ipoib_cm_rx_buf { @@ -215,25 +218,26 @@ struct ipoib_cm_rx_buf { }; struct ipoib_cm_dev_priv { - struct ib_srq *srq; + struct ib_srq *srq; struct ipoib_cm_rx_buf *srq_ring; - struct ib_cm_id *id; - struct list_head passive_ids; /* state: LIVE */ - struct list_head rx_error_list; /* state: ERROR */ - struct list_head rx_flush_list; /* state: FLUSH, drain not started */ - struct list_head rx_drain_list; /* state: FLUSH, drain started */ - struct list_head rx_reap_list; /* state: FLUSH, drain done */ + struct ib_cm_id *id; + struct list_head passive_ids; /* state: LIVE */ + struct list_head rx_error_list; /* state: ERROR */ + struct list_head rx_flush_list; /* state: FLUSH, drain not started */ + struct list_head rx_drain_list; /* state: FLUSH, drain started */ + struct list_head rx_reap_list; /* state: FLUSH, drain done */ struct work_struct start_task; struct work_struct reap_task; struct work_struct skb_task; struct work_struct rx_reap_task; struct delayed_work stale_task; struct sk_buff_head skb_queue; - struct list_head start_list; - struct list_head reap_list; - struct ib_wc ibwc[IPOIB_NUM_WC]; - struct ib_sge rx_sge[IPOIB_CM_RX_SG]; + struct list_head start_list; + struct list_head reap_list; + struct ib_wc ibwc[IPOIB_NUM_WC]; + struct ib_sge rx_sge[IPOIB_CM_RX_SG]; struct ib_recv_wr rx_wr; + int nonsrq_conn_qp; }; /* @@ -269,30 +273,30 @@ struct ipoib_dev_priv { struct work_struct pkey_event_task; struct ib_device *ca; - u8 port; - u16 pkey; - u16 pkey_index; - struct ib_pd *pd; - struct ib_mr *mr; - struct ib_cq *cq; - struct ib_qp *qp; - u32 qkey; + u8 port; + u16 pkey; + u16 pkey_index; + struct ib_pd *pd; + struct ib_mr *mr; + struct ib_cq *cq; + struct ib_qp *qp; + u32 qkey; union ib_gid local_gid; - u16 local_lid; + u16 local_lid; unsigned int admin_mtu; unsigned int mcast_mtu; struct ipoib_rx_buf *rx_ring; - spinlock_t tx_lock; + spinlock_t tx_lock; struct ipoib_tx_buf *tx_ring; - unsigned tx_head; - unsigned tx_tail; - struct ib_sge tx_sge; + unsigned tx_head; + unsigned tx_tail; + struct ib_sge tx_sge; struct ib_send_wr tx_wr; - unsigned tx_outstanding; + unsigned tx_outstanding; struct ib_wc ibwc[IPOIB_NUM_WC]; @@ -317,10 +321,10 @@ struct ipoib_dev_priv { struct ipoib_ah { struct net_device *dev; - struct ib_ah *ah; + struct ib_ah *ah; struct list_head list; - struct kref ref; - unsigned last_send; + struct kref ref; + unsigned last_send; }; struct ipoib_path { @@ -331,11 +335,11 @@ struct ipoib_path { struct list_head neigh_list; - int query_id; + int query_id; struct ib_sa_query *query; struct completion done; - struct rb_node rb_node; + struct rb_node rb_node; struct list_head list; }; @@ -344,7 +348,7 @@ struct ipoib_neigh { #ifdef CONFIG_INFINIBAND_IPOIB_CM struct ipoib_cm_tx *cm; #endif - union ib_gid dgid; + union ib_gid dgid; struct sk_buff_head queue; struct neighbour *neighbour; @@ -455,12 +459,14 @@ void ipoib_drain_cq(struct net_device *dev); #ifdef CONFIG_INFINIBAND_IPOIB_CM -#define IPOIB_FLAGS_RC 0x80 -#define IPOIB_FLAGS_UC 0x40 +#define IPOIB_FLAGS_RC 0x80 +#define IPOIB_FLAGS_UC 0x40 /* We don't support UC connections at the moment */ #define IPOIB_CM_SUPPORTED(ha) (ha[0] & (IPOIB_FLAGS_RC)) +extern int ipoib_max_conn_qp; + static inline int ipoib_cm_admin_enabled(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -491,6 +497,12 @@ static inline void ipoib_cm_set(struct ipoib_neigh 
*neigh, struct ipoib_cm_tx *t neigh->cm = tx; } +static inline int ipoib_cm_has_srq(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + return !!priv->cm.srq; +} + void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx); int ipoib_cm_dev_open(struct net_device *dev); void ipoib_cm_dev_stop(struct net_device *dev); @@ -500,7 +512,7 @@ void ipoib_cm_dev_cleanup(struct net_device *dev); struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path, struct ipoib_neigh *neigh); void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx); -void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb, +void ipoib_cm_skb_too_long(struct net_device *dev, struct sk_buff *skb, unsigned int mtu); void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc); void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc); @@ -508,6 +520,8 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc); struct ipoib_cm_tx; +#define ipoib_max_conn_qp 0 + static inline int ipoib_cm_admin_enabled(struct net_device *dev) { return 0; @@ -533,6 +547,11 @@ static inline void ipoib_cm_set(struct ipoib_neigh *neigh, struct ipoib_cm_tx *t { } +static inline int ipoib_cm_has_srq(struct net_device *dev) +{ + return 0; +} + static inline void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx) { @@ -582,7 +601,7 @@ int ipoib_cm_add_mode_attr(struct net_device *dev) return 0; } -static inline void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb, +static inline void ipoib_cm_skb_too_long(struct net_device *dev, struct sk_buff *skb, unsigned int mtu) { dev_kfree_skb_any(skb); @@ -624,12 +643,12 @@ extern struct ib_sa_client ipoib_sa_client; extern int ipoib_debug_level; #define ipoib_dbg(priv, format, arg...) \ - do { \ + do { \ if (ipoib_debug_level > 0) \ ipoib_printk(KERN_DEBUG, priv, format , ## arg); \ } while (0) #define ipoib_dbg_mcast(priv, format, arg...) \ - do { \ + do { \ if (mcast_debug_level > 0) \ ipoib_printk(KERN_DEBUG, priv, format , ## arg); \ } while (0) @@ -642,7 +661,7 @@ extern int ipoib_debug_level; #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA #define ipoib_dbg_data(priv, format, arg...) 
\ - do { \ + do { \ if (data_debug_level > 0) \ ipoib_printk(KERN_DEBUG, priv, format , ## arg); \ } while (0) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 059cf92..90a9847 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -39,6 +39,13 @@ #include #include +int ipoib_max_conn_qp = 128; + +module_param_named(max_nonsrq_conn_qp, ipoib_max_conn_qp, int, 0444); +MODULE_PARM_DESC(max_nonsrq_conn_qp, + "Max number of connected-mode QPs per interface " + "(applied only if shared receive queue is not available)"); + #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA static int data_debug_level; @@ -81,7 +88,7 @@ static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags, ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE); } -static int ipoib_cm_post_receive(struct net_device *dev, int id) +static int ipoib_cm_post_receive_srq(struct net_device *dev, int id) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_recv_wr *bad_wr; @@ -104,7 +111,33 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id) return ret; } -static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, int frags, +static int ipoib_cm_post_receive_nonsrq(struct net_device *dev, + struct ipoib_cm_rx *rx, int id) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_recv_wr *bad_wr; + int i, ret; + + priv->cm.rx_wr.wr_id = id | IPOIB_OP_CM | IPOIB_OP_RECV; + + for (i = 0; i < IPOIB_CM_RX_SG; ++i) + priv->cm.rx_sge[i].addr = rx->rx_ring[id].mapping[i]; + + ret = ib_post_recv(rx->qp, &priv->cm.rx_wr, &bad_wr); + if (unlikely(ret)) { + ipoib_warn(priv, "post recv failed for buf %d (%d)\n", id, ret); + ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1, + rx->rx_ring[id].mapping); + dev_kfree_skb_any(rx->rx_ring[id].skb); + rx->rx_ring[id].skb = NULL; + } + + return ret; +} + +static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, + struct ipoib_cm_rx_buf *rx_ring, + int id, int frags, u64 mapping[IPOIB_CM_RX_SG]) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -141,7 +174,7 @@ static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, int goto partial_error; } - priv->cm.srq_ring[id].skb = skb; + rx_ring[id].skb = skb; return skb; partial_error: @@ -155,7 +188,23 @@ partial_error: return NULL; } -static void ipoib_cm_start_rx_drain(struct ipoib_dev_priv* priv) +static void ipoib_cm_free_rx_ring(struct net_device *dev, + struct ipoib_cm_rx_buf *rx_ring) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + int i; + + for (i = 0; i < ipoib_recvq_size; ++i) + if (rx_ring[i].skb) { + ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1, + rx_ring[i].mapping); + dev_kfree_skb_any(rx_ring[i].skb); + } + + kfree(rx_ring); +} + +static void ipoib_cm_start_rx_drain(struct ipoib_dev_priv *priv) { struct ib_send_wr *bad_wr; struct ipoib_cm_rx *p; @@ -208,12 +257,18 @@ static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev, .qp_type = IB_QPT_RC, .qp_context = p, }; + + if (!ipoib_cm_has_srq(dev)) { + attr.cap.max_recv_wr = ipoib_recvq_size; + attr.cap.max_recv_sge = IPOIB_CM_RX_SG; + } + return ib_create_qp(priv->pd, &attr); } static int ipoib_cm_modify_rx_qp(struct net_device *dev, - struct ib_cm_id *cm_id, struct ib_qp *qp, - unsigned psn) + struct ib_cm_id *cm_id, struct ib_qp *qp, + unsigned psn) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_attr qp_attr; @@ -266,6 +321,60 @@ static int 
ipoib_cm_modify_rx_qp(struct net_device *dev, return 0; } +static int ipoib_cm_nonsrq_init_rx(struct net_device *dev, struct ib_cm_id *cm_id, + struct ipoib_cm_rx *rx) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + int ret; + int i; + + rx->rx_ring = kcalloc(ipoib_recvq_size, sizeof *rx->rx_ring, GFP_KERNEL); + if (!rx->rx_ring) + return -ENOMEM; + + spin_lock_irq(&priv->lock); + + if (priv->cm.nonsrq_conn_qp >= ipoib_max_conn_qp) { + spin_unlock_irq(&priv->lock); + ib_send_cm_rej(cm_id, IB_CM_REJ_NO_QP, NULL, 0, NULL, 0); + ret = -EINVAL; + goto err_free; + } else + ++priv->cm.nonsrq_conn_qp; + + spin_unlock_irq(&priv->lock); + + for (i = 0; i < ipoib_recvq_size; ++i) { + if (!ipoib_cm_alloc_rx_skb(dev, rx->rx_ring, i, IPOIB_CM_RX_SG - 1, + rx->rx_ring[i].mapping)) { + ipoib_warn(priv, "failed to allocate receive buffer %d\n", i); + ret = -ENOMEM; + goto err_count; + } + ret = ipoib_cm_post_receive_nonsrq(dev, rx, i); + if (ret) { + ipoib_warn(priv, "ipoib_cm_post_receive_nonsrq " + "failed for buf %d\n", i); + ret = -EIO; + goto err_count; + } + } + + rx->recv_count = ipoib_recvq_size; + + return 0; + +err_count: + spin_lock_irq(&priv->lock); + --priv->cm.nonsrq_conn_qp; + spin_unlock_irq(&priv->lock); + +err_free: + ipoib_cm_free_rx_ring(dev, rx->rx_ring); + + return ret; +} + static int ipoib_cm_send_rep(struct net_device *dev, struct ib_cm_id *cm_id, struct ib_qp *qp, struct ib_cm_req_event_param *req, unsigned psn) @@ -281,7 +390,7 @@ static int ipoib_cm_send_rep(struct net_device *dev, struct ib_cm_id *cm_id, rep.private_data_len = sizeof data; rep.flow_control = 0; rep.rnr_retry_count = req->rnr_retry_count; - rep.srq = 1; + rep.srq = ipoib_cm_has_srq(dev); rep.qp_num = qp->qp_num; rep.starting_psn = psn; return ib_send_cm_rep(cm_id, &rep); @@ -317,6 +426,12 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even if (ret) goto err_modify; + if (!ipoib_cm_has_srq(dev)) { + ret = ipoib_cm_nonsrq_init_rx(dev, cm_id, p); + if (ret) + goto err_modify; + } + spin_lock_irq(&priv->lock); queue_delayed_work(ipoib_workqueue, &priv->cm.stale_task, IPOIB_CM_RX_DELAY); @@ -401,12 +516,14 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space, void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_cm_rx_buf *rx_ring; unsigned int wr_id = wc->wr_id & ~(IPOIB_OP_CM | IPOIB_OP_RECV); struct sk_buff *skb, *newskb; struct ipoib_cm_rx *p; unsigned long flags; u64 mapping[IPOIB_CM_RX_SG]; int frags; + int has_srq; ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n", wr_id, wc->status); @@ -424,18 +541,32 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) return; } - skb = priv->cm.srq_ring[wr_id].skb; + p = wc->qp->qp_context; + + has_srq = ipoib_cm_has_srq(dev); + rx_ring = has_srq ? 
priv->cm.srq_ring : p->rx_ring; + + skb = rx_ring[wr_id].skb; if (unlikely(wc->status != IB_WC_SUCCESS)) { ipoib_dbg(priv, "cm recv error " "(status=%d, wrid=%d vend_err %x)\n", wc->status, wr_id, wc->vendor_err); ++dev->stats.rx_dropped; - goto repost; + if (has_srq) + goto repost; + else { + if (!--p->recv_count) { + spin_lock_irqsave(&priv->lock, flags); + list_move(&p->list, &priv->cm.rx_reap_list); + spin_unlock_irqrestore(&priv->lock, flags); + queue_work(ipoib_workqueue, &priv->cm.rx_reap_task); + } + return; + } } if (unlikely(!(wr_id & IPOIB_CM_RX_UPDATE_MASK))) { - p = wc->qp->qp_context; if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { spin_lock_irqsave(&priv->lock, flags); p->jiffies = jiffies; @@ -450,7 +581,7 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len, (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE; - newskb = ipoib_cm_alloc_rx_skb(dev, wr_id, frags, mapping); + newskb = ipoib_cm_alloc_rx_skb(dev, rx_ring, wr_id, frags, mapping); if (unlikely(!newskb)) { /* * If we can't allocate a new RX buffer, dump @@ -461,8 +592,8 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) goto repost; } - ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping); - memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping); + ipoib_cm_dma_unmap_rx(priv, frags, rx_ring[wr_id].mapping); + memcpy(rx_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping); ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", wc->byte_len, wc->slid); @@ -483,9 +614,17 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) netif_receive_skb(skb); repost: - if (unlikely(ipoib_cm_post_receive(dev, wr_id))) - ipoib_warn(priv, "ipoib_cm_post_receive failed " - "for buf %d\n", wr_id); + if (has_srq) { + if (unlikely(ipoib_cm_post_receive_srq(dev, wr_id))) + ipoib_warn(priv, "ipoib_cm_post_receive_srq failed " + "for buf %d\n", wr_id); + } else { + if (unlikely(ipoib_cm_post_receive_nonsrq(dev, p, wr_id))) { + --p->recv_count; + ipoib_warn(priv, "ipoib_cm_post_receive_nonsrq failed " + "for buf %d\n", wr_id); + } + } } static inline int post_send(struct ipoib_dev_priv *priv, @@ -495,10 +634,10 @@ static inline int post_send(struct ipoib_dev_priv *priv, { struct ib_send_wr *bad_wr; - priv->tx_sge.addr = addr; - priv->tx_sge.length = len; + priv->tx_sge.addr = addr; + priv->tx_sge.length = len; - priv->tx_wr.wr_id = wr_id | IPOIB_OP_CM; + priv->tx_wr.wr_id = wr_id | IPOIB_OP_CM; return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr); } @@ -540,7 +679,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ tx_req->mapping = addr; if (unlikely(post_send(priv, tx, tx->tx_head & (ipoib_sendq_size - 1), - addr, skb->len))) { + addr, skb->len))) { ipoib_warn(priv, "post_send failed\n"); ++dev->stats.tx_errors; ib_dma_unmap_single(priv->ca, addr, skb->len, DMA_TO_DEVICE); @@ -657,10 +796,33 @@ err_cm: return ret; } +static void ipoib_cm_free_rx_reap_list(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_cm_rx *rx, *n; + LIST_HEAD(list); + + spin_lock_irq(&priv->lock); + list_splice_init(&priv->cm.rx_reap_list, &list); + spin_unlock_irq(&priv->lock); + + list_for_each_entry_safe(rx, n, &list, list) { + ib_destroy_cm_id(rx->id); + ib_destroy_qp(rx->qp); + if (!ipoib_cm_has_srq(dev)) { + ipoib_cm_free_rx_ring(priv->dev, rx->rx_ring); + spin_lock_irq(&priv->lock); + --priv->cm.nonsrq_conn_qp; + 
spin_unlock_irq(&priv->lock); + } + kfree(rx); + } +} + void ipoib_cm_dev_stop(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_cm_rx *p, *n; + struct ipoib_cm_rx *p; unsigned long begin; LIST_HEAD(list); int ret; @@ -706,15 +868,9 @@ void ipoib_cm_dev_stop(struct net_device *dev) spin_lock_irq(&priv->lock); } - list_splice_init(&priv->cm.rx_reap_list, &list); - spin_unlock_irq(&priv->lock); - list_for_each_entry_safe(p, n, &list, list) { - ib_destroy_cm_id(p->id); - ib_destroy_qp(p->qp); - kfree(p); - } + ipoib_cm_free_rx_reap_list(dev); cancel_delayed_work(&priv->cm.stale_task); } @@ -799,7 +955,7 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_ .sq_sig_type = IB_SIGNAL_ALL_WR, .qp_type = IB_QPT_RC, .qp_context = tx - }; + }; return ib_create_qp(priv->pd, &attr); } @@ -816,28 +972,28 @@ static int ipoib_cm_send_req(struct net_device *dev, data.qpn = cpu_to_be32(priv->qp->qp_num); data.mtu = cpu_to_be32(IPOIB_CM_BUF_SIZE); - req.primary_path = pathrec; - req.alternate_path = NULL; - req.service_id = cpu_to_be64(IPOIB_CM_IETF_ID | qpn); - req.qp_num = qp->qp_num; - req.qp_type = qp->qp_type; - req.private_data = &data; - req.private_data_len = sizeof data; - req.flow_control = 0; + req.primary_path = pathrec; + req.alternate_path = NULL; + req.service_id = cpu_to_be64(IPOIB_CM_IETF_ID | qpn); + req.qp_num = qp->qp_num; + req.qp_type = qp->qp_type; + req.private_data = &data; + req.private_data_len = sizeof data; + req.flow_control = 0; - req.starting_psn = 0; /* FIXME */ + req.starting_psn = 0; /* FIXME */ /* * Pick some arbitrary defaults here; we could make these * module parameters if anyone cared about setting them. */ - req.responder_resources = 4; - req.remote_cm_response_timeout = 20; - req.local_cm_response_timeout = 20; - req.retry_count = 0; /* RFC draft warns against retries */ - req.rnr_retry_count = 0; /* RFC draft warns against retries */ - req.max_cm_retries = 15; - req.srq = 1; + req.responder_resources = 4; + req.remote_cm_response_timeout = 20; + req.local_cm_response_timeout = 20; + req.retry_count = 0; /* RFC draft warns against retries */ + req.rnr_retry_count = 0; /* RFC draft warns against retries */ + req.max_cm_retries = 15; + req.srq = ipoib_cm_has_srq(dev); return ib_send_cm_req(id, &req); } @@ -1150,7 +1306,7 @@ static void ipoib_cm_skb_reap(struct work_struct *work) spin_unlock_irq(&priv->tx_lock); } -void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb, +void ipoib_cm_skb_too_long(struct net_device *dev, struct sk_buff *skb, unsigned int mtu) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -1166,20 +1322,8 @@ void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb, static void ipoib_cm_rx_reap(struct work_struct *work) { - struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv, - cm.rx_reap_task); - struct ipoib_cm_rx *p, *n; - LIST_HEAD(list); - - spin_lock_irq(&priv->lock); - list_splice_init(&priv->cm.rx_reap_list, &list); - spin_unlock_irq(&priv->lock); - - list_for_each_entry_safe(p, n, &list, list) { - ib_destroy_cm_id(p->id); - ib_destroy_qp(p->qp); - kfree(p); - } + ipoib_cm_free_rx_reap_list(container_of(work, struct ipoib_dev_priv, + cm.rx_reap_task)->dev); } static void ipoib_cm_stale_task(struct work_struct *work) @@ -1212,7 +1356,7 @@ static void ipoib_cm_stale_task(struct work_struct *work) } -static ssize_t show_mode(struct device *d, struct device_attribute *attr, +static ssize_t show_mode(struct device *d, struct 
device_attribute *attr, char *buf) { struct ipoib_dev_priv *priv = netdev_priv(to_net_dev(d)); @@ -1255,7 +1399,7 @@ int ipoib_cm_add_mode_attr(struct net_device *dev) return device_create_file(&dev->dev, &dev_attr_mode); } -int ipoib_cm_dev_init(struct net_device *dev) +static void ipoib_cm_create_srq(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_srq_init_attr srq_init_attr = { @@ -1264,7 +1408,30 @@ int ipoib_cm_dev_init(struct net_device *dev) .max_sge = IPOIB_CM_RX_SG } }; - int ret, i; + + priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr); + if (IS_ERR(priv->cm.srq)) { + if (PTR_ERR(priv->cm.srq) != -ENOSYS) + printk(KERN_WARNING "%s: failed to allocate SRQ, error %ld\n", + priv->ca->name, PTR_ERR(priv->cm.srq)); + priv->cm.srq = NULL; + return; + } + + priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring, + GFP_KERNEL); + if (!priv->cm.srq_ring) { + printk(KERN_WARNING "%s: failed to allocate CM SRQ ring (%d entries)\n", + priv->ca->name, ipoib_recvq_size); + ib_destroy_srq(priv->cm.srq); + priv->cm.srq = NULL; + } +} + +int ipoib_cm_dev_init(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + int i; INIT_LIST_HEAD(&priv->cm.passive_ids); INIT_LIST_HEAD(&priv->cm.reap_list); @@ -1281,22 +1448,6 @@ int ipoib_cm_dev_init(struct net_device *dev) skb_queue_head_init(&priv->cm.skb_queue); - priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr); - if (IS_ERR(priv->cm.srq)) { - ret = PTR_ERR(priv->cm.srq); - priv->cm.srq = NULL; - return ret; - } - - priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring, - GFP_KERNEL); - if (!priv->cm.srq_ring) { - printk(KERN_WARNING "%s: failed to allocate CM ring (%d entries)\n", - priv->ca->name, ipoib_recvq_size); - ipoib_cm_dev_cleanup(dev); - return -ENOMEM; - } - for (i = 0; i < IPOIB_CM_RX_SG; ++i) priv->cm.rx_sge[i].lkey = priv->mr->lkey; @@ -1307,17 +1458,25 @@ int ipoib_cm_dev_init(struct net_device *dev) priv->cm.rx_wr.sg_list = priv->cm.rx_sge; priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG; - for (i = 0; i < ipoib_recvq_size; ++i) { - if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1, - priv->cm.srq_ring[i].mapping)) { - ipoib_warn(priv, "failed to allocate receive buffer %d\n", i); - ipoib_cm_dev_cleanup(dev); - return -ENOMEM; - } - if (ipoib_cm_post_receive(dev, i)) { - ipoib_warn(priv, "ipoib_ib_post_receive failed for buf %d\n", i); - ipoib_cm_dev_cleanup(dev); - return -EIO; + ipoib_cm_create_srq(dev); + + if (ipoib_cm_has_srq(dev)) { + for (i = 0; i < ipoib_recvq_size; ++i) { + if (!ipoib_cm_alloc_rx_skb(dev, priv->cm.srq_ring, i, + IPOIB_CM_RX_SG - 1, + priv->cm.srq_ring[i].mapping)) { + ipoib_warn(priv, "failed to allocate " + "receive buffer %d\n", i); + ipoib_cm_dev_cleanup(dev); + return -ENOMEM; + } + + if (ipoib_cm_post_receive_srq(dev, i)) { + ipoib_warn(priv, "ipoib_cm_post_receive_srq " + "failed for buf %d\n", i); + ipoib_cm_dev_cleanup(dev); + return -EIO; + } } } @@ -1328,7 +1487,7 @@ int ipoib_cm_dev_init(struct net_device *dev) void ipoib_cm_dev_cleanup(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); - int i, ret; + int ret; if (!priv->cm.srq) return; @@ -1342,13 +1501,7 @@ void ipoib_cm_dev_cleanup(struct net_device *dev) priv->cm.srq = NULL; if (!priv->cm.srq_ring) return; - for (i = 0; i < ipoib_recvq_size; ++i) - if (priv->cm.srq_ring[i].skb) { - ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1, - priv->cm.srq_ring[i].mapping); - dev_kfree_skb_any(priv->cm.srq_ring[i].skb); - 
priv->cm.srq_ring[i].skb = NULL; - } - kfree(priv->cm.srq_ring); + + ipoib_cm_free_rx_ring(dev, priv->cm.srq_ring); priv->cm.srq_ring = NULL; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 5063dd5..52bc2bd 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -345,12 +345,12 @@ static inline int post_send(struct ipoib_dev_priv *priv, { struct ib_send_wr *bad_wr; - priv->tx_sge.addr = addr; - priv->tx_sge.length = len; + priv->tx_sge.addr = addr; + priv->tx_sge.length = len; - priv->tx_wr.wr_id = wr_id; + priv->tx_wr.wr_id = wr_id; priv->tx_wr.wr.ud.remote_qpn = qpn; - priv->tx_wr.wr.ud.ah = address; + priv->tx_wr.wr.ud.ah = address; return ib_post_send(priv->qp, &priv->tx_wr, &bad_wr); } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index c9f6077..fe63723 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -474,8 +474,8 @@ static struct ipoib_path *path_rec_create(struct net_device *dev, void *gid) INIT_LIST_HEAD(&path->neigh_list); memcpy(path->pathrec.dgid.raw, gid, sizeof (union ib_gid)); - path->pathrec.sgid = priv->local_gid; - path->pathrec.pkey = cpu_to_be16(priv->pkey); + path->pathrec.sgid = priv->local_gid; + path->pathrec.pkey = cpu_to_be16(priv->pkey); path->pathrec.numb_path = 1; path->pathrec.traffic_class = priv->broadcast->mcmember.traffic_class; @@ -950,34 +950,34 @@ static void ipoib_setup(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); - dev->open = ipoib_open; - dev->stop = ipoib_stop; - dev->change_mtu = ipoib_change_mtu; - dev->hard_start_xmit = ipoib_start_xmit; - dev->tx_timeout = ipoib_timeout; - dev->header_ops = &ipoib_header_ops; - dev->set_multicast_list = ipoib_set_mcast_list; - dev->neigh_setup = ipoib_neigh_setup_dev; + dev->open = ipoib_open; + dev->stop = ipoib_stop; + dev->change_mtu = ipoib_change_mtu; + dev->hard_start_xmit = ipoib_start_xmit; + dev->tx_timeout = ipoib_timeout; + dev->header_ops = &ipoib_header_ops; + dev->set_multicast_list = ipoib_set_mcast_list; + dev->neigh_setup = ipoib_neigh_setup_dev; netif_napi_add(dev, &priv->napi, ipoib_poll, 100); - dev->watchdog_timeo = HZ; + dev->watchdog_timeo = HZ; - dev->flags |= IFF_BROADCAST | IFF_MULTICAST; + dev->flags |= IFF_BROADCAST | IFF_MULTICAST; /* * We add in INFINIBAND_ALEN to allow for the destination * address "pseudoheader" for skbs without neighbour struct. 
*/ - dev->hard_header_len = IPOIB_ENCAP_LEN + INFINIBAND_ALEN; - dev->addr_len = INFINIBAND_ALEN; - dev->type = ARPHRD_INFINIBAND; - dev->tx_queue_len = ipoib_sendq_size * 2; - dev->features = NETIF_F_VLAN_CHALLENGED | NETIF_F_LLTX; + dev->hard_header_len = IPOIB_ENCAP_LEN + INFINIBAND_ALEN; + dev->addr_len = INFINIBAND_ALEN; + dev->type = ARPHRD_INFINIBAND; + dev->tx_queue_len = ipoib_sendq_size * 2; + dev->features = NETIF_F_VLAN_CHALLENGED | NETIF_F_LLTX; /* MTU will be reset when mcast join happens */ - dev->mtu = IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN; - priv->mcast_mtu = priv->admin_mtu = dev->mtu; + dev->mtu = IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN; + priv->mcast_mtu = priv->admin_mtu = dev->mtu; memcpy(dev->broadcast, ipv4_bcast_addr, INFINIBAND_ALEN); @@ -1269,6 +1269,8 @@ static int __init ipoib_init_module(void) ipoib_sendq_size = min(ipoib_sendq_size, IPOIB_MAX_QUEUE_SIZE); ipoib_sendq_size = max(ipoib_sendq_size, IPOIB_MIN_QUEUE_SIZE); + ipoib_max_conn_qp = min(ipoib_max_conn_qp, IPOIB_CM_MAX_CONN_QP); + ret = ipoib_register_debugfs(); if (ret) return ret; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index 9bcfc7a..858ada1 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -702,7 +702,7 @@ void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb) out: if (mcast && mcast->ah) { - if (skb->dst && + if (skb->dst && skb->dst->neighbour && !*to_ipoib_neigh(skb->dst->neighbour)) { struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour, @@ -710,7 +710,7 @@ out: if (neigh) { kref_get(&mcast->ah->ref); - neigh->ah = mcast->ah; + neigh->ah = mcast->ah; list_add_tail(&neigh->list, &mcast->neigh_list); } } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 3c6e45d..433e99a 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -172,8 +172,12 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) size = ipoib_sendq_size + ipoib_recvq_size + 1; ret = ipoib_cm_dev_init(dev); - if (!ret) - size += ipoib_recvq_size + 1 /* 1 extra for rx_drain_qp */; + if (!ret) { + if (ipoib_cm_has_srq(dev)) + size += ipoib_recvq_size + 1; /* 1 extra for rx_drain_qp */ + else + size += ipoib_recvq_size * ipoib_max_conn_qp; + } priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size, 0); if (IS_ERR(priv->cq)) { @@ -197,12 +201,12 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) priv->dev->dev_addr[2] = (priv->qp->qp_num >> 8) & 0xff; priv->dev->dev_addr[3] = (priv->qp->qp_num ) & 0xff; - priv->tx_sge.lkey = priv->mr->lkey; + priv->tx_sge.lkey = priv->mr->lkey; - priv->tx_wr.opcode = IB_WR_SEND; - priv->tx_wr.sg_list = &priv->tx_sge; - priv->tx_wr.num_sge = 1; - priv->tx_wr.send_flags = IB_SEND_SIGNALED; + priv->tx_wr.opcode = IB_WR_SEND; + priv->tx_wr.sg_list = &priv->tx_sge; + priv->tx_wr.num_sge = 1; + priv->tx_wr.send_flags = IB_SEND_SIGNALED; return 0; diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index a6f2303..ba1b455 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -561,7 +561,7 @@ void iser_rcv_completion(struct iser_desc *rx_desc, if (opcode == ISCSI_OP_SCSI_CMD_RSP) { itt = get_itt(hdr->itt); /* mask out cid and age bits */ if (!(itt < session->cmds_max)) - 
iser_err("itt can't be matched to task!!!" + iser_err("itt can't be matched to task!!! " "conn %p opcode %d cmds_max %d itt %d\n", conn->iscsi_conn,opcode,session->cmds_max,itt); /* use the mapping given with the cmds array indexed by itt */