GIT ba9cfd16a13a932f0603a7f65b3881738a698ae1 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git#for-mm

commit ead86ba3651371292f7ff597178e3b65e620fd3b
Author: Or Gerlitz
Date:   Thu May 11 10:03:30 2006 +0300

    IB/iser: iSER Kconfig and Makefile

    Kconfig and Makefile for iSER.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

commit 4cc94db517c0d1adb50c86653d9d5db78d5a0153
Author: Or Gerlitz
Date:   Thu May 11 10:03:08 2006 +0300

    IB/iser: iSER handling of memory for RDMA

    This file contains the processing carried out over an SG list
    associated with a SCSI command so that it can be registered with the
    IB verbs. The registration produces a network virtual address (VA)
    and a remote access key (RKEY, or STAG in iSER spec notation) which
    are used by the target for its RDMA operation.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

commit 267deda6c22314dad53527902aaac61c4626facd
Author: Or Gerlitz
Date:   Thu May 11 10:02:46 2006 +0300

    IB/iser: iSER RDMA CM (CMA) and IB verbs interaction

    This file contains the low level interaction with the RDMA CM and the
    IB verbs, where iSER is a consumer of both.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

commit 1171ecd02956da9d69eb7d1f74b6b3e542da5137
Author: Or Gerlitz
Date:   Thu May 11 10:02:19 2006 +0300

    IB/iser: iSER initiator iSCSI PDU and TX/RX

    This file contains the iSER initiator processing of iSCSI PDUs -
    controls, commands and data-outs - along with processing of TX and RX
    completions. It interacts with the lower level iser code doing the
    memory registration and the cma and verbs calls.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

commit 6586f481301c843cf8379cd36192fe20e85c3da0
Author: Or Gerlitz
Date:   Thu May 11 10:00:44 2006 +0300

    IB/iser: iSCSI iSER transport provider high level code

    This file contains the code that registers with the iscsi transport
    manager and with the SCSI Mid Layer, where many of the functions
    provided to iSCSI and SCSI are implemented in libiscsi.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

commit 347b9a7ec8eed8b1f85e239b9dbacb2aa24b7c0e
Author: Or Gerlitz
Date:   Thu May 11 10:00:21 2006 +0300

    IB/iser: iSCSI iSER transport provider header file

    iSER (iSCSI Extensions for RDMA) transport provider driver for the
    iSCSI initiator, whose other parts (under drivers/scsi) are
    scsi_transport_iscsi - the transport management module, iscsi_tcp -
    the TCP transport provider module, and libiscsi - a kernel library
    (module) implementing functionality needed by both the TCP and iSER
    transports. iSER is both a provider of the iSCSI transport api and a
    SCSI low level driver. This file contains internal data structures
    and non-static service functions.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

commit 53c12523ddef8375c6b8539abb0c48b16df88893
Author: Sean Hefty
Date:   Mon Mar 6 15:41:14 2006 -0800

    IB: userspace support for RDMA connection manager

    Kernel component necessary to support the userspace RDMA connection
    management library.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit 4afa075a3f131a3e937b1b9e7afb2a56c6cb80c5
Author: Sean Hefty
Date:   Mon Mar 6 15:40:58 2006 -0800

    IB: IP address based RDMA connection manager

    Kernel connection management agent over InfiniBand that connects
    based on IP addresses. The agent defines a generic RDMA connection
    abstraction to support clients wanting to connect over different
    RDMA devices. The agent also handles RDMA device hotplug events on
    behalf of clients.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier
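[A minimal usage sketch, not part of the patch: rdma_create_id() and
RDMA_PS_TCP below are taken from the CMA API this commit adds; the
handler body and the rdma_destroy_id() teardown call are assumed for
illustration.]

	static int example_handler(struct rdma_cm_id *id,
				   struct rdma_cm_event *event)
	{
		/* react to address/route resolution and connect events */
		return 0;
	}

	static int example_create_id(void)
	{
		struct rdma_cm_id *id;

		id = rdma_create_id(example_handler, NULL, RDMA_PS_TCP);
		if (IS_ERR(id))
			return PTR_ERR(id);
		/* ... bind/resolve/connect through the agent ... */
		rdma_destroy_id(id);
		return 0;
	}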
commit c1e865cfb598e705f40a1604dbb059c6201f7a24
Author: Sean Hefty
Date:   Mon Mar 6 15:31:58 2006 -0800

    IB: address translation to map IP to IB addresses (GIDs)

    Add an address translation service that maps IP addresses to
    InfiniBand GID addresses using IPoIB.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit aa2107247d4357e1cd6392cb8ffadcc1e90a69bb
Author: Sean Hefty
Date:   Mon Mar 6 11:07:44 2006 -0800

    [NET]: Export ip_dev_find()

    Export ip_dev_find() to allow locating a net_device given an IP
    address.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit 683b2e3b52059ddbaea4f203c1b6dfd1fa1cc12b
Author: Sean Hefty
Date:   Mon Mar 6 15:29:35 2006 -0800

    IB/cm: Match connection requests based on private data

    Extend matching connection requests to listens in the InfiniBand CM
    to include private data checks. This allows applications to listen
    on the same service identifier, with private data directing the
    request to the appropriate application.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit 359ff440a7722fcc2aca8d356c81508acd19af38
Author: Sean Hefty
Date:   Mon Mar 6 10:59:07 2006 -0800

    IB: common handling for marshalling parameters to/from userspace

    Provide common handling for marshalling data between userspace
    clients and kernel mode InfiniBand drivers.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit f5fcfd5f1c44f06dfb8bdcc0379b208e43f9bce0
Author: Vu Pham
Date:   Fri May 19 22:03:20 2006 -0700

    IB/srp: Allow sg_tablesize to be adjusted

    Make the sg_tablesize used by SRP adjustable at module load time via
    a module parameter. Calculate the corresponding IU length required
    to support this.

    Signed-off-by: Vu Pham
    Signed-off-by: Roland Dreier
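[The knob follows the usual module parameter pattern; a sketch under
assumed names - the real patch's variable name and default may differ.]

	static int srp_sg_tablesize = 12;	/* assumed default */
	module_param(srp_sg_tablesize, int, 0444);
	MODULE_PARM_DESC(srp_sg_tablesize,
			 "Max number of gather/scatter entries per I/O");

The maximum IU length is then derived from the chosen value at load
time, e.g. "modprobe ib_srp srp_sg_tablesize=32".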
commit 3352de1a2adde735f1732019193f2087ebf5cbff
Author: Vu Pham
Date:   Fri May 19 22:03:20 2006 -0700

    IB/srp: Allow cmd_per_lun to be set per target port

    Allow userspace to throttle traffic on a given connection to a
    target port by adding "max_cmd_per_lun=xyz" to lower the cmd_per_lun
    value set for that scsi_host.

    Signed-off-by: Vu Pham
    Signed-off-by: Roland Dreier

commit 51068c75996f71bf39564728c670dcd8a0d26006
Author: Ishai Rabinovitz
Date:   Fri May 19 22:03:20 2006 -0700

    IB/srp: Clean up loop in srp_remove_one()

    Interrupts will always be enabled in srp_remove_one(), so
    spin_lock_irq() can be used instead of spin_lock_irqsave(). Also,
    the loop takes target->scsi_host->host_lock, so target->state can
    just be set to SRP_TARGET_REMOVED without testing the old value.

    Signed-off-by: Ishai Rabinovitz
    Signed-off-by: Roland Dreier

commit 0355015ae1dd198e43562ea0ae9c91d43733b8b4
Author: Roland Dreier
Date:   Fri May 19 22:03:19 2006 -0700

    IB: Make needlessly global ib_mad_cache static

    Signed-off-by: Roland Dreier

commit 41e1914dc475b9c098e8f37574540a325892b2ae
Author: Matthew Wilcox
Date:   Fri May 19 22:03:19 2006 -0700

    IB/srp: Change target_mutex to a spinlock

    The SRP driver never sleeps while holding target_mutex, and it's
    just used to protect some simple list operations, so hold times will
    be short. So just convert it to a spinlock, which is smaller and
    faster.

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Roland Dreier

commit 067b99616f777e2563c7682b7bd9396ed9bc969d
Author: Matthew Wilcox
Date:   Fri May 19 22:03:19 2006 -0700

    IB/srp: Get rid of unneeded use of list_for_each_entry_safe()

    list_for_each_entry_safe() is used in one place where the list isn't
    modified. So just change it to list_for_each_entry().

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Roland Dreier

commit 8fc1311bd115e3e2485b03a348681228a4ce6a18
Author: Matthew Wilcox
Date:   Fri May 19 22:03:18 2006 -0700

    IB/srp: Use SCAN_WILD_CARD from SCSI headers

    SCAN_WILD_CARD is indeed available from , which is already included.
    So get rid of the private hack.

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Roland Dreier

commit 48f705a6b127f7b96dbadb1644691d9fd97ebfe4
Author: Roland Dreier
Date:   Fri May 19 22:03:18 2006 -0700

    IB/mthca: Convert FW commands to use wait_for_completion_timeout()

    The kernel has had wait_for_completion_timeout() for a long time
    now. mthca should use it to handle FW commands timing out, instead
    of implementing the same thing in a much more complicated way by
    using wait_for_completion() along with a timer that does complete().

    Signed-off-by: Roland Dreier
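[The conversion pattern, sketched - not mthca's actual command path:
one bounded wait replaces wait_for_completion() plus a hand-rolled
timer whose handler called complete().]

	static int fw_cmd_wait(struct completion *done, unsigned long tmo_ms)
	{
		/* wait_for_completion_timeout() returns 0 on timeout */
		if (!wait_for_completion_timeout(done,
						 msecs_to_jiffies(tmo_ms)))
			return -EBUSY;	/* FW command timed out */
		return 0;
	}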
commit 62ab66f220352c0c6888026e8f81b289d82e8a70
Author: Roland Dreier
Date:   Fri May 19 22:03:18 2006 -0700

    IB/srp: Use FMRs to map gather/scatter lists

    Create an SRP FMR pool on HCAs that support FMRs, and use FMRs to
    map gather/scatter lists that have more than one entry into a single
    memory region that appears virtually contiguous to the SRP target
    (which is the RDMA initiator).

    This patch bails out on FMR mapping for SCSI commands where the
    gather/scatter list cannot be mapped into a single FMR because there
    are sub-page-sized entries in the middle of the list. An unaligned
    start or end of the list is OK.

    Based on a patch by Vu Pham.

    Signed-off-by: Roland Dreier

commit ed11b95bd29d76d9d82b17143860887321c93d3f
Author: Michael S. Tsirkin
Date:   Fri May 19 22:03:18 2006 -0700

    IB/mthca: Remove dead code

    Kill some dead code in mthca_eq.c

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Roland Dreier

commit d2c007e4149d1a481eb7aeee6b3c67bc369eb4ec
Author: Sean Hefty
Date:   Fri May 19 22:03:18 2006 -0700

    IB: IP address based RDMA connection manager

    Kernel connection management agent over InfiniBand that connects
    based on IP addresses. The agent defines a generic RDMA connection
    abstraction to support clients wanting to connect over different
    RDMA devices. The agent also handles RDMA device hotplug events on
    behalf of clients.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit d2eeb596c40ffe41b087f248fc4f625c5b6e96cb
Author: Sean Hefty
Date:   Fri May 19 22:03:17 2006 -0700

    IB: address translation to map IP to IB addresses (GIDs)

    Add an address translation service that maps IP addresses to
    InfiniBand GID addresses using IPoIB.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit 52e4cdeb6d9ff7a0b224302a8315fbd67d3ac6d0
Author: Sean Hefty
Date:   Fri May 19 22:03:17 2006 -0700

    [NET]: Export ip_dev_find()

    Export ip_dev_find() to allow locating a net_device given an IP
    address.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit 6be9db03da638480cf048c9c96b9240d50503111
Author: Sean Hefty
Date:   Fri May 19 22:03:17 2006 -0700

    IB/cm: Match connection requests based on private data

    Extend matching connection requests to listens in the InfiniBand CM
    to include private data checks. This allows applications to listen
    on the same service identifier, with private data directing the
    request to the appropriate application.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit 2e554c1c824d3e46968c98096611166e2b56b3e4
Author: Sean Hefty
Date:   Fri May 19 22:03:16 2006 -0700

    IB: common handling for marshalling parameters to/from userspace

    Provide common handling for marshalling data between userspace
    clients and kernel InfiniBand drivers.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit 2fc056d457be9f5a2af3b2c9223d048e63bb2410
Author: Vu Pham
Date:   Thu May 18 12:37:54 2006 -0700

    IB/srp: Allow sg_tablesize to be adjusted

    Make the sg_tablesize used by SRP adjustable at module load time via
    a module parameter. Calculate the corresponding IU length required
    to support this.

    Signed-off-by: Vu Pham
    Signed-off-by: Roland Dreier

commit 8c7ae9e602d18b0dd6a1d5dae7c4a73ff7e6a450
Author: Vu Pham
Date:   Thu May 18 12:37:54 2006 -0700

    IB/srp: Allow cmd_per_lun to be set per target port

    Allow userspace to throttle traffic on a given connection to a
    target port by adding "max_cmd_per_lun=xyz" to lower the cmd_per_lun
    value set for that scsi_host.

    Signed-off-by: Vu Pham
    Signed-off-by: Roland Dreier

commit 8acdc9b90c5c55dfe5941ef3cf3757ebc8f68455
Author: Ishai Rabinovitz
Date:   Thu May 18 12:37:53 2006 -0700

    IB/srp: Clean up loop in srp_remove_one()

    Interrupts will always be enabled in srp_remove_one(), so
    spin_lock_irq() can be used instead of spin_lock_irqsave(). Also,
    the loop takes target->scsi_host->host_lock, so target->state can
    just be set to SRP_TARGET_REMOVED without testing the old value.

    Signed-off-by: Ishai Rabinovitz
    Signed-off-by: Roland Dreier

commit e86004eb97859b6e5d66e4e9d44282d26bd06a57
Author: Roland Dreier
Date:   Thu May 18 12:37:53 2006 -0700

    IB: Make needlessly global ib_mad_cache static

    Signed-off-by: Roland Dreier

commit 53cc4845e6003180ff7f2e4036beef7d5dddbddf
Author: Matthew Wilcox
Date:   Thu May 18 12:37:53 2006 -0700

    IB/srp: Change target_mutex to a spinlock

    The SRP driver never sleeps while holding target_mutex, and it's
    just used to protect some simple list operations, so hold times will
    be short. So just convert it to a spinlock, which is smaller and
    faster.

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Roland Dreier

commit a73b861548549f5853cb9657a983fc4477ac590c
Author: Matthew Wilcox
Date:   Thu May 18 12:37:52 2006 -0700

    IB/srp: Get rid of unneeded use of list_for_each_entry_safe()

    list_for_each_entry_safe() is used in one place where the list isn't
    modified. So just change it to list_for_each_entry().

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Roland Dreier

commit 65a97cbaf7882d224b9ec3b73d05c029b4a94593
Author: Matthew Wilcox
Date:   Thu May 18 12:37:52 2006 -0700

    IB/srp: Use SCAN_WILD_CARD from SCSI headers

    SCAN_WILD_CARD is indeed available from , which is already included.
    So get rid of the private hack.

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Roland Dreier

commit 8011de0467db3bcb52d1ed4bfd34802f8301dfb0
Author: Roland Dreier
Date:   Thu May 18 12:37:51 2006 -0700

    IB/mthca: Convert FW commands to use wait_for_completion_timeout()

    The kernel has had wait_for_completion_timeout() for a long time
    now. mthca should use it to handle FW commands timing out, instead
    of implementing the same thing in a much more complicated way by
    using wait_for_completion() along with a timer that does complete().

    Signed-off-by: Roland Dreier

commit 08c6b0be8a0a24a0ad0d0dbe630c9c02740b6025
Author: Roland Dreier
Date:   Thu May 18 12:37:51 2006 -0700

    IB/srp: Use FMRs to map gather/scatter lists

    Create an SRP FMR pool on HCAs that support FMRs, and use FMRs to
    map gather/scatter lists that have more than one entry into a single
    memory region that appears virtually contiguous to the SRP target
    (which is the RDMA initiator).

    This patch bails out on FMR mapping for SCSI commands where the
    gather/scatter list cannot be mapped into a single FMR because there
    are sub-page-sized entries in the middle of the list. An unaligned
    start or end of the list is OK.

    Based on a patch by Vu Pham.

    Signed-off-by: Roland Dreier
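[The alignment rule in that last paragraph, as a hypothetical helper -
not the SRP code, and assuming the 2.6-era struct scatterlist with
offset/length fields: only the first entry may start, and only the
last entry may end, off a page boundary.]

	static int sg_fits_one_fmr(struct scatterlist *sg, int nents,
				   unsigned int page_size)
	{
		int i;

		for (i = 0; i < nents; ++i) {
			/* unaligned start is OK only for the first entry */
			if (i > 0 && sg[i].offset)
				return 0;
			/* a sub-page-sized middle entry breaks the mapping */
			if (i < nents - 1 &&
			    (sg[i].offset + sg[i].length) % page_size)
				return 0;
		}
		return 1;
	}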
commit ea79bea9307dfd0a735f500c8f3813878df86657
Author: Michael S. Tsirkin
Date:   Thu May 18 12:37:51 2006 -0700

    IB/mthca: Remove dead code

    Kill some dead code in mthca_eq.c

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Roland Dreier

commit 7583f60454ca67e5f9cc23814be9b0764c4b030f
Author: Sean Hefty
Date:   Thu May 18 12:37:49 2006 -0700

    IB: IP address based RDMA connection manager

    Kernel connection management agent over InfiniBand that connects
    based on IP addresses. The agent defines a generic RDMA connection
    abstraction to support clients wanting to connect over different
    RDMA devices. The agent also handles RDMA device hotplug events on
    behalf of clients.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit f4d2e42b6cead55a2d6b8a70d05f6383bd8ff3c7
Author: Sean Hefty
Date:   Fri May 12 23:29:58 2006 -0700

    IB: address translation to map IP to IB addresses (GIDs)

    Add an address translation service that maps IP addresses to
    InfiniBand GID addresses using IPoIB.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit 7108f2396009b1871347ba3e9c0e2a0d5ac23cf1
Author: Sean Hefty
Date:   Fri May 12 23:29:57 2006 -0700

    [NET]: Export ip_dev_find()

    Export ip_dev_find() to allow locating a net_device given an IP
    address.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit 0c97e5f885ccc9a584d6bdcf04f029d100663ebe
Author: Sean Hefty
Date:   Fri May 12 23:29:57 2006 -0700

    IB/cm: Match connection requests based on private data

    Extend matching connection requests to listens in the InfiniBand CM
    to include private data checks. This allows applications to listen
    on the same service identifier, with private data directing the
    request to the appropriate application.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit de97bbca06c722b95cedc72cce3bcadffecfe0bf
Author: Sean Hefty
Date:   Fri May 12 23:29:56 2006 -0700

    IB: common handling for marshalling parameters to/from userspace

    Provide common handling for marshalling data between userspace
    clients and kernel InfiniBand drivers.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

commit 2ca48a132167f9f12efba179382979aafde0ab36
Author: James Bottomley
Date:   Thu Apr 27 14:07:49 2006 -0500

    [SCSI] fix proc_scsi_write to return "length" on success with
    remove-single-device case

    Problem spotted by: Suzuki K P

    A zero return on success isn't correct for filesystem write
    functions. They should either return a negative error or the length
    of bytes consumed. Add code to convert our zero-on-success return
    into the length of bytes passed in. This fixes the following:

        $ echo "scsi remove-single-device 0 0 3 0" > /proc/scsi/scsi
        bash: echo: write error: No such device or address

    Signed-off-by: James Bottomley
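[The convention being fixed, sketched for a generic proc write handler
in the 2.6-era write_proc style; do_command() and the handler name are
hypothetical, not the actual proc_scsi_write: returning 0 tells the
caller nothing was consumed, so success must report the full length.]

	static int example_proc_write(struct file *file,
				      const char __user *buf,
				      unsigned long length, void *data)
	{
		int err = do_command(buf, length);	/* hypothetical */

		/* convert "0 == success" into "bytes consumed" */
		return err ? err : length;
	}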
commit 665b44aee34e9f2c64558df4ec01d40576e45651
Author: Mike Christie
Date:   Tue May 2 19:46:49 2006 -0500

    [SCSI] iscsi: dequeue all buffers from queue

    debugged by wrwhitehead@novell.com
    patch and analysis by fujita.tomonori@lab.ntt.co.jp

    Only tcp_read_sock and recv_actor (iscsi_tcp_data_recv for us) see
    desc.count. It is used just for permitting tcp_read_sock to read the
    portion of data in the socket. When iscsi_tcp_data_recv sees a
    partial header, it sets desc.count. However, it is possible that the
    next skb (containing the rest of the header) still does not come. So
    I'm not sure that this scheme is completely correct. Ideally, we
    should use the exact length of the data in the socket for
    desc.count. However, it is not so simple (see SIOCINQ in tcp_ioctl).
    So I think that iscsi_tcp_data_recv can just stop playing with
    desc.count and tell tcp_read_sock to read all the skbs. As proposed
    already, if iscsi_tcp_data_ready sets desc.count to non-zero,
    tcp_read_sock does that.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley
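[A sketch of the receive-side idea - close to, but not verbatim, the
iscsi_tcp code: a non-zero desc.count makes tcp_read_sock() keep
feeding skbs to the actor until the socket is drained.]

	static void example_data_ready(struct sock *sk, int bytes)
	{
		read_descriptor_t rd_desc;

		rd_desc.arg.data = sk->sk_user_data;	/* the iscsi conn */
		rd_desc.count = 1;			/* read everything */
		tcp_read_sock(sk, &rd_desc, iscsi_tcp_data_recv);
	}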
commit 8d2860b3c3e933304f49171770658c00ed26fd79
Author: Mike Christie
Date:   Tue May 2 19:46:47 2006 -0500

    [SCSI] iscsi: increment expstatsn during login

    debugged by Ming and Rohan:

    The problem Ming and Rohan debugged was that during a normal session
    login, open-iscsi is not incrementing the exp_statsn counter. It was
    stuck at zero. From the RFC, it looks like if the login response PDU
    has a successful status then we should be incrementing that value.
    Also from the RFC, it looks like when we drop a connection then
    reconnect, we should be using the exp_statsn from the old connection
    in the next relogin attempt.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

commit be2df72e7ec5fa5e6e1ccccab6cef97ecbb9c191
Author: Or Gerlitz
Date:   Tue May 2 19:46:43 2006 -0500

    [SCSI] iscsi: align printks

    align printk output

    Signed-off-by: Or Gerlitz
    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

commit ed2abc7ff19dc99c6242a70f8578a17b2ff0d0ce
Author: Mike Christie
Date:   Tue May 2 19:46:40 2006 -0500

    [SCSI] iscsi: fix management task oops

    from patmans@us.ibm.com and michaelc@cs.wisc.edu

    Fix bugs when forcing a mgmt task to fail, and allow session
    recovery to clean up the session/connection of any running mgmt
    tasks when called during the login state.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

commit 264faaaa12544e7914928ad57ccba21907cad56b
Author: Or Gerlitz
Date:   Tue May 2 19:46:36 2006 -0500

    [SCSI] iscsi: add transport end point callbacks

    Add transport end point callbacks so iscsi drivers that cannot
    connect from userspace using sockets, like iscsi_tcp does, do not
    have to implement their own socket infrastructure.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

commit 169e1a2a8a789fa84254695ec6a56fc410bb19a9
Author: Andrew Morton
Date:   Tue Apr 18 21:09:08 2006 -0700

    [SCSI] scsi_lib.c: fix warning in scsi_kmap_atomic_sg

    drivers/scsi/scsi_lib.c: In function `scsi_kmap_atomic_sg':
    drivers/scsi/scsi_lib.c:2394: warning: unsigned int format, different type arg (arg 3)
    drivers/scsi/scsi_lib.c:2394: warning: unsigned int format, different type arg (arg 4)

    Signed-off-by: Andrew Morton
    Signed-off-by: James Bottomley

commit c5f2e6404c65e8380c9ba80a7d58a27d2642743b
Author: akpm@osdl.org
Date:   Sat Apr 15 00:30:24 2006 -0700

    [SCSI] scsi_scan.c: fix compile warnings

    drivers/scsi/scsi_scan.c: In function `scsi_probe_and_add_lun':
    drivers/scsi/scsi_scan.c:926: warning: unused variable `vend'
    drivers/scsi/scsi_scan.c:926: warning: unused variable `mod'
    drivers/scsi/scsi_scan.c: At top level:
    drivers/scsi/scsi_scan.c:829: warning: `scsi_inq_str' defined but not used

    Fix those, tighten up the (somewhat poorly-designed) logging macro
    and fix some coding-style warts.

    Signed-off-by: Andrew Morton
    Signed-off-by: James Bottomley

commit cdb8c2a6d848deb9eeefffff42974478fbb51b8c
Author: Guennadi Liakhovetski
Date:   Sun Apr 2 21:57:43 2006 +0200

    [SCSI] dc395x: dynamically map scatter-gather for PIO

    The current dc395x driver uses PIO to transfer up to 4 bytes which
    do not get transferred by DMA (under unclear circumstances). For
    this the driver uses page_address(), which is broken on highmem.
    Apart from this, the actual calculation of the virtual address is
    wrong (even without highmem). So, e.g., for reading it reads bytes
    from the driver to a wrong address and returns wrong data; for
    writing, I guess, it would just output random data to the device.

    The proper fix, as suggested by many, is to dynamically map data
    using kmap_atomic(page, KM_BIO_SRC_IRQ) / kunmap_atomic(virt). The
    reason why it has not been done until now, although I've done some
    preliminary patches more than a year ago, was that nobody interested
    in fixing this problem was able to reliably reproduce it. Now it has
    changed - with the help from Sebastian Frei (CC'ed) I was able to
    trigger the PIO path. Thus, I was also able to test and debug it.

    There are 4 cases when PIO is used in dc395x - data-in / -out with
    and without scatter-gather. I was able to reproduce and test only
    data-in with and without SG. So, the data-out path is still
    untested, but it is also somewhat simpler than the data-in. Fredrik
    Roubert (also CC'ed) also had PIO triggering on his system, and in
    his case it was data-out without SG. It would be great if he could
    test the attached patch on his system, but even if he cannot, I
    would still request to apply the patch and just wait if anybody
    cries...

    Implementation: I put 2 new functions in scsi_lib.c and their
    declarations in scsi_cmnd.h. I exported them without _GPL, although
    I don't feel strongly about that - not many drivers are likely to
    use them. But there is at least one more - I want to use them in
    tmscsim.c. Whether these are the right files for the functions and
    their declarations, I am not sure either. Actually, they are not
    scsi-specific, so they might go somewhere around other
    scattergather magic? They are not platform specific either, and
    most SG functions are defined under arch/*/... As these issues were
    discussed previously, there were some more routines suggested to
    manipulate scattergather buffers; I think some of them were needed
    around crypto code... So might a common place be reasonable, like
    lib/scattergather.c? I am open here.

    Signed-off-by: James Bottomley

commit 4c021dd136c0ad524e6d117296beafad2bf570c0
Author: FUJITA Tomonori
Date:   Fri Apr 7 19:10:03 2006 +0900

    [SCSI] ibmvscsi: convert kmalloc + memset to kcalloc

    Convert kmalloc + memset to kcalloc in ibmvscsi

    Signed-off-by: FUJITA Tomonori
    Acked-by: Dave Boutcher
    Signed-off-by: James Bottomley
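[The conversion is mechanical; in sketch form, with pool and n as
placeholders, and with the bonus that kcalloc() also checks the
n * size multiplication for overflow.]

	/* before */
	pool = kmalloc(n * sizeof(*pool), GFP_KERNEL);
	if (pool)
		memset(pool, 0, n * sizeof(*pool));

	/* after */
	pool = kcalloc(n, sizeof(*pool), GFP_KERNEL);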
commit 5bb0b55a3283369f1cd8ac76a6d8bda8e7a77055
Author: Mike Christie
Date:   Thu Apr 6 21:26:46 2006 -0500

    [SCSI] iscsi: convert iscsi tcp to libiscsi

    This just converts iscsi_tcp to the lib.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

commit 7996a778ff8c717cb1a7a294475c59cc8f1e9fb8
Author: Mike Christie
Date:   Thu Apr 6 21:13:41 2006 -0500

    [SCSI] iscsi: add libiscsi

    There is a lot of code duplicated between iscsi_tcp and the upcoming
    iscsi_iser driver. This patch puts the duplicated code in a lib.
    There is more code to move around, but this takes care of the
    basics.

    For iscsi_offload, if they use the lib we will probably move some
    things around. For example in the queuecommand we will not assume
    that the LLD wants to do queue_work, but it is better to handle that
    later when we know for sure what iscsi_offload looks like (we could
    probably do this for iscsi_iser too, though).

    Ideally I would like to get the iscsi_transport modules to a place
    where all they really have to do is put data on the wire, but how to
    do that will hopefully be more clear when we see other modules like
    iscsi_offload. Or maybe iscsi_offload will not use the lib, and it
    will just be iscsi_iser and iscsi_tcp, and maybe the iscsi_tcp_tgt
    if that is allowed in mainline.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

commit 30a6c65236f9d26e3325cae468f330b833a3878c
Author: Mike Christie
Date:   Thu Apr 6 21:13:39 2006 -0500

    [SCSI] iscsi: fix up iscsi eh

    The current iscsi_tcp eh is not nicely set up for dm-multipath and
    performs some extra task management functions when they are not
    needed. The attached patch:

    - Fixes the TMF issues. If a session is rebuilt then we do not send
      aborts.

    - Fixes the problem where if the host reset fired, we would return
      SUCCESS even though we had not really done anything yet. This ends
      up causing problems with scsi_error.c's TUR.

    - If someone has turned on the userspace nop daemon code to try and
      detect network problems before the scsi command timeout, we can
      now drop and clean up the session before the scsi command times
      out and fires the eh, speeding up the time it takes for a command
      to go from one path to another. For network problems we fail the
      command with DID_BUS_BUSY, so if failfast is set
      scsi_decide_disposition fails the command up to dm for it to try
      on another path.

    - And we had to add some basic iscsi session block code. Previously
      if we were trying to repair a session we would return a MLQUEUE
      code in the queuecommand. This worked, but it was not the most
      efficient or pretty thing to do since it would take a while to
      relogin to the target. Since for iscsi_tcp/open-iscsi a lot of the
      iscsi error handler is in userspace, the block code is pretty
      bare. We will be adding to that for qla4xxx.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

commit fd7255f51a13ea915099c7e488001dfbbeb05104
Author: Mike Christie
Date:   Thu Apr 6 21:13:36 2006 -0500

    [SCSI] iscsi: add sysfs attrs for uspace sync up

    For iscsi boot, when going from initramfs to the real root we need
    to stop the userspace iscsi daemon. To later restart it, iscsid
    needs to be able to rebuild itself, and part of that process is
    matching a session running in the kernel with the iscsid
    representation. To do this the attached patch adds several required
    iscsi values. If the LLD does not provide them, because login is
    done in userspace, then the transport class and userspace set this
    up for the LLD.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

commit b5c7a12dc29ae0990d9e867749bdd717a3160325
Author: Mike Christie
Date:   Thu Apr 6 21:13:33 2006 -0500

    [SCSI] iscsi: rm kernel iscsi handles usage for session and
    connection

    from hare@suse.de and michaelc@cs.wisc.edu

    hw iscsi like qla4xxx does not allocate a host per session, and for
    userspace it is difficult to restart iscsid using the "iscsi
    handles" for the session and connection, so this patch just has the
    class or userspace allocate the id for the session and connection.

    Note: this breaks userspace and requires users to upgrade to the
    newest open-iscsi tools. Sorry about this, but open-iscsi is still
    too new to say we have a stable user-kernel api, and we were not
    good enough designers to know that other hw iscsi drivers and iscsid
    itself would need such changes. Actually we sorta did, but at the
    time we did not have the HW available to us so we could only guess.
    Luckily, the only tools hooking into the class are the open-iscsi
    ones; other tools like iscsistart hook into the open-iscsi engine
    from userspace, or programs like anaconda call our tools, so they
    are not affected.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

commit 13f7e5acc8b329080672c13f05f252ace5b79825
Author: Kurt Garloff
Date:   Mon Apr 3 15:20:08 2006 +0200

    [SCSI] BLIST_ATTACH_PQ3 flags

    Some devices report a peripheral qualifier of 3 for LUN 0; with the
    original code, we would still try a REPORT_LUNS scan (if the SCSI
    level is >= 3 or if we have BLIST_REPORTLUNS2 passed in), but NOT
    any sequential scan. Also, the device at LUN 0 (which is not
    connected according to the PQ) is not registered with the OS.

    Unfortunately, SANs exist that are SCSI-2 and do NOT support
    REPORT_LUNS, but report an unknown device with PQ 3 on LUN 0. We
    still need to scan them, and most probably we even need
    BLIST_SPARSELUN (and BLIST_LARGELUN). See the bug reference for an
    infamous example.

    This is patch 3/3:

    3. Implement the blacklist flag BLIST_ATTACH_PQ3 that makes the scsi
       scanning code register PQ3 devices and continue scanning; only sg
       will attach thanks to scsi_bus_match().

    Signed-off-by: Kurt Garloff
    Signed-off-by: James Bottomley

commit 6c7154c97e20c0ea28547240dc86731c0cee1b2f
Author: Kurt Garloff
Date:   Mon Apr 3 15:18:35 2006 +0200

    [SCSI] Better log messages for PQ3 devs

    Some devices report a peripheral qualifier of 3 for LUN 0; with the
    original code, we would still try a REPORT_LUNS scan (if the SCSI
    level is >= 3 or if we have BLIST_REPORTLUNS2 passed in), but NOT
    any sequential scan. Also, the device at LUN 0 (which is not
    connected according to the PQ) is not registered with the OS.

    Unfortunately, SANs exist that are SCSI-2 and do NOT support
    REPORT_LUNS, but report an unknown device with PQ 3 on LUN 0. We
    still need to scan them, and most probably we even need
    BLIST_SPARSELUN (and BLIST_LARGELUN). See the bug reference for an
    infamous example.

    This is patch 2/3:

    2. If a PQ3 device is found, log a message that describes the device
       (INQUIRY DATA and C:B:T:U tuple) and make a suggestion for
       blacklisting it.

    Signed-off-by: Kurt Garloff
    Signed-off-by: James Bottomley

commit 4186ab1973758190916703eb8889ebe8002c5c8f
Author: Kurt Garloff
Date:   Mon Apr 3 15:16:48 2006 +0200

    [SCSI] Try LUN 1 and use bflags

    Some devices report a peripheral qualifier of 3 for LUN 0; with the
    original code, we would still try a REPORT_LUNS scan (if the SCSI
    level is >= 3 or if we have BLIST_REPORTLUNS2 passed in), but NOT
    any sequential scan. Also, the device at LUN 0 (which is not
    connected according to the PQ) is not registered with the OS.

    Unfortunately, SANs exist that are SCSI-2 and do NOT support
    REPORT_LUNS, but report an unknown device with PQ 3 on LUN 0. We
    still need to scan them, and most probably we even need
    BLIST_SPARSELUN (and BLIST_LARGELUN). See the bug reference for an
    infamous example.

    This is patch 1/3:

    1. If we end up in sequential scan, at least try LUN 1 for devices
       that reported a PQ of 3 for LUN 0. Also return blacklist flags,
       even for PQ3 devices.

    Signed-off-by: Kurt Garloff
    Signed-off-by: James Bottomley
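[For reference while reading the three PQ3 patches above: the
peripheral qualifier is carried in the top three bits of INQUIRY
byte 0; inq_result below stands for the raw INQUIRY response buffer.]

	u8 pq   = inq_result[0] >> 5;	/* 3: no device connectable here */
	u8 type = inq_result[0] & 0x1f;	/* 0x1f: unknown device type */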
---

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index afc612b..69a53d4 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -29,6 +29,11 @@ config INFINIBAND_USER_ACCESS
 	  libibverbs, libibcm and a hardware driver library from .
 
+config INFINIBAND_ADDR_TRANS
+	bool
+	depends on INFINIBAND && INET
+	default y
+
 source "drivers/infiniband/hw/mthca/Kconfig"
 source "drivers/infiniband/hw/ipath/Kconfig"
@@ -36,4 +41,6 @@ source "drivers/infiniband/ulp/ipoib/Kco
 source "drivers/infiniband/ulp/srp/Kconfig"
 
+source "drivers/infiniband/ulp/iser/Kconfig"
+
 endmenu
diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
index eea2732..c7ff58c 100644
--- a/drivers/infiniband/Makefile
+++ b/drivers/infiniband/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_INFINIBAND_MTHCA) += hw/mt
 obj-$(CONFIG_IPATH_CORE) += hw/ipath/
 obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/
 obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/
+obj-$(CONFIG_INFINIBAND_ISER) += ulp/iser/
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index ec3353f..21be457 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -1,7 +1,10 @@
+infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS) := ib_addr.o rdma_cm.o
+user_access-$(CONFIG_INFINIBAND_ADDR_TRANS) := rdma_ucm.o
+
 obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_sa.o \
-				ib_cm.o
+				ib_cm.o $(infiniband-y)
 obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o
-obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o
+obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o $(user_access-y)
 
 ib_core-y := packer.o ud_header.o verbs.o sysfs.o \
 		device.o fmr_pool.o cache.o
@@ -12,8 +15,15 @@ ib_sa-y := sa_query.o
 ib_cm-y := cm.o
 
+rdma_cm-y := cma.o
+
+rdma_ucm-y := ucma.o
+
+ib_addr-y := addr.o
+
 ib_umad-y := user_mad.o
 
 ib_ucm-y := ucm.o
 
-ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_mem.o
+ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_mem.o \
+		uverbs_marshall.o
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
new file mode 100644
index 0000000..d294bbc
--- /dev/null
+++ b/drivers/infiniband/core/addr.c
@@ -0,0 +1,367 @@
+/*
+ * Copyright (c) 2005 Voltaire Inc. All rights reserved.
+ * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved.
+ * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved.
+ * Copyright (c) 2005 Intel Corporation. All rights reserved.
+ *
+ * This Software is licensed under one of the following licenses:
+ *
+ * 1) under the terms of the "Common Public License 1.0" a copy of which is
+ *    available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/cpl.php.
+ *
+ * 2) under the terms of the "The BSD License" a copy of which is
+ *    available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/bsd-license.php.
+ *
+ * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
+ *    copy of which is available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/gpl-license.php.
+ *
+ * Licensee has the right to choose one of the above licenses.
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice and one of the license notices.
+ *
+ * Redistributions in binary form must reproduce both the above copyright
+ * notice, one of the license notices in the documentation
+ * and/or other materials provided with the distribution.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+MODULE_AUTHOR("Sean Hefty");
+MODULE_DESCRIPTION("IB Address Translation");
+MODULE_LICENSE("Dual BSD/GPL");
+
+struct addr_req {
+	struct list_head list;
+	struct sockaddr src_addr;
+	struct sockaddr dst_addr;
+	struct rdma_dev_addr *addr;
+	void *context;
+	void (*callback)(int status, struct sockaddr *src_addr,
+			 struct rdma_dev_addr *addr, void *context);
+	unsigned long timeout;
+	int status;
+};
+
+static void process_req(void *data);
+
+static DEFINE_MUTEX(lock);
+static LIST_HEAD(req_list);
+static DECLARE_WORK(work, process_req, NULL);
+static struct workqueue_struct *addr_wq;
+
+static int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
+		     unsigned char *dst_dev_addr)
+{
+	switch (dev->type) {
+	case ARPHRD_INFINIBAND:
+		dev_addr->dev_type = IB_NODE_CA;
+		break;
+	default:
+		return -EADDRNOTAVAIL;
+	}
+
+	memcpy(dev_addr->src_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
+	memcpy(dev_addr->broadcast, dev->broadcast, MAX_ADDR_LEN);
+	if (dst_dev_addr)
+		memcpy(dev_addr->dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN);
+	return 0;
+}
+
+int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
+{
+	struct net_device *dev;
+	u32 ip = ((struct sockaddr_in *) addr)->sin_addr.s_addr;
+	int ret;
+
+	dev = ip_dev_find(ip);
+	if (!dev)
+		return -EADDRNOTAVAIL;
+
+	ret = copy_addr(dev_addr, dev, NULL);
+	dev_put(dev);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_translate_ip);
+
+static void set_timeout(unsigned long time)
+{
+	unsigned long delay;
+
+	cancel_delayed_work(&work);
+
+	delay = time - jiffies;
+	if ((long)delay <= 0)
+		delay = 1;
+
+	queue_delayed_work(addr_wq, &work, delay);
+}
+
+static void queue_req(struct addr_req *req)
+{
+	struct addr_req *temp_req;
+
+	mutex_lock(&lock);
+	list_for_each_entry_reverse(temp_req, &req_list, list) {
+		if (time_after(req->timeout, temp_req->timeout))
+			break;
+	}
+
+	list_add(&req->list, &temp_req->list);
+
+	if (req_list.next == &req->list)
+		set_timeout(req->timeout);
+	mutex_unlock(&lock);
+}
+
+static void addr_send_arp(struct sockaddr_in *dst_in)
+{
+	struct rtable *rt;
+	struct flowi fl;
+	u32 dst_ip = dst_in->sin_addr.s_addr;
+
+	memset(&fl, 0, sizeof fl);
+	fl.nl_u.ip4_u.daddr = dst_ip;
+	if (ip_route_output_key(&rt, &fl))
+		return;
+
+	arp_send(ARPOP_REQUEST, ETH_P_ARP, rt->rt_gateway, rt->idev->dev,
+		 rt->rt_src, NULL, rt->idev->dev->dev_addr, NULL);
+	ip_rt_put(rt);
+}
+
+static int addr_resolve_remote(struct sockaddr_in *src_in,
+			       struct sockaddr_in *dst_in,
+			       struct rdma_dev_addr *addr)
+{
+	u32 src_ip = src_in->sin_addr.s_addr;
+	u32 dst_ip = dst_in->sin_addr.s_addr;
+	struct flowi fl;
+	struct rtable *rt;
+	struct neighbour *neigh;
+	int ret;
+
+	memset(&fl, 0, sizeof fl);
+	fl.nl_u.ip4_u.daddr = dst_ip;
+	fl.nl_u.ip4_u.saddr = src_ip;
+	ret = ip_route_output_key(&rt, &fl);
+	if (ret)
+		goto out;
+
+	/* If the device does ARP internally, return 'done' */
+	if (rt->idev->dev->flags & IFF_NOARP) {
+		copy_addr(addr, rt->idev->dev, NULL);
+		goto put;
+	}
+
+	neigh = neigh_lookup(&arp_tbl, &rt->rt_gateway, rt->idev->dev);
+	if (!neigh) {
+		ret = -ENODATA;
+		goto put;
+	}
+
+	if (!(neigh->nud_state & NUD_VALID)) {
+		ret = -ENODATA;
+		goto release;
+	}
+
+	if (!src_ip) {
+		src_in->sin_family = dst_in->sin_family;
+		src_in->sin_addr.s_addr = rt->rt_src;
+	}
+
+	ret = copy_addr(addr, neigh->dev, neigh->ha);
+release:
+	neigh_release(neigh);
+put:
+	ip_rt_put(rt);
+out:
+	return ret;
+}
+
+static void process_req(void *data)
+{
+	struct addr_req *req, *temp_req;
+	struct sockaddr_in *src_in, *dst_in;
+	struct list_head done_list;
+
+	INIT_LIST_HEAD(&done_list);
+
+	mutex_lock(&lock);
+	list_for_each_entry_safe(req, temp_req, &req_list, list) {
+		if (req->status) {
+			src_in = (struct sockaddr_in *) &req->src_addr;
+			dst_in = (struct sockaddr_in *) &req->dst_addr;
+			req->status = addr_resolve_remote(src_in, dst_in,
+							  req->addr);
+		}
+		if (req->status && time_after(jiffies, req->timeout))
+			req->status = -ETIMEDOUT;
+		else if (req->status == -ENODATA)
+			continue;
+
+		list_del(&req->list);
+		list_add_tail(&req->list, &done_list);
+	}
+
+	if (!list_empty(&req_list)) {
+		req = list_entry(req_list.next, struct addr_req, list);
+		set_timeout(req->timeout);
+	}
+	mutex_unlock(&lock);
+
+	list_for_each_entry_safe(req, temp_req, &done_list, list) {
+		list_del(&req->list);
+		req->callback(req->status, &req->src_addr, req->addr,
+			      req->context);
+		kfree(req);
+	}
+}
+
+static int addr_resolve_local(struct sockaddr_in *src_in,
+			      struct sockaddr_in *dst_in,
+			      struct rdma_dev_addr *addr)
+{
+	struct net_device *dev;
+	u32 src_ip = src_in->sin_addr.s_addr;
+	u32 dst_ip = dst_in->sin_addr.s_addr;
+	int ret;
+
+	dev = ip_dev_find(dst_ip);
+	if (!dev)
+		return -EADDRNOTAVAIL;
+
+	if (ZERONET(src_ip)) {
+		src_in->sin_family = dst_in->sin_family;
+		src_in->sin_addr.s_addr = dst_ip;
+		ret = copy_addr(addr, dev, dev->dev_addr);
+	} else if (LOOPBACK(src_ip)) {
+		ret = rdma_translate_ip((struct sockaddr *)dst_in, addr);
+		if (!ret)
+			memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
+	} else {
+		ret = rdma_translate_ip((struct sockaddr *)src_in, addr);
+		if (!ret)
+			memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
+	}
+
+	dev_put(dev);
+	return ret;
+}
+
+int rdma_resolve_ip(struct sockaddr *src_addr, struct sockaddr *dst_addr,
+		    struct rdma_dev_addr *addr, int timeout_ms,
+		    void (*callback)(int status, struct sockaddr *src_addr,
+				     struct rdma_dev_addr *addr, void *context),
+		    void *context)
+{
+	struct sockaddr_in *src_in, *dst_in;
+	struct addr_req *req;
+	int ret = 0;
+
+	req = kmalloc(sizeof *req, GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+	memset(req, 0, sizeof *req);
+
+	if (src_addr)
+		memcpy(&req->src_addr, src_addr, ip_addr_size(src_addr));
+	memcpy(&req->dst_addr, dst_addr, ip_addr_size(dst_addr));
+	req->addr = addr;
+	req->callback = callback;
+	req->context = context;
+
+	src_in = (struct sockaddr_in *) &req->src_addr;
+	dst_in = (struct sockaddr_in *) &req->dst_addr;
+
+	req->status = addr_resolve_local(src_in, dst_in, addr);
+	if (req->status == -EADDRNOTAVAIL)
+		req->status = addr_resolve_remote(src_in, dst_in, addr);
+
+	switch (req->status) {
+	case 0:
+		req->timeout = jiffies;
+		queue_req(req);
+		break;
+	case -ENODATA:
+		req->timeout = msecs_to_jiffies(timeout_ms) + jiffies;
+		queue_req(req);
+		addr_send_arp(dst_in);
+		break;
+	default:
+		ret = req->status;
+		kfree(req);
+		break;
+	}
+	return ret;
+}
+EXPORT_SYMBOL(rdma_resolve_ip);
+
+void rdma_addr_cancel(struct rdma_dev_addr *addr)
+{
+	struct addr_req *req, *temp_req;
+
+	mutex_lock(&lock);
+	list_for_each_entry_safe(req, temp_req, &req_list, list) {
+		if (req->addr == addr) {
+			req->status = -ECANCELED;
+			req->timeout = jiffies;
+			list_del(&req->list);
+			list_add(&req->list, &req_list);
+			set_timeout(req->timeout);
+			break;
+		}
+	}
+	mutex_unlock(&lock);
+}
+EXPORT_SYMBOL(rdma_addr_cancel);
+
+static int addr_arp_recv(struct sk_buff *skb, struct net_device *dev,
+			 struct packet_type *pkt, struct net_device *orig_dev)
+{
+	struct arphdr *arp_hdr;
+
+	arp_hdr = (struct arphdr *) skb->nh.raw;
+
+	if (arp_hdr->ar_op == htons(ARPOP_REQUEST) ||
+	    arp_hdr->ar_op == htons(ARPOP_REPLY))
+		set_timeout(jiffies);
+
+	kfree_skb(skb);
+	return 0;
+}
+
+static struct packet_type addr_arp = {
+	.type = __constant_htons(ETH_P_ARP),
+	.func = addr_arp_recv,
+	.af_packet_priv = (void*) 1,
+};
+
+static int addr_init(void)
+{
+	addr_wq = create_singlethread_workqueue("ib_addr_wq");
+	if (!addr_wq)
+		return -ENOMEM;
+
+	dev_add_pack(&addr_arp);
+	return 0;
+}
+
+static void addr_cleanup(void)
+{
+	dev_remove_pack(&addr_arp);
+	destroy_workqueue(addr_wq);
+}
+
+module_init(addr_init);
+module_exit(addr_cleanup);
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 86fee43..490fd03 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -32,7 +32,7 @@
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  *
- * $Id: cm.c 2821 2005-07-08 17:07:28Z sean.hefty $
+ * $Id: cm.c 4311 2005-12-05 18:42:01Z sean.hefty $
  */
 #include
@@ -132,6 +132,7 @@ struct cm_id_private {
 	/* todo: use alternate port on send failure */
 	struct cm_av av;
 	struct cm_av alt_av;
+	struct ib_cm_compare_data *compare_data;
 
 	void *private_data;
 	__be64 tid;
@@ -357,6 +358,41 @@ static struct cm_id_private * cm_acquire
 	return cm_id_priv;
 }
 
+static void cm_mask_copy(u8 *dst, u8 *src, u8 *mask)
+{
+	int i;
+
+	for (i = 0; i < IB_CM_COMPARE_SIZE / sizeof(unsigned long); i++)
+		((unsigned long *) dst)[i] = ((unsigned long *) src)[i] &
+					     ((unsigned long *) mask)[i];
+}
+
+static int cm_compare_data(struct ib_cm_compare_data *src_data,
+			   struct ib_cm_compare_data *dst_data)
+{
+	u8 src[IB_CM_COMPARE_SIZE];
+	u8 dst[IB_CM_COMPARE_SIZE];
+
+	if (!src_data || !dst_data)
+		return 0;
+
+	cm_mask_copy(src, src_data->data, dst_data->mask);
+	cm_mask_copy(dst, dst_data->data, src_data->mask);
+	return memcmp(src, dst, IB_CM_COMPARE_SIZE);
+}
+
+static int cm_compare_private_data(u8 *private_data,
+				   struct ib_cm_compare_data *dst_data)
+{
+	u8 src[IB_CM_COMPARE_SIZE];
+
+	if (!dst_data)
+		return 0;
+
+	cm_mask_copy(src, private_data, dst_data->mask);
+	return memcmp(src, dst_data->data, IB_CM_COMPARE_SIZE);
+}
+
 static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 {
 	struct rb_node **link = &cm.listen_service_table.rb_node;
@@ -364,14 +400,18 @@ static struct cm_id_private * cm_insert_
 	struct cm_id_private *cur_cm_id_priv;
 	__be64 service_id = cm_id_priv->id.service_id;
 	__be64 service_mask = cm_id_priv->id.service_mask;
+	int data_cmp;
 
 	while (*link) {
 		parent = *link;
 		cur_cm_id_priv = rb_entry(parent, struct cm_id_private,
 					  service_node);
+		data_cmp = cm_compare_data(cm_id_priv->compare_data,
+					   cur_cm_id_priv->compare_data);
 		if ((cur_cm_id_priv->id.service_mask & service_id) ==
 		    (service_mask & cur_cm_id_priv->id.service_id) &&
-		    (cm_id_priv->id.device == cur_cm_id_priv->id.device))
+		    (cm_id_priv->id.device == cur_cm_id_priv->id.device) &&
+		    !data_cmp)
 			return cur_cm_id_priv;
 
 		if (cm_id_priv->id.device < cur_cm_id_priv->id.device)
@@ -380,6 +420,10 @@ static struct cm_id_private * cm_insert_
 			link = &(*link)->rb_right;
 		else if (service_id < cur_cm_id_priv->id.service_id)
 			link = &(*link)->rb_left;
+		else if (service_id > cur_cm_id_priv->id.service_id)
+			link = &(*link)->rb_right;
+		else if (data_cmp < 0)
+			link = &(*link)->rb_left;
 		else
 			link = &(*link)->rb_right;
 	}
@@ -389,16 +433,20 @@ static struct cm_id_private * cm_insert_
 }
 
 static struct cm_id_private * cm_find_listen(struct ib_device *device,
-					     __be64 service_id)
+					     __be64 service_id,
+					     u8 *private_data)
 {
 	struct rb_node *node = cm.listen_service_table.rb_node;
 	struct cm_id_private *cm_id_priv;
+	int data_cmp;
 
 	while (node) {
 		cm_id_priv = rb_entry(node, struct cm_id_private, service_node);
+		data_cmp = cm_compare_private_data(private_data,
+						   cm_id_priv->compare_data);
 		if ((cm_id_priv->id.service_mask & service_id) ==
 		     cm_id_priv->id.service_id &&
-		    (cm_id_priv->id.device == device))
+		    (cm_id_priv->id.device == device) && !data_cmp)
 			return cm_id_priv;
 
 		if (device < cm_id_priv->id.device)
@@ -407,6 +455,10 @@ static struct cm_id_private * cm_find_li
 			node = node->rb_right;
 		else if (service_id < cm_id_priv->id.service_id)
 			node = node->rb_left;
+		else if (service_id > cm_id_priv->id.service_id)
+			node = node->rb_right;
+		else if (data_cmp < 0)
+			node = node->rb_left;
 		else
 			node = node->rb_right;
 	}
@@ -730,15 +782,14 @@ retest:
 	wait_for_completion(&cm_id_priv->comp);
 	while ((work = cm_dequeue_work(cm_id_priv)) != NULL)
 		cm_free_work(work);
-	if (cm_id_priv->private_data && cm_id_priv->private_data_len)
-		kfree(cm_id_priv->private_data);
+	kfree(cm_id_priv->compare_data);
+	kfree(cm_id_priv->private_data);
 	kfree(cm_id_priv);
 }
 EXPORT_SYMBOL(ib_destroy_cm_id);
 
-int ib_cm_listen(struct ib_cm_id *cm_id,
-		 __be64 service_id,
-		 __be64 service_mask)
+int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
+		 struct ib_cm_compare_data *compare_data)
 {
 	struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
 	unsigned long flags;
@@ -752,7 +803,19 @@ int ib_cm_listen(struct ib_cm_id *cm_id,
 		return -EINVAL;
 
 	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
-	BUG_ON(cm_id->state != IB_CM_IDLE);
+	if (cm_id->state != IB_CM_IDLE)
+		return -EINVAL;
+
+	if (compare_data) {
+		cm_id_priv->compare_data = kzalloc(sizeof *compare_data,
+						   GFP_KERNEL);
+		if (!cm_id_priv->compare_data)
+			return -ENOMEM;
+		cm_mask_copy(cm_id_priv->compare_data->data,
+			     compare_data->data, compare_data->mask);
+		memcpy(cm_id_priv->compare_data->mask, compare_data->mask,
+		       IB_CM_COMPARE_SIZE);
+	}
 
 	cm_id->state = IB_CM_LISTEN;
 
@@ -769,6 +832,8 @@ int ib_cm_listen(struct ib_cm_id *cm_id,
 
 	if (cur_cm_id_priv) {
 		cm_id->state = IB_CM_IDLE;
+		kfree(cm_id_priv->compare_data);
+		cm_id_priv->compare_data = NULL;
 		ret = -EBUSY;
 	}
 	return ret;
@@ -1241,7 +1306,8 @@ static struct cm_id_private * cm_match_r
 
 	/* Find matching listen request. */
 	listen_cm_id_priv = cm_find_listen(cm_id_priv->id.device,
-					   req_msg->service_id);
+					   req_msg->service_id,
+					   req_msg->private_data);
 	if (!listen_cm_id_priv) {
 		spin_unlock_irqrestore(&cm.lock, flags);
 		cm_issue_rej(work->port, work->mad_recv_wc,
@@ -2654,7 +2720,8 @@ static int cm_sidr_req_handler(struct cm
 		goto out; /* Duplicate message. */
 	}
 	cur_cm_id_priv = cm_find_listen(cm_id->device,
-					sidr_req_msg->service_id);
+					sidr_req_msg->service_id,
+					sidr_req_msg->private_data);
 	if (!cur_cm_id_priv) {
 		rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
 		spin_unlock_irqrestore(&cm.lock, flags);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
new file mode 100644
index 0000000..0e6e4d6
--- /dev/null
+++ b/drivers/infiniband/core/cma.c
@@ -0,0 +1,1886 @@
+/*
+ * Copyright (c) 2005 Voltaire Inc. All rights reserved.
+ * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved.
+ * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved.
+ * Copyright (c) 2005-2006 Intel Corporation. All rights reserved.
+ * + * This Software is licensed under one of the following licenses: + * + * 1) under the terms of the "Common Public License 1.0" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/cpl.php. + * + * 2) under the terms of the "The BSD License" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/bsd-license.php. + * + * 3) under the terms of the "GNU General Public License (GPL) Version 2" a + * copy of which is available from the Open Source Initiative, see + * http://www.opensource.org/licenses/gpl-license.php. + * + * Licensee has the right to choose one of the above licenses. + * + * Redistributions of source code must retain the above copyright + * notice and one of the license notices. + * + * Redistributions in binary form must reproduce both the above copyright + * notice, one of the license notices in the documentation + * and/or other materials provided with the distribution. + * + */ + +#include +#include +#include +#include +#include +#include + +#include + +#include +#include +#include +#include + +MODULE_AUTHOR("Sean Hefty"); +MODULE_DESCRIPTION("Generic RDMA CM Agent"); +MODULE_LICENSE("Dual BSD/GPL"); + +#define CMA_CM_RESPONSE_TIMEOUT 20 +#define CMA_MAX_CM_RETRIES 3 + +static void cma_add_one(struct ib_device *device); +static void cma_remove_one(struct ib_device *device); + +static struct ib_client cma_client = { + .name = "cma", + .add = cma_add_one, + .remove = cma_remove_one +}; + +static LIST_HEAD(dev_list); +static LIST_HEAD(listen_any_list); +static DEFINE_MUTEX(lock); +static struct workqueue_struct *cma_wq; +static DEFINE_IDR(sdp_ps); +static DEFINE_IDR(tcp_ps); + +struct cma_device { + struct list_head list; + struct ib_device *device; + __be64 node_guid; + struct completion comp; + atomic_t refcount; + struct list_head id_list; +}; + +enum cma_state { + CMA_IDLE, + CMA_ADDR_QUERY, + CMA_ADDR_RESOLVED, + CMA_ROUTE_QUERY, + CMA_ROUTE_RESOLVED, + CMA_CONNECT, + CMA_ADDR_BOUND, + CMA_LISTEN, + CMA_DEVICE_REMOVAL, + CMA_DESTROYING +}; + +struct rdma_bind_list { + struct idr *ps; + struct hlist_head owners; + unsigned short port; +}; + +/* + * Device removal can occur at anytime, so we need extra handling to + * serialize notifying the user of device removal with other callbacks. + * We do this by disabling removal notification while a callback is in process, + * and reporting it after the callback completes. 
+ */ +struct rdma_id_private { + struct rdma_cm_id id; + + struct rdma_bind_list *bind_list; + struct hlist_node node; + struct list_head list; + struct list_head listen_list; + struct cma_device *cma_dev; + + enum cma_state state; + spinlock_t lock; + struct completion comp; + atomic_t refcount; + wait_queue_head_t wait_remove; + atomic_t dev_remove; + + int backlog; + int timeout_ms; + struct ib_sa_query *query; + int query_id; + union { + struct ib_cm_id *ib; + } cm_id; + + u32 seq_num; + u32 qp_num; + enum ib_qp_type qp_type; + u8 srq; +}; + +struct cma_work { + struct work_struct work; + struct rdma_id_private *id; + enum cma_state old_state; + enum cma_state new_state; + struct rdma_cm_event event; +}; + +union cma_ip_addr { + struct in6_addr ip6; + struct { + __u32 pad[3]; + __u32 addr; + } ip4; +}; + +struct cma_hdr { + u8 cma_version; + u8 ip_version; /* IP version: 7:4 */ + __u16 port; + union cma_ip_addr src_addr; + union cma_ip_addr dst_addr; +}; + +struct sdp_hh { + u8 bsdh[16]; + u8 sdp_version; /* Major version: 7:4 */ + u8 ip_version; /* IP version: 7:4 */ + u8 sdp_specific1[10]; + __u16 port; + __u16 sdp_specific2; + union cma_ip_addr src_addr; + union cma_ip_addr dst_addr; +}; + +struct sdp_hah { + u8 bsdh[16]; + u8 sdp_version; +}; + +#define CMA_VERSION 0x00 +#define SDP_MAJ_VERSION 0x2 + +static int cma_comp(struct rdma_id_private *id_priv, enum cma_state comp) +{ + unsigned long flags; + int ret; + + spin_lock_irqsave(&id_priv->lock, flags); + ret = (id_priv->state == comp); + spin_unlock_irqrestore(&id_priv->lock, flags); + return ret; +} + +static int cma_comp_exch(struct rdma_id_private *id_priv, + enum cma_state comp, enum cma_state exch) +{ + unsigned long flags; + int ret; + + spin_lock_irqsave(&id_priv->lock, flags); + if ((ret = (id_priv->state == comp))) + id_priv->state = exch; + spin_unlock_irqrestore(&id_priv->lock, flags); + return ret; +} + +static enum cma_state cma_exch(struct rdma_id_private *id_priv, + enum cma_state exch) +{ + unsigned long flags; + enum cma_state old; + + spin_lock_irqsave(&id_priv->lock, flags); + old = id_priv->state; + id_priv->state = exch; + spin_unlock_irqrestore(&id_priv->lock, flags); + return old; +} + +static inline u8 cma_get_ip_ver(struct cma_hdr *hdr) +{ + return hdr->ip_version >> 4; +} + +static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver) +{ + hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF); +} + +static inline u8 sdp_get_majv(u8 sdp_version) +{ + return sdp_version >> 4; +} + +static inline u8 sdp_get_ip_ver(struct sdp_hh *hh) +{ + return hh->ip_version >> 4; +} + +static inline void sdp_set_ip_ver(struct sdp_hh *hh, u8 ip_ver) +{ + hh->ip_version = (ip_ver << 4) | (hh->ip_version & 0xF); +} + +static void cma_attach_to_dev(struct rdma_id_private *id_priv, + struct cma_device *cma_dev) +{ + atomic_inc(&cma_dev->refcount); + id_priv->cma_dev = cma_dev; + id_priv->id.device = cma_dev->device; + list_add_tail(&id_priv->list, &cma_dev->id_list); +} + +static inline void cma_deref_dev(struct cma_device *cma_dev) +{ + if (atomic_dec_and_test(&cma_dev->refcount)) + complete(&cma_dev->comp); +} + +static void cma_detach_from_dev(struct rdma_id_private *id_priv) +{ + list_del(&id_priv->list); + cma_deref_dev(id_priv->cma_dev); + id_priv->cma_dev = NULL; +} + +static int cma_acquire_ib_dev(struct rdma_id_private *id_priv) +{ + struct cma_device *cma_dev; + union ib_gid *gid; + int ret = -ENODEV; + + gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr); + + mutex_lock(&lock); + 
list_for_each_entry(cma_dev, &dev_list, list) { + ret = ib_find_cached_gid(cma_dev->device, gid, + &id_priv->id.port_num, NULL); + if (!ret) { + cma_attach_to_dev(id_priv, cma_dev); + break; + } + } + mutex_unlock(&lock); + return ret; +} + +static int cma_acquire_dev(struct rdma_id_private *id_priv) +{ + switch (id_priv->id.route.addr.dev_addr.dev_type) { + case IB_NODE_CA: + return cma_acquire_ib_dev(id_priv); + default: + return -ENODEV; + } +} + +static void cma_deref_id(struct rdma_id_private *id_priv) +{ + if (atomic_dec_and_test(&id_priv->refcount)) + complete(&id_priv->comp); +} + +static void cma_release_remove(struct rdma_id_private *id_priv) +{ + if (atomic_dec_and_test(&id_priv->dev_remove)) + wake_up(&id_priv->wait_remove); +} + +struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, + void *context, enum rdma_port_space ps) +{ + struct rdma_id_private *id_priv; + + id_priv = kzalloc(sizeof *id_priv, GFP_KERNEL); + if (!id_priv) + return ERR_PTR(-ENOMEM); + + id_priv->state = CMA_IDLE; + id_priv->id.context = context; + id_priv->id.event_handler = event_handler; + id_priv->id.ps = ps; + spin_lock_init(&id_priv->lock); + init_completion(&id_priv->comp); + atomic_set(&id_priv->refcount, 1); + init_waitqueue_head(&id_priv->wait_remove); + atomic_set(&id_priv->dev_remove, 0); + INIT_LIST_HEAD(&id_priv->listen_list); + get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num); + + return &id_priv->id; +} +EXPORT_SYMBOL(rdma_create_id); + +static int cma_init_ib_qp(struct rdma_id_private *id_priv, struct ib_qp *qp) +{ + struct ib_qp_attr qp_attr; + struct rdma_dev_addr *dev_addr; + int ret; + + dev_addr = &id_priv->id.route.addr.dev_addr; + ret = ib_find_cached_pkey(id_priv->id.device, id_priv->id.port_num, + ib_addr_get_pkey(dev_addr), + &qp_attr.pkey_index); + if (ret) + return ret; + + qp_attr.qp_state = IB_QPS_INIT; + qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE; + qp_attr.port_num = id_priv->id.port_num; + return ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX | IB_QP_PORT); +} + +int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr) +{ + struct rdma_id_private *id_priv; + struct ib_qp *qp; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (id->device != pd->device) + return -EINVAL; + + qp = ib_create_qp(pd, qp_init_attr); + if (IS_ERR(qp)) + return PTR_ERR(qp); + + switch (id->device->node_type) { + case IB_NODE_CA: + ret = cma_init_ib_qp(id_priv, qp); + break; + default: + ret = -ENOSYS; + break; + } + + if (ret) + goto err; + + id->qp = qp; + id_priv->qp_num = qp->qp_num; + id_priv->qp_type = qp->qp_type; + id_priv->srq = (qp->srq != NULL); + return 0; +err: + ib_destroy_qp(qp); + return ret; +} +EXPORT_SYMBOL(rdma_create_qp); + +void rdma_destroy_qp(struct rdma_cm_id *id) +{ + ib_destroy_qp(id->qp); +} +EXPORT_SYMBOL(rdma_destroy_qp); + +static int cma_modify_qp_rtr(struct rdma_cm_id *id) +{ + struct ib_qp_attr qp_attr; + int qp_attr_mask, ret; + + if (!id->qp) + return 0; + + /* Need to update QP attributes from default values. 
*/ + qp_attr.qp_state = IB_QPS_INIT; + ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask); + if (ret) + return ret; + + ret = ib_modify_qp(id->qp, &qp_attr, qp_attr_mask); + if (ret) + return ret; + + qp_attr.qp_state = IB_QPS_RTR; + ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask); + if (ret) + return ret; + + return ib_modify_qp(id->qp, &qp_attr, qp_attr_mask); +} + +static int cma_modify_qp_rts(struct rdma_cm_id *id) +{ + struct ib_qp_attr qp_attr; + int qp_attr_mask, ret; + + if (!id->qp) + return 0; + + qp_attr.qp_state = IB_QPS_RTS; + ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask); + if (ret) + return ret; + + return ib_modify_qp(id->qp, &qp_attr, qp_attr_mask); +} + +static int cma_modify_qp_err(struct rdma_cm_id *id) +{ + struct ib_qp_attr qp_attr; + + if (!id->qp) + return 0; + + qp_attr.qp_state = IB_QPS_ERR; + return ib_modify_qp(id->qp, &qp_attr, IB_QP_STATE); +} + +int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr, + int *qp_attr_mask) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + switch (id_priv->id.device->node_type) { + case IB_NODE_CA: + ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, qp_attr, + qp_attr_mask); + if (qp_attr->qp_state == IB_QPS_RTR) + qp_attr->rq_psn = id_priv->seq_num; + break; + default: + ret = -ENOSYS; + break; + } + + return ret; +} +EXPORT_SYMBOL(rdma_init_qp_attr); + +static inline int cma_zero_addr(struct sockaddr *addr) +{ + struct in6_addr *ip6; + + if (addr->sa_family == AF_INET) + return ZERONET(((struct sockaddr_in *) addr)->sin_addr.s_addr); + else { + ip6 = &((struct sockaddr_in6 *) addr)->sin6_addr; + return (ip6->s6_addr32[0] | ip6->s6_addr32[1] | + ip6->s6_addr32[2] | ip6->s6_addr32[3]) == 0; + } +} + +static inline int cma_loopback_addr(struct sockaddr *addr) +{ + return LOOPBACK(((struct sockaddr_in *) addr)->sin_addr.s_addr); +} + +static inline int cma_any_addr(struct sockaddr *addr) +{ + return cma_zero_addr(addr) || cma_loopback_addr(addr); +} + +static inline int cma_any_port(struct sockaddr *addr) +{ + return !((struct sockaddr_in *) addr)->sin_port; +} + +static int cma_get_net_info(void *hdr, enum rdma_port_space ps, + u8 *ip_ver, __u16 *port, + union cma_ip_addr **src, union cma_ip_addr **dst) +{ + switch (ps) { + case RDMA_PS_SDP: + if (sdp_get_majv(((struct sdp_hh *) hdr)->sdp_version) != + SDP_MAJ_VERSION) + return -EINVAL; + + *ip_ver = sdp_get_ip_ver(hdr); + *port = ((struct sdp_hh *) hdr)->port; + *src = &((struct sdp_hh *) hdr)->src_addr; + *dst = &((struct sdp_hh *) hdr)->dst_addr; + break; + default: + if (((struct cma_hdr *) hdr)->cma_version != CMA_VERSION) + return -EINVAL; + + *ip_ver = cma_get_ip_ver(hdr); + *port = ((struct cma_hdr *) hdr)->port; + *src = &((struct cma_hdr *) hdr)->src_addr; + *dst = &((struct cma_hdr *) hdr)->dst_addr; + break; + } + + if (*ip_ver != 4 && *ip_ver != 6) + return -EINVAL; + return 0; +} + +static void cma_save_net_info(struct rdma_addr *addr, + struct rdma_addr *listen_addr, + u8 ip_ver, __u16 port, + union cma_ip_addr *src, union cma_ip_addr *dst) +{ + struct sockaddr_in *listen4, *ip4; + struct sockaddr_in6 *listen6, *ip6; + + switch (ip_ver) { + case 4: + listen4 = (struct sockaddr_in *) &listen_addr->src_addr; + ip4 = (struct sockaddr_in *) &addr->src_addr; + ip4->sin_family = listen4->sin_family; + ip4->sin_addr.s_addr = dst->ip4.addr; + ip4->sin_port = listen4->sin_port; + + ip4 = (struct sockaddr_in *) &addr->dst_addr; + ip4->sin_family = listen4->sin_family; + ip4->sin_addr.s_addr
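/*
 * cma_modify_qp_rtr() and cma_modify_qp_rts() above walk the standard IB QP
 * state ladder on the consumer's behalf, one rdma_init_qp_attr() plus
 * ib_modify_qp() pair per step:
 *
 *	RESET -> INIT -> RTR	(cma_modify_qp_rtr)
 *	RTR -> RTS		(cma_modify_qp_rts)
 *
 * A consumer that manages its own QP can instead call rdma_init_qp_attr()
 * for each target state and issue the ib_modify_qp() calls itself.
 */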
= src->ip4.addr; + ip4->sin_port = port; + break; + case 6: + listen6 = (struct sockaddr_in6 *) &listen_addr->src_addr; + ip6 = (struct sockaddr_in6 *) &addr->src_addr; + ip6->sin6_family = listen6->sin6_family; + ip6->sin6_addr = dst->ip6; + ip6->sin6_port = listen6->sin6_port; + + ip6 = (struct sockaddr_in6 *) &addr->dst_addr; + ip6->sin6_family = listen6->sin6_family; + ip6->sin6_addr = src->ip6; + ip6->sin6_port = port; + break; + default: + break; + } +} + +static inline int cma_user_data_offset(enum rdma_port_space ps) +{ + switch (ps) { + case RDMA_PS_SDP: + return 0; + default: + return sizeof(struct cma_hdr); + } +} + +static int cma_notify_user(struct rdma_id_private *id_priv, + enum rdma_cm_event_type type, int status, + void *data, u8 data_len) +{ + struct rdma_cm_event event; + + event.event = type; + event.status = status; + event.private_data = data; + event.private_data_len = data_len; + + return id_priv->id.event_handler(&id_priv->id, &event); +} + +static void cma_cancel_route(struct rdma_id_private *id_priv) +{ + switch (id_priv->id.device->node_type) { + case IB_NODE_CA: + if (id_priv->query) + ib_sa_cancel_query(id_priv->query_id, id_priv->query); + break; + default: + break; + } +} + +static inline int cma_internal_listen(struct rdma_id_private *id_priv) +{ + return (id_priv->state == CMA_LISTEN) && id_priv->cma_dev && + cma_any_addr(&id_priv->id.route.addr.src_addr); +} + +static void cma_destroy_listen(struct rdma_id_private *id_priv) +{ + cma_exch(id_priv, CMA_DESTROYING); + + if (id_priv->cma_dev) { + switch (id_priv->id.device->node_type) { + case IB_NODE_CA: + if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) + ib_destroy_cm_id(id_priv->cm_id.ib); + break; + default: + break; + } + cma_detach_from_dev(id_priv); + } + list_del(&id_priv->listen_list); + + cma_deref_id(id_priv); + wait_for_completion(&id_priv->comp); + + kfree(id_priv); +} + +static void cma_cancel_listens(struct rdma_id_private *id_priv) +{ + struct rdma_id_private *dev_id_priv; + + mutex_lock(&lock); + list_del(&id_priv->list); + + while (!list_empty(&id_priv->listen_list)) { + dev_id_priv = list_entry(id_priv->listen_list.next, + struct rdma_id_private, listen_list); + cma_destroy_listen(dev_id_priv); + } + mutex_unlock(&lock); +} + +static void cma_cancel_operation(struct rdma_id_private *id_priv, + enum cma_state state) +{ + switch (state) { + case CMA_ADDR_QUERY: + rdma_addr_cancel(&id_priv->id.route.addr.dev_addr); + break; + case CMA_ROUTE_QUERY: + cma_cancel_route(id_priv); + break; + case CMA_LISTEN: + if (cma_any_addr(&id_priv->id.route.addr.src_addr) && + !id_priv->cma_dev) + cma_cancel_listens(id_priv); + break; + default: + break; + } +} + +static void cma_release_port(struct rdma_id_private *id_priv) +{ + struct rdma_bind_list *bind_list = id_priv->bind_list; + + if (!bind_list) + return; + + mutex_lock(&lock); + hlist_del(&id_priv->node); + if (hlist_empty(&bind_list->owners)) { + idr_remove(bind_list->ps, bind_list->port); + kfree(bind_list); + } + mutex_unlock(&lock); +} + +void rdma_destroy_id(struct rdma_cm_id *id) +{ + struct rdma_id_private *id_priv; + enum cma_state state; + + id_priv = container_of(id, struct rdma_id_private, id); + state = cma_exch(id_priv, CMA_DESTROYING); + cma_cancel_operation(id_priv, state); + + if (id_priv->cma_dev) { + switch (id->device->node_type) { + case IB_NODE_CA: + if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) + ib_destroy_cm_id(id_priv->cm_id.ib); + break; + default: + break; + } + mutex_lock(&lock); + cma_detach_from_dev(id_priv); 
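/*
 * Teardown in rdma_destroy_id() here synchronizes with callbacks in two
 * steps: cma_exch() moves the id to CMA_DESTROYING so concurrent handlers
 * stop acting on it, and the refcount/completion pair makes destroy wait
 * for every outstanding cma_deref_id():
 *
 *	cma_deref_id(id_priv);			drop the creation reference
 *	wait_for_completion(&id_priv->comp);	sleeps until refcount hits 0
 */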
+ mutex_unlock(&lock); + } + + cma_release_port(id_priv); + cma_deref_id(id_priv); + wait_for_completion(&id_priv->comp); + + kfree(id_priv->id.route.path_rec); + kfree(id_priv); +} +EXPORT_SYMBOL(rdma_destroy_id); + +static int cma_rep_recv(struct rdma_id_private *id_priv) +{ + int ret; + + ret = cma_modify_qp_rtr(&id_priv->id); + if (ret) + goto reject; + + ret = cma_modify_qp_rts(&id_priv->id); + if (ret) + goto reject; + + ret = ib_send_cm_rtu(id_priv->cm_id.ib, NULL, 0); + if (ret) + goto reject; + + return 0; +reject: + cma_modify_qp_err(&id_priv->id); + ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, + NULL, 0, NULL, 0); + return ret; +} + +static int cma_verify_rep(struct rdma_id_private *id_priv, void *data) +{ + if (id_priv->id.ps == RDMA_PS_SDP && + sdp_get_majv(((struct sdp_hah *) data)->sdp_version) != + SDP_MAJ_VERSION) + return -EINVAL; + + return 0; +} + +static int cma_rtu_recv(struct rdma_id_private *id_priv) +{ + int ret; + + ret = cma_modify_qp_rts(&id_priv->id); + if (ret) + goto reject; + + return 0; +reject: + cma_modify_qp_err(&id_priv->id); + ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, + NULL, 0, NULL, 0); + return ret; +} + +static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) +{ + struct rdma_id_private *id_priv = cm_id->context; + enum rdma_cm_event_type event; + u8 private_data_len = 0; + int ret = 0, status = 0; + + if (!cma_comp(id_priv, CMA_CONNECT)) + return 0; + + atomic_inc(&id_priv->dev_remove); + switch (ib_event->event) { + case IB_CM_REQ_ERROR: + case IB_CM_REP_ERROR: + event = RDMA_CM_EVENT_UNREACHABLE; + status = -ETIMEDOUT; + break; + case IB_CM_REP_RECEIVED: + status = cma_verify_rep(id_priv, ib_event->private_data); + if (status) + event = RDMA_CM_EVENT_CONNECT_ERROR; + else if (id_priv->id.qp) { + status = cma_rep_recv(id_priv); + event = status ? RDMA_CM_EVENT_CONNECT_ERROR : + RDMA_CM_EVENT_ESTABLISHED; + } else + event = RDMA_CM_EVENT_CONNECT_RESPONSE; + private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE; + break; + case IB_CM_RTU_RECEIVED: + status = cma_rtu_recv(id_priv); + event = status ? RDMA_CM_EVENT_CONNECT_ERROR : + RDMA_CM_EVENT_ESTABLISHED; + break; + case IB_CM_DREQ_ERROR: + status = -ETIMEDOUT; /* fall through */ + case IB_CM_DREQ_RECEIVED: + case IB_CM_DREP_RECEIVED: + event = RDMA_CM_EVENT_DISCONNECTED; + break; + case IB_CM_TIMEWAIT_EXIT: + case IB_CM_MRA_RECEIVED: + /* ignore event */ + goto out; + case IB_CM_REJ_RECEIVED: + cma_modify_qp_err(&id_priv->id); + status = ib_event->param.rej_rcvd.reason; + event = RDMA_CM_EVENT_REJECTED; + break; + default: + printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d", + ib_event->event); + goto out; + } + + ret = cma_notify_user(id_priv, event, status, ib_event->private_data, + private_data_len); + if (ret) { + /* Destroy the CM ID by returning a non-zero value. */ + id_priv->cm_id.ib = NULL; + cma_exch(id_priv, CMA_DESTROYING); + cma_release_remove(id_priv); + rdma_destroy_id(&id_priv->id); + return ret; + } +out: + cma_release_remove(id_priv); + return ret; +} + +static struct rdma_id_private *cma_new_id(struct rdma_cm_id *listen_id, + struct ib_cm_event *ib_event) +{ + struct rdma_id_private *id_priv; + struct rdma_cm_id *id; + struct rdma_route *rt; + union cma_ip_addr *src, *dst; + __u16 port; + u8 ip_ver; + + id = rdma_create_id(listen_id->event_handler, listen_id->context, + listen_id->ps); + if (IS_ERR(id)) + return NULL; + + rt = &id->route; + rt->num_paths = ib_event->param.req_rcvd.alternate_path ? 
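/*
 * cma_ib_handler() above is where IB CM events become the transport-neutral
 * events that consumers see (REP -> ESTABLISHED, REJ -> REJECTED, and so
 * on). A consumer callback deals only in the generic codes; a sketch, where
 * the handler name, completion and 2000 ms timeout are hypothetical:
 *
 *	static int my_handler(struct rdma_cm_id *id, struct rdma_cm_event *ev)
 *	{
 *		switch (ev->event) {
 *		case RDMA_CM_EVENT_ADDR_RESOLVED:
 *			return rdma_resolve_route(id, 2000);
 *		case RDMA_CM_EVENT_ESTABLISHED:
 *			complete(&my_conn_done);
 *			return 0;
 *		default:
 *			return 0;	nonzero would destroy the id
 *		}
 *	}
 */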
2 : 1; + rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, GFP_KERNEL); + if (!rt->path_rec) + goto err; + + if (cma_get_net_info(ib_event->private_data, listen_id->ps, + &ip_ver, &port, &src, &dst)) + goto err; + + cma_save_net_info(&id->route.addr, &listen_id->route.addr, + ip_ver, port, src, dst); + rt->path_rec[0] = *ib_event->param.req_rcvd.primary_path; + if (rt->num_paths == 2) + rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path; + + ib_addr_set_sgid(&rt->addr.dev_addr, &rt->path_rec[0].sgid); + ib_addr_set_dgid(&rt->addr.dev_addr, &rt->path_rec[0].dgid); + ib_addr_set_pkey(&rt->addr.dev_addr, be16_to_cpu(rt->path_rec[0].pkey)); + rt->addr.dev_addr.dev_type = IB_NODE_CA; + + id_priv = container_of(id, struct rdma_id_private, id); + id_priv->state = CMA_CONNECT; + return id_priv; +err: + rdma_destroy_id(id); + return NULL; +} + +static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) +{ + struct rdma_id_private *listen_id, *conn_id; + int offset, ret; + + listen_id = cm_id->context; + atomic_inc(&listen_id->dev_remove); + if (!cma_comp(listen_id, CMA_LISTEN)) { + ret = -ECONNABORTED; + goto out; + } + + conn_id = cma_new_id(&listen_id->id, ib_event); + if (!conn_id) { + ret = -ENOMEM; + goto out; + } + + atomic_inc(&conn_id->dev_remove); + ret = cma_acquire_ib_dev(conn_id); + if (ret) { + ret = -ENODEV; + cma_release_remove(conn_id); + rdma_destroy_id(&conn_id->id); + goto out; + } + + conn_id->cm_id.ib = cm_id; + cm_id->context = conn_id; + cm_id->cm_handler = cma_ib_handler; + + offset = cma_user_data_offset(listen_id->id.ps); + ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0, + ib_event->private_data + offset, + IB_CM_REQ_PRIVATE_DATA_SIZE - offset); + if (ret) { + /* Destroy the CM ID by returning a non-zero value. 
*/ + conn_id->cm_id.ib = NULL; + cma_exch(conn_id, CMA_DESTROYING); + cma_release_remove(conn_id); + rdma_destroy_id(&conn_id->id); + } +out: + cma_release_remove(listen_id); + return ret; +} + +static __be64 cma_get_service_id(enum rdma_port_space ps, struct sockaddr *addr) +{ + return cpu_to_be64(((u64)ps << 16) + + be16_to_cpu(((struct sockaddr_in *) addr)->sin_port)); +} + +static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr, + struct ib_cm_compare_data *compare) +{ + struct cma_hdr *cma_data, *cma_mask; + struct sdp_hh *sdp_data, *sdp_mask; + __u32 ip4_addr; + struct in6_addr ip6_addr; + + memset(compare, 0, sizeof *compare); + cma_data = (void *) compare->data; + cma_mask = (void *) compare->mask; + sdp_data = (void *) compare->data; + sdp_mask = (void *) compare->mask; + + switch (addr->sa_family) { + case AF_INET: + ip4_addr = ((struct sockaddr_in *) addr)->sin_addr.s_addr; + if (ps == RDMA_PS_SDP) { + sdp_set_ip_ver(sdp_data, 4); + sdp_set_ip_ver(sdp_mask, 0xF); + sdp_data->dst_addr.ip4.addr = ip4_addr; + sdp_mask->dst_addr.ip4.addr = ~0; + } else { + cma_set_ip_ver(cma_data, 4); + cma_set_ip_ver(cma_mask, 0xF); + cma_data->dst_addr.ip4.addr = ip4_addr; + cma_mask->dst_addr.ip4.addr = ~0; + } + break; + case AF_INET6: + ip6_addr = ((struct sockaddr_in6 *) addr)->sin6_addr; + if (ps == RDMA_PS_SDP) { + sdp_set_ip_ver(sdp_data, 6); + sdp_set_ip_ver(sdp_mask, 0xF); + sdp_data->dst_addr.ip6 = ip6_addr; + memset(&sdp_mask->dst_addr.ip6, 0xFF, + sizeof sdp_mask->dst_addr.ip6); + } else { + cma_set_ip_ver(cma_data, 6); + cma_set_ip_ver(cma_mask, 0xF); + cma_data->dst_addr.ip6 = ip6_addr; + memset(&cma_mask->dst_addr.ip6, 0xFF, + sizeof cma_mask->dst_addr.ip6); + } + break; + default: + break; + } +} + +static int cma_ib_listen(struct rdma_id_private *id_priv) +{ + struct ib_cm_compare_data compare_data; + struct sockaddr *addr; + __be64 svc_id; + int ret; + + id_priv->cm_id.ib = ib_create_cm_id(id_priv->id.device, cma_req_handler, + id_priv); + if (IS_ERR(id_priv->cm_id.ib)) + return PTR_ERR(id_priv->cm_id.ib); + + addr = &id_priv->id.route.addr.src_addr; + svc_id = cma_get_service_id(id_priv->id.ps, addr); + if (cma_any_addr(addr)) + ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, NULL); + else { + cma_set_compare_data(id_priv->id.ps, addr, &compare_data); + ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, &compare_data); + } + + if (ret) { + ib_destroy_cm_id(id_priv->cm_id.ib); + id_priv->cm_id.ib = NULL; + } + + return ret; +} + +static int cma_listen_handler(struct rdma_cm_id *id, + struct rdma_cm_event *event) +{ + struct rdma_id_private *id_priv = id->context; + + id->context = id_priv->id.context; + id->event_handler = id_priv->id.event_handler; + return id_priv->id.event_handler(id, event); +} + +static void cma_listen_on_dev(struct rdma_id_private *id_priv, + struct cma_device *cma_dev) +{ + struct rdma_id_private *dev_id_priv; + struct rdma_cm_id *id; + int ret; + + id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps); + if (IS_ERR(id)) + return; + + dev_id_priv = container_of(id, struct rdma_id_private, id); + + dev_id_priv->state = CMA_ADDR_BOUND; + memcpy(&id->route.addr.src_addr, &id_priv->id.route.addr.src_addr, + ip_addr_size(&id_priv->id.route.addr.src_addr)); + + cma_attach_to_dev(dev_id_priv, cma_dev); + list_add_tail(&dev_id_priv->listen_list, &id_priv->listen_list); + + ret = rdma_listen(id, id_priv->backlog); + if (ret) + goto err; + + return; +err: + cma_destroy_listen(dev_id_priv); +} + +static void cma_listen_on_all(struct 
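/*
 * cma_get_service_id() above folds the port space and IP port into the
 * 64-bit IB service ID, so listener and initiator agree without a separate
 * mapping service; e.g. RDMA_PS_TCP with port 80 yields
 *
 *	cpu_to_be64(((u64) RDMA_PS_TCP << 16) + 80)
 *
 * Wildcard listens in cma_ib_listen() match on the service ID alone, while
 * listens bound to a specific address also compare the destination IP
 * carried in the REQ private data via cma_set_compare_data().
 */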
rdma_id_private *id_priv) +{ + struct cma_device *cma_dev; + + mutex_lock(&lock); + list_add_tail(&id_priv->list, &listen_any_list); + list_for_each_entry(cma_dev, &dev_list, list) + cma_listen_on_dev(id_priv, cma_dev); + mutex_unlock(&lock); +} + +static int cma_bind_any(struct rdma_cm_id *id, sa_family_t af) +{ + struct sockaddr_in addr_in; + + memset(&addr_in, 0, sizeof addr_in); + addr_in.sin_family = af; + return rdma_bind_addr(id, (struct sockaddr *) &addr_in); +} + +int rdma_listen(struct rdma_cm_id *id, int backlog) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (id_priv->state == CMA_IDLE) { + ret = cma_bind_any(id, AF_INET); + if (ret) + return ret; + } + + if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_LISTEN)) + return -EINVAL; + + if (id->device) { + switch (id->device->node_type) { + case IB_NODE_CA: + ret = cma_ib_listen(id_priv); + if (ret) + goto err; + break; + default: + ret = -ENOSYS; + goto err; + } + } else + cma_listen_on_all(id_priv); + + id_priv->backlog = backlog; + return 0; +err: + cma_comp_exch(id_priv, CMA_LISTEN, CMA_ADDR_BOUND); + return ret; +} +EXPORT_SYMBOL(rdma_listen); + +static void cma_query_handler(int status, struct ib_sa_path_rec *path_rec, + void *context) +{ + struct cma_work *work = context; + struct rdma_route *route; + + route = &work->id->id.route; + + if (!status) { + route->num_paths = 1; + *route->path_rec = *path_rec; + } else { + work->old_state = CMA_ROUTE_QUERY; + work->new_state = CMA_ADDR_RESOLVED; + work->event.event = RDMA_CM_EVENT_ROUTE_ERROR; + } + + queue_work(cma_wq, &work->work); +} + +static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms, + struct cma_work *work) +{ + struct rdma_dev_addr *addr = &id_priv->id.route.addr.dev_addr; + struct ib_sa_path_rec path_rec; + + memset(&path_rec, 0, sizeof path_rec); + path_rec.sgid = *ib_addr_get_sgid(addr); + path_rec.dgid = *ib_addr_get_dgid(addr); + path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr)); + path_rec.numb_path = 1; + + id_priv->query_id = ib_sa_path_rec_get(id_priv->id.device, + id_priv->id.port_num, &path_rec, + IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | + IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH, + timeout_ms, GFP_KERNEL, + cma_query_handler, work, &id_priv->query); + + return (id_priv->query_id < 0) ? 
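/*
 * With rdma_listen() above and the address and route resolution entry
 * points below, the active side drives a connection through a short
 * sequence, each step completing asynchronously through the event handler.
 * A kernel-consumer sketch (handler, pd, attrs and the 2000 ms timeouts are
 * hypothetical; error handling omitted):
 *
 *	id = rdma_create_id(my_handler, my_ctx, RDMA_PS_TCP);
 *	rdma_resolve_addr(id, NULL, dst_addr, 2000);	-> ADDR_RESOLVED
 *	rdma_resolve_route(id, 2000);			-> ROUTE_RESOLVED
 *	rdma_create_qp(id, pd, &init_attr);
 *	rdma_connect(id, &conn_param);			-> ESTABLISHED
 */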
id_priv->query_id : 0; +} + +static void cma_work_handler(void *data) +{ + struct cma_work *work = data; + struct rdma_id_private *id_priv = work->id; + int destroy = 0; + + atomic_inc(&id_priv->dev_remove); + if (!cma_comp_exch(id_priv, work->old_state, work->new_state)) + goto out; + + if (id_priv->id.event_handler(&id_priv->id, &work->event)) { + cma_exch(id_priv, CMA_DESTROYING); + destroy = 1; + } +out: + cma_release_remove(id_priv); + cma_deref_id(id_priv); + if (destroy) + rdma_destroy_id(&id_priv->id); + kfree(work); +} + +static int cma_resolve_ib_route(struct rdma_id_private *id_priv, int timeout_ms) +{ + struct rdma_route *route = &id_priv->id.route; + struct cma_work *work; + int ret; + + work = kzalloc(sizeof *work, GFP_KERNEL); + if (!work) + return -ENOMEM; + + work->id = id_priv; + INIT_WORK(&work->work, cma_work_handler, work); + work->old_state = CMA_ROUTE_QUERY; + work->new_state = CMA_ROUTE_RESOLVED; + work->event.event = RDMA_CM_EVENT_ROUTE_RESOLVED; + + route->path_rec = kmalloc(sizeof *route->path_rec, GFP_KERNEL); + if (!route->path_rec) { + ret = -ENOMEM; + goto err1; + } + + ret = cma_query_ib_route(id_priv, timeout_ms, work); + if (ret) + goto err2; + + return 0; +err2: + kfree(route->path_rec); + route->path_rec = NULL; +err1: + kfree(work); + return ret; +} + +int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp_exch(id_priv, CMA_ADDR_RESOLVED, CMA_ROUTE_QUERY)) + return -EINVAL; + + atomic_inc(&id_priv->refcount); + switch (id->device->node_type) { + case IB_NODE_CA: + ret = cma_resolve_ib_route(id_priv, timeout_ms); + break; + default: + ret = -ENOSYS; + break; + } + if (ret) + goto err; + + return 0; +err: + cma_comp_exch(id_priv, CMA_ROUTE_QUERY, CMA_ADDR_RESOLVED); + cma_deref_id(id_priv); + return ret; +} +EXPORT_SYMBOL(rdma_resolve_route); + +static int cma_bind_loopback(struct rdma_id_private *id_priv) +{ + struct cma_device *cma_dev; + union ib_gid *gid; + u16 pkey; + int ret; + + mutex_lock(&lock); + if (list_empty(&dev_list)) { + ret = -ENODEV; + goto out; + } + + cma_dev = list_entry(dev_list.next, struct cma_device, list); + gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr); + ret = ib_get_cached_gid(cma_dev->device, 1, 0, gid); + if (ret) + goto out; + + ret = ib_get_cached_pkey(cma_dev->device, 1, 0, &pkey); + if (ret) + goto out; + + ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey); + id_priv->id.port_num = 1; + cma_attach_to_dev(id_priv, cma_dev); +out: + mutex_unlock(&lock); + return ret; +} + +static void addr_handler(int status, struct sockaddr *src_addr, + struct rdma_dev_addr *dev_addr, void *context) +{ + struct rdma_id_private *id_priv = context; + enum rdma_cm_event_type event; + + atomic_inc(&id_priv->dev_remove); + if (!id_priv->cma_dev && !status) + status = cma_acquire_dev(id_priv); + + if (status) { + if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_BOUND)) + goto out; + event = RDMA_CM_EVENT_ADDR_ERROR; + } else { + if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED)) + goto out; + memcpy(&id_priv->id.route.addr.src_addr, src_addr, + ip_addr_size(src_addr)); + event = RDMA_CM_EVENT_ADDR_RESOLVED; + } + + if (cma_notify_user(id_priv, event, status, NULL, 0)) { + cma_exch(id_priv, CMA_DESTROYING); + cma_release_remove(id_priv); + cma_deref_id(id_priv); + rdma_destroy_id(&id_priv->id); + return; + } +out: + cma_release_remove(id_priv); + cma_deref_id(id_priv); +} + +static 
int cma_resolve_loopback(struct rdma_id_private *id_priv) +{ + struct cma_work *work; + struct sockaddr_in *src_in, *dst_in; + int ret; + + work = kzalloc(sizeof *work, GFP_KERNEL); + if (!work) + return -ENOMEM; + + if (!id_priv->cma_dev) { + ret = cma_bind_loopback(id_priv); + if (ret) + goto err; + } + + ib_addr_set_dgid(&id_priv->id.route.addr.dev_addr, + ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr)); + + if (cma_zero_addr(&id_priv->id.route.addr.src_addr)) { + src_in = (struct sockaddr_in *)&id_priv->id.route.addr.src_addr; + dst_in = (struct sockaddr_in *)&id_priv->id.route.addr.dst_addr; + src_in->sin_family = dst_in->sin_family; + src_in->sin_addr.s_addr = dst_in->sin_addr.s_addr; + } + + work->id = id_priv; + INIT_WORK(&work->work, cma_work_handler, work); + work->old_state = CMA_ADDR_QUERY; + work->new_state = CMA_ADDR_RESOLVED; + work->event.event = RDMA_CM_EVENT_ADDR_RESOLVED; + queue_work(cma_wq, &work->work); + return 0; +err: + kfree(work); + return ret; +} + +static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, + struct sockaddr *dst_addr) +{ + if (src_addr && src_addr->sa_family) + return rdma_bind_addr(id, src_addr); + else + return cma_bind_any(id, dst_addr->sa_family); +} + +int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, + struct sockaddr *dst_addr, int timeout_ms) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (id_priv->state == CMA_IDLE) { + ret = cma_bind_addr(id, src_addr, dst_addr); + if (ret) + return ret; + } + + if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_ADDR_QUERY)) + return -EINVAL; + + atomic_inc(&id_priv->refcount); + memcpy(&id->route.addr.dst_addr, dst_addr, ip_addr_size(dst_addr)); + if (cma_any_addr(dst_addr)) + ret = cma_resolve_loopback(id_priv); + else + ret = rdma_resolve_ip(&id->route.addr.src_addr, dst_addr, + &id->route.addr.dev_addr, + timeout_ms, addr_handler, id_priv); + if (ret) + goto err; + + return 0; +err: + cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_BOUND); + cma_deref_id(id_priv); + return ret; +} +EXPORT_SYMBOL(rdma_resolve_addr); + +static void cma_bind_port(struct rdma_bind_list *bind_list, + struct rdma_id_private *id_priv) +{ + struct sockaddr_in *sin; + + sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; + sin->sin_port = htons(bind_list->port); + id_priv->bind_list = bind_list; + hlist_add_head(&id_priv->node, &bind_list->owners); +} + +static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv, + unsigned short snum) +{ + struct rdma_bind_list *bind_list; + int port, start, ret; + + bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL); + if (!bind_list) + return -ENOMEM; + + start = snum ? 
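/*
 * Port allocation in cma_alloc_port()/cma_use_port() mirrors classic socket
 * bind() semantics: a nonzero port must be granted exactly, port 0 takes
 * any free port in the sysctl local port range, and sharing follows the
 * usual wildcard-versus-specific conflict rules. With hypothetical
 * addresses, all in the same port space:
 *
 *	bind 0.0.0.0:4000, then 10.0.0.1:4000	-> -EADDRNOTAVAIL
 *	bind 10.0.0.1:4000, then 10.0.0.1:4000	-> -EADDRINUSE
 *	bind 10.0.0.1:4000, then 10.0.0.2:4000	-> both succeed, sharing one
 *						   rdma_bind_list
 */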
snum : sysctl_local_port_range[0]; + + do { + ret = idr_get_new_above(ps, bind_list, start, &port); + } while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL)); + + if (ret) + goto err; + + if ((snum && port != snum) || + (!snum && port > sysctl_local_port_range[1])) { + idr_remove(ps, port); + ret = -EADDRNOTAVAIL; + goto err; + } + + bind_list->ps = ps; + bind_list->port = (unsigned short) port; + cma_bind_port(bind_list, id_priv); + return 0; +err: + kfree(bind_list); + return ret; +} + +static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv) +{ + struct rdma_id_private *cur_id; + struct sockaddr_in *sin, *cur_sin; + struct rdma_bind_list *bind_list; + struct hlist_node *node; + unsigned short snum; + + sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; + snum = ntohs(sin->sin_port); + if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE)) + return -EACCES; + + bind_list = idr_find(ps, snum); + if (!bind_list) + return cma_alloc_port(ps, id_priv, snum); + + /* + * We don't support binding to any address if anyone is bound to + * a specific address on the same port. + */ + if (cma_any_addr(&id_priv->id.route.addr.src_addr)) + return -EADDRNOTAVAIL; + + hlist_for_each_entry(cur_id, node, &bind_list->owners, node) { + if (cma_any_addr(&cur_id->id.route.addr.src_addr)) + return -EADDRNOTAVAIL; + + cur_sin = (struct sockaddr_in *) &cur_id->id.route.addr.src_addr; + if (sin->sin_addr.s_addr == cur_sin->sin_addr.s_addr) + return -EADDRINUSE; + } + + cma_bind_port(bind_list, id_priv); + return 0; +} + +static int cma_get_port(struct rdma_id_private *id_priv) +{ + struct idr *ps; + int ret; + + switch (id_priv->id.ps) { + case RDMA_PS_SDP: + ps = &sdp_ps; + break; + case RDMA_PS_TCP: + ps = &tcp_ps; + break; + default: + return -EPROTONOSUPPORT; + } + + mutex_lock(&lock); + if (cma_any_port(&id_priv->id.route.addr.src_addr)) + ret = cma_alloc_port(ps, id_priv, 0); + else + ret = cma_use_port(ps, id_priv); + mutex_unlock(&lock); + + return ret; +} + +int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) +{ + struct rdma_id_private *id_priv; + int ret; + + if (addr->sa_family != AF_INET) + return -EAFNOSUPPORT; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_BOUND)) + return -EINVAL; + + if (!cma_any_addr(addr)) { + ret = rdma_translate_ip(addr, &id->route.addr.dev_addr); + if (!ret) + ret = cma_acquire_dev(id_priv); + if (ret) + goto err; + } + + memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); + ret = cma_get_port(id_priv); + if (ret) + goto err; + + return 0; +err: + cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); + return ret; +} +EXPORT_SYMBOL(rdma_bind_addr); + +static int cma_format_hdr(void *hdr, enum rdma_port_space ps, + struct rdma_route *route) +{ + struct sockaddr_in *src4, *dst4; + struct cma_hdr *cma_hdr; + struct sdp_hh *sdp_hdr; + + src4 = (struct sockaddr_in *) &route->addr.src_addr; + dst4 = (struct sockaddr_in *) &route->addr.dst_addr; + + switch (ps) { + case RDMA_PS_SDP: + sdp_hdr = hdr; + if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) + return -EINVAL; + sdp_set_ip_ver(sdp_hdr, 4); + sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; + sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; + sdp_hdr->port = src4->sin_port; + break; + default: + cma_hdr = hdr; + cma_hdr->cma_version = CMA_VERSION; + cma_set_ip_ver(cma_hdr, 4); + cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; + cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; + cma_hdr->port = 
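/*
 * The header cma_format_hdr() is filling in here rides in the first bytes
 * of the IB CM REQ private data; the consumer's own private data follows at
 * cma_user_data_offset(). For RDMA_PS_TCP the REQ private data is laid out
 * as
 *
 *	[ struct cma_hdr: version, ip_version, port, src, dst ][ user data ]
 *
 * and the passive side parses the addresses back out in cma_get_net_info()
 * and cma_save_net_info() before reporting RDMA_CM_EVENT_CONNECT_REQUEST.
 */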
src4->sin_port; + break; + } + return 0; +} + +static int cma_connect_ib(struct rdma_id_private *id_priv, + struct rdma_conn_param *conn_param) +{ + struct ib_cm_req_param req; + struct rdma_route *route; + void *private_data; + int offset, ret; + + memset(&req, 0, sizeof req); + offset = cma_user_data_offset(id_priv->id.ps); + req.private_data_len = offset + conn_param->private_data_len; + private_data = kzalloc(req.private_data_len, GFP_ATOMIC); + if (!private_data) + return -ENOMEM; + + if (conn_param->private_data && conn_param->private_data_len) + memcpy(private_data + offset, conn_param->private_data, + conn_param->private_data_len); + + id_priv->cm_id.ib = ib_create_cm_id(id_priv->id.device, cma_ib_handler, + id_priv); + if (IS_ERR(id_priv->cm_id.ib)) { + ret = PTR_ERR(id_priv->cm_id.ib); + goto out; + } + + route = &id_priv->id.route; + ret = cma_format_hdr(private_data, id_priv->id.ps, route); + if (ret) + goto out; + req.private_data = private_data; + + req.primary_path = &route->path_rec[0]; + if (route->num_paths == 2) + req.alternate_path = &route->path_rec[1]; + + req.service_id = cma_get_service_id(id_priv->id.ps, + &route->addr.dst_addr); + req.qp_num = id_priv->qp_num; + req.qp_type = id_priv->qp_type; + req.starting_psn = id_priv->seq_num; + req.responder_resources = conn_param->responder_resources; + req.initiator_depth = conn_param->initiator_depth; + req.flow_control = conn_param->flow_control; + req.retry_count = conn_param->retry_count; + req.rnr_retry_count = conn_param->rnr_retry_count; + req.remote_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT; + req.local_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT; + req.max_cm_retries = CMA_MAX_CM_RETRIES; + req.srq = id_priv->srq ? 1 : 0; + + ret = ib_send_cm_req(id_priv->cm_id.ib, &req); +out: + kfree(private_data); + return ret; +} + +int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_CONNECT)) + return -EINVAL; + + if (!id->qp) { + id_priv->qp_num = conn_param->qp_num; + id_priv->qp_type = conn_param->qp_type; + id_priv->srq = conn_param->srq; + } + + switch (id->device->node_type) { + case IB_NODE_CA: + ret = cma_connect_ib(id_priv, conn_param); + break; + default: + ret = -ENOSYS; + break; + } + if (ret) + goto err; + + return 0; +err: + cma_comp_exch(id_priv, CMA_CONNECT, CMA_ROUTE_RESOLVED); + return ret; +} +EXPORT_SYMBOL(rdma_connect); + +static int cma_accept_ib(struct rdma_id_private *id_priv, + struct rdma_conn_param *conn_param) +{ + struct ib_cm_rep_param rep; + int ret; + + ret = cma_modify_qp_rtr(&id_priv->id); + if (ret) + return ret; + + memset(&rep, 0, sizeof rep); + rep.qp_num = id_priv->qp_num; + rep.starting_psn = id_priv->seq_num; + rep.private_data = conn_param->private_data; + rep.private_data_len = conn_param->private_data_len; + rep.responder_resources = conn_param->responder_resources; + rep.initiator_depth = conn_param->initiator_depth; + rep.target_ack_delay = CMA_CM_RESPONSE_TIMEOUT; + rep.failover_accepted = 0; + rep.flow_control = conn_param->flow_control; + rep.rnr_retry_count = conn_param->rnr_retry_count; + rep.srq = id_priv->srq ? 
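/*
 * rdma_connect() above sends the REQ; the passive side answers from its
 * event handler once RDMA_CM_EVENT_CONNECT_REQUEST arrives, using the new
 * rdma_cm_id delivered with the event and rdma_accept() just below. A
 * sketch, where pd, init_attr and the conn_param values are hypothetical:
 *
 *	case RDMA_CM_EVENT_CONNECT_REQUEST:
 *		rdma_create_qp(id, pd, &init_attr);
 *		memset(&param, 0, sizeof param);
 *		param.responder_resources = 1;
 *		param.initiator_depth = 1;
 *		return rdma_accept(id, &param);	returning 0 keeps the id
 */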
1 : 0; + + return ib_send_cm_rep(id_priv->cm_id.ib, &rep); +} + +int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp(id_priv, CMA_CONNECT)) + return -EINVAL; + + if (!id->qp && conn_param) { + id_priv->qp_num = conn_param->qp_num; + id_priv->qp_type = conn_param->qp_type; + id_priv->srq = conn_param->srq; + } + + switch (id->device->node_type) { + case IB_NODE_CA: + if (conn_param) + ret = cma_accept_ib(id_priv, conn_param); + else + ret = cma_rep_recv(id_priv); + break; + default: + ret = -ENOSYS; + break; + } + + if (ret) + goto reject; + + return 0; +reject: + cma_modify_qp_err(id); + rdma_reject(id, NULL, 0); + return ret; +} +EXPORT_SYMBOL(rdma_accept); + +int rdma_reject(struct rdma_cm_id *id, const void *private_data, + u8 private_data_len) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp(id_priv, CMA_CONNECT)) + return -EINVAL; + + switch (id->device->node_type) { + case IB_NODE_CA: + ret = ib_send_cm_rej(id_priv->cm_id.ib, + IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, + private_data, private_data_len); + break; + default: + ret = -ENOSYS; + break; + } + return ret; +} +EXPORT_SYMBOL(rdma_reject); + +int rdma_disconnect(struct rdma_cm_id *id) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp(id_priv, CMA_CONNECT)) + return -EINVAL; + + ret = cma_modify_qp_err(id); + if (ret) + goto out; + + switch (id->device->node_type) { + case IB_NODE_CA: + /* Initiate or respond to a disconnect. */ + if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0)) + ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0); + break; + default: + break; + } +out: + return ret; +} +EXPORT_SYMBOL(rdma_disconnect); + +static void cma_add_one(struct ib_device *device) +{ + struct cma_device *cma_dev; + struct rdma_id_private *id_priv; + + cma_dev = kmalloc(sizeof *cma_dev, GFP_KERNEL); + if (!cma_dev) + return; + + cma_dev->device = device; + cma_dev->node_guid = device->node_guid; + if (!cma_dev->node_guid) + goto err; + + init_completion(&cma_dev->comp); + atomic_set(&cma_dev->refcount, 1); + INIT_LIST_HEAD(&cma_dev->id_list); + ib_set_client_data(device, &cma_client, cma_dev); + + mutex_lock(&lock); + list_add_tail(&cma_dev->list, &dev_list); + list_for_each_entry(id_priv, &listen_any_list, list) + cma_listen_on_dev(id_priv, cma_dev); + mutex_unlock(&lock); + return; +err: + kfree(cma_dev); +} + +static int cma_remove_id_dev(struct rdma_id_private *id_priv) +{ + enum cma_state state; + + /* Record that we want to remove the device */ + state = cma_exch(id_priv, CMA_DEVICE_REMOVAL); + if (state == CMA_DESTROYING) + return 0; + + cma_cancel_operation(id_priv, state); + wait_event(id_priv->wait_remove, !atomic_read(&id_priv->dev_remove)); + + /* Check for destruction from another callback. 
*/ + if (!cma_comp(id_priv, CMA_DEVICE_REMOVAL)) + return 0; + + return cma_notify_user(id_priv, RDMA_CM_EVENT_DEVICE_REMOVAL, + 0, NULL, 0); +} + +static void cma_process_remove(struct cma_device *cma_dev) +{ + struct list_head remove_list; + struct rdma_id_private *id_priv; + int ret; + + INIT_LIST_HEAD(&remove_list); + + mutex_lock(&lock); + while (!list_empty(&cma_dev->id_list)) { + id_priv = list_entry(cma_dev->id_list.next, + struct rdma_id_private, list); + + if (cma_internal_listen(id_priv)) { + cma_destroy_listen(id_priv); + continue; + } + + list_del(&id_priv->list); + list_add_tail(&id_priv->list, &remove_list); + atomic_inc(&id_priv->refcount); + mutex_unlock(&lock); + + ret = cma_remove_id_dev(id_priv); + cma_deref_id(id_priv); + if (ret) + rdma_destroy_id(&id_priv->id); + + mutex_lock(&lock); + } + mutex_unlock(&lock); + + cma_deref_dev(cma_dev); + wait_for_completion(&cma_dev->comp); +} + +static void cma_remove_one(struct ib_device *device) +{ + struct cma_device *cma_dev; + + cma_dev = ib_get_client_data(device, &cma_client); + if (!cma_dev) + return; + + mutex_lock(&lock); + list_del(&cma_dev->list); + mutex_unlock(&lock); + + cma_process_remove(cma_dev); + kfree(cma_dev); +} + +static int cma_init(void) +{ + int ret; + + cma_wq = create_singlethread_workqueue("rdma_cm_wq"); + if (!cma_wq) + return -ENOMEM; + + ret = ib_register_client(&cma_client); + if (ret) + goto err; + return 0; + +err: + destroy_workqueue(cma_wq); + return ret; +} + +static void cma_cleanup(void) +{ + ib_unregister_client(&cma_client); + destroy_workqueue(cma_wq); + idr_destroy(&sdp_ps); + idr_destroy(&tcp_ps); +} + +module_init(cma_init); +module_exit(cma_cleanup); diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 5ad41a6..92c7362 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -45,8 +45,7 @@ MODULE_DESCRIPTION("kernel IB MAD API"); MODULE_AUTHOR("Hal Rosenstock"); MODULE_AUTHOR("Sean Hefty"); - -kmem_cache_t *ib_mad_cache; +static kmem_cache_t *ib_mad_cache; static struct list_head ib_mad_port_list; static u32 ib_mad_client_id = 0; diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h index b4fa28d..d147f3b 100644 --- a/drivers/infiniband/core/mad_priv.h +++ b/drivers/infiniband/core/mad_priv.h @@ -212,8 +212,6 @@ struct ib_mad_port_private { struct ib_mad_qp_info qp_info[IB_MAD_QPS_CORE]; }; -extern kmem_cache_t *ib_mad_cache; - int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr); struct ib_mad_send_wr_private * diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c index 9164a09..fefc9b6 100644 --- a/drivers/infiniband/core/ucm.c +++ b/drivers/infiniband/core/ucm.c @@ -30,7 +30,7 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. 
* - * $Id: ucm.c 2594 2005-06-13 19:46:02Z libor $ + * $Id: ucm.c 4311 2005-12-05 18:42:01Z sean.hefty $ */ #include @@ -50,6 +50,7 @@ #include #include #include +#include MODULE_AUTHOR("Libor Michalek"); MODULE_DESCRIPTION("InfiniBand userspace Connection Manager access"); @@ -205,36 +206,6 @@ error: return NULL; } -static void ib_ucm_event_path_get(struct ib_ucm_path_rec *upath, - struct ib_sa_path_rec *kpath) -{ - if (!kpath || !upath) - return; - - memcpy(upath->dgid, kpath->dgid.raw, sizeof *upath->dgid); - memcpy(upath->sgid, kpath->sgid.raw, sizeof *upath->sgid); - - upath->dlid = kpath->dlid; - upath->slid = kpath->slid; - upath->raw_traffic = kpath->raw_traffic; - upath->flow_label = kpath->flow_label; - upath->hop_limit = kpath->hop_limit; - upath->traffic_class = kpath->traffic_class; - upath->reversible = kpath->reversible; - upath->numb_path = kpath->numb_path; - upath->pkey = kpath->pkey; - upath->sl = kpath->sl; - upath->mtu_selector = kpath->mtu_selector; - upath->mtu = kpath->mtu; - upath->rate_selector = kpath->rate_selector; - upath->rate = kpath->rate; - upath->packet_life_time = kpath->packet_life_time; - upath->preference = kpath->preference; - - upath->packet_life_time_selector = - kpath->packet_life_time_selector; -} - static void ib_ucm_event_req_get(struct ib_ucm_req_event_resp *ureq, struct ib_cm_req_event_param *kreq) { @@ -253,8 +224,10 @@ static void ib_ucm_event_req_get(struct ureq->srq = kreq->srq; ureq->port = kreq->port; - ib_ucm_event_path_get(&ureq->primary_path, kreq->primary_path); - ib_ucm_event_path_get(&ureq->alternate_path, kreq->alternate_path); + ib_copy_path_rec_to_user(&ureq->primary_path, kreq->primary_path); + if (kreq->alternate_path) + ib_copy_path_rec_to_user(&ureq->alternate_path, + kreq->alternate_path); } static void ib_ucm_event_rep_get(struct ib_ucm_rep_event_resp *urep, @@ -324,8 +297,8 @@ static int ib_ucm_event_process(struct i info = evt->param.rej_rcvd.ari; break; case IB_CM_LAP_RECEIVED: - ib_ucm_event_path_get(&uvt->resp.u.lap_resp.path, - evt->param.lap_rcvd.alternate_path); + ib_copy_path_rec_to_user(&uvt->resp.u.lap_resp.path, + evt->param.lap_rcvd.alternate_path); uvt->data_len = IB_CM_LAP_PRIVATE_DATA_SIZE; uvt->resp.present = IB_UCM_PRES_ALTERNATE; break; @@ -637,65 +610,11 @@ static ssize_t ib_ucm_attr_id(struct ib_ return result; } -static void ib_ucm_copy_ah_attr(struct ib_ucm_ah_attr *dest_attr, - struct ib_ah_attr *src_attr) -{ - memcpy(dest_attr->grh_dgid, src_attr->grh.dgid.raw, - sizeof src_attr->grh.dgid); - dest_attr->grh_flow_label = src_attr->grh.flow_label; - dest_attr->grh_sgid_index = src_attr->grh.sgid_index; - dest_attr->grh_hop_limit = src_attr->grh.hop_limit; - dest_attr->grh_traffic_class = src_attr->grh.traffic_class; - - dest_attr->dlid = src_attr->dlid; - dest_attr->sl = src_attr->sl; - dest_attr->src_path_bits = src_attr->src_path_bits; - dest_attr->static_rate = src_attr->static_rate; - dest_attr->is_global = (src_attr->ah_flags & IB_AH_GRH); - dest_attr->port_num = src_attr->port_num; -} - -static void ib_ucm_copy_qp_attr(struct ib_ucm_init_qp_attr_resp *dest_attr, - struct ib_qp_attr *src_attr) -{ - dest_attr->cur_qp_state = src_attr->cur_qp_state; - dest_attr->path_mtu = src_attr->path_mtu; - dest_attr->path_mig_state = src_attr->path_mig_state; - dest_attr->qkey = src_attr->qkey; - dest_attr->rq_psn = src_attr->rq_psn; - dest_attr->sq_psn = src_attr->sq_psn; - dest_attr->dest_qp_num = src_attr->dest_qp_num; - dest_attr->qp_access_flags = src_attr->qp_access_flags; - - dest_attr->max_send_wr = 
src_attr->cap.max_send_wr; - dest_attr->max_recv_wr = src_attr->cap.max_recv_wr; - dest_attr->max_send_sge = src_attr->cap.max_send_sge; - dest_attr->max_recv_sge = src_attr->cap.max_recv_sge; - dest_attr->max_inline_data = src_attr->cap.max_inline_data; - - ib_ucm_copy_ah_attr(&dest_attr->ah_attr, &src_attr->ah_attr); - ib_ucm_copy_ah_attr(&dest_attr->alt_ah_attr, &src_attr->alt_ah_attr); - - dest_attr->pkey_index = src_attr->pkey_index; - dest_attr->alt_pkey_index = src_attr->alt_pkey_index; - dest_attr->en_sqd_async_notify = src_attr->en_sqd_async_notify; - dest_attr->sq_draining = src_attr->sq_draining; - dest_attr->max_rd_atomic = src_attr->max_rd_atomic; - dest_attr->max_dest_rd_atomic = src_attr->max_dest_rd_atomic; - dest_attr->min_rnr_timer = src_attr->min_rnr_timer; - dest_attr->port_num = src_attr->port_num; - dest_attr->timeout = src_attr->timeout; - dest_attr->retry_cnt = src_attr->retry_cnt; - dest_attr->rnr_retry = src_attr->rnr_retry; - dest_attr->alt_port_num = src_attr->alt_port_num; - dest_attr->alt_timeout = src_attr->alt_timeout; -} - static ssize_t ib_ucm_init_qp_attr(struct ib_ucm_file *file, const char __user *inbuf, int in_len, int out_len) { - struct ib_ucm_init_qp_attr_resp resp; + struct ib_uverbs_qp_attr resp; struct ib_ucm_init_qp_attr cmd; struct ib_ucm_context *ctx; struct ib_qp_attr qp_attr; @@ -718,7 +637,7 @@ static ssize_t ib_ucm_init_qp_attr(struc if (result) goto out; - ib_ucm_copy_qp_attr(&resp, &qp_attr); + ib_copy_qp_attr_to_user(&resp, &qp_attr); if (copy_to_user((void __user *)(unsigned long)cmd.response, &resp, sizeof(resp))) @@ -729,6 +648,17 @@ out: return result; } +static int ucm_validate_listen(__be64 service_id, __be64 service_mask) +{ + service_id &= service_mask; + + if (((service_id & IB_CMA_SERVICE_ID_MASK) == IB_CMA_SERVICE_ID) || + ((service_id & IB_SDP_SERVICE_ID_MASK) == IB_SDP_SERVICE_ID)) + return -EINVAL; + + return 0; +} + static ssize_t ib_ucm_listen(struct ib_ucm_file *file, const char __user *inbuf, int in_len, int out_len) @@ -744,7 +674,13 @@ static ssize_t ib_ucm_listen(struct ib_u if (IS_ERR(ctx)) return PTR_ERR(ctx); - result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask); + result = ucm_validate_listen(cmd.service_id, cmd.service_mask); + if (result) + goto out; + + result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask, + NULL); +out: ib_ucm_ctx_put(ctx); return result; } @@ -793,7 +729,7 @@ static int ib_ucm_alloc_data(const void static int ib_ucm_path_get(struct ib_sa_path_rec **path, u64 src) { - struct ib_ucm_path_rec ucm_path; + struct ib_user_path_rec upath; struct ib_sa_path_rec *sa_path; *path = NULL; @@ -805,36 +741,14 @@ static int ib_ucm_path_get(struct ib_sa_ if (!sa_path) return -ENOMEM; - if (copy_from_user(&ucm_path, (void __user *)(unsigned long)src, - sizeof(ucm_path))) { + if (copy_from_user(&upath, (void __user *)(unsigned long)src, + sizeof(upath))) { kfree(sa_path); return -EFAULT; } - memcpy(sa_path->dgid.raw, ucm_path.dgid, sizeof sa_path->dgid); - memcpy(sa_path->sgid.raw, ucm_path.sgid, sizeof sa_path->sgid); - - sa_path->dlid = ucm_path.dlid; - sa_path->slid = ucm_path.slid; - sa_path->raw_traffic = ucm_path.raw_traffic; - sa_path->flow_label = ucm_path.flow_label; - sa_path->hop_limit = ucm_path.hop_limit; - sa_path->traffic_class = ucm_path.traffic_class; - sa_path->reversible = ucm_path.reversible; - sa_path->numb_path = ucm_path.numb_path; - sa_path->pkey = ucm_path.pkey; - sa_path->sl = ucm_path.sl; - sa_path->mtu_selector = ucm_path.mtu_selector; - sa_path->mtu 
= ucm_path.mtu; - sa_path->rate_selector = ucm_path.rate_selector; - sa_path->rate = ucm_path.rate; - sa_path->packet_life_time = ucm_path.packet_life_time; - sa_path->preference = ucm_path.preference; - - sa_path->packet_life_time_selector = - ucm_path.packet_life_time_selector; - + ib_copy_path_rec_from_user(sa_path, &upath); *path = sa_path; return 0; } @@ -1245,8 +1159,10 @@ static unsigned int ib_ucm_poll(struct f poll_wait(filp, &file->poll_wait, wait); + down(&file->mutex); if (!list_empty(&file->events)) mask = POLLIN | POLLRDNORM; + up(&file->mutex); return mask; } diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c new file mode 100644 index 0000000..3e21f10 --- /dev/null +++ b/drivers/infiniband/core/ucma.c @@ -0,0 +1,811 @@ +/* + * Copyright (c) 2005 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +MODULE_AUTHOR("Sean Hefty"); +MODULE_DESCRIPTION("RDMA Userspace Connection Manager Access"); +MODULE_LICENSE("Dual BSD/GPL"); + +enum { + UCMA_MAX_BACKLOG = 128 +}; + +struct ucma_file { + struct mutex file_mutex; + struct file *filp; + struct list_head ctxs; + struct list_head events; + wait_queue_head_t poll_wait; +}; + +struct ucma_context { + int id; + wait_queue_head_t wait; + atomic_t ref; + int events_reported; + int backlog; + + struct ucma_file *file; + struct rdma_cm_id *cm_id; + __u64 uid; + + struct list_head events; /* list of pending events. 
*/ + struct list_head file_list; /* member in file ctx list */ +}; + +struct ucma_event { + struct ucma_context *ctx; + struct list_head file_list; /* member in file event list */ + struct list_head ctx_list; /* member in ctx event list */ + struct rdma_cm_id *cm_id; + struct rdma_ucm_event_resp resp; +}; + +static DEFINE_MUTEX(ctx_mutex); +static DEFINE_IDR(ctx_idr); + +static struct ucma_context* ucma_get_ctx(struct ucma_file *file, int id) +{ + struct ucma_context *ctx; + + mutex_lock(&ctx_mutex); + ctx = idr_find(&ctx_idr, id); + if (!ctx) + ctx = ERR_PTR(-ENOENT); + else if (ctx->file != file) + ctx = ERR_PTR(-EINVAL); + else + atomic_inc(&ctx->ref); + mutex_unlock(&ctx_mutex); + + return ctx; +} + +static void ucma_put_ctx(struct ucma_context *ctx) +{ + if (atomic_dec_and_test(&ctx->ref)) + wake_up(&ctx->wait); +} + +static void ucma_cleanup_events(struct ucma_context *ctx) +{ + struct ucma_event *uevent; + + mutex_lock(&ctx->file->file_mutex); + list_del(&ctx->file_list); + while (!list_empty(&ctx->events)) { + + uevent = list_entry(ctx->events.next, struct ucma_event, + ctx_list); + list_del(&uevent->file_list); + list_del(&uevent->ctx_list); + + /* clear incoming connections. */ + if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) + rdma_destroy_id(uevent->cm_id); + + kfree(uevent); + } + mutex_unlock(&ctx->file->file_mutex); +} + +static struct ucma_context* ucma_alloc_ctx(struct ucma_file *file) +{ + struct ucma_context *ctx; + int ret; + + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (!ctx) + return NULL; + + atomic_set(&ctx->ref, 1); + init_waitqueue_head(&ctx->wait); + ctx->file = file; + INIT_LIST_HEAD(&ctx->events); + + do { + ret = idr_pre_get(&ctx_idr, GFP_KERNEL); + if (!ret) + goto error; + + mutex_lock(&ctx_mutex); + ret = idr_get_new(&ctx_idr, ctx, &ctx->id); + mutex_unlock(&ctx_mutex); + } while (ret == -EAGAIN); + + if (ret) + goto error; + + list_add_tail(&ctx->file_list, &file->ctxs); + return ctx; + +error: + kfree(ctx); + return NULL; +} + +static int ucma_event_handler(struct rdma_cm_id *cm_id, + struct rdma_cm_event *event) +{ + struct ucma_event *uevent; + struct ucma_context *ctx = cm_id->context; + int ret = 0; + + uevent = kzalloc(sizeof(*uevent), GFP_KERNEL); + if (!uevent) + return event->event == RDMA_CM_EVENT_CONNECT_REQUEST; + + uevent->ctx = ctx; + uevent->cm_id = cm_id; + uevent->resp.uid = ctx->uid; + uevent->resp.id = ctx->id; + uevent->resp.event = event->event; + uevent->resp.status = event->status; + if ((uevent->resp.private_data_len = event->private_data_len)) + memcpy(uevent->resp.private_data, event->private_data, + event->private_data_len); + + mutex_lock(&ctx->file->file_mutex); + if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) { + if (!ctx->backlog) { + ret = -EDQUOT; + goto out; + } + ctx->backlog--; + } + list_add_tail(&uevent->file_list, &ctx->file->events); + list_add_tail(&uevent->ctx_list, &ctx->events); + wake_up_interruptible(&ctx->file->poll_wait); +out: + mutex_unlock(&ctx->file->file_mutex); + return ret; +} + +static ssize_t ucma_get_event(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct ucma_context *ctx; + struct rdma_ucm_get_event cmd; + struct ucma_event *uevent; + int ret = 0; + DEFINE_WAIT(wait); + + if (out_len < sizeof(struct rdma_ucm_event_resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + mutex_lock(&file->file_mutex); + while (list_empty(&file->events)) { + if (file->filp->f_flags & O_NONBLOCK) { + ret = -EAGAIN; + 
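/*
 * Note the error convention in ucma_event_handler() above: when no memory
 * is available for an event, a nonzero return for a CONNECT_REQUEST makes
 * the rdma_cm core destroy the just-created id (userspace never learns of
 * it), while events on existing ids are silently dropped:
 *
 *	if (!uevent)
 *		return event->event == RDMA_CM_EVENT_CONNECT_REQUEST;
 *
 * Each queued CONNECT_REQUEST also holds one slot of the listener's backlog
 * until ucma_get_event() hands it to userspace and allocates a fresh
 * context for the new connection.
 */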
break; + } + + if (signal_pending(current)) { + ret = -ERESTARTSYS; + break; + } + + prepare_to_wait(&file->poll_wait, &wait, TASK_INTERRUPTIBLE); + mutex_unlock(&file->file_mutex); + schedule(); + mutex_lock(&file->file_mutex); + finish_wait(&file->poll_wait, &wait); + } + + if (ret) + goto done; + + uevent = list_entry(file->events.next, struct ucma_event, file_list); + + if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) { + ctx = ucma_alloc_ctx(file); + if (!ctx) { + ret = -ENOMEM; + goto done; + } + uevent->ctx->backlog++; + ctx->cm_id = uevent->cm_id; + ctx->cm_id->context = ctx; + uevent->resp.id = ctx->id; + } + + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &uevent->resp, sizeof(uevent->resp))) { + ret = -EFAULT; + goto done; + } + + list_del(&uevent->file_list); + list_del(&uevent->ctx_list); + uevent->ctx->events_reported++; + kfree(uevent); +done: + mutex_unlock(&file->file_mutex); + return ret; +} + +static ssize_t ucma_create_id(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_create_id cmd; + struct rdma_ucm_create_id_resp resp; + struct ucma_context *ctx; + int ret; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + mutex_lock(&file->file_mutex); + ctx = ucma_alloc_ctx(file); + mutex_unlock(&file->file_mutex); + if (!ctx) + return -ENOMEM; + + ctx->uid = cmd.uid; + ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, RDMA_PS_TCP); + if (IS_ERR(ctx->cm_id)) { + ret = PTR_ERR(ctx->cm_id); + goto err1; + } + + resp.id = ctx->id; + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) { + ret = -EFAULT; + goto err2; + } + return 0; + +err2: + rdma_destroy_id(ctx->cm_id); +err1: + mutex_lock(&ctx_mutex); + idr_remove(&ctx_idr, ctx->id); + mutex_unlock(&ctx_mutex); + kfree(ctx); + return ret; +} + +static ssize_t ucma_destroy_id(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_destroy_id cmd; + struct rdma_ucm_destroy_id_resp resp; + struct ucma_context *ctx; + int ret = 0; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + mutex_lock(&ctx_mutex); + ctx = idr_find(&ctx_idr, cmd.id); + if (!ctx) + ctx = ERR_PTR(-ENOENT); + else if (ctx->file != file) + ctx = ERR_PTR(-EINVAL); + else + idr_remove(&ctx_idr, ctx->id); + mutex_unlock(&ctx_mutex); + + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + atomic_dec(&ctx->ref); + wait_event(ctx->wait, !atomic_read(&ctx->ref)); + + /* No new events will be generated after destroying the id. */ + rdma_destroy_id(ctx->cm_id); + /* Cleanup events not yet reported to the user. 
*/ + ucma_cleanup_events(ctx); + + resp.events_reported = ctx->events_reported; + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) + ret = -EFAULT; + + kfree(ctx); + return ret; +} + +static ssize_t ucma_bind_addr(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_bind_addr cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_bind_addr(ctx->cm_id, (struct sockaddr *) &cmd.addr); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_resolve_addr(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_resolve_addr cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_resolve_addr(ctx->cm_id, (struct sockaddr *) &cmd.src_addr, + (struct sockaddr *) &cmd.dst_addr, + cmd.timeout_ms); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_resolve_route(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_resolve_route cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_resolve_route(ctx->cm_id, cmd.timeout_ms); + ucma_put_ctx(ctx); + return ret; +} + +static void ucma_copy_ib_route(struct rdma_ucm_query_route_resp *resp, + struct rdma_route *route) +{ + struct rdma_dev_addr *dev_addr; + + resp->num_paths = route->num_paths; + switch (route->num_paths) { + case 0: + dev_addr = &route->addr.dev_addr; + memcpy(&resp->ib_route[0].dgid, ib_addr_get_dgid(dev_addr), + sizeof(union ib_gid)); + memcpy(&resp->ib_route[0].sgid, ib_addr_get_sgid(dev_addr), + sizeof(union ib_gid)); + resp->ib_route[0].pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr)); + break; + case 2: + ib_copy_path_rec_to_user(&resp->ib_route[1], + &route->path_rec[1]); + /* fall through */ + case 1: + ib_copy_path_rec_to_user(&resp->ib_route[0], + &route->path_rec[0]); + break; + default: + break; + } +} + +static ssize_t ucma_query_route(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_query_route cmd; + struct rdma_ucm_query_route_resp resp; + struct ucma_context *ctx; + struct sockaddr *addr; + int ret = 0; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + memset(&resp, 0, sizeof resp); + addr = &ctx->cm_id->route.addr.src_addr; + memcpy(&resp.src_addr, addr, addr->sa_family == AF_INET ? + sizeof(struct sockaddr_in) : + sizeof(struct sockaddr_in6)); + addr = &ctx->cm_id->route.addr.dst_addr; + memcpy(&resp.dst_addr, addr, addr->sa_family == AF_INET ? 
+ sizeof(struct sockaddr_in) : + sizeof(struct sockaddr_in6)); + if (!ctx->cm_id->device) + goto out; + + resp.node_guid = ctx->cm_id->device->node_guid; + resp.port_num = ctx->cm_id->port_num; + switch (ctx->cm_id->device->node_type) { + case IB_NODE_CA: + ucma_copy_ib_route(&resp, &ctx->cm_id->route); + default: + break; + } + +out: + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) + ret = -EFAULT; + + ucma_put_ctx(ctx); + return ret; +} + +static void ucma_copy_conn_param(struct rdma_conn_param *dst_conn, + struct rdma_ucm_conn_param *src_conn) +{ + dst_conn->private_data = src_conn->private_data; + dst_conn->private_data_len = src_conn->private_data_len; + dst_conn->responder_resources =src_conn->responder_resources; + dst_conn->initiator_depth = src_conn->initiator_depth; + dst_conn->flow_control = src_conn->flow_control; + dst_conn->retry_count = src_conn->retry_count; + dst_conn->rnr_retry_count = src_conn->rnr_retry_count; + dst_conn->srq = src_conn->srq; + dst_conn->qp_num = src_conn->qp_num; + dst_conn->qp_type = src_conn->qp_type; +} + +static ssize_t ucma_connect(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_connect cmd; + struct rdma_conn_param conn_param; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + if (!cmd.conn_param.valid) + return -EINVAL; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ucma_copy_conn_param(&conn_param, &cmd.conn_param); + ret = rdma_connect(ctx->cm_id, &conn_param); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_listen(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_listen cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ctx->backlog = cmd.backlog > 0 && cmd.backlog < UCMA_MAX_BACKLOG ? 
+ cmd.backlog : UCMA_MAX_BACKLOG; + ret = rdma_listen(ctx->cm_id, ctx->backlog); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_accept(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_accept cmd; + struct rdma_conn_param conn_param; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + if (cmd.conn_param.valid) { + ctx->uid = cmd.uid; + ucma_copy_conn_param(&conn_param, &cmd.conn_param); + ret = rdma_accept(ctx->cm_id, &conn_param); + } else + ret = rdma_accept(ctx->cm_id, NULL); + + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_reject(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_reject cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_reject(ctx->cm_id, cmd.private_data, cmd.private_data_len); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_disconnect(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_disconnect cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_disconnect(ctx->cm_id); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_init_qp_attr(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_init_qp_attr cmd; + struct ib_uverbs_qp_attr resp; + struct ucma_context *ctx; + struct ib_qp_attr qp_attr; + int ret; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + resp.qp_attr_mask = 0; + memset(&qp_attr, 0, sizeof qp_attr); + qp_attr.qp_state = cmd.qp_state; + ret = rdma_init_qp_attr(ctx->cm_id, &qp_attr, &resp.qp_attr_mask); + if (ret) + goto out; + + ib_copy_qp_attr_to_user(&resp, &qp_attr); + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) + ret = -EFAULT; + +out: + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t (*ucma_cmd_table[])(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) = { + [RDMA_USER_CM_CMD_CREATE_ID] = ucma_create_id, + [RDMA_USER_CM_CMD_DESTROY_ID] = ucma_destroy_id, + [RDMA_USER_CM_CMD_BIND_ADDR] = ucma_bind_addr, + [RDMA_USER_CM_CMD_RESOLVE_ADDR] = ucma_resolve_addr, + [RDMA_USER_CM_CMD_RESOLVE_ROUTE]= ucma_resolve_route, + [RDMA_USER_CM_CMD_QUERY_ROUTE] = ucma_query_route, + [RDMA_USER_CM_CMD_CONNECT] = ucma_connect, + [RDMA_USER_CM_CMD_LISTEN] = ucma_listen, + [RDMA_USER_CM_CMD_ACCEPT] = ucma_accept, + [RDMA_USER_CM_CMD_REJECT] = ucma_reject, + [RDMA_USER_CM_CMD_DISCONNECT] = ucma_disconnect, + [RDMA_USER_CM_CMD_INIT_QP_ATTR] = ucma_init_qp_attr, + [RDMA_USER_CM_CMD_GET_EVENT] = ucma_get_event +}; + +static ssize_t ucma_write(struct file *filp, const char __user *buf, + size_t len, loff_t *pos) +{ + struct ucma_file *file = filp->private_data; + struct rdma_ucm_cmd_hdr hdr; + ssize_t ret; + + if (len < sizeof(hdr)) + return -EINVAL; + + if (copy_from_user(&hdr, buf, sizeof(hdr))) + return -EFAULT; + + if (hdr.cmd < 0 || hdr.cmd >= ARRAY_SIZE(ucma_cmd_table)) + 
return -EINVAL;
+
+	if (hdr.in + sizeof(hdr) > len)
+		return -EINVAL;
+
+	ret = ucma_cmd_table[hdr.cmd](file, buf + sizeof(hdr), hdr.in, hdr.out);
+	if (!ret)
+		ret = len;
+
+	return ret;
+}
+
+static unsigned int ucma_poll(struct file *filp, struct poll_table_struct *wait)
+{
+	struct ucma_file *file = filp->private_data;
+	unsigned int mask = 0;
+
+	poll_wait(filp, &file->poll_wait, wait);
+
+	mutex_lock(&file->file_mutex);
+	if (!list_empty(&file->events))
+		mask = POLLIN | POLLRDNORM;
+	mutex_unlock(&file->file_mutex);
+
+	return mask;
+}
+
+static int ucma_open(struct inode *inode, struct file *filp)
+{
+	struct ucma_file *file;
+
+	file = kmalloc(sizeof *file, GFP_KERNEL);
+	if (!file)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&file->events);
+	INIT_LIST_HEAD(&file->ctxs);
+	init_waitqueue_head(&file->poll_wait);
+	mutex_init(&file->file_mutex);
+
+	filp->private_data = file;
+	file->filp = filp;
+	return 0;
+}
+
+static int ucma_close(struct inode *inode, struct file *filp)
+{
+	struct ucma_file *file = filp->private_data;
+	struct ucma_context *ctx;
+
+	mutex_lock(&file->file_mutex);
+	while (!list_empty(&file->ctxs)) {
+		ctx = list_entry(file->ctxs.next, struct ucma_context,
+				 file_list);
+		mutex_unlock(&file->file_mutex);
+
+		mutex_lock(&ctx_mutex);
+		idr_remove(&ctx_idr, ctx->id);
+		mutex_unlock(&ctx_mutex);
+
+		rdma_destroy_id(ctx->cm_id);
+		ucma_cleanup_events(ctx);
+		kfree(ctx);
+
+		mutex_lock(&file->file_mutex);
+	}
+	mutex_unlock(&file->file_mutex);
+	kfree(file);
+	return 0;
+}
+
+static struct file_operations ucma_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = ucma_open,
+	.release = ucma_close,
+	.write	 = ucma_write,
+	.poll	 = ucma_poll,
+};
+
+static struct miscdevice ucma_misc = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name  = "rdma_cm",
+	.fops  = &ucma_fops,
+};
+
+static ssize_t show_abi_version(struct class_device *class_dev, char *buf)
+{
+	return sprintf(buf, "%d\n", RDMA_USER_CM_ABI_VERSION);
+}
+static CLASS_DEVICE_ATTR(abi_version, S_IRUGO, show_abi_version, NULL);
+
+static int __init ucma_init(void)
+{
+	int ret;
+
+	ret = misc_register(&ucma_misc);
+	if (ret)
+		return ret;
+
+	ret = class_device_create_file(ucma_misc.class,
+				       &class_device_attr_abi_version);
+	if (ret) {
+		printk(KERN_ERR "rdma_ucm: couldn't create abi_version attr\n");
+		goto err;
+	}
+	return 0;
+err:
+	misc_deregister(&ucma_misc);
+	return ret;
+}
+
+static void __exit ucma_cleanup(void)
+{
+	class_device_remove_file(ucma_misc.class,
+				 &class_device_attr_abi_version);
+	misc_deregister(&ucma_misc);
+	idr_destroy(&ctx_idr);
+}
+
+module_init(ucma_init);
+module_exit(ucma_cleanup);
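Everything in the rdma_cm userspace interface funnels through this single write()-based dispatcher: userspace writes a struct rdma_ucm_cmd_hdr followed by a command-specific payload, hdr.cmd indexes ucma_cmd_table[], and any reply is copied back through a response pointer carried inside the payload. A minimal userspace sketch of that calling convention follows; the device path, the payload layout, and the local header definition are illustrative assumptions, not the real ABI header, which real code must take from the kernel's rdma_ucm includes.

	/* Hypothetical sketch of driving the rdma_cm misc device from userspace.
	 * The header/payload layouts must match the kernel ABI; the structs
	 * below are placeholders mirroring the fields ucma_write() reads.
	 */
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <stdint.h>

	struct cmd_hdr {		/* stand-in for struct rdma_ucm_cmd_hdr */
		uint32_t cmd;		/* index into ucma_cmd_table */
		uint16_t in;		/* payload bytes following the header */
		uint16_t out;		/* bytes the kernel may write back */
	};

	int main(void)
	{
		int fd = open("/dev/rdma_cm", O_RDWR);	/* path depends on udev rules */
		if (fd < 0)
			return 1;

		uint64_t resp = 0;	/* kernel fills this via copy_to_user */
		struct { uint64_t response; } payload = {	/* placeholder payload */
			.response = (uintptr_t)&resp,
		};
		unsigned char buf[sizeof(struct cmd_hdr) + sizeof(payload)];
		struct cmd_hdr hdr = { .cmd = 0 /* e.g. RDMA_USER_CM_CMD_CREATE_ID */,
				       .in = sizeof(payload), .out = sizeof(resp) };

		memcpy(buf, &hdr, sizeof(hdr));
		memcpy(buf + sizeof(hdr), &payload, sizeof(payload));

		/* On success ucma_write() returns the full length written. */
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
			perror("ucma command failed");

		close(fd);
		return 0;
	}

Note the validation steps the kernel performs before dispatching: the command index is range-checked against the table, and hdr.in is checked against the actual write length, so a short or malformed buffer is rejected with -EINVAL before any handler runs.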
diff --git a/drivers/infiniband/core/uverbs_marshall.c b/drivers/infiniband/core/uverbs_marshall.c
new file mode 100644
index 0000000..ce46b13
--- /dev/null
+++ b/drivers/infiniband/core/uverbs_marshall.c
@@ -0,0 +1,138 @@
+/*
+ * Copyright (c) 2005 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include
+
+static void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst,
+				    struct ib_ah_attr *src)
+{
+	memcpy(dst->grh.dgid, src->grh.dgid.raw, sizeof src->grh.dgid);
+	dst->grh.flow_label    = src->grh.flow_label;
+	dst->grh.sgid_index    = src->grh.sgid_index;
+	dst->grh.hop_limit     = src->grh.hop_limit;
+	dst->grh.traffic_class = src->grh.traffic_class;
+	dst->dlid              = src->dlid;
+	dst->sl                = src->sl;
+	dst->src_path_bits     = src->src_path_bits;
+	dst->static_rate       = src->static_rate;
+	dst->is_global         = src->ah_flags & IB_AH_GRH ? 1 : 0;
+	dst->port_num          = src->port_num;
+}
+
+void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst,
+			     struct ib_qp_attr *src)
+{
+	dst->cur_qp_state      = src->cur_qp_state;
+	dst->path_mtu          = src->path_mtu;
+	dst->path_mig_state    = src->path_mig_state;
+	dst->qkey              = src->qkey;
+	dst->rq_psn            = src->rq_psn;
+	dst->sq_psn            = src->sq_psn;
+	dst->dest_qp_num       = src->dest_qp_num;
+	dst->qp_access_flags   = src->qp_access_flags;
+
+	dst->max_send_wr       = src->cap.max_send_wr;
+	dst->max_recv_wr       = src->cap.max_recv_wr;
+	dst->max_send_sge      = src->cap.max_send_sge;
+	dst->max_recv_sge      = src->cap.max_recv_sge;
+	dst->max_inline_data   = src->cap.max_inline_data;
+
+	ib_copy_ah_attr_to_user(&dst->ah_attr, &src->ah_attr);
+	ib_copy_ah_attr_to_user(&dst->alt_ah_attr, &src->alt_ah_attr);
+
+	dst->pkey_index          = src->pkey_index;
+	dst->alt_pkey_index      = src->alt_pkey_index;
+	dst->en_sqd_async_notify = src->en_sqd_async_notify;
+	dst->sq_draining         = src->sq_draining;
+	dst->max_rd_atomic       = src->max_rd_atomic;
+	dst->max_dest_rd_atomic  = src->max_dest_rd_atomic;
+	dst->min_rnr_timer       = src->min_rnr_timer;
+	dst->port_num            = src->port_num;
+	dst->timeout             = src->timeout;
+	dst->retry_cnt           = src->retry_cnt;
+	dst->rnr_retry           = src->rnr_retry;
+	dst->alt_port_num        = src->alt_port_num;
+	dst->alt_timeout         = src->alt_timeout;
+}
+EXPORT_SYMBOL(ib_copy_qp_attr_to_user);
+
+void ib_copy_path_rec_to_user(struct ib_user_path_rec *dst,
+			      struct ib_sa_path_rec *src)
+{
+	memcpy(dst->dgid, src->dgid.raw, sizeof src->dgid);
+	memcpy(dst->sgid, src->sgid.raw, sizeof src->sgid);
+
+	dst->dlid                  = src->dlid;
+	dst->slid                  = src->slid;
+	dst->raw_traffic           = src->raw_traffic;
+	dst->flow_label            = src->flow_label;
+	dst->hop_limit             = src->hop_limit;
+	dst->traffic_class         = src->traffic_class;
+	dst->reversible            = src->reversible;
+	dst->numb_path             = src->numb_path;
+	dst->pkey                  = src->pkey;
+	dst->sl                    = src->sl;
+	dst->mtu_selector          = src->mtu_selector;
+	dst->mtu                   = src->mtu;
+	dst->rate_selector         = src->rate_selector;
+	dst->rate                  = src->rate;
+	dst->packet_life_time      = src->packet_life_time;
+	dst->preference            = src->preference;
+	dst->packet_life_time_selector = src->packet_life_time_selector;
+}
+EXPORT_SYMBOL(ib_copy_path_rec_to_user);
+
+void ib_copy_path_rec_from_user(struct ib_sa_path_rec *dst,
+				struct
ib_user_path_rec *src) +{ + memcpy(dst->dgid.raw, src->dgid, sizeof dst->dgid); + memcpy(dst->sgid.raw, src->sgid, sizeof dst->sgid); + + dst->dlid = src->dlid; + dst->slid = src->slid; + dst->raw_traffic = src->raw_traffic; + dst->flow_label = src->flow_label; + dst->hop_limit = src->hop_limit; + dst->traffic_class = src->traffic_class; + dst->reversible = src->reversible; + dst->numb_path = src->numb_path; + dst->pkey = src->pkey; + dst->sl = src->sl; + dst->mtu_selector = src->mtu_selector; + dst->mtu = src->mtu; + dst->rate_selector = src->rate_selector; + dst->rate = src->rate; + dst->packet_life_time = src->packet_life_time; + dst->preference = src->preference; + dst->packet_life_time_selector = src->packet_life_time_selector; +} +EXPORT_SYMBOL(ib_copy_path_rec_from_user); diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index 798e13e..d0f7731 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -174,7 +174,6 @@ enum { struct mthca_cmd_context { struct completion done; - struct timer_list timer; int result; int next; u64 out_param; @@ -362,15 +361,6 @@ void mthca_cmd_event(struct mthca_dev *d complete(&context->done); } -static void event_timeout(unsigned long context_ptr) -{ - struct mthca_cmd_context *context = - (struct mthca_cmd_context *) context_ptr; - - context->result = -EBUSY; - complete(&context->done); -} - static int mthca_cmd_wait(struct mthca_dev *dev, u64 in_param, u64 *out_param, @@ -401,11 +391,10 @@ static int mthca_cmd_wait(struct mthca_d if (err) goto out; - context->timer.expires = jiffies + timeout; - add_timer(&context->timer); - - wait_for_completion(&context->done); - del_timer_sync(&context->timer); + if (!wait_for_completion_timeout(&context->done, timeout)) { + err = -EBUSY; + goto out; + } err = context->result; if (err) @@ -535,10 +524,6 @@ int mthca_cmd_use_events(struct mthca_de for (i = 0; i < dev->cmd.max_cmds; ++i) { dev->cmd.context[i].token = i; dev->cmd.context[i].next = i + 1; - init_timer(&dev->cmd.context[i].timer); - dev->cmd.context[i].timer.data = - (unsigned long) &dev->cmd.context[i]; - dev->cmd.context[i].timer.function = event_timeout; } dev->cmd.context[dev->cmd.max_cmds - 1].next = -1; diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c b/drivers/infiniband/hw/mthca/mthca_eq.c index 99f109c..d536217 100644 --- a/drivers/infiniband/hw/mthca/mthca_eq.c +++ b/drivers/infiniband/hw/mthca/mthca_eq.c @@ -695,10 +695,6 @@ static void mthca_unmap_reg(struct mthca static int __devinit mthca_map_eq_regs(struct mthca_dev *dev) { - unsigned long mthca_base; - - mthca_base = pci_resource_start(dev->pdev, 0); - if (mthca_is_memfree(dev)) { /* * We assume that the EQ arm and EQ set CI registers diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig new file mode 100644 index 0000000..fead87d --- /dev/null +++ b/drivers/infiniband/ulp/iser/Kconfig @@ -0,0 +1,11 @@ +config INFINIBAND_ISER + tristate "ISCSI RDMA Protocol" + depends on INFINIBAND && SCSI + select SCSI_ISCSI_ATTRS + ---help--- + Support for the ISCSI RDMA Protocol over InfiniBand. This + allows you to access storage devices that speak ISER/ISCSI + over InfiniBand. + + The ISER protocol is defined by IETF. + See . 
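The mthca_cmd.c hunk above replaces a hand-rolled struct timer_list, armed per command and firing event_timeout() to force -EBUSY into the waiter, with a single call to wait_for_completion_timeout(). A minimal kernel-context sketch of the resulting pattern, using placeholder names rather than mthca's real context structures:

	/* Sketch of the pattern this patch adopts: let
	 * wait_for_completion_timeout() handle both the wait and the
	 * timeout, instead of pairing wait_for_completion() with a
	 * private timer.  Names here are placeholders, not mthca's.
	 */
	#include <linux/completion.h>
	#include <linux/jiffies.h>

	struct demo_ctx {
		struct completion done;	/* init_completion() before first use */
		int result;
	};

	/* Called from the event/IRQ path when the command finishes. */
	static void demo_event(struct demo_ctx *ctx, int status)
	{
		ctx->result = status;
		complete(&ctx->done);
	}

	static int demo_wait(struct demo_ctx *ctx, unsigned long timeout_jiffies)
	{
		/* Returns 0 if the timeout elapsed with no completion. */
		if (!wait_for_completion_timeout(&ctx->done, timeout_jiffies))
			return -EBUSY;	/* same error the old timer handler forced */

		return ctx->result;
	}

A zero return from wait_for_completion_timeout() maps directly onto the -EBUSY the deleted event_timeout() callback used to synthesize, and it lets mthca_cmd_use_events() drop the per-context init_timer() setup entirely.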
diff --git a/drivers/infiniband/ulp/iser/Makefile b/drivers/infiniband/ulp/iser/Makefile new file mode 100644 index 0000000..fe6cd15 --- /dev/null +++ b/drivers/infiniband/ulp/iser/Makefile @@ -0,0 +1,4 @@ +obj-$(CONFIG_INFINIBAND_ISER) += ib_iser.o + +ib_iser-y := iser_verbs.o iser_initiator.o iser_memory.o \ + iscsi_iser.o diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c new file mode 100644 index 0000000..4f724a3 --- /dev/null +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -0,0 +1,794 @@ +/* + * iSCSI Initiator over iSER Data-Path + * + * Copyright (C) 2004 Dmitry Yusupov + * Copyright (C) 2004 Alex Aizman + * Copyright (C) 2005 Mike Christie + * Copyright (c) 2005, 2006 Voltaire, Inc. All rights reserved. + * maintained by openib-general@openib.org + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * Credits: + * Christoph Hellwig + * FUJITA Tomonori + * Arne Redlich + * Zhenyu Wang + * Modified by: + * Erez Zilber + * + * + * $Id: iscsi_iser.c 6965 2006-05-07 11:36:20Z ogerlitz $ + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "iscsi_iser.h" + +static unsigned int iscsi_max_lun = 512; +module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO); + +int iser_debug_level = 0; + +MODULE_DESCRIPTION("iSER (iSCSI Extensions for RDMA) Datamover " + "v" DRV_VER " (" DRV_DATE ")"); +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_AUTHOR("Alex Nezhinsky, Dan Bar Dov, Or Gerlitz"); + +module_param_named(debug_level, iser_debug_level, int, 0644); +MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 (default:disabled)"); + +struct iser_global ig; + +void +iscsi_iser_recv(struct iscsi_conn *conn, + struct iscsi_hdr *hdr, char *rx_data, int rx_data_len) +{ + int rc = 0; + uint32_t ret_itt; + int datalen; + int ahslen; + + /* verify PDU length */ + datalen = ntoh24(hdr->dlength); + if (datalen != rx_data_len) { + printk(KERN_ERR "iscsi_iser: datalen %d (hdr) != %d (IB) \n", + datalen, rx_data_len); + rc = ISCSI_ERR_DATALEN; + goto error; + } + + /* read AHS */ + ahslen = hdr->hlength * 4; + + /* verify itt (itt encoding: age+cid+itt) */ + rc = iscsi_verify_itt(conn, hdr, &ret_itt); + + if (!rc) + rc = iscsi_complete_pdu(conn, hdr, rx_data, rx_data_len); + + if (rc && rc != ISCSI_ERR_NO_SCSI_CMD) + goto error; + + return; +error: + iscsi_conn_failure(conn, rc); +} + + +/** + * iscsi_iser_cmd_init - Initialize iSCSI SCSI_READ or SCSI_WRITE commands + * + **/ +static void +iscsi_iser_cmd_init(struct iscsi_cmd_task *ctask) +{ + struct iscsi_iser_conn *iser_conn = ctask->conn->dd_data; + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct scsi_cmnd *sc = ctask->sc; + + iser_ctask->command_sent = 0; + iser_ctask->iser_conn = iser_conn; + + if (sc->sc_data_direction == DMA_TO_DEVICE) { + BUG_ON(ctask->total_length == 0); + /* bytes to be sent via RDMA operations */ + iser_ctask->rdma_data_count = ctask->total_length - + ctask->imm_count - + ctask->unsol_count; + + debug_scsi("cmd [itt %x total %d imm %d imm_data %d " + "rdma_data %d]\n", + ctask->itt, ctask->total_length, ctask->imm_count, + ctask->unsol_count, ctask->rdma_data_count); + } else + /* bytes to be sent via RDMA operations */ + iser_ctask->rdma_data_count = ctask->total_length; + + iser_ctask_rdma_init(iser_ctask); +} + +/** + * iscsi_mtask_xmit - xmit management(immediate) task + * @conn: iscsi connection + * @mtask: task management task + * + * Notes: + * The function can return -EAGAIN in which case caller must + * call it again later, or recover. '0' return code means successful + * xmit. + * + **/ +static int +iscsi_iser_mtask_xmit(struct iscsi_conn *conn, + struct iscsi_mgmt_task *mtask) +{ + int error = 0; + + debug_scsi("mtask deq [cid %d itt 0x%x]\n", conn->id, mtask->itt); + + error = iser_send_control(conn, mtask); + + /* since iser xmits control with zero copy, mtasks can not be recycled + * right after sending them. 
+ * The recycling scheme is based on whether a response is expected
+ * - if yes, the mtask is recycled at iscsi_complete_pdu
+ * - if no,  the mtask is recycled at iser_snd_completion
+ */
+	if (error && error != -EAGAIN)
+		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+
+	return error;
+}
+
+static int
+iscsi_iser_ctask_xmit_unsol_data(struct iscsi_conn *conn,
+				 struct iscsi_cmd_task *ctask)
+{
+	struct iscsi_data hdr;
+	int error = 0;
+	struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data;
+
+	/* Send data-out PDUs while there's still unsolicited data to send */
+	while (ctask->unsol_count > 0) {
+		iscsi_prep_unsolicit_data_pdu(ctask, &hdr,
+					      iser_ctask->rdma_data_count);
+
+		debug_scsi("Sending data-out: itt 0x%x, data count %d\n",
+			   hdr.itt, ctask->data_count);
+
+		/* the buffer description has been passed with the command */
+		/* Send the command */
+		error = iser_send_data_out(conn, ctask, &hdr);
+		if (error) {
+			ctask->unsol_datasn--;
+			goto iscsi_iser_ctask_xmit_unsol_data_exit;
+		}
+		ctask->unsol_count -= ctask->data_count;
+		debug_scsi("Need to send %d more as data-out PDUs\n",
+			   ctask->unsol_count);
+	}
+
+iscsi_iser_ctask_xmit_unsol_data_exit:
+	return error;
+}
+
+static int
+iscsi_iser_ctask_xmit(struct iscsi_conn *conn,
+		      struct iscsi_cmd_task *ctask)
+{
+	struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data;
+	int error = 0;
+
+	debug_scsi("ctask deq [cid %d itt 0x%x]\n",
+		   conn->id, ctask->itt);
+
+	/*
+	 * serialize with TMF AbortTask
+	 */
+	if (ctask->mtask)
+		return error;
+
+	/* Send the cmd PDU */
+	if (!iser_ctask->command_sent) {
+		error = iser_send_command(conn, ctask);
+		if (error)
+			goto iscsi_iser_ctask_xmit_exit;
+		iser_ctask->command_sent = 1;
+	}
+
+	/* Send unsolicited data-out PDU(s) if necessary */
+	if (ctask->unsol_count)
+		error = iscsi_iser_ctask_xmit_unsol_data(conn, ctask);
+
+iscsi_iser_ctask_xmit_exit:
+	if (error && error != -EAGAIN)
+		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+	return error;
+}
+
+static void
+iscsi_iser_cleanup_ctask(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
+{
+	struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data;
+
+	if (iser_ctask->status == ISER_TASK_STATUS_STARTED) {
+		iser_ctask->status = ISER_TASK_STATUS_COMPLETED;
+		iser_ctask_rdma_finalize(iser_ctask);
+	}
+}
+
+static struct iser_conn *
+iscsi_iser_ib_conn_lookup(__u64 ep_handle)
+{
+	struct iser_conn *ib_conn;
+	struct iser_conn *uib_conn = (struct iser_conn *)(unsigned long)ep_handle;
+
+	mutex_lock(&ig.connlist_mutex);
+	list_for_each_entry(ib_conn, &ig.connlist, conn_list) {
+		if (ib_conn == uib_conn) {
+			mutex_unlock(&ig.connlist_mutex);
+			return ib_conn;
+		}
+	}
+	mutex_unlock(&ig.connlist_mutex);
+	iser_err("no conn exists for eph %llx\n", (unsigned long long)ep_handle);
+	return NULL;
+}
+
+static struct iscsi_cls_conn *
+iscsi_iser_conn_create(struct iscsi_cls_session *cls_session, uint32_t conn_idx)
+{
+	struct iscsi_conn *conn;
+	struct iscsi_cls_conn *cls_conn;
+	struct iscsi_iser_conn *iser_conn;
+
+	cls_conn = iscsi_conn_setup(cls_session, conn_idx);
+	if (!cls_conn)
+		return NULL;
+	conn = cls_conn->dd_data;
+
+	/*
+	 * due to issues with the login code regarding iSER semantics,
+	 * this is not set in iscsi_conn_setup - FIXME
+	 */
+	conn->max_recv_dlength = 128;
+
+	iser_conn = kzalloc(sizeof(*iser_conn), GFP_KERNEL);
+	if (!iser_conn)
+		goto conn_alloc_fail;
+
+	/* currently this is the only field which needs to be initialized */
+	rwlock_init(&iser_conn->lock);
+
+	conn->recv_lock = &iser_conn->lock;
+
+	conn->dd_data = iser_conn;
+
iser_conn->iscsi_conn = conn; + + return cls_conn; + +conn_alloc_fail: + iscsi_conn_teardown(cls_conn); + return NULL; +} + +static void +iscsi_iser_conn_destroy(struct iscsi_cls_conn *cls_conn) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_iser_conn *iser_conn = conn->dd_data; + + iscsi_conn_teardown(cls_conn); + kfree(iser_conn); +} + +static int +iscsi_iser_conn_bind(struct iscsi_cls_session *cls_session, + struct iscsi_cls_conn *cls_conn, uint64_t transport_eph, + int is_leading) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_iser_conn *iser_conn; + struct iser_conn *ib_conn; + int error; + + error = iscsi_conn_bind(cls_session, cls_conn, is_leading); + if (error) + return error; + + if (conn->stop_stage != STOP_CONN_SUSPEND) { + /* the transport ep handle comes from user space so it must be + * verified against the global ib connections list */ + ib_conn = iscsi_iser_ib_conn_lookup(transport_eph); + if (!ib_conn) { + iser_err("can't bind eph %llx\n", + (unsigned long long)transport_eph); + return -EINVAL; + } + /* binds the iSER connection retrieved from the previously + * connected ep_handle to the iSCSI layer connection. exchanges + * connection pointers */ + iser_err("binding iscsi conn %p to iser_conn %p\n",conn,ib_conn); + iser_conn = conn->dd_data; + ib_conn->iser_conn = iser_conn; + iser_conn->ib_conn = ib_conn; + } + + return 0; +} + +static int +iscsi_iser_conn_start(struct iscsi_cls_conn *cls_conn) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + int err; + + err = iscsi_conn_start(cls_conn); + if (err) + return err; + + return iser_conn_set_full_featured_mode(conn); +} + +static void +iscsi_iser_conn_terminate(struct iscsi_conn *conn) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iser_conn *ib_conn = iser_conn->ib_conn; + + BUG_ON(!ib_conn); + /* starts conn teardown process, waits until all previously * + * posted buffers get flushed, deallocates all conn resources */ + iser_conn_terminate(ib_conn); + iser_conn->ib_conn = NULL; + conn->recv_lock = NULL; +} + + +static struct iscsi_transport iscsi_iser_transport; + +static struct iscsi_cls_session * +iscsi_iser_session_create(struct iscsi_transport *iscsit, + struct scsi_transport_template *scsit, + uint32_t initial_cmdsn, uint32_t *hostno) +{ + struct iscsi_cls_session *cls_session; + struct iscsi_session *session; + int i; + uint32_t hn; + struct iscsi_cmd_task *ctask; + struct iscsi_mgmt_task *mtask; + struct iscsi_iser_cmd_task *iser_ctask; + struct iser_desc *desc; + + cls_session = iscsi_session_setup(iscsit, scsit, + sizeof(struct iscsi_iser_cmd_task), + sizeof(struct iser_desc), + initial_cmdsn, &hn); + if (!cls_session) + return NULL; + + *hostno = hn; + session = class_to_transport_session(cls_session); + + /* libiscsi setup itts, data and pool so just set desc fields */ + for (i = 0; i < session->cmds_max; i++) { + ctask = session->cmds[i]; + iser_ctask = ctask->dd_data; + ctask->hdr = (struct iscsi_cmd *)&iser_ctask->desc.iscsi_header; + } + + for (i = 0; i < session->mgmtpool_max; i++) { + mtask = session->mgmt_cmds[i]; + desc = mtask->dd_data; + mtask->hdr = &desc->iscsi_header; + desc->data = mtask->data; + } + + return cls_session; +} + +static int +iscsi_iser_conn_set_param(struct iscsi_cls_conn *cls_conn, + enum iscsi_param param, uint32_t value) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_session *session = conn->session; + + spin_lock_bh(&session->lock); + if (conn->c_stage != ISCSI_CONN_INITIAL_STAGE && + conn->stop_stage != 
STOP_CONN_RECOVER) {
+		printk(KERN_ERR "iscsi_iser: cannot change parameter [%d]\n",
+		       param);
+		spin_unlock_bh(&session->lock);
+		return 0;
+	}
+	spin_unlock_bh(&session->lock);
+
+	switch (param) {
+	case ISCSI_PARAM_MAX_RECV_DLENGTH:
+		/* TBD */
+		break;
+	case ISCSI_PARAM_MAX_XMIT_DLENGTH:
+		conn->max_xmit_dlength = value;
+		break;
+	case ISCSI_PARAM_HDRDGST_EN:
+		if (value) {
+			printk(KERN_ERR "HeaderDigest wasn't negotiated to None");
+			return -EPROTO;
+		}
+		break;
+	case ISCSI_PARAM_DATADGST_EN:
+		if (value) {
+			printk(KERN_ERR "DataDigest wasn't negotiated to None");
+			return -EPROTO;
+		}
+		break;
+	case ISCSI_PARAM_INITIAL_R2T_EN:
+		session->initial_r2t_en = value;
+		break;
+	case ISCSI_PARAM_IMM_DATA_EN:
+		session->imm_data_en = value;
+		break;
+	case ISCSI_PARAM_FIRST_BURST:
+		session->first_burst = value;
+		break;
+	case ISCSI_PARAM_MAX_BURST:
+		session->max_burst = value;
+		break;
+	case ISCSI_PARAM_PDU_INORDER_EN:
+		session->pdu_inorder_en = value;
+		break;
+	case ISCSI_PARAM_DATASEQ_INORDER_EN:
+		session->dataseq_inorder_en = value;
+		break;
+	case ISCSI_PARAM_ERL:
+		session->erl = value;
+		break;
+	case ISCSI_PARAM_IFMARKER_EN:
+		if (value) {
+			printk(KERN_ERR "IFMarker wasn't negotiated to No");
+			return -EPROTO;
+		}
+		break;
+	case ISCSI_PARAM_OFMARKER_EN:
+		if (value) {
+			printk(KERN_ERR "OFMarker wasn't negotiated to No");
+			return -EPROTO;
+		}
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static int
+iscsi_iser_session_get_param(struct iscsi_cls_session *cls_session,
+			     enum iscsi_param param, uint32_t *value)
+{
+	struct Scsi_Host *shost = iscsi_session_to_shost(cls_session);
+	struct iscsi_session *session = iscsi_hostdata(shost->hostdata);
+
+	switch (param) {
+	case ISCSI_PARAM_INITIAL_R2T_EN:
+		*value = session->initial_r2t_en;
+		break;
+	case ISCSI_PARAM_MAX_R2T:
+		*value = session->max_r2t;
+		break;
+	case ISCSI_PARAM_IMM_DATA_EN:
+		*value = session->imm_data_en;
+		break;
+	case ISCSI_PARAM_FIRST_BURST:
+		*value = session->first_burst;
+		break;
+	case ISCSI_PARAM_MAX_BURST:
+		*value = session->max_burst;
+		break;
+	case ISCSI_PARAM_PDU_INORDER_EN:
+		*value = session->pdu_inorder_en;
+		break;
+	case ISCSI_PARAM_DATASEQ_INORDER_EN:
+		*value = session->dataseq_inorder_en;
+		break;
+	case ISCSI_PARAM_ERL:
+		*value = session->erl;
+		break;
+	case ISCSI_PARAM_IFMARKER_EN:
+		*value = 0;
+		break;
+	case ISCSI_PARAM_OFMARKER_EN:
+		*value = 0;
+		break;
+	default:
+		return ISCSI_ERR_PARAM_NOT_FOUND;
+	}
+
+	return 0;
+}
+
+static int
+iscsi_iser_conn_get_param(struct iscsi_cls_conn *cls_conn,
+			  enum iscsi_param param, uint32_t *value)
+{
+	struct iscsi_conn *conn = cls_conn->dd_data;
+
+	switch(param) {
+	case ISCSI_PARAM_MAX_RECV_DLENGTH:
+		*value = conn->max_recv_dlength;
+		break;
+	case ISCSI_PARAM_MAX_XMIT_DLENGTH:
+		*value = conn->max_xmit_dlength;
+		break;
+	case ISCSI_PARAM_HDRDGST_EN:
+		*value = 0;
+		break;
+	case ISCSI_PARAM_DATADGST_EN:
+		*value = 0;
+		break;
+	/*case ISCSI_PARAM_TARGET_RECV_DLENGTH:
+		*value = conn->target_recv_dlength;
+		break;
+	case ISCSI_PARAM_INITIATOR_RECV_DLENGTH:
+		*value = conn->initiator_recv_dlength;
+		break;*/
+	default:
+		return ISCSI_ERR_PARAM_NOT_FOUND;
+	}
+
+	return 0;
+}
+
+
+static void
+iscsi_iser_conn_get_stats(struct iscsi_cls_conn *cls_conn, struct iscsi_stats *stats)
+{
+	struct iscsi_conn *conn = cls_conn->dd_data;
+
+	stats->txdata_octets = conn->txdata_octets;
+	stats->rxdata_octets = conn->rxdata_octets;
+	stats->scsicmd_pdus = conn->scsicmd_pdus_cnt;
+	stats->dataout_pdus = conn->dataout_pdus_cnt;
+
stats->scsirsp_pdus = conn->scsirsp_pdus_cnt; + stats->datain_pdus = conn->datain_pdus_cnt; /* always 0 */ + stats->r2t_pdus = conn->r2t_pdus_cnt; /* always 0 */ + stats->tmfcmd_pdus = conn->tmfcmd_pdus_cnt; + stats->tmfrsp_pdus = conn->tmfrsp_pdus_cnt; + stats->custom_length = 3; + strcpy(stats->custom[0].desc, "qp_tx_queue_full"); + stats->custom[0].value = 0; /* TB iser_conn->qp_tx_queue_full; */ + strcpy(stats->custom[1].desc, "fmr_map_not_avail"); + stats->custom[1].value = 0; /* TB iser_conn->fmr_map_not_avail */; + strcpy(stats->custom[2].desc, "eh_abort_cnt"); + stats->custom[2].value = conn->eh_abort_cnt; +} + +static int +iscsi_iser_ep_connect(struct sockaddr *dst_addr, int non_blocking, + __u64 *ep_handle) +{ + int err; + struct iser_conn *ib_conn; + + err = iser_conn_init(&ib_conn); + if (err) + goto out; + + err = iser_connect(ib_conn, NULL, (struct sockaddr_in *)dst_addr, non_blocking); + if (!err) + *ep_handle = (__u64)(unsigned long)ib_conn; + +out: + return err; +} + +static int +iscsi_iser_ep_poll(__u64 ep_handle, int timeout_ms) +{ + struct iser_conn *ib_conn = iscsi_iser_ib_conn_lookup(ep_handle); + int rc; + + if (!ib_conn) + return -EINVAL; + + rc = wait_event_interruptible_timeout(ib_conn->wait, + ib_conn->state == ISER_CONN_UP, + msecs_to_jiffies(timeout_ms)); + + /* if conn establishment failed, return error code to iscsi */ + if (!rc && + (ib_conn->state == ISER_CONN_TERMINATING || + ib_conn->state == ISER_CONN_DOWN)) + rc = -1; + + iser_err("ib conn %p rc = %d\n", ib_conn, rc); + + if (rc > 0) + return 1; /* success, this is the equivalent of POLLOUT */ + else if (!rc) + return 0; /* timeout */ + else + return rc; /* signal */ +} + +static void +iscsi_iser_ep_disconnect(__u64 ep_handle) +{ + struct iser_conn *ib_conn = iscsi_iser_ib_conn_lookup(ep_handle); + + if (!ib_conn) + return; + + iser_err("ib conn %p state %d\n",ib_conn, ib_conn->state); + + iser_conn_terminate(ib_conn); +} + +static struct scsi_host_template iscsi_iser_sht = { + .name = "iSCSI Initiator over iSER, v." 
+				  ISCSI_VERSION_STR,
+	.queuecommand           = iscsi_queuecommand,
+	.can_queue              = ISCSI_XMIT_CMDS_MAX - 1,
+	.sg_tablesize           = ISCSI_ISER_SG_TABLESIZE,
+	.cmd_per_lun            = ISCSI_MAX_CMD_PER_LUN,
+	.eh_abort_handler       = iscsi_eh_abort,
+	.eh_host_reset_handler  = iscsi_eh_host_reset,
+	.use_clustering         = DISABLE_CLUSTERING,
+	.proc_name              = "iscsi_iser",
+	.this_id                = -1,
+};
+
+static struct iscsi_transport iscsi_iser_transport = {
+	.owner                  = THIS_MODULE,
+	.name                   = "iser",
+	.caps                   = CAP_RECOVERY_L0 | CAP_MULTI_R2T,
+	.param_mask             = ISCSI_MAX_RECV_DLENGTH |
+				  ISCSI_MAX_XMIT_DLENGTH |
+				  ISCSI_HDRDGST_EN |
+				  ISCSI_DATADGST_EN |
+				  ISCSI_INITIAL_R2T_EN |
+				  ISCSI_MAX_R2T |
+				  ISCSI_IMM_DATA_EN |
+				  ISCSI_FIRST_BURST |
+				  ISCSI_MAX_BURST |
+				  ISCSI_PDU_INORDER_EN |
+				  ISCSI_DATASEQ_INORDER_EN,
+	.host_template          = &iscsi_iser_sht,
+	.conndata_size          = sizeof(struct iscsi_conn),
+	.max_lun                = ISCSI_ISER_MAX_LUN,
+	.max_cmd_len            = ISCSI_ISER_MAX_CMD_LEN,
+	/* session management */
+	.create_session         = iscsi_iser_session_create,
+	.destroy_session        = iscsi_session_teardown,
+	/* connection management */
+	.create_conn            = iscsi_iser_conn_create,
+	.bind_conn              = iscsi_iser_conn_bind,
+	.destroy_conn           = iscsi_iser_conn_destroy,
+	.set_param              = iscsi_iser_conn_set_param,
+	.get_conn_param         = iscsi_iser_conn_get_param,
+	.get_session_param      = iscsi_iser_session_get_param,
+	.start_conn             = iscsi_iser_conn_start,
+	.stop_conn              = iscsi_conn_stop,
+	/* these are called as part of conn recovery */
+	.suspend_conn_recv      = NULL, /* FIXME is/how this relevant to iser? */
+	.terminate_conn         = iscsi_iser_conn_terminate,
+	/* IO */
+	.send_pdu               = iscsi_conn_send_pdu,
+	.get_stats              = iscsi_iser_conn_get_stats,
+	.init_cmd_task          = iscsi_iser_cmd_init,
+	.xmit_cmd_task          = iscsi_iser_ctask_xmit,
+	.xmit_mgmt_task         = iscsi_iser_mtask_xmit,
+	.cleanup_cmd_task       = iscsi_iser_cleanup_ctask,
+	/* recovery */
+	.session_recovery_timedout = iscsi_session_recovery_timedout,
+
+	.ep_connect             = iscsi_iser_ep_connect,
+	.ep_poll                = iscsi_iser_ep_poll,
+	.ep_disconnect          = iscsi_iser_ep_disconnect
+};
+
+static int __init iser_init(void)
+{
+	int err;
+
+	iser_dbg("Starting iSER datamover...\n");
+
+	if (iscsi_max_lun < 1) {
+		printk(KERN_ERR "Invalid max_lun value of %u\n", iscsi_max_lun);
+		return -EINVAL;
+	}
+
+	iscsi_iser_transport.max_lun = iscsi_max_lun;
+
+	memset(&ig, 0, sizeof(struct iser_global));
+
+	ig.desc_cache = kmem_cache_create("iser_descriptors",
+					  sizeof (struct iser_desc),
+					  0, SLAB_HWCACHE_ALIGN,
+					  NULL, NULL);
+	if (ig.desc_cache == NULL)
+		return -ENOMEM;
+
+	/* device init is called only after the first addr resolution */
+	mutex_init(&ig.device_list_mutex);
+	INIT_LIST_HEAD(&ig.device_list);
+	mutex_init(&ig.connlist_mutex);
+	INIT_LIST_HEAD(&ig.connlist);
+
+	if (!iscsi_register_transport(&iscsi_iser_transport)) {
+		iser_err("iscsi_register_transport failed\n");
+		err = -EINVAL;
+		goto register_transport_failure;
+	}
+
+	return 0;
+
+register_transport_failure:
+	kmem_cache_destroy(ig.desc_cache);
+
+	return err;
+}
+
+static void __exit iser_exit(void)
+{
+	iser_dbg("Removing iSER datamover...\n");
+	iscsi_unregister_transport(&iscsi_iser_transport);
+	kmem_cache_destroy(ig.desc_cache);
+}
+
+module_init(iser_init);
+module_exit(iser_exit);
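One detail worth noting in the file above: iscsi_iser_ep_connect() hands userspace an ep_handle that is really a kernel pointer cast to __u64, and every path that takes such a handle back (ep_poll, ep_disconnect, conn bind) re-validates it by membership in ig.connlist rather than dereferencing it on trust. A standalone sketch of that validate-by-membership idea follows, with hypothetical names, not the driver's API; in the kernel the traversal happens under connlist_mutex.

	/* Validate an opaque u64 handle by checking it against a list of
	 * live objects before use, as iscsi_iser_ib_conn_lookup() does.
	 * Hypothetical userspace example; compiles as plain C.
	 */
	#include <stdint.h>
	#include <stdio.h>

	struct conn {
		struct conn *next;	/* stand-in for the kernel list_head */
		int id;
	};

	static struct conn *live_conns;	/* guarded by a mutex in the kernel */

	static struct conn *conn_lookup(uint64_t handle)
	{
		struct conn *candidate = (struct conn *)(uintptr_t)handle;
		struct conn *c;

		for (c = live_conns; c; c = c->next)
			if (c == candidate)	/* compare identity, never deref first */
				return c;

		return NULL;			/* stale or forged handle */
	}

	int main(void)
	{
		struct conn a = { .next = NULL, .id = 1 };
		live_conns = &a;

		uint64_t good = (uintptr_t)&a, bad = 0xdeadbeef;
		printf("good: %p\n", (void *)conn_lookup(good));
		printf("bad:  %p\n", (void *)conn_lookup(bad));
		return 0;
	}

The design choice is that a stale or corrupt handle from userspace can at worst miss the list and fail with -EINVAL; the pointer is only used as an identity to compare against connections the kernel itself still tracks.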
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
new file mode 100644
index 0000000..3350ba6
--- /dev/null
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -0,0 +1,354 @@
+/*
+ * iSER transport for the Open iSCSI Initiator & iSER transport internals
+ *
+ * Copyright (C) 2004 Dmitry Yusupov
+ * Copyright (C) 2004 Alex Aizman
+ * Copyright (C) 2005 Mike Christie
+ * based on code maintained by open-iscsi@googlegroups.com
+ *
+ * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id: iscsi_iser.h 7051 2006-05-10 12:29:11Z ogerlitz $
+ */
+#ifndef __ISCSI_ISER_H__
+#define __ISCSI_ISER_H__
+
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#define DRV_NAME	"iser"
+#define PFX		DRV_NAME ": "
+#define DRV_VER		"0.1"
+#define DRV_DATE	"May 7th, 2006"
+
+#define iser_dbg(fmt, arg...)				\
+	do {						\
+		if (iser_debug_level > 0)		\
+			printk(KERN_DEBUG PFX "%s:" fmt,\
+				__func__ , ## arg);	\
+	} while (0)
+
+#define iser_err(fmt, arg...)				\
+	do {						\
+		printk(KERN_ERR PFX "%s:" fmt,		\
+			__func__ , ## arg);		\
+	} while (0)
+
+	/* support up to 512KB in one RDMA */
+#define ISCSI_ISER_SG_TABLESIZE	(0x80000 >> PAGE_SHIFT)
+#define ISCSI_ISER_MAX_LUN	256
+#define ISCSI_ISER_MAX_CMD_LEN	16
+
+/* QP settings */
+/* Maximal bounds on received asynchronous PDUs */
+#define ISER_MAX_RX_MISC_PDUS	4 /* NOOP_IN(2) , ASYNC_EVENT(2) */
+
+#define ISER_MAX_TX_MISC_PDUS	6 /* NOOP_OUT(2), TEXT(1),     *
+				   * SCSI_TMFUNC(2), LOGOUT(1) */
+
+#define ISER_QP_MAX_RECV_DTOS	(ISCSI_XMIT_CMDS_MAX +  \
+				 ISER_MAX_RX_MISC_PDUS + \
+				 ISER_MAX_TX_MISC_PDUS)
+
+/* the max TX (send) WR supported by the iSER QP is defined by                  *
+ * max_send_wr = T * (1 + D) + C ; D is how many inflight dataouts we expect   *
+ * to have at max for SCSI command. The tx posting & completion handling code  *
+ * supports -EAGAIN scheme where tx is suspended till the QP has room for more *
+ * send WR.
D=8 comes from 64K/8K */ + +#define ISER_INFLIGHT_DATAOUTS 8 + +#define ISER_QP_MAX_REQ_DTOS (ISCSI_XMIT_CMDS_MAX * \ + (1 + ISER_INFLIGHT_DATAOUTS) + \ + ISER_MAX_TX_MISC_PDUS + \ + ISER_MAX_RX_MISC_PDUS) + +#define ISER_VER 0x10 +#define ISER_WSV 0x08 +#define ISER_RSV 0x04 + +struct iser_hdr { + u8 flags; + u8 rsvd[3]; + __be32 write_stag; /* write rkey */ + __be64 write_va; + __be32 read_stag; /* read rkey */ + __be64 read_va; +} __attribute__((packed)); + + +/* Length of an object name string */ +#define ISER_OBJECT_NAME_SIZE 64 + +enum iser_ib_conn_state { + ISER_CONN_INIT, /* descriptor allocd, no conn */ + ISER_CONN_PENDING, /* in the process of being established */ + ISER_CONN_UP, /* up and running */ + ISER_CONN_TERMINATING, /* in the process of being terminated */ + ISER_CONN_DOWN, /* shut down */ + ISER_CONN_STATES_NUM +}; + +enum iser_task_status { + ISER_TASK_STATUS_INIT = 0, + ISER_TASK_STATUS_STARTED, + ISER_TASK_STATUS_COMPLETED +}; + +enum iser_data_dir { + ISER_DIR_IN = 0, /* to initiator */ + ISER_DIR_OUT, /* from initiator */ + ISER_DIRS_NUM +}; + +struct iser_data_buf { + void *buf; /* pointer to the sg list */ + unsigned int size; /* num entries of this sg */ + unsigned long data_len; /* total data len */ + unsigned int dma_nents; /* returned by dma_map_sg */ + char *copy_buf; /* allocated copy buf for SGs unaligned * + * for rdma which are copied */ + struct scatterlist sg_single; /* SG-ified clone of a non SG SC or * + * unaligned SG */ + }; + +/* fwd declarations */ +struct iser_device; +struct iscsi_iser_conn; +struct iscsi_iser_cmd_task; + +struct iser_mem_reg { + u32 lkey; + u32 rkey; + u64 va; + u64 len; + void *mem_h; +}; + +struct iser_regd_buf { + struct iser_mem_reg reg; /* memory registration info */ + void *virt_addr; + struct iser_device *device; /* device->device for dma_unmap */ + dma_addr_t dma_addr; /* if non zero, addr for dma_unmap */ + enum dma_data_direction direction; /* direction for dma_unmap */ + unsigned int data_size; + atomic_t ref_count; /* refcount, freed when dec to 0 */ +}; + +#define MAX_REGD_BUF_VECTOR_LEN 2 + +struct iser_dto { + struct iscsi_iser_cmd_task *ctask; + struct iscsi_iser_conn *conn; + int notify_enable; + + /* vector of registered buffers */ + unsigned int regd_vector_len; + struct iser_regd_buf *regd[MAX_REGD_BUF_VECTOR_LEN]; + + /* offset into the registered buffer may be specified */ + unsigned int offset[MAX_REGD_BUF_VECTOR_LEN]; + + /* a smaller size may be specified, if 0, then full size is used */ + unsigned int used_sz[MAX_REGD_BUF_VECTOR_LEN]; +}; + +enum iser_desc_type { + ISCSI_RX, + ISCSI_TX_CONTROL , + ISCSI_TX_SCSI_COMMAND, + ISCSI_TX_DATAOUT +}; + +struct iser_desc { + struct iser_hdr iser_header; + struct iscsi_hdr iscsi_header; + struct iser_regd_buf hdr_regd_buf; + void *data; /* used by RX & TX_CONTROL */ + struct iser_regd_buf data_regd_buf; /* used by RX & TX_CONTROL */ + enum iser_desc_type type; + struct iser_dto dto; +}; + +struct iser_device { + struct ib_device *ib_device; + struct ib_pd *pd; + struct ib_cq *cq; + struct ib_mr *mr; + struct tasklet_struct cq_tasklet; + struct list_head ig_list; /* entry in ig devices list */ + int refcount; +}; + +struct iser_conn { + struct iscsi_iser_conn *iser_conn; /* iser conn for upcalls */ + enum iser_ib_conn_state state; /* rdma connection state */ + spinlock_t lock; /* used for state changes */ + struct iser_device *device; /* device context */ + struct rdma_cm_id *cma_id; /* CMA ID */ + struct ib_qp *qp; /* QP */ + struct ib_fmr_pool *fmr_pool; /* 
pool of IB FMRs */ + int disc_evt_flag; /* disconn event delivered */ + wait_queue_head_t wait; /* waitq for conn/disconn */ + atomic_t post_recv_buf_count; /* posted rx count */ + atomic_t post_send_buf_count; /* posted tx count */ + struct work_struct comperror_work; /* conn term sleepable ctx*/ + char name[ISER_OBJECT_NAME_SIZE]; + struct iser_page_vec *page_vec; /* represents SG to fmr maps* + * maps serialized as tx is*/ + struct list_head conn_list; /* entry in ig conn list */ +}; + +struct iscsi_iser_conn { + struct iscsi_conn *iscsi_conn;/* ptr to iscsi conn */ + struct iser_conn *ib_conn; /* iSER IB conn */ + + rwlock_t lock; +}; + +struct iscsi_iser_cmd_task { + struct iser_desc desc; + struct iscsi_iser_conn *iser_conn; + int rdma_data_count;/* RDMA bytes */ + enum iser_task_status status; + int command_sent; /* set if command sent */ + int dir[ISER_DIRS_NUM]; /* set if dir use*/ + struct iser_regd_buf rdma_regd[ISER_DIRS_NUM];/* regd rdma buf */ + struct iser_data_buf data[ISER_DIRS_NUM]; /* orig. data des*/ + struct iser_data_buf data_copy[ISER_DIRS_NUM];/* contig. copy */ +}; + +struct iser_page_vec { + u64 *pages; + int length; + int offset; + int data_size; +}; + +struct iser_global { + struct mutex device_list_mutex;/* */ + struct list_head device_list; /* all iSER devices */ + struct mutex connlist_mutex; + struct list_head connlist; /* all iSER IB connections */ + + kmem_cache_t *desc_cache; +}; + +extern struct iser_global ig; +extern int iser_debug_level; + +/* allocate connection resources needed for rdma functionality */ +int iser_conn_set_full_featured_mode(struct iscsi_conn *conn); + +int iser_send_control(struct iscsi_conn *conn, + struct iscsi_mgmt_task *mtask); + +int iser_send_command(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask); + +int iser_send_data_out(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask, + struct iscsi_data *hdr); + +void iscsi_iser_recv(struct iscsi_conn *conn, + struct iscsi_hdr *hdr, + char *rx_data, + int rx_data_len); + +int iser_conn_init(struct iser_conn **ib_conn); + +void iser_conn_terminate(struct iser_conn *ib_conn); + +void iser_conn_release(struct iser_conn *ib_conn); + +void iser_rcv_completion(struct iser_desc *desc, + unsigned long dto_xfer_len); + +void iser_snd_completion(struct iser_desc *desc); + +void iser_ctask_rdma_init(struct iscsi_iser_cmd_task *ctask); + +void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *ctask); + +void iser_dto_buffs_release(struct iser_dto *dto); + +int iser_regd_buff_release(struct iser_regd_buf *regd_buf); + +void iser_reg_single(struct iser_device *device, + struct iser_regd_buf *regd_buf, + enum dma_data_direction direction); + +int iser_start_rdma_unaligned_sg(struct iscsi_iser_cmd_task *ctask, + enum iser_data_dir cmd_dir); + +void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_cmd_task *ctask, + enum iser_data_dir cmd_dir); + +int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *ctask, + enum iser_data_dir cmd_dir); + +int iser_connect(struct iser_conn *ib_conn, + struct sockaddr_in *src_addr, + struct sockaddr_in *dst_addr, + int non_blocking); + +int iser_reg_page_vec(struct iser_conn *ib_conn, + struct iser_page_vec *page_vec, + struct iser_mem_reg *mem_reg); + +void iser_unreg_mem(struct iser_mem_reg *mem_reg); + +int iser_post_recv(struct iser_desc *rx_desc); +int iser_post_send(struct iser_desc *tx_desc); + +int iser_conn_state_comp(struct iser_conn *ib_conn, + enum iser_ib_conn_state comp); +#endif diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
b/drivers/infiniband/ulp/iser/iser_initiator.c new file mode 100644 index 0000000..2703bb0 --- /dev/null +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -0,0 +1,734 @@ +/* + * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: iser_initiator.c 6964 2006-05-07 11:11:43Z ogerlitz $ + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "iscsi_iser.h" + +/* Constant PDU lengths calculations */ +#define ISER_TOTAL_HEADERS_LEN (sizeof (struct iser_hdr) + \ + sizeof (struct iscsi_hdr)) + +/* iser_dto_add_regd_buff - increments the reference count for * + * the registered buffer & adds it to the DTO object */ +static void iser_dto_add_regd_buff(struct iser_dto *dto, + struct iser_regd_buf *regd_buf, + unsigned long use_offset, + unsigned long use_size) +{ + int add_idx; + + atomic_inc(®d_buf->ref_count); + + add_idx = dto->regd_vector_len; + dto->regd[add_idx] = regd_buf; + dto->used_sz[add_idx] = use_size; + dto->offset[add_idx] = use_offset; + + dto->regd_vector_len++; +} + +static int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask, + struct iser_data_buf *data, + enum iser_data_dir iser_dir, + enum dma_data_direction dma_dir) +{ + struct device *dma_device; + + iser_ctask->dir[iser_dir] = 1; + dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + + data->dma_nents = dma_map_sg(dma_device, data->buf, data->size, dma_dir); + if (data->dma_nents == 0) { + iser_err("dma_map_sg failed!!!\n"); + return -EINVAL; + } + return 0; +} + +static void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask) +{ + struct device *dma_device; + struct iser_data_buf *data; + + dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + + if (iser_ctask->dir[ISER_DIR_IN]) { + data = &iser_ctask->data[ISER_DIR_IN]; + dma_unmap_sg(dma_device, data->buf, data->size, DMA_FROM_DEVICE); + } + + if (iser_ctask->dir[ISER_DIR_OUT]) { + data = &iser_ctask->data[ISER_DIR_OUT]; + dma_unmap_sg(dma_device, data->buf, data->size, DMA_TO_DEVICE); + } +} + +/* Register user buffer memory and initialize passive rdma + * dto descriptor. 
Total data size is stored in + * iser_ctask->data[ISER_DIR_IN].data_len + */ +static int iser_prepare_read_cmd(struct iscsi_cmd_task *ctask, + unsigned int edtl) + +{ + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct iser_regd_buf *regd_buf; + int err; + struct iser_hdr *hdr = &iser_ctask->desc.iser_header; + struct iser_data_buf *buf_in = &iser_ctask->data[ISER_DIR_IN]; + + err = iser_dma_map_task_data(iser_ctask, + buf_in, + ISER_DIR_IN, + DMA_FROM_DEVICE); + if (err) + return err; + + if (edtl > iser_ctask->data[ISER_DIR_IN].data_len) { + iser_err("Total data length: %ld, less than EDTL: " + "%d, in READ cmd BHS itt: %d, conn: 0x%p\n", + iser_ctask->data[ISER_DIR_IN].data_len, edtl, + ctask->itt, iser_ctask->iser_conn); + return -EINVAL; + } + + err = iser_reg_rdma_mem(iser_ctask,ISER_DIR_IN); + if (err) { + iser_err("Failed to set up Data-IN RDMA\n"); + return err; + } + regd_buf = &iser_ctask->rdma_regd[ISER_DIR_IN]; + + hdr->flags |= ISER_RSV; + hdr->read_stag = cpu_to_be32(regd_buf->reg.rkey); + hdr->read_va = cpu_to_be64(regd_buf->reg.va); + + iser_dbg("Cmd itt:%d READ tags RKEY:%#.4X VA:%#llX\n", + ctask->itt, regd_buf->reg.rkey, + (unsigned long long)regd_buf->reg.va); + + return 0; +} + +/* Register user buffer memory and initialize passive rdma + * dto descriptor. Total data size is stored in + * ctask->data[ISER_DIR_OUT].data_len + */ +static int +iser_prepare_write_cmd(struct iscsi_cmd_task *ctask, + unsigned int imm_sz, + unsigned int unsol_sz, + unsigned int edtl) +{ + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct iser_regd_buf *regd_buf; + int err; + struct iser_dto *send_dto = &iser_ctask->desc.dto; + struct iser_hdr *hdr = &iser_ctask->desc.iser_header; + struct iser_data_buf *buf_out = &iser_ctask->data[ISER_DIR_OUT]; + + err = iser_dma_map_task_data(iser_ctask, + buf_out, + ISER_DIR_OUT, + DMA_TO_DEVICE); + if (err) + return err; + + if (edtl > iser_ctask->data[ISER_DIR_OUT].data_len) { + iser_err("Total data length: %ld, less than EDTL: %d, " + "in WRITE cmd BHS itt: %d, conn: 0x%p\n", + iser_ctask->data[ISER_DIR_OUT].data_len, + edtl, ctask->itt, ctask->conn); + return -EINVAL; + } + + err = iser_reg_rdma_mem(iser_ctask,ISER_DIR_OUT); + if (err != 0) { + iser_err("Failed to register write cmd RDMA mem\n"); + return err; + } + + regd_buf = &iser_ctask->rdma_regd[ISER_DIR_OUT]; + + if (unsol_sz < edtl) { + hdr->flags |= ISER_WSV; + hdr->write_stag = cpu_to_be32(regd_buf->reg.rkey); + hdr->write_va = cpu_to_be64(regd_buf->reg.va + unsol_sz); + + iser_dbg("Cmd itt:%d, WRITE tags, RKEY:%#.4X " + "VA:%#llX + unsol:%d\n", + ctask->itt, regd_buf->reg.rkey, + (unsigned long long)regd_buf->reg.va, unsol_sz); + } + + if (imm_sz > 0) { + iser_dbg("Cmd itt:%d, WRITE, adding imm.data sz: %d\n", + ctask->itt, imm_sz); + iser_dto_add_regd_buff(send_dto, + regd_buf, + 0, + imm_sz); + } + + return 0; +} + +/** + * iser_post_receive_control - allocates, initializes and posts receive DTO. 
+
+/**
+ * iser_post_receive_control - allocates, initializes and posts receive DTO.
+ */
+static int iser_post_receive_control(struct iscsi_conn *conn)
+{
+	struct iscsi_iser_conn *iser_conn = conn->dd_data;
+	struct iser_desc     *rx_desc;
+	struct iser_regd_buf *regd_hdr;
+	struct iser_regd_buf *regd_data;
+	struct iser_dto      *recv_dto = NULL;
+	struct iser_device   *device = iser_conn->ib_conn->device;
+	int rx_data_size, err = 0;
+
+	rx_desc = kmem_cache_alloc(ig.desc_cache, GFP_KERNEL);
+	if (rx_desc == NULL) {
+		iser_err("Failed to alloc desc for post recv\n");
+		return -ENOMEM;
+	}
+	rx_desc->type = ISCSI_RX;
+
+	/* for the login sequence we must support rx of up to 8K */
+	if (conn->c_stage == ISCSI_CONN_INITIAL_STAGE)
+		rx_data_size = DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH;
+	else /* FIXME till user space sets conn->max_recv_dlength correctly */
+		rx_data_size = 128;
+
+	rx_desc->data = kmalloc(rx_data_size, GFP_KERNEL);
+	if (rx_desc->data == NULL) {
+		iser_err("Failed to alloc data buf for post recv\n");
+		err = -ENOMEM;
+		goto post_rx_kmalloc_failure;
+	}
+
+	recv_dto = &rx_desc->dto;
+	recv_dto->conn = iser_conn;
+	recv_dto->regd_vector_len = 0;
+
+	regd_hdr = &rx_desc->hdr_regd_buf;
+	memset(regd_hdr, 0, sizeof(struct iser_regd_buf));
+	regd_hdr->device = device;
+	regd_hdr->virt_addr = rx_desc; /* == &rx_desc->iser_header */
+	regd_hdr->data_size = ISER_TOTAL_HEADERS_LEN;
+
+	iser_reg_single(device, regd_hdr, DMA_FROM_DEVICE);
+
+	iser_dto_add_regd_buff(recv_dto, regd_hdr, 0, 0);
+
+	regd_data = &rx_desc->data_regd_buf;
+	memset(regd_data, 0, sizeof(struct iser_regd_buf));
+	regd_data->device = device;
+	regd_data->virt_addr = rx_desc->data;
+	regd_data->data_size = rx_data_size;
+
+	iser_reg_single(device, regd_data, DMA_FROM_DEVICE);
+
+	iser_dto_add_regd_buff(recv_dto, regd_data, 0, 0);
+
+	err = iser_post_recv(rx_desc);
+	if (!err)
+		return 0;
+
+	/* iser_post_recv failed */
+	iser_dto_buffs_release(recv_dto);
+	kfree(rx_desc->data);
+post_rx_kmalloc_failure:
+	kmem_cache_free(ig.desc_cache, rx_desc);
+	return err;
+}
+
+/* creates a new tx descriptor and adds header regd buffer */
+static void iser_create_send_desc(struct iscsi_iser_conn *iser_conn,
+				  struct iser_desc *tx_desc)
+{
+	struct iser_regd_buf *regd_hdr = &tx_desc->hdr_regd_buf;
+	struct iser_dto *send_dto = &tx_desc->dto;
+
+	memset(regd_hdr, 0, sizeof(struct iser_regd_buf));
+	regd_hdr->device = iser_conn->ib_conn->device;
+	regd_hdr->virt_addr = tx_desc; /* == &tx_desc->iser_header */
+	regd_hdr->data_size = ISER_TOTAL_HEADERS_LEN;
+
+	send_dto->conn = iser_conn;
+	send_dto->notify_enable = 1;
+	send_dto->regd_vector_len = 0;
+
+	memset(&tx_desc->iser_header, 0, sizeof(struct iser_hdr));
+	tx_desc->iser_header.flags = ISER_VER;
+
+	iser_dto_add_regd_buff(send_dto, regd_hdr, 0, 0);
+}
+
+/**
+ * iser_conn_set_full_featured_mode - (iSER API)
+ */
+int iser_conn_set_full_featured_mode(struct iscsi_conn *conn)
+{
+	struct iscsi_iser_conn *iser_conn = conn->dd_data;
+
+	int i;
+	/* no need to keep it in a var, we are after login so if this should
+	 * be negotiated, by now the result should be available here */
+	int initial_post_recv_bufs_num = ISER_MAX_RX_MISC_PDUS;
+
+	iser_dbg("Initially post: %d\n", initial_post_recv_bufs_num);
+
+	/* Check that there is no posted recv or send buffers left - */
+	/* they must be consumed during the login phase */
+	BUG_ON(atomic_read(&iser_conn->ib_conn->post_recv_buf_count) != 0);
+	BUG_ON(atomic_read(&iser_conn->ib_conn->post_send_buf_count) != 0);
+
+	/* Initial post receive buffers */
+	for (i = 0; i < initial_post_recv_bufs_num; i++) {
+		if
(iser_post_receive_control(conn) != 0) { + iser_err("Failed to post recv bufs at:%d conn:0x%p\n", + i, conn); + return -ENOMEM; + } + } + iser_dbg("Posted %d post recv bufs, conn:0x%p\n", i, conn); + return 0; +} + +static int +iser_check_xmit(struct iscsi_conn *conn, void *task) +{ + int rc = 0; + struct iscsi_iser_conn *iser_conn = conn->dd_data; + + write_lock_bh(conn->recv_lock); + if (atomic_read(&iser_conn->ib_conn->post_send_buf_count) == + ISER_QP_MAX_REQ_DTOS) { + iser_dbg("%ld can't xmit task %p, suspending tx\n",jiffies,task); + set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); + rc = -EAGAIN; + } + write_unlock_bh(conn->recv_lock); + return rc; +} + + +/** + * iser_send_command - send command PDU + */ +int iser_send_command(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct iser_dto *send_dto = NULL; + unsigned long edtl; + int err = 0; + struct iser_data_buf *data_buf; + + struct iscsi_cmd *hdr = ctask->hdr; + struct scsi_cmnd *sc = ctask->sc; + + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { + iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); + return -EPERM; + } + if (iser_check_xmit(conn, ctask)) + return -EAGAIN; + + edtl = ntohl(hdr->data_length); + + /* build the tx desc regd header and add it to the tx desc dto */ + iser_ctask->desc.type = ISCSI_TX_SCSI_COMMAND; + send_dto = &iser_ctask->desc.dto; + send_dto->ctask = iser_ctask; + iser_create_send_desc(iser_conn, &iser_ctask->desc); + + if (hdr->flags & ISCSI_FLAG_CMD_READ) + data_buf = &iser_ctask->data[ISER_DIR_IN]; + else + data_buf = &iser_ctask->data[ISER_DIR_OUT]; + + if (sc->use_sg) { /* using a scatter list */ + data_buf->buf = sc->request_buffer; + data_buf->size = sc->use_sg; + } else { /* using a single buffer - convert it into one entry SG */ + sg_init_one(&data_buf->sg_single, + sc->request_buffer, sc->request_bufflen); + data_buf->buf = &data_buf->sg_single; + data_buf->size = 1; + } + + data_buf->data_len = sc->request_bufflen; + + if (hdr->flags & ISCSI_FLAG_CMD_READ) { + err = iser_prepare_read_cmd(ctask, edtl); + if (err) + goto send_command_error; + } + if (hdr->flags & ISCSI_FLAG_CMD_WRITE) { + err = iser_prepare_write_cmd(ctask, + ctask->imm_count, + ctask->imm_count + + ctask->unsol_count, + edtl); + if (err) + goto send_command_error; + } + + iser_reg_single(iser_conn->ib_conn->device, + send_dto->regd[0], DMA_TO_DEVICE); + + if (iser_post_receive_control(conn) != 0) { + iser_err("post_recv failed!\n"); + err = -ENOMEM; + goto send_command_error; + } + + iser_ctask->status = ISER_TASK_STATUS_STARTED; + + err = iser_post_send(&iser_ctask->desc); + if (!err) + return 0; + +send_command_error: + iser_dto_buffs_release(send_dto); + iser_err("conn %p failed ctask->itt %d err %d\n",conn, ctask->itt, err); + return err; +} + +/** + * iser_send_data_out - send data out PDU + */ +int iser_send_data_out(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask, + struct iscsi_data *hdr) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct iser_desc *tx_desc = NULL; + struct iser_dto *send_dto = NULL; + unsigned long buf_offset; + unsigned long data_seg_len; + unsigned int itt; + int err = 0; + + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { + iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); + return -EPERM; + } + + if 
(iser_check_xmit(conn, ctask))
+		return -EAGAIN;
+
+	itt = ntohl(hdr->itt);
+	data_seg_len = ntoh24(hdr->dlength);
+	buf_offset = ntohl(hdr->offset);
+
+	iser_dbg("%s itt %d dseg_len %d offset %d\n",
+		 __func__,(int)itt,(int)data_seg_len,(int)buf_offset);
+
+	tx_desc = kmem_cache_alloc(ig.desc_cache, GFP_KERNEL);
+	if (tx_desc == NULL) {
+		iser_err("Failed to alloc desc for post dataout\n");
+		return -ENOMEM;
+	}
+
+	tx_desc->type = ISCSI_TX_DATAOUT;
+	memcpy(&tx_desc->iscsi_header, hdr, sizeof(struct iscsi_hdr));
+
+	/* build the tx desc regd header and add it to the tx desc dto */
+	send_dto = &tx_desc->dto;
+	send_dto->ctask = iser_ctask;
+	iser_create_send_desc(iser_conn, tx_desc);
+
+	iser_reg_single(iser_conn->ib_conn->device,
+			send_dto->regd[0], DMA_TO_DEVICE);
+
+	/* all data was registered for RDMA, we can use the lkey */
+	iser_dto_add_regd_buff(send_dto,
+			       &iser_ctask->rdma_regd[ISER_DIR_OUT],
+			       buf_offset,
+			       data_seg_len);
+
+	if (buf_offset + data_seg_len > iser_ctask->data[ISER_DIR_OUT].data_len) {
+		iser_err("Offset:%ld & DSL:%ld in Data-Out "
+			 "inconsistent with total len:%ld, itt:%d\n",
+			 buf_offset, data_seg_len,
+			 iser_ctask->data[ISER_DIR_OUT].data_len, itt);
+		err = -EINVAL;
+		goto send_data_out_error;
+	}
+	iser_dbg("data-out itt: %d, offset: %ld, sz: %ld\n",
+		 itt, buf_offset, data_seg_len);
+
+	err = iser_post_send(tx_desc);
+	if (!err)
+		return 0;
+
+send_data_out_error:
+	iser_dto_buffs_release(send_dto);
+	kmem_cache_free(ig.desc_cache, tx_desc);
+	iser_err("conn %p failed err %d\n",conn, err);
+	return err;
+}
+
+int iser_send_control(struct iscsi_conn *conn,
+		      struct iscsi_mgmt_task *mtask)
+{
+	struct iscsi_iser_conn *iser_conn = conn->dd_data;
+	struct iser_desc *mdesc = mtask->dd_data;
+	struct iser_dto *send_dto = NULL;
+	unsigned int itt;
+	unsigned long data_seg_len;
+	int err = 0;
+	unsigned char opcode;
+	struct iser_regd_buf *regd_buf;
+	struct iser_device *device;
+
+	if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) {
+		iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn);
+		return -EPERM;
+	}
+
+	if (iser_check_xmit(conn,mtask))
+		return -EAGAIN;
+
+	/* build the tx desc regd header and add it to the tx desc dto */
+	mdesc->type = ISCSI_TX_CONTROL;
+	send_dto = &mdesc->dto;
+	send_dto->ctask = NULL;
+	iser_create_send_desc(iser_conn, mdesc);
+
+	device = iser_conn->ib_conn->device;
+
+	iser_reg_single(device, send_dto->regd[0], DMA_TO_DEVICE);
+
+	itt = ntohl(mtask->hdr->itt);
+	opcode = mtask->hdr->opcode & ISCSI_OPCODE_MASK;
+	data_seg_len = ntoh24(mtask->hdr->dlength);
+
+	if (data_seg_len > 0) {
+		regd_buf = &mdesc->data_regd_buf;
+		memset(regd_buf, 0, sizeof(struct iser_regd_buf));
+		regd_buf->device = device;
+		regd_buf->virt_addr = mtask->data;
+		regd_buf->data_size = mtask->data_count;
+		iser_reg_single(device, regd_buf,
+				DMA_TO_DEVICE);
+		iser_dto_add_regd_buff(send_dto, regd_buf,
+				       0,
+				       data_seg_len);
+	}
+
+	if (iser_post_receive_control(conn) != 0) {
+		iser_err("post_rcv_buff failed!\n");
+		err = -ENOMEM;
+		goto send_control_error;
+	}
+
+	err = iser_post_send(mdesc);
+	if (!err)
+		return 0;
+
+send_control_error:
+	iser_dto_buffs_release(send_dto);
+	iser_err("conn %p failed err %d\n",conn, err);
+	return err;
+}
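The send helpers above all funnel through the same back-pressure check: a PDU is refused with -EAGAIN once post_send_buf_count reaches ISER_QP_MAX_REQ_DTOS, and iser_snd_completion() below later clears the suspend bit and re-queues the xmit worker. A minimal userspace model of that credit scheme, with plain ints standing in for the driver's atomic counter and suspend bit (names hypothetical, not the driver's):

#include <stdio.h>

#define MAX_POSTED_SENDS 4	/* stands in for ISER_QP_MAX_REQ_DTOS */

static int posted_sends;	/* models ib_conn->post_send_buf_count */
static int tx_suspended;	/* models ISCSI_SUSPEND_BIT in conn->suspend_tx */

/* models iser_check_xmit() plus the post itself; in the driver the
 * counter is actually bumped later, inside iser_post_send() */
static int try_send(int pdu)
{
	if (posted_sends == MAX_POSTED_SENDS) {
		tx_suspended = 1;	/* stop the iSCSI xmit worker */
		return -1;		/* -EAGAIN in the driver */
	}
	posted_sends++;
	printf("posted pdu %d\n", pdu);
	return 0;
}

/* models iser_snd_completion(): free a slot, resume the xmit worker */
static void send_done(void)
{
	posted_sends--;
	if (tx_suspended) {
		tx_suspended = 0;	/* scsi_queue_work() in the driver */
		printf("tx resumed\n");
	}
}

int main(void)
{
	for (int i = 0; i < MAX_POSTED_SENDS + 1; i++)
		if (try_send(i))
			printf("pdu %d deferred\n", i);
	send_done();
	return 0;
}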
+
+/**
+ * iser_rcv_completion - recv DTO completion
+ */
+void iser_rcv_completion(struct iser_desc *rx_desc,
+			 unsigned long dto_xfer_len)
+{
+	struct iser_dto *dto = &rx_desc->dto;
+	struct iscsi_iser_conn *conn = dto->conn;
+	struct iscsi_session *session = conn->iscsi_conn->session;
+	struct iscsi_cmd_task *ctask;
+	struct iscsi_iser_cmd_task *iser_ctask;
+	struct iscsi_hdr *hdr;
+	char *rx_data = NULL;
+	int rx_data_len = 0;
+	unsigned int itt;
+	unsigned char opcode;
+
+	hdr = &rx_desc->iscsi_header;
+
+	iser_dbg("op 0x%x itt 0x%x\n", hdr->opcode,hdr->itt);
+
+	if (dto_xfer_len > ISER_TOTAL_HEADERS_LEN) { /* we have data */
+		rx_data_len = dto_xfer_len - ISER_TOTAL_HEADERS_LEN;
+		rx_data = dto->regd[1]->virt_addr;
+		rx_data += dto->offset[1];
+	}
+
+	opcode = hdr->opcode & ISCSI_OPCODE_MASK;
+
+	if (opcode == ISCSI_OP_SCSI_CMD_RSP) {
+		itt = hdr->itt & ISCSI_ITT_MASK; /* mask out cid and age bits */
+		if (!(itt < session->cmds_max))
+			iser_err("itt can't be matched to task!!! "
+				 "conn %p opcode %d cmds_max %d itt %d\n",
+				 conn->iscsi_conn,opcode,session->cmds_max,itt);
+		/* use the mapping given with the cmds array indexed by itt */
+		ctask = (struct iscsi_cmd_task *)session->cmds[itt];
+		iser_ctask = ctask->dd_data;
+		iser_dbg("itt %d ctask %p\n",itt,ctask);
+		iser_ctask->status = ISER_TASK_STATUS_COMPLETED;
+		iser_ctask_rdma_finalize(iser_ctask);
+	}
+
+	iser_dto_buffs_release(dto);
+
+	iscsi_iser_recv(conn->iscsi_conn, hdr, rx_data, rx_data_len);
+
+	kfree(rx_desc->data);
+	kmem_cache_free(ig.desc_cache, rx_desc);
+
+	/* decrementing conn->post_recv_buf_count only --after-- freeing the  *
+	 * task eliminates the need to worry about tasks which are completed  *
+	 * in parallel to the execution of iser_conn_term. So the code that   *
+	 * waits for the posted rx bufs refcount to become zero handles       *
+	 * everything                                                         */
+	atomic_dec(&conn->ib_conn->post_recv_buf_count);
+}
+
+void iser_snd_completion(struct iser_desc *tx_desc)
+{
+	struct iser_dto *dto = &tx_desc->dto;
+	struct iscsi_iser_conn *iser_conn = dto->conn;
+	struct iscsi_conn *conn = iser_conn->iscsi_conn;
+	struct iscsi_mgmt_task *mtask;
+
+	iser_dbg("Initiator, Data sent dto=0x%p\n", dto);
+
+	iser_dto_buffs_release(dto);
+
+	if (tx_desc->type == ISCSI_TX_DATAOUT)
+		kmem_cache_free(ig.desc_cache, tx_desc);
+
+	atomic_dec(&iser_conn->ib_conn->post_send_buf_count);
+
+	write_lock(conn->recv_lock);
+	if (conn->suspend_tx) {
+		iser_dbg("%ld resuming tx\n",jiffies);
+		clear_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
+		scsi_queue_work(conn->session->host, &conn->xmitwork);
+	}
+	write_unlock(conn->recv_lock);
+
+	if (tx_desc->type == ISCSI_TX_CONTROL) {
+		/* this arithmetic is valid due to libiscsi's dd_data allocation */
+		mtask = (void *) ((long)(void *)tx_desc -
+				  sizeof(struct iscsi_mgmt_task));
+		if (mtask->hdr->itt == cpu_to_be32(ISCSI_RESERVED_TAG)) {
+			struct iscsi_session *session = conn->session;
+
+			spin_lock(&conn->session->lock);
+			list_del(&mtask->running);
+			__kfifo_put(session->mgmtpool.queue, (void*)&mtask,
+				    sizeof(void*));
+			spin_unlock(&session->lock);
+		}
+	}
+}
+
+void iser_ctask_rdma_init(struct iscsi_iser_cmd_task *iser_ctask)
+{
+	iser_ctask->status = ISER_TASK_STATUS_INIT;
+
+	iser_ctask->dir[ISER_DIR_IN] = 0;
+	iser_ctask->dir[ISER_DIR_OUT] = 0;
+
+	iser_ctask->data[ISER_DIR_IN].data_len = 0;
+	iser_ctask->data[ISER_DIR_OUT].data_len = 0;
+
+	memset(&iser_ctask->rdma_regd[ISER_DIR_IN], 0,
+	       sizeof(struct iser_regd_buf));
+	memset(&iser_ctask->rdma_regd[ISER_DIR_OUT], 0,
+	       sizeof(struct iser_regd_buf));
+}
+
+void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *iser_ctask)
+{
+	int deferred;
+
+	/* if we were reading, copy back to the unaligned sglist,
+	 * in any case dma_unmap and free the copy
+	 */
+	if (iser_ctask->data_copy[ISER_DIR_IN].copy_buf != NULL)
+		iser_finalize_rdma_unaligned_sg(iser_ctask,
ISER_DIR_IN); + if (iser_ctask->data_copy[ISER_DIR_OUT].copy_buf != NULL) + iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_OUT); + + if (iser_ctask->dir[ISER_DIR_IN]) { + deferred = iser_regd_buff_release + (&iser_ctask->rdma_regd[ISER_DIR_IN]); + if (deferred) { + iser_err("References remain for BUF-IN rdma reg\n"); + BUG(); + } + } + + if (iser_ctask->dir[ISER_DIR_OUT]) { + deferred = iser_regd_buff_release + (&iser_ctask->rdma_regd[ISER_DIR_OUT]); + if (deferred) { + iser_err("References remain for BUF-OUT rdma reg\n"); + BUG(); + } + } + + iser_dma_unmap_task_data(iser_ctask); +} + +void iser_dto_buffs_release(struct iser_dto *dto) +{ + int i; + + for (i = 0; i < dto->regd_vector_len; i++) + iser_regd_buff_release(dto->regd[i]); +} + diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c new file mode 100644 index 0000000..0881f55 --- /dev/null +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -0,0 +1,401 @@ +/* + * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ *
+ * $Id: iser_memory.c 6964 2006-05-07 11:11:43Z ogerlitz $
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "iscsi_iser.h"
+
+#define ISER_KMALLOC_THRESHOLD 0x20000 /* 128K - kmalloc limit */
+/**
+ * Decrements the reference count for the
+ * registered buffer & releases it
+ *
+ * returns 0 if released, 1 if deferred
+ */
+int iser_regd_buff_release(struct iser_regd_buf *regd_buf)
+{
+	struct device *dma_device;
+
+	if ((atomic_read(&regd_buf->ref_count) == 0) ||
+	    atomic_dec_and_test(&regd_buf->ref_count)) {
+		/* if we used the dma mr, unreg is just NOP */
+		if (regd_buf->reg.rkey != 0)
+			iser_unreg_mem(&regd_buf->reg);
+
+		if (regd_buf->dma_addr) {
+			dma_device = regd_buf->device->ib_device->dma_device;
+			dma_unmap_single(dma_device,
+					 regd_buf->dma_addr,
+					 regd_buf->data_size,
+					 regd_buf->direction);
+		}
+		/* else this regd buf is associated with task which we */
+		/* dma_unmap_single/sg later                           */
+		return 0;
+	} else {
+		iser_dbg("Release deferred, regd.buff: 0x%p\n", regd_buf);
+		return 1;
+	}
+}
+
+/**
+ * iser_reg_single - fills registered buffer descriptor with
+ * registration information
+ */
+void iser_reg_single(struct iser_device *device,
+		     struct iser_regd_buf *regd_buf,
+		     enum dma_data_direction direction)
+{
+	dma_addr_t dma_addr;
+
+	dma_addr = dma_map_single(device->ib_device->dma_device,
+				  regd_buf->virt_addr,
+				  regd_buf->data_size, direction);
+	BUG_ON(dma_mapping_error(dma_addr));
+
+	regd_buf->reg.lkey = device->mr->lkey;
+	regd_buf->reg.rkey = 0; /* indicate there's no need to unreg */
+	regd_buf->reg.len = regd_buf->data_size;
+	regd_buf->reg.va = dma_addr;
+
+	regd_buf->dma_addr = dma_addr;
+	regd_buf->direction = direction;
+}
+
+/**
+ * iser_start_rdma_unaligned_sg
+ */
+int iser_start_rdma_unaligned_sg(struct iscsi_iser_cmd_task *iser_ctask,
+				 enum iser_data_dir cmd_dir)
+{
+	int dma_nents;
+	struct device *dma_device;
+	char *mem = NULL;
+	struct iser_data_buf *data = &iser_ctask->data[cmd_dir];
+	unsigned long cmd_data_len = data->data_len;
+
+	if (cmd_data_len > ISER_KMALLOC_THRESHOLD)
+		mem = (void *)__get_free_pages(GFP_KERNEL,
+			long_log2(roundup_pow_of_two(cmd_data_len)) - PAGE_SHIFT);
+	else
+		mem = kmalloc(cmd_data_len, GFP_KERNEL);
+
+	if (mem == NULL) {
+		iser_err("Failed to allocate mem size %d %d for copying sglist\n",
+			 data->size,(int)cmd_data_len);
+		return -ENOMEM;
+	}
+
+	if (cmd_dir == ISER_DIR_OUT) {
+		/* copy the unaligned sg into the buffer which is used for RDMA */
+		struct scatterlist *sg = (struct scatterlist *)data->buf;
+		int i;
+		char *p, *from;
+
+		for (p = mem, i = 0; i < data->size; i++) {
+			from = kmap_atomic(sg[i].page, KM_USER0);
+			memcpy(p,
+			       from + sg[i].offset,
+			       sg[i].length);
+			kunmap_atomic(from, KM_USER0);
+			p += sg[i].length;
+		}
+	}
+
+	sg_init_one(&iser_ctask->data_copy[cmd_dir].sg_single, mem, cmd_data_len);
+	iser_ctask->data_copy[cmd_dir].buf =
+		&iser_ctask->data_copy[cmd_dir].sg_single;
+	iser_ctask->data_copy[cmd_dir].size = 1;
+
+	iser_ctask->data_copy[cmd_dir].copy_buf = mem;
+
+	dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
+
+	if (cmd_dir == ISER_DIR_OUT)
+		dma_nents = dma_map_sg(dma_device,
+				       &iser_ctask->data_copy[cmd_dir].sg_single,
+				       1, DMA_TO_DEVICE);
+	else
+		dma_nents = dma_map_sg(dma_device,
+				       &iser_ctask->data_copy[cmd_dir].sg_single,
+				       1, DMA_FROM_DEVICE);
+
+	BUG_ON(dma_nents == 0);
+
+	iser_ctask->data_copy[cmd_dir].dma_nents = dma_nents;
+	return 0;
+}
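Two distinct mapping strategies live in this file: iser_reg_single() covers buffers that are virtually contiguous (headers, control data), while the pair of *_rdma_unaligned_sg() functions handle SCSI data whose scatterlist the HCA cannot express as a single region. The copy-in half of the bounce-buffer fallback, reduced to standalone C with a simplified stand-in for struct scatterlist (not the kernel type):

#include <stdlib.h>
#include <string.h>

/* simplified stand-in for one scatterlist entry (page base + offset + length) */
struct sg_ent {
	const char *base;	/* models page_address(sg->page) */
	size_t      offset;
	size_t      length;
};

/* models the ISER_DIR_OUT branch of iser_start_rdma_unaligned_sg():
 * flatten a fragmented buffer into one allocation that can then be
 * DMA mapped and registered as a single aligned region */
static char *flatten_sg(const struct sg_ent *sg, int nents, size_t total)
{
	char *mem = malloc(total), *p = mem;

	if (!mem)
		return NULL;
	for (int i = 0; i < nents; i++) {
		memcpy(p, sg[i].base + sg[i].offset, sg[i].length);
		p += sg[i].length;
	}
	return mem;	/* the driver frees this at task finalize time */
}

The read direction runs the same copy in reverse at completion time, which is exactly what iser_finalize_rdma_unaligned_sg() below does.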
+
+/**
+ * iser_finalize_rdma_unaligned_sg
+ */
+void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_cmd_task *iser_ctask,
+				     enum iser_data_dir cmd_dir)
+{
+	struct device *dma_device;
+	struct iser_data_buf *mem_copy;
+	unsigned long cmd_data_len;
+
+	dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
+	mem_copy = &iser_ctask->data_copy[cmd_dir];
+
+	if (cmd_dir == ISER_DIR_OUT)
+		dma_unmap_sg(dma_device, &mem_copy->sg_single, 1,
+			     DMA_TO_DEVICE);
+	else
+		dma_unmap_sg(dma_device, &mem_copy->sg_single, 1,
+			     DMA_FROM_DEVICE);
+
+	if (cmd_dir == ISER_DIR_IN) {
+		char *mem;
+		struct scatterlist *sg;
+		unsigned char *p, *to;
+		unsigned int sg_size;
+		int i;
+
+		/* copy back read RDMA to unaligned sg */
+		mem = mem_copy->copy_buf;
+
+		sg = (struct scatterlist *)iser_ctask->data[ISER_DIR_IN].buf;
+		sg_size = iser_ctask->data[ISER_DIR_IN].size;
+
+		for (p = mem, i = 0; i < sg_size; i++) {
+			to = kmap_atomic(sg[i].page, KM_SOFTIRQ0);
+			memcpy(to + sg[i].offset,
+			       p,
+			       sg[i].length);
+			kunmap_atomic(to, KM_SOFTIRQ0);
+			p += sg[i].length;
+		}
+	}
+
+	cmd_data_len = iser_ctask->data[cmd_dir].data_len;
+
+	if (cmd_data_len > ISER_KMALLOC_THRESHOLD)
+		free_pages((unsigned long)mem_copy->copy_buf,
+			   long_log2(roundup_pow_of_two(cmd_data_len)) - PAGE_SHIFT);
+	else
+		kfree(mem_copy->copy_buf);
+
+	mem_copy->copy_buf = NULL;
+}
+
+/**
+ * iser_sg_to_page_vec - Translates scatterlist entries to physical addresses
+ * and returns the length of resulting physical address array (may be less than
+ * the original due to possible compaction).
+ *
+ * we build a "page vec" under the assumption that the SG meets the RDMA
+ * alignment requirements. Other than the first and last SG elements, all
+ * the "internal" elements can be compacted into a list whose elements are
+ * dma addresses of physical pages. The code also supports the weird case
+ * where --few fragments of the same page-- are present in the SG as
+ * consecutive elements. Also, it handles one entry SG.
+ */
+static int iser_sg_to_page_vec(struct iser_data_buf *data,
+			       struct iser_page_vec *page_vec)
+{
+	struct scatterlist *sg = (struct scatterlist *)data->buf;
+	dma_addr_t first_addr, last_addr, page;
+	int start_aligned, end_aligned;
+	unsigned int cur_page = 0;
+	unsigned long total_sz = 0;
+	int i;
+
+	/* compute the offset of first element */
+	page_vec->offset = (u64) sg[0].offset;
+
+	for (i = 0; i < data->dma_nents; i++) {
+		total_sz += sg_dma_len(&sg[i]);
+
+		first_addr = sg_dma_address(&sg[i]);
+		last_addr = first_addr + sg_dma_len(&sg[i]);
+
+		start_aligned = !(first_addr & ~PAGE_MASK);
+		end_aligned = !(last_addr & ~PAGE_MASK);
+
+		/* continue to collect page fragments till aligned or SG ends */
+		while (!end_aligned && (i + 1 < data->dma_nents)) {
+			i++;
+			total_sz += sg_dma_len(&sg[i]);
+			last_addr = sg_dma_address(&sg[i]) + sg_dma_len(&sg[i]);
+			end_aligned = !(last_addr & ~PAGE_MASK);
+		}
+
+		first_addr = first_addr & PAGE_MASK;
+
+		for (page = first_addr; page < last_addr; page += PAGE_SIZE)
+			page_vec->pages[cur_page++] = page;
+
+	}
+	page_vec->data_size = total_sz;
+	iser_dbg("page_vec->data_size:%d cur_page %d\n", page_vec->data_size,cur_page);
+	return cur_page;
+}
+
+#define MASK_4K ((1UL << 12) - 1) /* 0xFFF */
+#define IS_4K_ALIGNED(addr) ((((unsigned long)addr) & MASK_4K) == 0)
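The compaction rule implemented above — keep swallowing SG fragments until the running end address is page aligned, then emit every page the merged range covers — is easier to see stripped of driver types. A runnable sketch with uint64_t addresses and 4K pages assumed:

#include <stdint.h>
#include <stdio.h>

#define PG_SIZE 4096ULL
#define PG_MASK (~(PG_SIZE - 1))

struct dma_ent { uint64_t addr; uint64_t len; };

/* models iser_sg_to_page_vec(): returns the number of page addresses
 * emitted for a DMA-mapped list that meets the alignment rules */
static int sg_to_pages(const struct dma_ent *e, int n, uint64_t *pages)
{
	int cur = 0;

	for (int i = 0; i < n; i++) {
		uint64_t first = e[i].addr;
		uint64_t last  = e[i].addr + e[i].len;

		/* merge following fragments until the range ends page aligned */
		while ((last & ~PG_MASK) && i + 1 < n) {
			i++;
			last = e[i].addr + e[i].len;
		}
		for (uint64_t pg = first & PG_MASK; pg < last; pg += PG_SIZE)
			pages[cur++] = pg;
	}
	return cur;
}

int main(void)
{
	/* two half-page fragments of the same page compact to one entry */
	struct dma_ent e[] = { { 0x10000, 2048 }, { 0x10800, 2048 } };
	uint64_t pages[4];

	printf("%d page(s)\n", sg_to_pages(e, 2, pages));	/* prints 1 */
	return 0;
}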
+
+/**
+ * iser_data_buf_aligned_len - Tries to determine the maximal sub-list of a
+ * scatter-gather list of memory buffers which is correctly aligned for RDMA,
+ * and returns the number of entries which are aligned correctly. Supports
+ * the case where consecutive SG elements are actually fragments of the same
+ * physical page.
+ */
+static unsigned int iser_data_buf_aligned_len(struct iser_data_buf *data)
+{
+	struct scatterlist *sg;
+	dma_addr_t end_addr, next_addr;
+	int i, cnt;
+	unsigned int ret_len = 0;
+
+	sg = (struct scatterlist *)data->buf;
+
+	for (cnt = 0, i = 0; i < data->dma_nents; i++, cnt++) {
+		/* iser_dbg("Checking sg iobuf [%d]: phys=0x%08lX "
+		   "offset: %ld sz: %ld\n", i,
+		   (unsigned long)page_to_phys(sg[i].page),
+		   (unsigned long)sg[i].offset,
+		   (unsigned long)sg[i].length); */
+		end_addr = sg_dma_address(&sg[i]) +
+			   sg_dma_len(&sg[i]);
+		/* iser_dbg("Checking sg iobuf end address "
+		   "0x%08lX\n", end_addr); */
+		if (i + 1 < data->dma_nents) {
+			next_addr = sg_dma_address(&sg[i+1]);
+			/* are i, i+1 fragments of the same page? */
+			if (end_addr == next_addr)
+				continue;
+			else if (!IS_4K_ALIGNED(end_addr)) {
+				ret_len = cnt + 1;
+				break;
+			}
+		}
+	}
+	if (i == data->dma_nents)
+		ret_len = cnt; /* loop ended */
+	iser_dbg("Found %d aligned entries out of %d in sg:0x%p\n",
+		 ret_len, data->dma_nents, data);
+	return ret_len;
+}
+
+static void iser_data_buf_dump(struct iser_data_buf *data)
+{
+	struct scatterlist *sg = (struct scatterlist *)data->buf;
+	int i;
+
+	for (i = 0; i < data->size; i++)
+		iser_err("sg[%d] dma_addr:0x%lX page:0x%p "
+			 "off:%d sz:%d dma_len:%d\n",
+			 i, (unsigned long)sg_dma_address(&sg[i]),
+			 sg[i].page, sg[i].offset,
+			 sg[i].length,sg_dma_len(&sg[i]));
+}
+
+static void iser_dump_page_vec(struct iser_page_vec *page_vec)
+{
+	int i;
+
+	iser_err("page vec length %d data size %d\n",
+		 page_vec->length, page_vec->data_size);
+	for (i = 0; i < page_vec->length; i++)
+		iser_err("%d %lx\n",i,(unsigned long)page_vec->pages[i]);
+}
+
+static void iser_page_vec_build(struct iser_data_buf *data,
+				struct iser_page_vec *page_vec)
+{
+	int page_vec_len = 0;
+
+	page_vec->length = 0;
+	page_vec->offset = 0;
+
+	iser_dbg("Translating sg sz: %d\n", data->dma_nents);
+	page_vec_len = iser_sg_to_page_vec(data,page_vec);
+	iser_dbg("sg len %d page_vec_len %d\n", data->dma_nents,page_vec_len);
+
+	page_vec->length = page_vec_len;
+
+	if (page_vec_len * PAGE_SIZE < page_vec->data_size) {
+		iser_err("page_vec too short to hold this SG\n");
+		iser_data_buf_dump(data);
+		iser_dump_page_vec(page_vec);
+		BUG();
+	}
+}
+
+/**
+ * iser_reg_rdma_mem - Registers memory intended for RDMA,
+ * obtaining rkey and va
+ *
+ * returns 0 on success, errno code on failure
+ */
+int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *iser_ctask,
+		      enum iser_data_dir cmd_dir)
+{
+	struct iser_conn *ib_conn = iser_ctask->iser_conn->ib_conn;
+	struct iser_data_buf *mem = &iser_ctask->data[cmd_dir];
+	struct iser_regd_buf *regd_buf;
+	int aligned_len;
+	int err;
+
+	regd_buf = &iser_ctask->rdma_regd[cmd_dir];
+
+	aligned_len = iser_data_buf_aligned_len(mem);
+	if (aligned_len != mem->size) {
+		iser_err("rdma alignment violation %d/%d aligned\n",
+			 aligned_len, mem->size);
+		iser_data_buf_dump(mem);
+		/* allocate copy buf, if we are writing, copy the */
+		/* unaligned scatterlist, dma map the copy        */
+		if (iser_start_rdma_unaligned_sg(iser_ctask, cmd_dir) != 0)
+			return -ENOMEM;
+		mem = &iser_ctask->data_copy[cmd_dir];
+	}
+
+	iser_page_vec_build(mem, ib_conn->page_vec);
+	err = iser_reg_page_vec(ib_conn, ib_conn->page_vec, &regd_buf->reg);
+	if (err)
+		return err;
+
+	/* take a reference on this regd buf such that it will not be released *
+	 * (eg in send dto completion) before we get the scsi response         */
+	atomic_inc(&regd_buf->ref_count);
+	return 0;
+}
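That closes iser_memory.c. The lifetime rule it leaves behind is worth restating before the verbs file: iser_reg_rdma_mem() takes the extra reference just above precisely so that the send-DTO completion path cannot tear down an RDMA-registered buffer before the SCSI response has been consumed, and iser_regd_buff_release() reports whether the release happened or was deferred. A compact model of that 0-or-1 protocol, with a plain int in place of atomic_t:

/* models iser_regd_buff_release(): returns 0 when the buffer was
 * actually released, 1 when release is deferred to the last holder */
struct regd { int ref_count; int registered; };

static int regd_release(struct regd *r)
{
	if (r->ref_count == 0 || --r->ref_count == 0) {
		if (r->registered)	/* rkey != 0 in the driver */
			r->registered = 0;	/* iser_unreg_mem() */
		return 0;
	}
	return 1;	/* someone still holds a reference */
}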
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
new file mode 100644
index 0000000..ff117bb
--- /dev/null
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -0,0 +1,827 @@
+/*
+ * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2005, 2006 Cisco Systems. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id: iser_verbs.c 7051 2006-05-10 12:29:11Z ogerlitz $
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "iscsi_iser.h"
+
+#define ISCSI_ISER_MAX_CONN 8
+#define ISER_MAX_CQ_LEN ((ISER_QP_MAX_RECV_DTOS + \
+			  ISER_QP_MAX_REQ_DTOS) *  \
+			  ISCSI_ISER_MAX_CONN)
+
+static void iser_cq_tasklet_fn(unsigned long data);
+static void iser_cq_callback(struct ib_cq *cq, void *cq_context);
+static void iser_comp_error_worker(void *data);
+
+static void iser_cq_event_callback(struct ib_event *cause, void *context)
+{
+	iser_err("got cq event %d \n", cause->event);
+}
+
+static void iser_qp_event_callback(struct ib_event *cause, void *context)
+{
+	iser_err("got qp event %d\n",cause->event);
+}
+
+/**
+ * iser_create_device_ib_res - creates Protection Domain (PD), Completion
+ * Queue (CQ), DMA Memory Region (DMA MR) with the device associated with
+ * the adapter.
+ *
+ * returns 0 on success, -1 on failure
+ */
+static int iser_create_device_ib_res(struct iser_device *device)
+{
+	device->pd = ib_alloc_pd(device->ib_device);
+	if (IS_ERR(device->pd))
+		goto pd_err;
+
+	device->cq = ib_create_cq(device->ib_device,
+				  iser_cq_callback,
+				  iser_cq_event_callback,
+				  (void *)device,
+				  ISER_MAX_CQ_LEN);
+	if (IS_ERR(device->cq))
+		goto cq_err;
+
+	if (ib_req_notify_cq(device->cq, IB_CQ_NEXT_COMP))
+		goto cq_arm_err;
+
+	tasklet_init(&device->cq_tasklet,
+		     iser_cq_tasklet_fn,
+		     (unsigned long)device);
+
+	device->mr = ib_get_dma_mr(device->pd,
+				   IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(device->mr))
+		goto dma_mr_err;
+
+	return 0;
+
+dma_mr_err:
+	tasklet_kill(&device->cq_tasklet);
+cq_arm_err:
+	ib_destroy_cq(device->cq);
+cq_err:
+	ib_dealloc_pd(device->pd);
+pd_err:
+	iser_err("failed to allocate an IB resource\n");
+	return -1;
+}
+
+/**
+ * iser_free_device_ib_res - destroy/dealloc/dereg the DMA MR,
+ * CQ and PD created with the device associated with the adapter.
+ */
+static void iser_free_device_ib_res(struct iser_device *device)
+{
+	BUG_ON(device->mr == NULL);
+
+	tasklet_kill(&device->cq_tasklet);
+
+	(void)ib_dereg_mr(device->mr);
+	(void)ib_destroy_cq(device->cq);
+	(void)ib_dealloc_pd(device->pd);
+
+	device->mr = NULL;
+	device->cq = NULL;
+	device->pd = NULL;
+}
+
+/**
+ * iser_create_ib_conn_res - Creates FMR pool and Queue-Pair (QP)
+ *
+ * returns 0 on success, -1 on failure
+ */
+static int iser_create_ib_conn_res(struct iser_conn *ib_conn)
+{
+	struct iser_device *device;
+	struct ib_qp_init_attr init_attr;
+	int ret;
+	struct ib_fmr_pool_param params;
+
+	BUG_ON(ib_conn->device == NULL);
+
+	device = ib_conn->device;
+
+	ib_conn->page_vec = kmalloc(sizeof(struct iser_page_vec) +
+				    (sizeof(u64) * (ISCSI_ISER_SG_TABLESIZE +1)),
+				    GFP_KERNEL);
+	if (!ib_conn->page_vec) {
+		ret = -ENOMEM;
+		goto alloc_err;
+	}
+	ib_conn->page_vec->pages = (u64 *) (ib_conn->page_vec + 1);
+
+	params.page_shift = PAGE_SHIFT;
+	/* when the first/last SG element are not start/end *
+	 * page aligned, the map would be of N+1 pages      */
+	params.max_pages_per_fmr = ISCSI_ISER_SG_TABLESIZE + 1;
+	/* make the pool size twice the max number of SCSI commands *
+	 * the ML is expected to queue, watermark for unmap at 50%  */
+	params.pool_size = ISCSI_XMIT_CMDS_MAX * 2;
+	params.dirty_watermark = ISCSI_XMIT_CMDS_MAX;
+	params.cache = 0;
+	params.flush_function = NULL;
+	params.access = (IB_ACCESS_LOCAL_WRITE  |
+			 IB_ACCESS_REMOTE_WRITE |
+			 IB_ACCESS_REMOTE_READ);
+
+	ib_conn->fmr_pool = ib_create_fmr_pool(device->pd, &params);
+	if (IS_ERR(ib_conn->fmr_pool)) {
+		ret = PTR_ERR(ib_conn->fmr_pool);
+		goto fmr_pool_err;
+	}
+
+	memset(&init_attr, 0, sizeof init_attr);
+
+	init_attr.event_handler = iser_qp_event_callback;
+	init_attr.qp_context = (void *)ib_conn;
+	init_attr.send_cq = device->cq;
+	init_attr.recv_cq = device->cq;
+	init_attr.cap.max_send_wr = ISER_QP_MAX_REQ_DTOS;
+	init_attr.cap.max_recv_wr = ISER_QP_MAX_RECV_DTOS;
+	init_attr.cap.max_send_sge = MAX_REGD_BUF_VECTOR_LEN;
+	init_attr.cap.max_recv_sge = 2;
+	init_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
+	init_attr.qp_type = IB_QPT_RC;
+
+	ret = rdma_create_qp(ib_conn->cma_id, device->pd, &init_attr);
+	if (ret)
+		goto qp_err;
+
+	ib_conn->qp = ib_conn->cma_id->qp;
+	iser_err("setting conn %p cma_id %p: fmr_pool %p qp %p\n",
+		 ib_conn, ib_conn->cma_id,
+		 ib_conn->fmr_pool, ib_conn->cma_id->qp);
+	return ret;
+
+qp_err:
+	(void)ib_destroy_fmr_pool(ib_conn->fmr_pool);
+fmr_pool_err:
+	kfree(ib_conn->page_vec);
+alloc_err:
+	iser_err("unable to alloc mem or create resource, err %d\n", ret);
+	return ret;
+}
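A small allocation idiom in iser_create_ib_conn_res() above deserves a note: the page-vector header and its u64 page array come from a single kmalloc(), with the pages pointer aimed just past the struct, so one kfree() in the teardown path releases both. The same layout in standalone form (hypothetical names, not the driver's struct iser_page_vec):

#include <stdlib.h>

struct page_vec {
	int                 length;
	unsigned long long *pages;	/* points into the same allocation */
};

static struct page_vec *page_vec_alloc(int max_pages)
{
	struct page_vec *pv;

	/* one block: struct followed by the page array, as in the driver */
	pv = malloc(sizeof(*pv) + sizeof(unsigned long long) * max_pages);
	if (!pv)
		return NULL;
	pv->length = 0;
	pv->pages = (unsigned long long *)(pv + 1);
	return pv;	/* a single free(pv) releases both */
}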
+ iser_err("unable to alloc mem or create resource, err %d\n", ret); + return ret; +} + +/** + * releases the FMR pool, QP and CMA ID objects, returns 0 on success, + * -1 on failure + */ +static int iser_free_ib_conn_res(struct iser_conn *ib_conn) +{ + BUG_ON(ib_conn == NULL); + + iser_err("freeing conn %p cma_id %p fmr pool %p qp %p\n", + ib_conn, ib_conn->cma_id, + ib_conn->fmr_pool, ib_conn->qp); + + /* qp is created only once both addr & route are resolved */ + if (ib_conn->fmr_pool != NULL) + ib_destroy_fmr_pool(ib_conn->fmr_pool); + + if (ib_conn->qp != NULL) + rdma_destroy_qp(ib_conn->cma_id); + + if (ib_conn->cma_id != NULL) + rdma_destroy_id(ib_conn->cma_id); + + ib_conn->fmr_pool = NULL; + ib_conn->qp = NULL; + ib_conn->cma_id = NULL; + kfree(ib_conn->page_vec); + + return 0; +} + +/** + * based on the resolved device node GUID see if there already allocated + * device for this device. If there's no such, create one. + */ +static +struct iser_device *iser_device_find_by_ib_device(struct rdma_cm_id *cma_id) +{ + struct list_head *p_list; + struct iser_device *device = NULL; + + mutex_lock(&ig.device_list_mutex); + + p_list = ig.device_list.next; + while (p_list != &ig.device_list) { + device = list_entry(p_list, struct iser_device, ig_list); + /* find if there's a match using the node GUID */ + if (device->ib_device->node_guid == cma_id->device->node_guid) + break; + } + + if (device == NULL) { + device = kzalloc(sizeof *device, GFP_KERNEL); + if (device == NULL) + goto out; + /* assign this device to the device */ + device->ib_device = cma_id->device; + /* init the device and link it into ig device list */ + if (iser_create_device_ib_res(device)) { + kfree(device); + device = NULL; + goto out; + } + list_add(&device->ig_list, &ig.device_list); + } +out: + BUG_ON(device == NULL); + device->refcount++; + mutex_unlock(&ig.device_list_mutex); + return device; +} + +/* if there's no demand for this device, release it */ +static void iser_device_try_release(struct iser_device *device) +{ + mutex_lock(&ig.device_list_mutex); + device->refcount--; + iser_err("device %p refcount %d\n",device,device->refcount); + if (!device->refcount) { + iser_free_device_ib_res(device); + list_del(&device->ig_list); + kfree(device); + } + mutex_unlock(&ig.device_list_mutex); +} + +int iser_conn_state_comp(struct iser_conn *ib_conn, + enum iser_ib_conn_state comp) +{ + int ret; + + spin_lock_bh(&ib_conn->lock); + ret = (ib_conn->state == comp); + spin_unlock_bh(&ib_conn->lock); + return ret; +} + +static int iser_conn_state_comp_exch(struct iser_conn *ib_conn, + enum iser_ib_conn_state comp, + enum iser_ib_conn_state exch) +{ + int ret; + + spin_lock_bh(&ib_conn->lock); + if ((ret = (ib_conn->state == comp))) + ib_conn->state = exch; + spin_unlock_bh(&ib_conn->lock); + return ret; +} + +/** + * triggers start of the disconnect procedures and wait for them to be done + */ +void iser_conn_terminate(struct iser_conn *ib_conn) +{ + int err = 0; + + /* change the ib conn state only if the conn is UP, however always call + * rdma_disconnect since this is the only way to cause the CMA to change + * the QP state to ERROR + */ + + iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, ISER_CONN_TERMINATING); + err = rdma_disconnect(ib_conn->cma_id); + if (err) + iser_err("Failed to disconnect, conn: 0x%p err %d\n", + ib_conn,err); + + wait_event_interruptible(ib_conn->wait, + ib_conn->state == ISER_CONN_DOWN); + + iser_conn_release(ib_conn); +} + +static void iser_connect_error(struct rdma_cm_id *cma_id) +{ + struct 
iser_conn *ib_conn; + ib_conn = (struct iser_conn *)cma_id->context; + + ib_conn->state = ISER_CONN_DOWN; + wake_up_interruptible(&ib_conn->wait); +} + +static void iser_addr_handler(struct rdma_cm_id *cma_id) +{ + struct iser_device *device; + struct iser_conn *ib_conn; + int ret; + + device = iser_device_find_by_ib_device(cma_id); + ib_conn = (struct iser_conn *)cma_id->context; + ib_conn->device = device; + + ret = rdma_resolve_route(cma_id, 1000); + if (ret) { + iser_err("resolve route failed: %d\n", ret); + iser_connect_error(cma_id); + } + return; +} + +static void iser_route_handler(struct rdma_cm_id *cma_id) +{ + struct rdma_conn_param conn_param; + int ret; + + ret = iser_create_ib_conn_res((struct iser_conn *)cma_id->context); + if (ret) + goto failure; + + iser_dbg("path.mtu is %d setting it to %d\n", + cma_id->route.path_rec->mtu, IB_MTU_1024); + + /* we must set the MTU to 1024 as this is what the target is assuming */ + if (cma_id->route.path_rec->mtu > IB_MTU_1024) + cma_id->route.path_rec->mtu = IB_MTU_1024; + + memset(&conn_param, 0, sizeof conn_param); + conn_param.responder_resources = 4; + conn_param.initiator_depth = 1; + conn_param.retry_count = 7; + conn_param.rnr_retry_count = 6; + + ret = rdma_connect(cma_id, &conn_param); + if (ret) { + iser_err("failure connecting: %d\n", ret); + goto failure; + } + + return; +failure: + iser_connect_error(cma_id); +} + +static void iser_connected_handler(struct rdma_cm_id *cma_id) +{ + struct iser_conn *ib_conn; + + ib_conn = (struct iser_conn *)cma_id->context; + ib_conn->state = ISER_CONN_UP; + wake_up_interruptible(&ib_conn->wait); +} + +static void iser_disconnected_handler(struct rdma_cm_id *cma_id) +{ + struct iser_conn *ib_conn; + + ib_conn = (struct iser_conn *)cma_id->context; + ib_conn->disc_evt_flag = 1; + + /* getting here when the state is UP means that the conn is being * + * terminated asynchronously from the iSCSI layer's perspective. 
*/
+	if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP,
+				      ISER_CONN_TERMINATING))
+		iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn,
+				   ISCSI_ERR_CONN_FAILED);
+
+	/* Complete the termination process if no posts are pending */
+	if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) &&
+	    (atomic_read(&ib_conn->post_send_buf_count) == 0)) {
+		ib_conn->state = ISER_CONN_DOWN;
+		wake_up_interruptible(&ib_conn->wait);
+	}
+}
+
+static int iser_cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
+{
+	int ret = 0;
+
+	iser_err("event %d conn %p id %p\n",event->event,cma_id->context,cma_id);
+
+	switch (event->event) {
+	case RDMA_CM_EVENT_ADDR_RESOLVED:
+		iser_addr_handler(cma_id);
+		break;
+	case RDMA_CM_EVENT_ROUTE_RESOLVED:
+		iser_route_handler(cma_id);
+		break;
+	case RDMA_CM_EVENT_ESTABLISHED:
+		iser_connected_handler(cma_id);
+		break;
+	case RDMA_CM_EVENT_ADDR_ERROR:
+	case RDMA_CM_EVENT_ROUTE_ERROR:
+	case RDMA_CM_EVENT_CONNECT_ERROR:
+	case RDMA_CM_EVENT_UNREACHABLE:
+	case RDMA_CM_EVENT_REJECTED:
+		iser_err("event: %d, error: %d\n", event->event, event->status);
+		iser_connect_error(cma_id);
+		break;
+	case RDMA_CM_EVENT_DISCONNECTED:
+		iser_disconnected_handler(cma_id);
+		break;
+	case RDMA_CM_EVENT_DEVICE_REMOVAL:
+		BUG();
+		break;
+	case RDMA_CM_EVENT_CONNECT_RESPONSE:
+		BUG();
+		break;
+	case RDMA_CM_EVENT_CONNECT_REQUEST:
+	default:
+		break;
+	}
+	return ret;
+}
+
+int iser_conn_init(struct iser_conn **ibconn)
+{
+	struct iser_conn *ib_conn;
+
+	ib_conn = kzalloc(sizeof *ib_conn, GFP_KERNEL);
+	if (!ib_conn) {
+		iser_err("can't alloc memory for struct iser_conn\n");
+		return -ENOMEM;
+	}
+	ib_conn->state = ISER_CONN_INIT;
+	init_waitqueue_head(&ib_conn->wait);
+	atomic_set(&ib_conn->post_recv_buf_count, 0);
+	atomic_set(&ib_conn->post_send_buf_count, 0);
+	INIT_WORK(&ib_conn->comperror_work, iser_comp_error_worker,
+		  ib_conn);
+	INIT_LIST_HEAD(&ib_conn->conn_list);
+	spin_lock_init(&ib_conn->lock);
+
+	*ibconn = ib_conn;
+	return 0;
+}
+
+/**
+ * starts the process of connecting to the target
+ * sleeps until the connection is established or rejected
+ */
+int iser_connect(struct iser_conn *ib_conn,
+		 struct sockaddr_in *src_addr,
+		 struct sockaddr_in *dst_addr,
+		 int non_blocking)
+{
+	struct sockaddr *src, *dst;
+	int err = 0;
+
+	sprintf(ib_conn->name,"%d.%d.%d.%d:%d",
+		NIPQUAD(dst_addr->sin_addr.s_addr), dst_addr->sin_port);
+
+	/* the device is known only --after-- address resolution */
+	ib_conn->device = NULL;
+
+	iser_err("connecting to: %d.%d.%d.%d, port 0x%x\n",
+		 NIPQUAD(dst_addr->sin_addr), dst_addr->sin_port);
+
+	ib_conn->state = ISER_CONN_PENDING;
+
+	ib_conn->cma_id = rdma_create_id(iser_cma_handler,
+					 (void *)ib_conn,
+					 RDMA_PS_TCP);
+	if (IS_ERR(ib_conn->cma_id)) {
+		err = PTR_ERR(ib_conn->cma_id);
+		iser_err("rdma_create_id failed: %d\n", err);
+		goto id_failure;
+	}
+
+	src = (struct sockaddr *)src_addr;
+	dst = (struct sockaddr *)dst_addr;
+	err = rdma_resolve_addr(ib_conn->cma_id, src, dst, 1000);
+	if (err) {
+		iser_err("rdma_resolve_addr failed: %d\n", err);
+		goto addr_failure;
+	}
+
+	if (!non_blocking) {
+		wait_event_interruptible(ib_conn->wait,
+					 (ib_conn->state != ISER_CONN_PENDING));
+
+		if (ib_conn->state != ISER_CONN_UP) {
+			err = -EIO;
+			goto connect_failure;
+		}
+	}
+
+	mutex_lock(&ig.connlist_mutex);
+	list_add(&ib_conn->conn_list, &ig.connlist);
+	mutex_unlock(&ig.connlist_mutex);
+	return 0;
+
+id_failure:
+	ib_conn->cma_id = NULL;
+addr_failure:
+	ib_conn->state = ISER_CONN_DOWN;
+connect_failure:
+	iser_conn_release(ib_conn);
+	return err;
+}
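iser_connect() itself only kicks off rdma_resolve_addr() and then sleeps; the CMA callbacks above drive the connection forward. A condensed, standalone rendering of the transitions those handlers implement (names shortened, and the DISCONNECTED case simplified — the driver also drops straight to DOWN when no posts are pending):

enum conn_state { INIT, PENDING, UP, TERMINATING, DOWN };
enum cm_event   { ADDR_OK, ROUTE_OK, ESTABLISHED, ERROR, DISCONNECT };

/* models iser_cma_handler(): how each CMA event advances the state */
static enum conn_state next_state(enum conn_state s, enum cm_event e)
{
	switch (e) {
	case ADDR_OK:	  return s;	/* resolve route, stay PENDING */
	case ROUTE_OK:	  return s;	/* create QP and rdma_connect() */
	case ESTABLISHED: return UP;	/* wakes the iser_connect() sleeper */
	case ERROR:	  return DOWN;	/* wakes it with a failure */
	case DISCONNECT:  return s == UP ? TERMINATING : s;
	}
	return s;
}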
+
+/**
+ * Frees all conn objects and deallocs conn descriptor
+ */
+void iser_conn_release(struct iser_conn *ib_conn)
+{
+	struct iser_device *device = ib_conn->device;
+
+	BUG_ON(ib_conn->state != ISER_CONN_DOWN);
+
+	mutex_lock(&ig.connlist_mutex);
+	list_del(&ib_conn->conn_list);
+	mutex_unlock(&ig.connlist_mutex);
+
+	iser_free_ib_conn_res(ib_conn);
+	ib_conn->device = NULL;
+	/* on EVENT_ADDR_ERROR there's no device yet for this conn */
+	if (device != NULL)
+		iser_device_try_release(device);
+	kfree(ib_conn);
+}
+
+/**
+ * iser_reg_page_vec - Register physical memory
+ *
+ * returns: 0 on success, errno code on failure
+ */
+int iser_reg_page_vec(struct iser_conn *ib_conn,
+		      struct iser_page_vec *page_vec,
+		      struct iser_mem_reg *mem_reg)
+{
+	struct ib_pool_fmr *mem;
+	u64 io_addr;
+	u64 *page_list;
+	int status;
+
+	page_list = page_vec->pages;
+	io_addr = page_list[0];
+
+	mem = ib_fmr_pool_map_phys(ib_conn->fmr_pool,
+				   page_list,
+				   page_vec->length,
+				   &io_addr);
+
+	if (IS_ERR(mem)) {
+		status = (int)PTR_ERR(mem);
+		iser_err("ib_fmr_pool_map_phys failed: %d\n", status);
+		return status;
+	}
+
+	mem_reg->lkey = mem->fmr->lkey;
+	mem_reg->rkey = mem->fmr->rkey;
+	mem_reg->len = page_vec->length * PAGE_SIZE;
+	mem_reg->va = io_addr;
+	mem_reg->mem_h = (void *)mem;
+
+	mem_reg->va += page_vec->offset;
+	mem_reg->len = page_vec->data_size;
+
+	iser_dbg("PHYSICAL Mem.register, [PHYS p_array: 0x%p, sz: %d, "
+		 "entry[0]: (0x%08lx,%ld)] -> "
+		 "[lkey: 0x%08X mem_h: 0x%p va: 0x%08lX sz: %ld]\n",
+		 page_vec, page_vec->length,
+		 (unsigned long)page_vec->pages[0],
+		 (unsigned long)page_vec->data_size,
+		 (unsigned int)mem_reg->lkey, mem_reg->mem_h,
+		 (unsigned long)mem_reg->va, (unsigned long)mem_reg->len);
+	return 0;
+}
+
+/**
+ * Unregister (previously registered) memory.
+ */
+void iser_unreg_mem(struct iser_mem_reg *reg)
+{
+	int ret;
+
+	iser_dbg("PHYSICAL Mem.Unregister mem_h %p\n",reg->mem_h);
+
+	ret = ib_fmr_pool_unmap((struct ib_pool_fmr *)reg->mem_h);
+	if (ret)
+		iser_err("ib_fmr_pool_unmap failed %d\n", ret);
+
+	reg->mem_h = NULL;
+}
+
+/**
+ * iser_dto_to_iov - builds IOV from a dto descriptor
+ */
+static void iser_dto_to_iov(struct iser_dto *dto, struct ib_sge *iov, int iov_len)
+{
+	int i;
+	struct ib_sge *sge;
+	struct iser_regd_buf *regd_buf;
+
+	if (dto->regd_vector_len > iov_len) {
+		iser_err("iov size %d too small for posting dto of len %d\n",
+			 iov_len, dto->regd_vector_len);
+		BUG();
+	}
+
+	for (i = 0; i < dto->regd_vector_len; i++) {
+		sge = &iov[i];
+		regd_buf = dto->regd[i];
+
+		sge->addr = regd_buf->reg.va;
+		sge->length = regd_buf->reg.len;
+		sge->lkey = regd_buf->reg.lkey;
+
+		if (dto->used_sz[i] > 0) /* Adjust size */
+			sge->length = dto->used_sz[i];
+
+		/* offset and length should not exceed the regd buf length */
+		if (sge->length + dto->offset[i] > regd_buf->reg.len) {
+			iser_err("Used len:%ld + offset:%d, exceed reg.buf.len:"
+				 "%ld in dto:0x%p [%d], va:0x%08lX\n",
+				 (unsigned long)sge->length, dto->offset[i],
+				 (unsigned long)regd_buf->reg.len, dto, i,
+				 (unsigned long)sge->addr);
+			BUG();
+		}
+
+		sge->addr += dto->offset[i]; /* Adjust offset */
+	}
+}
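iser_dto_to_iov() above is the translation step between the driver's registered-buffer vector and the gather list handed to ib_post_send()/ib_post_recv(): each entry's va is advanced by the stored offset and its length optionally trimmed to used_sz. The per-entry arithmetic in isolation, with fixed-width stand-ins for struct ib_sge and the regd bookkeeping (hypothetical types, not the verbs structs):

#include <stdint.h>

struct sge      { uint64_t addr; uint32_t length; uint32_t lkey; };
struct regd_ent { uint64_t va; uint32_t len; uint32_t lkey;
		  uint32_t offset; uint32_t used_sz; };

/* models one iteration of iser_dto_to_iov() */
static void regd_to_sge(const struct regd_ent *r, struct sge *s)
{
	s->addr   = r->va + r->offset;		      /* skip into the region */
	s->length = r->used_sz ? r->used_sz : r->len; /* trim if partially used */
	s->lkey   = r->lkey;
	/* the driver BUG()s if offset + length would overrun the region */
}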
+
+/**
+ * iser_post_recv - Posts a receive buffer.
+ *
+ * returns 0 on success, -1 on failure
+ */
+int iser_post_recv(struct iser_desc *rx_desc)
+{
+	int ib_ret, ret_val = 0;
+	struct ib_recv_wr recv_wr, *recv_wr_failed;
+	struct ib_sge iov[2];
+	struct iser_conn *ib_conn;
+	struct iser_dto *recv_dto = &rx_desc->dto;
+
+	/* Retrieve conn */
+	ib_conn = recv_dto->conn->ib_conn;
+
+	iser_dto_to_iov(recv_dto, iov, 2);
+
+	recv_wr.next = NULL;
+	recv_wr.sg_list = iov;
+	recv_wr.num_sge = recv_dto->regd_vector_len;
+	recv_wr.wr_id = (unsigned long)rx_desc;
+
+	atomic_inc(&ib_conn->post_recv_buf_count);
+	ib_ret = ib_post_recv(ib_conn->qp, &recv_wr, &recv_wr_failed);
+	if (ib_ret) {
+		iser_err("ib_post_recv failed ret=%d\n", ib_ret);
+		atomic_dec(&ib_conn->post_recv_buf_count);
+		ret_val = -1;
+	}
+
+	return ret_val;
+}
+
+/**
+ * iser_post_send - Initiate a Send DTO operation
+ *
+ * returns 0 on success, -1 on failure
+ */
+int iser_post_send(struct iser_desc *tx_desc)
+{
+	int ib_ret, ret_val = 0;
+	struct ib_send_wr send_wr, *send_wr_failed;
+	struct ib_sge iov[MAX_REGD_BUF_VECTOR_LEN];
+	struct iser_conn *ib_conn;
+	struct iser_dto *dto = &tx_desc->dto;
+
+	ib_conn = dto->conn->ib_conn;
+
+	iser_dto_to_iov(dto, iov, MAX_REGD_BUF_VECTOR_LEN);
+
+	send_wr.next = NULL;
+	send_wr.wr_id = (unsigned long)tx_desc;
+	send_wr.sg_list = iov;
+	send_wr.num_sge = dto->regd_vector_len;
+	send_wr.opcode = IB_WR_SEND;
+	send_wr.send_flags = dto->notify_enable ? IB_SEND_SIGNALED : 0;
+
+	atomic_inc(&ib_conn->post_send_buf_count);
+
+	ib_ret = ib_post_send(ib_conn->qp, &send_wr, &send_wr_failed);
+	if (ib_ret) {
+		iser_err("Failed to start SEND DTO, dto: 0x%p, IOV len: %d\n",
+			 dto, dto->regd_vector_len);
+		iser_err("ib_post_send failed, ret:%d\n", ib_ret);
+		atomic_dec(&ib_conn->post_send_buf_count);
+		ret_val = -1;
+	}
+
+	return ret_val;
+}
+
+static void iser_comp_error_worker(void *data)
+{
+	struct iser_conn *ib_conn = data;
+
+	/* getting here when the state is UP means that the conn is being *
+	 * terminated asynchronously from the iSCSI layer's perspective.
*/ + if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, + ISER_CONN_TERMINATING)) + iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn, + ISCSI_ERR_CONN_FAILED); + + /* complete the termination process if disconnect event was delivered * + * note there are no more non completed posts to the QP */ + if (ib_conn->disc_evt_flag) { + ib_conn->state = ISER_CONN_DOWN; + wake_up_interruptible(&ib_conn->wait); + } +} + +static void iser_handle_comp_error(struct iser_desc *desc) +{ + struct iser_dto *dto = &desc->dto; + struct iser_conn *ib_conn = dto->conn->ib_conn; + + iser_dto_buffs_release(dto); + + if (desc->type == ISCSI_RX) { + kfree(desc->data); + kmem_cache_free(ig.desc_cache, desc); + atomic_dec(&ib_conn->post_recv_buf_count); + } else { /* type is TX control/command/dataout */ + if (desc->type == ISCSI_TX_DATAOUT) + kmem_cache_free(ig.desc_cache, desc); + atomic_dec(&ib_conn->post_send_buf_count); + } + + if (atomic_read(&ib_conn->post_recv_buf_count) == 0 && + atomic_read(&ib_conn->post_send_buf_count) == 0) + schedule_work(&ib_conn->comperror_work); +} + +static void iser_cq_tasklet_fn(unsigned long data) +{ + struct iser_device *device = (struct iser_device *)data; + struct ib_cq *cq = device->cq; + struct ib_wc wc; + struct iser_desc *desc; + unsigned long xfer_len; + + while (ib_poll_cq(cq, 1, &wc) == 1) { + desc = (struct iser_desc *) (unsigned long) wc.wr_id; + BUG_ON(desc == NULL); + + if (wc.status == IB_WC_SUCCESS) { + if (desc->type == ISCSI_RX) { + xfer_len = (unsigned long)wc.byte_len; + iser_rcv_completion(desc, xfer_len); + } else /* type == ISCSI_TX_CONTROL/SCSI_CMD/DOUT */ + iser_snd_completion(desc); + } else { + iser_err("comp w. error op %d status %d\n",desc->type,wc.status); + iser_handle_comp_error(desc); + } + } + /* #warning "it is assumed here that arming CQ only once its empty" * + * " would not cause interrupts to be missed" */ + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); +} + +static void iser_cq_callback(struct ib_cq *cq, void *cq_context) +{ + struct iser_device *device = (struct iser_device *)cq_context; + + tasklet_schedule(&device->cq_tasklet); +} diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 9cbdffa..a01b73b 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -62,6 +62,13 @@ MODULE_DESCRIPTION("InfiniBand SCSI RDMA "v" DRV_VERSION " (" DRV_RELDATE ")"); MODULE_LICENSE("Dual BSD/GPL"); +static int srp_sg_tablesize = SRP_DEF_SG_TABLESIZE; +static int srp_max_iu_len; + +module_param(srp_sg_tablesize, int, 0444); +MODULE_PARM_DESC(srp_sg_tablesize, + "Max number of gather/scatter entries per I/O (default is 12)"); + static int topspin_workarounds = 1; module_param(topspin_workarounds, int, 0444); @@ -105,7 +112,8 @@ static struct srp_iu *srp_alloc_iu(struc if (!iu->buf) goto out_free_iu; - iu->dma = dma_map_single(host->dev->dma_device, iu->buf, size, direction); + iu->dma = dma_map_single(host->dev->dev->dma_device, + iu->buf, size, direction); if (dma_mapping_error(iu->dma)) goto out_free_buf; @@ -127,7 +135,8 @@ static void srp_free_iu(struct srp_host if (!iu) return; - dma_unmap_single(host->dev->dma_device, iu->dma, iu->size, iu->direction); + dma_unmap_single(host->dev->dev->dma_device, + iu->dma, iu->size, iu->direction); kfree(iu->buf); kfree(iu); } @@ -147,7 +156,7 @@ static int srp_init_qp(struct srp_target if (!attr) return -ENOMEM; - ret = ib_find_cached_pkey(target->srp_host->dev, + ret = ib_find_cached_pkey(target->srp_host->dev->dev, target->srp_host->port, 
be16_to_cpu(target->path.pkey), &attr->pkey_index); @@ -179,7 +188,7 @@ static int srp_create_target_ib(struct s if (!init_attr) return -ENOMEM; - target->cq = ib_create_cq(target->srp_host->dev, srp_completion, + target->cq = ib_create_cq(target->srp_host->dev->dev, srp_completion, NULL, target, SRP_CQ_SIZE); if (IS_ERR(target->cq)) { ret = PTR_ERR(target->cq); @@ -198,7 +207,7 @@ static int srp_create_target_ib(struct s init_attr->send_cq = target->cq; init_attr->recv_cq = target->cq; - target->qp = ib_create_qp(target->srp_host->pd, init_attr); + target->qp = ib_create_qp(target->srp_host->dev->pd, init_attr); if (IS_ERR(target->qp)) { ret = PTR_ERR(target->qp); ib_destroy_cq(target->cq); @@ -250,7 +259,7 @@ static int srp_lookup_path(struct srp_ta init_completion(&target->done); - target->path_query_id = ib_sa_path_rec_get(target->srp_host->dev, + target->path_query_id = ib_sa_path_rec_get(target->srp_host->dev->dev, target->srp_host->port, &target->path, IB_SA_PATH_REC_DGID | @@ -309,7 +318,7 @@ static int srp_send_req(struct srp_targe req->priv.opcode = SRP_LOGIN_REQ; req->priv.tag = 0; - req->priv.req_it_iu_len = cpu_to_be32(SRP_MAX_IU_LEN); + req->priv.req_it_iu_len = cpu_to_be32(srp_max_iu_len); req->priv.req_buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | SRP_BUF_FORMAT_INDIRECT); memcpy(req->priv.initiator_port_id, target->srp_host->initiator_port_id, 16); @@ -359,9 +368,9 @@ static void srp_remove_work(void *target target->state = SRP_TARGET_REMOVED; spin_unlock_irq(target->scsi_host->host_lock); - mutex_lock(&target->srp_host->target_mutex); + spin_lock(&target->srp_host->target_lock); list_del(&target->list); - mutex_unlock(&target->srp_host->target_mutex); + spin_unlock(&target->srp_host->target_lock); scsi_remove_host(target->scsi_host); ib_destroy_cm_id(target->cm_id); @@ -421,6 +430,11 @@ static void srp_unmap_data(struct scsi_c scmnd->sc_data_direction != DMA_FROM_DEVICE)) return; + if (req->fmr) { + ib_fmr_pool_unmap(req->fmr); + req->fmr = NULL; + } + /* * This handling of non-SG commands can be killed when the * SCSI midlayer no longer generates non-SG commands. @@ -433,7 +447,7 @@ static void srp_unmap_data(struct scsi_c scat = &req->fake_sg; } - dma_unmap_sg(target->srp_host->dev->dma_device, scat, nents, + dma_unmap_sg(target->srp_host->dev->dev->dma_device, scat, nents, scmnd->sc_data_direction); } @@ -459,7 +473,7 @@ static int srp_reconnect_target(struct s * Now get a new local CM ID so that we avoid confusing the * target in case things are really fouled up. 
*/ - new_cm_id = ib_create_cm_id(target->srp_host->dev, + new_cm_id = ib_create_cm_id(target->srp_host->dev->dev, srp_cm_handler, target); if (IS_ERR(new_cm_id)) { ret = PTR_ERR(new_cm_id); @@ -528,14 +542,79 @@ err: return ret; } +static int srp_map_fmr(struct srp_device *dev, struct scatterlist *scat, + int sg_cnt, struct srp_request *req, + struct srp_direct_buf *buf) +{ + u64 io_addr = 0; + u64 *dma_pages; + u32 len; + int page_cnt; + int i, j; + int ret; + + if (!dev->fmr_pool) + return -ENODEV; + + len = page_cnt = 0; + for (i = 0; i < sg_cnt; ++i) { + if (sg_dma_address(&scat[i]) & ~dev->fmr_page_mask) { + if (i > 0) + return -EINVAL; + else + ++page_cnt; + } + if ((sg_dma_address(&scat[i]) + sg_dma_len(&scat[i])) & + ~dev->fmr_page_mask) { + if (i < sg_cnt - 1) + return -EINVAL; + else + ++page_cnt; + } + + len += sg_dma_len(&scat[i]); + } + + page_cnt += len >> dev->fmr_page_shift; + if (page_cnt > SRP_FMR_SIZE) + return -ENOMEM; + + dma_pages = kmalloc(sizeof (u64) * page_cnt, GFP_ATOMIC); + if (!dma_pages) + return -ENOMEM; + + page_cnt = 0; + for (i = 0; i < sg_cnt; ++i) + for (j = 0; j < sg_dma_len(&scat[i]); j += dev->fmr_page_size) + dma_pages[page_cnt++] = + (sg_dma_address(&scat[i]) & dev->fmr_page_mask) + j; + + req->fmr = ib_fmr_pool_map_phys(dev->fmr_pool, + dma_pages, page_cnt, &io_addr); + if (IS_ERR(req->fmr)) { + ret = PTR_ERR(req->fmr); + goto out; + } + + buf->va = cpu_to_be64(sg_dma_address(&scat[0]) & ~dev->fmr_page_mask); + buf->key = cpu_to_be32(req->fmr->fmr->rkey); + buf->len = cpu_to_be32(len); + + ret = 0; + +out: + kfree(dma_pages); + + return ret; +} + static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target, struct srp_request *req) { struct scatterlist *scat; struct srp_cmd *cmd = req->cmd->buf; int len, nents, count; - int i; - u8 fmt; + u8 fmt = SRP_DATA_DESC_DIRECT; if (!scmnd->request_buffer || scmnd->sc_data_direction == DMA_NONE) return sizeof (struct srp_cmd); @@ -560,53 +639,63 @@ static int srp_map_data(struct scsi_cmnd sg_init_one(scat, scmnd->request_buffer, scmnd->request_bufflen); } - count = dma_map_sg(target->srp_host->dev->dma_device, scat, nents, - scmnd->sc_data_direction); + count = dma_map_sg(target->srp_host->dev->dev->dma_device, + scat, nents, scmnd->sc_data_direction); + + fmt = SRP_DATA_DESC_DIRECT; + len = sizeof (struct srp_cmd) + sizeof (struct srp_direct_buf); if (count == 1) { + /* + * The midlayer only generated a single gather/scatter + * entry, or DMA mapping coalesced everything to a + * single entry. So a direct descriptor along with + * the DMA MR suffices. + */ struct srp_direct_buf *buf = (void *) cmd->add_data; - fmt = SRP_DATA_DESC_DIRECT; - buf->va = cpu_to_be64(sg_dma_address(scat)); - buf->key = cpu_to_be32(target->srp_host->mr->rkey); + buf->key = cpu_to_be32(target->srp_host->dev->mr->rkey); buf->len = cpu_to_be32(sg_dma_len(scat)); - - len = sizeof (struct srp_cmd) + - sizeof (struct srp_direct_buf); - } else { + } else if (srp_map_fmr(target->srp_host->dev, scat, count, req, + (void *) cmd->add_data)) { + /* + * FMR mapping failed, and the scatterlist has more + * than one entry. Generate an indirect memory + * descriptor. 
+ */ struct srp_indirect_buf *buf = (void *) cmd->add_data; u32 datalen = 0; + int i; fmt = SRP_DATA_DESC_INDIRECT; + len = sizeof (struct srp_cmd) + + sizeof (struct srp_indirect_buf) + + count * sizeof (struct srp_direct_buf); + + for (i = 0; i < count; ++i) { + buf->desc_list[i].va = + cpu_to_be64(sg_dma_address(&scat[i])); + buf->desc_list[i].key = + cpu_to_be32(target->srp_host->dev->mr->rkey); + buf->desc_list[i].len = + cpu_to_be32(sg_dma_len(&scat[i])); + datalen += sg_dma_len(&scat[i]); + } if (scmnd->sc_data_direction == DMA_TO_DEVICE) cmd->data_out_desc_cnt = count; else cmd->data_in_desc_cnt = count; - buf->table_desc.va = cpu_to_be64(req->cmd->dma + - sizeof *cmd + - sizeof *buf); + buf->table_desc.va = + cpu_to_be64(req->cmd->dma + sizeof *cmd + sizeof *buf); buf->table_desc.key = - cpu_to_be32(target->srp_host->mr->rkey); + cpu_to_be32(target->srp_host->dev->mr->rkey); buf->table_desc.len = cpu_to_be32(count * sizeof (struct srp_direct_buf)); - for (i = 0; i < count; ++i) { - buf->desc_list[i].va = cpu_to_be64(sg_dma_address(&scat[i])); - buf->desc_list[i].key = - cpu_to_be32(target->srp_host->mr->rkey); - buf->desc_list[i].len = cpu_to_be32(sg_dma_len(&scat[i])); - - datalen += sg_dma_len(&scat[i]); - } - buf->len = cpu_to_be32(datalen); - - len = sizeof (struct srp_cmd) + - sizeof (struct srp_indirect_buf) + - count * sizeof (struct srp_direct_buf); } if (scmnd->sc_data_direction == DMA_TO_DEVICE) @@ -689,7 +778,7 @@ static void srp_handle_recv(struct srp_t iu = target->rx_ring[wc->wr_id & ~SRP_OP_RECV]; - dma_sync_single_for_cpu(target->srp_host->dev->dma_device, iu->dma, + dma_sync_single_for_cpu(target->srp_host->dev->dev->dma_device, iu->dma, target->max_ti_iu_len, DMA_FROM_DEVICE); opcode = *(u8 *) iu->buf; @@ -726,7 +815,7 @@ static void srp_handle_recv(struct srp_t break; } - dma_sync_single_for_device(target->srp_host->dev->dma_device, iu->dma, + dma_sync_single_for_device(target->srp_host->dev->dev->dma_device, iu->dma, target->max_ti_iu_len, DMA_FROM_DEVICE); } @@ -770,7 +859,7 @@ static int __srp_post_recv(struct srp_ta list.addr = iu->dma; list.length = iu->size; - list.lkey = target->srp_host->mr->lkey; + list.lkey = target->srp_host->dev->mr->lkey; wr.next = NULL; wr.sg_list = &list; @@ -828,7 +917,7 @@ static int __srp_post_send(struct srp_ta list.addr = iu->dma; list.length = len; - list.lkey = target->srp_host->mr->lkey; + list.lkey = target->srp_host->dev->mr->lkey; wr.next = NULL; wr.wr_id = target->tx_head & SRP_SQ_SIZE; @@ -870,8 +959,8 @@ static int srp_queuecommand(struct scsi_ if (!iu) goto err; - dma_sync_single_for_cpu(target->srp_host->dev->dma_device, iu->dma, - SRP_MAX_IU_LEN, DMA_TO_DEVICE); + dma_sync_single_for_cpu(target->srp_host->dev->dev->dma_device, iu->dma, + srp_max_iu_len, DMA_TO_DEVICE); req = list_entry(target->free_reqs.next, struct srp_request, list); @@ -903,8 +992,8 @@ static int srp_queuecommand(struct scsi_ goto err_unmap; } - dma_sync_single_for_device(target->srp_host->dev->dma_device, iu->dma, - SRP_MAX_IU_LEN, DMA_TO_DEVICE); + dma_sync_single_for_device(target->srp_host->dev->dev->dma_device, iu->dma, + srp_max_iu_len, DMA_TO_DEVICE); if (__srp_post_send(target, iu, len)) { printk(KERN_ERR PFX "Send failed\n"); @@ -936,7 +1025,7 @@ static int srp_alloc_iu_bufs(struct srp_ for (i = 0; i < SRP_SQ_SIZE + 1; ++i) { target->tx_ring[i] = srp_alloc_iu(target->srp_host, - SRP_MAX_IU_LEN, + srp_max_iu_len, GFP_KERNEL, DMA_TO_DEVICE); if (!target->tx_ring[i]) goto err; @@ -1354,7 +1443,6 @@ static struct scsi_host_template srp_tem 
.eh_host_reset_handler = srp_reset_host, .can_queue = SRP_SQ_SIZE, .this_id = -1, - .sg_tablesize = SRP_MAX_INDIRECT, .cmd_per_lun = SRP_SQ_SIZE, .use_clustering = ENABLE_CLUSTERING, .shost_attrs = srp_host_attrs @@ -1365,18 +1453,17 @@ static int srp_add_target(struct srp_hos sprintf(target->target_name, "SRP.T10:%016llX", (unsigned long long) be64_to_cpu(target->id_ext)); - if (scsi_add_host(target->scsi_host, host->dev->dma_device)) + if (scsi_add_host(target->scsi_host, host->dev->dev->dma_device)) return -ENODEV; - mutex_lock(&host->target_mutex); + spin_lock(&host->target_lock); list_add_tail(&target->list, &host->target_list); - mutex_unlock(&host->target_mutex); + spin_unlock(&host->target_lock); target->state = SRP_TARGET_LIVE; - /* XXX: are we supposed to have a definition of SCAN_WILD_CARD ?? */ scsi_scan_target(&target->scsi_host->shost_gendev, - 0, target->scsi_id, ~0, 0); + 0, target->scsi_id, SCAN_WILD_CARD, 0); return 0; } @@ -1410,6 +1497,7 @@ enum { SRP_OPT_PKEY = 1 << 3, SRP_OPT_SERVICE_ID = 1 << 4, SRP_OPT_MAX_SECT = 1 << 5, + SRP_OPT_MAX_CMD_PER_LUN = 1 << 6, SRP_OPT_ALL = (SRP_OPT_ID_EXT | SRP_OPT_IOC_GUID | SRP_OPT_DGID | @@ -1418,13 +1506,14 @@ enum { }; static match_table_t srp_opt_tokens = { - { SRP_OPT_ID_EXT, "id_ext=%s" }, - { SRP_OPT_IOC_GUID, "ioc_guid=%s" }, - { SRP_OPT_DGID, "dgid=%s" }, - { SRP_OPT_PKEY, "pkey=%x" }, - { SRP_OPT_SERVICE_ID, "service_id=%s" }, - { SRP_OPT_MAX_SECT, "max_sect=%d" }, - { SRP_OPT_ERR, NULL } + { SRP_OPT_ID_EXT, "id_ext=%s" }, + { SRP_OPT_IOC_GUID, "ioc_guid=%s" }, + { SRP_OPT_DGID, "dgid=%s" }, + { SRP_OPT_PKEY, "pkey=%x" }, + { SRP_OPT_SERVICE_ID, "service_id=%s" }, + { SRP_OPT_MAX_SECT, "max_sect=%d" }, + { SRP_OPT_MAX_CMD_PER_LUN, "max_cmd_per_lun=%d" }, + { SRP_OPT_ERR, NULL } }; static int srp_parse_options(const char *buf, struct srp_target_port *target) @@ -1500,6 +1589,14 @@ static int srp_parse_options(const char target->scsi_host->max_sectors = token; break; + case SRP_OPT_MAX_CMD_PER_LUN: + if (match_int(args, &token)) { + printk(KERN_WARNING PFX "bad max cmd_per_lun parameter '%s'\n", p); + goto out; + } + target->scsi_host->cmd_per_lun = min(token, SRP_SQ_SIZE); + break; + default: printk(KERN_WARNING PFX "unknown parameter or missing value " "'%s' in target creation request\n", p); @@ -1558,7 +1655,7 @@ static ssize_t srp_create_target(struct if (ret) goto err; - ib_get_cached_gid(host->dev, host->port, 0, &target->path.sgid); + ib_get_cached_gid(host->dev->dev, host->port, 0, &target->path.sgid); printk(KERN_DEBUG PFX "new target: id_ext %016llx ioc_guid %016llx pkey %04x " "service_id %016llx dgid %04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", @@ -1579,7 +1676,7 @@ static ssize_t srp_create_target(struct if (ret) goto err; - target->cm_id = ib_create_cm_id(host->dev, srp_cm_handler, target); + target->cm_id = ib_create_cm_id(host->dev->dev, srp_cm_handler, target); if (IS_ERR(target->cm_id)) { ret = PTR_ERR(target->cm_id); goto err_free; @@ -1619,7 +1716,7 @@ static ssize_t show_ibdev(struct class_d struct srp_host *host = container_of(class_dev, struct srp_host, class_dev); - return sprintf(buf, "%s\n", host->dev->name); + return sprintf(buf, "%s\n", host->dev->dev->name); } static CLASS_DEVICE_ATTR(ibdev, S_IRUGO, show_ibdev, NULL); @@ -1634,7 +1731,7 @@ static ssize_t show_port(struct class_de static CLASS_DEVICE_ATTR(port, S_IRUGO, show_port, NULL); -static struct srp_host *srp_add_port(struct ib_device *device, u8 port) +static struct srp_host *srp_add_port(struct srp_device *device, u8 port) { struct 
srp_host *host; @@ -1643,32 +1740,21 @@ static struct srp_host *srp_add_port(str return NULL; INIT_LIST_HEAD(&host->target_list); - mutex_init(&host->target_mutex); + spin_lock_init(&host->target_lock); init_completion(&host->released); host->dev = device; host->port = port; host->initiator_port_id[7] = port; - memcpy(host->initiator_port_id + 8, &device->node_guid, 8); - - host->pd = ib_alloc_pd(device); - if (IS_ERR(host->pd)) - goto err_free; - - host->mr = ib_get_dma_mr(host->pd, - IB_ACCESS_LOCAL_WRITE | - IB_ACCESS_REMOTE_READ | - IB_ACCESS_REMOTE_WRITE); - if (IS_ERR(host->mr)) - goto err_pd; + memcpy(host->initiator_port_id + 8, &device->dev->node_guid, 8); host->class_dev.class = &srp_class; - host->class_dev.dev = device->dma_device; + host->class_dev.dev = device->dev->dma_device; snprintf(host->class_dev.class_id, BUS_ID_SIZE, "srp-%s-%d", - device->name, port); + device->dev->name, port); if (class_device_register(&host->class_dev)) - goto err_mr; + goto free_host; if (class_device_create_file(&host->class_dev, &class_device_attr_add_target)) goto err_class; if (class_device_create_file(&host->class_dev, &class_device_attr_ibdev)) @@ -1681,13 +1767,7 @@ static struct srp_host *srp_add_port(str err_class: class_device_unregister(&host->class_dev); -err_mr: - ib_dereg_mr(host->mr); - -err_pd: - ib_dealloc_pd(host->pd); - -err_free: +free_host: kfree(host); return NULL; @@ -1695,15 +1775,62 @@ err_free: static void srp_add_one(struct ib_device *device) { - struct list_head *dev_list; + struct srp_device *srp_dev; + struct ib_device_attr *dev_attr; + struct ib_fmr_pool_param fmr_param; struct srp_host *host; int s, e, p; - dev_list = kmalloc(sizeof *dev_list, GFP_KERNEL); - if (!dev_list) + dev_attr = kmalloc(sizeof *dev_attr, GFP_KERNEL); + if (!dev_attr) return; - INIT_LIST_HEAD(dev_list); + if (ib_query_device(device, dev_attr)) { + printk(KERN_WARNING PFX "Query device failed for %s\n", + device->name); + goto free_attr; + } + + srp_dev = kmalloc(sizeof *srp_dev, GFP_KERNEL); + if (!srp_dev) + goto free_attr; + + /* + * Use the smallest page size supported by the HCA, down to a + * minimum of 512 bytes (which is the smallest sector that a + * SCSI command will ever carry). 
+ */ + srp_dev->fmr_page_shift = max(9, ffs(dev_attr->page_size_cap) - 1); + srp_dev->fmr_page_size = 1 << srp_dev->fmr_page_shift; + srp_dev->fmr_page_mask = ~((unsigned long) srp_dev->fmr_page_size - 1); + + INIT_LIST_HEAD(&srp_dev->dev_list); + + srp_dev->dev = device; + srp_dev->pd = ib_alloc_pd(device); + if (IS_ERR(srp_dev->pd)) + goto free_dev; + + srp_dev->mr = ib_get_dma_mr(srp_dev->pd, + IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_READ | + IB_ACCESS_REMOTE_WRITE); + if (IS_ERR(srp_dev->mr)) + goto err_pd; + + memset(&fmr_param, 0, sizeof fmr_param); + fmr_param.pool_size = SRP_FMR_POOL_SIZE; + fmr_param.dirty_watermark = SRP_FMR_DIRTY_SIZE; + fmr_param.cache = 1; + fmr_param.max_pages_per_fmr = SRP_FMR_SIZE; + fmr_param.page_shift = srp_dev->fmr_page_shift; + fmr_param.access = (IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_WRITE | + IB_ACCESS_REMOTE_READ); + + srp_dev->fmr_pool = ib_create_fmr_pool(srp_dev->pd, &fmr_param); + if (IS_ERR(srp_dev->fmr_pool)) + srp_dev->fmr_pool = NULL; if (device->node_type == IB_NODE_SWITCH) { s = 0; @@ -1714,25 +1841,35 @@ static void srp_add_one(struct ib_device } for (p = s; p <= e; ++p) { - host = srp_add_port(device, p); + host = srp_add_port(srp_dev, p); if (host) - list_add_tail(&host->list, dev_list); + list_add_tail(&host->list, &srp_dev->dev_list); } - ib_set_client_data(device, &srp_client, dev_list); + ib_set_client_data(device, &srp_client, srp_dev); + + goto free_attr; + +err_pd: + ib_dealloc_pd(srp_dev->pd); + +free_dev: + kfree(srp_dev); + +free_attr: + kfree(dev_attr); } static void srp_remove_one(struct ib_device *device) { - struct list_head *dev_list; + struct srp_device *srp_dev; struct srp_host *host, *tmp_host; LIST_HEAD(target_list); struct srp_target_port *target, *tmp_target; - unsigned long flags; - dev_list = ib_get_client_data(device, &srp_client); + srp_dev = ib_get_client_data(device, &srp_client); - list_for_each_entry_safe(host, tmp_host, dev_list, list) { + list_for_each_entry_safe(host, tmp_host, &srp_dev->dev_list, list) { class_device_unregister(&host->class_dev); /* * Wait for the sysfs entry to go away, so that no new @@ -1744,15 +1881,13 @@ static void srp_remove_one(struct ib_dev * Mark all target ports as removed, so we stop queueing * commands and don't try to reconnect. 
*/ - mutex_lock(&host->target_mutex); - list_for_each_entry_safe(target, tmp_target, - &host->target_list, list) { - spin_lock_irqsave(target->scsi_host->host_lock, flags); - if (target->state != SRP_TARGET_REMOVED) - target->state = SRP_TARGET_REMOVED; - spin_unlock_irqrestore(target->scsi_host->host_lock, flags); + spin_lock(&host->target_lock); + list_for_each_entry(target, &host->target_list, list) { + spin_lock_irq(target->scsi_host->host_lock); + target->state = SRP_TARGET_REMOVED; + spin_unlock_irq(target->scsi_host->host_lock); } - mutex_unlock(&host->target_mutex); + spin_unlock(&host->target_lock); /* * Wait for any reconnection tasks that may have @@ -1770,18 +1905,26 @@ static void srp_remove_one(struct ib_dev scsi_host_put(target->scsi_host); } - ib_dereg_mr(host->mr); - ib_dealloc_pd(host->pd); kfree(host); } - kfree(dev_list); + if (srp_dev->fmr_pool) + ib_destroy_fmr_pool(srp_dev->fmr_pool); + ib_dereg_mr(srp_dev->mr); + ib_dealloc_pd(srp_dev->pd); + + kfree(srp_dev); } static int __init srp_init_module(void) { int ret; + srp_template.sg_tablesize = srp_sg_tablesize; + srp_max_iu_len = (sizeof (struct srp_cmd) + + sizeof (struct srp_indirect_buf) + + srp_sg_tablesize * 16); + ret = class_register(&srp_class); if (ret) { printk(KERN_ERR PFX "couldn't register class infiniband_srp\n"); diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h index c5cd43a..033a447 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.h +++ b/drivers/infiniband/ulp/srp/ib_srp.h @@ -46,6 +46,7 @@ #include #include #include #include +#include enum { SRP_PATH_REC_TIMEOUT_MS = 1000, @@ -55,20 +56,21 @@ enum { SRP_DLID_REDIRECT = 2, SRP_MAX_LUN = 512, - SRP_MAX_IU_LEN = 256, + SRP_DEF_SG_TABLESIZE = 12, SRP_RQ_SHIFT = 6, SRP_RQ_SIZE = 1 << SRP_RQ_SHIFT, SRP_SQ_SIZE = SRP_RQ_SIZE - 1, SRP_CQ_SIZE = SRP_SQ_SIZE + SRP_RQ_SIZE, - SRP_TAG_TSK_MGMT = 1 << (SRP_RQ_SHIFT + 1) + SRP_TAG_TSK_MGMT = 1 << (SRP_RQ_SHIFT + 1), + + SRP_FMR_SIZE = 256, + SRP_FMR_POOL_SIZE = 1024, + SRP_FMR_DIRTY_SIZE = SRP_FMR_POOL_SIZE / 4 }; #define SRP_OP_RECV (1 << 31) -#define SRP_MAX_INDIRECT ((SRP_MAX_IU_LEN - \ - sizeof (struct srp_cmd) - \ - sizeof (struct srp_indirect_buf)) / 16) enum srp_target_state { SRP_TARGET_LIVE, @@ -77,15 +79,24 @@ enum srp_target_state { SRP_TARGET_REMOVED }; -struct srp_host { - u8 initiator_port_id[16]; +struct srp_device { + struct list_head dev_list; struct ib_device *dev; - u8 port; struct ib_pd *pd; struct ib_mr *mr; + struct ib_fmr_pool *fmr_pool; + int fmr_page_shift; + int fmr_page_size; + unsigned long fmr_page_mask; +}; + +struct srp_host { + u8 initiator_port_id[16]; + struct srp_device *dev; + u8 port; struct class_device class_dev; struct list_head target_list; - struct mutex target_mutex; + spinlock_t target_lock; struct completion released; struct list_head list; }; @@ -95,6 +106,7 @@ struct srp_request { struct scsi_cmnd *scmnd; struct srp_iu *cmd; struct srp_iu *tsk_mgmt; + struct ib_pool_fmr *fmr; /* * Fake scatterlist used when scmnd->use_sg==0. Can be killed * when the SCSI midlayer no longer generates non-SG commands. 
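A worked example for the indirect-descriptor hunk in ib_srp.c above: the information-unit length now scales with the scatter/gather entry count, and srp_init_module() precomputes srp_max_iu_len from the tunable table size. A runnable model of that arithmetic; the first two struct sizes are illustrative stand-ins rather than the real sizeof values, while the 16-byte descriptor matches the "* 16" in the init code:

    #include <stdio.h>

    #define SRP_CMD_SIZE          48  /* stand-in for sizeof (struct srp_cmd) */
    #define SRP_INDIRECT_BUF_SIZE 20  /* stand-in: table_desc (16) + len (4)  */
    #define SRP_DIRECT_BUF_SIZE   16  /* va (8) + key (4) + len (4)           */

    static unsigned int srp_iu_len(int sg_count)
    {
            /* mirrors: sizeof (struct srp_cmd) +
             *          sizeof (struct srp_indirect_buf) +
             *          count * sizeof (struct srp_direct_buf) */
            return SRP_CMD_SIZE + SRP_INDIRECT_BUF_SIZE +
                   sg_count * SRP_DIRECT_BUF_SIZE;
    }

    int main(void)
    {
            /* with the default table size of 12 (SRP_DEF_SG_TABLESIZE)
             * this models the srp_max_iu_len assignment in
             * srp_init_module() */
            printf("%u\n", srp_iu_len(12));   /* 48 + 20 + 12 * 16 = 260 */
            return 0;
    }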
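The new max_cmd_per_lun token above rides on the kernel option parser. A minimal sketch of that pattern under the 2.6-era <linux/parser.h> API; the token names and the parse_one() helper are hypothetical, and the clamp mirrors the driver's min(token, SRP_SQ_SIZE):

    #include <linux/kernel.h>   /* min() */
    #include <linux/parser.h>
    #include <linux/errno.h>

    enum { OPT_MAX_CMD_PER_LUN = 1 << 0, OPT_ERR = 0 };

    static match_table_t tokens = {
            { OPT_MAX_CMD_PER_LUN, "max_cmd_per_lun=%d" },
            { OPT_ERR,             NULL }
    };

    /* parse one comma-separated "key=value" option */
    static int parse_one(char *p, int *out, int ceiling)
    {
            substring_t args[MAX_OPT_ARGS];
            int val;

            if (match_token(p, tokens, args) != OPT_MAX_CMD_PER_LUN)
                    return -EINVAL;
            if (match_int(&args[0], &val))   /* pull the %d back out */
                    return -EINVAL;
            *out = min(val, ceiling);        /* never exceed the SQ depth */
            return 0;
    }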
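The fmr_page_shift computation in srp_add_one() above deserves a worked example. page_size_cap is a bitmask of the page sizes the HCA supports, ffs() finds the smallest one, and max(9, ...) enforces the 512-byte floor described in the comment. A runnable model, with an assumed capability value:

    #include <stdio.h>
    #include <strings.h>                   /* ffs() */

    int main(void)
    {
            unsigned int page_size_cap = 0xfffff000u;  /* assumed: 4K and up */
            int shift = ffs((int) page_size_cap) - 1;  /* lowest set bit: 12 */

            if (shift < 9)                 /* the driver's max(9, ...) clamp */
                    shift = 9;

            unsigned long size = 1UL << shift;
            unsigned long mask = ~(size - 1);          /* fmr_page_mask      */
            printf("shift=%d size=%lu mask=%#lx\n", shift, size, mask);
            /* on a 64-bit build: shift=12 size=4096 mask=0xfffffffffffff000 */
            return 0;
    }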
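The queue sizing constants in the ib_srp.h hunk are all derived from SRP_RQ_SHIFT; as a runnable cross-check of those relationships:

    #include <stdio.h>

    int main(void)
    {
            int rq_shift = 6;               /* SRP_RQ_SHIFT                  */
            int rq  = 1 << rq_shift;        /* SRP_RQ_SIZE: 64 recv slots    */
            int sq  = rq - 1;               /* SRP_SQ_SIZE: 63 send slots    */
            int cq  = sq + rq;              /* SRP_CQ_SIZE: one CQ for both  */
            int tmf = 1 << (rq_shift + 1);  /* SRP_TAG_TSK_MGMT: clear of
                                             * every ordinary command tag    */

            printf("rq=%d sq=%d cq=%d tsk_mgmt_tag=%d\n", rq, sq, cq, tmf);
            /* prints: rq=64 sq=63 cq=127 tsk_mgmt_tag=128 */
            return 0;
    }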
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile index 81803a1..ee80ff1 100644 --- a/drivers/scsi/Makefile +++ b/drivers/scsi/Makefile @@ -33,7 +33,8 @@ obj-$(CONFIG_SCSI_FC_ATTRS) += scsi_tra obj-$(CONFIG_SCSI_ISCSI_ATTRS) += scsi_transport_iscsi.o obj-$(CONFIG_SCSI_SAS_ATTRS) += scsi_transport_sas.o -obj-$(CONFIG_ISCSI_TCP) += iscsi_tcp.o +obj-$(CONFIG_ISCSI_TCP) += libiscsi.o iscsi_tcp.o +obj-$(CONFIG_INFINIBAND_ISER) += libiscsi.o obj-$(CONFIG_SCSI_AMIGA7XX) += amiga7xx.o 53c7xx.o obj-$(CONFIG_A3000_SCSI) += a3000.o wd33c93.o obj-$(CONFIG_A2091_SCSI) += a2091.o wd33c93.o diff --git a/drivers/scsi/dc395x.c b/drivers/scsi/dc395x.c index cbf8252..0c56095 100644 --- a/drivers/scsi/dc395x.c +++ b/drivers/scsi/dc395x.c @@ -230,13 +230,12 @@ struct ScsiReqBlk { struct scsi_cmnd *cmd; struct SGentry *segment_x; /* Linear array of hw sg entries (up to 64 entries) */ - u32 sg_bus_addr; /* Bus address of sg list (ie, of segment_x) */ + dma_addr_t sg_bus_addr; /* Bus address of sg list (ie, of segment_x) */ u8 sg_count; /* No of HW sg entries for this request */ u8 sg_index; /* Index of HW sg entry for this request */ - u32 total_xfer_length; /* Total number of bytes remaining to be transfered */ - unsigned char *virt_addr; /* Virtual address of current transfer position */ - + size_t total_xfer_length; /* Total number of bytes remaining to be transfered */ + size_t request_length; /* Total number of bytes in this request */ /* * The sense buffer handling function, request_sense, uses * the first hw sg entry (segment_x[0]) and the transfer @@ -246,8 +245,7 @@ struct ScsiReqBlk { * total_xfer_length in xferred. These values are restored in * pci_unmap_srb_sense. This is the only place xferred is used. */ - unsigned char *virt_addr_req; /* Saved virtual address of the request buffer */ - u32 xferred; /* Saved copy of total_xfer_length */ + size_t xferred; /* Saved copy of total_xfer_length */ u16 state; @@ -977,17 +975,6 @@ static void send_srb(struct AdapterCtlBl } } -static inline void pio_trigger(void) -{ - static int feedback_requested; - - if (!feedback_requested) { - feedback_requested = 1; - printk(KERN_WARNING "%s: Please, contact " - "to help improve support for your system.\n", __FILE__); - } -} - /* Prepare SRB for being sent to Device DCB w/ command *cmd */ static void build_srb(struct scsi_cmnd *cmd, struct DeviceCtlBlk *dcb, struct ScsiReqBlk *srb) @@ -1001,7 +988,6 @@ static void build_srb(struct scsi_cmnd * srb->sg_count = 0; srb->total_xfer_length = 0; srb->sg_bus_addr = 0; - srb->virt_addr = NULL; srb->sg_index = 0; srb->adapter_status = 0; srb->target_status = 0; @@ -1032,7 +1018,6 @@ static void build_srb(struct scsi_cmnd * reqlen, cmd->request_buffer, cmd->use_sg, srb->sg_count); - srb->virt_addr = page_address(sl->page); for (i = 0; i < srb->sg_count; i++) { u32 busaddr = (u32)sg_dma_address(&sl[i]); u32 seglen = (u32)sl[i].length; @@ -1077,12 +1062,14 @@ static void build_srb(struct scsi_cmnd * srb->total_xfer_length++; srb->segment_x[0].length = srb->total_xfer_length; - srb->virt_addr = cmd->request_buffer; + dprintkdbg(DBG_0, "build_srb: [1] len=%d buf=%p use_sg=%d map=%08x\n", srb->total_xfer_length, cmd->request_buffer, cmd->use_sg, srb->segment_x[0].address); } + + srb->request_length = srb->total_xfer_length; } @@ -1414,10 +1401,10 @@ static int dc395x_eh_abort(struct scsi_c } srb = find_cmd(cmd, &dcb->srb_going_list); if (srb) { - dprintkl(KERN_DEBUG, "eh_abort: Command in progress"); + dprintkl(KERN_DEBUG, "eh_abort: Command in progress\n"); /* XXX: Should 
abort the command here */ } else { - dprintkl(KERN_DEBUG, "eh_abort: Command not found"); + dprintkl(KERN_DEBUG, "eh_abort: Command not found\n"); } return FAILED; } @@ -1976,14 +1963,11 @@ static void sg_verify_length(struct Scsi /* * Compute the next Scatter Gather list index and adjust its length - * and address if necessary; also compute virt_addr + * and address if necessary */ static void sg_update_list(struct ScsiReqBlk *srb, u32 left) { u8 idx; - struct scatterlist *sg; - struct scsi_cmnd *cmd = srb->cmd; - int segment = cmd->use_sg; u32 xferred = srb->total_xfer_length - left; /* bytes transfered */ struct SGentry *psge = srb->segment_x + srb->sg_index; @@ -2016,29 +2000,6 @@ static void sg_update_list(struct ScsiRe psge++; } sg_verify_length(srb); - - /* we need the corresponding virtual address */ - if (!segment || (srb->flag & AUTO_REQSENSE)) { - srb->virt_addr += xferred; - return; - } - - /* We have to walk the scatterlist to find it */ - sg = (struct scatterlist *)cmd->request_buffer; - while (segment--) { - unsigned long mask = - ~((unsigned long)sg->length - 1) & PAGE_MASK; - if ((sg_dma_address(sg) & mask) == (psge->address & mask)) { - srb->virt_addr = (page_address(sg->page) - + psge->address - - (psge->address & PAGE_MASK)); - return; - } - ++sg; - } - - dprintkl(KERN_ERR, "sg_update_list: sg_to_virt failed\n"); - srb->virt_addr = NULL; } @@ -2050,15 +2011,7 @@ static void sg_update_list(struct ScsiRe */ static void sg_subtract_one(struct ScsiReqBlk *srb) { - srb->total_xfer_length--; - srb->segment_x[srb->sg_index].length--; - if (srb->total_xfer_length && - !srb->segment_x[srb->sg_index].length) { - if (debug_enabled(DBG_PIO)) - printk(" (next segment)"); - srb->sg_index++; - sg_update_list(srb, srb->total_xfer_length); - } + sg_update_list(srb, srb->total_xfer_length - 1); } @@ -2118,7 +2071,7 @@ static void data_out_phase0(struct Adapt * If we need more data, the DMA SG list will be freshly set up, anyway */ dprintkdbg(DBG_PIO, "data_out_phase0: " - "DMA{fifcnt=0x%02x fifostat=0x%02x} " + "DMA{fifocnt=0x%02x fifostat=0x%02x} " "SCSI{fifocnt=0x%02x cnt=0x%06x status=0x%04x} total=0x%06x\n", DC395x_read8(acb, TRM_S1040_DMA_FIFOCNT), DC395x_read8(acb, TRM_S1040_DMA_FIFOSTAT), @@ -2239,12 +2192,11 @@ static void data_out_phase1(struct Adapt data_io_transfer(acb, srb, XFERDATAOUT); } - static void data_in_phase0(struct AdapterCtlBlk *acb, struct ScsiReqBlk *srb, u16 *pscsi_status) { u16 scsi_status = *pscsi_status; - u32 d_left_counter = 0; + dprintkdbg(DBG_0, "data_in_phase0: (pid#%li) <%02i-%i>\n", srb->cmd->pid, srb->cmd->device->id, srb->cmd->device->lun); @@ -2262,6 +2214,9 @@ static void data_in_phase0(struct Adapte * seem to be a bad idea, actually. */ if (!(srb->state & SRB_XFERPAD)) { + u32 d_left_counter; + unsigned int sc, fc; + if (scsi_status & PARITYERROR) { dprintkl(KERN_INFO, "data_in_phase0: (pid#%li) " "Parity Error\n", srb->cmd->pid); @@ -2298,18 +2253,19 @@ #endif DC395x_read8(acb, TRM_S1040_DMA_FIFOSTAT)); } /* Now: Check remainig data: The SCSI counters should tell us ... */ - d_left_counter = DC395x_read32(acb, TRM_S1040_SCSI_COUNTER) - + ((DC395x_read8(acb, TRM_S1040_SCSI_FIFOCNT) & 0x1f) + sc = DC395x_read32(acb, TRM_S1040_SCSI_COUNTER); + fc = DC395x_read8(acb, TRM_S1040_SCSI_FIFOCNT); + d_left_counter = sc + ((fc & 0x1f) << ((srb->dcb->sync_period & WIDE_SYNC) ? 
1 : 0)); dprintkdbg(DBG_KG, "data_in_phase0: " "SCSI{fifocnt=0x%02x%s ctr=0x%08x} " "DMA{fifocnt=0x%02x fifostat=0x%02x ctr=0x%08x} " "Remain{totxfer=%i scsi_fifo+ctr=%i}\n", - DC395x_read8(acb, TRM_S1040_SCSI_FIFOCNT), + fc, (srb->dcb->sync_period & WIDE_SYNC) ? "words" : "bytes", - DC395x_read32(acb, TRM_S1040_SCSI_COUNTER), - DC395x_read8(acb, TRM_S1040_DMA_FIFOCNT), + sc, + fc, DC395x_read8(acb, TRM_S1040_DMA_FIFOSTAT), DC395x_read32(acb, TRM_S1040_DMA_CXCNT), srb->total_xfer_length, d_left_counter); @@ -2317,40 +2273,79 @@ #if DC395x_LASTPIO /* KG: Less than or equal to 4 bytes can not be transfered via DMA, it seems. */ if (d_left_counter && srb->total_xfer_length <= DC395x_LASTPIO) { + size_t left_io = srb->total_xfer_length; + /*u32 addr = (srb->segment_x[srb->sg_index].address); */ /*sg_update_list (srb, d_left_counter); */ - dprintkdbg(DBG_PIO, "data_in_phase0: PIO (%i %s) to " - "%p for remaining %i bytes:", - DC395x_read8(acb, TRM_S1040_SCSI_FIFOCNT) & 0x1f, + dprintkdbg(DBG_PIO, "data_in_phase0: PIO (%i %s) " + "for remaining %i bytes:", + fc & 0x1f, (srb->dcb->sync_period & WIDE_SYNC) ? "words" : "bytes", - srb->virt_addr, srb->total_xfer_length); if (srb->dcb->sync_period & WIDE_SYNC) DC395x_write8(acb, TRM_S1040_SCSI_CONFIG2, CFG2_WIDEFIFO); - while (DC395x_read8(acb, TRM_S1040_SCSI_FIFOCNT) != 0x40) { - u8 byte = DC395x_read8(acb, TRM_S1040_SCSI_FIFO); - pio_trigger(); - *(srb->virt_addr)++ = byte; - if (debug_enabled(DBG_PIO)) - printk(" %02x", byte); - d_left_counter--; - sg_subtract_one(srb); - } - if (srb->dcb->sync_period & WIDE_SYNC) { -#if 1 - /* Read the last byte ... */ - if (srb->total_xfer_length > 0) { - u8 byte = DC395x_read8(acb, TRM_S1040_SCSI_FIFO); - pio_trigger(); - *(srb->virt_addr)++ = byte; - srb->total_xfer_length--; + while (left_io) { + unsigned char *virt, *base = NULL; + unsigned long flags = 0; + size_t len = left_io; + + if (srb->cmd->use_sg) { + size_t offset = srb->request_length - left_io; + local_irq_save(flags); + /* Assumption: it's inside one page as it's at most 4 bytes and + I just assume it's on a 4-byte boundary */ + base = scsi_kmap_atomic_sg((struct scatterlist *)srb->cmd->request_buffer, + srb->sg_count, &offset, &len); + virt = base + offset; + } else { + virt = srb->cmd->request_buffer + srb->cmd->request_bufflen - left_io; + len = left_io; + } + left_io -= len; + + while (len) { + u8 byte; + byte = DC395x_read8(acb, TRM_S1040_SCSI_FIFO); + *virt++ = byte; + if (debug_enabled(DBG_PIO)) printk(" %02x", byte); + + d_left_counter--; + sg_subtract_one(srb); + + len--; + + fc = DC395x_read8(acb, TRM_S1040_SCSI_FIFOCNT); + + if (fc == 0x40) { + left_io = 0; + break; + } + } + + WARN_ON((fc != 0x40) == !d_left_counter); + + if (fc == 0x40 && (srb->dcb->sync_period & WIDE_SYNC)) { + /* Read the last byte ... 
*/ + if (srb->total_xfer_length > 0) { + u8 byte = DC395x_read8(acb, TRM_S1040_SCSI_FIFO); + + *virt++ = byte; + srb->total_xfer_length--; + if (debug_enabled(DBG_PIO)) + printk(" %02x", byte); + } + + DC395x_write8(acb, TRM_S1040_SCSI_CONFIG2, 0); + } + + if (srb->cmd->use_sg) { + scsi_kunmap_atomic_sg(base); + local_irq_restore(flags); } -#endif - DC395x_write8(acb, TRM_S1040_SCSI_CONFIG2, 0); } /*printk(" %08x", *(u32*)(bus_to_virt (addr))); */ /*srb->total_xfer_length = 0; */ @@ -2509,22 +2504,43 @@ #if DC395x_LASTPIO SCMD_FIFO_IN); } else { /* write */ int ln = srb->total_xfer_length; + size_t left_io = srb->total_xfer_length; + if (srb->dcb->sync_period & WIDE_SYNC) DC395x_write8(acb, TRM_S1040_SCSI_CONFIG2, CFG2_WIDEFIFO); - dprintkdbg(DBG_PIO, - "data_io_transfer: PIO %i bytes from %p:", - srb->total_xfer_length, srb->virt_addr); - while (srb->total_xfer_length) { - if (debug_enabled(DBG_PIO)) - printk(" %02x", (unsigned char) *(srb->virt_addr)); + while (left_io) { + unsigned char *virt, *base = NULL; + unsigned long flags = 0; + size_t len = left_io; + + if (srb->cmd->use_sg) { + size_t offset = srb->request_length - left_io; + local_irq_save(flags); + /* Again, max 4 bytes */ + base = scsi_kmap_atomic_sg((struct scatterlist *)srb->cmd->request_buffer, + srb->sg_count, &offset, &len); + virt = base + offset; + } else { + virt = srb->cmd->request_buffer + srb->cmd->request_bufflen - left_io; + len = left_io; + } + left_io -= len; + + while (len--) { + if (debug_enabled(DBG_PIO)) + printk(" %02x", *virt); - pio_trigger(); - DC395x_write8(acb, TRM_S1040_SCSI_FIFO, - *(srb->virt_addr)++); + DC395x_write8(acb, TRM_S1040_SCSI_FIFO, *virt++); - sg_subtract_one(srb); + sg_subtract_one(srb); + } + + if (srb->cmd->use_sg) { + scsi_kunmap_atomic_sg(base); + local_irq_restore(flags); + } } if (srb->dcb->sync_period & WIDE_SYNC) { if (ln % 2) { @@ -3319,7 +3335,6 @@ static void pci_unmap_srb_sense(struct A srb->segment_x[DC395x_MAX_SG_LISTENTRY - 1].address; srb->segment_x[0].length = srb->segment_x[DC395x_MAX_SG_LISTENTRY - 1].length; - srb->virt_addr = srb->virt_addr_req; } @@ -3713,8 +3728,6 @@ static void request_sense(struct Adapter srb->xferred = srb->total_xfer_length; /* srb->segment_x : a one entry of S/G list table */ srb->total_xfer_length = sizeof(cmd->sense_buffer); - srb->virt_addr_req = srb->virt_addr; - srb->virt_addr = cmd->sense_buffer; srb->segment_x[0].length = sizeof(cmd->sense_buffer); /* Map sense buffer */ srb->segment_x[0].address = diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c index 2e9be83..944fc12 100644 --- a/drivers/scsi/ibmvscsi/ibmvscsi.c +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c @@ -121,10 +121,9 @@ static int initialize_event_pool(struct pool->size = size; pool->next = 0; - pool->events = kmalloc(pool->size * sizeof(*pool->events), GFP_KERNEL); + pool->events = kcalloc(pool->size, sizeof(*pool->events), GFP_KERNEL); if (!pool->events) return -ENOMEM; - memset(pool->events, 0x00, pool->size * sizeof(*pool->events)); pool->iu_storage = dma_alloc_coherent(hostdata->dev, diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c index 2068b66..d94038e 100644 --- a/drivers/scsi/iscsi_tcp.c +++ b/drivers/scsi/iscsi_tcp.c @@ -3,7 +3,8 @@ * * Copyright (C) 2004 Dmitry Yusupov * Copyright (C) 2004 Alex Aizman - * Copyright (C) 2005 Mike Christie + * Copyright (C) 2005 - 2006 Mike Christie + * Copyright (C) 2006 Red Hat, Inc. All rights reserved. 
* maintained by open-iscsi@googlegroups.com * * This program is free software; you can redistribute it and/or modify @@ -36,10 +37,6 @@ #include #include #include #include -#include -#include -#include -#include #include #include #include @@ -52,21 +49,14 @@ MODULE_DESCRIPTION("iSCSI/TCP data-path" MODULE_LICENSE("GPL"); MODULE_VERSION("0:4.445"); /* #define DEBUG_TCP */ -/* #define DEBUG_SCSI */ #define DEBUG_ASSERT #ifdef DEBUG_TCP -#define debug_tcp(fmt...) printk(KERN_DEBUG "tcp: " fmt) +#define debug_tcp(fmt...) printk(KERN_INFO "tcp: " fmt) #else #define debug_tcp(fmt...) #endif -#ifdef DEBUG_SCSI -#define debug_scsi(fmt...) printk(KERN_DEBUG "scsi: " fmt) -#else -#define debug_scsi(fmt...) -#endif - #ifndef DEBUG_ASSERT #ifdef BUG_ON #undef BUG_ON @@ -74,8 +64,6 @@ #endif #define BUG_ON(expr) #endif -#define INVALID_SN_DELTA 0xffff - static unsigned int iscsi_max_lun = 512; module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO); @@ -130,68 +118,39 @@ static inline void iscsi_hdr_digest(struct iscsi_conn *conn, struct iscsi_buf *buf, u8* crc) { - crypto_digest_digest(conn->tx_tfm, &buf->sg, 1, crc); - buf->sg.length += sizeof(uint32_t); -} + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; -static void -iscsi_conn_failure(struct iscsi_conn *conn, enum iscsi_err err) -{ - struct iscsi_session *session = conn->session; - unsigned long flags; - - spin_lock_irqsave(&session->lock, flags); - if (session->conn_cnt == 1 || session->leadconn == conn) - session->state = ISCSI_STATE_FAILED; - spin_unlock_irqrestore(&session->lock, flags); - set_bit(SUSPEND_BIT, &conn->suspend_tx); - set_bit(SUSPEND_BIT, &conn->suspend_rx); - iscsi_conn_error(conn->cls_conn, err); + crypto_digest_digest(tcp_conn->tx_tfm, &buf->sg, 1, crc); + buf->sg.length += sizeof(uint32_t); } static inline int -iscsi_check_assign_cmdsn(struct iscsi_session *session, struct iscsi_nopin *hdr) +iscsi_hdr_extract(struct iscsi_tcp_conn *tcp_conn) { - uint32_t max_cmdsn = be32_to_cpu(hdr->max_cmdsn); - uint32_t exp_cmdsn = be32_to_cpu(hdr->exp_cmdsn); - - if (max_cmdsn < exp_cmdsn -1 && - max_cmdsn > exp_cmdsn - INVALID_SN_DELTA) - return ISCSI_ERR_MAX_CMDSN; - if (max_cmdsn > session->max_cmdsn || - max_cmdsn < session->max_cmdsn - INVALID_SN_DELTA) - session->max_cmdsn = max_cmdsn; - if (exp_cmdsn > session->exp_cmdsn || - exp_cmdsn < session->exp_cmdsn - INVALID_SN_DELTA) - session->exp_cmdsn = exp_cmdsn; + struct sk_buff *skb = tcp_conn->in.skb; - return 0; -} + tcp_conn->in.zero_copy_hdr = 0; -static inline int -iscsi_hdr_extract(struct iscsi_conn *conn) -{ - struct sk_buff *skb = conn->in.skb; - - if (conn->in.copy >= conn->hdr_size && - conn->in_progress == IN_PROGRESS_WAIT_HEADER) { + if (tcp_conn->in.copy >= tcp_conn->hdr_size && + tcp_conn->in_progress == IN_PROGRESS_WAIT_HEADER) { /* * Zero-copy PDU Header: using connection context * to store header pointer. 
*/ if (skb_shinfo(skb)->frag_list == NULL && - !skb_shinfo(skb)->nr_frags) - conn->in.hdr = (struct iscsi_hdr *) - ((char*)skb->data + conn->in.offset); - else { + !skb_shinfo(skb)->nr_frags) { + tcp_conn->in.hdr = (struct iscsi_hdr *) + ((char*)skb->data + tcp_conn->in.offset); + tcp_conn->in.zero_copy_hdr = 1; + } else { /* ignoring return code since we checked * in.copy before */ - skb_copy_bits(skb, conn->in.offset, - &conn->hdr, conn->hdr_size); - conn->in.hdr = &conn->hdr; + skb_copy_bits(skb, tcp_conn->in.offset, + &tcp_conn->hdr, tcp_conn->hdr_size); + tcp_conn->in.hdr = &tcp_conn->hdr; } - conn->in.offset += conn->hdr_size; - conn->in.copy -= conn->hdr_size; + tcp_conn->in.offset += tcp_conn->hdr_size; + tcp_conn->in.copy -= tcp_conn->hdr_size; } else { int hdr_remains; int copylen; @@ -201,118 +160,61 @@ iscsi_hdr_extract(struct iscsi_conn *con * copying it... This'll happen quite rarely. */ - if (conn->in_progress == IN_PROGRESS_WAIT_HEADER) - conn->in.hdr_offset = 0; + if (tcp_conn->in_progress == IN_PROGRESS_WAIT_HEADER) + tcp_conn->in.hdr_offset = 0; - hdr_remains = conn->hdr_size - conn->in.hdr_offset; + hdr_remains = tcp_conn->hdr_size - tcp_conn->in.hdr_offset; BUG_ON(hdr_remains <= 0); - copylen = min(conn->in.copy, hdr_remains); - skb_copy_bits(skb, conn->in.offset, - (char*)&conn->hdr + conn->in.hdr_offset, copylen); + copylen = min(tcp_conn->in.copy, hdr_remains); + skb_copy_bits(skb, tcp_conn->in.offset, + (char*)&tcp_conn->hdr + tcp_conn->in.hdr_offset, + copylen); debug_tcp("PDU gather offset %d bytes %d in.offset %d " - "in.copy %d\n", conn->in.hdr_offset, copylen, - conn->in.offset, conn->in.copy); + "in.copy %d\n", tcp_conn->in.hdr_offset, copylen, + tcp_conn->in.offset, tcp_conn->in.copy); - conn->in.offset += copylen; - conn->in.copy -= copylen; + tcp_conn->in.offset += copylen; + tcp_conn->in.copy -= copylen; if (copylen < hdr_remains) { - conn->in_progress = IN_PROGRESS_HEADER_GATHER; - conn->in.hdr_offset += copylen; + tcp_conn->in_progress = IN_PROGRESS_HEADER_GATHER; + tcp_conn->in.hdr_offset += copylen; return -EAGAIN; } - conn->in.hdr = &conn->hdr; - conn->discontiguous_hdr_cnt++; - conn->in_progress = IN_PROGRESS_WAIT_HEADER; + tcp_conn->in.hdr = &tcp_conn->hdr; + tcp_conn->discontiguous_hdr_cnt++; + tcp_conn->in_progress = IN_PROGRESS_WAIT_HEADER; } return 0; } -static inline void -iscsi_ctask_cleanup(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) +/* + * must be called with session lock + */ +static void +__iscsi_ctask_cleanup(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) { - struct scsi_cmnd *sc = ctask->sc; - struct iscsi_session *session = conn->session; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + struct scsi_cmnd *sc; - spin_lock(&session->lock); - if (unlikely(!sc)) { - spin_unlock(&session->lock); + sc = ctask->sc; + if (unlikely(!sc)) return; - } + if (sc->sc_data_direction == DMA_TO_DEVICE) { struct iscsi_data_task *dtask, *n; + /* WRITE: cleanup Data-Out's if any */ - list_for_each_entry_safe(dtask, n, &ctask->dataqueue, item) { + list_for_each_entry_safe(dtask, n, &tcp_ctask->dataqueue, + item) { list_del(&dtask->item); - mempool_free(dtask, ctask->datapool); + mempool_free(dtask, tcp_ctask->datapool); } } - ctask->xmstate = XMSTATE_IDLE; - ctask->r2t = NULL; - ctask->sc = NULL; - __kfifo_put(session->cmdpool.queue, (void*)&ctask, sizeof(void*)); - spin_unlock(&session->lock); -} - -/** - * iscsi_cmd_rsp - SCSI Command Response processing - * @conn: iscsi connection - * @ctask: scsi command task - **/ -static 
int -iscsi_cmd_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) -{ - int rc; - struct iscsi_cmd_rsp *rhdr = (struct iscsi_cmd_rsp *)conn->in.hdr; - struct iscsi_session *session = conn->session; - struct scsi_cmnd *sc = ctask->sc; - - rc = iscsi_check_assign_cmdsn(session, (struct iscsi_nopin*)rhdr); - if (rc) { - sc->result = (DID_ERROR << 16); - goto out; - } - - conn->exp_statsn = be32_to_cpu(rhdr->statsn) + 1; - - sc->result = (DID_OK << 16) | rhdr->cmd_status; - - if (rhdr->response != ISCSI_STATUS_CMD_COMPLETED) { - sc->result = (DID_ERROR << 16); - goto out; - } - - if (rhdr->cmd_status == SAM_STAT_CHECK_CONDITION && conn->senselen) { - int sensecopy = min(conn->senselen, SCSI_SENSE_BUFFERSIZE); - - memcpy(sc->sense_buffer, conn->data + 2, sensecopy); - debug_scsi("copied %d bytes of sense\n", sensecopy); - } - - if (sc->sc_data_direction == DMA_TO_DEVICE) - goto out; - - if (rhdr->flags & ISCSI_FLAG_CMD_UNDERFLOW) { - int res_count = be32_to_cpu(rhdr->residual_count); - - if (res_count > 0 && res_count <= sc->request_bufflen) - sc->resid = res_count; - else - sc->result = (DID_BAD_TARGET << 16) | rhdr->cmd_status; - } else if (rhdr->flags & ISCSI_FLAG_CMD_BIDI_UNDERFLOW) - sc->result = (DID_BAD_TARGET << 16) | rhdr->cmd_status; - else if (rhdr->flags & ISCSI_FLAG_CMD_OVERFLOW) - sc->resid = be32_to_cpu(rhdr->residual_count); - -out: - debug_scsi("done [sc %lx res %d itt 0x%x]\n", - (long)sc, sc->result, ctask->itt); - conn->scsirsp_pdus_cnt++; - iscsi_ctask_cleanup(conn, ctask); - sc->scsi_done(sc); - return rc; + tcp_ctask->xmstate = XMSTATE_IDLE; + tcp_ctask->r2t = NULL; } /** @@ -324,7 +226,9 @@ static int iscsi_data_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) { int rc; - struct iscsi_data_rsp *rhdr = (struct iscsi_data_rsp *)conn->in.hdr; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + struct iscsi_data_rsp *rhdr = (struct iscsi_data_rsp *)tcp_conn->in.hdr; struct iscsi_session *session = conn->session; int datasn = be32_to_cpu(rhdr->datasn); @@ -334,9 +238,9 @@ iscsi_data_rsp(struct iscsi_conn *conn, /* * setup Data-In byte counter (gets decremented..) 
*/ - ctask->data_count = conn->in.datalen; + ctask->data_count = tcp_conn->in.datalen; - if (conn->in.datalen == 0) + if (tcp_conn->in.datalen == 0) return 0; if (ctask->datasn != datasn) @@ -344,8 +248,8 @@ iscsi_data_rsp(struct iscsi_conn *conn, ctask->datasn++; - ctask->data_offset = be32_to_cpu(rhdr->offset); - if (ctask->data_offset + conn->in.datalen > ctask->total_length) + tcp_ctask->data_offset = be32_to_cpu(rhdr->offset); + if (tcp_ctask->data_offset + tcp_conn->in.datalen > ctask->total_length) return ISCSI_ERR_DATA_OFFSET; if (rhdr->flags & ISCSI_FLAG_DATA_STATUS) { @@ -392,17 +296,19 @@ iscsi_solicit_data_init(struct iscsi_con struct iscsi_data *hdr; struct iscsi_data_task *dtask; struct scsi_cmnd *sc = ctask->sc; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; - dtask = mempool_alloc(ctask->datapool, GFP_ATOMIC); + dtask = mempool_alloc(tcp_ctask->datapool, GFP_ATOMIC); BUG_ON(!dtask); + INIT_LIST_HEAD(&dtask->item); hdr = &dtask->hdr; memset(hdr, 0, sizeof(struct iscsi_data)); hdr->ttt = r2t->ttt; hdr->datasn = cpu_to_be32(r2t->solicit_datasn); r2t->solicit_datasn++; hdr->opcode = ISCSI_OP_SCSI_DATA_OUT; - memcpy(hdr->lun, ctask->hdr.lun, sizeof(hdr->lun)); - hdr->itt = ctask->hdr.itt; + memcpy(hdr->lun, ctask->hdr->lun, sizeof(hdr->lun)); + hdr->itt = ctask->hdr->itt; hdr->exp_statsn = r2t->exp_statsn; hdr->offset = cpu_to_be32(r2t->data_offset); if (r2t->data_length > conn->max_xmit_dlength) { @@ -451,11 +357,11 @@ iscsi_solicit_data_init(struct iscsi_con } BUG_ON(r2t->sg == NULL); } else - iscsi_buf_init_iov(&ctask->sendbuf, + iscsi_buf_init_iov(&tcp_ctask->sendbuf, (char*)sc->request_buffer + r2t->data_offset, r2t->data_count); - list_add(&dtask->item, &ctask->dataqueue); + list_add(&dtask->item, &tcp_ctask->dataqueue); } /** @@ -468,17 +374,16 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, s { struct iscsi_r2t_info *r2t; struct iscsi_session *session = conn->session; - struct iscsi_r2t_rsp *rhdr = (struct iscsi_r2t_rsp *)conn->in.hdr; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + struct iscsi_r2t_rsp *rhdr = (struct iscsi_r2t_rsp *)tcp_conn->in.hdr; int r2tsn = be32_to_cpu(rhdr->r2tsn); int rc; - if (conn->in.ahslen) - return ISCSI_ERR_AHSLEN; - - if (conn->in.datalen) + if (tcp_conn->in.datalen) return ISCSI_ERR_DATALEN; - if (ctask->exp_r2tsn && ctask->exp_r2tsn != r2tsn) + if (tcp_ctask->exp_r2tsn && tcp_ctask->exp_r2tsn != r2tsn) return ISCSI_ERR_R2TSN; rc = iscsi_check_assign_cmdsn(session, (struct iscsi_nopin*)rhdr); @@ -496,7 +401,7 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, s spin_unlock(&session->lock); return 0; } - rc = __kfifo_get(ctask->r2tpool.queue, (void*)&r2t, sizeof(void*)); + rc = __kfifo_get(tcp_ctask->r2tpool.queue, (void*)&r2t, sizeof(void*)); BUG_ON(!rc); r2t->exp_statsn = rhdr->statsn; @@ -518,10 +423,10 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, s iscsi_solicit_data_init(conn, ctask, r2t); - ctask->exp_r2tsn = r2tsn + 1; - ctask->xmstate |= XMSTATE_SOL_HDR; - __kfifo_put(ctask->r2tqueue, (void*)&r2t, sizeof(void*)); - __kfifo_put(conn->writequeue, (void*)&ctask, sizeof(void*)); + tcp_ctask->exp_r2tsn = r2tsn + 1; + tcp_ctask->xmstate |= XMSTATE_SOL_HDR; + __kfifo_put(tcp_ctask->r2tqueue, (void*)&r2t, sizeof(void*)); + __kfifo_put(conn->xmitqueue, (void*)&ctask, sizeof(void*)); scsi_queue_work(session->host, &conn->xmitwork); conn->r2t_pdus_cnt++; @@ -531,258 +436,136 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, s } static int -iscsi_hdr_recv(struct iscsi_conn *conn) 
+iscsi_tcp_hdr_recv(struct iscsi_conn *conn) { - int rc = 0; + int rc = 0, opcode, ahslen; struct iscsi_hdr *hdr; - struct iscsi_cmd_task *ctask; struct iscsi_session *session = conn->session; - uint32_t cdgst, rdgst = 0; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + uint32_t cdgst, rdgst = 0, itt; - hdr = conn->in.hdr; + hdr = tcp_conn->in.hdr; /* verify PDU length */ - conn->in.datalen = ntoh24(hdr->dlength); - if (conn->in.datalen > conn->max_recv_dlength) { + tcp_conn->in.datalen = ntoh24(hdr->dlength); + if (tcp_conn->in.datalen > conn->max_recv_dlength) { printk(KERN_ERR "iscsi_tcp: datalen %d > %d\n", - conn->in.datalen, conn->max_recv_dlength); + tcp_conn->in.datalen, conn->max_recv_dlength); return ISCSI_ERR_DATALEN; } - conn->data_copied = 0; + tcp_conn->data_copied = 0; /* read AHS */ - conn->in.ahslen = hdr->hlength * 4; - conn->in.offset += conn->in.ahslen; - conn->in.copy -= conn->in.ahslen; - if (conn->in.copy < 0) { + ahslen = hdr->hlength << 2; + tcp_conn->in.offset += ahslen; + tcp_conn->in.copy -= ahslen; + if (tcp_conn->in.copy < 0) { printk(KERN_ERR "iscsi_tcp: can't handle AHS with length " - "%d bytes\n", conn->in.ahslen); + "%d bytes\n", ahslen); return ISCSI_ERR_AHSLEN; } /* calculate read padding */ - conn->in.padding = conn->in.datalen & (ISCSI_PAD_LEN-1); - if (conn->in.padding) { - conn->in.padding = ISCSI_PAD_LEN - conn->in.padding; - debug_scsi("read padding %d bytes\n", conn->in.padding); + tcp_conn->in.padding = tcp_conn->in.datalen & (ISCSI_PAD_LEN-1); + if (tcp_conn->in.padding) { + tcp_conn->in.padding = ISCSI_PAD_LEN - tcp_conn->in.padding; + debug_scsi("read padding %d bytes\n", tcp_conn->in.padding); } if (conn->hdrdgst_en) { struct scatterlist sg; sg_init_one(&sg, (u8 *)hdr, - sizeof(struct iscsi_hdr) + conn->in.ahslen); - crypto_digest_digest(conn->rx_tfm, &sg, 1, (u8 *)&cdgst); + sizeof(struct iscsi_hdr) + ahslen); + crypto_digest_digest(tcp_conn->rx_tfm, &sg, 1, (u8 *)&cdgst); rdgst = *(uint32_t*)((char*)hdr + sizeof(struct iscsi_hdr) + - conn->in.ahslen); + ahslen); if (cdgst != rdgst) { - printk(KERN_ERR "iscsi_tcp: itt %x: hdrdgst error " - "recv 0x%x calc 0x%x\n", conn->in.itt, rdgst, - cdgst); + printk(KERN_ERR "iscsi_tcp: hdrdgst error " + "recv 0x%x calc 0x%x\n", rdgst, cdgst); return ISCSI_ERR_HDR_DGST; } } - /* save opcode for later */ - conn->in.opcode = hdr->opcode & ISCSI_OPCODE_MASK; - + opcode = hdr->opcode & ISCSI_OPCODE_MASK; /* verify itt (itt encoding: age+cid+itt) */ - if (hdr->itt != cpu_to_be32(ISCSI_RESERVED_TAG)) { - if ((hdr->itt & AGE_MASK) != - (session->age << AGE_SHIFT)) { - printk(KERN_ERR "iscsi_tcp: received itt %x expected " - "session age (%x)\n", hdr->itt, - session->age & AGE_MASK); - return ISCSI_ERR_BAD_ITT; - } - - if ((hdr->itt & CID_MASK) != (conn->id << CID_SHIFT)) { - printk(KERN_ERR "iscsi_tcp: received itt %x, expected " - "CID (%x)\n", hdr->itt, conn->id); - return ISCSI_ERR_BAD_ITT; - } - conn->in.itt = hdr->itt & ITT_MASK; - } else - conn->in.itt = hdr->itt; + rc = iscsi_verify_itt(conn, hdr, &itt); + if (rc == ISCSI_ERR_NO_SCSI_CMD) { + tcp_conn->in.datalen = 0; /* force drop */ + return 0; + } else if (rc) + return rc; debug_tcp("opcode 0x%x offset %d copy %d ahslen %d datalen %d\n", - hdr->opcode, conn->in.offset, conn->in.copy, - conn->in.ahslen, conn->in.datalen); - - if (conn->in.itt < session->cmds_max) { - ctask = (struct iscsi_cmd_task *)session->cmds[conn->in.itt]; - - if (!ctask->sc) { - printk(KERN_INFO "iscsi_tcp: dropping ctask with " - "itt 0x%x\n", ctask->itt); - 
conn->in.datalen = 0; /* force drop */ - return 0; - } - - if (ctask->sc->SCp.phase != session->age) { - printk(KERN_ERR "iscsi_tcp: ctask's session age %d, " - "expected %d\n", ctask->sc->SCp.phase, - session->age); - return ISCSI_ERR_SESSION_FAILED; - } - - conn->in.ctask = ctask; - - debug_scsi("rsp [op 0x%x cid %d sc %lx itt 0x%x len %d]\n", - hdr->opcode, conn->id, (long)ctask->sc, - ctask->itt, conn->in.datalen); + opcode, tcp_conn->in.offset, tcp_conn->in.copy, + ahslen, tcp_conn->in.datalen); - switch(conn->in.opcode) { - case ISCSI_OP_SCSI_CMD_RSP: - BUG_ON((void*)ctask != ctask->sc->SCp.ptr); - if (!conn->in.datalen) - rc = iscsi_cmd_rsp(conn, ctask); - else - /* - * got sense or response data; copying PDU - * Header to the connection's header - * placeholder - */ - memcpy(&conn->hdr, hdr, - sizeof(struct iscsi_hdr)); - break; - case ISCSI_OP_SCSI_DATA_IN: - BUG_ON((void*)ctask != ctask->sc->SCp.ptr); - /* save flags for non-exceptional status */ - conn->in.flags = hdr->flags; - /* save cmd_status for sense data */ - conn->in.cmd_status = - ((struct iscsi_data_rsp*)hdr)->cmd_status; - rc = iscsi_data_rsp(conn, ctask); - break; - case ISCSI_OP_R2T: - BUG_ON((void*)ctask != ctask->sc->SCp.ptr); - if (ctask->sc->sc_data_direction == DMA_TO_DEVICE) - rc = iscsi_r2t_rsp(conn, ctask); - else - rc = ISCSI_ERR_PROTO; - break; - default: - rc = ISCSI_ERR_BAD_OPCODE; - break; - } - } else if (conn->in.itt >= ISCSI_MGMT_ITT_OFFSET && - conn->in.itt < ISCSI_MGMT_ITT_OFFSET + - session->mgmtpool_max) { - struct iscsi_mgmt_task *mtask = (struct iscsi_mgmt_task *) - session->mgmt_cmds[conn->in.itt - - ISCSI_MGMT_ITT_OFFSET]; - - debug_scsi("immrsp [op 0x%x cid %d itt 0x%x len %d]\n", - conn->in.opcode, conn->id, mtask->itt, - conn->in.datalen); - - switch(conn->in.opcode) { - case ISCSI_OP_LOGIN_RSP: - case ISCSI_OP_TEXT_RSP: - case ISCSI_OP_LOGOUT_RSP: - rc = iscsi_check_assign_cmdsn(session, - (struct iscsi_nopin*)hdr); - if (rc) - break; - - if (!conn->in.datalen) { - rc = iscsi_recv_pdu(conn->cls_conn, hdr, - NULL, 0); - if (conn->login_mtask != mtask) { - spin_lock(&session->lock); - __kfifo_put(session->mgmtpool.queue, - (void*)&mtask, sizeof(void*)); - spin_unlock(&session->lock); - } - } - break; - case ISCSI_OP_SCSI_TMFUNC_RSP: - rc = iscsi_check_assign_cmdsn(session, - (struct iscsi_nopin*)hdr); - if (rc) - break; - - if (conn->in.datalen || conn->in.ahslen) { - rc = ISCSI_ERR_PROTO; - break; - } - conn->tmfrsp_pdus_cnt++; - spin_lock(&session->lock); - if (conn->tmabort_state == TMABORT_INITIAL) { - __kfifo_put(session->mgmtpool.queue, - (void*)&mtask, sizeof(void*)); - conn->tmabort_state = - ((struct iscsi_tm_rsp *)hdr)-> - response == ISCSI_TMF_RSP_COMPLETE ? 
- TMABORT_SUCCESS:TMABORT_FAILED; - /* unblock eh_abort() */ - wake_up(&conn->ehwait); - } - spin_unlock(&session->lock); - break; - case ISCSI_OP_NOOP_IN: - if (hdr->ttt != ISCSI_RESERVED_TAG) { - rc = ISCSI_ERR_PROTO; - break; - } - rc = iscsi_check_assign_cmdsn(session, - (struct iscsi_nopin*)hdr); - if (rc) - break; - conn->exp_statsn = be32_to_cpu(hdr->statsn) + 1; - - if (!conn->in.datalen) { - struct iscsi_mgmt_task *mtask; - - rc = iscsi_recv_pdu(conn->cls_conn, hdr, - NULL, 0); - mtask = (struct iscsi_mgmt_task *) - session->mgmt_cmds[conn->in.itt - - ISCSI_MGMT_ITT_OFFSET]; - if (conn->login_mtask != mtask) { - spin_lock(&session->lock); - __kfifo_put(session->mgmtpool.queue, - (void*)&mtask, sizeof(void*)); - spin_unlock(&session->lock); - } - } - break; - default: - rc = ISCSI_ERR_BAD_OPCODE; - break; - } - } else if (conn->in.itt == ISCSI_RESERVED_TAG) { - switch(conn->in.opcode) { - case ISCSI_OP_NOOP_IN: - if (!conn->in.datalen) { - rc = iscsi_check_assign_cmdsn(session, - (struct iscsi_nopin*)hdr); - if (!rc && hdr->ttt != ISCSI_RESERVED_TAG) - rc = iscsi_recv_pdu(conn->cls_conn, - hdr, NULL, 0); - } else - rc = ISCSI_ERR_PROTO; - break; - case ISCSI_OP_REJECT: - /* we need sth like iscsi_reject_rsp()*/ - case ISCSI_OP_ASYNC_EVENT: - /* we need sth like iscsi_async_event_rsp() */ - rc = ISCSI_ERR_BAD_OPCODE; - break; - default: - rc = ISCSI_ERR_BAD_OPCODE; - break; - } - } else - rc = ISCSI_ERR_BAD_ITT; + switch(opcode) { + case ISCSI_OP_SCSI_DATA_IN: + tcp_conn->in.ctask = session->cmds[itt]; + rc = iscsi_data_rsp(conn, tcp_conn->in.ctask); + /* fall through */ + case ISCSI_OP_SCSI_CMD_RSP: + tcp_conn->in.ctask = session->cmds[itt]; + if (tcp_conn->in.datalen) + goto copy_hdr; + + spin_lock(&session->lock); + __iscsi_ctask_cleanup(conn, tcp_conn->in.ctask); + rc = __iscsi_complete_pdu(conn, hdr, NULL, 0); + spin_unlock(&session->lock); + break; + case ISCSI_OP_R2T: + tcp_conn->in.ctask = session->cmds[itt]; + if (ahslen) + rc = ISCSI_ERR_AHSLEN; + else if (tcp_conn->in.ctask->sc->sc_data_direction == + DMA_TO_DEVICE) + rc = iscsi_r2t_rsp(conn, tcp_conn->in.ctask); + else + rc = ISCSI_ERR_PROTO; + break; + case ISCSI_OP_LOGIN_RSP: + case ISCSI_OP_TEXT_RSP: + case ISCSI_OP_LOGOUT_RSP: + case ISCSI_OP_NOOP_IN: + case ISCSI_OP_REJECT: + case ISCSI_OP_ASYNC_EVENT: + if (tcp_conn->in.datalen) + goto copy_hdr; + /* fall through */ + case ISCSI_OP_SCSI_TMFUNC_RSP: + rc = iscsi_complete_pdu(conn, hdr, NULL, 0); + break; + default: + rc = ISCSI_ERR_BAD_OPCODE; + break; + } return rc; + +copy_hdr: + /* + * if we did zero copy for the header but we will need multiple + * skbs to complete the command then we have to copy the header + * for later use + */ + if (tcp_conn->in.zero_copy_hdr && tcp_conn->in.copy < + (tcp_conn->in.datalen + tcp_conn->in.padding + + (conn->datadgst_en ? 4 : 0))) { + debug_tcp("Copying header for later use. 
in.copy %d in.datalen" + " %d\n", tcp_conn->in.copy, tcp_conn->in.datalen); + memcpy(&tcp_conn->hdr, tcp_conn->in.hdr, + sizeof(struct iscsi_hdr)); + tcp_conn->in.hdr = &tcp_conn->hdr; + tcp_conn->in.zero_copy_hdr = 0; + } + return 0; } /** * iscsi_ctask_copy - copy skb bits to the destanation cmd task - * @conn: iscsi connection + * @conn: iscsi tcp connection * @ctask: scsi command task * @buf: buffer to copy to * @buf_size: size of buffer @@ -804,110 +587,113 @@ iscsi_hdr_recv(struct iscsi_conn *conn) * buf_left left to copy from in progress buffer **/ static inline int -iscsi_ctask_copy(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask, +iscsi_ctask_copy(struct iscsi_tcp_conn *tcp_conn, struct iscsi_cmd_task *ctask, void *buf, int buf_size, int offset) { - int buf_left = buf_size - (conn->data_copied + offset); - int size = min(conn->in.copy, buf_left); + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + int buf_left = buf_size - (tcp_conn->data_copied + offset); + int size = min(tcp_conn->in.copy, buf_left); int rc; size = min(size, ctask->data_count); debug_tcp("ctask_copy %d bytes at offset %d copied %d\n", - size, conn->in.offset, conn->in.copied); + size, tcp_conn->in.offset, tcp_conn->in.copied); BUG_ON(size <= 0); - BUG_ON(ctask->sent + size > ctask->total_length); + BUG_ON(tcp_ctask->sent + size > ctask->total_length); - rc = skb_copy_bits(conn->in.skb, conn->in.offset, - (char*)buf + (offset + conn->data_copied), size); + rc = skb_copy_bits(tcp_conn->in.skb, tcp_conn->in.offset, + (char*)buf + (offset + tcp_conn->data_copied), size); /* must fit into skb->len */ BUG_ON(rc); - conn->in.offset += size; - conn->in.copy -= size; - conn->in.copied += size; - conn->data_copied += size; - ctask->sent += size; + tcp_conn->in.offset += size; + tcp_conn->in.copy -= size; + tcp_conn->in.copied += size; + tcp_conn->data_copied += size; + tcp_ctask->sent += size; ctask->data_count -= size; - BUG_ON(conn->in.copy < 0); + BUG_ON(tcp_conn->in.copy < 0); BUG_ON(ctask->data_count < 0); - if (buf_size != (conn->data_copied + offset)) { + if (buf_size != (tcp_conn->data_copied + offset)) { if (!ctask->data_count) { - BUG_ON(buf_size - conn->data_copied < 0); + BUG_ON(buf_size - tcp_conn->data_copied < 0); /* done with this PDU */ - return buf_size - conn->data_copied; + return buf_size - tcp_conn->data_copied; } return -EAGAIN; } /* done with this buffer or with both - PDU and buffer */ - conn->data_copied = 0; + tcp_conn->data_copied = 0; return 0; } /** * iscsi_tcp_copy - copy skb bits to the destanation buffer - * @conn: iscsi connection - * @buf: buffer to copy to - * @buf_size: number of bytes to copy + * @conn: iscsi tcp connection * * Notes: * The function calls skb_copy_bits() and updates per-connection * byte counters. 
**/ static inline int -iscsi_tcp_copy(struct iscsi_conn *conn, void *buf, int buf_size) +iscsi_tcp_copy(struct iscsi_tcp_conn *tcp_conn) { - int buf_left = buf_size - conn->data_copied; - int size = min(conn->in.copy, buf_left); + void *buf = tcp_conn->data; + int buf_size = tcp_conn->in.datalen; + int buf_left = buf_size - tcp_conn->data_copied; + int size = min(tcp_conn->in.copy, buf_left); int rc; debug_tcp("tcp_copy %d bytes at offset %d copied %d\n", - size, conn->in.offset, conn->data_copied); + size, tcp_conn->in.offset, tcp_conn->data_copied); BUG_ON(size <= 0); - rc = skb_copy_bits(conn->in.skb, conn->in.offset, - (char*)buf + conn->data_copied, size); + rc = skb_copy_bits(tcp_conn->in.skb, tcp_conn->in.offset, + (char*)buf + tcp_conn->data_copied, size); BUG_ON(rc); - conn->in.offset += size; - conn->in.copy -= size; - conn->in.copied += size; - conn->data_copied += size; + tcp_conn->in.offset += size; + tcp_conn->in.copy -= size; + tcp_conn->in.copied += size; + tcp_conn->data_copied += size; - if (buf_size != conn->data_copied) + if (buf_size != tcp_conn->data_copied) return -EAGAIN; return 0; } static inline void -partial_sg_digest_update(struct iscsi_conn *conn, struct scatterlist *sg, - int offset, int length) +partial_sg_digest_update(struct iscsi_tcp_conn *tcp_conn, + struct scatterlist *sg, int offset, int length) { struct scatterlist temp; memcpy(&temp, sg, sizeof(struct scatterlist)); temp.offset = offset; temp.length = length; - crypto_digest_update(conn->data_rx_tfm, &temp, 1); + crypto_digest_update(tcp_conn->data_rx_tfm, &temp, 1); } static void -iscsi_recv_digest_update(struct iscsi_conn *conn, char* buf, int len) +iscsi_recv_digest_update(struct iscsi_tcp_conn *tcp_conn, char* buf, int len) { struct scatterlist tmp; sg_init_one(&tmp, buf, len); - crypto_digest_update(conn->data_rx_tfm, &tmp, 1); + crypto_digest_update(tcp_conn->data_rx_tfm, &tmp, 1); } static int iscsi_scsi_data_in(struct iscsi_conn *conn) { - struct iscsi_cmd_task *ctask = conn->in.ctask; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + struct iscsi_cmd_task *ctask = tcp_conn->in.ctask; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; struct scsi_cmnd *sc = ctask->sc; struct scatterlist *sg; int i, offset, rc = 0; @@ -919,31 +705,33 @@ static int iscsi_scsi_data_in(struct isc */ if (!sc->use_sg) { i = ctask->data_count; - rc = iscsi_ctask_copy(conn, ctask, sc->request_buffer, - sc->request_bufflen, ctask->data_offset); + rc = iscsi_ctask_copy(tcp_conn, ctask, sc->request_buffer, + sc->request_bufflen, + tcp_ctask->data_offset); if (rc == -EAGAIN) return rc; if (conn->datadgst_en) - iscsi_recv_digest_update(conn, sc->request_buffer, i); + iscsi_recv_digest_update(tcp_conn, sc->request_buffer, + i); rc = 0; goto done; } - offset = ctask->data_offset; + offset = tcp_ctask->data_offset; sg = sc->request_buffer; - if (ctask->data_offset) - for (i = 0; i < ctask->sg_count; i++) + if (tcp_ctask->data_offset) + for (i = 0; i < tcp_ctask->sg_count; i++) offset -= sg[i].length; /* we've passed through partial sg*/ if (offset < 0) offset = 0; - for (i = ctask->sg_count; i < sc->use_sg; i++) { + for (i = tcp_ctask->sg_count; i < sc->use_sg; i++) { char *dest; dest = kmap_atomic(sg[i].page, KM_SOFTIRQ0); - rc = iscsi_ctask_copy(conn, ctask, dest + sg[i].offset, + rc = iscsi_ctask_copy(tcp_conn, ctask, dest + sg[i].offset, sg[i].length, offset); kunmap_atomic(dest, KM_SOFTIRQ0); if (rc == -EAGAIN) @@ -952,15 +740,17 @@ static int iscsi_scsi_data_in(struct isc if (!rc) { if (conn->datadgst_en) { if 
(!offset) - crypto_digest_update(conn->data_rx_tfm, - &sg[i], 1); + crypto_digest_update( + tcp_conn->data_rx_tfm, + &sg[i], 1); else - partial_sg_digest_update(conn, &sg[i], + partial_sg_digest_update(tcp_conn, + &sg[i], sg[i].offset + offset, sg[i].length - offset); } offset = 0; - ctask->sg_count++; + tcp_ctask->sg_count++; } if (!ctask->data_count) { @@ -968,25 +758,26 @@ static int iscsi_scsi_data_in(struct isc /* * data-in is complete, but buffer not... */ - partial_sg_digest_update(conn, &sg[i], + partial_sg_digest_update(tcp_conn, &sg[i], sg[i].offset, sg[i].length-rc); rc = 0; break; } - if (!conn->in.copy) + if (!tcp_conn->in.copy) return -EAGAIN; } BUG_ON(ctask->data_count); done: /* check for non-exceptional status */ - if (conn->in.flags & ISCSI_FLAG_DATA_STATUS) { + if (tcp_conn->in.hdr->flags & ISCSI_FLAG_DATA_STATUS) { debug_scsi("done [sc %lx res %d itt 0x%x]\n", (long)sc, sc->result, ctask->itt); - conn->scsirsp_pdus_cnt++; - iscsi_ctask_cleanup(conn, ctask); - sc->scsi_done(sc); + spin_lock(&conn->session->lock); + __iscsi_ctask_cleanup(conn, ctask); + __iscsi_complete_pdu(conn, tcp_conn->in.hdr, NULL, 0); + spin_unlock(&conn->session->lock); } return rc; @@ -995,71 +786,38 @@ done: static int iscsi_data_recv(struct iscsi_conn *conn) { - struct iscsi_session *session = conn->session; - int rc = 0; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + int rc = 0, opcode; - switch(conn->in.opcode) { + opcode = tcp_conn->in.hdr->opcode & ISCSI_OPCODE_MASK; + switch (opcode) { case ISCSI_OP_SCSI_DATA_IN: rc = iscsi_scsi_data_in(conn); break; - case ISCSI_OP_SCSI_CMD_RSP: { - /* - * SCSI Sense Data: - * copying the entire Data Segment. - */ - if (iscsi_tcp_copy(conn, conn->data, conn->in.datalen)) { - rc = -EAGAIN; - goto exit; - } - - /* - * check for sense - */ - conn->in.hdr = &conn->hdr; - conn->senselen = (conn->data[0] << 8) | conn->data[1]; - rc = iscsi_cmd_rsp(conn, conn->in.ctask); - if (!rc && conn->datadgst_en) - iscsi_recv_digest_update(conn, conn->data, - conn->in.datalen); - } - break; + case ISCSI_OP_SCSI_CMD_RSP: + spin_lock(&conn->session->lock); + __iscsi_ctask_cleanup(conn, tcp_conn->in.ctask); + spin_unlock(&conn->session->lock); case ISCSI_OP_TEXT_RSP: case ISCSI_OP_LOGIN_RSP: - case ISCSI_OP_NOOP_IN: { - struct iscsi_mgmt_task *mtask = NULL; - - if (conn->in.itt != ISCSI_RESERVED_TAG) - mtask = (struct iscsi_mgmt_task *) - session->mgmt_cmds[conn->in.itt - - ISCSI_MGMT_ITT_OFFSET]; - + case ISCSI_OP_NOOP_IN: + case ISCSI_OP_ASYNC_EVENT: + case ISCSI_OP_REJECT: /* * Collect data segment to the connection's data * placeholder */ - if (iscsi_tcp_copy(conn, conn->data, conn->in.datalen)) { + if (iscsi_tcp_copy(tcp_conn)) { rc = -EAGAIN; goto exit; } - rc = iscsi_recv_pdu(conn->cls_conn, conn->in.hdr, - conn->data, conn->in.datalen); - - if (!rc && conn->datadgst_en && - conn->in.opcode != ISCSI_OP_LOGIN_RSP) - iscsi_recv_digest_update(conn, conn->data, - conn->in.datalen); - - if (mtask && conn->login_mtask != mtask) { - spin_lock(&session->lock); - __kfifo_put(session->mgmtpool.queue, (void*)&mtask, - sizeof(void*)); - spin_unlock(&session->lock); - } - } - break; - case ISCSI_OP_ASYNC_EVENT: - case ISCSI_OP_REJECT: + rc = iscsi_complete_pdu(conn, tcp_conn->in.hdr, tcp_conn->data, + tcp_conn->in.datalen); + if (!rc && conn->datadgst_en && opcode != ISCSI_OP_LOGIN_RSP) + iscsi_recv_digest_update(tcp_conn, tcp_conn->data, + tcp_conn->in.datalen); + break; default: BUG_ON(1); } @@ -1080,6 +838,7 @@ iscsi_tcp_data_recv(read_descriptor_t *r { int rc; struct 
iscsi_conn *conn = rd_desc->arg.data; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; int processed; char pad[ISCSI_PAD_LEN]; struct scatterlist sg; @@ -1088,15 +847,15 @@ iscsi_tcp_data_recv(read_descriptor_t *r * Save current SKB and its offset in the corresponding * connection context. */ - conn->in.copy = skb->len - offset; - conn->in.offset = offset; - conn->in.skb = skb; - conn->in.len = conn->in.copy; - BUG_ON(conn->in.copy <= 0); - debug_tcp("in %d bytes\n", conn->in.copy); + tcp_conn->in.copy = skb->len - offset; + tcp_conn->in.offset = offset; + tcp_conn->in.skb = skb; + tcp_conn->in.len = tcp_conn->in.copy; + BUG_ON(tcp_conn->in.copy <= 0); + debug_tcp("in %d bytes\n", tcp_conn->in.copy); more: - conn->in.copied = 0; + tcp_conn->in.copied = 0; rc = 0; if (unlikely(conn->suspend_rx)) { @@ -1104,9 +863,9 @@ more: return 0; } - if (conn->in_progress == IN_PROGRESS_WAIT_HEADER || - conn->in_progress == IN_PROGRESS_HEADER_GATHER) { - rc = iscsi_hdr_extract(conn); + if (tcp_conn->in_progress == IN_PROGRESS_WAIT_HEADER || + tcp_conn->in_progress == IN_PROGRESS_HEADER_GATHER) { + rc = iscsi_hdr_extract(tcp_conn); if (rc) { if (rc == -EAGAIN) goto nomore; @@ -1119,90 +878,91 @@ more: /* * Verify and process incoming PDU header. */ - rc = iscsi_hdr_recv(conn); - if (!rc && conn->in.datalen) { + rc = iscsi_tcp_hdr_recv(conn); + if (!rc && tcp_conn->in.datalen) { if (conn->datadgst_en) { - BUG_ON(!conn->data_rx_tfm); - crypto_digest_init(conn->data_rx_tfm); + BUG_ON(!tcp_conn->data_rx_tfm); + crypto_digest_init(tcp_conn->data_rx_tfm); } - conn->in_progress = IN_PROGRESS_DATA_RECV; + tcp_conn->in_progress = IN_PROGRESS_DATA_RECV; } else if (rc) { iscsi_conn_failure(conn, rc); return 0; } } - if (conn->in_progress == IN_PROGRESS_DDIGEST_RECV) { + if (tcp_conn->in_progress == IN_PROGRESS_DDIGEST_RECV) { uint32_t recv_digest; + debug_tcp("extra data_recv offset %d copy %d\n", - conn->in.offset, conn->in.copy); - skb_copy_bits(conn->in.skb, conn->in.offset, + tcp_conn->in.offset, tcp_conn->in.copy); + skb_copy_bits(tcp_conn->in.skb, tcp_conn->in.offset, &recv_digest, 4); - conn->in.offset += 4; - conn->in.copy -= 4; - if (recv_digest != conn->in.datadgst) { + tcp_conn->in.offset += 4; + tcp_conn->in.copy -= 4; + if (recv_digest != tcp_conn->in.datadgst) { debug_tcp("iscsi_tcp: data digest error!" "0x%x != 0x%x\n", recv_digest, - conn->in.datadgst); + tcp_conn->in.datadgst); iscsi_conn_failure(conn, ISCSI_ERR_DATA_DGST); return 0; } else { debug_tcp("iscsi_tcp: data digest match!" 
"0x%x == 0x%x\n", recv_digest, - conn->in.datadgst); - conn->in_progress = IN_PROGRESS_WAIT_HEADER; + tcp_conn->in.datadgst); + tcp_conn->in_progress = IN_PROGRESS_WAIT_HEADER; } } - if (conn->in_progress == IN_PROGRESS_DATA_RECV && conn->in.copy) { + if (tcp_conn->in_progress == IN_PROGRESS_DATA_RECV && + tcp_conn->in.copy) { debug_tcp("data_recv offset %d copy %d\n", - conn->in.offset, conn->in.copy); + tcp_conn->in.offset, tcp_conn->in.copy); rc = iscsi_data_recv(conn); if (rc) { - if (rc == -EAGAIN) { - rd_desc->count = conn->in.datalen - - conn->in.ctask->data_count; + if (rc == -EAGAIN) goto again; - } iscsi_conn_failure(conn, rc); return 0; } - conn->in.copy -= conn->in.padding; - conn->in.offset += conn->in.padding; + tcp_conn->in.copy -= tcp_conn->in.padding; + tcp_conn->in.offset += tcp_conn->in.padding; if (conn->datadgst_en) { - if (conn->in.padding) { - debug_tcp("padding -> %d\n", conn->in.padding); - memset(pad, 0, conn->in.padding); - sg_init_one(&sg, pad, conn->in.padding); - crypto_digest_update(conn->data_rx_tfm, &sg, 1); + if (tcp_conn->in.padding) { + debug_tcp("padding -> %d\n", + tcp_conn->in.padding); + memset(pad, 0, tcp_conn->in.padding); + sg_init_one(&sg, pad, tcp_conn->in.padding); + crypto_digest_update(tcp_conn->data_rx_tfm, + &sg, 1); } - crypto_digest_final(conn->data_rx_tfm, - (u8 *) & conn->in.datadgst); - debug_tcp("rx digest 0x%x\n", conn->in.datadgst); - conn->in_progress = IN_PROGRESS_DDIGEST_RECV; + crypto_digest_final(tcp_conn->data_rx_tfm, + (u8 *) & tcp_conn->in.datadgst); + debug_tcp("rx digest 0x%x\n", tcp_conn->in.datadgst); + tcp_conn->in_progress = IN_PROGRESS_DDIGEST_RECV; } else - conn->in_progress = IN_PROGRESS_WAIT_HEADER; + tcp_conn->in_progress = IN_PROGRESS_WAIT_HEADER; } debug_tcp("f, processed %d from out of %d padding %d\n", - conn->in.offset - offset, (int)len, conn->in.padding); - BUG_ON(conn->in.offset - offset > len); + tcp_conn->in.offset - offset, (int)len, tcp_conn->in.padding); + BUG_ON(tcp_conn->in.offset - offset > len); - if (conn->in.offset - offset != len) { + if (tcp_conn->in.offset - offset != len) { debug_tcp("continue to process %d bytes\n", - (int)len - (conn->in.offset - offset)); + (int)len - (tcp_conn->in.offset - offset)); goto more; } nomore: - processed = conn->in.offset - offset; + processed = tcp_conn->in.offset - offset; BUG_ON(processed == 0); return processed; again: - processed = conn->in.offset - offset; + processed = tcp_conn->in.offset - offset; debug_tcp("c, processed %d from out of %d rd_desc_cnt %d\n", processed, (int)len, (int)rd_desc->count); BUG_ON(processed == 0); @@ -1220,9 +980,14 @@ iscsi_tcp_data_ready(struct sock *sk, in read_lock(&sk->sk_callback_lock); - /* use rd_desc to pass 'conn' to iscsi_tcp_data_recv */ + /* + * Use rd_desc to pass 'conn' to iscsi_tcp_data_recv. + * We set count to 1 because we want the network layer to + * hand us all the skbs that are available. iscsi_tcp_data_recv + * handled pdus that cross buffers or pdus that still need data. 
+ */ rd_desc.arg.data = conn; - rd_desc.count = 0; + rd_desc.count = 1; tcp_read_sock(sk, &rd_desc, iscsi_tcp_data_recv); read_unlock(&sk->sk_callback_lock); @@ -1231,6 +996,7 @@ iscsi_tcp_data_ready(struct sock *sk, in static void iscsi_tcp_state_change(struct sock *sk) { + struct iscsi_tcp_conn *tcp_conn; struct iscsi_conn *conn; struct iscsi_session *session; void (*old_state_change)(struct sock *); @@ -1247,7 +1013,8 @@ iscsi_tcp_state_change(struct sock *sk) iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED); } - old_state_change = conn->old_state_change; + tcp_conn = conn->dd_data; + old_state_change = tcp_conn->old_state_change; read_unlock(&sk->sk_callback_lock); @@ -1262,23 +1029,26 @@ static void iscsi_write_space(struct sock *sk) { struct iscsi_conn *conn = (struct iscsi_conn*)sk->sk_user_data; - conn->old_write_space(sk); + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + + tcp_conn->old_write_space(sk); debug_tcp("iscsi_write_space: cid %d\n", conn->id); - clear_bit(SUSPEND_BIT, &conn->suspend_tx); + clear_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); scsi_queue_work(conn->session->host, &conn->xmitwork); } static void iscsi_conn_set_callbacks(struct iscsi_conn *conn) { - struct sock *sk = conn->sock->sk; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + struct sock *sk = tcp_conn->sock->sk; /* assign new callbacks */ write_lock_bh(&sk->sk_callback_lock); sk->sk_user_data = conn; - conn->old_data_ready = sk->sk_data_ready; - conn->old_state_change = sk->sk_state_change; - conn->old_write_space = sk->sk_write_space; + tcp_conn->old_data_ready = sk->sk_data_ready; + tcp_conn->old_state_change = sk->sk_state_change; + tcp_conn->old_write_space = sk->sk_write_space; sk->sk_data_ready = iscsi_tcp_data_ready; sk->sk_state_change = iscsi_tcp_state_change; sk->sk_write_space = iscsi_write_space; @@ -1288,14 +1058,15 @@ iscsi_conn_set_callbacks(struct iscsi_co static void iscsi_conn_restore_callbacks(struct iscsi_conn *conn) { - struct sock *sk = conn->sock->sk; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + struct sock *sk = tcp_conn->sock->sk; /* restore socket callbacks, see also: iscsi_conn_set_callbacks() */ write_lock_bh(&sk->sk_callback_lock); sk->sk_user_data = NULL; - sk->sk_data_ready = conn->old_data_ready; - sk->sk_state_change = conn->old_state_change; - sk->sk_write_space = conn->old_write_space; + sk->sk_data_ready = tcp_conn->old_data_ready; + sk->sk_state_change = tcp_conn->old_state_change; + sk->sk_write_space = tcp_conn->old_write_space; sk->sk_no_check = 0; write_unlock_bh(&sk->sk_callback_lock); } @@ -1310,7 +1081,8 @@ iscsi_conn_restore_callbacks(struct iscs static inline int iscsi_send(struct iscsi_conn *conn, struct iscsi_buf *buf, int size, int flags) { - struct socket *sk = conn->sock; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + struct socket *sk = tcp_conn->sock; int offset = buf->sg.offset + buf->sent; /* @@ -1324,7 +1096,8 @@ iscsi_send(struct iscsi_conn *conn, stru if (buf->use_sendmsg) return sock_no_sendpage(sk, buf->sg.page, offset, size, flags); else - return conn->sendpage(sk, buf->sg.page, offset, size, flags); + return tcp_conn->sendpage(sk, buf->sg.page, offset, size, + flags); } /** @@ -1339,6 +1112,7 @@ iscsi_send(struct iscsi_conn *conn, stru static inline int iscsi_sendhdr(struct iscsi_conn *conn, struct iscsi_buf *buf, int datalen) { + struct iscsi_tcp_conn *tcp_conn; int flags = 0; /* MSG_DONTWAIT; */ int res, size; @@ -1356,8 +1130,9 @@ iscsi_sendhdr(struct iscsi_conn *conn, s return -EAGAIN; return 0; } else if (res == 
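The comment added above pins down the tcp_read_sock() contract the receive path lives by. read_descriptor_t and the actor signature are the real types of this kernel generation; the body below is only an outline of that contract with an invented name, not the driver's code:

static int demo_recv_actor(read_descriptor_t *rd_desc, struct sk_buff *skb,
                           unsigned int offset, size_t len)
{
    struct iscsi_conn *conn = rd_desc->arg.data; /* stashed by data_ready() */
    unsigned int processed = 0;

    /*
     * Consume bytes from skb[offset..offset+len).  Returning exactly the
     * number consumed lets tcp_read_sock() advance the socket; returning
     * less than len (the "again" path) makes it stop, and the remainder
     * is redelivered on the next ->sk_data_ready().
     */
    (void)conn; (void)skb; (void)offset;
    return processed;
}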
-EAGAIN) { - conn->sendpage_failures_cnt++; - set_bit(SUSPEND_BIT, &conn->suspend_tx); + tcp_conn = conn->dd_data; + tcp_conn->sendpage_failures_cnt++; + set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); } else if (res == -EPIPE) iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED); @@ -1378,6 +1153,7 @@ static inline int iscsi_sendpage(struct iscsi_conn *conn, struct iscsi_buf *buf, int *count, int *sent) { + struct iscsi_tcp_conn *tcp_conn; int flags = 0; /* MSG_DONTWAIT; */ int res, size; @@ -1400,8 +1176,9 @@ iscsi_sendpage(struct iscsi_conn *conn, return -EAGAIN; return 0; } else if (res == -EAGAIN) { - conn->sendpage_failures_cnt++; - set_bit(SUSPEND_BIT, &conn->suspend_tx); + tcp_conn = conn->dd_data; + tcp_conn->sendpage_failures_cnt++; + set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); } else if (res == -EPIPE) iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED); @@ -1409,30 +1186,35 @@ iscsi_sendpage(struct iscsi_conn *conn, } static inline void -iscsi_data_digest_init(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) +iscsi_data_digest_init(struct iscsi_tcp_conn *tcp_conn, + struct iscsi_cmd_task *ctask) { - BUG_ON(!conn->data_tx_tfm); - crypto_digest_init(conn->data_tx_tfm); - ctask->digest_count = 4; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + + BUG_ON(!tcp_conn->data_tx_tfm); + crypto_digest_init(tcp_conn->data_tx_tfm); + tcp_ctask->digest_count = 4; } static int iscsi_digest_final_send(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask, struct iscsi_buf *buf, uint32_t *digest, int final) { + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; int rc = 0; int sent = 0; if (final) - crypto_digest_final(conn->data_tx_tfm, (u8*)digest); + crypto_digest_final(tcp_conn->data_tx_tfm, (u8*)digest); iscsi_buf_init_virt(buf, (char*)digest, 4); - rc = iscsi_sendpage(conn, buf, &ctask->digest_count, &sent); + rc = iscsi_sendpage(conn, buf, &tcp_ctask->digest_count, &sent); if (rc) { - ctask->datadigest = *digest; - ctask->xmstate |= XMSTATE_DATA_DIGEST; + tcp_ctask->datadigest = *digest; + tcp_ctask->xmstate |= XMSTATE_DATA_DIGEST; } else - ctask->digest_count = 4; + tcp_ctask->digest_count = 4; return rc; } @@ -1453,21 +1235,23 @@ static void iscsi_solicit_data_cont(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask, struct iscsi_r2t_info *r2t, int left) { + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; struct iscsi_data *hdr; struct iscsi_data_task *dtask; struct scsi_cmnd *sc = ctask->sc; int new_offset; - dtask = mempool_alloc(ctask->datapool, GFP_ATOMIC); + dtask = mempool_alloc(tcp_ctask->datapool, GFP_ATOMIC); BUG_ON(!dtask); + INIT_LIST_HEAD(&dtask->item); hdr = &dtask->hdr; memset(hdr, 0, sizeof(struct iscsi_data)); hdr->ttt = r2t->ttt; hdr->datasn = cpu_to_be32(r2t->solicit_datasn); r2t->solicit_datasn++; hdr->opcode = ISCSI_OP_SCSI_DATA_OUT; - memcpy(hdr->lun, ctask->hdr.lun, sizeof(hdr->lun)); - hdr->itt = ctask->hdr.itt; + memcpy(hdr->lun, ctask->hdr->lun, sizeof(hdr->lun)); + hdr->itt = ctask->hdr->itt; hdr->exp_statsn = r2t->exp_statsn; new_offset = r2t->data_offset + r2t->sent; hdr->offset = cpu_to_be32(new_offset); @@ -1487,175 +1271,102 @@ iscsi_solicit_data_cont(struct iscsi_con r2t->dtask = dtask; if (sc->use_sg && !iscsi_buf_left(&r2t->sendbuf)) { - BUG_ON(ctask->bad_sg == r2t->sg); + BUG_ON(tcp_ctask->bad_sg == r2t->sg); iscsi_buf_init_sg(&r2t->sendbuf, r2t->sg); r2t->sg += 1; } else - iscsi_buf_init_iov(&ctask->sendbuf, + iscsi_buf_init_iov(&tcp_ctask->sendbuf, (char*)sc->request_buffer + 
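iscsi_sendhdr()/iscsi_sendpage() above share one contract: advance *sent, shrink *count, and return -EAGAIN on a would-block so the caller can re-arm its xmstate bit and retry later. A runnable toy with the same contract (toy_send() is invented and blocks after six bytes):

#include <errno.h>
#include <stdio.h>

static int budget = 6;          /* toy socket: takes 6 bytes, then blocks */

static int toy_send(const char *buf, int len)
{
    int n = len < budget ? len : budget;
    (void)buf;
    budget -= n;
    return n;                   /* 0 means the socket would block */
}

/* mirrors the iscsi_sendpage() count/sent contract */
static int send_some(const char *buf, int *count, int *sent)
{
    while (*count) {
        int res = toy_send(buf + *sent, *count);
        if (res <= 0)
            return -EAGAIN;     /* caller re-arms its xmstate bit */
        *sent += res;
        *count -= res;
    }
    return 0;
}

int main(void)
{
    const char pdu[] = "0123456789";
    int count = 10, sent = 0;

    if (send_some(pdu, &count, &sent) == -EAGAIN)
        printf("blocked at %d, %d left\n", sent, count);
    budget = 16;                /* socket is writable again */
    send_some(pdu, &count, &sent);
    printf("done, sent %d\n", sent);
    return 0;
}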
new_offset, r2t->data_count); - list_add(&dtask->item, &ctask->dataqueue); + list_add(&dtask->item, &tcp_ctask->dataqueue); } static void iscsi_unsolicit_data_init(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) { - struct iscsi_data *hdr; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; struct iscsi_data_task *dtask; - dtask = mempool_alloc(ctask->datapool, GFP_ATOMIC); + dtask = mempool_alloc(tcp_ctask->datapool, GFP_ATOMIC); BUG_ON(!dtask); - hdr = &dtask->hdr; - memset(hdr, 0, sizeof(struct iscsi_data)); - hdr->ttt = cpu_to_be32(ISCSI_RESERVED_TAG); - hdr->datasn = cpu_to_be32(ctask->unsol_datasn); - ctask->unsol_datasn++; - hdr->opcode = ISCSI_OP_SCSI_DATA_OUT; - memcpy(hdr->lun, ctask->hdr.lun, sizeof(hdr->lun)); - hdr->itt = ctask->hdr.itt; - hdr->exp_statsn = cpu_to_be32(conn->exp_statsn); - hdr->offset = cpu_to_be32(ctask->total_length - - ctask->r2t_data_count - - ctask->unsol_count); - if (ctask->unsol_count > conn->max_xmit_dlength) { - hton24(hdr->dlength, conn->max_xmit_dlength); - ctask->data_count = conn->max_xmit_dlength; - hdr->flags = 0; - } else { - hton24(hdr->dlength, ctask->unsol_count); - ctask->data_count = ctask->unsol_count; - hdr->flags = ISCSI_FLAG_CMD_FINAL; - } + INIT_LIST_HEAD(&dtask->item); - iscsi_buf_init_virt(&ctask->headbuf, (char*)hdr, + iscsi_prep_unsolicit_data_pdu(ctask, &dtask->hdr, + tcp_ctask->r2t_data_count); + iscsi_buf_init_virt(&tcp_ctask->headbuf, (char*)&dtask->hdr, sizeof(struct iscsi_hdr)); - list_add(&dtask->item, &ctask->dataqueue); - - ctask->dtask = dtask; + list_add(&dtask->item, &tcp_ctask->dataqueue); + tcp_ctask->dtask = dtask; } /** - * iscsi_cmd_init - Initialize iSCSI SCSI_READ or SCSI_WRITE commands + * iscsi_tcp_cmd_init - Initialize iSCSI SCSI_READ or SCSI_WRITE commands * @conn: iscsi connection * @ctask: scsi command task * @sc: scsi command **/ static void -iscsi_cmd_init(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask, - struct scsi_cmnd *sc) +iscsi_tcp_cmd_init(struct iscsi_cmd_task *ctask) { - struct iscsi_session *session = conn->session; - - BUG_ON(__kfifo_len(ctask->r2tqueue)); - - ctask->sc = sc; - ctask->conn = conn; - ctask->hdr.opcode = ISCSI_OP_SCSI_CMD; - ctask->hdr.flags = ISCSI_ATTR_SIMPLE; - int_to_scsilun(sc->device->lun, (struct scsi_lun *)ctask->hdr.lun); - ctask->hdr.itt = ctask->itt | (conn->id << CID_SHIFT) | - (session->age << AGE_SHIFT); - ctask->hdr.data_length = cpu_to_be32(sc->request_bufflen); - ctask->hdr.cmdsn = cpu_to_be32(session->cmdsn); session->cmdsn++; - ctask->hdr.exp_statsn = cpu_to_be32(conn->exp_statsn); - memcpy(ctask->hdr.cdb, sc->cmnd, sc->cmd_len); - memset(&ctask->hdr.cdb[sc->cmd_len], 0, MAX_COMMAND_SIZE - sc->cmd_len); + struct scsi_cmnd *sc = ctask->sc; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; - ctask->mtask = NULL; - ctask->sent = 0; - ctask->sg_count = 0; + BUG_ON(__kfifo_len(tcp_ctask->r2tqueue)); - ctask->total_length = sc->request_bufflen; + tcp_ctask->sent = 0; + tcp_ctask->sg_count = 0; if (sc->sc_data_direction == DMA_TO_DEVICE) { - ctask->exp_r2tsn = 0; - ctask->hdr.flags |= ISCSI_FLAG_CMD_WRITE; + tcp_ctask->xmstate = XMSTATE_W_HDR; + tcp_ctask->exp_r2tsn = 0; BUG_ON(ctask->total_length == 0); + if (sc->use_sg) { struct scatterlist *sg = sc->request_buffer; - iscsi_buf_init_sg(&ctask->sendbuf, - &sg[ctask->sg_count++]); - ctask->sg = sg; - ctask->bad_sg = sg + sc->use_sg; - } else { - iscsi_buf_init_iov(&ctask->sendbuf, sc->request_buffer, - sc->request_bufflen); - } + iscsi_buf_init_sg(&tcp_ctask->sendbuf, + 
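The ITT the removed iscsi_cmd_init() built (the formatting moves into libiscsi) packs a task index, connection id and session age into one 32-bit tag, so a completion can be routed back and tags from a stale session age rejected. The field widths below are illustrative assumptions; the real CID_SHIFT/AGE_SHIFT values live in the driver headers:

#include <stdio.h>
#include <stdint.h>

#define DEMO_CID_SHIFT 16       /* assumed width, not the driver's value */
#define DEMO_AGE_SHIFT 28       /* assumed width, not the driver's value */

static uint32_t make_itt(uint32_t task, uint32_t cid, uint32_t age)
{
    return task | (cid << DEMO_CID_SHIFT) | (age << DEMO_AGE_SHIFT);
}

int main(void)
{
    uint32_t itt = make_itt(0x2a, 3, 1);

    /* decoding masks the same (assumed) fields back out */
    printf("task=%u cid=%u age=%u\n",
           itt & 0xffff,
           (itt >> DEMO_CID_SHIFT) & 0xfff,
           itt >> DEMO_AGE_SHIFT);
    return 0;
}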
&sg[tcp_ctask->sg_count++]); + tcp_ctask->sg = sg; + tcp_ctask->bad_sg = sg + sc->use_sg; + } else + iscsi_buf_init_iov(&tcp_ctask->sendbuf, + sc->request_buffer, + sc->request_bufflen); - /* - * Write counters: - * - * imm_count bytes to be sent right after - * SCSI PDU Header - * - * unsol_count bytes(as Data-Out) to be sent - * without R2T ack right after - * immediate data - * - * r2t_data_count bytes to be sent via R2T ack's - * - * pad_count bytes to be sent as zero-padding - */ - ctask->imm_count = 0; - ctask->unsol_count = 0; - ctask->unsol_datasn = 0; - ctask->xmstate = XMSTATE_W_HDR; - /* calculate write padding */ - ctask->pad_count = ctask->total_length & (ISCSI_PAD_LEN-1); - if (ctask->pad_count) { - ctask->pad_count = ISCSI_PAD_LEN - ctask->pad_count; + if (ctask->imm_count) + tcp_ctask->xmstate |= XMSTATE_IMM_DATA; + + tcp_ctask->pad_count = ctask->total_length & (ISCSI_PAD_LEN-1); + if (tcp_ctask->pad_count) { + tcp_ctask->pad_count = ISCSI_PAD_LEN - + tcp_ctask->pad_count; debug_scsi("write padding %d bytes\n", - ctask->pad_count); - ctask->xmstate |= XMSTATE_W_PAD; + tcp_ctask->pad_count); + tcp_ctask->xmstate |= XMSTATE_W_PAD; } - if (session->imm_data_en) { - if (ctask->total_length >= session->first_burst) - ctask->imm_count = min(session->first_burst, - conn->max_xmit_dlength); - else - ctask->imm_count = min(ctask->total_length, - conn->max_xmit_dlength); - hton24(ctask->hdr.dlength, ctask->imm_count); - ctask->xmstate |= XMSTATE_IMM_DATA; - } else - zero_data(ctask->hdr.dlength); - - if (!session->initial_r2t_en) - ctask->unsol_count = min(session->first_burst, - ctask->total_length) - ctask->imm_count; - if (!ctask->unsol_count) - /* No unsolicit Data-Out's */ - ctask->hdr.flags |= ISCSI_FLAG_CMD_FINAL; - else - ctask->xmstate |= XMSTATE_UNS_HDR | XMSTATE_UNS_INIT; - ctask->r2t_data_count = ctask->total_length - + if (ctask->unsol_count) + tcp_ctask->xmstate |= XMSTATE_UNS_HDR | + XMSTATE_UNS_INIT; + tcp_ctask->r2t_data_count = ctask->total_length - ctask->imm_count - ctask->unsol_count; debug_scsi("cmd [itt %x total %d imm %d imm_data %d " "r2t_data %d]\n", ctask->itt, ctask->total_length, ctask->imm_count, - ctask->unsol_count, ctask->r2t_data_count); - } else { - ctask->hdr.flags |= ISCSI_FLAG_CMD_FINAL; - if (sc->sc_data_direction == DMA_FROM_DEVICE) - ctask->hdr.flags |= ISCSI_FLAG_CMD_READ; - ctask->datasn = 0; - ctask->xmstate = XMSTATE_R_HDR; - zero_data(ctask->hdr.dlength); - } + ctask->unsol_count, tcp_ctask->r2t_data_count); + } else + tcp_ctask->xmstate = XMSTATE_R_HDR; - iscsi_buf_init_virt(&ctask->headbuf, (char*)&ctask->hdr, + iscsi_buf_init_virt(&tcp_ctask->headbuf, (char*)ctask->hdr, sizeof(struct iscsi_hdr)); - conn->scsicmd_pdus_cnt++; } /** - * iscsi_mtask_xmit - xmit management(immediate) task + * iscsi_tcp_mtask_xmit - xmit management(immediate) task * @conn: iscsi connection * @mtask: task management task * @@ -1669,70 +1380,87 @@ iscsi_cmd_init(struct iscsi_conn *conn, * IN_PROGRESS_IMM_DATA - PDU Data xmit in progress **/ static int -iscsi_mtask_xmit(struct iscsi_conn *conn, struct iscsi_mgmt_task *mtask) +iscsi_tcp_mtask_xmit(struct iscsi_conn *conn, struct iscsi_mgmt_task *mtask) { + struct iscsi_tcp_mgmt_task *tcp_mtask = mtask->dd_data; debug_scsi("mtask deq [cid %d state %x itt 0x%x]\n", - conn->id, mtask->xmstate, mtask->itt); + conn->id, tcp_mtask->xmstate, mtask->itt); - if (mtask->xmstate & XMSTATE_IMM_HDR) { - mtask->xmstate &= ~XMSTATE_IMM_HDR; + if (tcp_mtask->xmstate & XMSTATE_IMM_HDR) { + tcp_mtask->xmstate &= 
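The pad_count arithmetic kept above rounds every write out to the 4-byte boundary iSCSI requires; in isolation it is just:

#include <stdio.h>

#define ISCSI_PAD_LEN 4

static int pad_bytes(int total_length)
{
    int pad = total_length & (ISCSI_PAD_LEN - 1);

    return pad ? ISCSI_PAD_LEN - pad : 0;   /* 0..3 zero bytes appended */
}

int main(void)
{
    int len;

    for (len = 8; len <= 11; len++)
        printf("%2d -> pad %d\n", len, pad_bytes(len));
    return 0;
}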
~XMSTATE_IMM_HDR; if (mtask->data_count) - mtask->xmstate |= XMSTATE_IMM_DATA; + tcp_mtask->xmstate |= XMSTATE_IMM_DATA; if (conn->c_stage != ISCSI_CONN_INITIAL_STAGE && - conn->stop_stage != STOP_CONN_RECOVER && + conn->stop_stage != STOP_CONN_RECOVER && conn->hdrdgst_en) - iscsi_hdr_digest(conn, &mtask->headbuf, - (u8*)mtask->hdrext); - if (iscsi_sendhdr(conn, &mtask->headbuf, mtask->data_count)) { - mtask->xmstate |= XMSTATE_IMM_HDR; + iscsi_hdr_digest(conn, &tcp_mtask->headbuf, + (u8*)tcp_mtask->hdrext); + if (iscsi_sendhdr(conn, &tcp_mtask->headbuf, + mtask->data_count)) { + tcp_mtask->xmstate |= XMSTATE_IMM_HDR; if (mtask->data_count) - mtask->xmstate &= ~XMSTATE_IMM_DATA; + tcp_mtask->xmstate &= ~XMSTATE_IMM_DATA; return -EAGAIN; } } - if (mtask->xmstate & XMSTATE_IMM_DATA) { + if (tcp_mtask->xmstate & XMSTATE_IMM_DATA) { BUG_ON(!mtask->data_count); - mtask->xmstate &= ~XMSTATE_IMM_DATA; + tcp_mtask->xmstate &= ~XMSTATE_IMM_DATA; /* FIXME: implement. * Virtual buffer could be spreaded across multiple pages... */ do { - if (iscsi_sendpage(conn, &mtask->sendbuf, - &mtask->data_count, &mtask->sent)) { - mtask->xmstate |= XMSTATE_IMM_DATA; + if (iscsi_sendpage(conn, &tcp_mtask->sendbuf, + &mtask->data_count, &tcp_mtask->sent)) { + tcp_mtask->xmstate |= XMSTATE_IMM_DATA; return -EAGAIN; } } while (mtask->data_count); } - BUG_ON(mtask->xmstate != XMSTATE_IDLE); + BUG_ON(tcp_mtask->xmstate != XMSTATE_IDLE); + if (mtask->hdr->itt == cpu_to_be32(ISCSI_RESERVED_TAG)) { + struct iscsi_session *session = conn->session; + + spin_lock_bh(&session->lock); + list_del(&conn->mtask->running); + __kfifo_put(session->mgmtpool.queue, (void*)&conn->mtask, + sizeof(void*)); + spin_unlock_bh(&session->lock); + } return 0; } static inline int -handle_xmstate_r_hdr(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) +handle_xmstate_r_hdr(struct iscsi_conn *conn, + struct iscsi_tcp_cmd_task *tcp_ctask) { - ctask->xmstate &= ~XMSTATE_R_HDR; + tcp_ctask->xmstate &= ~XMSTATE_R_HDR; if (conn->hdrdgst_en) - iscsi_hdr_digest(conn, &ctask->headbuf, (u8*)ctask->hdrext); - if (!iscsi_sendhdr(conn, &ctask->headbuf, 0)) { - BUG_ON(ctask->xmstate != XMSTATE_IDLE); + iscsi_hdr_digest(conn, &tcp_ctask->headbuf, + (u8*)tcp_ctask->hdrext); + if (!iscsi_sendhdr(conn, &tcp_ctask->headbuf, 0)) { + BUG_ON(tcp_ctask->xmstate != XMSTATE_IDLE); return 0; /* wait for Data-In */ } - ctask->xmstate |= XMSTATE_R_HDR; + tcp_ctask->xmstate |= XMSTATE_R_HDR; return -EAGAIN; } static inline int -handle_xmstate_w_hdr(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) +handle_xmstate_w_hdr(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask) { - ctask->xmstate &= ~XMSTATE_W_HDR; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + + tcp_ctask->xmstate &= ~XMSTATE_W_HDR; if (conn->hdrdgst_en) - iscsi_hdr_digest(conn, &ctask->headbuf, (u8*)ctask->hdrext); - if (iscsi_sendhdr(conn, &ctask->headbuf, ctask->imm_count)) { - ctask->xmstate |= XMSTATE_W_HDR; + iscsi_hdr_digest(conn, &tcp_ctask->headbuf, + (u8*)tcp_ctask->hdrext); + if (iscsi_sendhdr(conn, &tcp_ctask->headbuf, ctask->imm_count)) { + tcp_ctask->xmstate |= XMSTATE_W_HDR; return -EAGAIN; } return 0; @@ -1742,13 +1470,15 @@ static inline int handle_xmstate_data_digest(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) { - ctask->xmstate &= ~XMSTATE_DATA_DIGEST; - debug_tcp("resent data digest 0x%x\n", ctask->datadigest); - if (iscsi_digest_final_send(conn, ctask, &ctask->immbuf, - &ctask->datadigest, 0)) { - ctask->xmstate |= XMSTATE_DATA_DIGEST; + struct 
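Every handle_xmstate_* helper below, and iscsi_tcp_mtask_xmit above, follows one idiom: clear the state bit, attempt the I/O, and re-set the bit before returning -EAGAIN so the xmit worker resumes at exactly this step. Distilled into a runnable toy (flag values invented, not the XMSTATE_* ones):

#include <errno.h>
#include <stdio.h>

#define ST_HDR  0x1
#define ST_DATA 0x2

static int try_io(int ok)
{
    return ok ? 0 : -EAGAIN;
}

static int step(unsigned *xmstate, unsigned bit, int ok)
{
    *xmstate &= ~bit;           /* optimistically mark this step done */
    if (try_io(ok)) {
        *xmstate |= bit;        /* failed: re-arm so the retry lands here */
        return -EAGAIN;
    }
    return 0;
}

int main(void)
{
    unsigned xmstate = ST_HDR | ST_DATA;

    step(&xmstate, ST_HDR, 0);                      /* header blocked */
    printf("after blocked hdr: 0x%x\n", xmstate);   /* still 0x3 */
    step(&xmstate, ST_HDR, 1);
    step(&xmstate, ST_DATA, 1);
    printf("idle: 0x%x\n", xmstate);                /* 0x0, i.e. IDLE */
    return 0;
}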
iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + + tcp_ctask->xmstate &= ~XMSTATE_DATA_DIGEST; + debug_tcp("resent data digest 0x%x\n", tcp_ctask->datadigest); + if (iscsi_digest_final_send(conn, ctask, &tcp_ctask->immbuf, + &tcp_ctask->datadigest, 0)) { + tcp_ctask->xmstate |= XMSTATE_DATA_DIGEST; debug_tcp("resent data digest 0x%x fail!\n", - ctask->datadigest); + tcp_ctask->datadigest); return -EAGAIN; } return 0; @@ -1757,44 +1487,47 @@ handle_xmstate_data_digest(struct iscsi_ static inline int handle_xmstate_imm_data(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) { + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + BUG_ON(!ctask->imm_count); - ctask->xmstate &= ~XMSTATE_IMM_DATA; + tcp_ctask->xmstate &= ~XMSTATE_IMM_DATA; if (conn->datadgst_en) { - iscsi_data_digest_init(conn, ctask); - ctask->immdigest = 0; + iscsi_data_digest_init(tcp_conn, ctask); + tcp_ctask->immdigest = 0; } for (;;) { - if (iscsi_sendpage(conn, &ctask->sendbuf, &ctask->imm_count, - &ctask->sent)) { - ctask->xmstate |= XMSTATE_IMM_DATA; + if (iscsi_sendpage(conn, &tcp_ctask->sendbuf, &ctask->imm_count, + &tcp_ctask->sent)) { + tcp_ctask->xmstate |= XMSTATE_IMM_DATA; if (conn->datadgst_en) { - crypto_digest_final(conn->data_tx_tfm, - (u8*)&ctask->immdigest); + crypto_digest_final(tcp_conn->data_tx_tfm, + (u8*)&tcp_ctask->immdigest); debug_tcp("tx imm sendpage fail 0x%x\n", - ctask->datadigest); + tcp_ctask->datadigest); } return -EAGAIN; } if (conn->datadgst_en) - crypto_digest_update(conn->data_tx_tfm, - &ctask->sendbuf.sg, 1); + crypto_digest_update(tcp_conn->data_tx_tfm, + &tcp_ctask->sendbuf.sg, 1); if (!ctask->imm_count) break; - iscsi_buf_init_sg(&ctask->sendbuf, - &ctask->sg[ctask->sg_count++]); + iscsi_buf_init_sg(&tcp_ctask->sendbuf, + &tcp_ctask->sg[tcp_ctask->sg_count++]); } - if (conn->datadgst_en && !(ctask->xmstate & XMSTATE_W_PAD)) { - if (iscsi_digest_final_send(conn, ctask, &ctask->immbuf, - &ctask->immdigest, 1)) { + if (conn->datadgst_en && !(tcp_ctask->xmstate & XMSTATE_W_PAD)) { + if (iscsi_digest_final_send(conn, ctask, &tcp_ctask->immbuf, + &tcp_ctask->immdigest, 1)) { debug_tcp("sending imm digest 0x%x fail!\n", - ctask->immdigest); + tcp_ctask->immdigest); return -EAGAIN; } - debug_tcp("sending imm digest 0x%x\n", ctask->immdigest); + debug_tcp("sending imm digest 0x%x\n", tcp_ctask->immdigest); } return 0; @@ -1803,52 +1536,55 @@ handle_xmstate_imm_data(struct iscsi_con static inline int handle_xmstate_uns_hdr(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) { + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; struct iscsi_data_task *dtask; - ctask->xmstate |= XMSTATE_UNS_DATA; - if (ctask->xmstate & XMSTATE_UNS_INIT) { + tcp_ctask->xmstate |= XMSTATE_UNS_DATA; + if (tcp_ctask->xmstate & XMSTATE_UNS_INIT) { iscsi_unsolicit_data_init(conn, ctask); - BUG_ON(!ctask->dtask); - dtask = ctask->dtask; + BUG_ON(!tcp_ctask->dtask); + dtask = tcp_ctask->dtask; if (conn->hdrdgst_en) - iscsi_hdr_digest(conn, &ctask->headbuf, + iscsi_hdr_digest(conn, &tcp_ctask->headbuf, (u8*)dtask->hdrext); - ctask->xmstate &= ~XMSTATE_UNS_INIT; + tcp_ctask->xmstate &= ~XMSTATE_UNS_INIT; } - if (iscsi_sendhdr(conn, &ctask->headbuf, ctask->data_count)) { - ctask->xmstate &= ~XMSTATE_UNS_DATA; - ctask->xmstate |= XMSTATE_UNS_HDR; + if (iscsi_sendhdr(conn, &tcp_ctask->headbuf, ctask->data_count)) { + tcp_ctask->xmstate &= ~XMSTATE_UNS_DATA; + tcp_ctask->xmstate |= XMSTATE_UNS_HDR; return -EAGAIN; } debug_scsi("uns dout [itt 0x%x dlen %d sent 
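The data-digest calls above are strictly incremental: one init per PDU, one update per scatterlist entry as it goes out, one final after the last byte (padding included). iSCSI specifies CRC32C; the runnable toy below substitutes a trivial additive checksum purely to show the init/update/final shape:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

static uint32_t dg;

static void digest_init(void)  { dg = 0; }

static void digest_update(const void *p, size_t n)
{
    const unsigned char *b = p;

    while (n--)
        dg += *b++;             /* stand-in for crypto_digest_update() */
}

static uint32_t digest_final(void) { return dg; }

int main(void)
{
    const char *chunks[] = { "scsi ", "data ", "out" };
    int i;

    digest_init();
    for (i = 0; i < 3; i++)     /* one update per sg entry */
        digest_update(chunks[i], strlen(chunks[i]));
    printf("digest 0x%08x\n", digest_final());
    return 0;
}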
%d]\n", - ctask->itt, ctask->unsol_count, ctask->sent); + ctask->itt, ctask->unsol_count, tcp_ctask->sent); return 0; } static inline int handle_xmstate_uns_data(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) { - struct iscsi_data_task *dtask = ctask->dtask; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + struct iscsi_data_task *dtask = tcp_ctask->dtask; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; BUG_ON(!ctask->data_count); - ctask->xmstate &= ~XMSTATE_UNS_DATA; + tcp_ctask->xmstate &= ~XMSTATE_UNS_DATA; if (conn->datadgst_en) { - iscsi_data_digest_init(conn, ctask); + iscsi_data_digest_init(tcp_conn, ctask); dtask->digest = 0; } for (;;) { - int start = ctask->sent; + int start = tcp_ctask->sent; - if (iscsi_sendpage(conn, &ctask->sendbuf, &ctask->data_count, - &ctask->sent)) { - ctask->unsol_count -= ctask->sent - start; - ctask->xmstate |= XMSTATE_UNS_DATA; + if (iscsi_sendpage(conn, &tcp_ctask->sendbuf, + &ctask->data_count, &tcp_ctask->sent)) { + ctask->unsol_count -= tcp_ctask->sent - start; + tcp_ctask->xmstate |= XMSTATE_UNS_DATA; /* will continue with this ctask later.. */ if (conn->datadgst_en) { - crypto_digest_final(conn->data_tx_tfm, + crypto_digest_final(tcp_conn->data_tx_tfm, (u8 *)&dtask->digest); debug_tcp("tx uns data fail 0x%x\n", dtask->digest); @@ -1856,21 +1592,21 @@ handle_xmstate_uns_data(struct iscsi_con return -EAGAIN; } - BUG_ON(ctask->sent > ctask->total_length); - ctask->unsol_count -= ctask->sent - start; + BUG_ON(tcp_ctask->sent > ctask->total_length); + ctask->unsol_count -= tcp_ctask->sent - start; /* * XXX:we may run here with un-initial sendbuf. * so pass it */ - if (conn->datadgst_en && ctask->sent - start > 0) - crypto_digest_update(conn->data_tx_tfm, - &ctask->sendbuf.sg, 1); + if (conn->datadgst_en && tcp_ctask->sent - start > 0) + crypto_digest_update(tcp_conn->data_tx_tfm, + &tcp_ctask->sendbuf.sg, 1); if (!ctask->data_count) break; - iscsi_buf_init_sg(&ctask->sendbuf, - &ctask->sg[ctask->sg_count++]); + iscsi_buf_init_sg(&tcp_ctask->sendbuf, + &tcp_ctask->sg[tcp_ctask->sg_count++]); } BUG_ON(ctask->unsol_count < 0); @@ -1890,11 +1626,11 @@ handle_xmstate_uns_data(struct iscsi_con debug_tcp("sending uns digest 0x%x, more uns\n", dtask->digest); } - ctask->xmstate |= XMSTATE_UNS_INIT; + tcp_ctask->xmstate |= XMSTATE_UNS_INIT; return 1; } - if (conn->datadgst_en && !(ctask->xmstate & XMSTATE_W_PAD)) { + if (conn->datadgst_en && !(tcp_ctask->xmstate & XMSTATE_W_PAD)) { if (iscsi_digest_final_send(conn, ctask, &dtask->digestbuf, &dtask->digest, 1)) { @@ -1912,15 +1648,17 @@ static inline int handle_xmstate_sol_data(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) { struct iscsi_session *session = conn->session; - struct iscsi_r2t_info *r2t = ctask->r2t; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + struct iscsi_r2t_info *r2t = tcp_ctask->r2t; struct iscsi_data_task *dtask = r2t->dtask; int left; - ctask->xmstate &= ~XMSTATE_SOL_DATA; - ctask->dtask = dtask; + tcp_ctask->xmstate &= ~XMSTATE_SOL_DATA; + tcp_ctask->dtask = dtask; if (conn->datadgst_en) { - iscsi_data_digest_init(conn, ctask); + iscsi_data_digest_init(tcp_conn, ctask); dtask->digest = 0; } solicit_again: @@ -1931,10 +1669,10 @@ solicit_again: goto data_out_done; if (iscsi_sendpage(conn, &r2t->sendbuf, &r2t->data_count, &r2t->sent)) { - ctask->xmstate |= XMSTATE_SOL_DATA; + tcp_ctask->xmstate |= XMSTATE_SOL_DATA; /* will continue with this ctask later.. 
*/ if (conn->datadgst_en) { - crypto_digest_final(conn->data_tx_tfm, + crypto_digest_final(tcp_conn->data_tx_tfm, (u8 *)&dtask->digest); debug_tcp("r2t data send fail 0x%x\n", dtask->digest); } @@ -1943,12 +1681,13 @@ solicit_again: BUG_ON(r2t->data_count < 0); if (conn->datadgst_en) - crypto_digest_update(conn->data_tx_tfm, &r2t->sendbuf.sg, 1); + crypto_digest_update(tcp_conn->data_tx_tfm, &r2t->sendbuf.sg, + 1); if (r2t->data_count) { BUG_ON(ctask->sc->use_sg == 0); if (!iscsi_buf_left(&r2t->sendbuf)) { - BUG_ON(ctask->bad_sg == r2t->sg); + BUG_ON(tcp_ctask->bad_sg == r2t->sg); iscsi_buf_init_sg(&r2t->sendbuf, r2t->sg); r2t->sg += 1; } @@ -1975,8 +1714,8 @@ data_out_done: dtask->digest); } iscsi_solicit_data_cont(conn, ctask, r2t, left); - ctask->xmstate |= XMSTATE_SOL_DATA; - ctask->xmstate &= ~XMSTATE_SOL_HDR; + tcp_ctask->xmstate |= XMSTATE_SOL_DATA; + tcp_ctask->xmstate &= ~XMSTATE_SOL_HDR; return 1; } @@ -1984,7 +1723,7 @@ data_out_done: * Done with this R2T. Check if there are more * outstanding R2Ts ready to be processed. */ - BUG_ON(ctask->r2t_data_count - r2t->data_length < 0); + BUG_ON(tcp_ctask->r2t_data_count - r2t->data_length < 0); if (conn->datadgst_en) { if (iscsi_digest_final_send(conn, ctask, &dtask->digestbuf, &dtask->digest, 1)) { @@ -1995,15 +1734,15 @@ data_out_done: debug_tcp("r2t done dout digest 0x%x\n", dtask->digest); } - ctask->r2t_data_count -= r2t->data_length; - ctask->r2t = NULL; + tcp_ctask->r2t_data_count -= r2t->data_length; + tcp_ctask->r2t = NULL; spin_lock_bh(&session->lock); - __kfifo_put(ctask->r2tpool.queue, (void*)&r2t, sizeof(void*)); + __kfifo_put(tcp_ctask->r2tpool.queue, (void*)&r2t, sizeof(void*)); spin_unlock_bh(&session->lock); - if (__kfifo_get(ctask->r2tqueue, (void*)&r2t, sizeof(void*))) { - ctask->r2t = r2t; - ctask->xmstate |= XMSTATE_SOL_DATA; - ctask->xmstate &= ~XMSTATE_SOL_HDR; + if (__kfifo_get(tcp_ctask->r2tqueue, (void*)&r2t, sizeof(void*))) { + tcp_ctask->r2t = r2t; + tcp_ctask->xmstate |= XMSTATE_SOL_DATA; + tcp_ctask->xmstate &= ~XMSTATE_SOL_HDR; return 1; } @@ -2013,29 +1752,34 @@ data_out_done: static inline int handle_xmstate_w_pad(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) { - struct iscsi_data_task *dtask = ctask->dtask; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + struct iscsi_data_task *dtask = tcp_ctask->dtask; int sent; - ctask->xmstate &= ~XMSTATE_W_PAD; - iscsi_buf_init_virt(&ctask->sendbuf, (char*)&ctask->pad, - ctask->pad_count); - if (iscsi_sendpage(conn, &ctask->sendbuf, &ctask->pad_count, &sent)) { - ctask->xmstate |= XMSTATE_W_PAD; + tcp_ctask->xmstate &= ~XMSTATE_W_PAD; + iscsi_buf_init_virt(&tcp_ctask->sendbuf, (char*)&tcp_ctask->pad, + tcp_ctask->pad_count); + if (iscsi_sendpage(conn, &tcp_ctask->sendbuf, &tcp_ctask->pad_count, + &sent)) { + tcp_ctask->xmstate |= XMSTATE_W_PAD; return -EAGAIN; } if (conn->datadgst_en) { - crypto_digest_update(conn->data_tx_tfm, &ctask->sendbuf.sg, 1); + crypto_digest_update(tcp_conn->data_tx_tfm, + &tcp_ctask->sendbuf.sg, 1); /* imm data? 
*/ if (!dtask) { - if (iscsi_digest_final_send(conn, ctask, &ctask->immbuf, - &ctask->immdigest, 1)) { + if (iscsi_digest_final_send(conn, ctask, + &tcp_ctask->immbuf, + &tcp_ctask->immdigest, 1)) { debug_tcp("send padding digest 0x%x" - "fail!\n", ctask->immdigest); + "fail!\n", tcp_ctask->immdigest); return -EAGAIN; } debug_tcp("done with padding, digest 0x%x\n", - ctask->datadigest); + tcp_ctask->datadigest); } else { if (iscsi_digest_final_send(conn, ctask, &dtask->digestbuf, @@ -2053,12 +1797,13 @@ handle_xmstate_w_pad(struct iscsi_conn * } static int -iscsi_ctask_xmit(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) +iscsi_tcp_ctask_xmit(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) { + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; int rc = 0; debug_scsi("ctask deq [cid %d xmstate %x itt 0x%x]\n", - conn->id, ctask->xmstate, ctask->itt); + conn->id, tcp_ctask->xmstate, ctask->itt); /* * serialize with TMF AbortTask @@ -2066,40 +1811,40 @@ iscsi_ctask_xmit(struct iscsi_conn *conn if (ctask->mtask) return rc; - if (ctask->xmstate & XMSTATE_R_HDR) { - rc = handle_xmstate_r_hdr(conn, ctask); + if (tcp_ctask->xmstate & XMSTATE_R_HDR) { + rc = handle_xmstate_r_hdr(conn, tcp_ctask); return rc; } - if (ctask->xmstate & XMSTATE_W_HDR) { + if (tcp_ctask->xmstate & XMSTATE_W_HDR) { rc = handle_xmstate_w_hdr(conn, ctask); if (rc) return rc; } /* XXX: for data digest xmit recover */ - if (ctask->xmstate & XMSTATE_DATA_DIGEST) { + if (tcp_ctask->xmstate & XMSTATE_DATA_DIGEST) { rc = handle_xmstate_data_digest(conn, ctask); if (rc) return rc; } - if (ctask->xmstate & XMSTATE_IMM_DATA) { + if (tcp_ctask->xmstate & XMSTATE_IMM_DATA) { rc = handle_xmstate_imm_data(conn, ctask); if (rc) return rc; } - if (ctask->xmstate & XMSTATE_UNS_HDR) { + if (tcp_ctask->xmstate & XMSTATE_UNS_HDR) { BUG_ON(!ctask->unsol_count); - ctask->xmstate &= ~XMSTATE_UNS_HDR; + tcp_ctask->xmstate &= ~XMSTATE_UNS_HDR; unsolicit_head_again: rc = handle_xmstate_uns_hdr(conn, ctask); if (rc) return rc; } - if (ctask->xmstate & XMSTATE_UNS_DATA) { + if (tcp_ctask->xmstate & XMSTATE_UNS_DATA) { rc = handle_xmstate_uns_data(conn, ctask); if (rc == 1) goto unsolicit_head_again; @@ -2108,22 +1853,22 @@ unsolicit_head_again: goto done; } - if (ctask->xmstate & XMSTATE_SOL_HDR) { + if (tcp_ctask->xmstate & XMSTATE_SOL_HDR) { struct iscsi_r2t_info *r2t; - ctask->xmstate &= ~XMSTATE_SOL_HDR; - ctask->xmstate |= XMSTATE_SOL_DATA; - if (!ctask->r2t) - __kfifo_get(ctask->r2tqueue, (void*)&ctask->r2t, + tcp_ctask->xmstate &= ~XMSTATE_SOL_HDR; + tcp_ctask->xmstate |= XMSTATE_SOL_DATA; + if (!tcp_ctask->r2t) + __kfifo_get(tcp_ctask->r2tqueue, (void*)&tcp_ctask->r2t, sizeof(void*)); solicit_head_again: - r2t = ctask->r2t; + r2t = tcp_ctask->r2t; if (conn->hdrdgst_en) iscsi_hdr_digest(conn, &r2t->headbuf, (u8*)r2t->dtask->hdrext); if (iscsi_sendhdr(conn, &r2t->headbuf, r2t->data_count)) { - ctask->xmstate &= ~XMSTATE_SOL_DATA; - ctask->xmstate |= XMSTATE_SOL_HDR; + tcp_ctask->xmstate &= ~XMSTATE_SOL_DATA; + tcp_ctask->xmstate |= XMSTATE_SOL_HDR; return -EAGAIN; } @@ -2132,7 +1877,7 @@ solicit_head_again: r2t->sent); } - if (ctask->xmstate & XMSTATE_SOL_DATA) { + if (tcp_ctask->xmstate & XMSTATE_SOL_DATA) { rc = handle_xmstate_sol_data(conn, ctask); if (rc == 1) goto solicit_head_again; @@ -2145,527 +1890,116 @@ done: * Last thing to check is whether we need to send write * padding. Note that we check for xmstate equality, not just the bit. 
*/ - if (ctask->xmstate == XMSTATE_W_PAD) + if (tcp_ctask->xmstate == XMSTATE_W_PAD) rc = handle_xmstate_w_pad(conn, ctask); return rc; } -/** - * iscsi_data_xmit - xmit any command into the scheduled connection - * @conn: iscsi connection - * - * Notes: - * The function can return -EAGAIN in which case the caller must - * re-schedule it again later or recover. '0' return code means - * successful xmit. - **/ -static int -iscsi_data_xmit(struct iscsi_conn *conn) -{ - if (unlikely(conn->suspend_tx)) { - debug_tcp("conn %d Tx suspended!\n", conn->id); - return 0; - } - - /* - * Transmit in the following order: - * - * 1) un-finished xmit (ctask or mtask) - * 2) immediate control PDUs - * 3) write data - * 4) SCSI commands - * 5) non-immediate control PDUs - * - * No need to lock around __kfifo_get as long as - * there's one producer and one consumer. - */ - - BUG_ON(conn->ctask && conn->mtask); - - if (conn->ctask) { - if (iscsi_ctask_xmit(conn, conn->ctask)) - goto again; - /* done with this in-progress ctask */ - conn->ctask = NULL; - } - if (conn->mtask) { - if (iscsi_mtask_xmit(conn, conn->mtask)) - goto again; - /* done with this in-progress mtask */ - conn->mtask = NULL; - } - - /* process immediate first */ - if (unlikely(__kfifo_len(conn->immqueue))) { - struct iscsi_session *session = conn->session; - while (__kfifo_get(conn->immqueue, (void*)&conn->mtask, - sizeof(void*))) { - if (iscsi_mtask_xmit(conn, conn->mtask)) - goto again; - - if (conn->mtask->hdr.itt == - cpu_to_be32(ISCSI_RESERVED_TAG)) { - spin_lock_bh(&session->lock); - __kfifo_put(session->mgmtpool.queue, - (void*)&conn->mtask, sizeof(void*)); - spin_unlock_bh(&session->lock); - } - } - /* done with this mtask */ - conn->mtask = NULL; - } - - /* process write queue */ - while (__kfifo_get(conn->writequeue, (void*)&conn->ctask, - sizeof(void*))) { - if (iscsi_ctask_xmit(conn, conn->ctask)) - goto again; - } - - /* process command queue */ - while (__kfifo_get(conn->xmitqueue, (void*)&conn->ctask, - sizeof(void*))) { - if (iscsi_ctask_xmit(conn, conn->ctask)) - goto again; - } - /* done with this ctask */ - conn->ctask = NULL; - - /* process the rest control plane PDUs, if any */ - if (unlikely(__kfifo_len(conn->mgmtqueue))) { - struct iscsi_session *session = conn->session; - - while (__kfifo_get(conn->mgmtqueue, (void*)&conn->mtask, - sizeof(void*))) { - if (iscsi_mtask_xmit(conn, conn->mtask)) - goto again; - - if (conn->mtask->hdr.itt == - cpu_to_be32(ISCSI_RESERVED_TAG)) { - spin_lock_bh(&session->lock); - __kfifo_put(session->mgmtpool.queue, - (void*)&conn->mtask, - sizeof(void*)); - spin_unlock_bh(&session->lock); - } - } - /* done with this mtask */ - conn->mtask = NULL; - } - - return 0; - -again: - if (unlikely(conn->suspend_tx)) - return 0; - - return -EAGAIN; -} - -static void -iscsi_xmitworker(void *data) -{ - struct iscsi_conn *conn = data; - - /* - * serialize Xmit worker on a per-connection basis. 
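The W_PAD check compares xmstate for equality rather than testing the bit, as the comment above says, because padding may only go out once every other transmit step has cleared. The difference, runnable (flag values invented):

#include <stdio.h>

#define XMSTATE_SOL_DATA 0x100  /* illustrative values only */
#define XMSTATE_W_PAD    0x200

int main(void)
{
    unsigned xmstate = XMSTATE_W_PAD | XMSTATE_SOL_DATA;

    /* bit test: true even though solicited data is still outstanding */
    printf("bit:      %d\n", !!(xmstate & XMSTATE_W_PAD));
    /* equality: true only when padding is the last step left */
    printf("equality: %d\n", xmstate == XMSTATE_W_PAD);
    return 0;
}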
- */ - mutex_lock(&conn->xmitmutex); - if (iscsi_data_xmit(conn)) - scsi_queue_work(conn->session->host, &conn->xmitwork); - mutex_unlock(&conn->xmitmutex); -} - -#define FAILURE_BAD_HOST 1 -#define FAILURE_SESSION_FAILED 2 -#define FAILURE_SESSION_FREED 3 -#define FAILURE_WINDOW_CLOSED 4 -#define FAILURE_SESSION_TERMINATE 5 - -static int -iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *)) -{ - struct Scsi_Host *host; - int reason = 0; - struct iscsi_session *session; - struct iscsi_conn *conn = NULL; - struct iscsi_cmd_task *ctask = NULL; - - sc->scsi_done = done; - sc->result = 0; - - host = sc->device->host; - session = iscsi_hostdata(host->hostdata); - BUG_ON(host != session->host); - - spin_lock(&session->lock); - - if (session->state != ISCSI_STATE_LOGGED_IN) { - if (session->state == ISCSI_STATE_FAILED) { - reason = FAILURE_SESSION_FAILED; - goto reject; - } else if (session->state == ISCSI_STATE_TERMINATE) { - reason = FAILURE_SESSION_TERMINATE; - goto fault; - } - reason = FAILURE_SESSION_FREED; - goto fault; - } - - /* - * Check for iSCSI window and take care of CmdSN wrap-around - */ - if ((int)(session->max_cmdsn - session->cmdsn) < 0) { - reason = FAILURE_WINDOW_CLOSED; - goto reject; - } - - conn = session->leadconn; - - __kfifo_get(session->cmdpool.queue, (void*)&ctask, sizeof(void*)); - BUG_ON(ctask->sc); - - sc->SCp.phase = session->age; - sc->SCp.ptr = (char*)ctask; - iscsi_cmd_init(conn, ctask, sc); - - __kfifo_put(conn->xmitqueue, (void*)&ctask, sizeof(void*)); - debug_scsi( - "ctask enq [%s cid %d sc %lx itt 0x%x len %d cmdsn %d win %d]\n", - sc->sc_data_direction == DMA_TO_DEVICE ? "write" : "read", - conn->id, (long)sc, ctask->itt, sc->request_bufflen, - session->cmdsn, session->max_cmdsn - session->exp_cmdsn + 1); - spin_unlock(&session->lock); - - scsi_queue_work(host, &conn->xmitwork); - return 0; - -reject: - spin_unlock(&session->lock); - debug_scsi("cmd 0x%x rejected (%d)\n", sc->cmnd[0], reason); - return SCSI_MLQUEUE_HOST_BUSY; - -fault: - spin_unlock(&session->lock); - printk(KERN_ERR "iscsi_tcp: cmd 0x%x is not queued (%d)\n", - sc->cmnd[0], reason); - sc->sense_buffer[0] = 0x70; - sc->sense_buffer[2] = NOT_READY; - sc->sense_buffer[7] = 0x6; - sc->sense_buffer[12] = 0x08; - sc->sense_buffer[13] = 0x00; - sc->result = (DID_NO_CONNECT << 16); - sc->resid = sc->request_bufflen; - sc->scsi_done(sc); - return 0; -} - -static int -iscsi_change_queue_depth(struct scsi_device *sdev, int depth) -{ - if (depth > ISCSI_MAX_CMD_PER_LUN) - depth = ISCSI_MAX_CMD_PER_LUN; - scsi_adjust_queue_depth(sdev, scsi_get_tag_type(sdev), depth); - return sdev->queue_depth; -} - -static int -iscsi_pool_init(struct iscsi_queue *q, int max, void ***items, int item_size) -{ - int i; - - *items = kmalloc(max * sizeof(void*), GFP_KERNEL); - if (*items == NULL) - return -ENOMEM; - - q->max = max; - q->pool = kmalloc(max * sizeof(void*), GFP_KERNEL); - if (q->pool == NULL) { - kfree(*items); - return -ENOMEM; - } - - q->queue = kfifo_init((void*)q->pool, max * sizeof(void*), - GFP_KERNEL, NULL); - if (q->queue == ERR_PTR(-ENOMEM)) { - kfree(q->pool); - kfree(*items); - return -ENOMEM; - } - - for (i = 0; i < max; i++) { - q->pool[i] = kmalloc(item_size, GFP_KERNEL); - if (q->pool[i] == NULL) { - int j; - - for (j = 0; j < i; j++) - kfree(q->pool[j]); - - kfifo_free(q->queue); - kfree(q->pool); - kfree(*items); - return -ENOMEM; - } - memset(q->pool[i], 0, item_size); - (*items)[i] = q->pool[i]; - __kfifo_put(q->queue, (void*)&q->pool[i], sizeof(void*)); - } - 
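The CmdSN window test in the removed queuecommand(), (int)(session->max_cmdsn - session->cmdsn) < 0, is serial-number arithmetic: the signed view of the unsigned difference stays correct across 32-bit wrap-around. Runnable check:

#include <stdio.h>
#include <stdint.h>

/* window closed iff cmdsn has run past max_cmdsn, wrap-safe */
static int window_closed(uint32_t max_cmdsn, uint32_t cmdsn)
{
    return (int32_t)(max_cmdsn - cmdsn) < 0;
}

int main(void)
{
    printf("%d\n", window_closed(10, 5));           /* 0: open */
    printf("%d\n", window_closed(5, 10));           /* 1: closed */
    /* max_cmdsn wrapped past zero but is still logically ahead */
    printf("%d\n", window_closed(3, 0xfffffffeu));  /* 0: open */
    return 0;
}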
return 0; -} - -static void -iscsi_pool_free(struct iscsi_queue *q, void **items) -{ - int i; - - for (i = 0; i < q->max; i++) - kfree(items[i]); - kfree(q->pool); - kfree(items); -} - static struct iscsi_cls_conn * -iscsi_conn_create(struct iscsi_cls_session *cls_session, uint32_t conn_idx) +iscsi_tcp_conn_create(struct iscsi_cls_session *cls_session, uint32_t conn_idx) { - struct Scsi_Host *shost = iscsi_session_to_shost(cls_session); - struct iscsi_session *session = iscsi_hostdata(shost->hostdata); struct iscsi_conn *conn; struct iscsi_cls_conn *cls_conn; + struct iscsi_tcp_conn *tcp_conn; - cls_conn = iscsi_create_conn(cls_session, conn_idx); + cls_conn = iscsi_conn_setup(cls_session, conn_idx); if (!cls_conn) return NULL; conn = cls_conn->dd_data; - memset(conn, 0, sizeof(*conn)); - - conn->cls_conn = cls_conn; - conn->c_stage = ISCSI_CONN_INITIAL_STAGE; - conn->in_progress = IN_PROGRESS_WAIT_HEADER; - conn->id = conn_idx; - conn->exp_statsn = 0; - conn->tmabort_state = TMABORT_INITIAL; - - /* initial operational parameters */ - conn->hdr_size = sizeof(struct iscsi_hdr); - conn->data_size = DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH; + /* + * due to strange issues with iser these are not set + * in iscsi_conn_setup + */ conn->max_recv_dlength = DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH; - /* initialize general xmit PDU commands queue */ - conn->xmitqueue = kfifo_alloc(session->cmds_max * sizeof(void*), - GFP_KERNEL, NULL); - if (conn->xmitqueue == ERR_PTR(-ENOMEM)) - goto xmitqueue_alloc_fail; - - /* initialize write response PDU commands queue */ - conn->writequeue = kfifo_alloc(session->cmds_max * sizeof(void*), - GFP_KERNEL, NULL); - if (conn->writequeue == ERR_PTR(-ENOMEM)) - goto writequeue_alloc_fail; - - /* initialize general immediate & non-immediate PDU commands queue */ - conn->immqueue = kfifo_alloc(session->mgmtpool_max * sizeof(void*), - GFP_KERNEL, NULL); - if (conn->immqueue == ERR_PTR(-ENOMEM)) - goto immqueue_alloc_fail; + tcp_conn = kzalloc(sizeof(*tcp_conn), GFP_KERNEL); + if (!tcp_conn) + goto tcp_conn_alloc_fail; - conn->mgmtqueue = kfifo_alloc(session->mgmtpool_max * sizeof(void*), - GFP_KERNEL, NULL); - if (conn->mgmtqueue == ERR_PTR(-ENOMEM)) - goto mgmtqueue_alloc_fail; - - INIT_WORK(&conn->xmitwork, iscsi_xmitworker, conn); - - /* allocate login_mtask used for the login/text sequences */ - spin_lock_bh(&session->lock); - if (!__kfifo_get(session->mgmtpool.queue, - (void*)&conn->login_mtask, - sizeof(void*))) { - spin_unlock_bh(&session->lock); - goto login_mtask_alloc_fail; - } - spin_unlock_bh(&session->lock); + conn->dd_data = tcp_conn; + tcp_conn->iscsi_conn = conn; + tcp_conn->in_progress = IN_PROGRESS_WAIT_HEADER; + /* initial operational parameters */ + tcp_conn->hdr_size = sizeof(struct iscsi_hdr); + tcp_conn->data_size = DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH; /* allocate initial PDU receive place holder */ - if (conn->data_size <= PAGE_SIZE) - conn->data = kmalloc(conn->data_size, GFP_KERNEL); + if (tcp_conn->data_size <= PAGE_SIZE) + tcp_conn->data = kmalloc(tcp_conn->data_size, GFP_KERNEL); else - conn->data = (void*)__get_free_pages(GFP_KERNEL, - get_order(conn->data_size)); - if (!conn->data) + tcp_conn->data = (void*)__get_free_pages(GFP_KERNEL, + get_order(tcp_conn->data_size)); + if (!tcp_conn->data) goto max_recv_dlenght_alloc_fail; - init_timer(&conn->tmabort_timer); - mutex_init(&conn->xmitmutex); - init_waitqueue_head(&conn->ehwait); - return cls_conn; max_recv_dlenght_alloc_fail: - spin_lock_bh(&session->lock); - __kfifo_put(session->mgmtpool.queue, 
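The removed iscsi_pool_init()/iscsi_pool_free() (they move into libiscsi) use a kfifo of pointers as a lock-light free list: every preallocated item is pushed in up front, __kfifo_get is "allocate" and __kfifo_put is "free". The same idea as a runnable user-space ring, with invented names:

#include <stdio.h>

#define POOL_MAX 4

static void *ring[POOL_MAX];
static int head, tail, count;

static void pool_put(void *p)           /* "free" */
{
    ring[head] = p;
    head = (head + 1) % POOL_MAX;
    count++;
}

static void *pool_get(void)             /* "allocate" */
{
    void *p;

    if (!count)
        return NULL;                    /* exhausted: caller backs off */
    p = ring[tail];
    tail = (tail + 1) % POOL_MAX;
    count--;
    return p;
}

int main(void)
{
    int items[POOL_MAX], i;

    for (i = 0; i < POOL_MAX; i++)
        pool_put(&items[i]);            /* pre-fill, as the driver does */
    printf("got %p\n", pool_get());
    return 0;
}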
(void*)&conn->login_mtask, - sizeof(void*)); - spin_unlock_bh(&session->lock); -login_mtask_alloc_fail: - kfifo_free(conn->mgmtqueue); -mgmtqueue_alloc_fail: - kfifo_free(conn->immqueue); -immqueue_alloc_fail: - kfifo_free(conn->writequeue); -writequeue_alloc_fail: - kfifo_free(conn->xmitqueue); -xmitqueue_alloc_fail: - iscsi_destroy_conn(cls_conn); + kfree(tcp_conn); +tcp_conn_alloc_fail: + iscsi_conn_teardown(cls_conn); return NULL; } static void -iscsi_conn_destroy(struct iscsi_cls_conn *cls_conn) +iscsi_tcp_conn_destroy(struct iscsi_cls_conn *cls_conn) { struct iscsi_conn *conn = cls_conn->dd_data; - struct iscsi_session *session = conn->session; - unsigned long flags; - - mutex_lock(&conn->xmitmutex); - set_bit(SUSPEND_BIT, &conn->suspend_tx); - if (conn->c_stage == ISCSI_CONN_INITIAL_STAGE && conn->sock) { - struct sock *sk = conn->sock->sk; - - /* - * conn_start() has never been called! - * need to cleanup the socket. - */ - write_lock_bh(&sk->sk_callback_lock); - set_bit(SUSPEND_BIT, &conn->suspend_rx); - write_unlock_bh(&sk->sk_callback_lock); + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + int digest = 0; - sock_hold(conn->sock->sk); - iscsi_conn_restore_callbacks(conn); - sock_put(conn->sock->sk); - sock_release(conn->sock); - conn->sock = NULL; - } - - spin_lock_bh(&session->lock); - conn->c_stage = ISCSI_CONN_CLEANUP_WAIT; - if (session->leadconn == conn) { - /* - * leading connection? then give up on recovery. - */ - session->state = ISCSI_STATE_TERMINATE; - wake_up(&conn->ehwait); - } - spin_unlock_bh(&session->lock); - - mutex_unlock(&conn->xmitmutex); + if (conn->hdrdgst_en || conn->datadgst_en) + digest = 1; - /* - * Block until all in-progress commands for this connection - * time out or fail. - */ - for (;;) { - spin_lock_irqsave(session->host->host_lock, flags); - if (!session->host->host_busy) { /* OK for ERL == 0 */ - spin_unlock_irqrestore(session->host->host_lock, flags); - break; - } - spin_unlock_irqrestore(session->host->host_lock, flags); - msleep_interruptible(500); - printk("conn_destroy(): host_busy %d host_failed %d\n", - session->host->host_busy, session->host->host_failed); - /* - * force eh_abort() to unblock - */ - wake_up(&conn->ehwait); - } + iscsi_conn_teardown(cls_conn); - /* now free crypto */ - if (conn->hdrdgst_en || conn->datadgst_en) { - if (conn->tx_tfm) - crypto_free_tfm(conn->tx_tfm); - if (conn->rx_tfm) - crypto_free_tfm(conn->rx_tfm); - if (conn->data_tx_tfm) - crypto_free_tfm(conn->data_tx_tfm); - if (conn->data_rx_tfm) - crypto_free_tfm(conn->data_rx_tfm); + /* now free tcp_conn */ + if (digest) { + if (tcp_conn->tx_tfm) + crypto_free_tfm(tcp_conn->tx_tfm); + if (tcp_conn->rx_tfm) + crypto_free_tfm(tcp_conn->rx_tfm); + if (tcp_conn->data_tx_tfm) + crypto_free_tfm(tcp_conn->data_tx_tfm); + if (tcp_conn->data_rx_tfm) + crypto_free_tfm(tcp_conn->data_rx_tfm); } /* free conn->data, size = MaxRecvDataSegmentLength */ - if (conn->data_size <= PAGE_SIZE) - kfree(conn->data); + if (tcp_conn->data_size <= PAGE_SIZE) + kfree(tcp_conn->data); else - free_pages((unsigned long)conn->data, - get_order(conn->data_size)); - - spin_lock_bh(&session->lock); - __kfifo_put(session->mgmtpool.queue, (void*)&conn->login_mtask, - sizeof(void*)); - list_del(&conn->item); - if (list_empty(&session->connections)) - session->leadconn = NULL; - if (session->leadconn && session->leadconn == conn) - session->leadconn = container_of(session->connections.next, - struct iscsi_conn, item); - - if (session->leadconn == NULL) - /* none connections exits.. 
reset sequencing */ - session->cmdsn = session->max_cmdsn = session->exp_cmdsn = 1; - spin_unlock_bh(&session->lock); - - kfifo_free(conn->xmitqueue); - kfifo_free(conn->writequeue); - kfifo_free(conn->immqueue); - kfifo_free(conn->mgmtqueue); - - iscsi_destroy_conn(cls_conn); + free_pages((unsigned long)tcp_conn->data, + get_order(tcp_conn->data_size)); + kfree(tcp_conn); } static int -iscsi_conn_bind(struct iscsi_cls_session *cls_session, - struct iscsi_cls_conn *cls_conn, uint32_t transport_fd, - int is_leading) +iscsi_tcp_conn_bind(struct iscsi_cls_session *cls_session, + struct iscsi_cls_conn *cls_conn, uint64_t transport_eph, + int is_leading) { - struct Scsi_Host *shost = iscsi_session_to_shost(cls_session); - struct iscsi_session *session = iscsi_hostdata(shost->hostdata); - struct iscsi_conn *tmp = ERR_PTR(-EEXIST), *conn = cls_conn->dd_data; + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; struct sock *sk; struct socket *sock; int err; /* lookup for existing socket */ - sock = sockfd_lookup(transport_fd, &err); + sock = sockfd_lookup((int)transport_eph, &err); if (!sock) { printk(KERN_ERR "iscsi_tcp: sockfd_lookup failed %d\n", err); return -EEXIST; } - /* lookup for existing connection */ - spin_lock_bh(&session->lock); - list_for_each_entry(tmp, &session->connections, item) { - if (tmp == conn) { - if (conn->c_stage != ISCSI_CONN_STOPPED || - conn->stop_stage == STOP_CONN_TERM) { - printk(KERN_ERR "iscsi_tcp: can't bind " - "non-stopped connection (%d:%d)\n", - conn->c_stage, conn->stop_stage); - spin_unlock_bh(&session->lock); - return -EIO; - } - break; - } - } - if (tmp != conn) { - /* bind new iSCSI connection to session */ - conn->session = session; - - list_add(&conn->item, &session->connections); - } - spin_unlock_bh(&session->lock); + err = iscsi_conn_bind(cls_session, cls_conn, is_leading); + if (err) + return err; if (conn->stop_stage != STOP_CONN_SUSPEND) { /* bind iSCSI connection and socket */ - conn->sock = sock; + tcp_conn->sock = sock; /* setup Socket parameters */ sk = sock->sk; @@ -2679,488 +2013,78 @@ iscsi_conn_bind(struct iscsi_cls_session * Intercept TCP callbacks for sendfile like receive * processing. */ + conn->recv_lock = &sk->sk_callback_lock; iscsi_conn_set_callbacks(conn); - - conn->sendpage = conn->sock->ops->sendpage; - + tcp_conn->sendpage = tcp_conn->sock->ops->sendpage; /* * set receive state machine into initial state */ - conn->in_progress = IN_PROGRESS_WAIT_HEADER; + tcp_conn->in_progress = IN_PROGRESS_WAIT_HEADER; } - if (is_leading) - session->leadconn = conn; - - /* - * Unblock xmitworker(), Login Phase will pass through. - */ - clear_bit(SUSPEND_BIT, &conn->suspend_rx); - clear_bit(SUSPEND_BIT, &conn->suspend_tx); - return 0; } -static int -iscsi_conn_start(struct iscsi_cls_conn *cls_conn) +static void +iscsi_tcp_cleanup_ctask(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) { - struct iscsi_conn *conn = cls_conn->dd_data; - struct iscsi_session *session = conn->session; - struct sock *sk; - - /* FF phase warming up... 
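iscsi_tcp_conn_bind() below resolves the file descriptor userspace handed over into a kernel struct socket. sockfd_lookup() is the real helper of this era and returns the socket with a reference held; the fragment below shows only the shape of that step and is not driver code:

struct socket *sock;
int err;

sock = sockfd_lookup(fd, &err);         /* takes a reference on success */
if (!sock)
    return err;                         /* bad fd from userspace */
/*
 * Keep sock for the life of the connection and drop the reference with
 * sock_release() when the connection is torn down.
 */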
*/ - - if (session == NULL) { - printk(KERN_ERR "iscsi_tcp: can't start unbound connection\n"); - return -EPERM; - } + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + struct iscsi_r2t_info *r2t; - sk = conn->sock->sk; + /* flush ctask's r2t queues */ + while (__kfifo_get(tcp_ctask->r2tqueue, (void*)&r2t, sizeof(void*))) + __kfifo_put(tcp_ctask->r2tpool.queue, (void*)&r2t, + sizeof(void*)); - write_lock_bh(&sk->sk_callback_lock); - spin_lock_bh(&session->lock); - conn->c_stage = ISCSI_CONN_STARTED; - session->state = ISCSI_STATE_LOGGED_IN; - - switch(conn->stop_stage) { - case STOP_CONN_RECOVER: - /* - * unblock eh_abort() if it is blocked. re-try all - * commands after successful recovery - */ - session->conn_cnt++; - conn->stop_stage = 0; - conn->tmabort_state = TMABORT_INITIAL; - session->age++; - wake_up(&conn->ehwait); - break; - case STOP_CONN_TERM: - session->conn_cnt++; - conn->stop_stage = 0; - break; - case STOP_CONN_SUSPEND: - conn->stop_stage = 0; - clear_bit(SUSPEND_BIT, &conn->suspend_rx); - clear_bit(SUSPEND_BIT, &conn->suspend_tx); - break; - default: - break; - } - spin_unlock_bh(&session->lock); - write_unlock_bh(&sk->sk_callback_lock); - - return 0; + __iscsi_ctask_cleanup(conn, ctask); } static void -iscsi_conn_stop(struct iscsi_cls_conn *cls_conn, int flag) +iscsi_tcp_suspend_conn_rx(struct iscsi_conn *conn) { - struct iscsi_conn *conn = cls_conn->dd_data; - struct iscsi_session *session = conn->session; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; struct sock *sk; - unsigned long flags; - BUG_ON(!conn->sock); - sk = conn->sock->sk; + if (!tcp_conn->sock) + return; + + sk = tcp_conn->sock->sk; write_lock_bh(&sk->sk_callback_lock); - set_bit(SUSPEND_BIT, &conn->suspend_rx); + set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_rx); write_unlock_bh(&sk->sk_callback_lock); - - mutex_lock(&conn->xmitmutex); - - spin_lock_irqsave(session->host->host_lock, flags); - spin_lock(&session->lock); - conn->stop_stage = flag; - conn->c_stage = ISCSI_CONN_STOPPED; - set_bit(SUSPEND_BIT, &conn->suspend_tx); - - if (flag != STOP_CONN_SUSPEND) - session->conn_cnt--; - - if (session->conn_cnt == 0 || session->leadconn == conn) - session->state = ISCSI_STATE_FAILED; - - spin_unlock(&session->lock); - spin_unlock_irqrestore(session->host->host_lock, flags); - - if (flag == STOP_CONN_TERM || flag == STOP_CONN_RECOVER) { - struct iscsi_cmd_task *ctask; - struct iscsi_mgmt_task *mtask; - - /* - * Socket must go now. - */ - sock_hold(conn->sock->sk); - iscsi_conn_restore_callbacks(conn); - sock_put(conn->sock->sk); - - /* - * flush xmit queues. 
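iscsi_conn_set_callbacks()/iscsi_conn_restore_callbacks(), used in the stop and terminate paths here, are a plain save-and-swap of the socket's function pointers under sk_callback_lock. The same idiom in runnable user-space form, with invented names:

#include <stdio.h>

struct sock_ops {
    void (*data_ready)(void);
};

static void (*old_data_ready)(void);

static void hooked_data_ready(void)
{
    /* transport work would happen here ... */
    old_data_ready();               /* ... then chain to the saved hook */
}

static void set_callbacks(struct sock_ops *s)
{
    old_data_ready = s->data_ready; /* save the original */
    s->data_ready = hooked_data_ready;
}

static void restore_callbacks(struct sock_ops *s)
{
    s->data_ready = old_data_ready; /* put the original back */
}

static void orig_ready(void) { puts("original data_ready"); }

int main(void)
{
    struct sock_ops s = { orig_ready };

    set_callbacks(&s);
    s.data_ready();                 /* hook runs, then the original */
    restore_callbacks(&s);
    s.data_ready();                 /* original only */
    return 0;
}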
- */ - spin_lock_bh(&session->lock); - while (__kfifo_get(conn->writequeue, (void*)&ctask, - sizeof(void*)) || - __kfifo_get(conn->xmitqueue, (void*)&ctask, - sizeof(void*))) { - struct iscsi_r2t_info *r2t; - - /* - * flush ctask's r2t queues - */ - while (__kfifo_get(ctask->r2tqueue, (void*)&r2t, - sizeof(void*))) - __kfifo_put(ctask->r2tpool.queue, (void*)&r2t, - sizeof(void*)); - - spin_unlock_bh(&session->lock); - local_bh_disable(); - iscsi_ctask_cleanup(conn, ctask); - local_bh_enable(); - spin_lock_bh(&session->lock); - } - conn->ctask = NULL; - while (__kfifo_get(conn->immqueue, (void*)&mtask, - sizeof(void*)) || - __kfifo_get(conn->mgmtqueue, (void*)&mtask, - sizeof(void*))) { - __kfifo_put(session->mgmtpool.queue, - (void*)&mtask, sizeof(void*)); - } - conn->mtask = NULL; - spin_unlock_bh(&session->lock); - - /* - * release socket only after we stopped data_xmit() - * activity and flushed all outstandings - */ - sock_release(conn->sock); - conn->sock = NULL; - - /* - * for connection level recovery we should not calculate - * header digest. conn->hdr_size used for optimization - * in hdr_extract() and will be re-negotiated at - * set_param() time. - */ - if (flag == STOP_CONN_RECOVER) { - conn->hdr_size = sizeof(struct iscsi_hdr); - conn->hdrdgst_en = 0; - conn->datadgst_en = 0; - } - } - mutex_unlock(&conn->xmitmutex); -} - -static int -iscsi_conn_send_generic(struct iscsi_conn *conn, struct iscsi_hdr *hdr, - char *data, uint32_t data_size) -{ - struct iscsi_session *session = conn->session; - struct iscsi_nopout *nop = (struct iscsi_nopout *)hdr; - struct iscsi_mgmt_task *mtask; - - spin_lock_bh(&session->lock); - if (session->state == ISCSI_STATE_TERMINATE) { - spin_unlock_bh(&session->lock); - return -EPERM; - } - if (hdr->opcode == (ISCSI_OP_LOGIN | ISCSI_OP_IMMEDIATE) || - hdr->opcode == (ISCSI_OP_TEXT | ISCSI_OP_IMMEDIATE)) - /* - * Login and Text are sent serially, in - * request-followed-by-response sequence. - * Same mtask can be used. Same ITT must be used. - * Note that login_mtask is preallocated at conn_create(). - */ - mtask = conn->login_mtask; - else { - BUG_ON(conn->c_stage == ISCSI_CONN_INITIAL_STAGE); - BUG_ON(conn->c_stage == ISCSI_CONN_STOPPED); - - if (!__kfifo_get(session->mgmtpool.queue, - (void*)&mtask, sizeof(void*))) { - spin_unlock_bh(&session->lock); - return -ENOSPC; - } - } - - /* - * pre-format CmdSN and ExpStatSN for outgoing PDU. 
- */ - if (hdr->itt != cpu_to_be32(ISCSI_RESERVED_TAG)) { - hdr->itt = mtask->itt | (conn->id << CID_SHIFT) | - (session->age << AGE_SHIFT); - nop->cmdsn = cpu_to_be32(session->cmdsn); - if (conn->c_stage == ISCSI_CONN_STARTED && - !(hdr->opcode & ISCSI_OP_IMMEDIATE)) - session->cmdsn++; - } else - /* do not advance CmdSN */ - nop->cmdsn = cpu_to_be32(session->cmdsn); - - nop->exp_statsn = cpu_to_be32(conn->exp_statsn); - - memcpy(&mtask->hdr, hdr, sizeof(struct iscsi_hdr)); - - iscsi_buf_init_virt(&mtask->headbuf, (char*)&mtask->hdr, - sizeof(struct iscsi_hdr)); - - spin_unlock_bh(&session->lock); - - if (data_size) { - memcpy(mtask->data, data, data_size); - mtask->data_count = data_size; - } else - mtask->data_count = 0; - - mtask->xmstate = XMSTATE_IMM_HDR; - - if (mtask->data_count) { - iscsi_buf_init_iov(&mtask->sendbuf, (char*)mtask->data, - mtask->data_count); - } - - debug_scsi("mgmtpdu [op 0x%x hdr->itt 0x%x datalen %d]\n", - hdr->opcode, hdr->itt, data_size); - - /* - * since send_pdu() could be called at least from two contexts, - * we need to serialize __kfifo_put, so we don't have to take - * additional lock on fast data-path - */ - if (hdr->opcode & ISCSI_OP_IMMEDIATE) - __kfifo_put(conn->immqueue, (void*)&mtask, sizeof(void*)); - else - __kfifo_put(conn->mgmtqueue, (void*)&mtask, sizeof(void*)); - - scsi_queue_work(session->host, &conn->xmitwork); - return 0; } -static int -iscsi_eh_host_reset(struct scsi_cmnd *sc) +static void +iscsi_tcp_terminate_conn(struct iscsi_conn *conn) { - struct iscsi_cmd_task *ctask = (struct iscsi_cmd_task *)sc->SCp.ptr; - struct iscsi_conn *conn = ctask->conn; - struct iscsi_session *session = conn->session; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; - spin_lock_bh(&session->lock); - if (session->state == ISCSI_STATE_TERMINATE) { - debug_scsi("failing host reset: session terminated " - "[CID %d age %d]", conn->id, session->age); - spin_unlock_bh(&session->lock); - return FAILED; - } - spin_unlock_bh(&session->lock); + if (!tcp_conn->sock) + return; - debug_scsi("failing connection CID %d due to SCSI host reset " - "[itt 0x%x age %d]", conn->id, ctask->itt, - session->age); - iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED); + sock_hold(tcp_conn->sock->sk); + iscsi_conn_restore_callbacks(conn); + sock_put(tcp_conn->sock->sk); - return SUCCESS; + sock_release(tcp_conn->sock); + tcp_conn->sock = NULL; + conn->recv_lock = NULL; } +/* called with host lock */ static void -iscsi_tmabort_timedout(unsigned long data) -{ - struct iscsi_cmd_task *ctask = (struct iscsi_cmd_task *)data; - struct iscsi_conn *conn = ctask->conn; - struct iscsi_session *session = conn->session; - - spin_lock(&session->lock); - if (conn->tmabort_state == TMABORT_INITIAL) { - __kfifo_put(session->mgmtpool.queue, - (void*)&ctask->mtask, sizeof(void*)); - conn->tmabort_state = TMABORT_TIMEDOUT; - debug_scsi("tmabort timedout [sc %lx itt 0x%x]\n", - (long)ctask->sc, ctask->itt); - /* unblock eh_abort() */ - wake_up(&conn->ehwait); - } - spin_unlock(&session->lock); -} - -static int -iscsi_eh_abort(struct scsi_cmnd *sc) +iscsi_tcp_mgmt_init(struct iscsi_conn *conn, struct iscsi_mgmt_task *mtask, + char *data, uint32_t data_size) { - int rc; - struct iscsi_cmd_task *ctask = (struct iscsi_cmd_task *)sc->SCp.ptr; - struct iscsi_conn *conn = ctask->conn; - struct iscsi_session *session = conn->session; - - conn->eh_abort_cnt++; - debug_scsi("aborting [sc %lx itt 0x%x]\n", (long)sc, ctask->itt); - - /* - * two cases for ERL=0 here: - * - * 1) connection-level failure; - * 2) 
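The removed iscsi_conn_send_generic() advanced CmdSN only for non-immediate PDUs; per RFC 3720 an immediate PDU carries the current CmdSN but does not consume it. The rule in runnable form (0x40 is the iSCSI immediate-delivery opcode bit):

#include <stdio.h>
#include <stdint.h>

#define OP_IMMEDIATE 0x40

static uint32_t stamp_cmdsn(uint8_t opcode, uint32_t *cmdsn)
{
    uint32_t pdu_sn = *cmdsn;       /* every PDU carries the current SN */

    if (!(opcode & OP_IMMEDIATE))
        (*cmdsn)++;                 /* only non-immediate PDUs consume one */
    return pdu_sn;
}

int main(void)
{
    uint32_t cmdsn = 7;

    printf("imm pdu carries sn %u\n", stamp_cmdsn(OP_IMMEDIATE, &cmdsn));
    printf("cmd pdu carries sn %u\n", stamp_cmdsn(0x01, &cmdsn));
    printf("next cmdsn %u\n", cmdsn);   /* 8: only the command consumed one */
    return 0;
}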
recovery due protocol error; - */ - mutex_lock(&conn->xmitmutex); - spin_lock_bh(&session->lock); - if (session->state != ISCSI_STATE_LOGGED_IN) { - if (session->state == ISCSI_STATE_TERMINATE) { - spin_unlock_bh(&session->lock); - mutex_unlock(&conn->xmitmutex); - goto failed; - } - spin_unlock_bh(&session->lock); - } else { - struct iscsi_tm *hdr = &conn->tmhdr; - - /* - * Still LOGGED_IN... - */ - - if (!ctask->sc || sc->SCp.phase != session->age) { - /* - * 1) ctask completed before time out. But session - * is still ok => Happy Retry. - * 2) session was re-open during time out of ctask. - */ - spin_unlock_bh(&session->lock); - mutex_unlock(&conn->xmitmutex); - goto success; - } - conn->tmabort_state = TMABORT_INITIAL; - spin_unlock_bh(&session->lock); - - /* - * ctask timed out but session is OK - * ERL=0 requires task mgmt abort to be issued on each - * failed command. requests must be serialized. - */ - memset(hdr, 0, sizeof(struct iscsi_tm)); - hdr->opcode = ISCSI_OP_SCSI_TMFUNC | ISCSI_OP_IMMEDIATE; - hdr->flags = ISCSI_TM_FUNC_ABORT_TASK; - hdr->flags |= ISCSI_FLAG_CMD_FINAL; - memcpy(hdr->lun, ctask->hdr.lun, sizeof(hdr->lun)); - hdr->rtt = ctask->hdr.itt; - hdr->refcmdsn = ctask->hdr.cmdsn; - - rc = iscsi_conn_send_generic(conn, (struct iscsi_hdr *)hdr, - NULL, 0); - if (rc) { - iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED); - debug_scsi("abort sent failure [itt 0x%x]", ctask->itt); - } else { - struct iscsi_r2t_info *r2t; - - /* - * TMF abort vs. TMF response race logic - */ - spin_lock_bh(&session->lock); - ctask->mtask = (struct iscsi_mgmt_task *) - session->mgmt_cmds[(hdr->itt & ITT_MASK) - - ISCSI_MGMT_ITT_OFFSET]; - /* - * have to flush r2tqueue to avoid r2t leaks - */ - while (__kfifo_get(ctask->r2tqueue, (void*)&r2t, - sizeof(void*))) { - __kfifo_put(ctask->r2tpool.queue, (void*)&r2t, - sizeof(void*)); - } - if (conn->tmabort_state == TMABORT_INITIAL) { - conn->tmfcmd_pdus_cnt++; - conn->tmabort_timer.expires = 3*HZ + jiffies; - conn->tmabort_timer.function = - iscsi_tmabort_timedout; - conn->tmabort_timer.data = (unsigned long)ctask; - add_timer(&conn->tmabort_timer); - debug_scsi("abort sent [itt 0x%x]", ctask->itt); - } else { - if (!ctask->sc || - conn->tmabort_state == TMABORT_SUCCESS) { - conn->tmabort_state = TMABORT_INITIAL; - spin_unlock_bh(&session->lock); - mutex_unlock(&conn->xmitmutex); - goto success; - } - conn->tmabort_state = TMABORT_INITIAL; - iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED); - } - spin_unlock_bh(&session->lock); - } - } - mutex_unlock(&conn->xmitmutex); - + struct iscsi_tcp_mgmt_task *tcp_mtask = mtask->dd_data; - /* - * block eh thread until: - * - * 1) abort response; - * 2) abort timeout; - * 3) session re-opened; - * 4) session terminated; - */ - for (;;) { - int p_state = session->state; - - rc = wait_event_interruptible(conn->ehwait, - (p_state == ISCSI_STATE_LOGGED_IN ? - (session->state == ISCSI_STATE_TERMINATE || - conn->tmabort_state != TMABORT_INITIAL) : - (session->state == ISCSI_STATE_TERMINATE || - session->state == ISCSI_STATE_LOGGED_IN))); - if (rc) { - /* shutdown.. 
*/ - session->state = ISCSI_STATE_TERMINATE; - goto failed; - } - - if (signal_pending(current)) - flush_signals(current); - - if (session->state == ISCSI_STATE_TERMINATE) - goto failed; - - spin_lock_bh(&session->lock); - if (sc->SCp.phase == session->age && - (conn->tmabort_state == TMABORT_TIMEDOUT || - conn->tmabort_state == TMABORT_FAILED)) { - conn->tmabort_state = TMABORT_INITIAL; - if (!ctask->sc) { - /* - * ctask completed before tmf abort response or - * time out. - * But session is still ok => Happy Retry. - */ - spin_unlock_bh(&session->lock); - break; - } - spin_unlock_bh(&session->lock); - iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED); - continue; - } - spin_unlock_bh(&session->lock); - break; - } - -success: - debug_scsi("abort success [sc %lx itt 0x%x]\n", (long)sc, ctask->itt); - rc = SUCCESS; - goto exit; - -failed: - debug_scsi("abort failed [sc %lx itt 0x%x]\n", (long)sc, ctask->itt); - rc = FAILED; - -exit: - del_timer_sync(&conn->tmabort_timer); - - mutex_lock(&conn->xmitmutex); - if (conn->sock) { - struct sock *sk = conn->sock->sk; + iscsi_buf_init_virt(&tcp_mtask->headbuf, (char*)mtask->hdr, + sizeof(struct iscsi_hdr)); + tcp_mtask->xmstate = XMSTATE_IMM_HDR; - write_lock_bh(&sk->sk_callback_lock); - iscsi_ctask_cleanup(conn, ctask); - write_unlock_bh(&sk->sk_callback_lock); - } - mutex_unlock(&conn->xmitmutex); - return rc; + if (mtask->data_count) + iscsi_buf_init_iov(&tcp_mtask->sendbuf, (char*)mtask->data, + mtask->data_count); } static int @@ -3174,6 +2098,7 @@ iscsi_r2tpool_alloc(struct iscsi_session */ for (cmd_i = 0; cmd_i < session->cmds_max; cmd_i++) { struct iscsi_cmd_task *ctask = session->cmds[cmd_i]; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; /* * pre-allocated x4 as much r2ts to handle race when @@ -3182,16 +2107,18 @@ iscsi_r2tpool_alloc(struct iscsi_session */ /* R2T pool */ - if (iscsi_pool_init(&ctask->r2tpool, session->max_r2t * 4, - (void***)&ctask->r2ts, sizeof(struct iscsi_r2t_info))) { + if (iscsi_pool_init(&tcp_ctask->r2tpool, session->max_r2t * 4, + (void***)&tcp_ctask->r2ts, + sizeof(struct iscsi_r2t_info))) { goto r2t_alloc_fail; } /* R2T xmit queue */ - ctask->r2tqueue = kfifo_alloc( + tcp_ctask->r2tqueue = kfifo_alloc( session->max_r2t * 4 * sizeof(void*), GFP_KERNEL, NULL); - if (ctask->r2tqueue == ERR_PTR(-ENOMEM)) { - iscsi_pool_free(&ctask->r2tpool, (void**)ctask->r2ts); + if (tcp_ctask->r2tqueue == ERR_PTR(-ENOMEM)) { + iscsi_pool_free(&tcp_ctask->r2tpool, + (void**)tcp_ctask->r2ts); goto r2t_alloc_fail; } @@ -3200,24 +2127,28 @@ iscsi_r2tpool_alloc(struct iscsi_session * Data-Out PDU's within R2T-sequence can be quite big; * using mempool */ - ctask->datapool = mempool_create_slab_pool(ISCSI_DTASK_DEFAULT_MAX, - taskcache); - if (ctask->datapool == NULL) { - kfifo_free(ctask->r2tqueue); - iscsi_pool_free(&ctask->r2tpool, (void**)ctask->r2ts); + tcp_ctask->datapool = mempool_create_slab_pool(ISCSI_DTASK_DEFAULT_MAX, + taskcache); + if (tcp_ctask->datapool == NULL) { + kfifo_free(tcp_ctask->r2tqueue); + iscsi_pool_free(&tcp_ctask->r2tpool, + (void**)tcp_ctask->r2ts); goto r2t_alloc_fail; } - INIT_LIST_HEAD(&ctask->dataqueue); + INIT_LIST_HEAD(&tcp_ctask->dataqueue); } return 0; r2t_alloc_fail: for (i = 0; i < cmd_i; i++) { - mempool_destroy(session->cmds[i]->datapool); - kfifo_free(session->cmds[i]->r2tqueue); - iscsi_pool_free(&session->cmds[i]->r2tpool, - (void**)session->cmds[i]->r2ts); + struct iscsi_cmd_task *ctask = session->cmds[i]; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + + 
mempool_destroy(tcp_ctask->datapool); + kfifo_free(tcp_ctask->r2tqueue); + iscsi_pool_free(&tcp_ctask->r2tpool, + (void**)tcp_ctask->r2ts); } return -ENOMEM; } @@ -3228,127 +2159,14 @@ iscsi_r2tpool_free(struct iscsi_session int i; for (i = 0; i < session->cmds_max; i++) { - mempool_destroy(session->cmds[i]->datapool); - kfifo_free(session->cmds[i]->r2tqueue); - iscsi_pool_free(&session->cmds[i]->r2tpool, - (void**)session->cmds[i]->r2ts); - } -} + struct iscsi_cmd_task *ctask = session->cmds[i]; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; -static struct scsi_host_template iscsi_sht = { - .name = "iSCSI Initiator over TCP/IP, v." - ISCSI_VERSION_STR, - .queuecommand = iscsi_queuecommand, - .change_queue_depth = iscsi_change_queue_depth, - .can_queue = ISCSI_XMIT_CMDS_MAX - 1, - .sg_tablesize = ISCSI_SG_TABLESIZE, - .cmd_per_lun = ISCSI_DEF_CMD_PER_LUN, - .eh_abort_handler = iscsi_eh_abort, - .eh_host_reset_handler = iscsi_eh_host_reset, - .use_clustering = DISABLE_CLUSTERING, - .proc_name = "iscsi_tcp", - .this_id = -1, -}; - -static struct iscsi_transport iscsi_tcp_transport; - -static struct iscsi_cls_session * -iscsi_session_create(struct scsi_transport_template *scsit, - uint32_t initial_cmdsn, uint32_t *sid) -{ - struct Scsi_Host *shost; - struct iscsi_session *session; - int cmd_i; - - shost = iscsi_transport_create_session(scsit, &iscsi_tcp_transport); - if (!shost) - return NULL; - - session = iscsi_hostdata(shost->hostdata); - memset(session, 0, sizeof(struct iscsi_session)); - session->host = shost; - session->state = ISCSI_STATE_FREE; - session->mgmtpool_max = ISCSI_MGMT_CMDS_MAX; - session->cmds_max = ISCSI_XMIT_CMDS_MAX; - session->cmdsn = initial_cmdsn; - session->exp_cmdsn = initial_cmdsn + 1; - session->max_cmdsn = initial_cmdsn + 1; - session->max_r2t = 1; - *sid = shost->host_no; - - /* initialize SCSI PDU commands pool */ - if (iscsi_pool_init(&session->cmdpool, session->cmds_max, - (void***)&session->cmds, sizeof(struct iscsi_cmd_task))) - goto cmdpool_alloc_fail; - - /* pre-format cmds pool with ITT */ - for (cmd_i = 0; cmd_i < session->cmds_max; cmd_i++) - session->cmds[cmd_i]->itt = cmd_i; - - spin_lock_init(&session->lock); - INIT_LIST_HEAD(&session->connections); - - /* initialize immediate command pool */ - if (iscsi_pool_init(&session->mgmtpool, session->mgmtpool_max, - (void***)&session->mgmt_cmds, sizeof(struct iscsi_mgmt_task))) - goto mgmtpool_alloc_fail; - - - /* pre-format immediate cmds pool with ITT */ - for (cmd_i = 0; cmd_i < session->mgmtpool_max; cmd_i++) { - session->mgmt_cmds[cmd_i]->itt = ISCSI_MGMT_ITT_OFFSET + cmd_i; - session->mgmt_cmds[cmd_i]->data = kmalloc( - DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH, GFP_KERNEL); - if (!session->mgmt_cmds[cmd_i]->data) { - int j; - - for (j = 0; j < cmd_i; j++) - kfree(session->mgmt_cmds[j]->data); - goto immdata_alloc_fail; - } - } - - if (iscsi_r2tpool_alloc(session)) - goto r2tpool_alloc_fail; - - return hostdata_session(shost->hostdata); - -r2tpool_alloc_fail: - for (cmd_i = 0; cmd_i < session->mgmtpool_max; cmd_i++) - kfree(session->mgmt_cmds[cmd_i]->data); -immdata_alloc_fail: - iscsi_pool_free(&session->mgmtpool, (void**)session->mgmt_cmds); -mgmtpool_alloc_fail: - iscsi_pool_free(&session->cmdpool, (void**)session->cmds); -cmdpool_alloc_fail: - iscsi_transport_destroy_session(shost); - return NULL; -} - -static void -iscsi_session_destroy(struct iscsi_cls_session *cls_session) -{ - struct Scsi_Host *shost = iscsi_session_to_shost(cls_session); - struct iscsi_session *session = 
iscsi_hostdata(shost->hostdata); - int cmd_i; - struct iscsi_data_task *dtask, *n; - - for (cmd_i = 0; cmd_i < session->cmds_max; cmd_i++) { - struct iscsi_cmd_task *ctask = session->cmds[cmd_i]; - list_for_each_entry_safe(dtask, n, &ctask->dataqueue, item) { - list_del(&dtask->item); - mempool_free(dtask, ctask->datapool); - } + mempool_destroy(tcp_ctask->datapool); + kfifo_free(tcp_ctask->r2tqueue); + iscsi_pool_free(&tcp_ctask->r2tpool, + (void**)tcp_ctask->r2ts); } - - for (cmd_i = 0; cmd_i < session->mgmtpool_max; cmd_i++) - kfree(session->mgmt_cmds[cmd_i]->data); - - iscsi_r2tpool_free(session); - iscsi_pool_free(&session->mgmtpool, (void**)session->mgmt_cmds); - iscsi_pool_free(&session->cmdpool, (void**)session->cmds); - - iscsi_transport_destroy_session(shost); } static int @@ -3357,23 +2175,14 @@ iscsi_conn_set_param(struct iscsi_cls_co { struct iscsi_conn *conn = cls_conn->dd_data; struct iscsi_session *session = conn->session; - - spin_lock_bh(&session->lock); - if (conn->c_stage != ISCSI_CONN_INITIAL_STAGE && - conn->stop_stage != STOP_CONN_RECOVER) { - printk(KERN_ERR "iscsi_tcp: can not change parameter [%d]\n", - param); - spin_unlock_bh(&session->lock); - return 0; - } - spin_unlock_bh(&session->lock); + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; switch(param) { case ISCSI_PARAM_MAX_RECV_DLENGTH: { - char *saveptr = conn->data; + char *saveptr = tcp_conn->data; gfp_t flags = GFP_KERNEL; - if (conn->data_size >= value) { + if (tcp_conn->data_size >= value) { conn->max_recv_dlength = value; break; } @@ -3384,21 +2193,21 @@ iscsi_conn_set_param(struct iscsi_cls_co spin_unlock_bh(&session->lock); if (value <= PAGE_SIZE) - conn->data = kmalloc(value, flags); + tcp_conn->data = kmalloc(value, flags); else - conn->data = (void*)__get_free_pages(flags, + tcp_conn->data = (void*)__get_free_pages(flags, get_order(value)); - if (conn->data == NULL) { - conn->data = saveptr; + if (tcp_conn->data == NULL) { + tcp_conn->data = saveptr; return -ENOMEM; } - if (conn->data_size <= PAGE_SIZE) + if (tcp_conn->data_size <= PAGE_SIZE) kfree(saveptr); else free_pages((unsigned long)saveptr, - get_order(conn->data_size)); + get_order(tcp_conn->data_size)); conn->max_recv_dlength = value; - conn->data_size = value; + tcp_conn->data_size = value; } break; case ISCSI_PARAM_MAX_XMIT_DLENGTH: @@ -3406,49 +2215,51 @@ iscsi_conn_set_param(struct iscsi_cls_co break; case ISCSI_PARAM_HDRDGST_EN: conn->hdrdgst_en = value; - conn->hdr_size = sizeof(struct iscsi_hdr); + tcp_conn->hdr_size = sizeof(struct iscsi_hdr); if (conn->hdrdgst_en) { - conn->hdr_size += sizeof(__u32); - if (!conn->tx_tfm) - conn->tx_tfm = crypto_alloc_tfm("crc32c", 0); - if (!conn->tx_tfm) + tcp_conn->hdr_size += sizeof(__u32); + if (!tcp_conn->tx_tfm) + tcp_conn->tx_tfm = crypto_alloc_tfm("crc32c", + 0); + if (!tcp_conn->tx_tfm) return -ENOMEM; - if (!conn->rx_tfm) - conn->rx_tfm = crypto_alloc_tfm("crc32c", 0); - if (!conn->rx_tfm) { - crypto_free_tfm(conn->tx_tfm); + if (!tcp_conn->rx_tfm) + tcp_conn->rx_tfm = crypto_alloc_tfm("crc32c", + 0); + if (!tcp_conn->rx_tfm) { + crypto_free_tfm(tcp_conn->tx_tfm); return -ENOMEM; } } else { - if (conn->tx_tfm) - crypto_free_tfm(conn->tx_tfm); - if (conn->rx_tfm) - crypto_free_tfm(conn->rx_tfm); + if (tcp_conn->tx_tfm) + crypto_free_tfm(tcp_conn->tx_tfm); + if (tcp_conn->rx_tfm) + crypto_free_tfm(tcp_conn->rx_tfm); } break; case ISCSI_PARAM_DATADGST_EN: conn->datadgst_en = value; if (conn->datadgst_en) { - if (!conn->data_tx_tfm) - conn->data_tx_tfm = + if (!tcp_conn->data_tx_tfm) + 
tcp_conn->data_tx_tfm = crypto_alloc_tfm("crc32c", 0); - if (!conn->data_tx_tfm) + if (!tcp_conn->data_tx_tfm) return -ENOMEM; - if (!conn->data_rx_tfm) - conn->data_rx_tfm = + if (!tcp_conn->data_rx_tfm) + tcp_conn->data_rx_tfm = crypto_alloc_tfm("crc32c", 0); - if (!conn->data_rx_tfm) { - crypto_free_tfm(conn->data_tx_tfm); + if (!tcp_conn->data_rx_tfm) { + crypto_free_tfm(tcp_conn->data_tx_tfm); return -ENOMEM; } } else { - if (conn->data_tx_tfm) - crypto_free_tfm(conn->data_tx_tfm); - if (conn->data_rx_tfm) - crypto_free_tfm(conn->data_rx_tfm); + if (tcp_conn->data_tx_tfm) + crypto_free_tfm(tcp_conn->data_tx_tfm); + if (tcp_conn->data_rx_tfm) + crypto_free_tfm(tcp_conn->data_rx_tfm); } - conn->sendpage = conn->datadgst_en ? - sock_no_sendpage : conn->sock->ops->sendpage; + tcp_conn->sendpage = conn->datadgst_en ? + sock_no_sendpage : tcp_conn->sock->ops->sendpage; break; case ISCSI_PARAM_INITIAL_R2T_EN: session->initial_r2t_en = value; @@ -3489,6 +2300,9 @@ iscsi_conn_set_param(struct iscsi_cls_co BUG_ON(value); session->ofmarker_en = value; break; + case ISCSI_PARAM_EXP_STATSN: + conn->exp_statsn = value; + break; default: break; } @@ -3535,7 +2349,7 @@ iscsi_session_get_param(struct iscsi_cls *value = session->ofmarker_en; break; default: - return ISCSI_ERR_PARAM_NOT_FOUND; + return -EINVAL; } return 0; @@ -3546,6 +2360,8 @@ iscsi_conn_get_param(struct iscsi_cls_co enum iscsi_param param, uint32_t *value) { struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + struct inet_sock *inet; switch(param) { case ISCSI_PARAM_MAX_RECV_DLENGTH: @@ -3560,17 +2376,70 @@ iscsi_conn_get_param(struct iscsi_cls_co case ISCSI_PARAM_DATADGST_EN: *value = conn->datadgst_en; break; + case ISCSI_PARAM_CONN_PORT: + mutex_lock(&conn->xmitmutex); + if (!tcp_conn->sock) { + mutex_unlock(&conn->xmitmutex); + return -EINVAL; + } + + inet = inet_sk(tcp_conn->sock->sk); + *value = be16_to_cpu(inet->dport); + mutex_unlock(&conn->xmitmutex); + case ISCSI_PARAM_EXP_STATSN: + *value = conn->exp_statsn; + break; default: - return ISCSI_ERR_PARAM_NOT_FOUND; + return -EINVAL; } return 0; } +static int +iscsi_conn_get_str_param(struct iscsi_cls_conn *cls_conn, + enum iscsi_param param, char *buf) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; + struct sock *sk; + struct inet_sock *inet; + struct ipv6_pinfo *np; + int len = 0; + + switch (param) { + case ISCSI_PARAM_CONN_ADDRESS: + mutex_lock(&conn->xmitmutex); + if (!tcp_conn->sock) { + mutex_unlock(&conn->xmitmutex); + return -EINVAL; + } + + sk = tcp_conn->sock->sk; + if (sk->sk_family == PF_INET) { + inet = inet_sk(sk); + len = sprintf(buf, "%u.%u.%u.%u\n", + NIPQUAD(inet->daddr)); + } else { + np = inet6_sk(sk); + len = sprintf(buf, + "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", + NIP6(np->daddr)); + } + mutex_unlock(&conn->xmitmutex); + break; + default: + return -EINVAL; + } + + return len; +} + static void iscsi_conn_get_stats(struct iscsi_cls_conn *cls_conn, struct iscsi_stats *stats) { struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_tcp_conn *tcp_conn = conn->dd_data; stats->txdata_octets = conn->txdata_octets; stats->rxdata_octets = conn->rxdata_octets; @@ -3583,56 +2452,150 @@ iscsi_conn_get_stats(struct iscsi_cls_co stats->tmfrsp_pdus = conn->tmfrsp_pdus_cnt; stats->custom_length = 3; strcpy(stats->custom[0].desc, "tx_sendpage_failures"); - stats->custom[0].value = conn->sendpage_failures_cnt; + stats->custom[0].value = 
tcp_conn->sendpage_failures_cnt; strcpy(stats->custom[1].desc, "rx_discontiguous_hdr"); - stats->custom[1].value = conn->discontiguous_hdr_cnt; + stats->custom[1].value = tcp_conn->discontiguous_hdr_cnt; strcpy(stats->custom[2].desc, "eh_abort_cnt"); stats->custom[2].value = conn->eh_abort_cnt; } -static int -iscsi_conn_send_pdu(struct iscsi_cls_conn *cls_conn, struct iscsi_hdr *hdr, - char *data, uint32_t data_size) +static struct iscsi_cls_session * +iscsi_tcp_session_create(struct iscsi_transport *iscsit, + struct scsi_transport_template *scsit, + uint32_t initial_cmdsn, uint32_t *hostno) { - struct iscsi_conn *conn = cls_conn->dd_data; - int rc; + struct iscsi_cls_session *cls_session; + struct iscsi_session *session; + uint32_t hn; + int cmd_i; - mutex_lock(&conn->xmitmutex); - rc = iscsi_conn_send_generic(conn, hdr, data, data_size); - mutex_unlock(&conn->xmitmutex); + cls_session = iscsi_session_setup(iscsit, scsit, + sizeof(struct iscsi_tcp_cmd_task), + sizeof(struct iscsi_tcp_mgmt_task), + initial_cmdsn, &hn); + if (!cls_session) + return NULL; + *hostno = hn; - return rc; + session = class_to_transport_session(cls_session); + for (cmd_i = 0; cmd_i < session->cmds_max; cmd_i++) { + struct iscsi_cmd_task *ctask = session->cmds[cmd_i]; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + + ctask->hdr = &tcp_ctask->hdr; + } + + for (cmd_i = 0; cmd_i < session->mgmtpool_max; cmd_i++) { + struct iscsi_mgmt_task *mtask = session->mgmt_cmds[cmd_i]; + struct iscsi_tcp_mgmt_task *tcp_mtask = mtask->dd_data; + + mtask->hdr = &tcp_mtask->hdr; + } + + if (iscsi_r2tpool_alloc(class_to_transport_session(cls_session))) + goto r2tpool_alloc_fail; + + return cls_session; + +r2tpool_alloc_fail: + iscsi_session_teardown(cls_session); + return NULL; +} + +static void iscsi_tcp_session_destroy(struct iscsi_cls_session *cls_session) +{ + struct iscsi_session *session = class_to_transport_session(cls_session); + struct iscsi_data_task *dtask, *n; + int cmd_i; + + for (cmd_i = 0; cmd_i < session->cmds_max; cmd_i++) { + struct iscsi_cmd_task *ctask = session->cmds[cmd_i]; + struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data; + + list_for_each_entry_safe(dtask, n, &tcp_ctask->dataqueue, + item) { + list_del(&dtask->item); + mempool_free(dtask, tcp_ctask->datapool); + } + } + + iscsi_r2tpool_free(class_to_transport_session(cls_session)); + iscsi_session_teardown(cls_session); } +static struct scsi_host_template iscsi_sht = { + .name = "iSCSI Initiator over TCP/IP, v." 
+ ISCSI_VERSION_STR, + .queuecommand = iscsi_queuecommand, + .change_queue_depth = iscsi_change_queue_depth, + .can_queue = ISCSI_XMIT_CMDS_MAX - 1, + .sg_tablesize = ISCSI_SG_TABLESIZE, + .cmd_per_lun = ISCSI_DEF_CMD_PER_LUN, + .eh_abort_handler = iscsi_eh_abort, + .eh_host_reset_handler = iscsi_eh_host_reset, + .use_clustering = DISABLE_CLUSTERING, + .proc_name = "iscsi_tcp", + .this_id = -1, +}; + static struct iscsi_transport iscsi_tcp_transport = { .owner = THIS_MODULE, .name = "tcp", .caps = CAP_RECOVERY_L0 | CAP_MULTI_R2T | CAP_HDRDGST | CAP_DATADGST, + .param_mask = ISCSI_MAX_RECV_DLENGTH | + ISCSI_MAX_XMIT_DLENGTH | + ISCSI_HDRDGST_EN | + ISCSI_DATADGST_EN | + ISCSI_INITIAL_R2T_EN | + ISCSI_MAX_R2T | + ISCSI_IMM_DATA_EN | + ISCSI_FIRST_BURST | + ISCSI_MAX_BURST | + ISCSI_PDU_INORDER_EN | + ISCSI_DATASEQ_INORDER_EN | + ISCSI_ERL | + ISCSI_CONN_PORT | + ISCSI_CONN_ADDRESS | + ISCSI_EXP_STATSN, .host_template = &iscsi_sht, - .hostdata_size = sizeof(struct iscsi_session), .conndata_size = sizeof(struct iscsi_conn), .max_conn = 1, .max_cmd_len = ISCSI_TCP_MAX_CMD_LEN, - .create_session = iscsi_session_create, - .destroy_session = iscsi_session_destroy, - .create_conn = iscsi_conn_create, - .bind_conn = iscsi_conn_bind, - .destroy_conn = iscsi_conn_destroy, + /* session management */ + .create_session = iscsi_tcp_session_create, + .destroy_session = iscsi_tcp_session_destroy, + /* connection management */ + .create_conn = iscsi_tcp_conn_create, + .bind_conn = iscsi_tcp_conn_bind, + .destroy_conn = iscsi_tcp_conn_destroy, .set_param = iscsi_conn_set_param, .get_conn_param = iscsi_conn_get_param, + .get_conn_str_param = iscsi_conn_get_str_param, .get_session_param = iscsi_session_get_param, .start_conn = iscsi_conn_start, .stop_conn = iscsi_conn_stop, + /* these are called as part of conn recovery */ + .suspend_conn_recv = iscsi_tcp_suspend_conn_rx, + .terminate_conn = iscsi_tcp_terminate_conn, + /* IO */ .send_pdu = iscsi_conn_send_pdu, .get_stats = iscsi_conn_get_stats, + .init_cmd_task = iscsi_tcp_cmd_init, + .init_mgmt_task = iscsi_tcp_mgmt_init, + .xmit_cmd_task = iscsi_tcp_ctask_xmit, + .xmit_mgmt_task = iscsi_tcp_mtask_xmit, + .cleanup_cmd_task = iscsi_tcp_cleanup_ctask, + /* recovery */ + .session_recovery_timedout = iscsi_session_recovery_timedout, }; static int __init iscsi_tcp_init(void) { if (iscsi_max_lun < 1) { - printk(KERN_ERR "Invalid max_lun value of %u\n", iscsi_max_lun); + printk(KERN_ERR "iscsi_tcp: Invalid max_lun value of %u\n", + iscsi_max_lun); return -EINVAL; } iscsi_tcp_transport.max_lun = iscsi_max_lun; diff --git a/drivers/scsi/iscsi_tcp.h b/drivers/scsi/iscsi_tcp.h index ba26741..c591885 100644 --- a/drivers/scsi/iscsi_tcp.h +++ b/drivers/scsi/iscsi_tcp.h @@ -2,7 +2,8 @@ * iSCSI Initiator TCP Transport * Copyright (C) 2004 Dmitry Yusupov * Copyright (C) 2004 Alex Aizman - * Copyright (C) 2005 Mike Christie + * Copyright (C) 2005 - 2006 Mike Christie + * Copyright (C) 2006 Red Hat, Inc. All rights reserved. 
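The iscsi_tcp_transport template above is the crux of this split: libiscsi owns the generic session and connection state machine and reaches the transport only through the function pointers registered here. A minimal userspace sketch of that ops-table pattern follows; all names in it (transport_ops, generic_data_xmit, and so on) are hypothetical stand-ins, not kernel APIs.

#include <stdio.h>

struct conn;	/* opaque to the generic layer */

struct transport_ops {
	const char *name;
	int (*xmit_cmd_task)(struct conn *c);	/* transport does the I/O */
	int (*xmit_mgmt_task)(struct conn *c);
};

struct conn {
	const struct transport_ops *tt;	/* like session->tt in libiscsi */
	int pending_cmds;
};

/* generic layer: decides *when* to transmit, never *how* */
static int generic_data_xmit(struct conn *c)
{
	while (c->pending_cmds) {
		if (c->tt->xmit_cmd_task(c))
			return -1;	/* caller reschedules the worker */
		c->pending_cmds--;
	}
	return 0;
}

static int tcp_xmit_cmd(struct conn *c)
{
	printf("%s: one command sent over the socket\n", c->tt->name);
	return 0;
}

static int tcp_xmit_mgmt(struct conn *c)
{
	(void)c;
	return 0;
}

static const struct transport_ops tcp_ops = {
	.name		= "tcp",
	.xmit_cmd_task	= tcp_xmit_cmd,
	.xmit_mgmt_task	= tcp_xmit_mgmt,
};

int main(void)
{
	struct conn c = { .tt = &tcp_ops, .pending_cmds = 2 };

	return generic_data_xmit(&c);
}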
* maintained by open-iscsi@googlegroups.com * * This program is free software; you can redistribute it and/or modify @@ -21,20 +22,7 @@ #ifndef ISCSI_TCP_H #define ISCSI_TCP_H -/* Session's states */ -#define ISCSI_STATE_FREE 1 -#define ISCSI_STATE_LOGGED_IN 2 -#define ISCSI_STATE_FAILED 3 -#define ISCSI_STATE_TERMINATE 4 - -/* Connection's states */ -#define ISCSI_CONN_INITIAL_STAGE 0 -#define ISCSI_CONN_STARTED 1 -#define ISCSI_CONN_STOPPED 2 -#define ISCSI_CONN_CLEANUP_WAIT 3 - -/* Connection suspend "bit" */ -#define SUSPEND_BIT 1 +#include /* Socket's Receive state machine */ #define IN_PROGRESS_WAIT_HEADER 0x0 @@ -42,12 +30,6 @@ #define IN_PROGRESS_HEADER_GATHER 0x1 #define IN_PROGRESS_DATA_RECV 0x2 #define IN_PROGRESS_DDIGEST_RECV 0x3 -/* Task Mgmt states */ -#define TMABORT_INITIAL 0x0 -#define TMABORT_SUCCESS 0x1 -#define TMABORT_FAILED 0x2 -#define TMABORT_TIMEDOUT 0x3 - /* xmit state machine */ #define XMSTATE_IDLE 0x0 #define XMSTATE_R_HDR 0x1 @@ -62,34 +44,14 @@ #define XMSTATE_SOL_DATA 0x100 #define XMSTATE_W_PAD 0x200 #define XMSTATE_DATA_DIGEST 0x400 -#define ISCSI_CONN_MAX 1 #define ISCSI_CONN_RCVBUF_MIN 262144 #define ISCSI_CONN_SNDBUF_MIN 262144 #define ISCSI_PAD_LEN 4 #define ISCSI_R2T_MAX 16 -#define ISCSI_XMIT_CMDS_MAX 128 /* must be power of 2 */ -#define ISCSI_MGMT_CMDS_MAX 32 /* must be power of 2 */ -#define ISCSI_MGMT_ITT_OFFSET 0xa00 #define ISCSI_SG_TABLESIZE SG_ALL -#define ISCSI_DEF_CMD_PER_LUN 32 -#define ISCSI_MAX_CMD_PER_LUN 128 #define ISCSI_TCP_MAX_CMD_LEN 16 -#define ITT_MASK (0xfff) -#define CID_SHIFT 12 -#define CID_MASK (0xffff< +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct iscsi_session * +class_to_transport_session(struct iscsi_cls_session *cls_session) +{ + struct Scsi_Host *shost = iscsi_session_to_shost(cls_session); + return iscsi_hostdata(shost->hostdata); +} +EXPORT_SYMBOL_GPL(class_to_transport_session); + +#define INVALID_SN_DELTA 0xffff + +int +iscsi_check_assign_cmdsn(struct iscsi_session *session, struct iscsi_nopin *hdr) +{ + uint32_t max_cmdsn = be32_to_cpu(hdr->max_cmdsn); + uint32_t exp_cmdsn = be32_to_cpu(hdr->exp_cmdsn); + + if (max_cmdsn < exp_cmdsn -1 && + max_cmdsn > exp_cmdsn - INVALID_SN_DELTA) + return ISCSI_ERR_MAX_CMDSN; + if (max_cmdsn > session->max_cmdsn || + max_cmdsn < session->max_cmdsn - INVALID_SN_DELTA) + session->max_cmdsn = max_cmdsn; + if (exp_cmdsn > session->exp_cmdsn || + exp_cmdsn < session->exp_cmdsn - INVALID_SN_DELTA) + session->exp_cmdsn = exp_cmdsn; + + return 0; +} +EXPORT_SYMBOL_GPL(iscsi_check_assign_cmdsn); + +void iscsi_prep_unsolicit_data_pdu(struct iscsi_cmd_task *ctask, + struct iscsi_data *hdr, + int transport_data_cnt) +{ + struct iscsi_conn *conn = ctask->conn; + + memset(hdr, 0, sizeof(struct iscsi_data)); + hdr->ttt = cpu_to_be32(ISCSI_RESERVED_TAG); + hdr->datasn = cpu_to_be32(ctask->unsol_datasn); + ctask->unsol_datasn++; + hdr->opcode = ISCSI_OP_SCSI_DATA_OUT; + memcpy(hdr->lun, ctask->hdr->lun, sizeof(hdr->lun)); + + hdr->itt = ctask->hdr->itt; + hdr->exp_statsn = cpu_to_be32(conn->exp_statsn); + + hdr->offset = cpu_to_be32(ctask->total_length - + transport_data_cnt - + ctask->unsol_count); + + if (ctask->unsol_count > conn->max_xmit_dlength) { + hton24(hdr->dlength, conn->max_xmit_dlength); + ctask->data_count = conn->max_xmit_dlength; + hdr->flags = 0; + } else { + hton24(hdr->dlength, ctask->unsol_count); + ctask->data_count = ctask->unsol_count; + hdr->flags = 
ISCSI_FLAG_CMD_FINAL; + } +} +EXPORT_SYMBOL_GPL(iscsi_prep_unsolicit_data_pdu); + +/** + * iscsi_prep_scsi_cmd_pdu - prep iscsi scsi cmd pdu + * @ctask: iscsi cmd task + * + * Prep basic iSCSI PDU fields for a scsi cmd pdu. The LLD should set + * fields like dlength or final based on how much data it sends + */ +static void iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask) +{ + struct iscsi_conn *conn = ctask->conn; + struct iscsi_session *session = conn->session; + struct iscsi_cmd *hdr = ctask->hdr; + struct scsi_cmnd *sc = ctask->sc; + + hdr->opcode = ISCSI_OP_SCSI_CMD; + hdr->flags = ISCSI_ATTR_SIMPLE; + int_to_scsilun(sc->device->lun, (struct scsi_lun *)hdr->lun); + hdr->itt = ctask->itt | (conn->id << ISCSI_CID_SHIFT) | + (session->age << ISCSI_AGE_SHIFT); + hdr->data_length = cpu_to_be32(sc->request_bufflen); + hdr->cmdsn = cpu_to_be32(session->cmdsn); + session->cmdsn++; + hdr->exp_statsn = cpu_to_be32(conn->exp_statsn); + memcpy(hdr->cdb, sc->cmnd, sc->cmd_len); + memset(&hdr->cdb[sc->cmd_len], 0, MAX_COMMAND_SIZE - sc->cmd_len); + + if (sc->sc_data_direction == DMA_TO_DEVICE) { + hdr->flags |= ISCSI_FLAG_CMD_WRITE; + /* + * Write counters: + * + * imm_count bytes to be sent right after + * SCSI PDU Header + * + * unsol_count bytes(as Data-Out) to be sent + * without R2T ack right after + * immediate data + * + * r2t_data_count bytes to be sent via R2T ack's + * + * pad_count bytes to be sent as zero-padding + */ + ctask->imm_count = 0; + ctask->unsol_count = 0; + ctask->unsol_datasn = 0; + + if (session->imm_data_en) { + if (ctask->total_length >= session->first_burst) + ctask->imm_count = min(session->first_burst, + conn->max_xmit_dlength); + else + ctask->imm_count = min(ctask->total_length, + conn->max_xmit_dlength); + hton24(ctask->hdr->dlength, ctask->imm_count); + } else + zero_data(ctask->hdr->dlength); + + if (!session->initial_r2t_en) + ctask->unsol_count = min(session->first_burst, + ctask->total_length) - ctask->imm_count; + if (!ctask->unsol_count) + /* No unsolicit Data-Out's */ + ctask->hdr->flags |= ISCSI_FLAG_CMD_FINAL; + } else { + ctask->datasn = 0; + hdr->flags |= ISCSI_FLAG_CMD_FINAL; + zero_data(hdr->dlength); + + if (sc->sc_data_direction == DMA_FROM_DEVICE) + hdr->flags |= ISCSI_FLAG_CMD_READ; + } + + conn->scsicmd_pdus_cnt++; +} +EXPORT_SYMBOL_GPL(iscsi_prep_scsi_cmd_pdu); + +/** + * iscsi_complete_command - return command back to scsi-ml + * @session: iscsi session + * @ctask: iscsi cmd task + * + * Must be called with session lock. + * This function returns the scsi command to scsi-ml and returns + * the cmd task to the pool of available cmd tasks. + */ +static void iscsi_complete_command(struct iscsi_session *session, + struct iscsi_cmd_task *ctask) +{ + struct scsi_cmnd *sc = ctask->sc; + + ctask->sc = NULL; + list_del_init(&ctask->running); + __kfifo_put(session->cmdpool.queue, (void*)&ctask, sizeof(void*)); + sc->scsi_done(sc); +} + +/** + * iscsi_cmd_rsp - SCSI Command Response processing + * @conn: iscsi connection + * @hdr: iscsi header + * @ctask: scsi command task + * @data: cmd data buffer + * @datalen: len of buffer + * + * iscsi_cmd_rsp sets up the scsi_cmnd fields based on the PDU and + * then completes the command and task. 
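The ITT packing and the write-counter arithmetic in iscsi_prep_scsi_cmd_pdu() above can be checked in isolation. The sketch below redoes the same calculations as a standalone userspace program; the shift values and the negotiated parameters (first_burst, max_xmit_dlength) are illustrative assumptions, not the driver's constants.

#include <stdint.h>
#include <stdio.h>

/* illustrative shift values; the real ones live in the iscsi headers */
#define CID_SHIFT	12
#define AGE_SHIFT	28

/* pack task index, connection id and session age into one tag */
static uint32_t encode_itt(uint32_t task, uint32_t cid, uint32_t age)
{
	return task | (cid << CID_SHIFT) | (age << AGE_SHIFT);
}

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

int main(void)
{
	/* assumed negotiated values for a 128k write */
	uint32_t first_burst = 65536, max_xmit_dlength = 8192;
	uint32_t total = 131072;
	uint32_t imm, unsol;

	/* immediate data rides inside the command PDU itself */
	imm = (total >= first_burst) ?
		min_u32(first_burst, max_xmit_dlength) :
		min_u32(total, max_xmit_dlength);
	/* unsolicited Data-Outs cover the rest of the first burst */
	unsol = min_u32(first_burst, total) - imm;

	printf("itt=0x%08x imm=%u unsol=%u via-R2T=%u\n",
	       encode_itt(5, 1, 2), imm, unsol, total - imm - unsol);
	return 0;
}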
+ **/ +static int iscsi_scsi_cmd_rsp(struct iscsi_conn *conn, struct iscsi_hdr *hdr, + struct iscsi_cmd_task *ctask, char *data, + int datalen) +{ + int rc; + struct iscsi_cmd_rsp *rhdr = (struct iscsi_cmd_rsp *)hdr; + struct iscsi_session *session = conn->session; + struct scsi_cmnd *sc = ctask->sc; + + rc = iscsi_check_assign_cmdsn(session, (struct iscsi_nopin*)rhdr); + if (rc) { + sc->result = DID_ERROR << 16; + goto out; + } + + conn->exp_statsn = be32_to_cpu(rhdr->statsn) + 1; + + sc->result = (DID_OK << 16) | rhdr->cmd_status; + + if (rhdr->response != ISCSI_STATUS_CMD_COMPLETED) { + sc->result = DID_ERROR << 16; + goto out; + } + + if (rhdr->cmd_status == SAM_STAT_CHECK_CONDITION) { + int senselen; + + if (datalen < 2) { +invalid_datalen: + printk(KERN_ERR "iscsi: Got CHECK_CONDITION but " + "invalid data buffer size of %d\n", datalen); + sc->result = DID_BAD_TARGET << 16; + goto out; + } + + senselen = (data[0] << 8) | data[1]; + if (datalen < senselen) + goto invalid_datalen; + + memcpy(sc->sense_buffer, data + 2, + min(senselen, SCSI_SENSE_BUFFERSIZE)); + debug_scsi("copied %d bytes of sense\n", + min(senselen, SCSI_SENSE_BUFFERSIZE)); + } + + if (sc->sc_data_direction == DMA_TO_DEVICE) + goto out; + + if (rhdr->flags & ISCSI_FLAG_CMD_UNDERFLOW) { + int res_count = be32_to_cpu(rhdr->residual_count); + + if (res_count > 0 && res_count <= sc->request_bufflen) + sc->resid = res_count; + else + sc->result = (DID_BAD_TARGET << 16) | rhdr->cmd_status; + } else if (rhdr->flags & ISCSI_FLAG_CMD_BIDI_UNDERFLOW) + sc->result = (DID_BAD_TARGET << 16) | rhdr->cmd_status; + else if (rhdr->flags & ISCSI_FLAG_CMD_OVERFLOW) + sc->resid = be32_to_cpu(rhdr->residual_count); + +out: + debug_scsi("done [sc %lx res %d itt 0x%x]\n", + (long)sc, sc->result, ctask->itt); + conn->scsirsp_pdus_cnt++; + + iscsi_complete_command(conn->session, ctask); + return rc; +} + +/** + * __iscsi_complete_pdu - complete pdu + * @conn: iscsi conn + * @hdr: iscsi header + * @data: data buffer + * @datalen: len of data buffer + * + * Completes pdu processing by freeing any resources allocated at + * queuecommand or send generic. session lock must be held and verify + * itt must have been called. 
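For reference, the sense payload consumed by iscsi_scsi_cmd_rsp() above is a two-byte big-endian length followed by the sense bytes themselves. A small standalone sketch of that parse; the buffer size is a stand-in for SCSI_SENSE_BUFFERSIZE, and the bounds check here is slightly stricter than the driver's (it also accounts for the two length bytes).

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SENSE_BUF_SIZE 96	/* stand-in for SCSI_SENSE_BUFFERSIZE */

/* data-in of a CHECK_CONDITION response: 2-byte BE length, then sense */
static int copy_sense(const uint8_t *data, int datalen,
		      uint8_t *sense, int *copied)
{
	int senselen;

	if (datalen < 2)
		return -1;			/* malformed payload */
	senselen = (data[0] << 8) | data[1];
	if (datalen < senselen + 2)
		return -1;			/* driver tests datalen < senselen */
	if (senselen > SENSE_BUF_SIZE)
		senselen = SENSE_BUF_SIZE;	/* truncate, as the driver does */
	memcpy(sense, data + 2, senselen);
	*copied = senselen;
	return 0;
}

int main(void)
{
	uint8_t pdu[] = { 0x00, 0x03, 0x70, 0x00, 0x05 };	/* len 3 */
	uint8_t sense[SENSE_BUF_SIZE];
	int n;

	if (!copy_sense(pdu, sizeof(pdu), sense, &n))
		printf("copied %d bytes of sense, response code 0x%x\n",
		       n, sense[0] & 0x7f);
	return 0;
}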
+ */ +int __iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr, + char *data, int datalen) +{ + struct iscsi_session *session = conn->session; + int opcode = hdr->opcode & ISCSI_OPCODE_MASK, rc = 0; + struct iscsi_cmd_task *ctask; + struct iscsi_mgmt_task *mtask; + uint32_t itt; + + if (hdr->itt != cpu_to_be32(ISCSI_RESERVED_TAG)) + itt = hdr->itt & ISCSI_ITT_MASK; + else + itt = hdr->itt; + + if (itt < session->cmds_max) { + ctask = session->cmds[itt]; + + debug_scsi("cmdrsp [op 0x%x cid %d itt 0x%x len %d]\n", + opcode, conn->id, ctask->itt, datalen); + + switch(opcode) { + case ISCSI_OP_SCSI_CMD_RSP: + BUG_ON((void*)ctask != ctask->sc->SCp.ptr); + rc = iscsi_scsi_cmd_rsp(conn, hdr, ctask, data, + datalen); + break; + case ISCSI_OP_SCSI_DATA_IN: + BUG_ON((void*)ctask != ctask->sc->SCp.ptr); + if (hdr->flags & ISCSI_FLAG_DATA_STATUS) { + conn->scsirsp_pdus_cnt++; + iscsi_complete_command(session, ctask); + } + break; + case ISCSI_OP_R2T: + /* LLD handles this for now */ + break; + default: + rc = ISCSI_ERR_BAD_OPCODE; + break; + } + } else if (itt >= ISCSI_MGMT_ITT_OFFSET && + itt < ISCSI_MGMT_ITT_OFFSET + session->mgmtpool_max) { + mtask = session->mgmt_cmds[itt - ISCSI_MGMT_ITT_OFFSET]; + + debug_scsi("immrsp [op 0x%x cid %d itt 0x%x len %d]\n", + opcode, conn->id, mtask->itt, datalen); + + rc = iscsi_check_assign_cmdsn(session, + (struct iscsi_nopin*)hdr); + if (rc) + goto done; + + switch(opcode) { + case ISCSI_OP_LOGOUT_RSP: + conn->exp_statsn = be32_to_cpu(hdr->statsn) + 1; + /* fall through */ + case ISCSI_OP_LOGIN_RSP: + case ISCSI_OP_TEXT_RSP: + /* + * login related PDU's exp_statsn is handled in + * userspace + */ + rc = iscsi_recv_pdu(conn->cls_conn, hdr, data, datalen); + list_del(&mtask->running); + if (conn->login_mtask != mtask) + __kfifo_put(session->mgmtpool.queue, + (void*)&mtask, sizeof(void*)); + break; + case ISCSI_OP_SCSI_TMFUNC_RSP: + if (datalen) { + rc = ISCSI_ERR_PROTO; + break; + } + + conn->exp_statsn = be32_to_cpu(hdr->statsn) + 1; + conn->tmfrsp_pdus_cnt++; + if (conn->tmabort_state == TMABORT_INITIAL) { + conn->tmabort_state = + ((struct iscsi_tm_rsp *)hdr)-> + response == ISCSI_TMF_RSP_COMPLETE ? 
+ TMABORT_SUCCESS:TMABORT_FAILED; + /* unblock eh_abort() */ + wake_up(&conn->ehwait); + } + break; + case ISCSI_OP_NOOP_IN: + if (hdr->ttt != ISCSI_RESERVED_TAG) { + rc = ISCSI_ERR_PROTO; + break; + } + conn->exp_statsn = be32_to_cpu(hdr->statsn) + 1; + + rc = iscsi_recv_pdu(conn->cls_conn, hdr, data, datalen); + list_del(&mtask->running); + if (conn->login_mtask != mtask) + __kfifo_put(session->mgmtpool.queue, + (void*)&mtask, sizeof(void*)); + break; + default: + rc = ISCSI_ERR_BAD_OPCODE; + break; + } + } else if (itt == ISCSI_RESERVED_TAG) { + switch(opcode) { + case ISCSI_OP_NOOP_IN: + if (!datalen) { + rc = iscsi_check_assign_cmdsn(session, + (struct iscsi_nopin*)hdr); + if (!rc && hdr->ttt != ISCSI_RESERVED_TAG) + rc = iscsi_recv_pdu(conn->cls_conn, + hdr, NULL, 0); + } else + rc = ISCSI_ERR_PROTO; + break; + case ISCSI_OP_REJECT: + /* we need sth like iscsi_reject_rsp()*/ + case ISCSI_OP_ASYNC_EVENT: + conn->exp_statsn = be32_to_cpu(hdr->statsn) + 1; + /* we need sth like iscsi_async_event_rsp() */ + rc = ISCSI_ERR_BAD_OPCODE; + break; + default: + rc = ISCSI_ERR_BAD_OPCODE; + break; + } + } else + rc = ISCSI_ERR_BAD_ITT; + +done: + return rc; +} +EXPORT_SYMBOL_GPL(__iscsi_complete_pdu); + +int iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr, + char *data, int datalen) +{ + int rc; + + spin_lock(&conn->session->lock); + rc = __iscsi_complete_pdu(conn, hdr, data, datalen); + spin_unlock(&conn->session->lock); + return rc; +} +EXPORT_SYMBOL_GPL(iscsi_complete_pdu); + +/* verify itt (itt encoding: age+cid+itt) */ +int iscsi_verify_itt(struct iscsi_conn *conn, struct iscsi_hdr *hdr, + uint32_t *ret_itt) +{ + struct iscsi_session *session = conn->session; + struct iscsi_cmd_task *ctask; + uint32_t itt; + + if (hdr->itt != cpu_to_be32(ISCSI_RESERVED_TAG)) { + if ((hdr->itt & ISCSI_AGE_MASK) != + (session->age << ISCSI_AGE_SHIFT)) { + printk(KERN_ERR "iscsi: received itt %x expected " + "session age (%x)\n", hdr->itt, + session->age & ISCSI_AGE_MASK); + return ISCSI_ERR_BAD_ITT; + } + + if ((hdr->itt & ISCSI_CID_MASK) != + (conn->id << ISCSI_CID_SHIFT)) { + printk(KERN_ERR "iscsi: received itt %x, expected " + "CID (%x)\n", hdr->itt, conn->id); + return ISCSI_ERR_BAD_ITT; + } + itt = hdr->itt & ISCSI_ITT_MASK; + } else + itt = hdr->itt; + + if (itt < session->cmds_max) { + ctask = session->cmds[itt]; + + if (!ctask->sc) { + printk(KERN_INFO "iscsi: dropping ctask with " + "itt 0x%x\n", ctask->itt); + /* force drop */ + return ISCSI_ERR_NO_SCSI_CMD; + } + + if (ctask->sc->SCp.phase != session->age) { + printk(KERN_ERR "iscsi: ctask's session age %d, " + "expected %d\n", ctask->sc->SCp.phase, + session->age); + return ISCSI_ERR_SESSION_FAILED; + } + } + + *ret_itt = itt; + return 0; +} +EXPORT_SYMBOL_GPL(iscsi_verify_itt); + +void iscsi_conn_failure(struct iscsi_conn *conn, enum iscsi_err err) +{ + struct iscsi_session *session = conn->session; + unsigned long flags; + + spin_lock_irqsave(&session->lock, flags); + if (session->conn_cnt == 1 || session->leadconn == conn) + session->state = ISCSI_STATE_FAILED; + spin_unlock_irqrestore(&session->lock, flags); + set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); + set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_rx); + iscsi_conn_error(conn->cls_conn, err); +} +EXPORT_SYMBOL_GPL(iscsi_conn_failure); + +/** + * iscsi_data_xmit - xmit any command into the scheduled connection + * @conn: iscsi connection + * + * Notes: + * The function can return -EAGAIN in which case the caller must + * re-schedule it again later or recover. 
'0' return code means + * successful xmit. + **/ +static int iscsi_data_xmit(struct iscsi_conn *conn) +{ + struct iscsi_transport *tt; + + if (unlikely(conn->suspend_tx)) { + debug_scsi("conn %d Tx suspended!\n", conn->id); + return 0; + } + tt = conn->session->tt; + + /* + * Transmit in the following order: + * + * 1) un-finished xmit (ctask or mtask) + * 2) immediate control PDUs + * 3) write data + * 4) SCSI commands + * 5) non-immediate control PDUs + * + * No need to lock around __kfifo_get as long as + * there's one producer and one consumer. + */ + + BUG_ON(conn->ctask && conn->mtask); + + if (conn->ctask) { + if (tt->xmit_cmd_task(conn, conn->ctask)) + goto again; + /* done with this in-progress ctask */ + conn->ctask = NULL; + } + if (conn->mtask) { + if (tt->xmit_mgmt_task(conn, conn->mtask)) + goto again; + /* done with this in-progress mtask */ + conn->mtask = NULL; + } + + /* process immediate first */ + if (unlikely(__kfifo_len(conn->immqueue))) { + while (__kfifo_get(conn->immqueue, (void*)&conn->mtask, + sizeof(void*))) { + list_add_tail(&conn->mtask->running, + &conn->mgmt_run_list); + if (tt->xmit_mgmt_task(conn, conn->mtask)) + goto again; + } + /* done with this mtask */ + conn->mtask = NULL; + } + + /* process command queue */ + while (__kfifo_get(conn->xmitqueue, (void*)&conn->ctask, + sizeof(void*))) { + /* + * iscsi tcp may readd the task to the xmitqueue to send + * write data + */ + if (list_empty(&conn->ctask->running)) + list_add_tail(&conn->ctask->running, &conn->run_list); + if (tt->xmit_cmd_task(conn, conn->ctask)) + goto again; + } + /* done with this ctask */ + conn->ctask = NULL; + + /* process the rest control plane PDUs, if any */ + if (unlikely(__kfifo_len(conn->mgmtqueue))) { + while (__kfifo_get(conn->mgmtqueue, (void*)&conn->mtask, + sizeof(void*))) { + list_add_tail(&conn->mtask->running, + &conn->mgmt_run_list); + if (tt->xmit_mgmt_task(conn, conn->mtask)) + goto again; + } + /* done with this mtask */ + conn->mtask = NULL; + } + + return 0; + +again: + if (unlikely(conn->suspend_tx)) + return 0; + + return -EAGAIN; +} + +static void iscsi_xmitworker(void *data) +{ + struct iscsi_conn *conn = data; + + /* + * serialize Xmit worker on a per-connection basis. 
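The transmit ordering documented in iscsi_data_xmit() above (resume any in-progress task, then immediate PDUs, then commands, then non-immediate control PDUs) can be modeled with plain queues. A compact userspace sketch follows; xmit() returning nonzero stands in for -EAGAIN, and the single in_progress slot is a simplification of the separate conn->ctask and conn->mtask pointers.

#include <stdio.h>

struct queue { int items[8]; unsigned head, tail; };

static int q_get(struct queue *q, int *v)
{
	if (q->head == q->tail)
		return 0;
	*v = q->items[q->head++ & 7];
	return 1;
}

static void q_put(struct queue *q, int v)
{
	q->items[q->tail++ & 7] = v;
}

/* returns nonzero when the "socket" backs up, like -EAGAIN */
static int xmit(int task)
{
	printf("xmit task %d\n", task);
	return 0;
}

/* drain order: in-progress task, immediate PDUs, commands, then
 * non-immediate control PDUs */
static int data_xmit(struct queue *imm, struct queue *cmds,
		     struct queue *mgmt, int *in_progress)
{
	int t;

	if (*in_progress) {
		if (xmit(*in_progress))
			return -1;
		*in_progress = 0;
	}
	while (q_get(imm, &t))
		if (xmit(t)) { *in_progress = t; return -1; }
	while (q_get(cmds, &t))
		if (xmit(t)) { *in_progress = t; return -1; }
	while (q_get(mgmt, &t))
		if (xmit(t)) { *in_progress = t; return -1; }
	return 0;
}

int main(void)
{
	struct queue imm = {{0}}, cmds = {{0}}, mgmt = {{0}};
	int in_progress = 0;

	q_put(&cmds, 1);
	q_put(&imm, 2);
	q_put(&mgmt, 3);
	return data_xmit(&imm, &cmds, &mgmt, &in_progress); /* 2, 1, 3 */
}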
+ */ + mutex_lock(&conn->xmitmutex); + if (iscsi_data_xmit(conn)) + scsi_queue_work(conn->session->host, &conn->xmitwork); + mutex_unlock(&conn->xmitmutex); +} + +enum { + FAILURE_BAD_HOST = 1, + FAILURE_SESSION_FAILED, + FAILURE_SESSION_FREED, + FAILURE_WINDOW_CLOSED, + FAILURE_SESSION_TERMINATE, + FAILURE_SESSION_RECOVERY_TIMEOUT, +}; + +int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *)) +{ + struct Scsi_Host *host; + int reason = 0; + struct iscsi_session *session; + struct iscsi_conn *conn; + struct iscsi_cmd_task *ctask = NULL; + + sc->scsi_done = done; + sc->result = 0; + + host = sc->device->host; + session = iscsi_hostdata(host->hostdata); + + spin_lock(&session->lock); + + if (session->state != ISCSI_STATE_LOGGED_IN) { + if (session->recovery_failed) { + reason = FAILURE_SESSION_RECOVERY_TIMEOUT; + goto fault; + } else if (session->state == ISCSI_STATE_FAILED) { + reason = FAILURE_SESSION_FAILED; + goto reject; + } else if (session->state == ISCSI_STATE_TERMINATE) { + reason = FAILURE_SESSION_TERMINATE; + goto fault; + } + reason = FAILURE_SESSION_FREED; + goto fault; + } + + /* + * Check for iSCSI window and take care of CmdSN wrap-around + */ + if ((int)(session->max_cmdsn - session->cmdsn) < 0) { + reason = FAILURE_WINDOW_CLOSED; + goto reject; + } + + conn = session->leadconn; + + __kfifo_get(session->cmdpool.queue, (void*)&ctask, sizeof(void*)); + sc->SCp.phase = session->age; + sc->SCp.ptr = (char *)ctask; + + ctask->mtask = NULL; + ctask->conn = conn; + ctask->sc = sc; + INIT_LIST_HEAD(&ctask->running); + ctask->total_length = sc->request_bufflen; + iscsi_prep_scsi_cmd_pdu(ctask); + + session->tt->init_cmd_task(ctask); + + __kfifo_put(conn->xmitqueue, (void*)&ctask, sizeof(void*)); + debug_scsi( + "ctask enq [%s cid %d sc %lx itt 0x%x len %d cmdsn %d win %d]\n", + sc->sc_data_direction == DMA_TO_DEVICE ? "write" : "read", + conn->id, (long)sc, ctask->itt, sc->request_bufflen, + session->cmdsn, session->max_cmdsn - session->exp_cmdsn + 1); + spin_unlock(&session->lock); + + scsi_queue_work(host, &conn->xmitwork); + return 0; + +reject: + spin_unlock(&session->lock); + debug_scsi("cmd 0x%x rejected (%d)\n", sc->cmnd[0], reason); + return SCSI_MLQUEUE_HOST_BUSY; + +fault: + spin_unlock(&session->lock); + printk(KERN_ERR "iscsi: cmd 0x%x is not queued (%d)\n", + sc->cmnd[0], reason); + sc->result = (DID_NO_CONNECT << 16); + sc->resid = sc->request_bufflen; + sc->scsi_done(sc); + return 0; +} +EXPORT_SYMBOL_GPL(iscsi_queuecommand); + +int iscsi_change_queue_depth(struct scsi_device *sdev, int depth) +{ + if (depth > ISCSI_MAX_CMD_PER_LUN) + depth = ISCSI_MAX_CMD_PER_LUN; + scsi_adjust_queue_depth(sdev, scsi_get_tag_type(sdev), depth); + return sdev->queue_depth; +} +EXPORT_SYMBOL_GPL(iscsi_change_queue_depth); + +static int +iscsi_conn_send_generic(struct iscsi_conn *conn, struct iscsi_hdr *hdr, + char *data, uint32_t data_size) +{ + struct iscsi_session *session = conn->session; + struct iscsi_nopout *nop = (struct iscsi_nopout *)hdr; + struct iscsi_mgmt_task *mtask; + + spin_lock_bh(&session->lock); + if (session->state == ISCSI_STATE_TERMINATE) { + spin_unlock_bh(&session->lock); + return -EPERM; + } + if (hdr->opcode == (ISCSI_OP_LOGIN | ISCSI_OP_IMMEDIATE) || + hdr->opcode == (ISCSI_OP_TEXT | ISCSI_OP_IMMEDIATE)) + /* + * Login and Text are sent serially, in + * request-followed-by-response sequence. + * Same mtask can be used. Same ITT must be used. + * Note that login_mtask is preallocated at conn_create(). 
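The window test in iscsi_queuecommand() above, (int)(session->max_cmdsn - session->cmdsn) < 0, relies on serial-number arithmetic so that 32-bit CmdSN wraparound does not close the window spuriously. A self-contained illustration with assumed values near the 2^32 boundary:

#include <stdint.h>
#include <stdio.h>

/* serial-number compare: survives 32-bit CmdSN wraparound */
static int window_closed(uint32_t max_cmdsn, uint32_t cmdsn)
{
	return (int32_t)(max_cmdsn - cmdsn) < 0;
}

int main(void)
{
	/* assumed values just before and after the wrap point */
	uint32_t cmdsn = 0xfffffffeU, max_cmdsn = 0x00000002U;

	/* a naive "cmdsn > max_cmdsn" would wrongly report closed here */
	printf("window %s\n",
	       window_closed(max_cmdsn, cmdsn) ? "closed" : "open");
	return 0;
}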
+	 */
+		mtask = conn->login_mtask;
+	else {
+		BUG_ON(conn->c_stage == ISCSI_CONN_INITIAL_STAGE);
+		BUG_ON(conn->c_stage == ISCSI_CONN_STOPPED);
+
+		nop->exp_statsn = cpu_to_be32(conn->exp_statsn);
+		if (!__kfifo_get(session->mgmtpool.queue,
+				 (void*)&mtask, sizeof(void*))) {
+			spin_unlock_bh(&session->lock);
+			return -ENOSPC;
+		}
+	}
+
+	/*
+	 * pre-format CmdSN for outgoing PDU.
+	 */
+	if (hdr->itt != cpu_to_be32(ISCSI_RESERVED_TAG)) {
+		hdr->itt = mtask->itt | (conn->id << ISCSI_CID_SHIFT) |
+			   (session->age << ISCSI_AGE_SHIFT);
+		nop->cmdsn = cpu_to_be32(session->cmdsn);
+		if (conn->c_stage == ISCSI_CONN_STARTED &&
+		    !(hdr->opcode & ISCSI_OP_IMMEDIATE))
+			session->cmdsn++;
+	} else
+		/* do not advance CmdSN */
+		nop->cmdsn = cpu_to_be32(session->cmdsn);
+
+	if (data_size) {
+		memcpy(mtask->data, data, data_size);
+		mtask->data_count = data_size;
+	} else
+		mtask->data_count = 0;
+
+	INIT_LIST_HEAD(&mtask->running);
+	memcpy(mtask->hdr, hdr, sizeof(struct iscsi_hdr));
+	if (session->tt->init_mgmt_task)
+		session->tt->init_mgmt_task(conn, mtask, data, data_size);
+	spin_unlock_bh(&session->lock);
+
+	debug_scsi("mgmtpdu [op 0x%x hdr->itt 0x%x datalen %d]\n",
+		   hdr->opcode, hdr->itt, data_size);
+
+	/*
+	 * since send_pdu() could be called at least from two contexts,
+	 * we need to serialize __kfifo_put, so we don't have to take
+	 * additional lock on fast data-path
+	 */
+	if (hdr->opcode & ISCSI_OP_IMMEDIATE)
+		__kfifo_put(conn->immqueue, (void*)&mtask, sizeof(void*));
+	else
+		__kfifo_put(conn->mgmtqueue, (void*)&mtask, sizeof(void*));
+
+	scsi_queue_work(session->host, &conn->xmitwork);
+	return 0;
+}
+
+int iscsi_conn_send_pdu(struct iscsi_cls_conn *cls_conn, struct iscsi_hdr *hdr,
+			char *data, uint32_t data_size)
+{
+	struct iscsi_conn *conn = cls_conn->dd_data;
+	int rc;
+
+	mutex_lock(&conn->xmitmutex);
+	rc = iscsi_conn_send_generic(conn, hdr, data, data_size);
+	mutex_unlock(&conn->xmitmutex);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(iscsi_conn_send_pdu);
+
+void iscsi_session_recovery_timedout(struct iscsi_cls_session *cls_session)
+{
+	struct iscsi_session *session = class_to_transport_session(cls_session);
+	struct iscsi_conn *conn = session->leadconn;
+
+	spin_lock_bh(&session->lock);
+	if (session->state != ISCSI_STATE_LOGGED_IN) {
+		session->recovery_failed = 1;
+		if (conn)
+			wake_up(&conn->ehwait);
+	}
+	spin_unlock_bh(&session->lock);
+}
+EXPORT_SYMBOL_GPL(iscsi_session_recovery_timedout);
+
+int iscsi_eh_host_reset(struct scsi_cmnd *sc)
+{
+	struct Scsi_Host *host = sc->device->host;
+	struct iscsi_session *session = iscsi_hostdata(host->hostdata);
+	struct iscsi_conn *conn = session->leadconn;
+	int fail_session = 0;
+
+	spin_lock_bh(&session->lock);
+	if (session->state == ISCSI_STATE_TERMINATE) {
+failed:
+		debug_scsi("failing host reset: session terminated "
+			   "[CID %d age %d]", conn->id, session->age);
+		spin_unlock_bh(&session->lock);
+		return FAILED;
+	}
+
+	if (sc->SCp.phase == session->age) {
+		debug_scsi("failing connection CID %d due to SCSI host reset",
+			   conn->id);
+		fail_session = 1;
+	}
+	spin_unlock_bh(&session->lock);
+
+	/*
+	 * we drop the lock here but the leadconn cannot be destroyed while
+	 * we are in the scsi eh
+	 */
+	if (fail_session) {
+		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+		/*
+		 * if userspace cannot respond then we must kick this off
+		 * here for it
+		 */
+		iscsi_start_session_recovery(session, conn, STOP_CONN_RECOVER);
+	}
+
+	debug_scsi("iscsi_eh_host_reset wait for relogin\n");
+	wait_event_interruptible(conn->ehwait,
+				 session->state == ISCSI_STATE_TERMINATE ||
+				 session->state == ISCSI_STATE_LOGGED_IN ||
+				 session->recovery_failed);
+	if (signal_pending(current))
+		flush_signals(current);
+
+	spin_lock_bh(&session->lock);
+	if (session->state == ISCSI_STATE_LOGGED_IN)
+		printk(KERN_INFO "iscsi: host reset succeeded\n");
+	else
+		goto failed;
+	spin_unlock_bh(&session->lock);
+
+	return SUCCESS;
+}
+EXPORT_SYMBOL_GPL(iscsi_eh_host_reset);
+
+static void iscsi_tmabort_timedout(unsigned long data)
+{
+	struct iscsi_cmd_task *ctask = (struct iscsi_cmd_task *)data;
+	struct iscsi_conn *conn = ctask->conn;
+	struct iscsi_session *session = conn->session;
+
+	spin_lock(&session->lock);
+	if (conn->tmabort_state == TMABORT_INITIAL) {
+		conn->tmabort_state = TMABORT_TIMEDOUT;
+		debug_scsi("tmabort timedout [sc %p itt 0x%x]\n",
+			   ctask->sc, ctask->itt);
+		/* unblock eh_abort() */
+		wake_up(&conn->ehwait);
+	}
+	spin_unlock(&session->lock);
+}
+
+/* must be called with the mutex lock */
+static int iscsi_exec_abort_task(struct scsi_cmnd *sc,
+				 struct iscsi_cmd_task *ctask)
+{
+	struct iscsi_conn *conn = ctask->conn;
+	struct iscsi_session *session = conn->session;
+	struct iscsi_tm *hdr = &conn->tmhdr;
+	int rc;
+
+	/*
+	 * ctask timed out but session is OK; requests must be serialized.
+	 */
+	memset(hdr, 0, sizeof(struct iscsi_tm));
+	hdr->opcode = ISCSI_OP_SCSI_TMFUNC | ISCSI_OP_IMMEDIATE;
+	hdr->flags = ISCSI_TM_FUNC_ABORT_TASK;
+	hdr->flags |= ISCSI_FLAG_CMD_FINAL;
+	memcpy(hdr->lun, ctask->hdr->lun, sizeof(hdr->lun));
+	hdr->rtt = ctask->hdr->itt;
+	hdr->refcmdsn = ctask->hdr->cmdsn;
+
+	rc = iscsi_conn_send_generic(conn, (struct iscsi_hdr *)hdr,
+				     NULL, 0);
+	if (rc) {
+		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+		debug_scsi("abort sent failure [itt 0x%x] %d", ctask->itt, rc);
+		return rc;
+	}
+
+	debug_scsi("abort sent [itt 0x%x]\n", ctask->itt);
+
+	spin_lock_bh(&session->lock);
+	ctask->mtask = (struct iscsi_mgmt_task *)
+			session->mgmt_cmds[(hdr->itt & ISCSI_ITT_MASK) -
+					   ISCSI_MGMT_ITT_OFFSET];
+
+	if (conn->tmabort_state == TMABORT_INITIAL) {
+		conn->tmfcmd_pdus_cnt++;
+		conn->tmabort_timer.expires = 10*HZ + jiffies;
+		conn->tmabort_timer.function = iscsi_tmabort_timedout;
+		conn->tmabort_timer.data = (unsigned long)ctask;
+		add_timer(&conn->tmabort_timer);
+		debug_scsi("abort set timeout [itt 0x%x]", ctask->itt);
+	}
+	spin_unlock_bh(&session->lock);
+	mutex_unlock(&conn->xmitmutex);
+
+	/*
+	 * block eh thread until:
+	 *
+	 * 1) abort response
+	 * 2) abort timeout
+	 * 3) session is terminated or restarted or userspace has
+	 *    given up on recovery
+	 */
+	wait_event_interruptible(conn->ehwait,
+				 sc->SCp.phase != session->age ||
+				 session->state != ISCSI_STATE_LOGGED_IN ||
+				 conn->tmabort_state != TMABORT_INITIAL ||
+				 session->recovery_failed);
+	if (signal_pending(current))
+		flush_signals(current);
+	del_timer_sync(&conn->tmabort_timer);
+
+	mutex_lock(&conn->xmitmutex);
+	return 0;
+}
+
+/*
+ * xmit mutex and session lock must be held
+ */
+#define iscsi_remove_task(tasktype)					\
+static struct iscsi_##tasktype *					\
+iscsi_remove_##tasktype(struct kfifo *fifo, uint32_t itt)		\
+{									\
+	int i, nr_tasks = __kfifo_len(fifo) / sizeof(void*);		\
+	struct iscsi_##tasktype *task;					\
+									\
+	debug_scsi("searching %d tasks\n", nr_tasks);			\
+									\
+	for (i = 0; i < nr_tasks; i++) {				\
+		__kfifo_get(fifo, (void*)&task, sizeof(void*));		\
+		debug_scsi("check task %u\n", task->itt);		\
+									\
+		if (task->itt == itt) {					\
+			debug_scsi("matched task\n");			\
+			return task;	/* hand it to the caller */	\
+		}							\
+									\
+		__kfifo_put(fifo, (void*)&task, sizeof(void*));		\
+	}								\
+	return NULL;
\ +} + +iscsi_remove_task(mgmt_task); +iscsi_remove_task(cmd_task); + +static int iscsi_ctask_mtask_cleanup(struct iscsi_cmd_task *ctask) +{ + struct iscsi_conn *conn = ctask->conn; + struct iscsi_session *session = conn->session; + + if (!ctask->mtask) + return -EINVAL; + + if (!iscsi_remove_mgmt_task(conn->immqueue, ctask->mtask->itt)) + list_del(&ctask->mtask->running); + __kfifo_put(session->mgmtpool.queue, (void*)&ctask->mtask, + sizeof(void*)); + ctask->mtask = NULL; + return 0; +} + +/* + * session lock and xmitmutex must be held + */ +static void fail_command(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask, + int err) +{ + struct scsi_cmnd *sc; + + conn->session->tt->cleanup_cmd_task(conn, ctask); + iscsi_ctask_mtask_cleanup(ctask); + + sc = ctask->sc; + if (!sc) + return; + sc->result = err; + sc->resid = sc->request_bufflen; + iscsi_complete_command(conn->session, ctask); +} + +int iscsi_eh_abort(struct scsi_cmnd *sc) +{ + struct iscsi_cmd_task *ctask = (struct iscsi_cmd_task *)sc->SCp.ptr; + struct iscsi_conn *conn = ctask->conn; + struct iscsi_session *session = conn->session; + struct iscsi_cmd_task *pending_ctask; + int rc; + + conn->eh_abort_cnt++; + debug_scsi("aborting [sc %p itt 0x%x]\n", sc, ctask->itt); + + mutex_lock(&conn->xmitmutex); + spin_lock_bh(&session->lock); + + /* + * If we are not logged in or we have started a new session + * then let the host reset code handle this + */ + if (session->state != ISCSI_STATE_LOGGED_IN || + sc->SCp.phase != session->age) + goto failed; + + /* ctask completed before time out */ + if (!ctask->sc) + goto success; + + /* what should we do here ? */ + if (conn->ctask == ctask) { + printk(KERN_INFO "iscsi: sc %p itt 0x%x partially sent. " + "Failing abort\n", sc, ctask->itt); + goto failed; + } + + /* check for the easy pending cmd abort */ + pending_ctask = iscsi_remove_cmd_task(conn->xmitqueue, ctask->itt); + if (pending_ctask) { + /* iscsi_tcp queues write transfers on the xmitqueue */ + if (list_empty(&pending_ctask->running)) { + debug_scsi("found pending task\n"); + goto success; + } else + __kfifo_put(conn->xmitqueue, (void*)&pending_ctask, + sizeof(void*)); + } + + conn->tmabort_state = TMABORT_INITIAL; + + spin_unlock_bh(&session->lock); + rc = iscsi_exec_abort_task(sc, ctask); + spin_lock_bh(&session->lock); + + iscsi_ctask_mtask_cleanup(ctask); + if (rc || sc->SCp.phase != session->age || + session->state != ISCSI_STATE_LOGGED_IN) + goto failed; + + /* ctask completed before tmf abort response */ + if (!ctask->sc) { + debug_scsi("sc completed while abort in progress\n"); + goto success; + } + + if (conn->tmabort_state != TMABORT_SUCCESS) { + spin_unlock_bh(&session->lock); + iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED); + spin_lock_bh(&session->lock); + goto failed; + } + +success: + debug_scsi("abort success [sc %lx itt 0x%x]\n", (long)sc, ctask->itt); + spin_unlock_bh(&session->lock); + + /* + * clean up task if aborted. 
we have the xmitmutex so grab + * the recv lock as a writer + */ + write_lock_bh(conn->recv_lock); + spin_lock(&session->lock); + fail_command(conn, ctask, DID_ABORT << 16); + spin_unlock(&session->lock); + write_unlock_bh(conn->recv_lock); + + mutex_unlock(&conn->xmitmutex); + return SUCCESS; + +failed: + spin_unlock_bh(&session->lock); + mutex_unlock(&conn->xmitmutex); + + debug_scsi("abort failed [sc %lx itt 0x%x]\n", (long)sc, ctask->itt); + return FAILED; +} +EXPORT_SYMBOL_GPL(iscsi_eh_abort); + +int +iscsi_pool_init(struct iscsi_queue *q, int max, void ***items, int item_size) +{ + int i; + + *items = kmalloc(max * sizeof(void*), GFP_KERNEL); + if (*items == NULL) + return -ENOMEM; + + q->max = max; + q->pool = kmalloc(max * sizeof(void*), GFP_KERNEL); + if (q->pool == NULL) { + kfree(*items); + return -ENOMEM; + } + + q->queue = kfifo_init((void*)q->pool, max * sizeof(void*), + GFP_KERNEL, NULL); + if (q->queue == ERR_PTR(-ENOMEM)) { + kfree(q->pool); + kfree(*items); + return -ENOMEM; + } + + for (i = 0; i < max; i++) { + q->pool[i] = kmalloc(item_size, GFP_KERNEL); + if (q->pool[i] == NULL) { + int j; + + for (j = 0; j < i; j++) + kfree(q->pool[j]); + + kfifo_free(q->queue); + kfree(q->pool); + kfree(*items); + return -ENOMEM; + } + memset(q->pool[i], 0, item_size); + (*items)[i] = q->pool[i]; + __kfifo_put(q->queue, (void*)&q->pool[i], sizeof(void*)); + } + return 0; +} +EXPORT_SYMBOL_GPL(iscsi_pool_init); + +void iscsi_pool_free(struct iscsi_queue *q, void **items) +{ + int i; + + for (i = 0; i < q->max; i++) + kfree(items[i]); + kfree(q->pool); + kfree(items); +} +EXPORT_SYMBOL_GPL(iscsi_pool_free); + +/* + * iSCSI Session's hostdata organization: + * + * *------------------* <== hostdata_session(host->hostdata) + * | ptr to class sess| + * |------------------| <== iscsi_hostdata(host->hostdata) + * | iscsi_session | + * *------------------* + */ + +#define hostdata_privsize(_sz) (sizeof(unsigned long) + _sz + \ + _sz % sizeof(unsigned long)) + +#define hostdata_session(_hostdata) (iscsi_ptr(*(unsigned long *)_hostdata)) + +/** + * iscsi_session_setup - create iscsi cls session and host and session + * @scsit: scsi transport template + * @iscsit: iscsi transport template + * @initial_cmdsn: initial CmdSN + * @hostno: host no allocated + * + * This can be used by software iscsi_transports that allocate + * a session per scsi host. 
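The hostdata diagram and the hostdata_session()/iscsi_hostdata() macros above describe a single allocation whose first unsigned long stores a pointer to the class session, with the iscsi_session placed immediately behind it. A userspace sketch of that layout; the alignment padding that hostdata_privsize() adds is elided and the struct contents are stand-ins.

#include <stdio.h>
#include <stdlib.h>

struct cls_session { int id; };
struct session { int state; };

/* first word of hostdata is the class-session pointer ... */
#define hostdata_session(hd)	((struct cls_session *)*(unsigned long *)(hd))
/* ... and the session proper starts right behind it */
#define iscsi_hostdata(hd)	((void *)((unsigned long *)(hd) + 1))

int main(void)
{
	struct cls_session cls = { .id = 7 };
	unsigned char *hostdata;
	struct session *sess;

	hostdata = calloc(1, sizeof(unsigned long) + sizeof(struct session));
	if (!hostdata)
		return 1;

	*(unsigned long *)hostdata = (unsigned long)&cls;	/* fill the slot */
	sess = iscsi_hostdata(hostdata);
	sess->state = 1;

	printf("cls id %d, session state %d\n",
	       hostdata_session(hostdata)->id, sess->state);
	free(hostdata);
	return 0;
}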
+ **/ +struct iscsi_cls_session * +iscsi_session_setup(struct iscsi_transport *iscsit, + struct scsi_transport_template *scsit, + int cmd_task_size, int mgmt_task_size, + uint32_t initial_cmdsn, uint32_t *hostno) +{ + struct Scsi_Host *shost; + struct iscsi_session *session; + struct iscsi_cls_session *cls_session; + int cmd_i; + + shost = scsi_host_alloc(iscsit->host_template, + hostdata_privsize(sizeof(*session))); + if (!shost) + return NULL; + + shost->max_id = 1; + shost->max_channel = 0; + shost->max_lun = iscsit->max_lun; + shost->max_cmd_len = iscsit->max_cmd_len; + shost->transportt = scsit; + shost->transportt->create_work_queue = 1; + *hostno = shost->host_no; + + session = iscsi_hostdata(shost->hostdata); + memset(session, 0, sizeof(struct iscsi_session)); + session->host = shost; + session->state = ISCSI_STATE_FREE; + session->mgmtpool_max = ISCSI_MGMT_CMDS_MAX; + session->cmds_max = ISCSI_XMIT_CMDS_MAX; + session->cmdsn = initial_cmdsn; + session->exp_cmdsn = initial_cmdsn + 1; + session->max_cmdsn = initial_cmdsn + 1; + session->max_r2t = 1; + session->tt = iscsit; + + /* initialize SCSI PDU commands pool */ + if (iscsi_pool_init(&session->cmdpool, session->cmds_max, + (void***)&session->cmds, + cmd_task_size + sizeof(struct iscsi_cmd_task))) + goto cmdpool_alloc_fail; + + /* pre-format cmds pool with ITT */ + for (cmd_i = 0; cmd_i < session->cmds_max; cmd_i++) { + struct iscsi_cmd_task *ctask = session->cmds[cmd_i]; + + if (cmd_task_size) + ctask->dd_data = &ctask[1]; + ctask->itt = cmd_i; + } + + spin_lock_init(&session->lock); + INIT_LIST_HEAD(&session->connections); + + /* initialize immediate command pool */ + if (iscsi_pool_init(&session->mgmtpool, session->mgmtpool_max, + (void***)&session->mgmt_cmds, + mgmt_task_size + sizeof(struct iscsi_mgmt_task))) + goto mgmtpool_alloc_fail; + + + /* pre-format immediate cmds pool with ITT */ + for (cmd_i = 0; cmd_i < session->mgmtpool_max; cmd_i++) { + struct iscsi_mgmt_task *mtask = session->mgmt_cmds[cmd_i]; + + if (mgmt_task_size) + mtask->dd_data = &mtask[1]; + mtask->itt = ISCSI_MGMT_ITT_OFFSET + cmd_i; + mtask->data = kmalloc(DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH, + GFP_KERNEL); + if (!mtask->data) { + int j; + + for (j = 0; j < cmd_i; j++) + kfree(session->mgmt_cmds[j]->data); + goto immdata_alloc_fail; + } + } + + if (scsi_add_host(shost, NULL)) + goto add_host_fail; + + cls_session = iscsi_create_session(shost, iscsit, 0); + if (!cls_session) + goto cls_session_fail; + *(unsigned long*)shost->hostdata = (unsigned long)cls_session; + + return cls_session; + +cls_session_fail: + scsi_remove_host(shost); +add_host_fail: + for (cmd_i = 0; cmd_i < session->mgmtpool_max; cmd_i++) + kfree(session->mgmt_cmds[cmd_i]->data); +immdata_alloc_fail: + iscsi_pool_free(&session->mgmtpool, (void**)session->mgmt_cmds); +mgmtpool_alloc_fail: + iscsi_pool_free(&session->cmdpool, (void**)session->cmds); +cmdpool_alloc_fail: + scsi_host_put(shost); + return NULL; +} +EXPORT_SYMBOL_GPL(iscsi_session_setup); + +/** + * iscsi_session_teardown - destroy session, host, and cls_session + * shost: scsi host + * + * This can be used by software iscsi_transports that allocate + * a session per scsi host. 
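iscsi_session_setup() above sizes each pool item as the generic task plus the transport's private size and then points dd_data at &ctask[1], so both live in one allocation. A minimal sketch of that trailing-private-data idiom, with tcp_cmd_task as an assumed stand-in for iscsi_tcp_cmd_task:

#include <stdio.h>
#include <stdlib.h>

struct cmd_task {
	int itt;
	void *dd_data;		/* transport private area follows the struct */
};

struct tcp_cmd_task {		/* stand-in for the transport's task state */
	int xmstate;
};

int main(void)
{
	struct cmd_task *ctask;
	struct tcp_cmd_task *tcp;

	/* one allocation carries the generic task and the private area */
	ctask = calloc(1, sizeof(*ctask) + sizeof(struct tcp_cmd_task));
	if (!ctask)
		return 1;

	ctask->dd_data = &ctask[1];	/* first byte past the generic struct */
	tcp = ctask->dd_data;
	tcp->xmstate = 0x1;

	printf("ctask %p dd_data %p xmstate 0x%x\n",
	       (void *)ctask, ctask->dd_data, tcp->xmstate);
	free(ctask);
	return 0;
}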
+ **/ +void iscsi_session_teardown(struct iscsi_cls_session *cls_session) +{ + struct Scsi_Host *shost = iscsi_session_to_shost(cls_session); + struct iscsi_session *session = iscsi_hostdata(shost->hostdata); + int cmd_i; + + scsi_remove_host(shost); + + for (cmd_i = 0; cmd_i < session->mgmtpool_max; cmd_i++) + kfree(session->mgmt_cmds[cmd_i]->data); + + iscsi_pool_free(&session->mgmtpool, (void**)session->mgmt_cmds); + iscsi_pool_free(&session->cmdpool, (void**)session->cmds); + + iscsi_destroy_session(cls_session); + scsi_host_put(shost); +} +EXPORT_SYMBOL_GPL(iscsi_session_teardown); + +/** + * iscsi_conn_setup - create iscsi_cls_conn and iscsi_conn + * @cls_session: iscsi_cls_session + * @conn_idx: cid + **/ +struct iscsi_cls_conn * +iscsi_conn_setup(struct iscsi_cls_session *cls_session, uint32_t conn_idx) +{ + struct iscsi_session *session = class_to_transport_session(cls_session); + struct iscsi_conn *conn; + struct iscsi_cls_conn *cls_conn; + + cls_conn = iscsi_create_conn(cls_session, conn_idx); + if (!cls_conn) + return NULL; + conn = cls_conn->dd_data; + memset(conn, 0, sizeof(*conn)); + + conn->session = session; + conn->cls_conn = cls_conn; + conn->c_stage = ISCSI_CONN_INITIAL_STAGE; + conn->id = conn_idx; + conn->exp_statsn = 0; + conn->tmabort_state = TMABORT_INITIAL; + INIT_LIST_HEAD(&conn->run_list); + INIT_LIST_HEAD(&conn->mgmt_run_list); + + /* initialize general xmit PDU commands queue */ + conn->xmitqueue = kfifo_alloc(session->cmds_max * sizeof(void*), + GFP_KERNEL, NULL); + if (conn->xmitqueue == ERR_PTR(-ENOMEM)) + goto xmitqueue_alloc_fail; + + /* initialize general immediate & non-immediate PDU commands queue */ + conn->immqueue = kfifo_alloc(session->mgmtpool_max * sizeof(void*), + GFP_KERNEL, NULL); + if (conn->immqueue == ERR_PTR(-ENOMEM)) + goto immqueue_alloc_fail; + + conn->mgmtqueue = kfifo_alloc(session->mgmtpool_max * sizeof(void*), + GFP_KERNEL, NULL); + if (conn->mgmtqueue == ERR_PTR(-ENOMEM)) + goto mgmtqueue_alloc_fail; + + INIT_WORK(&conn->xmitwork, iscsi_xmitworker, conn); + + /* allocate login_mtask used for the login/text sequences */ + spin_lock_bh(&session->lock); + if (!__kfifo_get(session->mgmtpool.queue, + (void*)&conn->login_mtask, + sizeof(void*))) { + spin_unlock_bh(&session->lock); + goto login_mtask_alloc_fail; + } + spin_unlock_bh(&session->lock); + + init_timer(&conn->tmabort_timer); + mutex_init(&conn->xmitmutex); + init_waitqueue_head(&conn->ehwait); + + return cls_conn; + +login_mtask_alloc_fail: + kfifo_free(conn->mgmtqueue); +mgmtqueue_alloc_fail: + kfifo_free(conn->immqueue); +immqueue_alloc_fail: + kfifo_free(conn->xmitqueue); +xmitqueue_alloc_fail: + iscsi_destroy_conn(cls_conn); + return NULL; +} +EXPORT_SYMBOL_GPL(iscsi_conn_setup); + +/** + * iscsi_conn_teardown - teardown iscsi connection + * cls_conn: iscsi class connection + * + * TODO: we may need to make this into a two step process + * like scsi-mls remove + put host + */ +void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_session *session = conn->session; + unsigned long flags; + + mutex_lock(&conn->xmitmutex); + set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); + if (conn->c_stage == ISCSI_CONN_INITIAL_STAGE) { + if (session->tt->suspend_conn_recv) + session->tt->suspend_conn_recv(conn); + + session->tt->terminate_conn(conn); + } + + spin_lock_bh(&session->lock); + conn->c_stage = ISCSI_CONN_CLEANUP_WAIT; + if (session->leadconn == conn) { + /* + * leading connection? then give up on recovery. 
+ */ + session->state = ISCSI_STATE_TERMINATE; + wake_up(&conn->ehwait); + } + spin_unlock_bh(&session->lock); + + mutex_unlock(&conn->xmitmutex); + + /* + * Block until all in-progress commands for this connection + * time out or fail. + */ + for (;;) { + spin_lock_irqsave(session->host->host_lock, flags); + if (!session->host->host_busy) { /* OK for ERL == 0 */ + spin_unlock_irqrestore(session->host->host_lock, flags); + break; + } + spin_unlock_irqrestore(session->host->host_lock, flags); + msleep_interruptible(500); + printk(KERN_INFO "iscsi: scsi conn_destroy(): host_busy %d " + "host_failed %d\n", session->host->host_busy, + session->host->host_failed); + /* + * force eh_abort() to unblock + */ + wake_up(&conn->ehwait); + } + + spin_lock_bh(&session->lock); + __kfifo_put(session->mgmtpool.queue, (void*)&conn->login_mtask, + sizeof(void*)); + list_del(&conn->item); + if (list_empty(&session->connections)) + session->leadconn = NULL; + if (session->leadconn && session->leadconn == conn) + session->leadconn = container_of(session->connections.next, + struct iscsi_conn, item); + + if (session->leadconn == NULL) + /* no connections exist; reset sequencing */ + session->cmdsn = session->max_cmdsn = session->exp_cmdsn = 1; + spin_unlock_bh(&session->lock); + + kfifo_free(conn->xmitqueue); + kfifo_free(conn->immqueue); + kfifo_free(conn->mgmtqueue); + + iscsi_destroy_conn(cls_conn); +} +EXPORT_SYMBOL_GPL(iscsi_conn_teardown); + +int iscsi_conn_start(struct iscsi_cls_conn *cls_conn) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_session *session = conn->session; + + if (session == NULL) { + printk(KERN_ERR "iscsi: can't start unbound connection\n"); + return -EPERM; + } + + spin_lock_bh(&session->lock); + conn->c_stage = ISCSI_CONN_STARTED; + session->state = ISCSI_STATE_LOGGED_IN; + + switch(conn->stop_stage) { + case STOP_CONN_RECOVER: + /* + * unblock eh_abort() if it is blocked. re-try all + * commands after successful recovery + */ + session->conn_cnt++; + conn->stop_stage = 0; + conn->tmabort_state = TMABORT_INITIAL; + session->age++; + session->recovery_failed = 0; + spin_unlock_bh(&session->lock); + + iscsi_unblock_session(session_to_cls(session)); + wake_up(&conn->ehwait); + return 0; + case STOP_CONN_TERM: + session->conn_cnt++; + conn->stop_stage = 0; + break; + case STOP_CONN_SUSPEND: + conn->stop_stage = 0; + clear_bit(ISCSI_SUSPEND_BIT, &conn->suspend_rx); + clear_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); + break; + default: + break; + } + spin_unlock_bh(&session->lock); + + return 0; +} +EXPORT_SYMBOL_GPL(iscsi_conn_start); + +static void +flush_control_queues(struct iscsi_session *session, struct iscsi_conn *conn) +{ + struct iscsi_mgmt_task *mtask, *tmp; + + /* handle pending */ + while (__kfifo_get(conn->immqueue, (void*)&mtask, sizeof(void*)) || + __kfifo_get(conn->mgmtqueue, (void*)&mtask, sizeof(void*))) { + if (mtask == conn->login_mtask) + continue; + debug_scsi("flushing pending mgmt task itt 0x%x\n", mtask->itt); + __kfifo_put(session->mgmtpool.queue, (void*)&mtask, + sizeof(void*)); + } + + /* handle running */ + list_for_each_entry_safe(mtask, tmp, &conn->mgmt_run_list, running) { + debug_scsi("flushing running mgmt task itt 0x%x\n", mtask->itt); + list_del(&mtask->running); + + if (mtask == conn->login_mtask) + continue; + __kfifo_put(session->mgmtpool.queue, (void*)&mtask, + sizeof(void*)); + } + + conn->mtask = NULL; +} + +/* Fail commands.
Mutex and session lock held and recv side suspended */ +static void fail_all_commands(struct iscsi_conn *conn) +{ + struct iscsi_cmd_task *ctask, *tmp; + + /* flush pending */ + while (__kfifo_get(conn->xmitqueue, (void*)&ctask, sizeof(void*))) { + debug_scsi("failing pending sc %p itt 0x%x\n", ctask->sc, + ctask->itt); + fail_command(conn, ctask, DID_BUS_BUSY << 16); + } + + /* fail all other running */ + list_for_each_entry_safe(ctask, tmp, &conn->run_list, running) { + debug_scsi("failing in progress sc %p itt 0x%x\n", + ctask->sc, ctask->itt); + fail_command(conn, ctask, DID_BUS_BUSY << 16); + } + + conn->ctask = NULL; +} + +void iscsi_start_session_recovery(struct iscsi_session *session, + struct iscsi_conn *conn, int flag) +{ + int old_stop_stage; + + spin_lock_bh(&session->lock); + if (conn->stop_stage == STOP_CONN_TERM) { + spin_unlock_bh(&session->lock); + return; + } + + /* + * When this is called for the in_login state, we only want to clean + * up the login task and connection. + */ + if (conn->stop_stage != STOP_CONN_RECOVER) + session->conn_cnt--; + + old_stop_stage = conn->stop_stage; + conn->stop_stage = flag; + spin_unlock_bh(&session->lock); + + if (session->tt->suspend_conn_recv) + session->tt->suspend_conn_recv(conn); + + mutex_lock(&conn->xmitmutex); + spin_lock_bh(&session->lock); + conn->c_stage = ISCSI_CONN_STOPPED; + set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); + + if (session->conn_cnt == 0 || session->leadconn == conn) + session->state = ISCSI_STATE_FAILED; + + spin_unlock_bh(&session->lock); + + session->tt->terminate_conn(conn); + /* + * flush queues. + */ + spin_lock_bh(&session->lock); + fail_all_commands(conn); + flush_control_queues(session, conn); + spin_unlock_bh(&session->lock); + + /* + * for connection level recovery we should not calculate + * header digest. conn->hdr_size used for optimization + * in hdr_extract() and will be re-negotiated at + * set_param() time. + */ + if (flag == STOP_CONN_RECOVER) { + conn->hdrdgst_en = 0; + conn->datadgst_en = 0; + + /* + * if this is called from the eh and from userspace + * then we only need to block once.
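The DID_BUS_BUSY << 16 passed to fail_command() in fail_all_commands() above follows the standard SCSI result-word packing, under which the failure is reported in the host byte and the midlayer typically retries the command. A quick reference, not introduced by this patch:

	/* SCSI result word: driver_byte << 24 | host_byte << 16 |
	 * msg_byte << 8 | status_byte */
	static inline int example_host_byte(int result)
	{
		return (result >> 16) & 0xff;	/* e.g. DID_BUS_BUSY */
	}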
+ */ + if (session->state == ISCSI_STATE_FAILED && + old_stop_stage != STOP_CONN_RECOVER) + iscsi_block_session(session_to_cls(session)); + } + mutex_unlock(&conn->xmitmutex); +} +EXPORT_SYMBOL_GPL(iscsi_start_session_recovery); + +void iscsi_conn_stop(struct iscsi_cls_conn *cls_conn, int flag) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_session *session = conn->session; + + switch (flag) { + case STOP_CONN_RECOVER: + case STOP_CONN_TERM: + iscsi_start_session_recovery(session, conn, flag); + break; + case STOP_CONN_SUSPEND: + if (session->tt->suspend_conn_recv) + session->tt->suspend_conn_recv(conn); + + mutex_lock(&conn->xmitmutex); + spin_lock_bh(&session->lock); + + conn->stop_stage = flag; + conn->c_stage = ISCSI_CONN_STOPPED; + set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); + + spin_unlock_bh(&session->lock); + mutex_unlock(&conn->xmitmutex); + break; + default: + printk(KERN_ERR "iscsi: invalid stop flag %d\n", flag); + } +} +EXPORT_SYMBOL_GPL(iscsi_conn_stop); + +int iscsi_conn_bind(struct iscsi_cls_session *cls_session, + struct iscsi_cls_conn *cls_conn, int is_leading) +{ + struct iscsi_session *session = class_to_transport_session(cls_session); + struct iscsi_conn *tmp = ERR_PTR(-EEXIST), *conn = cls_conn->dd_data; + + /* lookup for existing connection */ + spin_lock_bh(&session->lock); + list_for_each_entry(tmp, &session->connections, item) { + if (tmp == conn) { + if (conn->c_stage != ISCSI_CONN_STOPPED || + conn->stop_stage == STOP_CONN_TERM) { + printk(KERN_ERR "iscsi: can't bind " + "non-stopped connection (%d:%d)\n", + conn->c_stage, conn->stop_stage); + spin_unlock_bh(&session->lock); + return -EIO; + } + break; + } + } + if (tmp != conn) { + /* bind new iSCSI connection to session */ + conn->session = session; + list_add(&conn->item, &session->connections); + } + spin_unlock_bh(&session->lock); + + if (is_leading) + session->leadconn = conn; + + /* + * Unblock xmitworker(), Login Phase will pass through. 
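Together with iscsi_conn_setup() and iscsi_conn_start() above, this gives the connection bring-up order that the netlink interface (shown later in this patch) drives from userspace. An in-kernel sketch of that sequence follows; the transport_eph type and error codes are illustrative, and teardown runs the same steps in reverse:

	/* Editorial sketch: bring-up order implied by this API */
	static int example_conn_bring_up(struct iscsi_transport *t,
					 struct iscsi_cls_session *session,
					 uint32_t cid, uint64_t transport_eph)
	{
		struct iscsi_cls_conn *conn;

		conn = t->create_conn(session, cid);
		if (!conn)
			return -ENOMEM;
		if (t->bind_conn(session, conn, transport_eph, 1 /* leading */))
			return -EIO;
		return t->start_conn(conn);
	}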
+ */ + clear_bit(ISCSI_SUSPEND_BIT, &conn->suspend_rx); + clear_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); + return 0; +} +EXPORT_SYMBOL_GPL(iscsi_conn_bind); + +MODULE_AUTHOR("Mike Christie"); +MODULE_DESCRIPTION("iSCSI library functions"); +MODULE_LICENSE("GPL"); diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c index 941c1e1..21b06ff 100644 --- a/drivers/scsi/scsi_devinfo.c +++ b/drivers/scsi/scsi_devinfo.c @@ -159,6 +159,8 @@ static struct { {"HITACHI", "DF400", "*", BLIST_SPARSELUN}, {"HITACHI", "DF500", "*", BLIST_SPARSELUN}, {"HITACHI", "DF600", "*", BLIST_SPARSELUN}, + {"HITACHI", "DISK-SUBSYSTEM", "*", BLIST_ATTACH_PQ3 | BLIST_SPARSELUN | BLIST_LARGELUN}, + {"HITACHI", "OPEN-E", "*", BLIST_ATTACH_PQ3 | BLIST_SPARSELUN | BLIST_LARGELUN}, {"HP", "A6189A", NULL, BLIST_SPARSELUN | BLIST_LARGELUN}, /* HP VA7400 */ {"HP", "OPEN-", "*", BLIST_SPARSELUN | BLIST_LARGELUN}, /* HP XP Arrays */ {"HP", "NetRAID-4M", NULL, BLIST_FORCELUN}, diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 764a8b3..84390d4 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2363,3 +2363,61 @@ scsi_target_unblock(struct device *dev) device_for_each_child(dev, NULL, target_unblock); } EXPORT_SYMBOL_GPL(scsi_target_unblock); + +/** + * scsi_kmap_atomic_sg - find and atomically map an sg-element + * @sg: scatter-gather list + * @sg_count: number of segments in sg + * @offset: offset in bytes into sg, on return offset into the mapped area + * @len: bytes to map, on return number of bytes mapped + * + * Returns virtual address of the start of the mapped page + */ +void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count, + size_t *offset, size_t *len) +{ + int i; + size_t sg_len = 0, len_complete = 0; + struct page *page; + + for (i = 0; i < sg_count; i++) { + len_complete = sg_len; /* Complete sg-entries */ + sg_len += sg[i].length; + if (sg_len > *offset) + break; + } + + if (unlikely(i == sg_count)) { + printk(KERN_ERR "%s: Bytes in sg: %zu, requested offset %zu, " + "elements %d\n", + __FUNCTION__, sg_len, *offset, sg_count); + WARN_ON(1); + return NULL; + } + + /* Offset starting from the beginning of first page in this sg-entry */ + *offset = *offset - len_complete + sg[i].offset; + + /* Assumption: contiguous pages can be accessed as "page + i" */ + page = nth_page(sg[i].page, (*offset >> PAGE_SHIFT)); + *offset &= ~PAGE_MASK; + + /* Bytes in this sg-entry from *offset to the end of the page */ + sg_len = PAGE_SIZE - *offset; + if (*len > sg_len) + *len = sg_len; + + return kmap_atomic(page, KM_BIO_SRC_IRQ); +} +EXPORT_SYMBOL(scsi_kmap_atomic_sg); + +/** + * scsi_kunmap_atomic_sg - atomically unmap a virtual address, previously + * mapped with scsi_kmap_atomic_sg + * @virt: virtual address to be unmapped + */ +void scsi_kunmap_atomic_sg(void *virt) +{ + kunmap_atomic(virt, KM_BIO_SRC_IRQ); +} +EXPORT_SYMBOL(scsi_kunmap_atomic_sg); diff --git a/drivers/scsi/scsi_logging.h b/drivers/scsi/scsi_logging.h index e1722ba..a3e2af6 100644 --- a/drivers/scsi/scsi_logging.h +++ b/drivers/scsi/scsi_logging.h @@ -45,10 +45,12 @@ #define SCSI_LOG_LEVEL(SHIFT, BITS) \ ((scsi_logging_level >> (SHIFT)) & ((1 << (BITS)) - 1)) #define SCSI_CHECK_LOGGING(SHIFT, BITS, LEVEL, CMD) \ -{ \ +do { \ if (unlikely((SCSI_LOG_LEVEL(SHIFT, BITS)) > (LEVEL))) \ - (CMD); \ -} + do { \ + CMD; \ + } while (0); \ +} while (0) #else #define SCSI_CHECK_LOGGING(SHIFT, BITS, LEVEL, CMD) #endif /* CONFIG_SCSI_LOGGING */ diff --git a/drivers/scsi/scsi_proc.c
index 07be62b..55200e4 100644 --- a/drivers/scsi/scsi_proc.c +++ b/drivers/scsi/scsi_proc.c @@ -266,8 +266,6 @@ static ssize_t proc_scsi_write(struct fi lun = simple_strtoul(p + 1, &p, 0); err = scsi_add_single_device(host, channel, id, lun); - if (err >= 0) - err = length; /* * Usage: echo "scsi remove-single-device 0 1 2 3" >/proc/scsi/scsi @@ -284,6 +282,13 @@ static ssize_t proc_scsi_write(struct fi err = scsi_remove_single_device(host, channel, id, lun); } + /* + * convert success returns so that we return the + * number of bytes consumed. + */ + if (!err) + err = length; + out: free_page((unsigned long)buffer); return err; diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 1a5474b..f85d910 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -816,6 +816,32 @@ static inline void scsi_destroy_sdev(str put_device(&sdev->sdev_gendev); } +#ifdef CONFIG_SCSI_LOGGING +/** + * scsi_inq_str - print INQUIRY data from min to max index, + * strip trailing whitespace + * @buf: Output buffer with at least end-first+1 bytes of space + * @inq: Inquiry buffer (input) + * @first: Offset of string into inq + * @end: Index after last character in inq + */ +static unsigned char *scsi_inq_str(unsigned char *buf, unsigned char *inq, + unsigned first, unsigned end) +{ + unsigned term = 0, idx; + + for (idx = 0; idx + first < end && idx + first < inq[4] + 5; idx++) { + if (inq[idx+first] > ' ') { + buf[idx] = inq[idx+first]; + term = idx+1; + } else { + buf[idx] = ' '; + } + } + buf[term] = 0; + return buf; +} +#endif /** * scsi_probe_and_add_lun - probe a LUN, if a LUN is found add it @@ -880,10 +906,12 @@ static int scsi_probe_and_add_lun(struct if (scsi_probe_lun(sdev, result, result_len, &bflags)) goto out_free_result; + if (bflagsp) + *bflagsp = bflags; /* * result contains valid SCSI INQUIRY data. */ - if ((result[0] >> 5) == 3) { + if (((result[0] >> 5) == 3) && !(bflags & BLIST_ATTACH_PQ3)) { /* * For a Peripheral qualifier 3 (011b), the SCSI * spec says: The device server is not capable of @@ -894,9 +922,22 @@ static int scsi_probe_and_add_lun(struct * logical disk configured at sdev->lun, but there * is a target id responding. */ - SCSI_LOG_SCAN_BUS(3, printk(KERN_INFO - "scsi scan: peripheral qualifier of 3," - " no device added\n")); + SCSI_LOG_SCAN_BUS(2, sdev_printk(KERN_INFO, sdev, "scsi scan:" + " peripheral qualifier of 3, device not" + " added\n")); + if (lun == 0) { + SCSI_LOG_SCAN_BUS(1, { + unsigned char vend[9]; + unsigned char mod[17]; + + sdev_printk(KERN_INFO, sdev, + "scsi scan: consider passing scsi_mod." + "dev_flags=%s:%s:0x240 or 0x800240\n", + scsi_inq_str(vend, result, 8, 16), + scsi_inq_str(mod, result, 16, 32)); + }); + } + res = SCSI_SCAN_TARGET_PRESENT; goto out_free_result; } @@ -920,8 +961,6 @@ static int scsi_probe_and_add_lun(struct sdev->lockable = 0; scsi_unlock_floptical(sdev, result); } - if (bflagsp) - *bflagsp = bflags; } out_free_result: @@ -946,7 +985,6 @@ static int scsi_probe_and_add_lun(struct * scsi_sequential_lun_scan - sequentially scan a SCSI target * @starget: pointer to target structure to scan * @bflags: black/white list flag for LUN 0 - * @lun0_res: result of scanning LUN 0 * * Description: * Generally, scan from LUN 1 (LUN 0 is assumed to already have been @@ -956,8 +994,7 @@ static int scsi_probe_and_add_lun(struct * Modifies sdevscan->lun.
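For illustration, scsi_inq_str() above is what builds the vendor/model part of the dev_flags hint printed in scsi_probe_and_add_lun(); a caller mirroring that use (sketch only):

	/* Vendor is INQUIRY bytes 8-15 and model bytes 16-31, so e.g.
	 * "HITACHI " / "OPEN-E" come out with trailing blanks stripped. */
	static void example_inq_hint(unsigned char *result)
	{
		unsigned char vend[9], mod[17];

		printk(KERN_INFO "vendor '%s' model '%s'\n",
		       scsi_inq_str(vend, result, 8, 16),
		       scsi_inq_str(mod, result, 16, 32));
	}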
**/ static void scsi_sequential_lun_scan(struct scsi_target *starget, - int bflags, int lun0_res, int scsi_level, - int rescan) + int bflags, int scsi_level, int rescan) { unsigned int sparse_lun, lun, max_dev_lun; struct Scsi_Host *shost = dev_to_shost(starget->dev.parent); @@ -978,13 +1015,6 @@ static void scsi_sequential_lun_scan(str sparse_lun = 0; /* - * If not sparse lun and no device attached at LUN 0 do not scan - * any further. - */ - if (!sparse_lun && (lun0_res != SCSI_SCAN_LUN_PRESENT)) - return; - - /* * If less than SCSI_1_CSS, and no special lun scaning, stop * scanning; this matches 2.4 behaviour, but could just be a bug * (to continue scanning a SCSI_1_CSS device). @@ -1395,7 +1425,7 @@ static void __scsi_scan_target(struct de * do a sequential scan. */ scsi_sequential_lun_scan(starget, bflags, - res, starget->scsi_level, rescan); + starget->scsi_level, rescan); } out_reap: diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c index 2730d50..44adafa 100644 --- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -31,31 +31,25 @@ #include #include #include -#define ISCSI_SESSION_ATTRS 8 -#define ISCSI_CONN_ATTRS 6 +#define ISCSI_SESSION_ATTRS 11 +#define ISCSI_CONN_ATTRS 11 +#define ISCSI_HOST_ATTRS 0 struct iscsi_internal { struct scsi_transport_template t; struct iscsi_transport *iscsi_transport; struct list_head list; - /* - * based on transport capabilities, at register time we set these - * bits to tell the transport class it wants attributes displayed - * in sysfs or that it can support different iSCSI Data-Path - * capabilities - */ - uint32_t param_mask; - struct class_device cdev; - /* - * We do not have any private or other attrs. - */ + + struct class_device_attribute *host_attrs[ISCSI_HOST_ATTRS + 1]; struct transport_container conn_cont; struct class_device_attribute *conn_attrs[ISCSI_CONN_ATTRS + 1]; struct transport_container session_cont; struct class_device_attribute *session_attrs[ISCSI_SESSION_ATTRS + 1]; }; +static int iscsi_session_nr; /* sysfs session id for next new session */ + /* * list of registered transports and lock that must * be held while accessing list. 
The iscsi_transport_lock must @@ -120,6 +114,24 @@ static struct attribute_group iscsi_tran .attrs = iscsi_transport_attrs, }; +static int iscsi_setup_host(struct transport_container *tc, struct device *dev, + struct class_device *cdev) +{ + struct Scsi_Host *shost = dev_to_shost(dev); + struct iscsi_host *ihost = shost->shost_data; + + memset(ihost, 0, sizeof(*ihost)); + INIT_LIST_HEAD(&ihost->sessions); + mutex_init(&ihost->mutex); + return 0; +} + +static DECLARE_TRANSPORT_CLASS(iscsi_host_class, + "iscsi_host", + iscsi_setup_host, + NULL, + NULL); + static DECLARE_TRANSPORT_CLASS(iscsi_session_class, "iscsi_session", NULL, @@ -165,14 +177,23 @@ static DEFINE_SPINLOCK(sesslock); static LIST_HEAD(connlist); static DEFINE_SPINLOCK(connlock); -static struct iscsi_cls_session *iscsi_session_lookup(uint64_t handle) +static uint32_t iscsi_conn_get_sid(struct iscsi_cls_conn *conn) +{ + struct iscsi_cls_session *sess = iscsi_dev_to_session(conn->dev.parent); + return sess->sid; +} + +/* + * Returns the matching session to a given sid + */ +static struct iscsi_cls_session *iscsi_session_lookup(uint32_t sid) { unsigned long flags; struct iscsi_cls_session *sess; spin_lock_irqsave(&sesslock, flags); list_for_each_entry(sess, &sesslist, sess_list) { - if (sess == iscsi_ptr(handle)) { + if (sess->sid == sid) { spin_unlock_irqrestore(&sesslock, flags); return sess; } @@ -181,14 +202,17 @@ static struct iscsi_cls_session *iscsi_s return NULL; } -static struct iscsi_cls_conn *iscsi_conn_lookup(uint64_t handle) +/* + * Returns the matching connection to a given sid / cid tuple + */ +static struct iscsi_cls_conn *iscsi_conn_lookup(uint32_t sid, uint32_t cid) { unsigned long flags; struct iscsi_cls_conn *conn; spin_lock_irqsave(&connlock, flags); list_for_each_entry(conn, &connlist, conn_list) { - if (conn == iscsi_ptr(handle)) { + if ((conn->cid == cid) && (iscsi_conn_get_sid(conn) == sid)) { spin_unlock_irqrestore(&connlock, flags); return conn; } @@ -209,6 +233,7 @@ static void iscsi_session_release(struct shost = iscsi_session_to_shost(session); scsi_host_put(shost); + kfree(session->targetname); kfree(session); module_put(transport->owner); } @@ -218,30 +243,95 @@ static int iscsi_is_session_dev(const st return dev->release == iscsi_session_release; } +static int iscsi_user_scan(struct Scsi_Host *shost, uint channel, + uint id, uint lun) +{ + struct iscsi_host *ihost = shost->shost_data; + struct iscsi_cls_session *session; + + mutex_lock(&ihost->mutex); + list_for_each_entry(session, &ihost->sessions, host_list) { + if ((channel == SCAN_WILD_CARD || + channel == session->channel) && + (id == SCAN_WILD_CARD || id == session->target_id)) + scsi_scan_target(&session->dev, session->channel, + session->target_id, lun, 1); + } + mutex_unlock(&ihost->mutex); + + return 0; +} + +static void session_recovery_timedout(void *data) +{ + struct iscsi_cls_session *session = data; + + dev_printk(KERN_INFO, &session->dev, "iscsi: session recovery timed " + "out after %d secs\n", session->recovery_tmo); + + if (session->transport->session_recovery_timedout) + session->transport->session_recovery_timedout(session); + + scsi_target_unblock(&session->dev); +} + +void iscsi_unblock_session(struct iscsi_cls_session *session) +{ + if (!cancel_delayed_work(&session->recovery_work)) + flush_scheduled_work(); + scsi_target_unblock(&session->dev); +} +EXPORT_SYMBOL_GPL(iscsi_unblock_session); + +void iscsi_block_session(struct iscsi_cls_session *session) +{ + scsi_target_block(&session->dev); + 
schedule_delayed_work(&session->recovery_work, + session->recovery_tmo * HZ); +} +EXPORT_SYMBOL_GPL(iscsi_block_session); + /** * iscsi_create_session - create iscsi class session * @shost: scsi host * @transport: iscsi transport * - * This can be called from a LLD or iscsi_transport + * This can be called from a LLD or iscsi_transport. **/ struct iscsi_cls_session * -iscsi_create_session(struct Scsi_Host *shost, struct iscsi_transport *transport) +iscsi_create_session(struct Scsi_Host *shost, + struct iscsi_transport *transport, int channel) { + struct iscsi_host *ihost; struct iscsi_cls_session *session; int err; if (!try_module_get(transport->owner)) return NULL; - session = kzalloc(sizeof(*session), GFP_KERNEL); + session = kzalloc(sizeof(*session) + transport->sessiondata_size, + GFP_KERNEL); if (!session) goto module_put; session->transport = transport; + session->recovery_tmo = 120; + INIT_WORK(&session->recovery_work, session_recovery_timedout, session); + INIT_LIST_HEAD(&session->host_list); + INIT_LIST_HEAD(&session->sess_list); + + if (transport->sessiondata_size) + session->dd_data = &session[1]; /* this is released in the dev's release function */ scsi_host_get(shost); - snprintf(session->dev.bus_id, BUS_ID_SIZE, "session%u", shost->host_no); + ihost = shost->shost_data; + + session->sid = iscsi_session_nr++; + session->channel = channel; + session->target_id = ihost->next_target_id++; + + snprintf(session->dev.bus_id, BUS_ID_SIZE, "session%u", + session->sid); session->dev.parent = &shost->shost_gendev; session->dev.release = iscsi_session_release; err = device_register(&session->dev); @@ -252,6 +342,10 @@ iscsi_create_session(struct Scsi_Host *s } transport_register_device(&session->dev); + mutex_lock(&ihost->mutex); + list_add(&session->host_list, &ihost->sessions); + mutex_unlock(&ihost->mutex); + return session; free_session: @@ -272,6 +366,16 @@ EXPORT_SYMBOL_GPL(iscsi_create_session); **/ int iscsi_destroy_session(struct iscsi_cls_session *session) { + struct Scsi_Host *shost = iscsi_session_to_shost(session); + struct iscsi_host *ihost = shost->shost_data; + + if (!cancel_delayed_work(&session->recovery_work)) + flush_scheduled_work(); + + mutex_lock(&ihost->mutex); + list_del(&session->host_list); + mutex_unlock(&ihost->mutex); + transport_unregister_device(&session->dev); device_unregister(&session->dev); return 0; @@ -284,6 +388,7 @@ static void iscsi_conn_release(struct de struct iscsi_cls_conn *conn = iscsi_dev_to_conn(dev); struct device *parent = conn->dev.parent; + kfree(conn->persistent_address); kfree(conn); put_device(parent); } @@ -301,12 +406,16 @@ static int iscsi_is_conn_dev(const struc * This can be called from a LLD or iscsi_transport. The connection * is child of the session so cid must be unique for all connections * on the session. + * + * Since we do not support MCS, cid will normally be zero. In some cases + * for software iscsi we could be trying to preallocate a connection struct + * in which case there could be two connection structs and cid would be + * non-zero. 
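The block/unblock pair above gives transports a bounded recovery window: while blocked, the SCSI midlayer queues I/O instead of failing it, and recovery_work fires after recovery_tmo seconds if nothing unblocks the session first. A sketch of the intended use from a transport's error path (the function name is hypothetical):

	static void my_handle_conn_error(struct iscsi_cls_session *session)
	{
		iscsi_block_session(session);
		/* ... attempt relogin; once logged in again: */
		iscsi_unblock_session(session);
	}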
**/ struct iscsi_cls_conn * iscsi_create_conn(struct iscsi_cls_session *session, uint32_t cid) { struct iscsi_transport *transport = session->transport; - struct Scsi_Host *shost = iscsi_session_to_shost(session); struct iscsi_cls_conn *conn; int err; @@ -319,12 +428,14 @@ iscsi_create_conn(struct iscsi_cls_sessi INIT_LIST_HEAD(&conn->conn_list); conn->transport = transport; + conn->cid = cid; /* this is released in the dev's release function */ if (!get_device(&session->dev)) goto free_conn; + snprintf(conn->dev.bus_id, BUS_ID_SIZE, "connection%d:%u", - shost->host_no, cid); + session->sid, cid); conn->dev.parent = &session->dev; conn->dev.release = iscsi_conn_release; err = device_register(&conn->dev); @@ -361,105 +472,6 @@ int iscsi_destroy_conn(struct iscsi_cls_ EXPORT_SYMBOL_GPL(iscsi_destroy_conn); /* - * These functions are used only by software iscsi_transports - * which do not allocate and more their scsi_hosts since this - * is initiated from userspace. - */ - -/* - * iSCSI Session's hostdata organization: - * - * *------------------* <== hostdata_session(host->hostdata) - * | ptr to class sess| - * |------------------| <== iscsi_hostdata(host->hostdata) - * | transport's data | - * *------------------* - */ - -#define hostdata_privsize(_t) (sizeof(unsigned long) + _t->hostdata_size + \ - _t->hostdata_size % sizeof(unsigned long)) - -#define hostdata_session(_hostdata) (iscsi_ptr(*(unsigned long *)_hostdata)) - -/** - * iscsi_transport_create_session - create iscsi cls session and host - * scsit: scsi transport template - * transport: iscsi transport template - * - * This can be used by software iscsi_transports that allocate - * a session per scsi host. - **/ -struct Scsi_Host * -iscsi_transport_create_session(struct scsi_transport_template *scsit, - struct iscsi_transport *transport) -{ - struct iscsi_cls_session *session; - struct Scsi_Host *shost; - unsigned long flags; - - shost = scsi_host_alloc(transport->host_template, - hostdata_privsize(transport)); - if (!shost) { - printk(KERN_ERR "iscsi: can not allocate SCSI host for " - "session\n"); - return NULL; - } - - shost->max_id = 1; - shost->max_channel = 0; - shost->max_lun = transport->max_lun; - shost->max_cmd_len = transport->max_cmd_len; - shost->transportt = scsit; - shost->transportt->create_work_queue = 1; - - if (scsi_add_host(shost, NULL)) - goto free_host; - - session = iscsi_create_session(shost, transport); - if (!session) - goto remove_host; - - *(unsigned long*)shost->hostdata = (unsigned long)session; - spin_lock_irqsave(&sesslock, flags); - list_add(&session->sess_list, &sesslist); - spin_unlock_irqrestore(&sesslock, flags); - return shost; - -remove_host: - scsi_remove_host(shost); -free_host: - scsi_host_put(shost); - return NULL; -} - -EXPORT_SYMBOL_GPL(iscsi_transport_create_session); - -/** - * iscsi_transport_destroy_session - destroy session and scsi host - * shost: scsi host - * - * This can be used by software iscsi_transports that allocate - * a session per scsi host. 
- **/ -int iscsi_transport_destroy_session(struct Scsi_Host *shost) -{ - struct iscsi_cls_session *session; - unsigned long flags; - - scsi_remove_host(shost); - session = hostdata_session(shost->hostdata); - spin_lock_irqsave(&sesslock, flags); - list_del(&session->sess_list); - spin_unlock_irqrestore(&sesslock, flags); - iscsi_destroy_session(session); - /* ref from host alloc */ - scsi_host_put(shost); - return 0; -} - -EXPORT_SYMBOL_GPL(iscsi_transport_destroy_session); - -/* * iscsi interface functions */ static struct iscsi_internal * @@ -574,6 +586,7 @@ iscsi_unicast_skb(struct mempool_zone *z } spin_lock_irqsave(&zone->freelock, flags); + INIT_LIST_HEAD(skb_to_lh(skb)); list_add(skb_to_lh(skb), &zone->freequeue); spin_unlock_irqrestore(&zone->freelock, flags); @@ -607,7 +620,8 @@ int iscsi_recv_pdu(struct iscsi_cls_conn ev->type = ISCSI_KEVENT_RECV_PDU; if (atomic_read(&conn->z_pdu->allocated) >= conn->z_pdu->hiwat) ev->iferror = -ENOMEM; - ev->r.recv_req.conn_handle = iscsi_handle(conn); + ev->r.recv_req.cid = conn->cid; + ev->r.recv_req.sid = iscsi_conn_get_sid(conn); pdu = (char*)ev + sizeof(*ev); memcpy(pdu, hdr, sizeof(struct iscsi_hdr)); memcpy(pdu + sizeof(struct iscsi_hdr), data, data_size); @@ -639,7 +653,8 @@ void iscsi_conn_error(struct iscsi_cls_c if (atomic_read(&conn->z_error->allocated) >= conn->z_error->hiwat) ev->iferror = -ENOMEM; ev->r.connerror.error = error; - ev->r.connerror.conn_handle = iscsi_handle(conn); + ev->r.connerror.cid = conn->cid; + ev->r.connerror.sid = iscsi_conn_get_sid(conn); iscsi_unicast_skb(conn->z_error, skb); @@ -689,7 +704,7 @@ iscsi_if_get_stats(struct iscsi_transpor ISCSI_STATS_CUSTOM_MAX); int err = 0; - conn = iscsi_conn_lookup(ev->u.get_stats.conn_handle); + conn = iscsi_conn_lookup(ev->u.get_stats.sid, ev->u.get_stats.cid); if (!conn) return -EEXIST; @@ -713,8 +728,10 @@ iscsi_if_get_stats(struct iscsi_transpor evstat->type = nlh->nlmsg_type; if (atomic_read(&conn->z_pdu->allocated) >= conn->z_pdu->hiwat) evstat->iferror = -ENOMEM; - evstat->u.get_stats.conn_handle = - ev->u.get_stats.conn_handle; + evstat->u.get_stats.cid = + ev->u.get_stats.cid; + evstat->u.get_stats.sid = + ev->u.get_stats.sid; stats = (struct iscsi_stats *) ((char*)evstat + sizeof(*evstat)); memset(stats, 0, sizeof(*stats)); @@ -740,16 +757,21 @@ iscsi_if_create_session(struct iscsi_int { struct iscsi_transport *transport = priv->iscsi_transport; struct iscsi_cls_session *session; - uint32_t sid; + unsigned long flags; + uint32_t hostno; - session = transport->create_session(&priv->t, + session = transport->create_session(transport, &priv->t, ev->u.c_session.initial_cmdsn, - &sid); + &hostno); if (!session) return -ENOMEM; - ev->r.c_session_ret.session_handle = iscsi_handle(session); - ev->r.c_session_ret.sid = sid; + spin_lock_irqsave(&sesslock, flags); + list_add(&session->sess_list, &sesslist); + spin_unlock_irqrestore(&sesslock, flags); + + ev->r.c_session_ret.host_no = hostno; + ev->r.c_session_ret.sid = session->sid; return 0; } @@ -760,13 +782,20 @@ iscsi_if_create_conn(struct iscsi_transp struct iscsi_cls_session *session; unsigned long flags; - session = iscsi_session_lookup(ev->u.c_conn.session_handle); - if (!session) + session = iscsi_session_lookup(ev->u.c_conn.sid); + if (!session) { + printk(KERN_ERR "iscsi: invalid session %d\n", + ev->u.c_conn.sid); return -EINVAL; + } conn = transport->create_conn(session, ev->u.c_conn.cid); - if (!conn) + if (!conn) { + printk(KERN_ERR "iscsi: couldn't create a new " + "connection for session %d\n", + 
session->sid); return -ENOMEM; + } conn->z_pdu = mempool_zone_init(Z_MAX_PDU, NLMSG_SPACE(sizeof(struct iscsi_uevent) + @@ -788,7 +817,8 @@ iscsi_if_create_conn(struct iscsi_transp goto free_pdu_pool; } - ev->r.handle = iscsi_handle(conn); + ev->r.c_conn_ret.sid = session->sid; + ev->r.c_conn_ret.cid = conn->cid; spin_lock_irqsave(&connlock, flags); list_add(&conn->conn_list, &connlist); @@ -812,7 +842,7 @@ iscsi_if_destroy_conn(struct iscsi_trans struct iscsi_cls_conn *conn; struct mempool_zone *z_error, *z_pdu; - conn = iscsi_conn_lookup(ev->u.d_conn.conn_handle); + conn = iscsi_conn_lookup(ev->u.d_conn.sid, ev->u.d_conn.cid); if (!conn) return -EINVAL; spin_lock_irqsave(&connlock, flags); @@ -832,6 +862,106 @@ iscsi_if_destroy_conn(struct iscsi_trans return 0; } +static void +iscsi_copy_param(struct iscsi_uevent *ev, uint32_t *value, char *data) +{ + if (ev->u.set_param.len != sizeof(uint32_t)) + BUG(); + memcpy(value, data, min_t(uint32_t, sizeof(uint32_t), + ev->u.set_param.len)); +} + +static int +iscsi_set_param(struct iscsi_transport *transport, struct iscsi_uevent *ev) +{ + char *data = (char*)ev + sizeof(*ev); + struct iscsi_cls_conn *conn; + struct iscsi_cls_session *session; + int err = 0; + uint32_t value = 0; + + session = iscsi_session_lookup(ev->u.set_param.sid); + conn = iscsi_conn_lookup(ev->u.set_param.sid, ev->u.set_param.cid); + if (!conn || !session) + return -EINVAL; + + switch (ev->u.set_param.param) { + case ISCSI_PARAM_SESS_RECOVERY_TMO: + iscsi_copy_param(ev, &value, data); + if (value != 0) + session->recovery_tmo = value; + break; + case ISCSI_PARAM_TARGET_NAME: + /* this should not change between logins */ + if (session->targetname) + return 0; + + session->targetname = kstrdup(data, GFP_KERNEL); + if (!session->targetname) + return -ENOMEM; + break; + case ISCSI_PARAM_TPGT: + iscsi_copy_param(ev, &value, data); + session->tpgt = value; + break; + case ISCSI_PARAM_PERSISTENT_PORT: + iscsi_copy_param(ev, &value, data); + conn->persistent_port = value; + break; + case ISCSI_PARAM_PERSISTENT_ADDRESS: + /* + * this is the address returned in discovery so it should + * not change between logins. 
+ */ + if (conn->persistent_address) + return 0; + + conn->persistent_address = kstrdup(data, GFP_KERNEL); + if (!conn->persistent_address) + return -ENOMEM; + break; + default: + iscsi_copy_param(ev, &value, data); + err = transport->set_param(conn, ev->u.set_param.param, value); + } + + return err; +} + +static int +iscsi_if_transport_ep(struct iscsi_transport *transport, + struct iscsi_uevent *ev, int msg_type) +{ + struct sockaddr *dst_addr; + int rc = 0; + + switch (msg_type) { + case ISCSI_UEVENT_TRANSPORT_EP_CONNECT: + if (!transport->ep_connect) + return -EINVAL; + + dst_addr = (struct sockaddr *)((char*)ev + sizeof(*ev)); + rc = transport->ep_connect(dst_addr, + ev->u.ep_connect.non_blocking, + &ev->r.ep_connect_ret.handle); + break; + case ISCSI_UEVENT_TRANSPORT_EP_POLL: + if (!transport->ep_poll) + return -EINVAL; + + ev->r.retcode = transport->ep_poll(ev->u.ep_poll.ep_handle, + ev->u.ep_poll.timeout_ms); + break; + case ISCSI_UEVENT_TRANSPORT_EP_DISCONNECT: + if (!transport->ep_disconnect) + return -EINVAL; + + transport->ep_disconnect(ev->u.ep_disconnect.ep_handle); + break; + } + return rc; +} + static int iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) { @@ -841,6 +971,7 @@ iscsi_if_recv_msg(struct sk_buff *skb, s struct iscsi_internal *priv; struct iscsi_cls_session *session; struct iscsi_cls_conn *conn; + unsigned long flags; priv = iscsi_if_transport_lookup(iscsi_ptr(ev->transport_handle)); if (!priv) @@ -855,10 +986,14 @@ iscsi_if_recv_msg(struct sk_buff *skb, s err = iscsi_if_create_session(priv, ev); break; case ISCSI_UEVENT_DESTROY_SESSION: - session = iscsi_session_lookup(ev->u.d_session.session_handle); - if (session) + session = iscsi_session_lookup(ev->u.d_session.sid); + if (session) { + spin_lock_irqsave(&sesslock, flags); + list_del(&session->sess_list); + spin_unlock_irqrestore(&sesslock, flags); + transport->destroy_session(session); - else + } else err = -EINVAL; break; case ISCSI_UEVENT_CREATE_CONN: @@ -868,41 +1003,35 @@ iscsi_if_recv_msg(struct sk_buff *skb, s err = iscsi_if_destroy_conn(transport, ev); break; case ISCSI_UEVENT_BIND_CONN: - session = iscsi_session_lookup(ev->u.b_conn.session_handle); - conn = iscsi_conn_lookup(ev->u.b_conn.conn_handle); + session = iscsi_session_lookup(ev->u.b_conn.sid); + conn = iscsi_conn_lookup(ev->u.b_conn.sid, ev->u.b_conn.cid); if (session && conn) ev->r.retcode = transport->bind_conn(session, conn, - ev->u.b_conn.transport_fd, + ev->u.b_conn.transport_eph, ev->u.b_conn.is_leading); else err = -EINVAL; break; case ISCSI_UEVENT_SET_PARAM: - conn = iscsi_conn_lookup(ev->u.set_param.conn_handle); - if (conn) - ev->r.retcode = transport->set_param(conn, - ev->u.set_param.param, ev->u.set_param.value); - else - err = -EINVAL; + err = iscsi_set_param(transport, ev); break; case ISCSI_UEVENT_START_CONN: - conn = iscsi_conn_lookup(ev->u.start_conn.conn_handle); + conn = iscsi_conn_lookup(ev->u.start_conn.sid, ev->u.start_conn.cid); if (conn) ev->r.retcode = transport->start_conn(conn); else err = -EINVAL; - break; case ISCSI_UEVENT_STOP_CONN: - conn = iscsi_conn_lookup(ev->u.stop_conn.conn_handle); + conn = iscsi_conn_lookup(ev->u.stop_conn.sid, ev->u.stop_conn.cid); if (conn) transport->stop_conn(conn, ev->u.stop_conn.flag); else err = -EINVAL; break; case ISCSI_UEVENT_SEND_PDU: - conn = iscsi_conn_lookup(ev->u.send_pdu.conn_handle); + conn = iscsi_conn_lookup(ev->u.send_pdu.sid, ev->u.send_pdu.cid); if (conn) ev->r.retcode = transport->send_pdu(conn, (struct iscsi_hdr*)((char*)ev + sizeof(*ev)), @@ -914,6 
+1043,11 @@ iscsi_if_recv_msg(struct sk_buff *skb, s case ISCSI_UEVENT_GET_STATS: err = iscsi_if_get_stats(transport, nlh); break; + case ISCSI_UEVENT_TRANSPORT_EP_CONNECT: + case ISCSI_UEVENT_TRANSPORT_EP_POLL: + case ISCSI_UEVENT_TRANSPORT_EP_DISCONNECT: + err = iscsi_if_transport_ep(transport, ev, nlh->nlmsg_type); + break; default: err = -EINVAL; break; @@ -923,9 +1057,11 @@ iscsi_if_recv_msg(struct sk_buff *skb, s return err; } -/* Get message from skb (based on rtnetlink_rcv_skb). Each message is - * processed by iscsi_if_recv_msg. Malformed skbs with wrong length are - * or invalid creds discarded silently. */ +/* + * Get message from skb (based on rtnetlink_rcv_skb). Each message is + * processed by iscsi_if_recv_msg. Malformed skbs with wrong lengths or + * invalid creds are discarded silently. + */ static void iscsi_if_rx(struct sock *sk, int len) { @@ -988,6 +1124,10 @@ free_skb: #define iscsi_cdev_to_conn(_cdev) \ iscsi_dev_to_conn(_cdev->dev) +#define ISCSI_CLASS_ATTR(_prefix,_name,_mode,_show,_store) \ +struct class_device_attribute class_device_attr_##_prefix##_##_name = \ + __ATTR(_name,_mode,_show,_store) + /* * iSCSI connection attrs */ @@ -1005,7 +1145,8 @@ show_conn_int_param_##param(struct class #define iscsi_conn_int_attr(field, param, format) \ iscsi_conn_int_attr_show(param, format) \ -static CLASS_DEVICE_ATTR(field, S_IRUGO, show_conn_int_param_##param, NULL); +static ISCSI_CLASS_ATTR(conn, field, S_IRUGO, show_conn_int_param_##param, \ + NULL); iscsi_conn_int_attr(max_recv_dlength, ISCSI_PARAM_MAX_RECV_DLENGTH, "%u"); iscsi_conn_int_attr(max_xmit_dlength, ISCSI_PARAM_MAX_XMIT_DLENGTH, "%u"); @@ -1013,6 +1154,26 @@ iscsi_conn_int_attr(header_digest, ISCSI iscsi_conn_int_attr(data_digest, ISCSI_PARAM_DATADGST_EN, "%d"); iscsi_conn_int_attr(ifmarker, ISCSI_PARAM_IFMARKER_EN, "%d"); iscsi_conn_int_attr(ofmarker, ISCSI_PARAM_OFMARKER_EN, "%d"); +iscsi_conn_int_attr(persistent_port, ISCSI_PARAM_PERSISTENT_PORT, "%d"); +iscsi_conn_int_attr(port, ISCSI_PARAM_CONN_PORT, "%d"); +iscsi_conn_int_attr(exp_statsn, ISCSI_PARAM_EXP_STATSN, "%u"); + +#define iscsi_conn_str_attr_show(param) \ +static ssize_t \ +show_conn_str_param_##param(struct class_device *cdev, char *buf) \ +{ \ + struct iscsi_cls_conn *conn = iscsi_cdev_to_conn(cdev); \ + struct iscsi_transport *t = conn->transport; \ + return t->get_conn_str_param(conn, param, buf); \ +} + +#define iscsi_conn_str_attr(field, param) \ + iscsi_conn_str_attr_show(param) \ +static ISCSI_CLASS_ATTR(conn, field, S_IRUGO, show_conn_str_param_##param, \ + NULL); + +iscsi_conn_str_attr(persistent_address, ISCSI_PARAM_PERSISTENT_ADDRESS); +iscsi_conn_str_attr(address, ISCSI_PARAM_CONN_ADDRESS); #define iscsi_cdev_to_session(_cdev) \ iscsi_dev_to_session(_cdev->dev) @@ -1034,7 +1195,8 @@ show_session_int_param_##param(struct cl #define iscsi_session_int_attr(field, param, format) \ iscsi_session_int_attr_show(param, format) \ -static CLASS_DEVICE_ATTR(field, S_IRUGO, show_session_int_param_##param, NULL); +static ISCSI_CLASS_ATTR(sess, field, S_IRUGO, show_session_int_param_##param, \ + NULL); iscsi_session_int_attr(initial_r2t, ISCSI_PARAM_INITIAL_R2T_EN, "%d"); iscsi_session_int_attr(max_outstanding_r2t, ISCSI_PARAM_MAX_R2T, "%hu"); @@ -1044,18 +1206,89 @@ iscsi_session_int_attr(max_burst_len, IS iscsi_session_int_attr(data_pdu_in_order, ISCSI_PARAM_PDU_INORDER_EN, "%d"); iscsi_session_int_attr(data_seq_in_order, ISCSI_PARAM_DATASEQ_INORDER_EN, "%d"); iscsi_session_int_attr(erl, ISCSI_PARAM_ERL, "%d"); +iscsi_session_int_attr(tpgt, 
ISCSI_PARAM_TPGT, "%d"); -#define SETUP_SESSION_RD_ATTR(field, param) \ - if (priv->param_mask & (1 << param)) { \ - priv->session_attrs[count] = &class_device_attr_##field;\ - count++; \ - } +#define iscsi_session_str_attr_show(param) \ +static ssize_t \ +show_session_str_param_##param(struct class_device *cdev, char *buf) \ +{ \ + struct iscsi_cls_session *session = iscsi_cdev_to_session(cdev); \ + struct iscsi_transport *t = session->transport; \ + return t->get_session_str_param(session, param, buf); \ +} + +#define iscsi_session_str_attr(field, param) \ + iscsi_session_str_attr_show(param) \ +static ISCSI_CLASS_ATTR(sess, field, S_IRUGO, show_session_str_param_##param, \ + NULL); + +iscsi_session_str_attr(targetname, ISCSI_PARAM_TARGET_NAME); + +/* + * Private session and conn attrs. userspace uses several iscsi values + * to identify each session between reboots. Some of these values may not + * be present in the iscsi_transport/LLD driver because userspace handles + * login (and failback for login redirect) so for these types of drivers + * the class manages the attrs and values for the iscsi_transport/LLD + */ +#define iscsi_priv_session_attr_show(field, format) \ +static ssize_t \ +show_priv_session_##field(struct class_device *cdev, char *buf) \ +{ \ + struct iscsi_cls_session *session = iscsi_cdev_to_session(cdev); \ + return sprintf(buf, format"\n", session->field); \ +} -#define SETUP_CONN_RD_ATTR(field, param) \ - if (priv->param_mask & (1 << param)) { \ - priv->conn_attrs[count] = &class_device_attr_##field; \ +#define iscsi_priv_session_attr(field, format) \ + iscsi_priv_session_attr_show(field, format) \ +static ISCSI_CLASS_ATTR(priv_sess, field, S_IRUGO, show_priv_session_##field, \ + NULL) +iscsi_priv_session_attr(targetname, "%s"); +iscsi_priv_session_attr(tpgt, "%d"); +iscsi_priv_session_attr(recovery_tmo, "%d"); + +#define iscsi_priv_conn_attr_show(field, format) \ +static ssize_t \ +show_priv_conn_##field(struct class_device *cdev, char *buf) \ +{ \ + struct iscsi_cls_conn *conn = iscsi_cdev_to_conn(cdev); \ + return sprintf(buf, format"\n", conn->field); \ +} + +#define iscsi_priv_conn_attr(field, format) \ + iscsi_priv_conn_attr_show(field, format) \ +static ISCSI_CLASS_ATTR(priv_conn, field, S_IRUGO, show_priv_conn_##field, \ + NULL) +iscsi_priv_conn_attr(persistent_address, "%s"); +iscsi_priv_conn_attr(persistent_port, "%d"); + +#define SETUP_PRIV_SESSION_RD_ATTR(field) \ +do { \ + priv->session_attrs[count] = &class_device_attr_priv_sess_##field; \ + count++; \ +} while (0) + +#define SETUP_SESSION_RD_ATTR(field, param_flag) \ +do { \ + if (tt->param_mask & param_flag) { \ + priv->session_attrs[count] = &class_device_attr_sess_##field; \ count++; \ - } + } \ +} while (0) + +#define SETUP_PRIV_CONN_RD_ATTR(field) \ +do { \ + priv->conn_attrs[count] = &class_device_attr_priv_conn_##field; \ + count++; \ +} while (0) + +#define SETUP_CONN_RD_ATTR(field, param_flag) \ +do { \ + if (tt->param_mask & param_flag) { \ + priv->conn_attrs[count] = &class_device_attr_conn_##field; \ + count++; \ + } \ +} while (0) static int iscsi_session_match(struct attribute_container *cont, struct device *dev) @@ -1104,6 +1337,24 @@ static int iscsi_conn_match(struct attri return &priv->conn_cont.ac == cont; } +static int iscsi_host_match(struct attribute_container *cont, + struct device *dev) +{ + struct Scsi_Host *shost; + struct iscsi_internal *priv; + + if (!scsi_is_host_device(dev)) + return 0; + + shost = dev_to_shost(dev); + if (!shost->transportt ||
shost->transportt->host_attrs.ac.class != &iscsi_host_class.class) + return 0; + + priv = to_iscsi_internal(shost->transportt); + return &priv->t.host_attrs.ac == cont; +} + struct scsi_transport_template * iscsi_register_transport(struct iscsi_transport *tt) { @@ -1122,6 +1373,7 @@ iscsi_register_transport(struct iscsi_tr return NULL; INIT_LIST_HEAD(&priv->list); priv->iscsi_transport = tt; + priv->t.user_scan = iscsi_user_scan; priv->cdev.class = &iscsi_transport_class; snprintf(priv->cdev.class_id, BUS_ID_SIZE, "%s", tt->name); @@ -1133,18 +1385,13 @@ iscsi_register_transport(struct iscsi_tr if (err) goto unregister_cdev; - /* setup parameters mask */ - priv->param_mask = 0xFFFFFFFF; - if (!(tt->caps & CAP_MULTI_R2T)) - priv->param_mask &= ~(1 << ISCSI_PARAM_MAX_R2T); - if (!(tt->caps & CAP_HDRDGST)) - priv->param_mask &= ~(1 << ISCSI_PARAM_HDRDGST_EN); - if (!(tt->caps & CAP_DATADGST)) - priv->param_mask &= ~(1 << ISCSI_PARAM_DATADGST_EN); - if (!(tt->caps & CAP_MARKERS)) { - priv->param_mask &= ~(1 << ISCSI_PARAM_IFMARKER_EN); - priv->param_mask &= ~(1 << ISCSI_PARAM_OFMARKER_EN); - } + /* host parameters */ + priv->t.host_attrs.ac.attrs = &priv->host_attrs[0]; + priv->t.host_attrs.ac.class = &iscsi_host_class.class; + priv->t.host_attrs.ac.match = iscsi_host_match; + priv->t.host_size = sizeof(struct iscsi_host); + priv->host_attrs[0] = NULL; + transport_container_register(&priv->t.host_attrs); /* connection parameters */ priv->conn_cont.ac.attrs = &priv->conn_attrs[0]; @@ -1152,12 +1399,25 @@ iscsi_register_transport(struct iscsi_tr priv->conn_cont.ac.match = iscsi_conn_match; transport_container_register(&priv->conn_cont); - SETUP_CONN_RD_ATTR(max_recv_dlength, ISCSI_PARAM_MAX_RECV_DLENGTH); - SETUP_CONN_RD_ATTR(max_xmit_dlength, ISCSI_PARAM_MAX_XMIT_DLENGTH); - SETUP_CONN_RD_ATTR(header_digest, ISCSI_PARAM_HDRDGST_EN); - SETUP_CONN_RD_ATTR(data_digest, ISCSI_PARAM_DATADGST_EN); - SETUP_CONN_RD_ATTR(ifmarker, ISCSI_PARAM_IFMARKER_EN); - SETUP_CONN_RD_ATTR(ofmarker, ISCSI_PARAM_OFMARKER_EN); + SETUP_CONN_RD_ATTR(max_recv_dlength, ISCSI_MAX_RECV_DLENGTH); + SETUP_CONN_RD_ATTR(max_xmit_dlength, ISCSI_MAX_XMIT_DLENGTH); + SETUP_CONN_RD_ATTR(header_digest, ISCSI_HDRDGST_EN); + SETUP_CONN_RD_ATTR(data_digest, ISCSI_DATADGST_EN); + SETUP_CONN_RD_ATTR(ifmarker, ISCSI_IFMARKER_EN); + SETUP_CONN_RD_ATTR(ofmarker, ISCSI_OFMARKER_EN); + SETUP_CONN_RD_ATTR(address, ISCSI_CONN_ADDRESS); + SETUP_CONN_RD_ATTR(port, ISCSI_CONN_PORT); + SETUP_CONN_RD_ATTR(exp_statsn, ISCSI_EXP_STATSN); + + if (tt->param_mask & ISCSI_PERSISTENT_ADDRESS) + SETUP_CONN_RD_ATTR(persistent_address, ISCSI_PERSISTENT_ADDRESS); + else + SETUP_PRIV_CONN_RD_ATTR(persistent_address); + + if (tt->param_mask & ISCSI_PERSISTENT_PORT) + SETUP_CONN_RD_ATTR(persistent_port, ISCSI_PERSISTENT_PORT); + else + SETUP_PRIV_CONN_RD_ATTR(persistent_port); BUG_ON(count > ISCSI_CONN_ATTRS); priv->conn_attrs[count] = NULL; @@ -1169,14 +1429,25 @@ iscsi_register_transport(struct iscsi_tr priv->session_cont.ac.match = iscsi_session_match; transport_container_register(&priv->session_cont); - SETUP_SESSION_RD_ATTR(initial_r2t, ISCSI_PARAM_INITIAL_R2T_EN); - SETUP_SESSION_RD_ATTR(max_outstanding_r2t, ISCSI_PARAM_MAX_R2T); - SETUP_SESSION_RD_ATTR(immediate_data, ISCSI_PARAM_IMM_DATA_EN); - SETUP_SESSION_RD_ATTR(first_burst_len, ISCSI_PARAM_FIRST_BURST); - SETUP_SESSION_RD_ATTR(max_burst_len, ISCSI_PARAM_MAX_BURST); - SETUP_SESSION_RD_ATTR(data_pdu_in_order, ISCSI_PARAM_PDU_INORDER_EN); - 
SETUP_SESSION_RD_ATTR(data_seq_in_order,ISCSI_PARAM_DATASEQ_INORDER_EN) - SETUP_SESSION_RD_ATTR(erl, ISCSI_PARAM_ERL); + SETUP_SESSION_RD_ATTR(initial_r2t, ISCSI_INITIAL_R2T_EN); + SETUP_SESSION_RD_ATTR(max_outstanding_r2t, ISCSI_MAX_R2T); + SETUP_SESSION_RD_ATTR(immediate_data, ISCSI_IMM_DATA_EN); + SETUP_SESSION_RD_ATTR(first_burst_len, ISCSI_FIRST_BURST); + SETUP_SESSION_RD_ATTR(max_burst_len, ISCSI_MAX_BURST); + SETUP_SESSION_RD_ATTR(data_pdu_in_order, ISCSI_PDU_INORDER_EN); + SETUP_SESSION_RD_ATTR(data_seq_in_order, ISCSI_DATASEQ_INORDER_EN); + SETUP_SESSION_RD_ATTR(erl, ISCSI_ERL); + SETUP_PRIV_SESSION_RD_ATTR(recovery_tmo); + + if (tt->param_mask & ISCSI_TARGET_NAME) + SETUP_SESSION_RD_ATTR(targetname, ISCSI_TARGET_NAME); + else + SETUP_PRIV_SESSION_RD_ATTR(targetname); + + if (tt->param_mask & ISCSI_TPGT) + SETUP_SESSION_RD_ATTR(tpgt, ISCSI_TPGT); + else + SETUP_PRIV_SESSION_RD_ATTR(tpgt); BUG_ON(count > ISCSI_SESSION_ATTRS); priv->session_attrs[count] = NULL; @@ -1214,6 +1485,7 @@ int iscsi_unregister_transport(struct is transport_container_unregister(&priv->conn_cont); transport_container_unregister(&priv->session_cont); + transport_container_unregister(&priv->t.host_attrs); sysfs_remove_group(&priv->cdev.kobj, &iscsi_transport_group); class_device_unregister(&priv->cdev); @@ -1257,10 +1529,14 @@ static __init int iscsi_transport_init(v if (err) return err; - err = transport_class_register(&iscsi_connection_class); + err = transport_class_register(&iscsi_host_class); if (err) goto unregister_transport_class; + err = transport_class_register(&iscsi_connection_class); + if (err) + goto unregister_host_class; + err = transport_class_register(&iscsi_session_class); if (err) goto unregister_conn_class; @@ -1288,6 +1564,8 @@ unregister_session_class: transport_class_unregister(&iscsi_session_class); unregister_conn_class: transport_class_unregister(&iscsi_connection_class); +unregister_host_class: + transport_class_unregister(&iscsi_host_class); unregister_transport_class: class_unregister(&iscsi_transport_class); return err; @@ -1300,6 +1578,7 @@ static void __exit iscsi_transport_exit( netlink_unregister_notifier(&iscsi_nl_notifier); transport_class_unregister(&iscsi_connection_class); transport_class_unregister(&iscsi_session_class); + transport_class_unregister(&iscsi_host_class); class_unregister(&iscsi_transport_class); } diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h new file mode 100644 index 0000000..fcb5ba8 --- /dev/null +++ b/include/rdma/ib_addr.h @@ -0,0 +1,114 @@ +/* + * Copyright (c) 2005 Voltaire Inc. All rights reserved. + * Copyright (c) 2005 Intel Corporation. All rights reserved. + * + * This Software is licensed under one of the following licenses: + * + * 1) under the terms of the "Common Public License 1.0" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/cpl.php. + * + * 2) under the terms of the "The BSD License" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/bsd-license.php. + * + * 3) under the terms of the "GNU General Public License (GPL) Version 2" a + * copy of which is available from the Open Source Initiative, see + * http://www.opensource.org/licenses/gpl-license.php. + * + * Licensee has the right to choose one of the above licenses. + * + * Redistributions of source code must retain the above copyright + * notice and one of the license notices. 
+ * + * Redistributions in binary form must reproduce both the above copyright + * notice, one of the license notices in the documentation + * and/or other materials provided with the distribution. + * + */ + +#if !defined(IB_ADDR_H) +#define IB_ADDR_H + +#include +#include +#include +#include +#include + +struct rdma_dev_addr { + unsigned char src_dev_addr[MAX_ADDR_LEN]; + unsigned char dst_dev_addr[MAX_ADDR_LEN]; + unsigned char broadcast[MAX_ADDR_LEN]; + enum ib_node_type dev_type; +}; + +/** + * rdma_translate_ip - Translate a local IP address to an RDMA hardware + * address. + */ +int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr); + +/** + * rdma_resolve_ip - Resolve source and destination IP addresses to + * RDMA hardware addresses. + * @src_addr: An optional source address to use in the resolution. If a + * source address is not provided, a usable address will be returned via + * the callback. + * @dst_addr: The destination address to resolve. + * @addr: A reference to a data location that will receive the resolved + * addresses. The data location must remain valid until the callback has + * been invoked. + * @timeout_ms: Amount of time to wait for the address resolution to complete. + * @callback: Call invoked once address resolution has completed, timed out, + * or been canceled. A status of 0 indicates success. + * @context: User-specified context associated with the call. + */ +int rdma_resolve_ip(struct sockaddr *src_addr, struct sockaddr *dst_addr, + struct rdma_dev_addr *addr, int timeout_ms, + void (*callback)(int status, struct sockaddr *src_addr, + struct rdma_dev_addr *addr, void *context), + void *context); + +void rdma_addr_cancel(struct rdma_dev_addr *addr); + +static inline int ip_addr_size(struct sockaddr *addr) +{ + return addr->sa_family == AF_INET6 ? + sizeof(struct sockaddr_in6) : sizeof(struct sockaddr_in); +} + +static inline u16 ib_addr_get_pkey(struct rdma_dev_addr *dev_addr) +{ + return ((u16)dev_addr->broadcast[8] << 8) | (u16)dev_addr->broadcast[9]; +} + +static inline void ib_addr_set_pkey(struct rdma_dev_addr *dev_addr, u16 pkey) +{ + dev_addr->broadcast[8] = pkey >> 8; + dev_addr->broadcast[9] = (unsigned char) pkey; +} + +static inline union ib_gid *ib_addr_get_sgid(struct rdma_dev_addr *dev_addr) +{ + return (union ib_gid *) (dev_addr->src_dev_addr + 4); +} + +static inline void ib_addr_set_sgid(struct rdma_dev_addr *dev_addr, + union ib_gid *gid) +{ + memcpy(dev_addr->src_dev_addr + 4, gid, sizeof *gid); +} + +static inline union ib_gid *ib_addr_get_dgid(struct rdma_dev_addr *dev_addr) +{ + return (union ib_gid *) (dev_addr->dst_dev_addr + 4); +} + +static inline void ib_addr_set_dgid(struct rdma_dev_addr *dev_addr, + union ib_gid *gid) +{ + memcpy(dev_addr->dst_dev_addr + 4, gid, sizeof *gid); +} + +#endif /* IB_ADDR_H */ diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h index 0a9fcd5..8f394f0 100644 --- a/include/rdma/ib_cm.h +++ b/include/rdma/ib_cm.h @@ -32,7 +32,7 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. 
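The sgid/dgid helpers in ib_addr.h above index four bytes into the device address because an IPoIB hardware address is laid out as 4 bytes of flags plus QPN followed by the 16-byte port GID, and the P_Key sits at bytes 8-9 of the IPoIB broadcast address. A usage sketch, assuming an already-resolved rdma_dev_addr:

	static void example_use_resolved(struct rdma_dev_addr *dev_addr)
	{
		union ib_gid dgid;
		u16 pkey;

		memcpy(&dgid, ib_addr_get_dgid(dev_addr), sizeof(dgid));
		pkey = ib_addr_get_pkey(dev_addr);
		/* dgid and pkey can now seed an SA path record query */
		printk(KERN_DEBUG "resolved pkey 0x%x\n", pkey);
	}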
* - * $Id: ib_cm.h 2730 2005-06-28 16:43:03Z sean.hefty $ + * $Id: ib_cm.h 4311 2005-12-05 18:42:01Z sean.hefty $ */ #if !defined(IB_CM_H) #define IB_CM_H @@ -102,7 +102,8 @@ enum ib_cm_data_size { IB_CM_APR_INFO_LENGTH = 72, IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE = 216, IB_CM_SIDR_REP_PRIVATE_DATA_SIZE = 136, - IB_CM_SIDR_REP_INFO_LENGTH = 72 + IB_CM_SIDR_REP_INFO_LENGTH = 72, + IB_CM_COMPARE_SIZE = 64 }; struct ib_cm_id; @@ -238,7 +239,6 @@ struct ib_cm_sidr_rep_event_param { u32 qpn; void *info; u8 info_len; - }; struct ib_cm_event { @@ -317,6 +317,15 @@ void ib_destroy_cm_id(struct ib_cm_id *c #define IB_SERVICE_ID_AGN_MASK __constant_cpu_to_be64(0xFF00000000000000ULL) #define IB_CM_ASSIGN_SERVICE_ID __constant_cpu_to_be64(0x0200000000000000ULL) +#define IB_CMA_SERVICE_ID __constant_cpu_to_be64(0x0000000001000000ULL) +#define IB_CMA_SERVICE_ID_MASK __constant_cpu_to_be64(0xFFFFFFFFFF000000ULL) +#define IB_SDP_SERVICE_ID __constant_cpu_to_be64(0x0000000000010000ULL) +#define IB_SDP_SERVICE_ID_MASK __constant_cpu_to_be64(0xFFFFFFFFFFFF0000ULL) + +struct ib_cm_compare_data { + u8 data[IB_CM_COMPARE_SIZE]; + u8 mask[IB_CM_COMPARE_SIZE]; +}; /** * ib_cm_listen - Initiates listening on the specified service ID for @@ -330,10 +339,12 @@ #define IB_CM_ASSIGN_SERVICE_ID __consta * range of service IDs. If set to 0, the service ID is matched * exactly. This parameter is ignored if %service_id is set to * IB_CM_ASSIGN_SERVICE_ID. + * @compare_data: This parameter is optional. It specifies data that must + * appear in the private data of a connection request for the specified + * listen request. */ -int ib_cm_listen(struct ib_cm_id *cm_id, - __be64 service_id, - __be64 service_mask); +int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask, + struct ib_cm_compare_data *compare_data); struct ib_cm_req_param { struct ib_sa_path_rec *primary_path; diff --git a/include/rdma/ib_marshall.h b/include/rdma/ib_marshall.h new file mode 100644 index 0000000..66bf4d7 --- /dev/null +++ b/include/rdma/ib_marshall.h @@ -0,0 +1,50 @@ +/* + * Copyright (c) 2005 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
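The new compare_data argument to ib_cm_listen() above lets several protocols share one service ID, with the CM delivering a REQ only when the masked private data matches. A consumer sketch (the tag byte is hypothetical; the RDMA CM uses the same mechanism internally to demultiplex its shared service ID range):

	static int example_listen(struct ib_cm_id *cm_id, __be64 service_id)
	{
		struct ib_cm_compare_data cmp;

		memset(&cmp, 0, sizeof cmp);
		cmp.data[0] = 0x01;	/* hypothetical protocol tag */
		cmp.mask[0] = 0xff;	/* compare only that byte */
		return ib_cm_listen(cm_id, service_id, 0, &cmp);
	}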
+ */ + +#if !defined(IB_USER_MARSHALL_H) +#define IB_USER_MARSHALL_H + +#include +#include +#include +#include + +void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst, + struct ib_qp_attr *src); + +void ib_copy_path_rec_to_user(struct ib_user_path_rec *dst, + struct ib_sa_path_rec *src); + +void ib_copy_path_rec_from_user(struct ib_sa_path_rec *dst, + struct ib_user_path_rec *src); + +#endif /* IB_USER_MARSHALL_H */ diff --git a/include/rdma/ib_user_cm.h b/include/rdma/ib_user_cm.h index 19be116..a9e1b22 100644 --- a/include/rdma/ib_user_cm.h +++ b/include/rdma/ib_user_cm.h @@ -30,13 +30,13 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * - * $Id: ib_user_cm.h 2576 2005-06-09 17:00:30Z libor $ + * $Id: ib_user_cm.h 4019 2005-11-11 00:33:09Z sean.hefty $ */ #ifndef IB_USER_CM_H #define IB_USER_CM_H -#include +#include #define IB_USER_CM_ABI_VERSION 4 @@ -110,58 +110,6 @@ struct ib_ucm_init_qp_attr { __u32 qp_state; }; -struct ib_ucm_ah_attr { - __u8 grh_dgid[16]; - __u32 grh_flow_label; - __u16 dlid; - __u16 reserved; - __u8 grh_sgid_index; - __u8 grh_hop_limit; - __u8 grh_traffic_class; - __u8 sl; - __u8 src_path_bits; - __u8 static_rate; - __u8 is_global; - __u8 port_num; -}; - -struct ib_ucm_init_qp_attr_resp { - __u32 qp_attr_mask; - __u32 qp_state; - __u32 cur_qp_state; - __u32 path_mtu; - __u32 path_mig_state; - __u32 qkey; - __u32 rq_psn; - __u32 sq_psn; - __u32 dest_qp_num; - __u32 qp_access_flags; - - struct ib_ucm_ah_attr ah_attr; - struct ib_ucm_ah_attr alt_ah_attr; - - /* ib_qp_cap */ - __u32 max_send_wr; - __u32 max_recv_wr; - __u32 max_send_sge; - __u32 max_recv_sge; - __u32 max_inline_data; - - __u16 pkey_index; - __u16 alt_pkey_index; - __u8 en_sqd_async_notify; - __u8 sq_draining; - __u8 max_rd_atomic; - __u8 max_dest_rd_atomic; - __u8 min_rnr_timer; - __u8 port_num; - __u8 timeout; - __u8 retry_cnt; - __u8 rnr_retry; - __u8 alt_port_num; - __u8 alt_timeout; -}; - struct ib_ucm_listen { __be64 service_id; __be64 service_mask; @@ -180,28 +128,6 @@ struct ib_ucm_private_data { __u8 reserved[3]; }; -struct ib_ucm_path_rec { - __u8 dgid[16]; - __u8 sgid[16]; - __be16 dlid; - __be16 slid; - __u32 raw_traffic; - __be32 flow_label; - __u32 reversible; - __u32 mtu; - __be16 pkey; - __u8 hop_limit; - __u8 traffic_class; - __u8 numb_path; - __u8 sl; - __u8 mtu_selector; - __u8 rate_selector; - __u8 rate; - __u8 packet_life_time_selector; - __u8 packet_life_time; - __u8 preference; -}; - struct ib_ucm_req { __u32 id; __u32 qpn; @@ -304,8 +230,8 @@ struct ib_ucm_event_get { }; struct ib_ucm_req_event_resp { - struct ib_ucm_path_rec primary_path; - struct ib_ucm_path_rec alternate_path; + struct ib_user_path_rec primary_path; + struct ib_user_path_rec alternate_path; __be64 remote_ca_guid; __u32 remote_qkey; __u32 remote_qpn; @@ -349,7 +275,7 @@ struct ib_ucm_mra_event_resp { }; struct ib_ucm_lap_event_resp { - struct ib_ucm_path_rec path; + struct ib_user_path_rec path; }; struct ib_ucm_apr_event_resp { diff --git a/include/rdma/ib_user_sa.h b/include/rdma/ib_user_sa.h new file mode 100644 index 0000000..6591201 --- /dev/null +++ b/include/rdma/ib_user_sa.h @@ -0,0 +1,60 @@ +/* + * Copyright (c) 2005 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
diff --git a/include/rdma/ib_user_sa.h b/include/rdma/ib_user_sa.h new file mode 100644 index 0000000..6591201 --- /dev/null +++ b/include/rdma/ib_user_sa.h @@ -0,0 +1,60 @@ +/* + * Copyright (c) 2005 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef IB_USER_SA_H +#define IB_USER_SA_H + +#include <linux/types.h> + +struct ib_user_path_rec { + __u8 dgid[16]; + __u8 sgid[16]; + __be16 dlid; + __be16 slid; + __u32 raw_traffic; + __be32 flow_label; + __u32 reversible; + __u32 mtu; + __be16 pkey; + __u8 hop_limit; + __u8 traffic_class; + __u8 numb_path; + __u8 sl; + __u8 mtu_selector; + __u8 rate_selector; + __u8 rate; + __u8 packet_life_time_selector; + __u8 packet_life_time; + __u8 preference; +}; + +#endif /* IB_USER_SA_H */
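Note that struct ib_user_path_rec mirrors the kernel's struct ib_sa_path_rec field for field, which keeps the marshalling helpers declared in ib_marshall.h above down to straight copies. A rough sketch of what the to-user direction plausibly looks like (not necessarily the actual implementation):

void ib_copy_path_rec_to_user(struct ib_user_path_rec *dst,
			      struct ib_sa_path_rec *src)
{
	memcpy(dst->dgid, src->dgid.raw, sizeof dst->dgid);
	memcpy(dst->sgid, src->sgid.raw, sizeof dst->sgid);
	dst->dlid	 = src->dlid;
	dst->slid	 = src->slid;
	dst->raw_traffic = src->raw_traffic;
	dst->flow_label	 = src->flow_label;
	dst->reversible	 = src->reversible;
	dst->mtu	 = src->mtu;
	dst->pkey	 = src->pkey;
	dst->hop_limit	 = src->hop_limit;
	dst->traffic_class = src->traffic_class;
	dst->numb_path	 = src->numb_path;
	dst->sl		 = src->sl;
	dst->mtu_selector  = src->mtu_selector;
	dst->rate_selector = src->rate_selector;
	dst->rate	 = src->rate;
	dst->packet_life_time_selector = src->packet_life_time_selector;
	dst->packet_life_time = src->packet_life_time;
	dst->preference	 = src->preference;
}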
diff --git a/include/rdma/ib_user_verbs.h b/include/rdma/ib_user_verbs.h index 338ed43..7b53720 --- a/include/rdma/ib_user_verbs.h +++ b/include/rdma/ib_user_verbs.h @@ -32,7 +32,7 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * - * $Id: ib_user_verbs.h 2708 2005-06-24 17:27:21Z roland $ + * $Id: ib_user_verbs.h 4019 2005-11-11 00:33:09Z sean.hefty $ */ #ifndef IB_USER_VERBS_H @@ -323,6 +323,64 @@ struct ib_uverbs_destroy_cq_resp { __u32 async_events_reported; }; +struct ib_uverbs_global_route { + __u8 dgid[16]; + __u32 flow_label; + __u8 sgid_index; + __u8 hop_limit; + __u8 traffic_class; + __u8 reserved; +}; + +struct ib_uverbs_ah_attr { + struct ib_uverbs_global_route grh; + __u16 dlid; + __u8 sl; + __u8 src_path_bits; + __u8 static_rate; + __u8 is_global; + __u8 port_num; + __u8 reserved; +}; + +struct ib_uverbs_qp_attr { + __u32 qp_attr_mask; + __u32 qp_state; + __u32 cur_qp_state; + __u32 path_mtu; + __u32 path_mig_state; + __u32 qkey; + __u32 rq_psn; + __u32 sq_psn; + __u32 dest_qp_num; + __u32 qp_access_flags; + + struct ib_uverbs_ah_attr ah_attr; + struct ib_uverbs_ah_attr alt_ah_attr; + + /* ib_qp_cap */ + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; + + __u16 pkey_index; + __u16 alt_pkey_index; + __u8 en_sqd_async_notify; + __u8 sq_draining; + __u8 max_rd_atomic; + __u8 max_dest_rd_atomic; + __u8 min_rnr_timer; + __u8 port_num; + __u8 timeout; + __u8 retry_cnt; + __u8 rnr_retry; + __u8 alt_port_num; + __u8 alt_timeout; + __u8 reserved[5]; +}; + struct ib_uverbs_create_qp { __u64 response; __u64 user_handle; @@ -541,26 +599,6 @@ struct ib_uverbs_post_srq_recv_resp { __u32 bad_wr; }; -struct ib_uverbs_global_route { - __u8 dgid[16]; - __u32 flow_label; - __u8 sgid_index; - __u8 hop_limit; - __u8 traffic_class; - __u8 reserved; -}; - -struct ib_uverbs_ah_attr { - struct ib_uverbs_global_route grh; - __u16 dlid; - __u8 sl; - __u8 src_path_bits; - __u8 static_rate; - __u8 is_global; - __u8 port_num; - __u8 reserved; -}; - struct ib_uverbs_create_ah { __u64 response; __u64 user_handle; diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h new file mode 100644 index 0000000..402c63d --- /dev/null +++ b/include/rdma/rdma_cm.h @@ -0,0 +1,256 @@ +/* + * Copyright (c) 2005 Voltaire Inc. All rights reserved. + * Copyright (c) 2005 Intel Corporation. All rights reserved. + * + * This Software is licensed under one of the following licenses: + * + * 1) under the terms of the "Common Public License 1.0" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/cpl.php. + * + * 2) under the terms of the "The BSD License" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/bsd-license.php. + * + * 3) under the terms of the "GNU General Public License (GPL) Version 2" a + * copy of which is available from the Open Source Initiative, see + * http://www.opensource.org/licenses/gpl-license.php. + * + * Licensee has the right to choose one of the above licenses. + * + * Redistributions of source code must retain the above copyright + * notice and one of the license notices. + * + * Redistributions in binary form must reproduce both the above copyright + * notice, one of the license notices in the documentation + * and/or other materials provided with the distribution. + * + */ + +#if !defined(RDMA_CM_H) +#define RDMA_CM_H + +#include <linux/socket.h> +#include <linux/in6.h> +#include <rdma/ib_addr.h> +#include <rdma/ib_sa.h> + +/* + * Upon receiving a device removal event, users must destroy the associated + * RDMA identifier and release all resources allocated with the device.
+ */ +enum rdma_cm_event_type { + RDMA_CM_EVENT_ADDR_RESOLVED, + RDMA_CM_EVENT_ADDR_ERROR, + RDMA_CM_EVENT_ROUTE_RESOLVED, + RDMA_CM_EVENT_ROUTE_ERROR, + RDMA_CM_EVENT_CONNECT_REQUEST, + RDMA_CM_EVENT_CONNECT_RESPONSE, + RDMA_CM_EVENT_CONNECT_ERROR, + RDMA_CM_EVENT_UNREACHABLE, + RDMA_CM_EVENT_REJECTED, + RDMA_CM_EVENT_ESTABLISHED, + RDMA_CM_EVENT_DISCONNECTED, + RDMA_CM_EVENT_DEVICE_REMOVAL, +}; + +enum rdma_port_space { + RDMA_PS_SDP = 0x0001, + RDMA_PS_TCP = 0x0106, + RDMA_PS_UDP = 0x0111, + RDMA_PS_SCTP = 0x0183 +}; + +struct rdma_addr { + struct sockaddr src_addr; + u8 src_pad[sizeof(struct sockaddr_in6) - + sizeof(struct sockaddr)]; + struct sockaddr dst_addr; + u8 dst_pad[sizeof(struct sockaddr_in6) - + sizeof(struct sockaddr)]; + struct rdma_dev_addr dev_addr; +}; + +struct rdma_route { + struct rdma_addr addr; + struct ib_sa_path_rec *path_rec; + int num_paths; +}; + +struct rdma_cm_event { + enum rdma_cm_event_type event; + int status; + void *private_data; + u8 private_data_len; +}; + +struct rdma_cm_id; + +/** + * rdma_cm_event_handler - Callback used to report user events. + * + * Notes: Users may not call rdma_destroy_id from this callback to destroy + * the passed in id, or a corresponding listen id. Returning a + * non-zero value from the callback will destroy the passed in id. + */ +typedef int (*rdma_cm_event_handler)(struct rdma_cm_id *id, + struct rdma_cm_event *event); + +struct rdma_cm_id { + struct ib_device *device; + void *context; + struct ib_qp *qp; + rdma_cm_event_handler event_handler; + struct rdma_route route; + enum rdma_port_space ps; + u8 port_num; +}; + +/** + * rdma_create_id - Create an RDMA identifier. + * + * @event_handler: User callback invoked to report events associated with the + * returned rdma_id. + * @context: User specified context associated with the id. + * @ps: RDMA port space. + */ +struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, + void *context, enum rdma_port_space ps); + +void rdma_destroy_id(struct rdma_cm_id *id); + +/** + * rdma_bind_addr - Bind an RDMA identifier to a source address and + * associated RDMA device, if needed. + * + * @id: RDMA identifier. + * @addr: Local address information. Wildcard values are permitted. + * + * This associates a source address with the RDMA identifier before calling + * rdma_listen. If a specific local address is given, the RDMA identifier will + * be bound to a local RDMA device. + */ +int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr); + +/** + * rdma_resolve_addr - Resolve destination and optional source addresses + * from IP addresses to an RDMA address. If successful, the specified + * rdma_cm_id will be bound to a local device. + * + * @id: RDMA identifier. + * @src_addr: Source address information. This parameter may be NULL. + * @dst_addr: Destination address information. + * @timeout_ms: Time to wait for resolution to complete. + */ +int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, + struct sockaddr *dst_addr, int timeout_ms); + +/** + * rdma_resolve_route - Resolve the RDMA address bound to the RDMA identifier + * into route information needed to establish a connection. + * + * This is called on the client side of a connection. + * Users must have first called rdma_resolve_addr to resolve a dst_addr + * into an RDMA address before calling this routine. + */ +int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms); + +/** + * rdma_create_qp - Allocate a QP and associate it with the specified RDMA + * identifier. 
+ * + * QPs allocated to an rdma_cm_id will automatically be transitioned by the CMA + * through their states. + */ +int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd, + struct ib_qp_init_attr *qp_init_attr); + +/** + * rdma_destroy_qp - Deallocate the QP associated with the specified RDMA + * identifier. + * + * Users must destroy any QP associated with an RDMA identifier before + * destroying the RDMA ID. + */ +void rdma_destroy_qp(struct rdma_cm_id *id); + +/** + * rdma_init_qp_attr - Initializes the QP attributes for use in transitioning + * to a specified QP state. + * @id: Communication identifier associated with the QP attributes to + * initialize. + * @qp_attr: On input, specifies the desired QP state. On output, the + * mandatory and desired optional attributes will be set in order to + * modify the QP to the specified state. + * @qp_attr_mask: The QP attribute mask that may be used to transition the + * QP to the specified state. + * + * Users must set the @qp_attr->qp_state to the desired QP state. This call + * will set all required attributes for the given transition, along with + * known optional attributes. Users may override the attributes returned from + * this call before calling ib_modify_qp. + * + * Users that wish to have their QP automatically transitioned through its + * states can associate a QP with the rdma_cm_id by calling rdma_create_qp(). + */ +int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr, + int *qp_attr_mask); + +struct rdma_conn_param { + const void *private_data; + u8 private_data_len; + u8 responder_resources; + u8 initiator_depth; + u8 flow_control; + u8 retry_count; /* ignored when accepting */ + u8 rnr_retry_count; + /* Fields below ignored if a QP is created on the rdma_cm_id. */ + u8 srq; + u32 qp_num; + enum ib_qp_type qp_type; +}; + +/** + * rdma_connect - Initiate an active connection request. + * + * Users must have resolved a route for the rdma_cm_id to connect with + * by having called rdma_resolve_route before calling this routine. + */ +int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); + +/** + * rdma_listen - Initiates a listen on the passive side for incoming + * connection requests. + * + * Users must have bound the rdma_cm_id to a local address by calling + * rdma_bind_addr before calling this routine. + */ +int rdma_listen(struct rdma_cm_id *id, int backlog); + +/** + * rdma_accept - Called to accept a connection request or response. + * @id: Connection identifier associated with the request. + * @conn_param: Information needed to establish the connection. This must be + * provided if accepting a connection request. If accepting a connection + * response, this parameter must be NULL. + * + * Typically, this routine is only called by the listener to accept a connection + * request. It must also be called on the active side of a connection if the + * user is performing their own QP transitions. + */ +int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); + +/** + * rdma_reject - Called to reject a connection request or response. + */ +int rdma_reject(struct rdma_cm_id *id, const void *private_data, + u8 private_data_len); + +/** + * rdma_disconnect - Disconnects the connection and transitions the + * associated QP into the error state. + */ +int rdma_disconnect(struct rdma_cm_id *id); + +#endif /* RDMA_CM_H */ +
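The routines above compose into the canonical active-side sequence: rdma_create_id(), rdma_resolve_addr(), rdma_resolve_route(), rdma_create_qp(), rdma_connect(). Because each resolution step completes asynchronously, a consumer drives the sequence from its event handler. A condensed sketch under the assumption that rdma_create_id() returns an ERR_PTR on failure; the function names and timeout values are illustrative, and error handling and QP setup are elided:

static int client_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
{
	struct rdma_conn_param param;

	switch (event->event) {
	case RDMA_CM_EVENT_ADDR_RESOLVED:
		return rdma_resolve_route(id, 2000);
	case RDMA_CM_EVENT_ROUTE_RESOLVED:
		/* rdma_create_qp(id, pd, &init_attr) would normally go here;
		 * without it, qp_num/qp_type must be filled in by hand. */
		memset(&param, 0, sizeof param);
		param.responder_resources = 1;
		param.initiator_depth = 1;
		param.retry_count = 7;
		return rdma_connect(id, &param);
	case RDMA_CM_EVENT_ESTABLISHED:
		return 0;		/* connection ready for I/O */
	default:
		return 0;		/* a non-zero return destroys the id */
	}
}

static struct rdma_cm_id *start_client(struct sockaddr *dst_addr)
{
	struct rdma_cm_id *id;

	id = rdma_create_id(client_handler, NULL, RDMA_PS_TCP);
	if (!IS_ERR(id))
		rdma_resolve_addr(id, NULL, dst_addr, 2000);
	return id;
}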
diff --git a/include/rdma/rdma_user_cm.h b/include/rdma/rdma_user_cm.h new file mode 100644 index 0000000..63381e1 --- /dev/null +++ b/include/rdma/rdma_user_cm.h @@ -0,0 +1,186 @@ +/* + * Copyright (c) 2005 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef RDMA_USER_CM_H +#define RDMA_USER_CM_H + +#include <linux/types.h> +#include <linux/in6.h> +#include <rdma/ib_user_verbs.h> +#include <rdma/ib_user_sa.h> + +#define RDMA_USER_CM_ABI_VERSION 1 + +#define RDMA_MAX_PRIVATE_DATA 256 + +enum { + RDMA_USER_CM_CMD_CREATE_ID, + RDMA_USER_CM_CMD_DESTROY_ID, + RDMA_USER_CM_CMD_BIND_ADDR, + RDMA_USER_CM_CMD_RESOLVE_ADDR, + RDMA_USER_CM_CMD_RESOLVE_ROUTE, + RDMA_USER_CM_CMD_QUERY_ROUTE, + RDMA_USER_CM_CMD_CONNECT, + RDMA_USER_CM_CMD_LISTEN, + RDMA_USER_CM_CMD_ACCEPT, + RDMA_USER_CM_CMD_REJECT, + RDMA_USER_CM_CMD_DISCONNECT, + RDMA_USER_CM_CMD_INIT_QP_ATTR, + RDMA_USER_CM_CMD_GET_EVENT +}; + +/* + * command ABI structures.
+ */ +struct rdma_ucm_cmd_hdr { + __u32 cmd; + __u16 in; + __u16 out; +}; + +struct rdma_ucm_create_id { + __u64 uid; + __u64 response; +}; + +struct rdma_ucm_create_id_resp { + __u32 id; +}; + +struct rdma_ucm_destroy_id { + __u64 response; + __u32 id; + __u32 reserved; +}; + +struct rdma_ucm_destroy_id_resp { + __u32 events_reported; +}; + +struct rdma_ucm_bind_addr { + __u64 response; + struct sockaddr_in6 addr; + __u32 id; +}; + +struct rdma_ucm_resolve_addr { + struct sockaddr_in6 src_addr; + struct sockaddr_in6 dst_addr; + __u32 id; + __u32 timeout_ms; +}; + +struct rdma_ucm_resolve_route { + __u32 id; + __u32 timeout_ms; +}; + +struct rdma_ucm_query_route { + __u64 response; + __u32 id; + __u32 reserved; +}; + +struct rdma_ucm_query_route_resp { + __u64 node_guid; + struct ib_user_path_rec ib_route[2]; + struct sockaddr_in6 src_addr; + struct sockaddr_in6 dst_addr; + __u32 num_paths; + __u8 port_num; + __u8 reserved[3]; +}; + +struct rdma_ucm_conn_param { + __u32 qp_num; + __u32 qp_type; + __u8 private_data[RDMA_MAX_PRIVATE_DATA]; + __u8 private_data_len; + __u8 srq; + __u8 responder_resources; + __u8 initiator_depth; + __u8 flow_control; + __u8 retry_count; + __u8 rnr_retry_count; + __u8 valid; +}; + +struct rdma_ucm_connect { + struct rdma_ucm_conn_param conn_param; + __u32 id; + __u32 reserved; +}; + +struct rdma_ucm_listen { + __u32 id; + __u32 backlog; +}; + +struct rdma_ucm_accept { + __u64 uid; + struct rdma_ucm_conn_param conn_param; + __u32 id; + __u32 reserved; +}; + +struct rdma_ucm_reject { + __u32 id; + __u8 private_data_len; + __u8 reserved[3]; + __u8 private_data[RDMA_MAX_PRIVATE_DATA]; +}; + +struct rdma_ucm_disconnect { + __u32 id; +}; + +struct rdma_ucm_init_qp_attr { + __u64 response; + __u32 id; + __u32 qp_state; +}; + +struct rdma_ucm_get_event { + __u64 response; +}; + +struct rdma_ucm_event_resp { + __u64 uid; + __u32 id; + __u32 event; + __u32 status; + __u8 private_data_len; + __u8 reserved[3]; + __u8 private_data[RDMA_MAX_PRIVATE_DATA]; +}; + +#endif /* RDMA_USER_CM_H */ diff --git a/include/scsi/iscsi_if.h b/include/scsi/iscsi_if.h index e5618b9..253797c 100644 --- a/include/scsi/iscsi_if.h +++ b/include/scsi/iscsi_if.h @@ -43,6 +43,10 @@ enum iscsi_uevent_e { ISCSI_UEVENT_GET_STATS = UEVENT_BASE + 10, ISCSI_UEVENT_GET_PARAM = UEVENT_BASE + 11, + ISCSI_UEVENT_TRANSPORT_EP_CONNECT = UEVENT_BASE + 12, + ISCSI_UEVENT_TRANSPORT_EP_POLL = UEVENT_BASE + 13, + ISCSI_UEVENT_TRANSPORT_EP_DISCONNECT = UEVENT_BASE + 14, + /* up events */ ISCSI_KEVENT_RECV_PDU = KEVENT_BASE + 1, ISCSI_KEVENT_CONN_ERROR = KEVENT_BASE + 2, @@ -60,61 +64,83 @@ struct iscsi_uevent { uint32_t initial_cmdsn; } c_session; struct msg_destroy_session { - uint64_t session_handle; uint32_t sid; } d_session; struct msg_create_conn { - uint64_t session_handle; - uint32_t cid; uint32_t sid; + uint32_t cid; } c_conn; struct msg_bind_conn { - uint64_t session_handle; - uint64_t conn_handle; - uint32_t transport_fd; + uint32_t sid; + uint32_t cid; + uint64_t transport_eph; uint32_t is_leading; } b_conn; struct msg_destroy_conn { - uint64_t conn_handle; + uint32_t sid; uint32_t cid; } d_conn; struct msg_send_pdu { + uint32_t sid; + uint32_t cid; uint32_t hdr_size; uint32_t data_size; - uint64_t conn_handle; } send_pdu; struct msg_set_param { - uint64_t conn_handle; + uint32_t sid; + uint32_t cid; uint32_t param; /* enum iscsi_param */ - uint32_t value; + uint32_t len; } set_param; struct msg_start_conn { - uint64_t conn_handle; + uint32_t sid; + uint32_t cid; } start_conn; struct msg_stop_conn { + 
uint32_t sid; + uint32_t cid; uint64_t conn_handle; uint32_t flag; } stop_conn; struct msg_get_stats { - uint64_t conn_handle; + uint32_t sid; + uint32_t cid; } get_stats; + struct msg_transport_connect { + uint32_t non_blocking; + } ep_connect; + struct msg_transport_poll { + uint64_t ep_handle; + uint32_t timeout_ms; + } ep_poll; + struct msg_transport_disconnect { + uint64_t ep_handle; + } ep_disconnect; } u; union { /* messages k -> u */ - uint64_t handle; int retcode; struct msg_create_session_ret { - uint64_t session_handle; uint32_t sid; + uint32_t host_no; } c_session_ret; + struct msg_create_conn_ret { + uint32_t sid; + uint32_t cid; + } c_conn_ret; struct msg_recv_req { + uint32_t sid; + uint32_t cid; uint64_t recv_handle; - uint64_t conn_handle; } recv_req; struct msg_conn_error { - uint64_t conn_handle; + uint32_t sid; + uint32_t cid; uint32_t error; /* enum iscsi_err */ } connerror; + struct msg_transport_connect_ret { + uint64_t handle; + } ep_connect_ret; } r; } __attribute__ ((aligned (sizeof(uint64_t)))); @@ -139,29 +165,66 @@ enum iscsi_err { ISCSI_ERR_SESSION_FAILED = ISCSI_ERR_BASE + 13, ISCSI_ERR_HDR_DGST = ISCSI_ERR_BASE + 14, ISCSI_ERR_DATA_DGST = ISCSI_ERR_BASE + 15, - ISCSI_ERR_PARAM_NOT_FOUND = ISCSI_ERR_BASE + 16 + ISCSI_ERR_PARAM_NOT_FOUND = ISCSI_ERR_BASE + 16, + ISCSI_ERR_NO_SCSI_CMD = ISCSI_ERR_BASE + 17, }; /* * iSCSI Parameters (RFC3720) */ enum iscsi_param { - ISCSI_PARAM_MAX_RECV_DLENGTH = 0, - ISCSI_PARAM_MAX_XMIT_DLENGTH = 1, - ISCSI_PARAM_HDRDGST_EN = 2, - ISCSI_PARAM_DATADGST_EN = 3, - ISCSI_PARAM_INITIAL_R2T_EN = 4, - ISCSI_PARAM_MAX_R2T = 5, - ISCSI_PARAM_IMM_DATA_EN = 6, - ISCSI_PARAM_FIRST_BURST = 7, - ISCSI_PARAM_MAX_BURST = 8, - ISCSI_PARAM_PDU_INORDER_EN = 9, - ISCSI_PARAM_DATASEQ_INORDER_EN = 10, - ISCSI_PARAM_ERL = 11, - ISCSI_PARAM_IFMARKER_EN = 12, - ISCSI_PARAM_OFMARKER_EN = 13, + /* passed in using netlink set param */ + ISCSI_PARAM_MAX_RECV_DLENGTH, + ISCSI_PARAM_MAX_XMIT_DLENGTH, + ISCSI_PARAM_HDRDGST_EN, + ISCSI_PARAM_DATADGST_EN, + ISCSI_PARAM_INITIAL_R2T_EN, + ISCSI_PARAM_MAX_R2T, + ISCSI_PARAM_IMM_DATA_EN, + ISCSI_PARAM_FIRST_BURST, + ISCSI_PARAM_MAX_BURST, + ISCSI_PARAM_PDU_INORDER_EN, + ISCSI_PARAM_DATASEQ_INORDER_EN, + ISCSI_PARAM_ERL, + ISCSI_PARAM_IFMARKER_EN, + ISCSI_PARAM_OFMARKER_EN, + ISCSI_PARAM_EXP_STATSN, + ISCSI_PARAM_TARGET_NAME, + ISCSI_PARAM_TPGT, + ISCSI_PARAM_PERSISTENT_ADDRESS, + ISCSI_PARAM_PERSISTENT_PORT, + ISCSI_PARAM_SESS_RECOVERY_TMO, + + /* passed in through bind conn using transport_fd */ + ISCSI_PARAM_CONN_PORT, + ISCSI_PARAM_CONN_ADDRESS, + + /* must always be last */ + ISCSI_PARAM_MAX, }; -#define ISCSI_PARAM_MAX 14
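Each iscsi_param enum value now doubles as a bit position: the block of defines that follows gives one mask bit per parameter, which an LLD ORs into the new iscsi_transport.param_mask field (added to the transport template later in this patch) to advertise which values it can export. An illustrative fragment, with a made-up transport name and a hypothetical selection of bits:

static struct iscsi_transport example_transport = {
	.name	    = "example_tcp",	/* hypothetical LLD */
	.param_mask = ISCSI_MAX_RECV_DLENGTH | ISCSI_MAX_XMIT_DLENGTH |
		      ISCSI_HDRDGST_EN | ISCSI_DATADGST_EN |
		      ISCSI_TARGET_NAME | ISCSI_TPGT |
		      ISCSI_PERSISTENT_ADDRESS | ISCSI_PERSISTENT_PORT,
	/* remaining ops and fields omitted */
};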
+ +#define ISCSI_MAX_RECV_DLENGTH (1 << ISCSI_PARAM_MAX_RECV_DLENGTH) +#define ISCSI_MAX_XMIT_DLENGTH (1 << ISCSI_PARAM_MAX_XMIT_DLENGTH) +#define ISCSI_HDRDGST_EN (1 << ISCSI_PARAM_HDRDGST_EN) +#define ISCSI_DATADGST_EN (1 << ISCSI_PARAM_DATADGST_EN) +#define ISCSI_INITIAL_R2T_EN (1 << ISCSI_PARAM_INITIAL_R2T_EN) +#define ISCSI_MAX_R2T (1 << ISCSI_PARAM_MAX_R2T) +#define ISCSI_IMM_DATA_EN (1 << ISCSI_PARAM_IMM_DATA_EN) +#define ISCSI_FIRST_BURST (1 << ISCSI_PARAM_FIRST_BURST) +#define ISCSI_MAX_BURST (1 << ISCSI_PARAM_MAX_BURST) +#define ISCSI_PDU_INORDER_EN (1 << ISCSI_PARAM_PDU_INORDER_EN) +#define ISCSI_DATASEQ_INORDER_EN (1 << ISCSI_PARAM_DATASEQ_INORDER_EN) +#define ISCSI_ERL (1 << ISCSI_PARAM_ERL) +#define ISCSI_IFMARKER_EN (1 << ISCSI_PARAM_IFMARKER_EN) +#define ISCSI_OFMARKER_EN (1 << ISCSI_PARAM_OFMARKER_EN) +#define ISCSI_EXP_STATSN (1 << ISCSI_PARAM_EXP_STATSN) +#define ISCSI_TARGET_NAME (1 << ISCSI_PARAM_TARGET_NAME) +#define ISCSI_TPGT (1 << ISCSI_PARAM_TPGT) +#define ISCSI_PERSISTENT_ADDRESS (1 << ISCSI_PARAM_PERSISTENT_ADDRESS) +#define ISCSI_PERSISTENT_PORT (1 << ISCSI_PARAM_PERSISTENT_PORT) +#define ISCSI_SESS_RECOVERY_TMO (1 << ISCSI_PARAM_SESS_RECOVERY_TMO) +#define ISCSI_CONN_PORT (1 << ISCSI_PARAM_CONN_PORT) +#define ISCSI_CONN_ADDRESS (1 << ISCSI_PARAM_CONN_ADDRESS) #define iscsi_ptr(_handle) ((void*)(unsigned long)_handle) #define iscsi_handle(_ptr) ((uint64_t)(unsigned long)_ptr) diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h new file mode 100644 index 0000000..2dba929 --- /dev/null +++ b/include/scsi/libiscsi.h @@ -0,0 +1,286 @@ +/* + * iSCSI lib definitions + * + * Copyright (C) 2006 Red Hat, Inc. All rights reserved. + * Copyright (C) 2004 - 2006 Mike Christie + * Copyright (C) 2004 - 2005 Dmitry Yusupov + * Copyright (C) 2004 - 2005 Alex Aizman + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + */ +#ifndef LIBISCSI_H +#define LIBISCSI_H + +#include <linux/types.h> +#include <linux/mutex.h> +#include <scsi/iscsi_proto.h> +#include <scsi/iscsi_if.h> + +struct scsi_transport_template; +struct scsi_device; +struct Scsi_Host; +struct scsi_cmnd; +struct socket; +struct iscsi_transport; +struct iscsi_cls_session; +struct iscsi_cls_conn; +struct iscsi_session; +struct iscsi_nopin; + +/* #define DEBUG_SCSI */ +#ifdef DEBUG_SCSI +#define debug_scsi(fmt...) printk(KERN_INFO "iscsi: " fmt) +#else +#define debug_scsi(fmt...) +#endif + +#define ISCSI_XMIT_CMDS_MAX 128 /* must be power of 2 */ +#define ISCSI_MGMT_CMDS_MAX 32 /* must be power of 2 */ +#define ISCSI_CONN_MAX 1 + +#define ISCSI_MGMT_ITT_OFFSET 0xa00 + +#define ISCSI_DEF_CMD_PER_LUN 32 +#define ISCSI_MAX_CMD_PER_LUN 128 + +/* Task Mgmt states */ +#define TMABORT_INITIAL 0x0 +#define TMABORT_SUCCESS 0x1 +#define TMABORT_FAILED 0x2 +#define TMABORT_TIMEDOUT 0x3 + +/* Connection suspend "bit" */ +#define ISCSI_SUSPEND_BIT 1 + +#define ISCSI_ITT_MASK (0xfff) +#define ISCSI_CID_SHIFT 12 +#define ISCSI_CID_MASK (0xffff << ISCSI_CID_SHIFT) +#define ISCSI_AGE_SHIFT 28 +#define ISCSI_AGE_MASK (0xf << ISCSI_AGE_SHIFT) + +struct iscsi_mgmt_task { + /* + * Because LLDs allocate their hdr differently, this is a pointer to + * that storage. It must be setup at session creation time. + */ + struct iscsi_hdr *hdr; + char *data; /* mgmt payload */ + int data_count; /* counts data to be sent */ + uint32_t itt; /* this ITT */ + void *dd_data; /* driver/transport data */ + struct list_head running; +}; + +struct iscsi_cmd_task { + /* + * Because LLDs allocate their hdr differently, this is a pointer to + * that storage. It must be setup at session creation time.
+ */ + struct iscsi_cmd *hdr; + int itt; /* this ITT */ + int datasn; /* DataSN */ + + uint32_t unsol_datasn; + int imm_count; /* imm-data (bytes) */ + int unsol_count; /* unsolicited (bytes)*/ + int data_count; /* remaining Data-Out */ + struct scsi_cmnd *sc; /* associated SCSI cmd*/ + int total_length; + struct iscsi_conn *conn; /* used connection */ + struct iscsi_mgmt_task *mtask; /* tmf mtask in progr */ + + struct list_head running; /* running cmd list */ + void *dd_data; /* driver/transport data */ +}; + +struct iscsi_conn { + struct iscsi_cls_conn *cls_conn; /* ptr to class connection */ + void *dd_data; /* iscsi_transport data */ + struct iscsi_session *session; /* parent session */ + /* + * LLDs should set this lock. It protects the transport recv + * code + */ + rwlock_t *recv_lock; + /* + * conn_stop() flag: stop to recover, stop to terminate + */ + int stop_stage; + + /* iSCSI connection-wide sequencing */ + uint32_t exp_statsn; + + /* control data */ + int id; /* CID */ + struct list_head item; /* maintains list of conns */ + int c_stage; /* connection state */ + struct iscsi_mgmt_task *login_mtask; /* mtask used for login/text */ + struct iscsi_mgmt_task *mtask; /* xmit mtask in progress */ + struct iscsi_cmd_task *ctask; /* xmit ctask in progress */ + + /* xmit */ + struct kfifo *immqueue; /* immediate xmit queue */ + struct kfifo *mgmtqueue; /* mgmt (control) xmit queue */ + struct list_head mgmt_run_list; /* list of control tasks */ + struct kfifo *xmitqueue; /* data-path cmd queue */ + struct list_head run_list; /* list of cmds in progress */ + struct work_struct xmitwork; /* per-conn. xmit workqueue */ + /* + * serializes connection xmit, access to kfifos: + * xmitqueue, immqueue, mgmtqueue + */ + struct mutex xmitmutex; + + unsigned long suspend_tx; /* suspend Tx */ + unsigned long suspend_rx; /* suspend Rx */ + + /* abort */ + wait_queue_head_t ehwait; /* used in eh_abort() */ + struct iscsi_tm tmhdr; + struct timer_list tmabort_timer; + int tmabort_state; /* see TMABORT_INITIAL, etc.*/ + + /* negotiated params */ + int max_recv_dlength; /* initiator_max_recv_dsl*/ + int max_xmit_dlength; /* target_max_recv_dsl */ + int hdrdgst_en; + int datadgst_en; + + /* MIB-statistics */ + uint64_t txdata_octets; + uint64_t rxdata_octets; + uint32_t scsicmd_pdus_cnt; + uint32_t dataout_pdus_cnt; + uint32_t scsirsp_pdus_cnt; + uint32_t datain_pdus_cnt; + uint32_t r2t_pdus_cnt; + uint32_t tmfcmd_pdus_cnt; + int32_t tmfrsp_pdus_cnt; + + /* custom statistics */ + uint32_t eh_abort_cnt; +}; + +struct iscsi_queue { + struct kfifo *queue; /* FIFO Queue */ + void **pool; /* Pool of elements */ + int max; /* Max number of elements */ +}; + +struct iscsi_session { + /* iSCSI session-wide sequencing */ + uint32_t cmdsn; + uint32_t exp_cmdsn; + uint32_t max_cmdsn; + + /* configuration */ + int initial_r2t_en; + int max_r2t; + int imm_data_en; + int first_burst; + int max_burst; + int time2wait; + int time2retain; + int pdu_inorder_en; + int dataseq_inorder_en; + int erl; + int ifmarker_en; + int ofmarker_en; + + /* control data */ + struct iscsi_transport *tt; + struct Scsi_Host *host; + struct iscsi_conn *leadconn; /* leading connection */ + spinlock_t lock; /* protects session state, * + * sequence numbers, * + * session resources: * + * - cmdpool, * + * - mgmtpool, * + * - r2tpool */ + int state; /* session state */ + int recovery_failed; + struct list_head item; + int conn_cnt; + int age; /* counts session re-opens */ + + struct list_head connections; /* list of connections */ + int 
cmds_max; /* size of cmds array */ + struct iscsi_cmd_task **cmds; /* Original Cmds arr */ + struct iscsi_queue cmdpool; /* PDU's pool */ + int mgmtpool_max; /* size of mgmt array */ + struct iscsi_mgmt_task **mgmt_cmds; /* Original mgmt arr */ + struct iscsi_queue mgmtpool; /* Mgmt PDU's pool */ +}; + +/* + * scsi host template + */ +extern int iscsi_change_queue_depth(struct scsi_device *sdev, int depth); +extern int iscsi_eh_abort(struct scsi_cmnd *sc); +extern int iscsi_eh_host_reset(struct scsi_cmnd *sc); +extern int iscsi_queuecommand(struct scsi_cmnd *sc, + void (*done)(struct scsi_cmnd *)); + +/* + * session management + */ +extern struct iscsi_cls_session * +iscsi_session_setup(struct iscsi_transport *, struct scsi_transport_template *, + int, int, uint32_t, uint32_t *); +extern void iscsi_session_teardown(struct iscsi_cls_session *); +extern struct iscsi_session *class_to_transport_session(struct iscsi_cls_session *); +extern void iscsi_start_session_recovery(struct iscsi_session *, + struct iscsi_conn *, int); +extern void iscsi_session_recovery_timedout(struct iscsi_cls_session *); + +#define session_to_cls(_sess) \ + hostdata_session(_sess->host->hostdata) + +/* + * connection management + */ +extern struct iscsi_cls_conn *iscsi_conn_setup(struct iscsi_cls_session *, + uint32_t); +extern void iscsi_conn_teardown(struct iscsi_cls_conn *); +extern int iscsi_conn_start(struct iscsi_cls_conn *); +extern void iscsi_conn_stop(struct iscsi_cls_conn *, int); +extern int iscsi_conn_bind(struct iscsi_cls_session *, struct iscsi_cls_conn *, + int); +extern void iscsi_conn_failure(struct iscsi_conn *conn, enum iscsi_err err); + +/* + * pdu and task processing + */ +extern int iscsi_check_assign_cmdsn(struct iscsi_session *, + struct iscsi_nopin *); +extern void iscsi_prep_unsolicit_data_pdu(struct iscsi_cmd_task *, + struct iscsi_data *hdr, + int transport_data_cnt); +extern int iscsi_conn_send_pdu(struct iscsi_cls_conn *, struct iscsi_hdr *, + char *, uint32_t); +extern int iscsi_complete_pdu(struct iscsi_conn *, struct iscsi_hdr *, + char *, int); +extern int __iscsi_complete_pdu(struct iscsi_conn *, struct iscsi_hdr *, + char *, int); +extern int iscsi_verify_itt(struct iscsi_conn *, struct iscsi_hdr *, + uint32_t *); + +/* + * generic helpers + */ +extern void iscsi_pool_free(struct iscsi_queue *, void **); +extern int iscsi_pool_init(struct iscsi_queue *, int, void ***, int); + +#endif diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h index 1ace1b9..7602b9b 100644 --- a/include/scsi/scsi_cmnd.h +++ b/include/scsi/scsi_cmnd.h @@ -152,4 +152,8 @@ extern void scsi_put_command(struct scsi extern void scsi_io_completion(struct scsi_cmnd *, unsigned int, unsigned int); extern void scsi_finish_command(struct scsi_cmnd *cmd); +extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count, + size_t *offset, size_t *len); +extern void scsi_kunmap_atomic_sg(void *virt); + #endif /* _SCSI_SCSI_CMND_H */ diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h index d31b16d..b4ddd3b 100644 --- a/include/scsi/scsi_devinfo.h +++ b/include/scsi/scsi_devinfo.h @@ -29,4 +29,5 @@ #define BLIST_NO_ULD_ATTACH 0x100000 /* device is actually for RAID config */ #define BLIST_SELECT_NO_ATN 0x200000 /* select without ATN */ #define BLIST_RETRY_HWERROR 0x400000 /* retry HARDWARE_ERROR */ #define BLIST_MAX_512 0x800000 /* maximum 512 sector cdb length */ +#define BLIST_ATTACH_PQ3 0x1000000 /* Scan: Attach to PQ3 devices */ #endif
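The scsi_kmap_atomic_sg()/scsi_kunmap_atomic_sg() pair exported from scsi_cmnd.h above gives transports a way to copy PDU data through a temporary kernel mapping of a command's scatterlist. A sketch of the calling pattern, under the assumption that *offset is in/out (byte offset into the SG list on entry, offset within the returned mapping on exit) and *len returns the bytes usable in that mapping; the helper name is illustrative:

static void example_copy_from_sg(struct scatterlist *sg, int sg_count,
				 size_t sg_offset, void *dst, size_t count)
{
	size_t offset = sg_offset;
	size_t len;
	void *vaddr;

	vaddr = scsi_kmap_atomic_sg(sg, sg_count, &offset, &len);
	if (len > count)
		len = count;
	memcpy(dst, vaddr + offset, len);	/* a full copy may need to loop */
	scsi_kunmap_atomic_sg(vaddr);
}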
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h index b41cf07..c9e9475 100644 --- a/include/scsi/scsi_transport_iscsi.h +++ b/include/scsi/scsi_transport_iscsi.h @@ -2,7 +2,7 @@ * iSCSI transport class definitions * * Copyright (C) IBM Corporation, 2004 - * Copyright (C) Mike Christie, 2004 - 2005 + * Copyright (C) Mike Christie, 2004 - 2006 * Copyright (C) Dmitry Yusupov, 2004 - 2005 * Copyright (C) Alex Aizman, 2004 - 2005 * @@ -27,9 +27,13 @@ #include <linux/device.h> #include <scsi/iscsi_if.h> struct scsi_transport_template; +struct iscsi_transport; struct Scsi_Host; struct mempool_zone; struct iscsi_cls_conn; +struct iscsi_conn; +struct iscsi_cmd_task; +struct iscsi_mgmt_task; /** * struct iscsi_transport - iSCSI Transport template @@ -46,6 +50,20 @@ struct iscsi_cls_conn; * @start_conn: set connection to be operational * @stop_conn: suspend/recover/terminate connection * @send_pdu: send iSCSI PDU, Login, Logout, NOP-Out, Reject, Text. + * @session_recovery_timedout: notify the LLD that a block during recovery timed out + * @suspend_conn_recv: suspend the recv side of the connection + * @terminate_conn: destroy socket connection. Called with the mutex held. + * @init_cmd_task: Initialize an iscsi_cmd_task and any internal structs. + * Called from queuecommand with session lock held. + * @init_mgmt_task: Initialize an iscsi_mgmt_task and any internal structs. + * Called from iscsi_conn_send_generic with xmitmutex. + * @xmit_cmd_task: requests LLD to transfer cmd task + * @xmit_mgmt_task: requests LLD to transfer mgmt task + * @cleanup_cmd_task: requests LLD to fail cmd task. Called with xmitmutex + * and session->lock after the connection has been + * suspended and terminated during recovery. If called + * from abort task then the connection is not suspended + * or terminated but sk_callback_lock is held * * Template API provided by iSCSI Transport */ @@ -53,38 +71,58 @@ struct iscsi_transport { struct module *owner; char *name; unsigned int caps; + /* LLD sets this to indicate what values it can export to sysfs */ + unsigned int param_mask; struct scsi_host_template *host_template; - /* LLD session/scsi_host data size */ - int hostdata_size; - /* LLD iscsi_host data size */ - int ihostdata_size; /* LLD connection data size */ int conndata_size; + /* LLD session data size */ + int sessiondata_size; int max_lun; unsigned int max_conn; unsigned int max_cmd_len; - struct iscsi_cls_session *(*create_session) - (struct scsi_transport_template *t, uint32_t sn, uint32_t *sid); + struct iscsi_cls_session *(*create_session) (struct iscsi_transport *it, + struct scsi_transport_template *t, uint32_t sn, uint32_t *hn); void (*destroy_session) (struct iscsi_cls_session *session); struct iscsi_cls_conn *(*create_conn) (struct iscsi_cls_session *sess, uint32_t cid); int (*bind_conn) (struct iscsi_cls_session *session, struct iscsi_cls_conn *cls_conn, - uint32_t transport_fd, int is_leading); + uint64_t transport_eph, int is_leading); int (*start_conn) (struct iscsi_cls_conn *conn); void (*stop_conn) (struct iscsi_cls_conn *conn, int flag); void (*destroy_conn) (struct iscsi_cls_conn *conn); int (*set_param) (struct iscsi_cls_conn *conn, enum iscsi_param param, uint32_t value); int (*get_conn_param) (struct iscsi_cls_conn *conn, - enum iscsi_param param, - uint32_t *value); + enum iscsi_param param, uint32_t *value); int (*get_session_param) (struct iscsi_cls_session *session, enum iscsi_param param, uint32_t *value); + int (*get_conn_str_param) (struct iscsi_cls_conn *conn, + enum iscsi_param param, char *buf); + int (*get_session_str_param) (struct
iscsi_cls_session *session, + enum iscsi_param param, char *buf); int (*send_pdu) (struct iscsi_cls_conn *conn, struct iscsi_hdr *hdr, char *data, uint32_t data_size); void (*get_stats) (struct iscsi_cls_conn *conn, struct iscsi_stats *stats); + void (*suspend_conn_recv) (struct iscsi_conn *conn); + void (*terminate_conn) (struct iscsi_conn *conn); + void (*init_cmd_task) (struct iscsi_cmd_task *ctask); + void (*init_mgmt_task) (struct iscsi_conn *conn, + struct iscsi_mgmt_task *mtask, + char *data, uint32_t data_size); + int (*xmit_cmd_task) (struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask); + void (*cleanup_cmd_task) (struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask); + int (*xmit_mgmt_task) (struct iscsi_conn *conn, + struct iscsi_mgmt_task *mtask); + void (*session_recovery_timedout) (struct iscsi_cls_session *session); + int (*ep_connect) (struct sockaddr *dst_addr, int non_blocking, + uint64_t *ep_handle); + int (*ep_poll) (uint64_t ep_handle, int timeout_ms); + void (*ep_disconnect) (uint64_t ep_handle); }; /* @@ -100,10 +138,26 @@ extern void iscsi_conn_error(struct iscs extern int iscsi_recv_pdu(struct iscsi_cls_conn *conn, struct iscsi_hdr *hdr, char *data, uint32_t data_size); + +/* Connection's states */ +#define ISCSI_CONN_INITIAL_STAGE 0 +#define ISCSI_CONN_STARTED 1 +#define ISCSI_CONN_STOPPED 2 +#define ISCSI_CONN_CLEANUP_WAIT 3 + struct iscsi_cls_conn { struct list_head conn_list; /* item in connlist */ void *dd_data; /* LLD private data */ struct iscsi_transport *transport; + uint32_t cid; /* connection id */ + + /* portal/group values we got during discovery */ + char *persistent_address; + int persistent_port; + /* portal/group values we are currently using */ + char *address; + int port; + int active; /* must be accessed with the connlock */ struct device dev; /* sysfs transport/container device */ struct mempool_zone *z_error; @@ -114,9 +168,30 @@ struct iscsi_cls_conn { #define iscsi_dev_to_conn(_dev) \ container_of(_dev, struct iscsi_cls_conn, dev) +/* Session's states */ +#define ISCSI_STATE_FREE 1 +#define ISCSI_STATE_LOGGED_IN 2 +#define ISCSI_STATE_FAILED 3 +#define ISCSI_STATE_TERMINATE 4 + struct iscsi_cls_session { struct list_head sess_list; /* item in session_list */ + struct list_head host_list; struct iscsi_transport *transport; + + /* iSCSI values used as unique id by userspace. 
*/ + char *targetname; + int tpgt; + + /* recovery fields */ + int recovery_tmo; + struct work_struct recovery_work; + + int target_id; + int channel; + + int sid; /* session id */ + void *dd_data; /* LLD private data */ struct device dev; /* sysfs transport/container device */ }; @@ -126,22 +201,22 @@ #define iscsi_dev_to_session(_dev) \ #define iscsi_session_to_shost(_session) \ dev_to_shost(_session->dev.parent) +struct iscsi_host { + int next_target_id; + struct list_head sessions; + struct mutex mutex; +}; + /* * session and connection functions that can be used by HW iSCSI LLDs */ extern struct iscsi_cls_session *iscsi_create_session(struct Scsi_Host *shost, - struct iscsi_transport *t); + struct iscsi_transport *t, int channel); extern int iscsi_destroy_session(struct iscsi_cls_session *session); extern struct iscsi_cls_conn *iscsi_create_conn(struct iscsi_cls_session *sess, uint32_t cid); extern int iscsi_destroy_conn(struct iscsi_cls_conn *conn); - -/* - * session functions used by software iscsi - */ -extern struct Scsi_Host * -iscsi_transport_create_session(struct scsi_transport_template *scsit, - struct iscsi_transport *transport); -extern int iscsi_transport_destroy_session(struct Scsi_Host *shost); +extern void iscsi_unblock_session(struct iscsi_cls_session *session); +extern void iscsi_block_session(struct iscsi_cls_session *session); #endif diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index cdde963..31387ab 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -666,3 +666,4 @@ #endif } EXPORT_SYMBOL(inet_addr_type); +EXPORT_SYMBOL(ip_dev_find);
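Finally, a note on the ip_dev_find() export: the new address translation module uses it to map a local IP address back to its net_device, and from there (via IPoIB) to a GID. A sketch of the calling convention; ip_dev_find() returns a held reference, so the caller must dev_put(), and the helper name here is illustrative:

static int addr_to_netdev(__be32 ip)
{
	struct net_device *dev;

	dev = ip_dev_find(ip);		/* IPv4 address, network byte order */
	if (!dev)
		return -EADDRNOTAVAIL;
	/* ... translate dev to an RDMA device and port here ... */
	dev_put(dev);			/* drop the reference taken above */
	return 0;
}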