GIT dfb32eb7e6ccb8ba59bf4106ae9ebccbce81b6d4 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.17.git commit dfb32eb7e6ccb8ba59bf4106ae9ebccbce81b6d4 Author: Ingo Oeser Date: Thu Mar 16 00:19:04 2006 -0800 [IPV6]: Cleanups for net/ipv6/addrconf.c (kzalloc, early exit) v2 Here are some possible (and trivial) cleanups. - use kzalloc() where possible - invert allocation failure test like if (object) { /* Rest of function here */ } to if (object == NULL) return NULL; /* Rest of function here */ Signed-off-by: Ingo Oeser Acked-by: YOSHIFUJI Hideaki Signed-off-by: David S. Miller commit 18cbee1a82898cdb29b44935ca9325f5d152c275 Author: Ingo Oeser Date: Thu Mar 16 00:16:25 2006 -0800 [IPV6]: Nearly complete kzalloc cleanup for net/ipv6 Stupidly use kzalloc() instead of kmalloc()/memset() wherever this is possible in net/ipv6/*.c. Signed-off-by: Ingo Oeser Signed-off-by: David S. Miller commit ce3124f52bf76bf1704bced88515ae14fffcf086 Author: Ingo Oeser Date: Thu Mar 16 00:14:38 2006 -0800 [IPV6]: Cleanup of net/ipv6/reassembly.c Two minor cleanups: 1. Using kzalloc() in frag_alloc_queue() saves the memset() in ipv6_frag_create(). 2. Invert sense of if-statements to streamline code. Inverts the comment, too. Signed-off-by: Ingo Oeser Signed-off-by: David S. Miller commit 644716319292401ddaa2577d3584f2d567639f8c Author: Andrew Morton Date: Thu Mar 16 00:03:44 2006 -0800 [BRIDGE]: Remove duplicate const from is_link_local() argument type. Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit ee91542a400cd5d0330c914901c07b6b7c168b8e Author: Adrian Bunk Date: Tue Mar 14 17:06:29 2006 -0800 [DECNET]: net/decnet/dn_route.c: fix inconsistent NULL checking The Coverity checker noted this inconsistent NULL checking in dnrt_drop(). Since all callers ensure that NULL isn't passed, we can simply remove the check. Signed-off-by: Adrian Bunk Signed-off-by: David S. 
Miller commit e97d6c5e6b952e9e24ea6ad79059efc3b0863eb9 Author: Adrian Bunk Date: Tue Mar 14 17:04:31 2006 -0800 [TG3]: make drivers/net/tg3.c:tg3_request_irq() static This patch makes the needlessly global function tg3_request_irq() static. Signed-off-by: Adrian Bunk Signed-off-by: David S. Miller commit 39168ec8c744c383e3311d2135829e980485f9bb Author: Stephen Hemminger Date: Tue Mar 14 17:01:15 2006 -0800 [BRIDGE]: use LLC to send STP The bridge code can use existing LLC output code when building spanning tree protocol packets. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit 467f9529c8ef29d965048abde8d075c880a8278b Author: Stephen Hemminger Date: Tue Mar 14 17:00:49 2006 -0800 [LLC]: llc_mac_hdr_init const arguments Cleanup of LLC. llc_mac_hdr_init can take constant arguments, and it is defined twice, once in llc_output.h, which is otherwise unused. Signed-off-by: Stephen Hemminger Acked-by: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller commit 7f3b46f6893160d9ca75a46e6dd882d9a765b32e Author: Stephen Hemminger Date: Tue Mar 14 17:00:20 2006 -0800 [BRIDGE]: allow show/store of group multicast address Bridges communicate with each other using Spanning Tree Protocol over a standard multicast address. There are times, when testing or layering bridges over existing topologies or tunnels, when it is useful to use alternative multicast addresses for STP packets. The 802.1d standard has some unused addresses that can be used for this. This patch is restrictive in that it only allows one of the possible addresses in the standard. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit 001f2f4fa57502ea157683dc5e173cf6813991f1 Author: Stephen Hemminger Date: Tue Mar 14 16:59:51 2006 -0800 [BRIDGE]: use llc for receiving STP packets Use LLC for the receive path of Spanning Tree Protocol packets. 
This allows link local multicast packets to be received by other protocols (if they care), and uses the existing LLC code to get STP packets back into bridge code. The bridge multicast address is also checked, so bridges using other link local multicast addresses are ignored. This allows for use of different multicast addresses to define separate STP domains. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit a4c71928e887f853f9482f887ee3cd8568f3db49 Author: Stephen Hemminger Date: Tue Mar 14 16:59:15 2006 -0800 [BRIDGE]: stp timer to jiffies cleanup Cleanup the get/set of bridge timer value in the packets. It is clearer not to bury the conversion in a macro. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit 76b6fcf8039bf306592fc5d8204f8ab93a81481a Author: Stephen Hemminger Date: Tue Mar 14 16:58:19 2006 -0800 [BRIDGE]: forwarding remove unneeded preempt and bh disables Optimize the forwarding and transmit paths. Both places are called with bottom half/no preempt so there is no need to use spin_lock_bh or rcu_read_lock. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit f626526cc694d17c12e47d13519fa79ff248c86f Author: Stephen Hemminger Date: Tue Mar 14 16:57:53 2006 -0800 [BRIDGE]: netfilter inline cleanup Move nf_bridge_alloc from header file to the one place it is used and optimize it. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit fa2326e78b7aa045793b45a992e32af857b34bd4 Author: Stephen Hemminger Date: Tue Mar 14 16:57:30 2006 -0800 [BRIDGE]: netfilter VLAN macro cleanup Fix the VLAN macros in bridge netfilter code. Macros should not depend on magic variables. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit 9640a9297a9b123e335848fbd8dd096d0da91e11 Author: Stephen Hemminger Date: Tue Mar 14 16:57:05 2006 -0800 [BRIDGE]: netfilter dont use __constant_htons Only use __constant_htons() for initializers and switch cases. 
For other uses, it is just as efficient and clearer to use htons Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit 787c6fe7c0735ff3100fff40e40ef48b58c20da7 Author: Stephen Hemminger Date: Tue Mar 14 16:56:37 2006 -0800 [BRIDGE]: netfilter whitespace Run br_netfilter through Lindent to fix whitespace. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit 1d55be3fdb8a6690b97ee0afb70a7ec6035b94d9 Author: Stephen Hemminger Date: Tue Mar 14 16:56:10 2006 -0800 [BRIDGE]: optimize frame pass up The netfilter hook that is used to receive frames doesn't need to be a stub. It is only called in two ways, both of which ignore the return value. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit 96d17a6534ce26c61b478483bb1fc1e00bc92818 Author: Stephen Hemminger Date: Tue Mar 14 16:55:48 2006 -0800 [BRIDGE]: use kzalloc Use kzalloc versus kmalloc+memset. Also don't need to do memset() of bridge address since it is in netdev private data that is already zero'd in alloc_netdev. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit 1df53dedc96e2f6e002ad07239d27eb791559cda Author: Stephen Hemminger Date: Tue Mar 14 16:55:22 2006 -0800 [BRIDGE]: use kcalloc Use kcalloc rather than kmalloc + memset. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit 423fca1a253186460d76b5828187ef8d68bf3485 Author: Stephen Hemminger Date: Tue Mar 14 16:54:54 2006 -0800 [BRIDGE]: use setup_timer Use the now standard setup_timer function. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit c014184cf522dfc6cd88a5ecdd3bbfd249d98d01 Author: Stephen Hemminger Date: Tue Mar 14 16:54:28 2006 -0800 [BRIDGE]: remove unneeded bh disables The STP timers run off softirq (kernel timers), so there is no need to disable bottom half in the spin locks. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit bba84978da5b44feed40f6fb8dff3d876a7cca9c Author: David S. 
Miller Date: Sun Mar 12 18:16:33 2006 -0800 [NET]: Really delete the scm_send()/scm_recv() inlines. Signed-off-by: David S. Miller commit 65bf88e0f64369ce698ed32e000f7dc3f92360b9 Author: Andrew Morton Date: Sun Mar 12 18:07:25 2006 -0800 [NET]: Uninline scm_recv() and scm_send() a) They're big b) Their inlining forced the undesirable export of security_sid_to_context() Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit a3b7d0f2da518222bfa86603240f83021d517ecb Author: Andrew Morton Date: Sun Mar 12 01:41:14 2006 -0800 [SECURITY]: Export security_sid_to_context() WARNING: "security_sid_to_context" [net/unix/unix.ko] undefined! Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit 913c56e55a34fcaf2c0d5f4f7191aeb9487e6614 Author: Andrew Morton Date: Sun Mar 12 00:57:15 2006 -0800 [BRIDGE] ebtables: Fix typo in ebtables build fix. It is unclear how that patch compiled... Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit b6237e58734a850fbdc6bcad8ad9949a6309d6f7 Author: Andrew Morton Date: Sun Mar 12 00:53:33 2006 -0800 [NET]: Fix build on ARM. net/core/sock.c: In function `sock_setsockopt': net/core/sock.c:460: error: duplicate case value net/core/sock.c:278: error: previously used here Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit f07c76889cf5613f0ff7af6630c0c6600c0cd6d1 Author: Andrew Morton Date: Sun Mar 12 00:44:30 2006 -0800 [BRIDGE] br_netfilter: Warning fixes. net/bridge/br_netfilter.c: In function `br_nf_pre_routing': net/bridge/br_netfilter.c:427: warning: unused variable `vhdr' net/bridge/br_netfilter.c:445: warning: unused variable `vhdr' Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit 61d22b169b87d6dabeafe1bf91f9b271b46754e4 Author: Andrew Morton Date: Sun Mar 12 00:43:00 2006 -0800 [BRIDGE] ebtables: Build fix. net/bridge/netfilter/ebtables.c:1481: warning: initialization makes pointer from integer without a cast Note that the compat functions aren't implemented? 
Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit c510aaf7d6432f7cb0b44d2644158eb5ff69a359 Author: Catherine Zhang Date: Fri Mar 10 23:42:39 2006 -0800 [SECURITY]: Fix massive build fallout resulting from the getpeersec changes. 1) Add missing SO_PASSSEC defines for several platforms. 2) Make sure there is a security_sid_to_context() dummy implementation available when CONFIG_SECURITY_SELINUX is disabled. Signed-off-by: Catherine Zhang Signed-off-by: David S. Miller commit 32639ad6b7e3da27f233c0516471f0747f1178f5 Author: David S. Miller Date: Fri Mar 10 15:47:08 2006 -0800 [SPARC]: Fixup SO_*SEC values on 32-bit sparc. Sparc64 and Sparc32 have to have identical socket call numbering in order to handle compat layer stuff properly. Signed-off-by: David S. Miller commit 76f3830034ea15ffdbea5eaf8db73d2bb682a407 Author: David S. Miller Date: Fri Mar 10 15:11:43 2006 -0800 [INET]: Fix typo in Arnaldo's connection sock compat fixups. "struct inet_csk" --> "struct inet_connection_sock" :-) Signed-off-by: David S. Miller commit cacb1e0011aed81556cbd25797730ed0a4161461 Author: David S. Miller Date: Fri Mar 10 13:45:41 2006 -0800 [SPARC]: Provide a SO_PASSSEC definition. Signed-off-by: David S. Miller commit 5ce3566336cdf2c351e178368364516f76c964ff Author: Arnaldo Carvalho de Melo Date: Fri Mar 10 18:16:54 2006 -0300 [DCCP] feat: Pass dccp_minisock ptr where only the minisock is used This is in preparation for having a dccp_minisock embedded into dccp_request_sock so that feature negotiation can be done prior to creating the full blown dccp_sock. 
Signed-off-by: Arnaldo Carvalho de Melo commit 7bbc1279b5199fbd225bec390fce70245acbb27f Author: Arnaldo Carvalho de Melo Date: Fri Mar 10 18:11:48 2006 -0300 [DCCP] minisock: Rename struct dccp_options to struct dccp_minisock This will later be included in struct dccp_request_sock so that we can have per connection feature negotiation state while in the 3way handshake. When we clone the DCCP_ROLE_LISTEN socket (in dccp_create_openreq_child) we'll just copy this state from dreq_minisock to dccps_minisock. Also the feature negotiation and option parsing code will mostly touch dccps_minisock, which will simplify some stuff. Signed-off-by: Arnaldo Carvalho de Melo commit 4efa8434cce5535cefea8e9f44fada7136159722 Author: Catherine Zhang Date: Fri Mar 10 18:08:25 2006 -0300 [SELINUX]: selinux_socket_getpeer_{stream,dgram} fixup Signed-off-by: Catherine Zhang Signed-off-by: Arnaldo Carvalho de Melo commit 96d51c8c2816c58290a8379ba383eb8d520ab0b4 Author: Arnaldo Carvalho de Melo Date: Fri Mar 10 17:35:38 2006 -0300 [NET]: Indentation & other cleanups related to compat_[gs]etsockopt cset No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs to just after the function exported, etc. Signed-off-by: Arnaldo Carvalho de Melo commit 7c365b7e91f09e8535c7ed334b816226a58a680b Author: Arnaldo Carvalho de Melo Date: Fri Mar 10 17:21:18 2006 -0300 [SK_BUFF]: export skb_pull_rcsum *** Warning: "skb_pull_rcsum" [net/bridge/bridge.ko] undefined! *** Warning: "skb_pull_rcsum" [net/8021q/8021q.ko] undefined! *** Warning: "skb_pull_rcsum" [drivers/net/pppoe.ko] undefined! *** Warning: "skb_pull_rcsum" [drivers/net/ppp_generic.ko] undefined! 
Signed-off-by: Arnaldo Carvalho de Melo commit 3cf2f09e8adda7b2d1d9a2bbd05b42a5608044f4 Author: Arnaldo Carvalho de Melo Date: Fri Mar 10 16:43:40 2006 -0300 [SECURITY] getpeersec: Fix build breakage a265d6baa827bd6411d1c5566b9e3596fec88a91 removes dummy_socket_getpeersec, replacing it with two new functions, but still references the removed function in the security_fixup_ops table, fix it by doing the replacement operation in the fixup table too. Signed-off-by: Arnaldo Carvalho de Melo commit b4692710ef01c342b050947b55240424854d272e Author: Arnaldo Carvalho de Melo Date: Fri Mar 10 16:38:39 2006 -0300 [INFINIBAND] ipoib: Remove leftover use of neigh_ops->destructor Signed-off-by: Arnaldo Carvalho de Melo commit 58044045b103c91119a33e3630c9fa5289069422 Author: Arnaldo Carvalho de Melo Date: Fri Mar 10 14:09:34 2006 -0300 [ICSK] compat: Introduce inet_csk_compat_[gs]etsockopt Signed-off-by: Arnaldo Carvalho de Melo commit ad864a85e3a5ff4f063d8e415ffd44a7a249feb2 Author: Arnaldo Carvalho de Melo Date: Fri Mar 10 14:02:07 2006 -0300 [SNAP]: Remove leftover unused hdr variable Signed-off-by: Arnaldo Carvalho de Melo commit 4d9e17981639d3764cff77d5cc141a42d44e2d2c Author: Dmitry Mishin Date: Fri Mar 10 03:34:27 2006 -0800 [NET]: {get|set}sockopt compatibility layer This patch extends {get|set}sockopt compatibility layer in order to move protocol specific parts to their place and avoid huge universal net/compat.c file in the future. Signed-off-by: Dmitry Mishin Signed-off-by: David S. Miller commit f5cef3d513557704ce35a188feb74c120b44b8d0 Author: Dave Jones Date: Fri Mar 10 03:10:18 2006 -0800 [IPV6]: remove useless test in ip6_append_data We've already dereferenced 'np' a dozen times at this point, so it's safe to say it's not null. Signed-off-by: Dave Jones Signed-off-by: David S. 
Miller commit 136313f487f294cc53f681d7380621e0d6dc540a Author: Adrian Bunk Date: Fri Mar 10 03:03:58 2006 -0800 [PKT_SCHED]: Let NET_CLS_ACT no longer depend on EXPERIMENTAL This option should IMHO no longer depend on EXPERIMENTAL. Signed-off-by: Adrian Bunk ACKed-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit 5237465933c391dbd9a8f994920658006cb74fc8 Author: Herbert Xu Date: Fri Mar 10 02:56:45 2006 -0800 [NET]: Replace skb_pull/skb_postpull_rcsum with skb_pull_rcsum We're now starting to have quite a number of places that do skb_pull followed immediately by an skb_postpull_rcsum. We can merge these two operations into one function with skb_pull_rcsum. This makes sense since most pull operations on receive skb's need to update the checksum. I've decided to make this out-of-line since it is fairly big and the fast path where hardware checksums are enabled need to call csum_partial anyway. Since this is a brand new function we get to add an extra check on the len argument. As it is most callers of skb_pull ignore its return value which essentially means that there is no check on the len argument. Signed-off-by: Herbert Xu Signed-off-by: David S. Miller commit 3d6f6072c7c2142e21cfddcd3a1abe51c0bf0149 Author: Steven Whitehouse Date: Fri Mar 10 02:52:15 2006 -0800 [DECnet] Use RCU locking in dn_rules.c As per Robert Olsson's patch for ipv4, this is the DECnet version to keep the code "in step". It changes the list of rules to use RCU rather than an rwlock. Inspired-by: Robert Olsson Signed-off-by: Steven Whitehouse Signed-off-by: Patrick Caulfield Signed-off-by: David S. Miller commit 9fc26a989f1c36c92b8849e9086f576a0ec16c30 Author: Patrick Caulfield Date: Fri Mar 10 02:51:32 2006 -0800 [DECnet] Patch to fix recvmsg() flag check This patch means that 64bit kernel/32bit userland platforms will now work correctly with DECnet. 
Signed-off-by: Patrick Caulfield Signed-off-by: Steven Whitehouse commit 15000f46ecbc1909ef33e58285a9bc44b345f5fa Author: Steven Whitehouse Date: Fri Mar 10 02:51:03 2006 -0800 [DECnet] Endian annotation and fixes for DECnet. The typedef for dn_address has been removed in favour of using __le16 or __u16 directly as appropriate. All the DECnet header files are updated accordingly. The byte ordering of both dn_eth2dn() and dn_dn2eth() is changed since just about all their callers wanted network order rather than host order, so the conversion is now done in the functions themselves. Several missed endianness conversions have been picked up during the conversion process. The nh_gw field in struct dn_fib_info has been changed from a 32 bit field to 16 bits as it ought to be. One or two cases of using htons rather than dn_htons in the routing code have been found and fixed. There are still a few warnings to fix, but this patch deals with the important cases. Signed-off-by: Steven Whitehouse Signed-off-by: Patrick Caulfield Signed-off-by: David S. Miller commit ddf1c0e35d73b05ebc9fc12cb374315f806a2764 Author: Catherine Zhang Date: Fri Mar 10 00:38:44 2006 -0800 [SECURITY]: Unix Datagram getpeersec This patch implements an API whereby an application can determine the label of its peer's Unix datagram sockets via the auxiliary data mechanism of recvmsg. Patch purpose: This patch enables a security-aware application to retrieve the security context of the peer of a Unix datagram socket. The application can then use this security context to determine the security context for processing on behalf of the peer who sent the packet. Patch design and implementation: The design and implementation is very similar to the UDP case for INET sockets. Basically we build upon the existing Unix domain socket API for retrieving user credentials. 
Linux offers the API for obtaining user credentials via ancillary messages (i.e., out of band/control messages that are bundled together with a normal message). To retrieve the security context, the application first indicates to the kernel such desire by setting the SO_PASSSEC option via setsockopt. Then the application retrieves the security context using the auxiliary data mechanism. An example server application for Unix datagram socket should look like this:
    toggle = 1;
    toggle_len = sizeof(toggle);
    /* setsockopt() takes the option length by value, not by pointer */
    setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, toggle_len);
    recvmsg(sockfd, &msg_hdr, 0);
    if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
        cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
        if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
            cmsg_hdr->cmsg_level == SOL_SOCKET &&
            cmsg_hdr->cmsg_type == SCM_SECURITY) {
            /* copy only the payload bytes that were actually delivered */
            memcpy(&scontext, CMSG_DATA(cmsg_hdr), cmsg_hdr->cmsg_len - CMSG_LEN(0));
        }
    }
sock_setsockopt is enhanced with a new socket option SO_PASSSEC to allow a server socket to receive security context of the peer. Testing: We have tested the patch by setting up Unix datagram client and server applications. We verified that the server can retrieve the security context using the auxiliary data mechanism of recvmsg. Signed-off-by: Catherine Zhang Acked-by: James Morris Signed-off-by: David S. Miller commit a265d6baa827bd6411d1c5566b9e3596fec88a91 Author: Catherine Zhang Date: Fri Mar 10 00:34:15 2006 -0800 [SECURITY]: TCP/UDP getpeersec This patch implements an application of the LSM-IPSec networking controls whereby an application can determine the label of the security association its TCP or UDP sockets are currently connected to via getsockopt and the auxiliary data mechanism of recvmsg. Patch purpose: This patch enables a security-aware application to retrieve the security context of an IPSec security association a particular TCP or UDP socket is using. 
The application can then use this security context to determine the security context for processing on behalf of the peer at the other end of this connection. In the case of UDP, the security context is for each individual packet. An example application is the inetd daemon, which could be modified to start daemons running at security contexts dependent on the remote client. Patch design approach: - Design for TCP The patch enables the SELinux LSM to set the peer security context for a socket based on the security context of the IPSec security association. The application may retrieve this context using getsockopt. When called, the kernel determines if the socket is a connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry cache on the socket to retrieve the security associations. If a security association has a security context, the context string is returned, as for UNIX domain sockets. - Design for UDP Unlike TCP, UDP is connectionless. This requires a somewhat different API to retrieve the peer security context. With TCP, the peer security context stays the same throughout the connection, thus it can be retrieved at any time between when the connection is established and when it is torn down. With UDP, each read/write can have different peer and thus the security context might change every time. As a result the security context retrieval must be done TOGETHER with the packet retrieval. The solution is to build upon the existing Unix domain socket API for retrieving user credentials. Linux offers the API for obtaining user credentials via ancillary messages (i.e., out of band/control messages that are bundled together with a normal message). Patch implementation details: - Implementation for TCP The security context can be retrieved by applications using getsockopt with the existing SO_PEERSEC flag. 
As an example (ignoring error checking):
    getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen);
    printf("Socket peer context is: %s\n", optbuf);
The SELinux function, selinux_socket_getpeersec, is extended to check for labeled security associations for connected (TCP_ESTABLISHED == sk->sk_state) TCP sockets only. If so, the socket has a dst_cache of struct dst_entry values that may refer to security associations. If these have security associations with security contexts, the security context is returned. getsockopt returns a buffer that contains a security context string or the buffer is unmodified. - Implementation for UDP To retrieve the security context, the application first indicates to the kernel such desire by setting the IP_PASSSEC option via setsockopt. Then the application retrieves the security context using the auxiliary data mechanism. An example server application for UDP should look like this:
    toggle = 1;
    toggle_len = sizeof(toggle);
    /* setsockopt() takes the option length by value, not by pointer */
    setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, toggle_len);
    recvmsg(sockfd, &msg_hdr, 0);
    if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
        cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
        if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
            cmsg_hdr->cmsg_level == SOL_IP &&
            cmsg_hdr->cmsg_type == SCM_SECURITY) {
            /* copy only the payload bytes that were actually delivered */
            memcpy(&scontext, CMSG_DATA(cmsg_hdr), cmsg_hdr->cmsg_len - CMSG_LEN(0));
        }
    }
ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow a server socket to receive security context of the peer. A new ancillary message type, SCM_SECURITY, is added. When the packet is received we get the security context from the sec_path pointer which is contained in the sk_buff, and copy it to the ancillary message space. An additional LSM hook, selinux_socket_getpeersec_udp, is defined to retrieve the security context from the SELinux space. The existing function, selinux_socket_getpeersec, does not suit our purpose, because the security context is copied directly to user space, rather than to kernel space. 
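The ancillary-data handling described for the UDP and Unix datagram cases can be sketched as one small userspace helper. This is only an illustration under stated assumptions and is not part of the patch: copy_peer_context is a hypothetical name, the fallback definition of SCM_SECURITY (0x03, the value in the kernel's linux/socket.h) is supplied in case the C library headers lack it, and the caller is assumed to have already filled the msghdr with recvmsg.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SCM_SECURITY
#define SCM_SECURITY 0x03	/* assumption: value from the kernel's linux/socket.h */
#endif

/*
 * Hypothetical helper, not taken from the patch: walk the control
 * messages in msg and copy the payload of the first SCM_SECURITY
 * message at the given level into out, adding a terminating NUL.
 * Returns the number of payload bytes copied, or -1 if no matching
 * message is present or out is too small.
 */
static long copy_peer_context(struct msghdr *msg, int level,
			      char *out, size_t outlen)
{
	struct cmsghdr *c;

	for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
		size_t n;

		if (c->cmsg_level != level || c->cmsg_type != SCM_SECURITY)
			continue;
		if (c->cmsg_len < CMSG_LEN(0))
			return -1;	/* malformed header */
		n = c->cmsg_len - CMSG_LEN(0);	/* payload length only */
		if (n + 1 > outlen)
			return -1;	/* caller's buffer is too small */
		memcpy(out, CMSG_DATA(c), n);
		out[n] = '\0';
		return (long)n;
	}
	return -1;
}
```

Following the message above, a caller would pass SOL_SOCKET as the level for a Unix datagram socket and SOL_IP for a UDP socket. Walking the list with CMSG_NXTHDR, rather than inspecting only the first header, is a design choice of this sketch so that other control messages delivered in the same recvmsg call do not hide the security context.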
Testing: We have tested the patch by setting up TCP and UDP connections between applications on two machines using the IPSec policies that result in labeled security associations being built. For TCP, we can then extract the peer security context using getsockopt on either end. For UDP, the receiving end can retrieve the security context using the auxiliary data mechanism of recvmsg. Signed-off-by: Catherine Zhang Acked-by: James Morris Acked-by: Herbert Xu Signed-off-by: David S. Miller commit ef995532edbd11b3c703272866232f5f682d8caf Author: Patrick McHardy Date: Thu Mar 9 15:31:51 2006 -0800 [XFRM]: Fix aevent related crash When xfrm_user isn't loaded xfrm_nl is NULL, which makes IPsec crash because xfrm_aevent_is_on passes the NULL pointer to netlink_has_listeners as socket. A second problem is that the xfrm_nl pointer is not cleared when the socket is released at module unload time. Protect references of xfrm_nl from outside of xfrm_user by RCU, check that the socket is present in xfrm_aevent_is_on and set it to NULL when unloading xfrm_user. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 4f034885acfbd446424c5db653736d072b3ddecb Author: Rick Jones Date: Thu Mar 9 15:22:29 2006 -0800 [TCP]: sysctl to allow TCP window > 32767 sans wscale Back in the dark ages, we had to be conservative and only allow 15-bit window fields if the window scale option was not negotiated. Some ancient stacks used a signed 16-bit quantity for the window field of the TCP header and would get confused. Those days are long gone, so we can use the full 16 bits by default now. There is a sysctl added so that we can still interact with such old stacks. Signed-off-by: Rick Jones Signed-off-by: David S. Miller commit 641ee548431d0a62b185feab9e429e54933bea92 Author: Neil Horman Date: Thu Mar 9 01:23:07 2006 -0800 [IPV4] ARP: Documentation for new arp_accept sysctl variable. 
As John pointed out, I had not added documentation to describe the arp_accept sysctl that I posted in my last patch. This patch adds that documentation. Signed-off-by: Neil Horman Signed-off-by: David S. Miller commit 8877367c4a68b87084b611349fa89af68bac8ea0 Author: Neil Horman Date: Thu Mar 9 01:20:42 2006 -0800 [IPV4] ARP: Allow acceptance of unsolicited ARP via netdevice sysctl. Signed-off-by: Neil Horman Signed-off-by: David S. Miller commit 879254e8cdec9152041c57b629c9f36e1e6210c8 Author: Jeff Mahoney Date: Thu Mar 9 00:52:47 2006 -0800 [TG3]: netif_carrier_off runs too early; could still be queued when init fails Move the netif_carrier_off() call from tg3_init_one()-> tg3_init_link_config() to tg3_open() as is the convention for most other network drivers. I was getting a panic after a tg3 device failed to initialize due to DMA failure. The oops pointed to the link watch queue with spinlock debugging enabled. Without spinlock debugging, the Oops didn't occur. I suspect that the link event was getting queued but not executed until after the DMA test had failed and the device was freed. The link event was then operating on freed memory, which could contain anything. With this patch applied, the Oops no longer occurs. [ Based upon feedback from Michael Chan, we move netif_carrier_off() to the end of tg3_init_one() instead of moving it to tg3_open() -DaveM ] Signed-off-by: Jeff Mahoney Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit 71f224a1d6a45126c8385e82d23b29d79bf2b2b6 Author: Per Liden Date: Thu Mar 9 00:42:52 2006 -0800 [TIPC]: Avoid compiler warning Signed-off-by: Per Liden Signed-off-by: David S. Miller commit d1c9d783a6bc1d32e6061f2417278a7eea8a7a7e Author: Per Liden Date: Thu Mar 9 00:41:48 2006 -0800 [TIPC]: Reduce stack usage The node_map struct can be quite large (516 bytes) and allocating two of them on the stack is not a good idea since we might only have a 4K stack to start with. Signed-off-by: Per Liden Signed-off-by: David S. 
Miller commit 2326798b51675859b08471f29aef9ca0055390dd Author: Adrian Bunk Date: Thu Mar 9 00:41:15 2006 -0800 [TIPC]: Cleanups This patch contains the following possible cleanups: - make needlessly global code static - #if 0 the following unused global functions: - name_table.c: tipc_nametbl_print() - name_table.c: tipc_nametbl_dump() - net.c: tipc_net_next_node() Signed-off-by: Adrian Bunk Signed-off-by: Per Liden Signed-off-by: David S. Miller commit f44bb309ea27322939c57e4a140a50652bf23c89 Author: Per Liden Date: Thu Mar 9 00:40:07 2006 -0800 [TIPC]: Remove unused functions Signed-off-by: Per Liden Signed-off-by: David S. Miller commit 9ab00cf9ce629934a0399c9849e4c620cc2d6129 Author: Sam Ravnborg Date: Thu Mar 9 00:38:14 2006 -0800 [TIPC]: Remove inlines from *.c With reference to the latest discussions on linux-kernel with respect to inline, here is a patch for tipc to remove all inlines as used in the .c files. See also chapter 14 in Documentation/CodingStyle.
Before:
   text    data     bss     dec     hex filename
 102990    5292    1752  110034   1add2 tipc.o
Now:
   text    data     bss     dec     hex filename
 101190    5292    1752  108234   1a6ca tipc.o
This is a nice text size reduction which will improve icache usage. In some cases bigger (> 4 lines) functions were declared inline and used in many places; they are most probably no longer inlined by gcc, resulting in the size reduction. There are several one liners that no longer are declared inline, but gcc should inline these just fine without the inline hint. With this patch applied one warning is added about an unused static function that was hidden by utilising inline before. The function in question was kept so this patch is solely an inline removal patch. Signed-off-by: Sam Ravnborg Signed-off-by: Per Liden Signed-off-by: David S. Miller commit b0fff0141bdb014ac3b504ce7e0a1c04af9ffb2b Author: Sam Ravnborg Date: Thu Mar 9 00:36:59 2006 -0800 [TIPC]: Fix simple sparse warnings Tried to run the new tipc stack through sparse. 
The following patch fixes all cases where 0 was used as replacement of NULL. Use NULL to document this is a pointer and to silence sparse. This brought the sparse warning count down by 127, to 24 warnings. Signed-off-by: Sam Ravnborg Signed-off-by: Per Liden Signed-off-by: David S. Miller commit 176e326cabcccf8fbfe3ca2cd80df9267327efb3 Author: David S. Miller Date: Thu Mar 9 00:29:29 2006 -0800 [NETFILTER]: Fix warnings in ip_nat_snmp_basic.c net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'asn1_header_decode': net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'len' may be used uninitialized in this function net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'def' may be used uninitialized in this function net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'snmp_translate': net/ipv4/netfilter/ip_nat_snmp_basic.c:672: warning: 'l' may be used uninitialized in this function net/ipv4/netfilter/ip_nat_snmp_basic.c:668: warning: 'type' may be used uninitialized in this function Signed-off-by: David S. Miller commit 1c8e8d0fcb367748533480c25ae481bb6cb2941a Author: David S. Miller Date: Wed Mar 8 23:31:46 2006 -0800 [DCCP]: Fix uninitialized var warnings in dccp_parse_options(). Signed-off-by: David S. Miller commit baac94b213ce19bcdb871f77ddfd506f637978d7 Author: Ingo Molnar Date: Wed Mar 8 23:09:07 2006 -0800 [NET]: sem2mutex part 2 Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit 0b23b355e615d3c8c0ba82c111aab5f0d0e8c2e9 Author: Alexey Dobriyan Date: Wed Mar 8 23:02:49 2006 -0800 [ATM] suni: cast arg properly in SONET_SETFRAMING Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: David S. 
Miller commit 81c9eed3690fc245c956d85a3edf94e427d21a39 Author: Sam Ravnborg Date: Wed Mar 8 23:01:43 2006 -0800 [WAN]: fix section mismatch warning in sbni In latest -mm sbni gives following warning: WARNING: drivers/net/wan/sbni.o - Section mismatch: reference to \ .init.data: from .text between 'init_module' (at offset 0x14ef) and \ 'cleanup_module' The warning is caused by init_module() calling a function declared __init. Declare init_module() __init too to fix warning. Signed-off-by: Sam Ravnborg Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit 3a8bd611f91e626dbf60ef4ed9d25b355a925cd4 Author: Ingo Molnar Date: Wed Mar 8 22:57:06 2006 -0800 [SUNGEM]: sem2mutex Semaphore to mutexes conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit fa0efe92e2f0418344de9df29046f6e833d295fe Author: Ingo Molnar Date: Wed Mar 8 22:54:20 2006 -0800 [CASSINI]: sem2mutex Semaphore to mutexes conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit 439cc19f819a450d66376ae7c83115435e4560c1 Author: Andrew Morton Date: Wed Mar 8 22:50:49 2006 -0800 [IRDA]: remove MODULE_PARM() MODULE_PARM() is deprecated and is about to go away altogether. Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit f50bcdfa37b53fc02eac2c63cc7156a19dd11bdb Author: Arjan van de Ven Date: Wed Mar 8 22:50:09 2006 -0800 [NET] sem2mutex: net/ Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Arjan van de Ven Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: David S. 
Miller commit 24cdaf0d8120aba80d20e85ca814b40401d5415c Author: Arjan van de Ven Date: Wed Mar 8 22:48:30 2006 -0800 [IRDA] sem2mutex: drivers/net/irda Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Arjan van de Ven Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit 1105f1a41e4800036fd0159ce6d62327031eec7d Author: Stephen Hemminger Date: Wed Mar 8 22:45:30 2006 -0800 [NET]: dev_put/dev_hold cleanup Get rid of the old __dev_put macro that is just a hold over from pre 2.6 kernel. And turn dev_hold into an inline instead of a macro. Signed-off-by: Stephen Hemminger Signed-off-by: Andrew Morton Signed-off-by: David S. Miller commit 06325f52a7dedd5f1f4a3f689306a5122d6b88f2 Author: Arnaldo Carvalho de Melo Date: Wed Mar 8 22:43:50 2006 -0800 [DCCP] options: Make dccp_insert_options & friends yell on error And not the silly LIMIT_NETDEBUG and silently return without inserting the option requested. Also drop some old debugging messages associated to option insertion. Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller commit 7304430fc3c0038e2f824785c8e2ddcdd1170bb0 Author: Arnaldo Carvalho de Melo Date: Wed Mar 8 22:42:54 2006 -0800 [DCCP]: Remove leftover dccp_send_response prototype Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller commit 12f3fb0870f3d8e133c3a2fa4041bec9f21fed56 Author: Arnaldo Carvalho de Melo Date: Wed Mar 8 22:42:28 2006 -0800 [DCCP]: ditch dccp_v[46]_ctl_send_ack Merging it with its only user: dccp_v[46]_reqsk_send_ack. Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: David S. 
Miller commit 440b3f636b57452c87b01dc07f132a34cb083036 Author: Arnaldo Carvalho de Melo Date: Wed Mar 8 22:42:02 2006 -0800 [DCCP]: Use sk->sk_prot->max_header consistently for non-data packets Using this also provides opportunities for introducing inet_csk_alloc_skb that would call alloc_skb, account it to the sock and skb_reserve(max_header), but I'll leave this for later, for now using sk_prot->max_header consistently is enough. Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller commit 74b8a772c11c1b7418331275307b5ac46e001b9a Author: Arnaldo Carvalho de Melo Date: Wed Mar 8 22:40:59 2006 -0800 [DCCP] options: Fix handling of ackvecs in DATA packets I.e. they should be just ignored, but we have to use 'break', not 'continue', as we have to possibly reset the mandatory flag. Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller commit 74ad00d8c97f06369d058c0a77e2e0e93139ae65 Author: David S. Miller Date: Mon Mar 6 22:29:47 2006 -0800 [ATM]: Fix build after neigh->parms->neigh_destructor change. Signed-off-by: David S. Miller commit f88c1bf0c0bddd53d2542b9f20646d47c51556eb Author: Michael Chan Date: Mon Mar 6 16:45:26 2006 -0800 [TG3]: update version and reldate Update version to 3.52. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 6d96c883affbf8f973b50b25973ae38f2e67b86e Author: Michael Chan Date: Mon Mar 6 16:44:34 2006 -0800 [TG3]: Add firmware version info Add fw_version information to ethtool -i. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 53cb8a17a7d51396aaa6e1bd149d17c2e03e49f8 Author: Michael Chan Date: Mon Mar 6 16:43:12 2006 -0800 [TG3]: nvram cleanup Some nvram related cleanup: 1. Add a tg3_nvram_read_swab() since swabbing the data is frequently done. 2. Add a function to convert nvram address to physical address instead of doing it in 2 separate places. Signed-off-by: Michael Chan Signed-off-by: David S.
Miller commit d29d2a53b412bbef4a1bdddafb70d2e5709d75b8 Author: Michael Chan Date: Mon Mar 6 16:42:29 2006 -0800 [TG3]: Fixup memory test for 5787 Ethtool memory test on 5787 requires a new memory table. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 415310e34ae8a37d0b11c1b8f2d328f505190b54 Author: Michael Chan Date: Mon Mar 6 16:40:52 2006 -0800 [TG3]: Add new one-shot MSI handler Support one-shot MSI on 5787. This one-shot MSI idea is credited to David Miller. In this mode, MSI disables itself automatically after it is generated, saving the driver a register access to disable it for NAPI. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit fba976b994a6acbe2d9582078d366b03f6d2dd2a Author: Michael Chan Date: Mon Mar 6 16:40:07 2006 -0800 [TG3]: Add ipv6 checksum support Support ipv6 tx csum on 5787 by setting NETIF_F_HW_CSUM. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 33967b1e470a7673ae463d71c7b1e37efa1887a7 Author: Michael Chan Date: Mon Mar 6 16:38:18 2006 -0800 [TG3]: Add new hard_start_xmit Support 5787 hardware TSO using a new flag TG3_FLG2_HW_TSO_2. Since the TSO interface is slightly different and these chips have finally fixed the 4GB DMA problem and do not have the 40-bit DMA problem, a new hard_start_xmit is used for these chips. All previous chips will use the old hard_start_xmit that is now renamed tg3_start_xmit_dma_bug(). Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 9db372c2a93d47895f2235dd9cbe7f2c6732ec53 Author: Michael Chan Date: Mon Mar 6 16:33:46 2006 -0800 [TG3]: Add 5787 nvram support Support additional nvrams and new nvram format for 5787 and 5754. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 48ec375cf19e4c2da829c707712ea9fea2e09265 Author: Michael Chan Date: Mon Mar 6 16:32:21 2006 -0800 [TG3]: Add 5787 and 5754 basic support Add basic support for 2 new chips 5787 and 5754. Signed-off-by: Michael Chan Signed-off-by: David S. 
Miller commit d6c152c630bc0d808106d0ad223529ac3429342f Author: Benjamin LaHaise Date: Mon Mar 6 14:45:49 2006 -0800 [NET]: use fget_light() in net/socket.c Here's an updated copy of the patch to use fget_light in net/socket.c. Rerunning the tests shows a drop of ~80Mbit/s on average, which looks bad until you see the drop in cpu usage from ~89% to ~82%. That will get fixed in another patch...

Before: max 8113.70, min 8026.32, avg 8072.34
87380 16384 16384 10.01 8045.55 87.11 87.11 1.774 1.774
87380 16384 16384 10.01 8065.14 90.86 90.86 1.846 1.846
87380 16384 16384 10.00 8077.76 89.85 89.85 1.822 1.822
87380 16384 16384 10.00 8026.32 89.80 89.80 1.833 1.833
87380 16384 16384 10.01 8108.59 89.81 89.81 1.815 1.815
87380 16384 16384 10.01 8034.53 89.01 89.01 1.815 1.815
87380 16384 16384 10.00 8113.70 90.45 90.45 1.827 1.827
87380 16384 16384 10.00 8111.37 89.90 89.90 1.816 1.816
87380 16384 16384 10.01 8077.75 87.96 87.96 1.784 1.784
87380 16384 16384 10.00 8062.70 90.25 90.25 1.834 1.834

After: max 8035.81, min 7963.69, avg 7998.14
87380 16384 16384 10.01 8000.93 82.11 82.11 1.682 1.682
87380 16384 16384 10.01 8016.17 83.67 83.67 1.710 1.710
87380 16384 16384 10.01 7963.69 83.47 83.47 1.717 1.717
87380 16384 16384 10.01 8014.35 81.71 81.71 1.671 1.671
87380 16384 16384 10.00 7967.68 83.41 83.41 1.715 1.715
87380 16384 16384 10.00 7995.22 81.00 81.00 1.660 1.660
87380 16384 16384 10.00 8002.61 83.90 83.90 1.718 1.718
87380 16384 16384 10.00 8035.81 81.71 81.71 1.666 1.666
87380 16384 16384 10.01 8005.36 82.56 82.56 1.690 1.690
87380 16384 16384 10.00 7979.61 82.50 82.50 1.694 1.694

Signed-off-by: Benjamin LaHaise Signed-off-by: David S. Miller commit 9a4562eaca2247426879aa802f4a862254066084 Author: Stephen Hemminger Date: Mon Mar 6 14:42:27 2006 -0800 [NET]: minor net_rx_action optimization Calling list_del followed by list_add_tail is equivalent to the existing inline list_move_tail. list_move_tail avoids unnecessary _LIST_POISON.
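The equivalence described in the net_rx_action message can be illustrated with a small userspace sketch. It does not use the kernel's list.h; the struct and helpers below are simplified stand-ins written for this example, NULL stands in for the kernel's poison values, and requeue_first() and struct node are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Detach an entry from whatever list it is on. */
static void list_unlink(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* Like list_del(): unlink, then poison the stale pointers
 * (NULL is used here as a stand-in for the poison values). */
static void list_del(struct list_head *e)
{
	list_unlink(e);
	e->next = NULL;
	e->prev = NULL;
}

/* Like list_move_tail(): unlink and re-add, with no poisoning step,
 * since the pointers are overwritten immediately anyway. */
static void list_move_tail(struct list_head *e, struct list_head *head)
{
	list_unlink(e);
	list_add_tail(e, head);
}

/* Hypothetical payload; link is the first member so a list_head
 * pointer can be cast back to the containing node. */
struct node {
	struct list_head link;
	int id;
};

/* Build the list 1,2,3, move the first entry to the tail using either
 * approach, and record the resulting order of ids in out[]. */
static void requeue_first(int use_move, int out[3])
{
	struct list_head head;
	struct node n[3];
	struct list_head *p;
	int i;

	init_list_head(&head);
	for (i = 0; i < 3; i++) {
		n[i].id = i + 1;
		list_add_tail(&n[i].link, &head);
	}

	if (use_move) {
		list_move_tail(&n[0].link, &head);
	} else {
		list_del(&n[0].link);
		list_add_tail(&n[0].link, &head);
	}

	i = 0;
	for (p = head.next; p != &head; p = p->next)
		out[i++] = ((struct node *)p)->id;
	assert(i == 3);
}
```

Both paths leave the list in the same order; the only difference in this sketch is the two pointer stores that list_del() performs and list_add_tail() then overwrites.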
Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit 5f2a56b59c97de987fea15f27067ce1d9541da86 Author: Alpt Date: Fri Mar 3 18:01:01 2006 -0800 [NET] rtnetlink: Add RTPROT entry for Netsukuku. The Netsukuku daemon is using the same number to mark its routes, you can see it here: http://hinezumilabs.org/cgi-bin/viewcvs.cgi/netsukuku/src/krnl_route.h?rev=HEAD&content-type=text/vnd.viewcvs-markup Signed-off-by: David S. Miller commit 8963cdba2cd537cf382d73b598fbe2084427c187 Author: Michael S. Tsirkin Date: Fri Mar 3 17:58:29 2006 -0800 [NET]: Move destructor from neigh->ops to neigh_params struct neigh_ops currently has a destructor field, which no in-kernel drivers outside of infiniband use. The infiniband/ulp/ipoib in-tree driver stashes some info in the neighbour structure (the results of the second-stage lookup from ARP results to real link-level path), and it uses neigh->ops->destructor to get a callback so it can clean up this extra info when a neighbour is freed. We've run into problems with this: since the destructor is in an ops field that is shared between neighbours that may belong to different net devices, there's no way to set/clear it safely. The following patch moves this field to neigh_parms where it can be safely set, together with its twin neigh_setup. Two additional patches in the patch series update ipoib to use this new interface. Signed-off-by: Michael S. Tsirkin Signed-off-by: Roland Dreier Signed-off-by: David S. Miller commit 4fb001333bf27214117b02f275ddae8a60cfb145 Author: Luiz Capitulino Date: Fri Mar 3 17:53:03 2006 -0800 [PKTGEN]: Updates version. Due to the thread's lock changes, we're at a new version now. Signed-off-by: Luiz Capitulino Signed-off-by: David S. Miller commit fa80cfc046c7a6813c214abe7da5523c8bbcb370 Author: Luiz Capitulino Date: Fri Mar 3 17:52:31 2006 -0800 [PKTGEN]: Removes thread_{un,}lock() macros. 
As suggested by Arnaldo, this patch replaces the thread_lock()/thread_unlock() macros with direct calls to mutex_lock()/mutex_unlock(). This change makes the code a bit more readable, and the direct calls are used everywhere in the kernel. Signed-off-by: Luiz Capitulino Signed-off-by: David S. Miller commit bbae758110d7f8b80db2f67fa0384c1ee1365790 Author: Luiz Capitulino Date: Fri Mar 3 17:51:53 2006 -0800 [PKTGEN]: Convert thread lock to mutexes. pktgen's thread semaphores are strict mutexes, convert them to the mutex implementation. Signed-off-by: Luiz Capitulino Signed-off-by: David S. Miller commit 8f8d7680c7972054c25a6d3f0c1833242940c9d6 Author: Stephen Hemminger Date: Fri Mar 3 17:41:48 2006 -0800 [NET]: Convert RTNL to mutex. This patch turns the RTNL from a semaphore to a new 2.6.16 mutex and gets rid of some of the leftover legacy. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller commit 5824345130b1fb9b3ae2cda961e819cb1d0f2312 Author: David S. Miller Date: Fri Mar 3 17:33:56 2006 -0800 [IPSEC] xfrm_user: Kill PAGE_SIZE check in verify_sec_ctx_len() First, it warns when PAGE_SIZE >= 64K because the ctx_len field is 16-bits. Secondly, if there are any real length limitations they can be verified by the security layer security_xfrm_state_alloc() call. Signed-off-by: David S. Miller commit 91b65f8f9341eed050cd0415d9e5649c5b4fc885 Author: Baruch Even Date: Fri Mar 3 17:25:22 2006 -0800 [TCP] H-TCP: Better time accounting Instead of estimating the time since the last congestion event, count it directly. Signed-Off-By: Baruch Even Signed-off-by: David S. Miller commit 3450c42791ff0b73ecc22c649f7b004dd0434733 Author: Baruch Even Date: Fri Mar 3 17:24:39 2006 -0800 [TCP] H-TCP: Account for delayed-ACKs Account for delayed-ACKs in H-TCP. Delayed-ACKs cause H-TCP to be less aggressive than its design calls for. This is especially true when the receiver is a Linux machine where the average delayed ack is over 3 packets with values of 7 not unheard of.
Signed-Off-By: Baruch Even Signed-off-by: David S. Miller commit 3fb3ef95fffb064854893905d77d181c19246b26 Author: Baruch Even Date: Fri Mar 3 17:23:58 2006 -0800 [TCP] H-TCP: Use msecs_to_jiffies Use functions to calculate jiffies from milliseconds and not the old, crude method of dividing HZ by a value. Ensures more accurate values even in the face of strange HZ values. Signed-Off-By: Baruch Even Signed-off-by: David S. Miller commit 5c8ab7ae8d23e197092300277ab26917c5793ba8 Author: Evgeniy Polyakov Date: Fri Mar 3 17:22:38 2006 -0800 [CONNECTOR]: Use netlink_has_listeners() to avoind unnecessary allocations. Return -ESRCH from cn_netlink_send() when there are no listeners, just as it could be done by netlink_broadcast(). Propagate netlink_broadcast() error back to the caller. Signed-off-by: Evgeniy Polyakov Signed-off-by: David S. Miller commit 3d3fc9c68db766e88702ae416ec7247982e78062 Author: David Basden Date: Wed Mar 1 16:04:14 2006 -0800 [IRDA]: TOIM3232 dongle support Here goes a patch for supporting TOIM3232 based serial IrDA dongles. The code is based on the tekram dongle code. It's been tested with a TOIM3232 based IRWave 320S dongle. It may work for TOIM4232 dongles, although it's not been tested. Signed-off-by: David Basden Signed-off-by: Samuel Ortiz Signed-off-by: David S. Miller commit 837f2c82abe82337fad1d6d05a3551c9677c4e9e Author: Luiz Capitulino Date: Tue Feb 28 12:45:01 2006 -0800 [PKTGEN]: Updates version. With all the previous changes, we're at a new version now. Signed-off-by: Luiz Capitulino Signed-off-by: David S. Miller commit b832620a4494263cb8718c287975464be3e58b48 Author: Luiz Capitulino Date: Tue Feb 28 12:44:34 2006 -0800 [PKTGEN]: Ports if_list to the in-kernel implementation. This patch ports the per-thread interface list to the in-kernel linked list implementation. In general, the resulting code is a bit simpler. Signed-off-by: Luiz Capitulino Signed-off-by: David S.
Miller commit 20c147a32e418919f4496e76e8b7a74a2cc43d04 Author: Luiz Capitulino Date: Tue Feb 28 12:44:01 2006 -0800 [PKTGEN]: Fix Initialization fail leak. Even if pktgen's thread initialization fails for all CPUs, the module will be successfully loaded. This patch changes that behavior, by returning an error at module load time, and also freeing all the resources allocated. It also prints a warning if a thread initialization has failed. Signed-off-by: Luiz Capitulino Signed-off-by: David S. Miller commit c12f66cc32a46a7880e0d7e70c58c71b144d110d Author: Luiz Capitulino Date: Tue Feb 28 12:43:33 2006 -0800 [PKTGEN]: Fix kernel_thread() fail leak. Free all the allocated resources if kernel_thread() call fails. Signed-off-by: Luiz Capitulino Signed-off-by: David S. Miller commit 2e542e3ab430eb29fcef0958fd6a09f5ffe8f000 Author: Luiz Capitulino Date: Tue Feb 28 12:43:00 2006 -0800 [PKTGEN]: Ports thread list to Kernel list implementation. The final result is simpler and smaller code. Note that I'm adding a new member in the struct pktgen_thread called 'removed'. The reason is that I didn't find a better wait condition to be used in the place of the replaced one. Signed-off-by: Luiz Capitulino Signed-off-by: David S. Miller commit d65f838cc634e588ecbb7ec1316d13608112dba8 Author: Luiz Capitulino Date: Tue Feb 28 12:42:38 2006 -0800 [PKTGEN]: Lindent run. Lindent run, with some fixes made by hand. Signed-off-by: Luiz Capitulino Signed-off-by: David S. Miller commit 1a2155249135e4e99520015829c35434b5629000 Author: Hagen Paul Pfeifer Date: Tue Feb 28 16:16:50 2006 -0300 [DCCP] options: Fix some aspects of mandatory option processing According to dccp draft (draft-ietf-dccp-spec-13.txt) section 5.8.2 (Mandatory Option) the following patch corrects the handling of the following cases: 1) "... and any Mandatory options received on DCCP-Data packets MUST be ignored." 2) "The connection is in error and should be reset with Reset Code 5, ...
if option O is absent (Mandatory was the last byte of the option list), or if option O equals Mandatory." Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Hagen Paul Pfeifer commit c8e8fbb4d761e37263824cdef1f4ad835b4d980d Author: Arnaldo Carvalho de Melo Date: Sun Feb 26 00:22:04 2006 -0300 [DCCP] ccid2: coding style cleanups No changes in the logic were made. Signed-off-by: Arnaldo Carvalho de Melo commit b2e4575dc052b96314d51c670e7bebd12e6c11a9 Author: Arnaldo Carvalho de Melo Date: Sun Feb 26 00:04:04 2006 -0300 [DCCP] ipv6: cleanups No changes in the logic were made, just removing trailing whitespaces, etc. Signed-off-by: Arnaldo Carvalho de Melo commit 8137662d6770e69b2230aeed7c14362a88db700c Author: Arnaldo Carvalho de Melo Date: Sat Feb 25 19:24:19 2006 -0300 [ICSK]: Introduce inet_csk_ctl_sock_create Consolidating open coded sequences in tcp and dccp, v4 and v6. Signed-off-by: Arnaldo Carvalho de Melo commit 92e6d3cc62dd63a53f2e8f084feec1b2b678b388 Author: Arnaldo Carvalho de Melo Date: Sat Feb 25 18:12:20 2006 -0300 [DCCP] ipv6: Add missing ipv6 control socket I guess I forgot to add it, nah, now it just works: 18:04:33.274066 IP6 ::1.1476 > ::1.5001: request (service=0) 18:04:33.334482 IP6 ::1.5001 > ::1.1476: reset (code=bad_service_code) Ditched IP_DCCP_UNLOAD_HACK, as now we would have to do it for both IPv6 and IPv4, so I'll come up with another way for freeing the control sockets in upcoming changesets. Signed-off-by: Arnaldo Carvalho de Melo commit e4f122b39c07c3cec34b8513dff960263eac844c Author: Arnaldo Carvalho de Melo Date: Sat Feb 25 16:59:29 2006 -0300 [DCCP]: Uninline some functions Signed-off-by: Arnaldo Carvalho de Melo commit 339b7fe3681c9afe9c28552a668648e8ccdac368 Author: Adrian Bunk Date: Sat Feb 25 11:55:02 2006 -0300 [DCCP] ipv4: make struct dccp_v4_prot static There's no reason for struct dccp_v4_prot being global. Signed-off-by: Adrian Bunk commit fe5bd04f85de0b796ffa8f530f73524502d35068 Author: David S.
Miller Date: Fri Feb 24 15:13:06 2006 -0800 [IPV6]: Fix some code/comment formatting in ip6_dst_output(). Signed-off-by: David S. Miller commit 693bc55d70cd4c75ceaf0bf774b0da7a4b61210a Author: Robert Olsson Date: Fri Feb 24 14:04:38 2006 -0800 [IPV4]: fib_trie stats fix fib_triestats has been buggy and caused oopses on some platforms such as openwrt. The patch below should cure those problems. Signed-off-by: Robert Olsson Signed-off-by: David S. Miller commit 12651cc7037e0917c396313002a3edf63e91bf31 Author: Robert Olsson Date: Fri Feb 24 14:03:42 2006 -0800 [IPV4]: fib_trie initialzation fix In some kernel configs /proc functions seem to be accessed before the trie is initialized. The patch below checks for this. Signed-off-by: Robert Olsson Signed-off-by: David S. Miller commit 2777525a462a957662d542379e3283ce9f6a3846 Author: Michael Chan Date: Fri Feb 24 14:02:15 2006 -0800 [TG3]: Fix tg3_get_ringparam() Fix-up tg3_get_ringparam() to return the correct parameters. Set the jumbo rx ring parameter only if it is supported by the chip and currently in use. Add missing value for tx_max_pending, noticed by Rick Jones. Update version to 3.51. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 31a3b4376f415a355698b396f9aa26124dff5fd0 Author: Michael Chan Date: Fri Feb 24 14:01:48 2006 -0800 [TG3]: Add some missing netif_running() checks Add missing netif_running() checks in tg3's dev->set_multicast_list() and dev->set_mac_address(). If not netif_running(), these 2 calls can simply return 0 after storing the new settings if required. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 4a7341470743e480a3f796ee1210a4465e8119bd Author: John Heffner Date: Fri Feb 24 13:58:53 2006 -0800 [TCP] mtu probing: move tcp-specific data out of inet_connection_sock This moves some TCP-specific MTU probing state out of inet_connection_sock back to tcp_sock. Signed-off-by: John Heffner Signed-off-by: David S.
Miller commit b39c98ec9bce6f65756ff2bbcac7c653218ace5c Author: Benjamin LaHaise Date: Fri Feb 24 13:53:15 2006 -0800 [AF_UNIX]: scm: better initialization Instead of doing a memset then initialization of the fields of the scm structure, just initialize all the members explicitly. Prevent reloading of current on x86 and x86-64 by storing the value in a local variable for subsequent dereferences. This is worth a ~7KB/s increase in af_unix bandwidth. Note that we avoid the issues surrounding potentially uninitialized members of the ucred structure by constructing a struct ucred instead of assigning the members individually, which forces the compiler to zero any padding. Signed-off-by: Benjamin LaHaise Signed-off-by: David S. Miller commit 0e25b5b0df2d88ce600ea7dea8b0a2e0a9600f06 Author: Benjamin LaHaise Date: Fri Feb 24 13:52:42 2006 -0800 [AF_UNIX]: use shift instead of integer division The patch below replaces a divide by 2 with a shift -- sk_sndbuf is an integer, so gcc emits an idiv, which takes 10x longer than a shift by 1. This improves af_unix bandwidth by ~6-10K/s. Also, tidy up the comment to fit in 80 columns while we're at it. Signed-off-by: Benjamin LaHaise Signed-off-by: David S. Miller commit c4e47b959ecefdf956e4820022e41dce994122a7 Author: Jörn Engel Date: Fri Feb 24 13:32:25 2006 -0800 [NET]: Uninline kfree_skb and allow NULL argument o Uninline kfree_skb, which saves some 15k of object code on my notebook. o Allow kfree_skb to be called with a NULL argument. Subsequent patches can remove conditionals from drivers and further reduce source and object size. Signed-off-by: Jörn Engel Signed-off-by: David S. Miller commit 9370026aa35414cf2523450a46f1c7d9ab1debce Author: Arnaldo Carvalho de Melo Date: Fri Feb 24 13:26:05 2006 -0300 [LLC]: Fix sap refcounting Thanks to Leslie Harlley Watter for reporting the problem and testing this patch.
Signed-off-by: Arnaldo Carvalho de Melo commit 5667779feec9bdab50e4c6b2734c2aff5f828fd7 Author: Arnaldo Carvalho de Melo Date: Fri Feb 24 10:44:06 2006 -0300 [LLC]: Replace __inline__ with inline Signed-off-by: Arnaldo Carvalho de Melo commit 520d7ee4db3e26bf663b2944cdc763fcb3ec0a2d Author: Arnaldo Carvalho de Melo Date: Fri Feb 24 10:27:18 2006 -0300 [LLC]: Fix struct proto .name Cut'n'paste error from ddp_proto. Signed-off-by: Arnaldo Carvalho de Melo commit 5818dc022233c356793f5f3a32f023b604aa76c8 Author: Arthur Kepner Date: Thu Feb 23 17:14:41 2006 -0800 [NET] pktgen: Fix races between control/worker threads. There's a race in pktgen which can lead to a double free of a pktgen_dev's skb. If a worker thread is in the midst of doing fill_packet(), and the controlling thread gets a "stop" message, the already freed skb can be freed once again in pktgen_stop_device(). This patch gives all responsibility for cleaning up a pktgen_dev's skb to the associated worker thread. Signed-off-by: Arthur Kepner Acked-by: Robert Olsson Signed-off-by: David S. Miller commit 2862dfb52df52555b42a7c47359f98404b3f9a54 Author: Jamal Hadi Salim Date: Thu Feb 23 16:26:25 2006 -0800 [XFRM]: Rearrange struct xfrm_aevent_id for better compatibility. struct xfrm_aevent_id needs to be 32-bit + 64-bit align friendly. Based upon suggestions from Yoshifuji. Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit a3c0f498c9ae9146b7b4cbc20c51e401795b0ef4 Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 17:33:26 2006 -0300 [DCCP]: Move the IPv4 specific bits from proto.c to ipv4.c With this patch in place we can break down the complexity by better compartmentalizing the code that is common to ipv6 and ipv4. 
Now we have these modules:

Module Size Used by
dccp_diag 1344 0
inet_diag 9448 1 dccp_diag
dccp_ccid3 15856 0
dccp_tfrc_lib 12320 1 dccp_ccid3
dccp_ccid2 5764 0
dccp_ipv4 16996 2
dccp 48208 4 dccp_diag,dccp_ccid3,dccp_ccid2,dccp_ipv4

dccp_ipv6 still requires dccp_ipv4 due to dccp_ipv6_mapped, that is the next target to work on the "hey, ipv4 is legacy, I only want ipv6 dude!" direction. Signed-off-by: Arnaldo Carvalho de Melo commit 9309b89376ce450ca6ba335a24e167dc4f02edf8 Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 15:51:53 2006 -0300 [DCCP]: Rename init_dccp_v4_mibs to dccp_mib_init And introduce dccp_mib_exit grouping previously open coded sequence. Signed-off-by: Arnaldo Carvalho de Melo commit a3e06ea1b3a3e782e16c08324d9f3ce0567e1481 Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 15:31:22 2006 -0300 [DCCP]: Move dccp_hashinfo from ipv4.c to the core As it is used by both ipv4 and ipv6. Signed-off-by: Arnaldo Carvalho de Melo commit 547f3c199bb248bcecdd7c40fd8233409e2f4d76 Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 15:19:57 2006 -0300 [DCCP]: Dont use dccp_v4_checksum in dccp_make_response dccp_make_response is shared by ipv4/6 and the ipv6 code was recalculating the checksum, not good, so move the dccp_v4_checksum call to dccp_v4_send_response. Signed-off-by: Arnaldo Carvalho de Melo commit 6504c7940f90dea13b233bfd2342dd0dc34bb578 Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 14:12:33 2006 -0300 [DCCP]: Move dccp_[un]hash from ipv4.c to the core As this is used by both ipv4 and ipv6 and is not ipv4 specific. Signed-off-by: Arnaldo Carvalho de Melo commit 6762d96a55e8bddfb7df46772edc283e732c0a89 Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 13:19:18 2006 -0300 [DCCP]: Move dccp_v4_{init,destroy}_sock to the core Removing one more ipv6 uses ipv4 stuff case in dccp land.
Signed-off-by: Arnaldo Carvalho de Melo commit dc437a6e3c7ff82b076f28802b7e8efa514f00b5 Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 13:09:11 2006 -0300 [DCCP]: Generalize dccp_v4_send_reset Renaming it to dccp_send_reset and moving it from the ipv4 specific code to the core dccp code. This fixes some bugs in IPV6 where timers would send v4 resets, etc. Signed-off-by: Arnaldo Carvalho de Melo commit 81a218e4a4769f6844678ab62cc200af8dfedbfa Author: Arnaldo Carvalho de Melo Date: Wed Feb 22 17:13:10 2006 -0300 [DCCP] feat: Introduce sysctls for the default features

[root@qemu ~]# for a in /proc/sys/net/dccp/default/* ; do echo $a ; cat $a ; done
/proc/sys/net/dccp/default/ack_ratio
2
/proc/sys/net/dccp/default/rx_ccid
3
/proc/sys/net/dccp/default/send_ackvec
1
/proc/sys/net/dccp/default/send_ndp
1
/proc/sys/net/dccp/default/seq_window
100
/proc/sys/net/dccp/default/tx_ccid
3
[root@qemu ~]#

So if wanting to test ccid3 as the tx CCID one can just do:

[root@qemu ~]# echo 3 > /proc/sys/net/dccp/default/tx_ccid
[root@qemu ~]# echo 2 > /proc/sys/net/dccp/default/rx_ccid
[root@qemu ~]# cat /proc/sys/net/dccp/default/[tr]x_ccid
2
3
[root@qemu ~]#

Of course we also need the setsockopt for each app to tell its preferences, but for testing or defining something other than CCID2 as the default for apps that don't explicitly set their preference the sysctl interface is handy. Signed-off-by: Arnaldo Carvalho de Melo commit 44862a4d5592e6a0e8d96b94308178e0fed8eab6 Author: Arnaldo Carvalho de Melo Date: Wed Feb 22 16:58:15 2006 -0300 [DCCP]: Call dccp_feat_init more early in dccp_v4_init_sock So that dccp_feat_clean doesn't get confused with uninitialized list_heads. Noticed when testing with no ccid kernel modules.
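The ordering problem described above (cleanup walking list heads that were never initialized) can be sketched in userspace as follows. This is not the DCCP code: struct sock_opts, struct feat_entry, opts_init(), opts_add() and opts_clean() are hypothetical names chosen for illustration, and the list is a simplified stand-in for the kernel's list_head.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical pending-feature entry; node must stay the first member
 * because opts_clean() casts a node pointer back to the entry. */
struct feat_entry {
	struct list_head node;
	int feature;
};

/* Hypothetical per-socket state holding a list of pending entries. */
struct sock_opts {
	struct list_head pending;
};

/* Must run before anything that can fail and trigger cleanup: an empty
 * list is a head that points at itself, not a zeroed or garbage head. */
static void opts_init(struct sock_opts *o)
{
	o->pending.next = &o->pending;
	o->pending.prev = &o->pending;
}

/* Append one entry; returns 0 on success, -1 on allocation failure. */
static int opts_add(struct sock_opts *o, int feature)
{
	struct feat_entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return -1;
	e->feature = feature;
	e->node.next = &o->pending;
	e->node.prev = o->pending.prev;
	o->pending.prev->next = &e->node;
	o->pending.prev = &e->node;
	return 0;
}

/* Free every entry and return how many were freed. The loop ends when
 * it gets back to the head, which only works if opts_init() has run. */
static int opts_clean(struct sock_opts *o)
{
	struct list_head *p = o->pending.next;
	int freed = 0;

	while (p != &o->pending) {
		struct list_head *next;

		assert(p != NULL); /* an initialized list never has NULL links */
		next = p->next;
		free((struct feat_entry *)p);
		freed++;
		p = next;
	}
	opts_init(o);
	return freed;
}
```

With the init call placed first, an early error path can call opts_clean() safely even when nothing was ever added, which is the situation the message describes for a kernel with no ccid modules loaded.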
Signed-off-by: Arnaldo Carvalho de Melo commit f11837b372501fd5a653c9d4319953ae75edf411 Author: Arnaldo Carvalho de Melo Date: Wed Feb 22 16:54:28 2006 -0300 [DCCP]: Kconfig tidy up Make CCID2 and CCID3 default to what was selected for DCCP and use the standard short description for the CCIDs (TCP-Like & TCP-Friendly). Signed-off-by: Arnaldo Carvalho de Melo commit 4dcdddeb3fcdbb7864b7cf8da47993ba59dbb876 Author: Arnaldo Carvalho de Melo Date: Wed Feb 22 16:51:15 2006 -0300 [DCCP]: Make CCID2 be the default As per the draft. This fixes the build when netfilter dccp components are built and dccp isn't. Thanks to Reuben Farrelly for reporting this. The following changesets will introduce /proc/sys/net/dccp/defaults/ to give more flexibility to DCCP developers and testers while apps don't use setsockopt to specify the desired CCID, etc. Signed-off-by: Arnaldo Carvalho de Melo commit e0c1aea19dcafcbc39629bc634635a60de44c49d Author: Al Viro Date: Mon Feb 20 16:19:58 2006 -0300 [DCCP]: sparse endianness annotations This also fixes the layout of dccp_hdr short sequence numbers; the problem is not fatal for now, as we only support long (48 bits) sequence numbers. Signed-off-by: Andrea Bittau Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Al Viro commit a742209fdff470daf301f4cf872c70e31d32d7d0 Author: Patrick McHardy Date: Mon Feb 20 20:22:03 2006 -0800 [NETFILTER]: Fix skb->nf_bridge lifetime issues The bridge netfilter code simulates the NF_IP_PRE_ROUTING hook and skips the real hook by registering with high priority and returning NF_STOP if skb->nf_bridge is present and the BRNF_NF_BRIDGE_PREROUTING flag is not set. The flag is only set during the simulated hook. Because skb->nf_bridge is only freed when the packet is destroyed, the packet will not only skip the first invocation of NF_IP_PRE_ROUTING, but in the case of tunnel devices on top of the bridge also all further ones.
Forwarded packets from a bridge encapsulated by a tunnel device and sent as locally outgoing packets will also still have the incorrect bridge information from the input path attached. We already have nf_reset calls on all RX/TX paths of tunnel devices, so simply reset the nf_bridge field there too. As an added bonus, the bridge information for locally delivered packets is now also freed when the packet is queued to a socket. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit bebc46da036d605a664d00ae489c6c77464a73ae Author: Andrea Bittau Date: Mon Feb 20 09:27:51 2006 -0300 [DCCP] feat: Actually change the CCID upon negotiation Change the CCID upon successful feature negotiation. Committer note: patch mostly rewritten to use the new ccid API. Signed-off-by: Andrea Bittau Signed-off-by: Arnaldo Carvalho de Melo commit b92152521dafb9fbaaf70c953a67a5235ff38a96 Author: Arnaldo Carvalho de Melo Date: Mon Feb 20 09:25:22 2006 -0300 [DCCP] CCID: Improve CCID infrastructure 1. No need for ->ccid_init nor ->ccid_exit, this is what module_{init,exit} does and anyway neither ccid2 nor ccid3 were using it. 2. Rename struct ccid to struct ccid_operations and introduce struct ccid with a pointer to ccid_operations and right after it the rx or tx private state. 3. Remove the pointer to the state of the half connections from struct dccp_sock, now it's derived through ccid_priv() from the ccid pointer. Now we also can implement the setsockopt for changing the CCID easily as no ccid init routines can affect struct dccp_sock in any way that prevents other CCIDs from working if a CCID switch operation is asked by apps. Signed-off-by: Arnaldo Carvalho de Melo commit 13d603c7b62a01e2c8bc63391e064c6cd3ecb3ac Author: Patrick McHardy Date: Sun Feb 19 21:05:35 2006 -0800 [PKT_SCHED]: Convert sch_red to a classful qdisc Convert sch_red to a classful qdisc. All qdiscs that maintain accurate backlog counters are eligible as child qdiscs.
When a queue limit larger than zero is given, a bfifo qdisc is used for backwards compatibility. Current versions of tc enforce a limit larger than zero, other users can avoid creating the default qdisc by using zero. Signed-off-by: Patrick McHardy Acked-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit 315b1e0d860bec06d3dff4de6eb99bccdfee44a5 Author: David S. Miller Date: Sun Feb 19 10:12:37 2006 -0800 [XFRM]: Add some missing exports. To fix the case of modular xfrm_user. Signed-off-by: David S. Miller commit 804ae22381aa79c1d8d1878b39e18300c847c63e Author: David S. Miller Date: Sun Feb 19 10:09:00 2006 -0800 [XFRM]: Move xfrm_nl to xfrm_state.c from xfrm_user.c xfrm_user could be modular, and since generic code uses this symbol now... Signed-off-by: David S. Miller commit eb55593d3b15a977a2feb30d0326884439375087 Author: David S. Miller Date: Sun Feb 19 01:13:47 2006 -0800 [XFRM]: Make sure xfrm_replay_timer_handler() is declared early enough. Signed-off-by: David S. Miller commit 00c5878d953778e326e1dddecb1bed249d85bea4 Author: Jamal Hadi Salim Date: Sun Feb 19 00:55:24 2006 -0800 [IPSEC]: Sync series - update selinux Add new netlink messages to selinux framework Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit c6d97cf4d418c1b5c6e654df942665acbd91700c Author: Jamal Hadi Salim Date: Sun Feb 19 00:54:23 2006 -0800 [IPSEC]: Sync series - policy expires This is similar to the SA expire insertion patch - only it inserts expires for SP. Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit ab4a688074c9b1fdf09a10032c2e52ced8f36963 Author: Jamal Hadi Salim Date: Sun Feb 19 00:53:43 2006 -0800 [IPSEC]: Sync series - SA expires This patch allows a user to insert SA expires. This is useful to do on an HA backup for the case of byte counts but may not be very useful for the case of time based expiry. Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. 
Miller commit 4ecacf073efccdb2a12994ab53db17cf85e8c73d Author: Jamal Hadi Salim Date: Sun Feb 19 00:53:06 2006 -0800 [IPSEC]: Sync series - acquire insert This introduces a feature similar to the one described in RFC 2367: " ... the application needing an SA sends a PF_KEY SADB_ACQUIRE message down to the Key Engine, which then either returns an error or sends a similar SADB_ACQUIRE message up to one or more key management applications capable of creating such SAs. ... ... The third is where an application-layer consumer of security associations (e.g. an OSPFv2 or RIPv2 daemon) needs a security association. Send an SADB_ACQUIRE message from a user process to the kernel. The kernel returns an SADB_ACQUIRE message to registered sockets. The user-level consumer waits for an SADB_UPDATE or SADB_ADD message for its particular type, and then can use that association by using SADB_GET messages. " An app such as OSPF could then use ipsec KM to get keys Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit edace80666e7aca4b75af88a3a79e09043166ff7 Author: Jamal Hadi Salim Date: Sun Feb 19 00:52:27 2006 -0800 [IPSEC]: Sync series - user Add xfrm as the user of the core changes Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit 679710585f2b674719231efcccb0396e8a63e620 Author: Jamal Hadi Salim Date: Sun Feb 19 00:51:34 2006 -0800 [IPSEC]: Sync series - fast path Fast path sequence updates that will generate ipsec async events Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit 7132268704e5eddaa90cce651d63a2abbbfe2065 Author: Jamal Hadi Salim Date: Wed Mar 1 15:11:53 2006 -0800 [IPSEC]: Sync series - core changes This patch provides the core functionality needed for sync events for ipsec. Derived work of Krisztian KOVACS Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. 
Miller commit 989dd98d3376882d8144d80910ecb6830df2c07c Author: Patrick McHardy Date: Sun Feb 19 00:39:44 2006 -0800 [PKT_SCHED]: Keep backlog counter in sch_sfq Keep backlog counter in SFQ qdisc to make it usable as child qdisc with RED. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 72126fdfc946417fd01af957a49210ed6f8c396c Author: Patrick McHardy Date: Sun Feb 19 00:39:15 2006 -0800 [PKT_SCHED]: Restore TBF change semantic When TBF was converted to a classful qdisc, the semantic of the limit parameter was broken. On initialization an inner bfifo qdisc is created for backwards compatibility; when changing parameters, however, the new limit is ignored and the current child qdisc remains in place. Always replace the child qdisc by the default bfifo when limit is above zero, otherwise don't touch the inner qdisc. Current tc versions enforce a limit above zero, other users can avoid creating the inner qdisc by using zero. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 813c909d776b51a0c8dd34510f100e820f4f7c13 Author: Patrick McHardy Date: Sun Feb 19 00:38:50 2006 -0800 [PKT_SCHED]: Dump child qdisc handle in sch_{atm,dsmark} A qdisc should set tcm_info to the child qdisc handle in its class dump function. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit cf18e8d9f288226a051888efac49bff1149ff751 Author: Patrick McHardy Date: Sun Feb 19 00:38:24 2006 -0800 [PKT_SCHED]: Qdisc drop operation is optional The drop operation is optional and qdiscs must check if their child qdiscs support it. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 702f00dc403ad3b8de6ae62b61faa3bfc3b4c7e1 Author: Christophe Lucas Date: Sun Feb 19 00:36:52 2006 -0800 [IRDA]: pci_register_driver conversion This patch converts two IrDA drivers' pci_module_init() calls to pci_register_driver(). Signed-off-by: Christophe Lucas Signed-off-by: Domen Puncer Signed-off-by: Samuel Ortiz Signed-off-by: David S.
Miller commit 2ffe72fb2f97a6f542fc05e2b61cb5c428ff2083 Author: David chosrova Date: Sun Feb 19 00:36:16 2006 -0800 [IRDA]: sti/cli removal from EP7211 IrDA driver This patch replaces the deprecated sti/cli routines with the corresponding spin_lock ones. Signed-off-by: David chosrova Signed-off-by: Samuel Ortiz Signed-off-by: David S. Miller commit de773952e21a454b2527914a5df113ed0822b14c Author: Jean Tourrilhes Date: Sun Feb 19 00:35:35 2006 -0800 [IRDA]: nsc-ircc: support for yet another Thinkpad IrDA chipset This patch simply adds support for a variation of the nsc-ircc PC8739x chipset, found in some IBM Thinkpad laptops. Signed-off-by: Jean Tourrilhes Signed-off-by: Samuel Ortiz Signed-off-by: David S. Miller commit edc40b9ad284fc52b23fc0c54f7e99273bb85b20 Author: Dmitry Torokhov Date: Sun Feb 19 00:35:02 2006 -0800 [IRDA]: nsc-ircc: PM update This patch brings the nsc-ircc code to a more up to date power management scheme, following the current device model. Signed-off-by: Dmitry Torokhov Signed-off-by: Rudolf Marek Signed-off-by: Samuel Ortiz Signed-off-by: David S. Miller commit 18440b468eb12aea2818236ae6001f812e7ea7c0 Author: Jean Tourrilhes Date: Sun Feb 19 00:34:12 2006 -0800 [IRDA]: nsc-ircc: ISAPnP support This enables PnP support for the nsc-ircc chipset. Since we can't fetch the chipset cfg_base from the PnP layer, we just use the PnP information as one more hint when probing the chip. Signed-off-by: Jean Tourrilhes Signed-off-by: Samuel Ortiz Signed-off-by: David S. Miller commit f7b52029d170155b36c8a02a68ce1d2f9c7234f9 Author: Patrick McHardy Date: Sun Feb 19 00:32:45 2006 -0800 [NETLINK]: Add netlink_has_listeners for avoiding unneccessary event message generation Keep a bitmask of multicast groups with subscribed listeners to let netlink users check for listeners before generating multicast messages. 
Queries don't perform any locking, which may result in false positives, it is guaranteed however that any new subscriptions are visible before bind() or setsockopt() return. Signed-off-by: Patrick McHardy ACKed-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit df26b47cc11513ecd78d6db98304858db5bbdec5 Author: Patrick McHardy Date: Sun Feb 19 00:31:19 2006 -0800 [NETFILTER]: ctnetlink: avoid unneccessary event message generation Avoid unneccessary event message generation by checking for netlink listeners before building a message. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit cef03289239bce482b3725e8544f583b2cdaa15e Author: Patrick McHardy Date: Sun Feb 19 00:30:58 2006 -0800 [NETFILTER]: x_tables: replace IPv4/IPv6 policy match by address family independant version Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 13eec7b8a97b4f0b87fccce05d4f68a4676b3454 Author: Patrick McHardy Date: Sun Feb 19 00:30:38 2006 -0800 [NETFILTER]: Move ip6_masked_addrcmp to include/net/ipv6.h Replace netfilter's ip6_masked_addrcmp by a more efficient version in include/net/ipv6.h to make it usable without module dependencies. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit f1fba8f900d00b011f3ea731abbf4a672b6851d2 Author: Patrick McHardy Date: Sun Feb 19 00:30:16 2006 -0800 [NETFILTER]: x_tables: add xt_{match,target} arguments to match/target functions Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 3f2489700a49861cba852488f815ea3cc00e8d88 Author: Patrick McHardy Date: Sun Feb 19 00:29:54 2006 -0800 [NETFILTER]: x_tables: pass registered match/target data to match/target functions This allows to make decisions based on the revision (and address family with a follow-up patch) at runtime. Signed-off-by: Patrick McHardy Signed-off-by: David S. 
Miller commit 4d0684027d31a9a4f99711f5936b10a6bda9ff9f Author: Patrick McHardy Date: Sun Feb 19 00:29:16 2006 -0800 [NETFILTER]: Convert x_tables matches/targets to centralized error checking Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 96426f6d3e611189d4df139a3bf38686d33790e9 Author: Patrick McHardy Date: Sun Feb 19 00:28:57 2006 -0800 [NETFILTER]: Convert ip6_tables matches/targets to centralized error checking Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 3a77f01feab52a2cc3bf00967e82dec7c20369ac Author: Patrick McHardy Date: Sun Feb 19 00:28:39 2006 -0800 [NETFILTER]: Convert arp_tables targets to centralized error checking Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit d7923fe537c87677a05f52a86ca17222f4f89510 Author: Patrick McHardy Date: Sun Feb 19 00:28:17 2006 -0800 [NETFILTER]: Convert ip_tables matches/targets to centralized error checking Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit af1358570cdb023b47dc15bc6c3d9d49ebef9423 Author: Patrick McHardy Date: Sun Feb 19 00:26:53 2006 -0800 [NETFILTER]: Change {ip,ip6,arp}_tables to use centralized error checking Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 09320120c1f1019a15e976e6ba6730340dbb3f7e Author: Patrick McHardy Date: Sun Feb 19 00:26:26 2006 -0800 [NETFILTER]: xt_tables: add centralized error checking Introduce new functions for common match/target checks (private data size, valid hooks, valid tables and valid protocols) to get more consistent error reporting and to avoid each module duplicating them. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 4b0c02a820c7f73323229cde38ed08d99ba8f72f Author: Yasuyuki Kozakai Date: Sun Feb 19 00:25:52 2006 -0800 [NETFILTER]: nf_conntrack: use ipv6_addr_equal in nf_ct_reasm Signed-off-by: Yasuyuki Kozakai Signed-off-by: Patrick McHardy Signed-off-by: David S. 
Miller commit ba9ec860a65abf768bb3d5f772224e80f9005767 Author: Holger Eitzenberger Date: Sun Feb 19 00:25:18 2006 -0800 [NETFILTER]: Fix CID offset bug in PPTP NAT helper debug message The recent (kernel 2.6.15.1) fix for PPTP NAT helper introduced a bug - which only appears if DEBUGP is enabled though. The calculation of the CID offset into a PPTP request struct is not correct, so that at least not the correct CID is displayed if DEBUGP is enabled. This patch corrects CID offset calculation and introduces a #define for that. Signed-off-by: Holger Eitzenberger Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 6d0557e6c9dc2295cdf592d68c9e08f69459ef08 Author: Andrea Bittau Date: Wed Feb 15 10:28:17 2006 -0200 [DCCP] CCID2: Drop sock reference count on timer expiration and reset. There was a hybrid use of standard timers and sk_timers. This caused the reference count of the sock to be incorrect when resetting the RTO timer. The sock reference count should now be correct, enabling its destruction, and allowing the DCCP module to be unloaded. Signed-off-by: Andrea Bittau Signed-off-by: Arnaldo Carvalho de Melo commit 52c4c4fd1851f6e6229a00ac13001fdc4529b720 Author: Arnaldo Carvalho de Melo Date: Wed Feb 15 09:41:42 2006 -0200 [DCCP]: Set the default CCID according to kernel config selection Now CCID2 is the default, as stated in the RFC drafts, but we allow a config where just CCID3 is built, where CCID3 becomes the default. Signed-off-by: Ian McDonald Signed-off-by: Arnaldo Carvalho de Melo commit a60cfad63cd67f4177c8bad7a8cf06faacb65e6e Author: Harald Welte Date: Mon Feb 13 02:35:31 2006 -0800 [NETFILTER] nf_conntrack: clean up to reduce size of 'struct nf_conn' This patch moves all helper related data fields of 'struct nf_conn' into a separate structure 'struct nf_conn_help'. This new structure is only present in conntrack entries for which we actually have a helper loaded. 
Also, this patch cleans up the nf_conntrack 'features' mechanism to resemble what the original idea was: Just glue the feature-specific data structures at the end of 'struct nf_conn', and explicitly re-calculate the pointer to it when needed rather than keeping pointers around. Saves 20 bytes per conntrack on my x86_64 box. A non-helped conntrack is 276 bytes. We still need to save another 20 bytes in order to fit into the target of 256 bytes. Signed-off-by: Harald Welte Signed-off-by: David S. Miller commit 1546f13c27022b268c0f80ab5ae8452a04a976a5 Author: Michael Chan Date: Fri Feb 10 14:48:13 2006 -0800 [BNX2]: include Include so that it compiles properly on all archs. Update version to 1.4.38. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 190fc233e3f77e4c1979f3e2834283f39c66f517 Author: John Heffner Date: Sun Feb 5 22:14:53 2006 -0800 [TCP]: MTU probing Implementation of packetization layer path mtu discovery for TCP, based on the internet-draft currently found at . Signed-off-by: John Heffner Signed-off-by: David S. Miller commit 0561eeefc1e6e40fb2c2ba8c66e4341a7e91eefa Author: Michael Chan Date: Sun Feb 5 18:10:56 2006 -0800 [BNX2]: Update version Update version to 1.4.37. Add missing flush_scheduled_work() in bnx2_suspend as noted by Jeff Garzik. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit c615bba87bbebbdf44a18e1dbc29d92eb4344f85 Author: Michael Chan Date: Sun Feb 5 18:10:29 2006 -0800 [BNX2]: Support larger rx ring sizes (part 2) Support bigger rx ring sizes (up to 1020) in the rx fast path. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 8cd94c62ae14d1d64193d766a2cc5f02a61242ff Author: Michael Chan Date: Sun Feb 5 18:10:01 2006 -0800 [BNX2]: Support larger rx ring sizes (part 1) Increase maximum receive ring size from 255 to 1020 by supporting up to 4 linked pages of receive descriptors.
To accommodate the higher memory usage, each physical descriptor page is allocated separately and the software ring that keeps track of the SKBs and the DMA addresses is allocated using vmalloc. Some of the receive-related fields in the bp structure are reorganized a bit for better locality of reference. The max. was reduced to 1020 from 4080 after discussion with David Miller. This patch contains ring init code changes only. The next patch contains rx data path code changes. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 834ec288b46264b1a947cd6176748f37c63b8b1e Author: Michael Chan Date: Sun Feb 5 18:09:29 2006 -0800 [BNX2]: Fix bug when rx ring is full Fix the rx code path that does not handle the full rx ring correctly. When the rx ring is set to the max. size (i.e. 255), the consumer and producer indices will be the same when completing an rx packet. Fix the rx code to handle this condition properly. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 587b1bf3690cc8fee6579a7b42b63cab648f89a2 Author: Michael Chan Date: Sun Feb 5 18:08:55 2006 -0800 [BNX2]: Add ethtool -d support Add ETHTOOL_GREGS support. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit ba92f99a03082aec84f2ee9dcdb397ffad829635 Author: Michael Chan Date: Sun Feb 5 18:08:16 2006 -0800 [BNX2]: Reduce register test size Eliminate some of the registers in ethtool register test to reduce driver size. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit aedb11eb6f1a45e3cbd75e3b9fbe197eca6eed95 Author: Michael Chan Date: Sun Feb 5 16:30:20 2006 -0800 [TG3]: Update version and reldate Update version to 3.50. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 877d8c70e85a8489a59a5ad1fe5810015a0723b6 Author: Michael Chan Date: Sun Feb 5 16:27:37 2006 -0800 [TG3]: Support shutdown WoL. Support WoL during shutdown by calling tg3_set_power_state(tp, PCI_D3hot) during tg3_close().
Change the power state parameter to pci_power_t type and use constants defined in pci.h. Certain ethtool operations cannot be performed after tg3_close() because the device will go to low power state. Add return -EAGAIN in such cases where appropriate. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit dc60120cc02af0a9989688ec215f8769398b8d2e Author: Michael Chan Date: Sun Feb 5 16:27:02 2006 -0800 [TG3]: Enable TSO by default Enable TSO by default on newer chips that support TSO in hardware. Leave TSO off by default on older chips that do firmware TSO because performance is slightly lower. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit dc157cd634f0d3d8053c327447f32b67cfcdcf7a Author: Michael Chan Date: Sun Feb 5 16:26:37 2006 -0800 [TG3]: Add support for 5714S and 5715S Add support for 5714S and 5715S. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 1cb52c81f845f305f9ef0275a8aa15c0cf7c9c98 Author: Adrian Bunk Date: Sat Feb 4 15:51:24 2006 -0800 [IPV4] fib_rules.c: make struct fib_rules static again struct fib_rules became global for no good reason. Signed-off-by: Adrian Bunk Signed-off-by: David S. Miller commit 6289b5eb175275e5d0dd4294e787704495819066 Author: Jesper Juhl Date: Sat Feb 4 15:50:27 2006 -0800 [IPCOMP6]: don't check vfree() argument for NULL. vfree does its own NULL checking, so checking a pointer before handing it to vfree is pointless. Signed-off-by: Jesper Juhl Signed-off-by: David S. Miller commit 9336efe83a68a171bd81060cc3ea0f13c243c003 Author: Andrea Bittau Date: Sat Feb 4 19:10:17 2006 -0200 [DCCP]: Initial feature negotiation implementation Still needs more work, but boots and doesn't crash, even does some negotiation!
18:38:52.174934 127.0.0.1.43458 > 127.0.0.1.5001: request 18:38:52.218526 127.0.0.1.5001 > 127.0.0.1.43458: response 18:38:52.185398 127.0.0.1.43458 > 127.0.0.1.5001: :-) Signed-off-by: Andrea Bittau Signed-off-by: Arnaldo Carvalho de Melo commit b089ee0ba40a1c264229a0a2c3eb3c85f9d0a766 Author: Andrea Bittau Date: Fri Feb 3 15:56:41 2006 -0200 [DCCP] CCID2: Initial CCID2 (TCP-Like) implementation Original work by Andrea Bittau, Arnaldo Melo cleaned up and fixed several issues on the merge process. For now CCID2 was turned the default for all SOCK_DCCP connections, but this will be remedied soon with the merge of the feature negotiation code. Signed-off-by: Andrea Bittau Signed-off-by: Arnaldo Carvalho de Melo commit f74d86dd0f580cfe508af7d2c6b871d175e3846a Author: Arnaldo Carvalho de Melo Date: Fri Feb 3 15:49:14 2006 -0200 [DCCP] CCID3: Set the no_feedback_timer fields near init_timer Signed-off-by: Arnaldo Carvalho de Melo commit 07324e4a6c698b9ec34a334edffcfde031abb53b Author: Arnaldo Carvalho de Melo Date: Fri Feb 3 15:30:45 2006 -0200 [DCCP]: Don't alloc ack vector for the control sock Signed-off-by: Arnaldo Carvalho de Melo commit c688ae7fca0469fb50444ab889ac73d724daf494 Author: Arnaldo Carvalho de Melo Date: Fri Feb 3 15:28:41 2006 -0200 [DCCP] ackvec: Delete all the ack vector records in dccp_ackvec_free Signed-off-by: Arnaldo Carvalho de Melo commit 74bb61b2d714a54036cdf5e9b0e3f55519e12544 Author: Arnaldo Carvalho de Melo Date: Fri Feb 3 15:23:55 2006 -0200 [DCCP] CCID: Allow ccid_{init,exit} to be NULL Testing if the ccid being instantiated has these methods in ccid_init(). Signed-off-by: Arnaldo Carvalho de Melo commit e11607348f3246eb377cc0f34ba2916f253fefa1 Author: Arnaldo Carvalho de Melo Date: Fri Feb 3 11:04:36 2006 -0200 [DCCP] ackvec: Introduce ack vector records Based on a patch by Andrea Bittau. 
Signed-off-by: Andrea Bittau Signed-off-by: Arnaldo Carvalho de Melo commit 97502c8ac1a92540eaacc3e7fc049b2b538e016c Author: Arnaldo Carvalho de Melo Date: Fri Feb 3 11:00:59 2006 -0200 [LIST]: Introduce list_for_each_entry_from For iterating over list of given type continuing from existing point. Signed-off-by: Arnaldo Carvalho de Melo commit a5af197668e880e0f44d3021f93c0bca13f21a81 Author: Robert Olsson Date: Thu Feb 2 17:31:35 2006 -0800 [IPV4]: Use RCU locking in fib_rules. Signed-off-by: Robert Olsson Signed-off-by: David S. Miller commit f47d9e2c9a0c21798b23b42891fa93a62ca130aa Author: Arnaldo Carvalho de Melo Date: Thu Feb 2 18:15:14 2006 -0200 [LIST]: Introduce list_for_each_entry_safe_from For iterate over list of given type from existing point safe against removal of list entry. Signed-off-by: Arnaldo Carvalho de Melo commit 84c600c2fdc6bfde2827615f7476fafeeb9f58ce Author: Arnaldo Carvalho de Melo Date: Thu Feb 2 13:20:15 2006 -0200 [DCCP] ackvec: Introduce dccp_ackvec_slab Signed-off-by: Arnaldo Carvalho de Melo commit 3ccdb4c99a12105fea424bbdc997261dbee3ce38 Author: Arnaldo Carvalho de Melo Date: Thu Feb 2 11:52:30 2006 -0200 [DCCP]: Fix error handling in dccp_init Signed-off-by: Arnaldo Carvalho de Melo commit c3242ea8a52b7729e417396e5ba3cc8b151115ff Author: Arnaldo Carvalho de Melo Date: Thu Feb 2 10:05:41 2006 -0200 [DCCP] ackvec: Ditch dccpav_buf_len Simplifying the code a bit as we're always using DCCP_MAX_ACKVEC_LEN. Signed-off-by: Arnaldo Carvalho de Melo commit ecb4f3da47341ca7857308008ccd3c8a584bf9b2 Author: David S. Miller Date: Wed Feb 1 00:39:07 2006 -0800 [NET] socket: Use put_filp not fput. Another conversion error in the sock_map_fd() splitup for sys_accept(). Based upon a report by Andrew Morton. Signed-off-by: David S. 
Miller commit b2e5b60e48432ae42e30d90a2d7ed2cebd1cb530 Author: Harald Welte Date: Mon Jan 30 15:50:09 2006 -0800 [NETFILTER] nfnetlink_log: add sequence numbers for log events By using a sequence number for every logged netfilter event, we can determine from userspace whether logging information was lost somewhere downstream. The user has a choice of either having per-instance local sequence counters, or using a global sequence counter, or both. Signed-off-by: Harald Welte Signed-off-by: David S. Miller commit 19daf1e53cc01028ef0a12ad35615916f56ca5e7 Author: Harald Welte Date: Mon Jan 30 15:49:48 2006 -0800 [NETFILTER] NAT sequence adjustment: Save eight bytes per conntrack This patch reduces the size of 'struct ip_conntrack' on systems with NAT by eight bytes. The sequence number delta values can be int16_t, since we only support one sequence number modification per window anyway, and one such modification is not going to exceed 32kB ;) Signed-off-by: Harald Welte Signed-off-by: David S. Miller commit c5521c92564bf35f20002548194dd78d95bccbf8 Author: David S. Miller Date: Sun Jan 29 21:08:25 2006 -0800 [NET] sys_accept: Pass correct socket to sock_attach_fd(). Also, make sure "filep" is assigned to on every path through sock_alloc_fd(). Thanks to Andrew Morton for the OOPS trace. Signed-off-by: David S. Miller commit 3bf72b0a2916361b2944f9f1a37a809f5492991c Author: David S. Miller Date: Thu Jan 26 20:45:15 2006 -0800 [NET]: Do not lose accepted socket when -ENFILE/-EMFILE. Try to allocate the struct file and an unused file descriptor before we try to pull a newly accepted socket out of the protocol layer. Based upon a patch by Prassana Meda. Signed-off-by: David S.
Miller commit 0237d8ffe2e081e223e19e75b97df82559bec9dd Author: Patrick McHardy Date: Tue Jan 24 12:43:55 2006 -0800 [NET]: Reduce size of struct sk_buff on 64 bit architectures Move skb->nf_mark next to skb->tc_index to remove a 4 byte hole between skb->nfmark and skb->nfct and another one between skb->users and skb->head when CONFIG_NETFILTER, CONFIG_NET_SCHED and CONFIG_NET_CLS_ACT are enabled. For all other combinations the size stays the same. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 7ffa6ab2f6ba5109901e73a13e6754dcb6f505de Author: Stefan Rompf Date: Fri Jan 20 02:56:51 2006 -0800 [VLAN]: translate IF_OPER_DORMANT to netif_dormant_on() this patch adds support to the VLAN driver to translate IF_OPER_DORMANT of the underlying device to netif_dormant_on(). Beside clean state forwarding, this allows running independant userspace supplicants on both the real device and the stacked VLAN. It depends on my RFC2863 patch. Signed-off-by: Stefan Rompf Signed-off-by: David S. Miller commit b044189454984206ca127d26d5fdb8236e7c8cd0 Author: Stefan Rompf Date: Fri Jan 20 02:55:48 2006 -0800 [NET] core: add RFC2863 operstate this patch adds a dormant flag to network devices, RFC2863 operstate derived from these flags and possibility for userspace interaction. It allows drivers to signal that a device is unusable for user traffic without disabling queueing (and therefore the possibility for protocol establishment traffic to flow) and a userspace supplicant (WPA, 802.1X) to mark a device unusable without changes to the driver. It is the result of our long discussion. However I must admit that it represents what Jamal and I agreed on with compromises towards Krzysztof, but Thomas and Krzysztof still disagree with some parts. Anyway I think it should be applied. Signed-off-by: Stefan Rompf Signed-off-by: David S. 
Miller commit 757fbaae2b213ba0f47bacc37356b431cb07f95d Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:57 2006 +0900 [IPV6]: ROUTE: Ensure to accept redirects from nexthop for the target. It is possible to get redirects from nexthop of "more-specific" routes. Signed-off-by: YOSHIFUJI Hideaki commit d51da5f2ce76f31b6198caef66213b0a9c64d955 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:55 2006 +0900 [IPV6]: ROUTE: Add accept_ra_rt_info_max_plen sysctl. Signed-off-by: YOSHIFUJI Hideaki commit 5f6048971731419f69838dc1c3526018fa8987b7 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:54 2006 +0900 [IPV6]: ROUTE: Flag RTF_DEFAULT for Route Infomation for ::/0. Signed-off-by: YOSHIFUJI Hideaki commit 80b80564026e135dd9be56ba5f26fb5e6ef75b42 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:53 2006 +0900 [IPV6]: ROUTE: Add experimental support for Route Information Option in RA (RFC4191). Signed-off-by: YOSHIFUJI Hideaki commit 093580e54da5d7986b0abfe4d1ab1b442812b250 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:51 2006 +0900 [IPV6]: ROUTE: Add router_probe_interval sysctl. Signed-off-by: YOSHIFUJI Hideaki commit b611ffe1f10df25839cfe7f2d57366e31367b571 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:50 2006 +0900 [IPV6]: ROUTE: Add accept_ra_rtr_pref sysctl. Signed-off-by: YOSHIFUJI Hideaki commit 3b7a88a69250d5b72eaa0a92bb181f058b9699d0 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:49 2006 +0900 [IPV6]: ROUTE: Add Router Reachability Probing (RFC4191). Signed-off-by: YOSHIFUJI Hideaki commit 15bf643d58cc68ebd301adb906c1ecde0f831d7a Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:47 2006 +0900 [IPV6]: ROUTE: Add support for Router Preference (RFC4191). Signed-off-by: YOSHIFUJI Hideaki commit f59d58527ef59a8da6a16b918893f096a0f48026 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:46 2006 +0900 [IPV6]: ROUTE: Handle finding the next best route in reachability in BACKTRACK(). 
Signed-off-by: YOSHIFUJI Hideaki commit b5b58295dc2bf67dd7716674a83bb00fae074d65 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:45 2006 +0900 [IPV6]: ROUTE: Try finding the next best route. Signed-off-by: YOSHIFUJI Hideaki commit f0cd3385764d6563b7efc4e2301197501ef4f6c6 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:44 2006 +0900 [IPV6]: ROUTE: Clean up rt6_select() code path in ip6_route_{intput,output}(). Signed-off-by: YOSHIFUJI Hideaki commit 1384885b26cc0f1e4f8ec4abc5cbdc0980d20457 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:43 2006 +0900 [IPV6]: ROUTE: Try selecting better route for non-default routes as well. Signed-off-by: YOSHIFUJI Hideaki commit eb2893caa56688e9825d988529e92da73c756b06 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:41 2006 +0900 [IPV6]: ROUTE: More strict check for default routers in rt6_get_dflt_router(). Check RTF_ADDRCONF|RTF_DEFAULT in rt6_get_dflt_router(). Signed-off-by: YOSHIFUJI Hideaki commit 9f128f9ba6a33b4d2ba967cd523b349bfe5e6972 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:40 2006 +0900 [IPV6]: ROUTE: Eliminate lock for default route pointer. And prepare for more advanced router selection. Signed-off-by: YOSHIFUJI Hideaki commit 514dd100d8828ca74e2ad3cf58e8555fc185f414 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:39 2006 +0900 [IPV6]: ROUTE: Clean-up cow'ing in ip6_route_{intput,output}(). Signed-off-by: YOSHIFUJI Hideaki commit 1bf1998be2c0cae1353eb74f972f4d8f958dc68b Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:38 2006 +0900 [IPV6]: ROUTE: Convert rt6_cow() to rt6_alloc_cow(). Signed-off-by: YOSHIFUJI Hideaki commit 69696eae5e2a9b812a54b74ca5c36e8fe20d0830 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:36 2006 +0900 [IPV6]: ROUTE: Clean up reference counting / unlocking for returning object. Signed-off-by: YOSHIFUJI Hideaki commit d58a90470cbef53180b6775f760ce6c5e78100a1 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:35 2006 +0900 [IPV6]: ROUTE: Unify two code paths for pmtu disc. 
Signed-off-by: YOSHIFUJI Hideaki commit c2897933b6234f947b37f5bc6336feb7e4f0b293 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:34 2006 +0900 [IPV6]: ROUTE: Add rt6_alloc_clone() for cloning route allocation. Signed-off-by: YOSHIFUJI Hideaki commit 1e8bf0917a69313bbe9aad71326819fab1689ae3 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:33 2006 +0900 [IPV6]: ROUTE: Copy u.dst.error for RTF_REJECT routes when cloning. Signed-off-by: YOSHIFUJI Hideaki commit cc1258e86238381eba2bab8d8d1d83201a9125bb Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:32 2006 +0900 [IPV6]: ROUTE: Set appropriate information before inserting a route. Signed-off-by: YOSHIFUJI Hideaki commit bbb8ee7b315bcc8f65fc066f776bba2320c53d0e Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:30 2006 +0900 [IPV6]: ROUTE: Split up rt6_cow() for future changes. Signed-off-by: YOSHIFUJI Hideaki commit 292d54206f480b0fe25f398342e0f66fe1429fcf Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:29 2006 +0900 [IPV6]: ADDRCONF: Add accept_ra_pinfo sysctl. This controls whether we accept Prefix Information in RAs. Signed-off-by: YOSHIFUJI Hideaki commit 368a5e593d4a5edef641fda29a8ea2fba8aec9f9 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:28 2006 +0900 [IPV6]: ROUTE: Add accept_ra_defrtr sysctl. This controls whether we accept default router information in RAs. Signed-off-by: YOSHIFUJI Hideaki commit f375fac59d2a29c0692b2631042208e4ac85a606 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:27 2006 +0900 [IPV6]: ADDRCONF: Split up ipv6_generate_eui64() by device type. Signed-off-by: YOSHIFUJI Hideaki commit f690d18dd25332c5c033027dd3b07d54d6a16999 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:25 2006 +0900 [IPV6]: ADDRCONF: Use our standard algorithm for randomized ifid. RFC 3041 describes an algorithm to generate random interface identifier. In RFC 3041bis, it is allowed to use different algorithm than one described in RFC 3041. 
So, let's use our standard pseudo random algorithm to simplify our implementation. Signed-off-by: YOSHIFUJI Hideaki commit 1669197bac8822676265065b259171cdd58095d2 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:24 2006 +0900 [NET]: NEIGHBOUR: Ensure to record time to neigh->updated when neighbour's state changed. Signed-off-by: YOSHIFUJI Hideaki commit 0bf9f97773b9bc00ca84f7efd20b4a9f6773b24e Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:23 2006 +0900 [IPV6]: TUNNEL6: Don't try to add multicast route twice. Since addrconf_add_dev() has already called addrconf_add_mroute() to added route for multicast prefix, there's no point to call it again in addrconf_ip6_tnl_config(). Signed-off-by: YOSHIFUJI Hideaki --- Signed-off-by: Andrew Morton --- dev/null | 372 - Documentation/connector/connector.txt | 5 Documentation/networking/ip-sysctl.txt | 49 drivers/atm/suni.c | 2 drivers/connector/connector.c | 7 drivers/infiniband/ulp/ipoib/ipoib_main.c | 16 drivers/net/8139too.c | 2 drivers/net/bnx2.c | 475 -- drivers/net/bnx2.h | 37 drivers/net/cassini.c | 40 drivers/net/cassini.h | 2 drivers/net/e1000/e1000_main.c | 2 drivers/net/irda/Kconfig | 8 drivers/net/irda/Makefile | 1 drivers/net/irda/donauboe.c | 2 drivers/net/irda/ep7211_ir.c | 11 drivers/net/irda/irtty-sir.c | 19 drivers/net/irda/nsc-ircc.c | 328 + drivers/net/irda/nsc-ircc.h | 2 drivers/net/irda/sir_dongle.c | 19 drivers/net/irda/toim3232-sir.c | 375 + drivers/net/irda/vlsi_ir.c | 2 drivers/net/ppp_generic.c | 4 drivers/net/pppoe.c | 3 drivers/net/sungem.c | 37 drivers/net/sungem.h | 6 drivers/net/tg3.c | 646 ++- drivers/net/tg3.h | 19 drivers/net/wan/sbni.c | 3 include/asm-alpha/socket.h | 1 include/asm-arm/socket.h | 1 include/asm-arm26/socket.h | 1 include/asm-cris/socket.h | 1 include/asm-frv/socket.h | 1 include/asm-h8300/socket.h | 1 include/asm-i386/socket.h | 1 include/asm-ia64/socket.h | 1 include/asm-m32r/socket.h | 1 include/asm-m68k/socket.h | 1 include/asm-mips/socket.h | 1 
include/asm-parisc/socket.h | 1 include/asm-powerpc/socket.h | 1 include/asm-s390/socket.h | 1 include/asm-sh/socket.h | 1 include/asm-sparc/socket.h | 3 include/asm-sparc64/socket.h | 1 include/asm-v850/socket.h | 1 include/asm-x86_64/socket.h | 1 include/asm-xtensa/socket.h | 1 include/linux/dccp.h | 132 include/linux/dn.h | 44 include/linux/icmpv6.h | 11 include/linux/if.h | 26 include/linux/in.h | 1 include/linux/inetdevice.h | 1 include/linux/ipv6.h | 14 include/linux/ipv6_route.h | 10 include/linux/irda.h | 1 include/linux/list.h | 24 include/linux/net.h | 5 include/linux/netdevice.h | 41 include/linux/netfilter.h | 9 include/linux/netfilter/nfnetlink.h | 1 include/linux/netfilter/nfnetlink_log.h | 6 include/linux/netfilter/x_tables.h | 37 include/linux/netfilter/xt_policy.h | 58 include/linux/netfilter_bridge.h | 27 include/linux/netfilter_ipv4/ip_nat.h | 2 include/linux/netfilter_ipv4/ipt_policy.h | 67 include/linux/netfilter_ipv6/ip6t_policy.h | 67 include/linux/netlink.h | 1 include/linux/pci_ids.h | 6 include/linux/rtnetlink.h | 23 include/linux/security.h | 34 include/linux/skbuff.h | 47 include/linux/socket.h | 1 include/linux/sunrpc/svcsock.h | 2 include/linux/sysctl.h | 27 include/linux/tcp.h | 6 include/linux/xfrm.h | 30 include/net/af_unix.h | 5 include/net/dn.h | 105 include/net/dn_dev.h | 88 include/net/dn_fib.h | 22 include/net/dn_neigh.h | 4 include/net/dn_nsp.h | 72 include/net/dn_route.h | 12 include/net/flow.h | 8 include/net/if_inet6.h | 3 include/net/inet_connection_sock.h | 26 include/net/ip.h | 4 include/net/ip6_route.h | 24 include/net/ipv6.h | 22 include/net/llc.h | 2 include/net/ndisc.h | 2 include/net/neighbour.h | 2 include/net/netfilter/nf_conntrack.h | 56 include/net/scm.h | 39 include/net/sctp/structs.h | 10 include/net/sock.h | 12 include/net/tcp.h | 16 include/net/xfrm.h | 62 net/802/psnap.c | 4 net/8021q/vlan.c | 43 net/8021q/vlan_dev.c | 6 net/atm/clip.c | 2 net/atm/common.c | 4 net/atm/ioctl.c | 15 net/atm/resources.c | 32 
net/atm/resources.h | 3 net/bluetooth/rfcomm/core.c | 8 net/bridge/Kconfig | 1 net/bridge/br.c | 12 net/bridge/br_device.c | 3 net/bridge/br_fdb.c | 6 net/bridge/br_if.c | 9 net/bridge/br_input.c | 43 net/bridge/br_netfilter.c | 225 - net/bridge/br_private.h | 6 net/bridge/br_stp_bpdu.c | 192 - net/bridge/br_stp_timer.c | 47 net/bridge/br_sysfs_br.c | 49 net/bridge/netfilter/ebtables.c | 101 net/compat.c | 95 net/core/dev.c | 32 net/core/flow.c | 7 net/core/link_watch.c | 44 net/core/neighbour.c | 12 net/core/net-sysfs.c | 41 net/core/netpoll.c | 6 net/core/pktgen.c | 2816 ++++++++------- net/core/rtnetlink.c | 78 net/core/scm.c | 49 net/core/skbuff.c | 42 net/core/sock.c | 41 net/core/sysctl_net_core.c | 23 net/dccp/Kconfig | 13 net/dccp/Makefile | 9 net/dccp/ackvec.c | 296 + net/dccp/ackvec.h | 53 net/dccp/ccid.c | 189 - net/dccp/ccid.h | 131 net/dccp/ccids/Kconfig | 43 net/dccp/ccids/Makefile | 4 net/dccp/ccids/ccid2.c | 779 ++++ net/dccp/ccids/ccid2.h | 85 net/dccp/ccids/ccid3.c | 112 net/dccp/ccids/ccid3.h | 5 net/dccp/dccp.h | 133 net/dccp/diag.c | 2 net/dccp/feat.c | 586 +++ net/dccp/feat.h | 29 net/dccp/input.c | 28 net/dccp/ipv4.c | 333 - net/dccp/ipv6.c | 371 + net/dccp/minisocks.c | 37 net/dccp/options.c | 295 + net/dccp/output.c | 88 net/dccp/proto.c | 440 +- net/dccp/sysctl.c | 124 net/dccp/timer.c | 14 net/decnet/af_decnet.c | 18 net/decnet/dn_dev.c | 34 net/decnet/dn_fib.c | 8 net/decnet/dn_neigh.c | 24 net/decnet/dn_nsp_in.c | 28 net/decnet/dn_nsp_out.c | 38 net/decnet/dn_route.c | 60 net/decnet/dn_rules.c | 115 net/decnet/dn_table.c | 12 net/decnet/sysctl_net_decnet.c | 12 net/ipv4/af_inet.c | 120 net/ipv4/ah4.c | 1 net/ipv4/arp.c | 20 net/ipv4/devinet.c | 8 net/ipv4/esp4.c | 1 net/ipv4/fib_rules.c | 115 net/ipv4/fib_trie.c | 24 net/ipv4/igmp.c | 26 net/ipv4/inet_connection_sock.c | 49 net/ipv4/ip_sockglue.c | 170 net/ipv4/ipcomp.c | 17 net/ipv4/ipconfig.c | 10 net/ipv4/ipmr.c | 4 net/ipv4/ipvs/ip_vs_app.c | 19 net/ipv4/netfilter/Kconfig | 10 
net/ipv4/netfilter/Makefile | 1 net/ipv4/netfilter/arp_tables.c | 21 net/ipv4/netfilter/arpt_mangle.c | 23 net/ipv4/netfilter/ip_conntrack_netlink.c | 7 net/ipv4/netfilter/ip_nat_helper_pptp.c | 8 net/ipv4/netfilter/ip_nat_rule.c | 45 net/ipv4/netfilter/ip_nat_snmp_basic.c | 5 net/ipv4/netfilter/ip_queue.c | 11 net/ipv4/netfilter/ip_tables.c | 69 net/ipv4/netfilter/ipt_CLUSTERIP.c | 27 net/ipv4/netfilter/ipt_DSCP.c | 17 net/ipv4/netfilter/ipt_ECN.c | 18 net/ipv4/netfilter/ipt_LOG.c | 11 net/ipv4/netfilter/ipt_MASQUERADE.c | 18 net/ipv4/netfilter/ipt_NETMAP.c | 19 net/ipv4/netfilter/ipt_REDIRECT.c | 17 net/ipv4/netfilter/ipt_REJECT.c | 28 net/ipv4/netfilter/ipt_SAME.c | 19 net/ipv4/netfilter/ipt_TCPMSS.c | 16 net/ipv4/netfilter/ipt_TOS.c | 17 net/ipv4/netfilter/ipt_TTL.c | 25 net/ipv4/netfilter/ipt_ULOG.c | 12 net/ipv4/netfilter/ipt_addrtype.c | 20 net/ipv4/netfilter/ipt_ah.c | 25 net/ipv4/netfilter/ipt_dscp.c | 19 net/ipv4/netfilter/ipt_ecn.c | 14 net/ipv4/netfilter/ipt_esp.c | 25 net/ipv4/netfilter/ipt_hashlimit.c | 21 net/ipv4/netfilter/ipt_iprange.c | 28 net/ipv4/netfilter/ipt_multiport.c | 31 net/ipv4/netfilter/ipt_owner.c | 21 net/ipv4/netfilter/ipt_recent.c | 22 net/ipv4/netfilter/ipt_tos.c | 18 net/ipv4/netfilter/ipt_ttl.c | 19 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 22 net/ipv4/raw.c | 80 net/ipv4/sysctl_net_ipv4.c | 25 net/ipv4/tcp.c | 63 net/ipv4/tcp_htcp.c | 66 net/ipv4/tcp_input.c | 49 net/ipv4/tcp_ipv4.c | 44 net/ipv4/tcp_output.c | 263 + net/ipv4/tcp_timer.c | 36 net/ipv4/udp.c | 83 net/ipv4/xfrm4_tunnel.c | 11 net/ipv6/Kconfig | 26 net/ipv6/addrconf.c | 344 + net/ipv6/af_inet6.c | 120 net/ipv6/ah6.c | 5 net/ipv6/anycast.c | 7 net/ipv6/esp6.c | 5 net/ipv6/ip6_fib.c | 1 net/ipv6/ip6_flowlabel.c | 6 net/ipv6/ip6_output.c | 43 net/ipv6/ipcomp6.c | 22 net/ipv6/ipv6_sockglue.c | 163 net/ipv6/mcast.c | 17 net/ipv6/ndisc.c | 49 net/ipv6/netfilter/Kconfig | 10 net/ipv6/netfilter/Makefile | 1 net/ipv6/netfilter/ip6_queue.c | 11 
net/ipv6/netfilter/ip6_tables.c | 87 net/ipv6/netfilter/ip6t_HL.c | 19 net/ipv6/netfilter/ip6t_LOG.c | 11 net/ipv6/netfilter/ip6t_REJECT.c | 25 net/ipv6/netfilter/ip6t_ah.c | 12 net/ipv6/netfilter/ip6t_dst.c | 13 net/ipv6/netfilter/ip6t_esp.c | 12 net/ipv6/netfilter/ip6t_eui64.c | 27 net/ipv6/netfilter/ip6t_frag.c | 13 net/ipv6/netfilter/ip6t_hbh.c | 13 net/ipv6/netfilter/ip6t_hl.c | 22 net/ipv6/netfilter/ip6t_ipv6header.c | 8 net/ipv6/netfilter/ip6t_multiport.c | 11 net/ipv6/netfilter/ip6t_owner.c | 18 net/ipv6/netfilter/ip6t_rt.c | 12 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c | 39 net/ipv6/netfilter/nf_conntrack_reasm.c | 8 net/ipv6/raw.c | 145 net/ipv6/reassembly.c | 37 net/ipv6/route.c | 680 ++- net/ipv6/tcp_ipv6.c | 74 net/ipv6/udp.c | 84 net/ipv6/xfrm6_tunnel.c | 11 net/key/af_key.c | 6 net/llc/af_llc.c | 15 net/llc/llc_c_ac.c | 1 net/llc/llc_core.c | 1 net/llc/llc_output.c | 3 net/llc/llc_s_ac.c | 2 net/netfilter/Kconfig | 10 net/netfilter/Makefile | 1 net/netfilter/nf_conntrack_core.c | 135 net/netfilter/nf_conntrack_ftp.c | 2 net/netfilter/nf_conntrack_netlink.c | 46 net/netfilter/nf_conntrack_standalone.c | 1 net/netfilter/nf_sockopt.c | 94 net/netfilter/nfnetlink.c | 6 net/netfilter/nfnetlink_log.c | 46 net/netfilter/x_tables.c | 72 net/netfilter/xt_CLASSIFY.c | 42 net/netfilter/xt_CONNMARK.c | 27 net/netfilter/xt_MARK.c | 37 net/netfilter/xt_NFQUEUE.c | 24 net/netfilter/xt_NOTRACK.c | 45 net/netfilter/xt_comment.c | 18 net/netfilter/xt_connbytes.c | 15 net/netfilter/xt_connmark.c | 28 net/netfilter/xt_conntrack.c | 18 net/netfilter/xt_dccp.c | 45 net/netfilter/xt_helper.c | 26 net/netfilter/xt_length.c | 24 net/netfilter/xt_limit.c | 7 net/netfilter/xt_mac.c | 34 net/netfilter/xt_mark.c | 16 net/netfilter/xt_physdev.c | 14 net/netfilter/xt_pkttype.c | 23 net/netfilter/xt_policy.c | 209 + net/netfilter/xt_realm.c | 27 net/netfilter/xt_sctp.c | 66 net/netfilter/xt_state.c | 21 net/netfilter/xt_string.c | 10 net/netfilter/xt_tcpmss.c | 52 
net/netfilter/xt_tcpudp.c | 112 net/netlink/af_netlink.c | 52 net/sched/Kconfig | 1 net/sched/act_ipt.c | 10 net/sched/sch_atm.c | 1 net/sched/sch_dsmark.c | 1 net/sched/sch_generic.c | 2 net/sched/sch_netem.c | 4 net/sched/sch_prio.c | 2 net/sched/sch_red.c | 179 net/sched/sch_sfq.c | 5 net/sched/sch_tbf.c | 9 net/sctp/ipv6.c | 92 net/sctp/protocol.c | 94 net/socket.c | 334 + net/sunrpc/cache.c | 17 net/sunrpc/sched.c | 11 net/sunrpc/svcsock.c | 8 net/tipc/bcast.c | 58 net/tipc/bearer.c | 20 net/tipc/cluster.c | 22 net/tipc/cluster.h | 2 net/tipc/config.c | 4 net/tipc/dbg.c | 4 net/tipc/discover.c | 8 net/tipc/eth_media.c | 4 net/tipc/link.c | 89 net/tipc/name_distr.c | 6 net/tipc/name_table.c | 62 net/tipc/net.c | 7 net/tipc/node.c | 20 net/tipc/node.h | 2 net/tipc/node_subscr.c | 2 net/tipc/port.c | 57 net/tipc/ref.c | 8 net/tipc/ref.h | 4 net/tipc/socket.c | 28 net/tipc/subscr.c | 30 net/tipc/user_reg.c | 4 net/tipc/zone.c | 12 net/unix/af_unix.c | 34 net/unix/garbage.c | 7 net/xfrm/xfrm_policy.c | 9 net/xfrm/xfrm_state.c | 108 net/xfrm/xfrm_user.c | 397 ++ security/dummy.c | 13 security/selinux/hooks.c | 46 security/selinux/include/xfrm.h | 12 security/selinux/nlmsgtab.c | 7 security/selinux/ss/services.c | 1 security/selinux/xfrm.c | 68 infiniband/ulp/ipoib/ipoib_multicast.c | 0 361 files changed, 12815 insertions(+), 7305 deletions(-) diff -puN Documentation/connector/connector.txt~git-net Documentation/connector/connector.txt --- devel/Documentation/connector/connector.txt~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/Documentation/connector/connector.txt 2006-03-17 23:03:46.000000000 -0800 @@ -69,10 +69,11 @@ Unregisters new callback with connector struct cb_id *id - unique connector's user identifier. -void cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask); +int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask); Sends message to the specified groups. 
It can be safely called from -any context, but may silently fail under strong memory pressure. +softirq context, but may silently fail under strong memory pressure. +If there are no listeners for given group -ESRCH can be returned. struct cn_msg * - message header(with attached data). u32 __group - destination group. diff -puN Documentation/networking/ip-sysctl.txt~git-net Documentation/networking/ip-sysctl.txt --- devel/Documentation/networking/ip-sysctl.txt~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/Documentation/networking/ip-sysctl.txt 2006-03-17 23:03:46.000000000 -0800 @@ -355,6 +355,13 @@ somaxconn - INTEGER Defaults to 128. See also tcp_max_syn_backlog for additional tuning for TCP sockets. +tcp_workaround_signed_windows - BOOLEAN + If set, assume no receipt of a window scaling option means the + remote TCP is broken and treats the window as a signed quantity. + If unset, assume the remote TCP is not broken even if we do + not receive a window scaling option from them. + Default: 0 + IP Variables: ip_local_port_range - 2 INTEGERS @@ -619,6 +626,11 @@ arp_ignore - INTEGER The max value from conf/{all,interface}/arp_ignore is used when ARP request is received on the {interface} +arp_accept - BOOLEAN + Define behavior when gratuitous arp replies are received: + 0 - drop gratuitous arp frames + 1 - accept gratuitous arp frames + app_solicit - INTEGER The maximum number of probes to send to the user space ARP daemon via netlink before dropping back to multicast probes (see @@ -717,6 +729,33 @@ accept_ra - BOOLEAN Functional default: enabled if local forwarding is disabled. disabled if local forwarding is enabled. +accept_ra_defrtr - BOOLEAN + Learn default router in Router Advertisement. + + Functional default: enabled if accept_ra is enabled. + disabled if accept_ra is disabled. + +accept_ra_pinfo - BOOLEAN + Learn Prefix Inforamtion in Router Advertisement. + + Functional default: enabled if accept_ra is enabled. 
+ disabled if accept_ra is disabled. + +accept_ra_rt_info_max_plen - INTEGER + Maximum prefix length of Route Information in RA. + + Route Information w/ prefix larger than or equal to this + variable shall be ignored. + + Functional default: 0 if accept_ra_rtr_pref is enabled. + -1 if accept_ra_rtr_pref is disabled. + +accept_ra_rtr_pref - BOOLEAN + Accept Router Preference in RA. + + Functional default: enabled if accept_ra is enabled. + disabled if accept_ra is disabled. + accept_redirects - BOOLEAN Accept Redirects. @@ -727,8 +766,8 @@ autoconf - BOOLEAN Autoconfigure addresses using Prefix Information in Router Advertisements. - Functional default: enabled if accept_ra is enabled. - disabled if accept_ra is disabled. + Functional default: enabled if accept_ra_pinfo is enabled. + disabled if accept_ra_pinfo is disabled. dad_transmits - INTEGER The amount of Duplicate Address Detection probes to send. @@ -771,6 +810,12 @@ mtu - INTEGER Default Maximum Transfer Unit Default: 1280 (IPv6 required minimum) +router_probe_interval - INTEGER + Minimum interval (in seconds) between Router Probing described + in RFC4191. + + Default: 60 + router_solicitation_delay - INTEGER Number of seconds to wait after interface is brought up before sending Router Solicitations. diff -puN drivers/atm/suni.c~git-net drivers/atm/suni.c --- devel/drivers/atm/suni.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/atm/suni.c 2006-03-17 23:03:46.000000000 -0800 @@ -188,7 +188,7 @@ static int suni_ioctl(struct atm_dev *de case SONET_GETDIAG: return get_diag(dev,arg); case SONET_SETFRAMING: - if (arg != SONET_FRAME_SONET) return -EINVAL; + if ((int)(unsigned long)arg != SONET_FRAME_SONET) return -EINVAL; return 0; case SONET_GETFRAMING: return put_user(SONET_FRAME_SONET,(int __user *)arg) ? 
diff -puN drivers/connector/connector.c~git-net drivers/connector/connector.c
--- devel/drivers/connector/connector.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/drivers/connector/connector.c 2006-03-17 23:03:46.000000000 -0800
@@ -97,6 +97,9 @@ int cn_netlink_send(struct cn_msg *msg,
 		group = __group;
 	}
 
+	if (!netlink_has_listeners(dev->nls, group))
+		return -ESRCH;
+
 	size = NLMSG_SPACE(sizeof(*msg) + msg->len);
 
 	skb = alloc_skb(size, gfp_mask);
@@ -111,9 +114,7 @@ int cn_netlink_send(struct cn_msg *msg,
 
 	NETLINK_CB(skb).dst_group = group;
 
-	netlink_broadcast(dev->nls, skb, 0, group, gfp_mask);
-
-	return 0;
+	return netlink_broadcast(dev->nls, skb, 0, group, gfp_mask);
 
 nlmsg_failure:
 	kfree_skb(skb);
diff -puN drivers/infiniband/ulp/ipoib/ipoib_main.c~git-net drivers/infiniband/ulp/ipoib/ipoib_main.c
--- devel/drivers/infiniband/ulp/ipoib/ipoib_main.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/drivers/infiniband/ulp/ipoib/ipoib_main.c 2006-03-17 23:03:46.000000000 -0800
@@ -253,7 +253,6 @@ static void path_free(struct net_device
 		if (neigh->ah)
 			ipoib_put_ah(neigh->ah);
 		*to_ipoib_neigh(neigh->neighbour) = NULL;
-		neigh->neighbour->ops->destructor = NULL;
 		kfree(neigh);
 	}
 
@@ -534,7 +533,6 @@ static void neigh_add_path(struct sk_buf
 err:
 	*to_ipoib_neigh(skb->dst->neighbour) = NULL;
 	list_del(&neigh->list);
-	neigh->neighbour->ops->destructor = NULL;
 	kfree(neigh);
 
 	++priv->stats.tx_dropped;
@@ -773,21 +771,9 @@ static void ipoib_neigh_destructor(struc
 		ipoib_put_ah(ah);
 }
 
-static int ipoib_neigh_setup(struct neighbour *neigh)
-{
-	/*
-	 * Is this kosher? I can't find anybody in the kernel that
-	 * sets neigh->destructor, so we should be able to set it here
-	 * without trouble.
- */ - neigh->ops->destructor = ipoib_neigh_destructor; - - return 0; -} - static int ipoib_neigh_setup_dev(struct net_device *dev, struct neigh_parms *parms) { - parms->neigh_setup = ipoib_neigh_setup; + parms->neigh_destructor = ipoib_neigh_destructor; return 0; } diff -puN drivers/infiniband/ulp/ipoib/ipoib_multicast.c~git-net drivers/infiniband/ulp/ipoib/ipoib_multicast.c diff -puN drivers/net/8139too.c~git-net drivers/net/8139too.c --- devel/drivers/net/8139too.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/8139too.c 2006-03-17 23:03:47.000000000 -0800 @@ -1605,7 +1605,7 @@ static void rtl8139_thread (void *_data) if (tp->watchdog_fired) { tp->watchdog_fired = 0; rtl8139_tx_timeout_task(_data); - } else if (rtnl_shlock_nowait() == 0) { + } else if (rtnl_trylock()) { rtl8139_thread_iter (dev, tp, tp->mmio_addr); rtnl_unlock (); } else { diff -puN drivers/net/bnx2.c~git-net drivers/net/bnx2.c --- devel/drivers/net/bnx2.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/bnx2.c 2006-03-17 23:03:48.000000000 -0800 @@ -14,8 +14,8 @@ #define DRV_MODULE_NAME "bnx2" #define PFX DRV_MODULE_NAME ": " -#define DRV_MODULE_VERSION "1.4.31" -#define DRV_MODULE_RELDATE "January 19, 2006" +#define DRV_MODULE_VERSION "1.4.38" +#define DRV_MODULE_RELDATE "February 10, 2006" #define RUN_AT(x) (jiffies + (x)) @@ -360,6 +360,8 @@ bnx2_netif_start(struct bnx2 *bp) static void bnx2_free_mem(struct bnx2 *bp) { + int i; + if (bp->stats_blk) { pci_free_consistent(bp->pdev, sizeof(struct statistics_block), bp->stats_blk, bp->stats_blk_mapping); @@ -378,19 +380,23 @@ bnx2_free_mem(struct bnx2 *bp) } kfree(bp->tx_buf_ring); bp->tx_buf_ring = NULL; - if (bp->rx_desc_ring) { - pci_free_consistent(bp->pdev, - sizeof(struct rx_bd) * RX_DESC_CNT, - bp->rx_desc_ring, bp->rx_desc_mapping); - bp->rx_desc_ring = NULL; + for (i = 0; i < bp->rx_max_ring; i++) { + if (bp->rx_desc_ring[i]) + pci_free_consistent(bp->pdev, + sizeof(struct rx_bd) * 
RX_DESC_CNT, + bp->rx_desc_ring[i], + bp->rx_desc_mapping[i]); + bp->rx_desc_ring[i] = NULL; } - kfree(bp->rx_buf_ring); + vfree(bp->rx_buf_ring); bp->rx_buf_ring = NULL; } static int bnx2_alloc_mem(struct bnx2 *bp) { + int i; + bp->tx_buf_ring = kmalloc(sizeof(struct sw_bd) * TX_DESC_CNT, GFP_KERNEL); if (bp->tx_buf_ring == NULL) @@ -404,18 +410,23 @@ bnx2_alloc_mem(struct bnx2 *bp) if (bp->tx_desc_ring == NULL) goto alloc_mem_err; - bp->rx_buf_ring = kmalloc(sizeof(struct sw_bd) * RX_DESC_CNT, - GFP_KERNEL); + bp->rx_buf_ring = vmalloc(sizeof(struct sw_bd) * RX_DESC_CNT * + bp->rx_max_ring); if (bp->rx_buf_ring == NULL) goto alloc_mem_err; - memset(bp->rx_buf_ring, 0, sizeof(struct sw_bd) * RX_DESC_CNT); - bp->rx_desc_ring = pci_alloc_consistent(bp->pdev, - sizeof(struct rx_bd) * - RX_DESC_CNT, - &bp->rx_desc_mapping); - if (bp->rx_desc_ring == NULL) - goto alloc_mem_err; + memset(bp->rx_buf_ring, 0, sizeof(struct sw_bd) * RX_DESC_CNT * + bp->rx_max_ring); + + for (i = 0; i < bp->rx_max_ring; i++) { + bp->rx_desc_ring[i] = + pci_alloc_consistent(bp->pdev, + sizeof(struct rx_bd) * RX_DESC_CNT, + &bp->rx_desc_mapping[i]); + if (bp->rx_desc_ring[i] == NULL) + goto alloc_mem_err; + + } bp->status_blk = pci_alloc_consistent(bp->pdev, sizeof(struct status_block), @@ -1520,7 +1531,7 @@ bnx2_alloc_rx_skb(struct bnx2 *bp, u16 i struct sk_buff *skb; struct sw_bd *rx_buf = &bp->rx_buf_ring[index]; dma_addr_t mapping; - struct rx_bd *rxbd = &bp->rx_desc_ring[index]; + struct rx_bd *rxbd = &bp->rx_desc_ring[RX_RING(index)][RX_IDX(index)]; unsigned long align; skb = dev_alloc_skb(bp->rx_buf_size); @@ -1656,23 +1667,30 @@ static inline void bnx2_reuse_rx_skb(struct bnx2 *bp, struct sk_buff *skb, u16 cons, u16 prod) { - struct sw_bd *cons_rx_buf = &bp->rx_buf_ring[cons]; - struct sw_bd *prod_rx_buf = &bp->rx_buf_ring[prod]; - struct rx_bd *cons_bd = &bp->rx_desc_ring[cons]; - struct rx_bd *prod_bd = &bp->rx_desc_ring[prod]; + struct sw_bd *cons_rx_buf, *prod_rx_buf; + struct 
rx_bd *cons_bd, *prod_bd; + + cons_rx_buf = &bp->rx_buf_ring[cons]; + prod_rx_buf = &bp->rx_buf_ring[prod]; pci_dma_sync_single_for_device(bp->pdev, pci_unmap_addr(cons_rx_buf, mapping), bp->rx_offset + RX_COPY_THRESH, PCI_DMA_FROMDEVICE); - prod_rx_buf->skb = cons_rx_buf->skb; - pci_unmap_addr_set(prod_rx_buf, mapping, - pci_unmap_addr(cons_rx_buf, mapping)); + bp->rx_prod_bseq += bp->rx_buf_use_size; - memcpy(prod_bd, cons_bd, 8); + prod_rx_buf->skb = skb; - bp->rx_prod_bseq += bp->rx_buf_use_size; + if (cons == prod) + return; + + pci_unmap_addr_set(prod_rx_buf, mapping, + pci_unmap_addr(cons_rx_buf, mapping)); + cons_bd = &bp->rx_desc_ring[RX_RING(cons)][RX_IDX(cons)]; + prod_bd = &bp->rx_desc_ring[RX_RING(prod)][RX_IDX(prod)]; + prod_bd->rx_bd_haddr_hi = cons_bd->rx_bd_haddr_hi; + prod_bd->rx_bd_haddr_lo = cons_bd->rx_bd_haddr_lo; } static int @@ -1699,14 +1717,19 @@ bnx2_rx_int(struct bnx2 *bp, int budget) u32 status; struct sw_bd *rx_buf; struct sk_buff *skb; + dma_addr_t dma_addr; sw_ring_cons = RX_RING_IDX(sw_cons); sw_ring_prod = RX_RING_IDX(sw_prod); rx_buf = &bp->rx_buf_ring[sw_ring_cons]; skb = rx_buf->skb; - pci_dma_sync_single_for_cpu(bp->pdev, - pci_unmap_addr(rx_buf, mapping), + + rx_buf->skb = NULL; + + dma_addr = pci_unmap_addr(rx_buf, mapping); + + pci_dma_sync_single_for_cpu(bp->pdev, dma_addr, bp->rx_offset + RX_COPY_THRESH, PCI_DMA_FROMDEVICE); rx_hdr = (struct l2_fhdr *) skb->data; @@ -1747,8 +1770,7 @@ bnx2_rx_int(struct bnx2 *bp, int budget) skb = new_skb; } else if (bnx2_alloc_rx_skb(bp, sw_ring_prod) == 0) { - pci_unmap_single(bp->pdev, - pci_unmap_addr(rx_buf, mapping), + pci_unmap_single(bp->pdev, dma_addr, bp->rx_buf_use_size, PCI_DMA_FROMDEVICE); skb_reserve(skb, bp->rx_offset); @@ -1794,8 +1816,6 @@ reuse_rx: rx_pkt++; next_rx: - rx_buf->skb = NULL; - sw_cons = NEXT_RX_BD(sw_cons); sw_prod = NEXT_RX_BD(sw_prod); @@ -3340,27 +3360,35 @@ bnx2_init_rx_ring(struct bnx2 *bp) bp->hw_rx_cons = 0; bp->rx_prod_bseq = 0; - rxbd = 
&bp->rx_desc_ring[0]; - for (i = 0; i < MAX_RX_DESC_CNT; i++, rxbd++) { - rxbd->rx_bd_len = bp->rx_buf_use_size; - rxbd->rx_bd_flags = RX_BD_FLAGS_START | RX_BD_FLAGS_END; - } + for (i = 0; i < bp->rx_max_ring; i++) { + int j; - rxbd->rx_bd_haddr_hi = (u64) bp->rx_desc_mapping >> 32; - rxbd->rx_bd_haddr_lo = (u64) bp->rx_desc_mapping & 0xffffffff; + rxbd = &bp->rx_desc_ring[i][0]; + for (j = 0; j < MAX_RX_DESC_CNT; j++, rxbd++) { + rxbd->rx_bd_len = bp->rx_buf_use_size; + rxbd->rx_bd_flags = RX_BD_FLAGS_START | RX_BD_FLAGS_END; + } + if (i == (bp->rx_max_ring - 1)) + j = 0; + else + j = i + 1; + rxbd->rx_bd_haddr_hi = (u64) bp->rx_desc_mapping[j] >> 32; + rxbd->rx_bd_haddr_lo = (u64) bp->rx_desc_mapping[j] & + 0xffffffff; + } val = BNX2_L2CTX_CTX_TYPE_CTX_BD_CHN_TYPE_VALUE; val |= BNX2_L2CTX_CTX_TYPE_SIZE_L2; val |= 0x02 << 8; CTX_WR(bp, GET_CID_ADDR(RX_CID), BNX2_L2CTX_CTX_TYPE, val); - val = (u64) bp->rx_desc_mapping >> 32; + val = (u64) bp->rx_desc_mapping[0] >> 32; CTX_WR(bp, GET_CID_ADDR(RX_CID), BNX2_L2CTX_NX_BDHADDR_HI, val); - val = (u64) bp->rx_desc_mapping & 0xffffffff; + val = (u64) bp->rx_desc_mapping[0] & 0xffffffff; CTX_WR(bp, GET_CID_ADDR(RX_CID), BNX2_L2CTX_NX_BDHADDR_LO, val); - for ( ;ring_prod < bp->rx_ring_size; ) { + for (i = 0; i < bp->rx_ring_size; i++) { if (bnx2_alloc_rx_skb(bp, ring_prod) < 0) { break; } @@ -3375,6 +3403,29 @@ bnx2_init_rx_ring(struct bnx2 *bp) } static void +bnx2_set_rx_ring_size(struct bnx2 *bp, u32 size) +{ + u32 num_rings, max; + + bp->rx_ring_size = size; + num_rings = 1; + while (size > MAX_RX_DESC_CNT) { + size -= MAX_RX_DESC_CNT; + num_rings++; + } + /* round to next power of 2 */ + max = MAX_RX_RINGS; + while ((max & num_rings) == 0) + max >>= 1; + + if (num_rings != max) + max <<= 1; + + bp->rx_max_ring = max; + bp->rx_max_ring_idx = (bp->rx_max_ring * RX_DESC_CNT) - 1; +} + +static void bnx2_free_tx_skbs(struct bnx2 *bp) { int i; @@ -3419,7 +3470,7 @@ bnx2_free_rx_skbs(struct bnx2 *bp) if (bp->rx_buf_ring == 
NULL) return; - for (i = 0; i < RX_DESC_CNT; i++) { + for (i = 0; i < bp->rx_max_ring_idx; i++) { struct sw_bd *rx_buf = &bp->rx_buf_ring[i]; struct sk_buff *skb = rx_buf->skb; @@ -3506,74 +3557,9 @@ bnx2_test_registers(struct bnx2 *bp) { 0x0c00, 0, 0x00000000, 0x00000001 }, { 0x0c04, 0, 0x00000000, 0x03ff0001 }, { 0x0c08, 0, 0x0f0ff073, 0x00000000 }, - { 0x0c0c, 0, 0x00ffffff, 0x00000000 }, - { 0x0c30, 0, 0x00000000, 0xffffffff }, - { 0x0c34, 0, 0x00000000, 0xffffffff }, - { 0x0c38, 0, 0x00000000, 0xffffffff }, - { 0x0c3c, 0, 0x00000000, 0xffffffff }, - { 0x0c40, 0, 0x00000000, 0xffffffff }, - { 0x0c44, 0, 0x00000000, 0xffffffff }, - { 0x0c48, 0, 0x00000000, 0x0007ffff }, - { 0x0c4c, 0, 0x00000000, 0xffffffff }, - { 0x0c50, 0, 0x00000000, 0xffffffff }, - { 0x0c54, 0, 0x00000000, 0xffffffff }, - { 0x0c58, 0, 0x00000000, 0xffffffff }, - { 0x0c5c, 0, 0x00000000, 0xffffffff }, - { 0x0c60, 0, 0x00000000, 0xffffffff }, - { 0x0c64, 0, 0x00000000, 0xffffffff }, - { 0x0c68, 0, 0x00000000, 0xffffffff }, - { 0x0c6c, 0, 0x00000000, 0xffffffff }, - { 0x0c70, 0, 0x00000000, 0xffffffff }, - { 0x0c74, 0, 0x00000000, 0xffffffff }, - { 0x0c78, 0, 0x00000000, 0xffffffff }, - { 0x0c7c, 0, 0x00000000, 0xffffffff }, - { 0x0c80, 0, 0x00000000, 0xffffffff }, - { 0x0c84, 0, 0x00000000, 0xffffffff }, - { 0x0c88, 0, 0x00000000, 0xffffffff }, - { 0x0c8c, 0, 0x00000000, 0xffffffff }, - { 0x0c90, 0, 0x00000000, 0xffffffff }, - { 0x0c94, 0, 0x00000000, 0xffffffff }, - { 0x0c98, 0, 0x00000000, 0xffffffff }, - { 0x0c9c, 0, 0x00000000, 0xffffffff }, - { 0x0ca0, 0, 0x00000000, 0xffffffff }, - { 0x0ca4, 0, 0x00000000, 0xffffffff }, - { 0x0ca8, 0, 0x00000000, 0x0007ffff }, - { 0x0cac, 0, 0x00000000, 0xffffffff }, - { 0x0cb0, 0, 0x00000000, 0xffffffff }, - { 0x0cb4, 0, 0x00000000, 0xffffffff }, - { 0x0cb8, 0, 0x00000000, 0xffffffff }, - { 0x0cbc, 0, 0x00000000, 0xffffffff }, - { 0x0cc0, 0, 0x00000000, 0xffffffff }, - { 0x0cc4, 0, 0x00000000, 0xffffffff }, - { 0x0cc8, 0, 0x00000000, 0xffffffff }, - { 
0x0ccc, 0, 0x00000000, 0xffffffff }, - { 0x0cd0, 0, 0x00000000, 0xffffffff }, - { 0x0cd4, 0, 0x00000000, 0xffffffff }, - { 0x0cd8, 0, 0x00000000, 0xffffffff }, - { 0x0cdc, 0, 0x00000000, 0xffffffff }, - { 0x0ce0, 0, 0x00000000, 0xffffffff }, - { 0x0ce4, 0, 0x00000000, 0xffffffff }, - { 0x0ce8, 0, 0x00000000, 0xffffffff }, - { 0x0cec, 0, 0x00000000, 0xffffffff }, - { 0x0cf0, 0, 0x00000000, 0xffffffff }, - { 0x0cf4, 0, 0x00000000, 0xffffffff }, - { 0x0cf8, 0, 0x00000000, 0xffffffff }, - { 0x0cfc, 0, 0x00000000, 0xffffffff }, - { 0x0d00, 0, 0x00000000, 0xffffffff }, - { 0x0d04, 0, 0x00000000, 0xffffffff }, { 0x1000, 0, 0x00000000, 0x00000001 }, { 0x1004, 0, 0x00000000, 0x000f0001 }, - { 0x1044, 0, 0x00000000, 0xffc003ff }, - { 0x1080, 0, 0x00000000, 0x0001ffff }, - { 0x1084, 0, 0x00000000, 0xffffffff }, - { 0x1088, 0, 0x00000000, 0xffffffff }, - { 0x108c, 0, 0x00000000, 0xffffffff }, - { 0x1090, 0, 0x00000000, 0xffffffff }, - { 0x1094, 0, 0x00000000, 0xffffffff }, - { 0x1098, 0, 0x00000000, 0xffffffff }, - { 0x109c, 0, 0x00000000, 0xffffffff }, - { 0x10a0, 0, 0x00000000, 0xffffffff }, { 0x1408, 0, 0x01c00800, 0x00000000 }, { 0x149c, 0, 0x8000ffff, 0x00000000 }, @@ -3585,111 +3571,9 @@ bnx2_test_registers(struct bnx2 *bp) { 0x14c4, 0, 0x00003fff, 0x00000000 }, { 0x14cc, 0, 0x00000000, 0x00000001 }, { 0x14d0, 0, 0xffffffff, 0x00000000 }, - { 0x1500, 0, 0x00000000, 0xffffffff }, - { 0x1504, 0, 0x00000000, 0xffffffff }, - { 0x1508, 0, 0x00000000, 0xffffffff }, - { 0x150c, 0, 0x00000000, 0xffffffff }, - { 0x1510, 0, 0x00000000, 0xffffffff }, - { 0x1514, 0, 0x00000000, 0xffffffff }, - { 0x1518, 0, 0x00000000, 0xffffffff }, - { 0x151c, 0, 0x00000000, 0xffffffff }, - { 0x1520, 0, 0x00000000, 0xffffffff }, - { 0x1524, 0, 0x00000000, 0xffffffff }, - { 0x1528, 0, 0x00000000, 0xffffffff }, - { 0x152c, 0, 0x00000000, 0xffffffff }, - { 0x1530, 0, 0x00000000, 0xffffffff }, - { 0x1534, 0, 0x00000000, 0xffffffff }, - { 0x1538, 0, 0x00000000, 0xffffffff }, - { 0x153c, 0, 0x00000000, 
0xffffffff }, - { 0x1540, 0, 0x00000000, 0xffffffff }, - { 0x1544, 0, 0x00000000, 0xffffffff }, - { 0x1548, 0, 0x00000000, 0xffffffff }, - { 0x154c, 0, 0x00000000, 0xffffffff }, - { 0x1550, 0, 0x00000000, 0xffffffff }, - { 0x1554, 0, 0x00000000, 0xffffffff }, - { 0x1558, 0, 0x00000000, 0xffffffff }, - { 0x1600, 0, 0x00000000, 0xffffffff }, - { 0x1604, 0, 0x00000000, 0xffffffff }, - { 0x1608, 0, 0x00000000, 0xffffffff }, - { 0x160c, 0, 0x00000000, 0xffffffff }, - { 0x1610, 0, 0x00000000, 0xffffffff }, - { 0x1614, 0, 0x00000000, 0xffffffff }, - { 0x1618, 0, 0x00000000, 0xffffffff }, - { 0x161c, 0, 0x00000000, 0xffffffff }, - { 0x1620, 0, 0x00000000, 0xffffffff }, - { 0x1624, 0, 0x00000000, 0xffffffff }, - { 0x1628, 0, 0x00000000, 0xffffffff }, - { 0x162c, 0, 0x00000000, 0xffffffff }, - { 0x1630, 0, 0x00000000, 0xffffffff }, - { 0x1634, 0, 0x00000000, 0xffffffff }, - { 0x1638, 0, 0x00000000, 0xffffffff }, - { 0x163c, 0, 0x00000000, 0xffffffff }, - { 0x1640, 0, 0x00000000, 0xffffffff }, - { 0x1644, 0, 0x00000000, 0xffffffff }, - { 0x1648, 0, 0x00000000, 0xffffffff }, - { 0x164c, 0, 0x00000000, 0xffffffff }, - { 0x1650, 0, 0x00000000, 0xffffffff }, - { 0x1654, 0, 0x00000000, 0xffffffff }, { 0x1800, 0, 0x00000000, 0x00000001 }, { 0x1804, 0, 0x00000000, 0x00000003 }, - { 0x1840, 0, 0x00000000, 0xffffffff }, - { 0x1844, 0, 0x00000000, 0xffffffff }, - { 0x1848, 0, 0x00000000, 0xffffffff }, - { 0x184c, 0, 0x00000000, 0xffffffff }, - { 0x1850, 0, 0x00000000, 0xffffffff }, - { 0x1900, 0, 0x7ffbffff, 0x00000000 }, - { 0x1904, 0, 0xffffffff, 0x00000000 }, - { 0x190c, 0, 0xffffffff, 0x00000000 }, - { 0x1914, 0, 0xffffffff, 0x00000000 }, - { 0x191c, 0, 0xffffffff, 0x00000000 }, - { 0x1924, 0, 0xffffffff, 0x00000000 }, - { 0x192c, 0, 0xffffffff, 0x00000000 }, - { 0x1934, 0, 0xffffffff, 0x00000000 }, - { 0x193c, 0, 0xffffffff, 0x00000000 }, - { 0x1944, 0, 0xffffffff, 0x00000000 }, - { 0x194c, 0, 0xffffffff, 0x00000000 }, - { 0x1954, 0, 0xffffffff, 0x00000000 }, - { 0x195c, 0, 
0xffffffff, 0x00000000 }, - { 0x1964, 0, 0xffffffff, 0x00000000 }, - { 0x196c, 0, 0xffffffff, 0x00000000 }, - { 0x1974, 0, 0xffffffff, 0x00000000 }, - { 0x197c, 0, 0xffffffff, 0x00000000 }, - { 0x1980, 0, 0x0700ffff, 0x00000000 }, - - { 0x1c00, 0, 0x00000000, 0x00000001 }, - { 0x1c04, 0, 0x00000000, 0x00000003 }, - { 0x1c08, 0, 0x0000000f, 0x00000000 }, - { 0x1c40, 0, 0x00000000, 0xffffffff }, - { 0x1c44, 0, 0x00000000, 0xffffffff }, - { 0x1c48, 0, 0x00000000, 0xffffffff }, - { 0x1c4c, 0, 0x00000000, 0xffffffff }, - { 0x1c50, 0, 0x00000000, 0xffffffff }, - { 0x1d00, 0, 0x7ffbffff, 0x00000000 }, - { 0x1d04, 0, 0xffffffff, 0x00000000 }, - { 0x1d0c, 0, 0xffffffff, 0x00000000 }, - { 0x1d14, 0, 0xffffffff, 0x00000000 }, - { 0x1d1c, 0, 0xffffffff, 0x00000000 }, - { 0x1d24, 0, 0xffffffff, 0x00000000 }, - { 0x1d2c, 0, 0xffffffff, 0x00000000 }, - { 0x1d34, 0, 0xffffffff, 0x00000000 }, - { 0x1d3c, 0, 0xffffffff, 0x00000000 }, - { 0x1d44, 0, 0xffffffff, 0x00000000 }, - { 0x1d4c, 0, 0xffffffff, 0x00000000 }, - { 0x1d54, 0, 0xffffffff, 0x00000000 }, - { 0x1d5c, 0, 0xffffffff, 0x00000000 }, - { 0x1d64, 0, 0xffffffff, 0x00000000 }, - { 0x1d6c, 0, 0xffffffff, 0x00000000 }, - { 0x1d74, 0, 0xffffffff, 0x00000000 }, - { 0x1d7c, 0, 0xffffffff, 0x00000000 }, - { 0x1d80, 0, 0x0700ffff, 0x00000000 }, - - { 0x2004, 0, 0x00000000, 0x0337000f }, - { 0x2008, 0, 0xffffffff, 0x00000000 }, - { 0x200c, 0, 0xffffffff, 0x00000000 }, - { 0x2010, 0, 0xffffffff, 0x00000000 }, - { 0x2014, 0, 0x801fff80, 0x00000000 }, - { 0x2018, 0, 0x000003ff, 0x00000000 }, { 0x2800, 0, 0x00000000, 0x00000001 }, { 0x2804, 0, 0x00000000, 0x00003f01 }, @@ -3707,16 +3591,6 @@ bnx2_test_registers(struct bnx2 *bp) { 0x2c00, 0, 0x00000000, 0x00000011 }, { 0x2c04, 0, 0x00000000, 0x00030007 }, - { 0x3000, 0, 0x00000000, 0x00000001 }, - { 0x3004, 0, 0x00000000, 0x007007ff }, - { 0x3008, 0, 0x00000003, 0x00000000 }, - { 0x300c, 0, 0xffffffff, 0x00000000 }, - { 0x3010, 0, 0xffffffff, 0x00000000 }, - { 0x3014, 0, 0xffffffff, 
0x00000000 }, - { 0x3034, 0, 0xffffffff, 0x00000000 }, - { 0x3038, 0, 0xffffffff, 0x00000000 }, - { 0x3050, 0, 0x00000001, 0x00000000 }, - { 0x3c00, 0, 0x00000000, 0x00000001 }, { 0x3c04, 0, 0x00000000, 0x00070000 }, { 0x3c08, 0, 0x00007f71, 0x07f00000 }, @@ -3726,88 +3600,11 @@ bnx2_test_registers(struct bnx2 *bp) { 0x3c18, 0, 0x00000000, 0xffffffff }, { 0x3c1c, 0, 0xfffff000, 0x00000000 }, { 0x3c20, 0, 0xffffff00, 0x00000000 }, - { 0x3c24, 0, 0xffffffff, 0x00000000 }, - { 0x3c28, 0, 0xffffffff, 0x00000000 }, - { 0x3c2c, 0, 0xffffffff, 0x00000000 }, - { 0x3c30, 0, 0xffffffff, 0x00000000 }, - { 0x3c34, 0, 0xffffffff, 0x00000000 }, - { 0x3c38, 0, 0xffffffff, 0x00000000 }, - { 0x3c3c, 0, 0xffffffff, 0x00000000 }, - { 0x3c40, 0, 0xffffffff, 0x00000000 }, - { 0x3c44, 0, 0xffffffff, 0x00000000 }, - { 0x3c48, 0, 0xffffffff, 0x00000000 }, - { 0x3c4c, 0, 0xffffffff, 0x00000000 }, - { 0x3c50, 0, 0xffffffff, 0x00000000 }, - { 0x3c54, 0, 0xffffffff, 0x00000000 }, - { 0x3c58, 0, 0xffffffff, 0x00000000 }, - { 0x3c5c, 0, 0xffffffff, 0x00000000 }, - { 0x3c60, 0, 0xffffffff, 0x00000000 }, - { 0x3c64, 0, 0xffffffff, 0x00000000 }, - { 0x3c68, 0, 0xffffffff, 0x00000000 }, - { 0x3c6c, 0, 0xffffffff, 0x00000000 }, - { 0x3c70, 0, 0xffffffff, 0x00000000 }, - { 0x3c74, 0, 0x0000003f, 0x00000000 }, - { 0x3c78, 0, 0x00000000, 0x00000000 }, - { 0x3c7c, 0, 0x00000000, 0x00000000 }, - { 0x3c80, 0, 0x3fffffff, 0x00000000 }, - { 0x3c84, 0, 0x0000003f, 0x00000000 }, - { 0x3c88, 0, 0x00000000, 0xffffffff }, - { 0x3c8c, 0, 0x00000000, 0xffffffff }, - - { 0x4000, 0, 0x00000000, 0x00000001 }, - { 0x4004, 0, 0x00000000, 0x00030000 }, - { 0x4008, 0, 0x00000ff0, 0x00000000 }, - { 0x400c, 0, 0xffffffff, 0x00000000 }, - { 0x4088, 0, 0x00000000, 0x00070303 }, - - { 0x4400, 0, 0x00000000, 0x00000001 }, - { 0x4404, 0, 0x00000000, 0x00003f01 }, - { 0x4408, 0, 0x7fff00ff, 0x00000000 }, - { 0x440c, 0, 0xffffffff, 0x00000000 }, - { 0x4410, 0, 0xffff, 0x0000 }, - { 0x4414, 0, 0xffff, 0x0000 }, - { 0x4418, 0, 
0xffff, 0x0000 }, - { 0x441c, 0, 0xffff, 0x0000 }, - { 0x4428, 0, 0xffffffff, 0x00000000 }, - { 0x442c, 0, 0xffffffff, 0x00000000 }, - { 0x4430, 0, 0xffffffff, 0x00000000 }, - { 0x4434, 0, 0xffffffff, 0x00000000 }, - { 0x4438, 0, 0xffffffff, 0x00000000 }, - { 0x443c, 0, 0xffffffff, 0x00000000 }, - { 0x4440, 0, 0xffffffff, 0x00000000 }, - { 0x4444, 0, 0xffffffff, 0x00000000 }, - - { 0x4c00, 0, 0x00000000, 0x00000001 }, - { 0x4c04, 0, 0x00000000, 0x0000003f }, - { 0x4c08, 0, 0xffffffff, 0x00000000 }, - { 0x4c0c, 0, 0x0007fc00, 0x00000000 }, - { 0x4c10, 0, 0x80003fe0, 0x00000000 }, - { 0x4c14, 0, 0xffffffff, 0x00000000 }, - { 0x4c44, 0, 0x00000000, 0x9fff9fff }, - { 0x4c48, 0, 0x00000000, 0xb3009fff }, - { 0x4c4c, 0, 0x00000000, 0x77f33b30 }, - { 0x4c50, 0, 0x00000000, 0xffffffff }, { 0x5004, 0, 0x00000000, 0x0000007f }, { 0x5008, 0, 0x0f0007ff, 0x00000000 }, { 0x500c, 0, 0xf800f800, 0x07ff07ff }, - { 0x5400, 0, 0x00000008, 0x00000001 }, - { 0x5404, 0, 0x00000000, 0x0000003f }, - { 0x5408, 0, 0x0000001f, 0x00000000 }, - { 0x540c, 0, 0xffffffff, 0x00000000 }, - { 0x5410, 0, 0xffffffff, 0x00000000 }, - { 0x5414, 0, 0x0000ffff, 0x00000000 }, - { 0x5418, 0, 0x0000ffff, 0x00000000 }, - { 0x541c, 0, 0x0000ffff, 0x00000000 }, - { 0x5420, 0, 0x0000ffff, 0x00000000 }, - { 0x5428, 0, 0x000000ff, 0x00000000 }, - { 0x542c, 0, 0xff00ffff, 0x00000000 }, - { 0x5430, 0, 0x001fff80, 0x00000000 }, - { 0x5438, 0, 0xffffffff, 0x00000000 }, - { 0x543c, 0, 0xffffffff, 0x00000000 }, - { 0x5440, 0, 0xf800f800, 0x07ff07ff }, - { 0x5c00, 0, 0x00000000, 0x00000001 }, { 0x5c04, 0, 0x00000000, 0x0003000f }, { 0x5c08, 0, 0x00000003, 0x00000000 }, @@ -4794,6 +4591,64 @@ bnx2_get_drvinfo(struct net_device *dev, info->fw_version[5] = 0; } +#define BNX2_REGDUMP_LEN (32 * 1024) + +static int +bnx2_get_regs_len(struct net_device *dev) +{ + return BNX2_REGDUMP_LEN; +} + +static void +bnx2_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *_p) +{ + u32 *p = _p, i, offset; + u8 *orig_p = _p; 
+ struct bnx2 *bp = netdev_priv(dev); + u32 reg_boundaries[] = { 0x0000, 0x0098, 0x0400, 0x045c, + 0x0800, 0x0880, 0x0c00, 0x0c10, + 0x0c30, 0x0d08, 0x1000, 0x101c, + 0x1040, 0x1048, 0x1080, 0x10a4, + 0x1400, 0x1490, 0x1498, 0x14f0, + 0x1500, 0x155c, 0x1580, 0x15dc, + 0x1600, 0x1658, 0x1680, 0x16d8, + 0x1800, 0x1820, 0x1840, 0x1854, + 0x1880, 0x1894, 0x1900, 0x1984, + 0x1c00, 0x1c0c, 0x1c40, 0x1c54, + 0x1c80, 0x1c94, 0x1d00, 0x1d84, + 0x2000, 0x2030, 0x23c0, 0x2400, + 0x2800, 0x2820, 0x2830, 0x2850, + 0x2b40, 0x2c10, 0x2fc0, 0x3058, + 0x3c00, 0x3c94, 0x4000, 0x4010, + 0x4080, 0x4090, 0x43c0, 0x4458, + 0x4c00, 0x4c18, 0x4c40, 0x4c54, + 0x4fc0, 0x5010, 0x53c0, 0x5444, + 0x5c00, 0x5c18, 0x5c80, 0x5c90, + 0x5fc0, 0x6000, 0x6400, 0x6428, + 0x6800, 0x6848, 0x684c, 0x6860, + 0x6888, 0x6910, 0x8000 }; + + regs->version = 0; + + memset(p, 0, BNX2_REGDUMP_LEN); + + if (!netif_running(bp->dev)) + return; + + i = 0; + offset = reg_boundaries[0]; + p += offset; + while (offset < BNX2_REGDUMP_LEN) { + *p++ = REG_RD(bp, offset); + offset += 4; + if (offset == reg_boundaries[i + 1]) { + offset = reg_boundaries[i + 2]; + p = (u32 *) (orig_p + offset); + i += 2; + } + } +} + static void bnx2_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) { @@ -4979,7 +4834,7 @@ bnx2_get_ringparam(struct net_device *de { struct bnx2 *bp = netdev_priv(dev); - ering->rx_max_pending = MAX_RX_DESC_CNT; + ering->rx_max_pending = MAX_TOTAL_RX_DESC_CNT; ering->rx_mini_max_pending = 0; ering->rx_jumbo_max_pending = 0; @@ -4996,17 +4851,28 @@ bnx2_set_ringparam(struct net_device *de { struct bnx2 *bp = netdev_priv(dev); - if ((ering->rx_pending > MAX_RX_DESC_CNT) || + if ((ering->rx_pending > MAX_TOTAL_RX_DESC_CNT) || (ering->tx_pending > MAX_TX_DESC_CNT) || (ering->tx_pending <= MAX_SKB_FRAGS)) { return -EINVAL; } - bp->rx_ring_size = ering->rx_pending; + if (netif_running(bp->dev)) { + bnx2_netif_stop(bp); + bnx2_reset_chip(bp, BNX2_DRV_MSG_CODE_RESET); + bnx2_free_skbs(bp); + 
bnx2_free_mem(bp); + } + + bnx2_set_rx_ring_size(bp, ering->rx_pending); bp->tx_ring_size = ering->tx_pending; if (netif_running(bp->dev)) { - bnx2_netif_stop(bp); + int rc; + + rc = bnx2_alloc_mem(bp); + if (rc) + return rc; bnx2_init_nic(bp); bnx2_netif_start(bp); } @@ -5360,6 +5226,8 @@ static struct ethtool_ops bnx2_ethtool_o .get_settings = bnx2_get_settings, .set_settings = bnx2_set_settings, .get_drvinfo = bnx2_get_drvinfo, + .get_regs_len = bnx2_get_regs_len, + .get_regs = bnx2_get_regs, .get_wol = bnx2_get_wol, .set_wol = bnx2_set_wol, .nway_reset = bnx2_nway_reset, @@ -5678,7 +5546,7 @@ bnx2_init_board(struct pci_dev *pdev, st bp->mac_addr[5] = (u8) reg; bp->tx_ring_size = MAX_TX_DESC_CNT; - bp->rx_ring_size = 100; + bnx2_set_rx_ring_size(bp, 100); bp->rx_csum = 1; @@ -5897,6 +5765,7 @@ bnx2_suspend(struct pci_dev *pdev, pm_me if (!netif_running(dev)) return 0; + flush_scheduled_work(); bnx2_netif_stop(bp); netif_device_detach(dev); del_timer_sync(&bp->timer); diff -puN drivers/net/bnx2.h~git-net drivers/net/bnx2.h --- devel/drivers/net/bnx2.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/bnx2.h 2006-03-17 23:03:48.000000000 -0800 @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -3792,8 +3793,10 @@ struct l2_fhdr { #define TX_DESC_CNT (BCM_PAGE_SIZE / sizeof(struct tx_bd)) #define MAX_TX_DESC_CNT (TX_DESC_CNT - 1) +#define MAX_RX_RINGS 4 #define RX_DESC_CNT (BCM_PAGE_SIZE / sizeof(struct rx_bd)) #define MAX_RX_DESC_CNT (RX_DESC_CNT - 1) +#define MAX_TOTAL_RX_DESC_CNT (MAX_RX_DESC_CNT * MAX_RX_RINGS) #define NEXT_TX_BD(x) (((x) & (MAX_TX_DESC_CNT - 1)) == \ (MAX_TX_DESC_CNT - 1)) ? \ @@ -3805,8 +3808,10 @@ struct l2_fhdr { (MAX_RX_DESC_CNT - 1)) ? \ (x) + 2 : (x) + 1 -#define RX_RING_IDX(x) ((x) & MAX_RX_DESC_CNT) +#define RX_RING_IDX(x) ((x) & bp->rx_max_ring_idx) +#define RX_RING(x) (((x) & ~MAX_RX_DESC_CNT) >> 8) +#define RX_IDX(x) ((x) & MAX_RX_DESC_CNT) /* Context size. 
*/ #define CTX_SHIFT 7 @@ -3903,6 +3908,15 @@ struct bnx2 { struct status_block *status_blk; u32 last_status_idx; + u32 flags; +#define PCIX_FLAG 1 +#define PCI_32BIT_FLAG 2 +#define ONE_TDMA_FLAG 4 /* no longer used */ +#define NO_WOL_FLAG 8 +#define USING_DAC_FLAG 0x10 +#define USING_MSI_FLAG 0x20 +#define ASF_ENABLE_FLAG 0x40 + struct tx_bd *tx_desc_ring; struct sw_bd *tx_buf_ring; u32 tx_prod_bseq; @@ -3920,19 +3934,22 @@ struct bnx2 { u32 rx_offset; u32 rx_buf_use_size; /* useable size */ u32 rx_buf_size; /* with alignment */ - struct rx_bd *rx_desc_ring; - struct sw_bd *rx_buf_ring; + u32 rx_max_ring_idx; + u32 rx_prod_bseq; u16 rx_prod; u16 rx_cons; u32 rx_csum; + struct sw_bd *rx_buf_ring; + struct rx_bd *rx_desc_ring[MAX_RX_RINGS]; + /* Only used to synchronize netif_stop_queue/wake_queue when tx */ /* ring is full */ spinlock_t tx_lock; - /* End of fileds used in the performance code paths. */ + /* End of fields used in the performance code paths. */ char *name; @@ -3945,15 +3962,6 @@ struct bnx2 { /* Used to synchronize phy accesses. 
*/ spinlock_t phy_lock; - u32 flags; -#define PCIX_FLAG 1 -#define PCI_32BIT_FLAG 2 -#define ONE_TDMA_FLAG 4 /* no longer used */ -#define NO_WOL_FLAG 8 -#define USING_DAC_FLAG 0x10 -#define USING_MSI_FLAG 0x20 -#define ASF_ENABLE_FLAG 0x40 - u32 phy_flags; #define PHY_SERDES_FLAG 1 #define PHY_CRC_FIX_FLAG 2 @@ -4004,8 +4012,9 @@ struct bnx2 { dma_addr_t tx_desc_mapping; + int rx_max_ring; int rx_ring_size; - dma_addr_t rx_desc_mapping; + dma_addr_t rx_desc_mapping[MAX_RX_RINGS]; u16 tx_quick_cons_trip; u16 tx_quick_cons_trip_int; diff -puN drivers/net/cassini.c~git-net drivers/net/cassini.c --- devel/drivers/net/cassini.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/cassini.c 2006-03-17 23:03:48.000000000 -0800 @@ -91,6 +91,7 @@ #include #include #include +#include #include @@ -3892,7 +3893,7 @@ static void cas_reset(struct cas *cp, in spin_unlock(&cp->stat_lock[N_TX_RINGS]); } -/* Shut down the chip, must be called with pm_sem held. */ +/* Shut down the chip, must be called with pm_mutex held. */ static void cas_shutdown(struct cas *cp) { unsigned long flags; @@ -4311,11 +4312,11 @@ static int cas_open(struct net_device *d int hw_was_up, err; unsigned long flags; - down(&cp->pm_sem); + mutex_lock(&cp->pm_mutex); hw_was_up = cp->hw_running; - /* The power-management semaphore protects the hw_running + /* The power-management mutex protects the hw_running * etc. 
state so it is safe to do this bit without cp->lock */ if (!cp->hw_running) { @@ -4364,7 +4365,7 @@ static int cas_open(struct net_device *d cas_unlock_all_restore(cp, flags); netif_start_queue(dev); - up(&cp->pm_sem); + mutex_unlock(&cp->pm_mutex); return 0; err_spare: @@ -4372,7 +4373,7 @@ err_spare: cas_free_rxds(cp); err_tx_tiny: cas_tx_tiny_free(cp); - up(&cp->pm_sem); + mutex_unlock(&cp->pm_mutex); return err; } @@ -4382,7 +4383,7 @@ static int cas_close(struct net_device * struct cas *cp = netdev_priv(dev); /* Make sure we don't get distracted by suspend/resume */ - down(&cp->pm_sem); + mutex_lock(&cp->pm_mutex); netif_stop_queue(dev); @@ -4399,7 +4400,7 @@ static int cas_close(struct net_device * cas_spare_free(cp); cas_free_rxds(cp); cas_tx_tiny_free(cp); - up(&cp->pm_sem); + mutex_unlock(&cp->pm_mutex); return 0; } @@ -4834,10 +4835,10 @@ static int cas_ioctl(struct net_device * unsigned long flags; int rc = -EOPNOTSUPP; - /* Hold the PM semaphore while doing ioctl's or we may collide + /* Hold the PM mutex while doing ioctl's or we may collide * with open/close and power management and oops. */ - down(&cp->pm_sem); + mutex_lock(&cp->pm_mutex); switch (cmd) { case SIOCGMIIPHY: /* Get address of MII PHY in use. 
*/ data->phy_id = cp->phy_addr; @@ -4867,7 +4868,7 @@ static int cas_ioctl(struct net_device * break; }; - up(&cp->pm_sem); + mutex_unlock(&cp->pm_mutex); return rc; } @@ -4994,7 +4995,7 @@ static int __devinit cas_init_one(struct spin_lock_init(&cp->tx_lock[i]); } spin_lock_init(&cp->stat_lock[N_TX_RINGS]); - init_MUTEX(&cp->pm_sem); + mutex_init(&cp->pm_mutex); init_timer(&cp->link_timer); cp->link_timer.function = cas_link_timer; @@ -5116,10 +5117,10 @@ err_out_free_consistent: cp->init_block, cp->block_dvma); err_out_iounmap: - down(&cp->pm_sem); + mutex_lock(&cp->pm_mutex); if (cp->hw_running) cas_shutdown(cp); - up(&cp->pm_sem); + mutex_unlock(&cp->pm_mutex); iounmap(cp->regs); @@ -5152,11 +5153,11 @@ static void __devexit cas_remove_one(str cp = netdev_priv(dev); unregister_netdev(dev); - down(&cp->pm_sem); + mutex_lock(&cp->pm_mutex); flush_scheduled_work(); if (cp->hw_running) cas_shutdown(cp); - up(&cp->pm_sem); + mutex_unlock(&cp->pm_mutex); #if 1 if (cp->orig_cacheline_size) { @@ -5183,10 +5184,7 @@ static int cas_suspend(struct pci_dev *p struct cas *cp = netdev_priv(dev); unsigned long flags; - /* We hold the PM semaphore during entire driver - * sleep time - */ - down(&cp->pm_sem); + mutex_lock(&cp->pm_mutex); /* If the driver is opened, we stop the DMA */ if (cp->opened) { @@ -5206,6 +5204,7 @@ static int cas_suspend(struct pci_dev *p if (cp->hw_running) cas_shutdown(cp); + mutex_unlock(&cp->pm_mutex); return 0; } @@ -5217,6 +5216,7 @@ static int cas_resume(struct pci_dev *pd printk(KERN_INFO "%s: resuming\n", dev->name); + mutex_lock(&cp->pm_mutex); cas_hard_reset(cp); if (cp->opened) { unsigned long flags; @@ -5229,7 +5229,7 @@ static int cas_resume(struct pci_dev *pd netif_device_attach(dev); } - up(&cp->pm_sem); + mutex_unlock(&cp->pm_mutex); return 0; } #endif /* CONFIG_PM */ diff -puN drivers/net/cassini.h~git-net drivers/net/cassini.h --- devel/drivers/net/cassini.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ 
devel-akpm/drivers/net/cassini.h 2006-03-17 23:03:48.000000000 -0800 @@ -4284,7 +4284,7 @@ struct cas { * (ie. not power managed) */ int hw_running; int opened; - struct semaphore pm_sem; /* open/close/suspend/resume */ + struct mutex pm_mutex; /* open/close/suspend/resume */ struct cas_init_block *init_block; struct cas_tx_desc *init_txds[MAX_TX_RINGS]; diff -puN drivers/net/e1000/e1000_main.c~git-net drivers/net/e1000/e1000_main.c --- devel/drivers/net/e1000/e1000_main.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/e1000/e1000_main.c 2006-03-17 23:03:48.000000000 -0800 @@ -920,7 +920,7 @@ e1000_remove(struct pci_dev *pdev) unregister_netdev(netdev); #ifdef CONFIG_E1000_NAPI for (i = 0; i < adapter->num_rx_queues; i++) - __dev_put(&adapter->polling_netdev[i]); + dev_put(&adapter->polling_netdev[i]); #endif if (!e1000_check_phy_reset_block(&adapter->hw)) diff -puN drivers/net/irda/donauboe.c~git-net drivers/net/irda/donauboe.c --- devel/drivers/net/irda/donauboe.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/donauboe.c 2006-03-17 23:03:48.000000000 -0800 @@ -1778,7 +1778,7 @@ static struct pci_driver donauboe_pci_dr static int __init donauboe_init (void) { - return pci_module_init(&donauboe_pci_driver); + return pci_register_driver(&donauboe_pci_driver); } static void __exit diff -puN drivers/net/irda/ep7211_ir.c~git-net drivers/net/irda/ep7211_ir.c --- devel/drivers/net/irda/ep7211_ir.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/ep7211_ir.c 2006-03-17 23:03:48.000000000 -0800 @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -23,6 +24,8 @@ static void ep7211_ir_close(dongle_t *se static int ep7211_ir_change_speed(struct irda_task *task); static int ep7211_ir_reset(struct irda_task *task); +static DEFINE_SPINLOCK(ep7211_lock); + static struct dongle_reg dongle = { .type = IRDA_EP7211_IR, .open = ep7211_ir_open, @@ -36,7 +39,7 @@ static void 
ep7211_ir_open(dongle_t *sel { unsigned int syscon1, flags; - save_flags(flags); cli(); + spin_lock_irqsave(&ep7211_lock, flags); /* Turn on the SIR encoder. */ syscon1 = clps_readl(SYSCON1); @@ -46,14 +49,14 @@ static void ep7211_ir_open(dongle_t *sel /* XXX: We should disable modem status interrupts on the first UART (interrupt #14). */ - restore_flags(flags); + spin_unlock_irqrestore(&ep7211_lock, flags); } static void ep7211_ir_close(dongle_t *self) { unsigned int syscon1, flags; - save_flags(flags); cli(); + spin_lock_irqsave(&ep7211_lock, flags); /* Turn off the SIR encoder. */ syscon1 = clps_readl(SYSCON1); @@ -63,7 +66,7 @@ static void ep7211_ir_close(dongle_t *se /* XXX: If we've disabled the modem status interrupts, we should reset them back to their original state. */ - restore_flags(flags); + spin_unlock_irqrestore(&ep7211_lock, flags); } /* diff -puN drivers/net/irda/irtty-sir.c~git-net drivers/net/irda/irtty-sir.c --- devel/drivers/net/irda/irtty-sir.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/irtty-sir.c 2006-03-17 23:03:48.000000000 -0800 @@ -33,6 +33,7 @@ #include #include #include +#include #include #include @@ -338,7 +339,7 @@ static inline void irtty_stop_receiver(s /*****************************************************************/ /* serialize ldisc open/close with sir_dev */ -static DECLARE_MUTEX(irtty_sem); +static DEFINE_MUTEX(irtty_mutex); /* notifier from sir_dev when irda% device gets opened (ifup) */ @@ -348,11 +349,11 @@ static int irtty_start_dev(struct sir_de struct tty_struct *tty; /* serialize with ldisc open/close */ - down(&irtty_sem); + mutex_lock(&irtty_mutex); priv = dev->priv; if (unlikely(!priv || priv->magic!=IRTTY_MAGIC)) { - up(&irtty_sem); + mutex_unlock(&irtty_mutex); return -ESTALE; } @@ -363,7 +364,7 @@ static int irtty_start_dev(struct sir_de /* Make sure we can receive more data */ irtty_stop_receiver(tty, FALSE); - up(&irtty_sem); + mutex_unlock(&irtty_mutex); return 0; } @@ 
-375,11 +376,11 @@ static int irtty_stop_dev(struct sir_dev struct tty_struct *tty; /* serialize with ldisc open/close */ - down(&irtty_sem); + mutex_lock(&irtty_mutex); priv = dev->priv; if (unlikely(!priv || priv->magic!=IRTTY_MAGIC)) { - up(&irtty_sem); + mutex_unlock(&irtty_mutex); return -ESTALE; } @@ -390,7 +391,7 @@ static int irtty_stop_dev(struct sir_dev if (tty->driver->stop) tty->driver->stop(tty); - up(&irtty_sem); + mutex_unlock(&irtty_mutex); return 0; } @@ -514,13 +515,13 @@ static int irtty_open(struct tty_struct priv->dev = dev; /* serialize with start_dev - in case we were racing with ifup */ - down(&irtty_sem); + mutex_lock(&irtty_mutex); dev->priv = priv; tty->disc_data = priv; tty->receive_room = 65536; - up(&irtty_sem); + mutex_unlock(&irtty_mutex); IRDA_DEBUG(0, "%s - %s: irda line discipline opened\n", __FUNCTION__, tty->name); diff -puN drivers/net/irda/Kconfig~git-net drivers/net/irda/Kconfig --- devel/drivers/net/irda/Kconfig~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/Kconfig 2006-03-17 23:03:48.000000000 -0800 @@ -64,6 +64,14 @@ config TEKRAM_DONGLE dongles you will have to start irattach like this: "irattach -d tekram". +config TOIM3232_DONGLE + tristate "TOIM3232 IrDa dongle" + depends on DONGLE && IRDA + help + Say Y here if you want to build support for the Vishay/Temic + TOIM3232 and TOIM4232 based dongles. + To compile it as a module, choose M here. 
+ config LITELINK_DONGLE tristate "Parallax LiteLink dongle" depends on DONGLE && IRDA diff -puN drivers/net/irda/Makefile~git-net drivers/net/irda/Makefile --- devel/drivers/net/irda/Makefile~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/Makefile 2006-03-17 23:03:48.000000000 -0800 @@ -43,6 +43,7 @@ obj-$(CONFIG_OLD_BELKIN_DONGLE) += old_b obj-$(CONFIG_MCP2120_DONGLE) += mcp2120-sir.o obj-$(CONFIG_ACT200L_DONGLE) += act200l-sir.o obj-$(CONFIG_MA600_DONGLE) += ma600-sir.o +obj-$(CONFIG_TOIM3232_DONGLE) += toim3232-sir.o # The SIR helper module sir-dev-objs := sir_dev.o sir_dongle.o sir_kthread.o diff -puN drivers/net/irda/nsc-ircc.c~git-net drivers/net/irda/nsc-ircc.c --- devel/drivers/net/irda/nsc-ircc.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/nsc-ircc.c 2006-03-17 23:03:48.000000000 -0800 @@ -12,6 +12,7 @@ * Copyright (c) 1998-2000 Dag Brattli * Copyright (c) 1998 Lichen Wang, * Copyright (c) 1998 Actisys Corp., www.actisys.com + * Copyright (c) 2000-2004 Jean Tourrilhes * All Rights Reserved * * This program is free software; you can redistribute it and/or @@ -53,14 +54,13 @@ #include #include #include +#include +#include #include #include #include -#include -#include - #include #include #include @@ -72,14 +72,27 @@ static char *driver_name = "nsc-ircc"; +/* Power Management */ +#define NSC_IRCC_DRIVER_NAME "nsc-ircc" +static int nsc_ircc_suspend(struct platform_device *dev, pm_message_t state); +static int nsc_ircc_resume(struct platform_device *dev); + +static struct platform_driver nsc_ircc_driver = { + .suspend = nsc_ircc_suspend, + .resume = nsc_ircc_resume, + .driver = { + .name = NSC_IRCC_DRIVER_NAME, + }, +}; + /* Module parameters */ static int qos_mtt_bits = 0x07; /* 1 ms or more */ static int dongle_id; /* Use BIOS settions by default, but user may supply module parameters */ -static unsigned int io[] = { ~0, ~0, ~0, ~0 }; -static unsigned int irq[] = { 0, 0, 0, 0, 0 }; -static unsigned 
int dma[] = { 0, 0, 0, 0, 0 }; +static unsigned int io[] = { ~0, ~0, ~0, ~0, ~0 }; +static unsigned int irq[] = { 0, 0, 0, 0, 0 }; +static unsigned int dma[] = { 0, 0, 0, 0, 0 }; static int nsc_ircc_probe_108(nsc_chip_t *chip, chipio_t *info); static int nsc_ircc_probe_338(nsc_chip_t *chip, chipio_t *info); @@ -87,6 +100,7 @@ static int nsc_ircc_probe_39x(nsc_chip_t static int nsc_ircc_init_108(nsc_chip_t *chip, chipio_t *info); static int nsc_ircc_init_338(nsc_chip_t *chip, chipio_t *info); static int nsc_ircc_init_39x(nsc_chip_t *chip, chipio_t *info); +static int nsc_ircc_pnp_probe(struct pnp_dev *dev, const struct pnp_device_id *id); /* These are the known NSC chips */ static nsc_chip_t chips[] = { @@ -101,11 +115,12 @@ static nsc_chip_t chips[] = { /* Contributed by Jan Frey - IBM A30/A31 */ { "PC8739x", { 0x2e, 0x4e, 0x0 }, 0x20, 0xea, 0xff, nsc_ircc_probe_39x, nsc_ircc_init_39x }, + { "IBM", { 0x2e, 0x4e, 0x0 }, 0x20, 0xf4, 0xff, + nsc_ircc_probe_39x, nsc_ircc_init_39x }, { NULL } }; -/* Max 4 instances for now */ -static struct nsc_ircc_cb *dev_self[] = { NULL, NULL, NULL, NULL }; +static struct nsc_ircc_cb *dev_self[] = { NULL, NULL, NULL, NULL, NULL }; static char *dongle_types[] = { "Differential serial interface", @@ -126,8 +141,24 @@ static char *dongle_types[] = { "No dongle connected", }; +/* PNP probing */ +static chipio_t pnp_info; +static const struct pnp_device_id nsc_ircc_pnp_table[] = { + { .id = "NSC6001", .driver_data = 0 }, + { .id = "IBM0071", .driver_data = 0 }, + { } +}; + +MODULE_DEVICE_TABLE(pnp, nsc_ircc_pnp_table); + +static struct pnp_driver nsc_ircc_pnp_driver = { + .name = "nsc-ircc", + .id_table = nsc_ircc_pnp_table, + .probe = nsc_ircc_pnp_probe, +}; + /* Some prototypes */ -static int nsc_ircc_open(int i, chipio_t *info); +static int nsc_ircc_open(chipio_t *info); static int nsc_ircc_close(struct nsc_ircc_cb *self); static int nsc_ircc_setup(chipio_t *info); static void nsc_ircc_pio_receive(struct nsc_ircc_cb *self); @@ -146,7 
+177,10 @@ static int nsc_ircc_net_open(struct net static int nsc_ircc_net_close(struct net_device *dev); static int nsc_ircc_net_ioctl(struct net_device *dev, struct ifreq *rq, int cmd); static struct net_device_stats *nsc_ircc_net_get_stats(struct net_device *dev); -static int nsc_ircc_pmproc(struct pm_dev *dev, pm_request_t rqst, void *data); + +/* Globals */ +static int pnp_registered; +static int pnp_succeeded; /* * Function nsc_ircc_init () @@ -158,28 +192,36 @@ static int __init nsc_ircc_init(void) { chipio_t info; nsc_chip_t *chip; - int ret = -ENODEV; + int ret; int cfg_base; int cfg, id; int reg; int i = 0; + ret = platform_driver_register(&nsc_ircc_driver); + if (ret) { + IRDA_ERROR("%s, Can't register driver!\n", driver_name); + return ret; + } + + /* Register with PnP subsystem to detect disable ports */ + ret = pnp_register_driver(&nsc_ircc_pnp_driver); + + if (ret >= 0) + pnp_registered = 1; + + ret = -ENODEV; + /* Probe for all the NSC chipsets we know about */ - for (chip=chips; chip->name ; chip++) { + for (chip = chips; chip->name ; chip++) { IRDA_DEBUG(2, "%s(), Probing for %s ...\n", __FUNCTION__, chip->name); /* Try all config registers for this chip */ - for (cfg=0; cfg<3; cfg++) { + for (cfg = 0; cfg < ARRAY_SIZE(chip->cfg); cfg++) { cfg_base = chip->cfg[cfg]; if (!cfg_base) continue; - - memset(&info, 0, sizeof(chipio_t)); - info.cfg_base = cfg_base; - info.fir_base = io[i]; - info.dma = dma[i]; - info.irq = irq[i]; /* Read index register */ reg = inb(cfg_base); @@ -194,26 +236,67 @@ static int __init nsc_ircc_init(void) if ((id & chip->cid_mask) == chip->cid_value) { IRDA_DEBUG(2, "%s() Found %s chip, revision=%d\n", __FUNCTION__, chip->name, id & ~chip->cid_mask); - /* - * If the user supplies the base address, then - * we init the chip, if not we probe the values - * set by the BIOS - */ - if (io[i] < 0x2000) { - chip->init(chip, &info); - } else - chip->probe(chip, &info); - - if (nsc_ircc_open(i, &info) == 0) - ret = 0; + + /* + * If 
we found a correct PnP setting, + * we first try it. + */ + if (pnp_succeeded) { + memset(&info, 0, sizeof(chipio_t)); + info.cfg_base = cfg_base; + info.fir_base = pnp_info.fir_base; + info.dma = pnp_info.dma; + info.irq = pnp_info.irq; + + if (info.fir_base < 0x2000) { + IRDA_MESSAGE("%s, chip->init\n", driver_name); + chip->init(chip, &info); + } else + chip->probe(chip, &info); + + if (nsc_ircc_open(&info) >= 0) + ret = 0; + } + + /* + * Opening based on PnP values failed. + * Let's fallback to user values, or probe + * the chip. + */ + if (ret) { + IRDA_DEBUG(2, "%s, PnP init failed\n", driver_name); + memset(&info, 0, sizeof(chipio_t)); + info.cfg_base = cfg_base; + info.fir_base = io[i]; + info.dma = dma[i]; + info.irq = irq[i]; + + /* + * If the user supplies the base address, then + * we init the chip, if not we probe the values + * set by the BIOS + */ + if (io[i] < 0x2000) { + chip->init(chip, &info); + } else + chip->probe(chip, &info); + + if (nsc_ircc_open(&info) >= 0) + ret = 0; + } i++; } else { IRDA_DEBUG(2, "%s(), Wrong chip id=0x%02x\n", __FUNCTION__, id); } } - } + if (ret) { + platform_driver_unregister(&nsc_ircc_driver); + pnp_unregister_driver(&nsc_ircc_pnp_driver); + pnp_registered = 0; + } + return ret; } @@ -227,12 +310,17 @@ static void __exit nsc_ircc_cleanup(void { int i; - pm_unregister_all(nsc_ircc_pmproc); - - for (i=0; i < 4; i++) { + for (i = 0; i < ARRAY_SIZE(dev_self); i++) { if (dev_self[i]) nsc_ircc_close(dev_self[i]); } + + platform_driver_unregister(&nsc_ircc_driver); + + if (pnp_registered) + pnp_unregister_driver(&nsc_ircc_pnp_driver); + + pnp_registered = 0; } /* @@ -241,16 +329,26 @@ static void __exit nsc_ircc_cleanup(void * Open driver instance * */ -static int __init nsc_ircc_open(int i, chipio_t *info) +static int __init nsc_ircc_open(chipio_t *info) { struct net_device *dev; struct nsc_ircc_cb *self; - struct pm_dev *pmdev; void *ret; - int err; + int err, chip_index; IRDA_DEBUG(2, "%s()\n", __FUNCTION__); + + for 
(chip_index = 0; chip_index < ARRAY_SIZE(dev_self); chip_index++) { + if (!dev_self[chip_index]) + break; + } + + if (chip_index == ARRAY_SIZE(dev_self)) { + IRDA_ERROR("%s(), maximum number of supported chips reached!\n", __FUNCTION__); + return -ENOMEM; + } + IRDA_MESSAGE("%s, Found chip at base=0x%03x\n", driver_name, info->cfg_base); @@ -271,8 +369,8 @@ static int __init nsc_ircc_open(int i, c spin_lock_init(&self->lock); /* Need to store self somewhere */ - dev_self[i] = self; - self->index = i; + dev_self[chip_index] = self; + self->index = chip_index; /* Initialize IO */ self->io.cfg_base = info->cfg_base; @@ -351,7 +449,7 @@ static int __init nsc_ircc_open(int i, c /* Check if user has supplied a valid dongle id or not */ if ((dongle_id <= 0) || - (dongle_id >= (sizeof(dongle_types) / sizeof(dongle_types[0]))) ) { + (dongle_id >= ARRAY_SIZE(dongle_types))) { dongle_id = nsc_ircc_read_dongle_id(self->io.fir_base); IRDA_MESSAGE("%s, Found dongle: %s\n", driver_name, @@ -364,11 +462,18 @@ static int __init nsc_ircc_open(int i, c self->io.dongle_id = dongle_id; nsc_ircc_init_dongle_interface(self->io.fir_base, dongle_id); - pmdev = pm_register(PM_SYS_DEV, PM_SYS_IRDA, nsc_ircc_pmproc); - if (pmdev) - pmdev->data = self; + self->pldev = platform_device_register_simple(NSC_IRCC_DRIVER_NAME, + self->index, NULL, 0); + if (IS_ERR(self->pldev)) { + err = PTR_ERR(self->pldev); + goto out5; + } + platform_set_drvdata(self->pldev, self); - return 0; + return chip_index; + + out5: + unregister_netdev(dev); out4: dma_free_coherent(NULL, self->tx_buff.truesize, self->tx_buff.head, self->tx_buff_dma); @@ -379,7 +484,7 @@ static int __init nsc_ircc_open(int i, c release_region(self->io.fir_base, self->io.fir_ext); out1: free_netdev(dev); - dev_self[i] = NULL; + dev_self[chip_index] = NULL; return err; } @@ -399,6 +504,8 @@ static int __exit nsc_ircc_close(struct iobase = self->io.fir_base; + platform_device_unregister(self->pldev); + /* Remove netdevice */ 
unregister_netdev(self->netdev); @@ -806,6 +913,43 @@ static int nsc_ircc_probe_39x(nsc_chip_t return 0; } +/* PNP probing */ +static int nsc_ircc_pnp_probe(struct pnp_dev *dev, const struct pnp_device_id *id) +{ + memset(&pnp_info, 0, sizeof(chipio_t)); + pnp_info.irq = -1; + pnp_info.dma = -1; + pnp_succeeded = 1; + + /* There don't seem to be any way to get the cfg_base. + * On my box, cfg_base is in the PnP descriptor of the + * motherboard. Oh well... Jean II */ + + if (pnp_port_valid(dev, 0) && + !(pnp_port_flags(dev, 0) & IORESOURCE_DISABLED)) + pnp_info.fir_base = pnp_port_start(dev, 0); + + if (pnp_irq_valid(dev, 0) && + !(pnp_irq_flags(dev, 0) & IORESOURCE_DISABLED)) + pnp_info.irq = pnp_irq(dev, 0); + + if (pnp_dma_valid(dev, 0) && + !(pnp_dma_flags(dev, 0) & IORESOURCE_DISABLED)) + pnp_info.dma = pnp_dma(dev, 0); + + IRDA_DEBUG(0, "%s() : From PnP, found firbase 0x%03X ; irq %d ; dma %d.\n", + __FUNCTION__, pnp_info.fir_base, pnp_info.irq, pnp_info.dma); + + if((pnp_info.fir_base == 0) || + (pnp_info.irq == -1) || (pnp_info.dma == -1)) { + /* Returning an error will disable the device. Yuck ! 
*/ + //return -EINVAL; + pnp_succeeded = 0; + } + + return 0; +} + /* * Function nsc_ircc_setup (info) * @@ -2161,45 +2305,83 @@ static struct net_device_stats *nsc_ircc return &self->stats; } -static void nsc_ircc_suspend(struct nsc_ircc_cb *self) +static int nsc_ircc_suspend(struct platform_device *dev, pm_message_t state) { - IRDA_MESSAGE("%s, Suspending\n", driver_name); - + struct nsc_ircc_cb *self = platform_get_drvdata(dev); + int bank; + unsigned long flags; + int iobase = self->io.fir_base; + if (self->io.suspended) - return; - - nsc_ircc_net_close(self->netdev); + return 0; + IRDA_DEBUG(1, "%s, Suspending\n", driver_name); + + rtnl_lock(); + if (netif_running(self->netdev)) { + netif_device_detach(self->netdev); + spin_lock_irqsave(&self->lock, flags); + /* Save current bank */ + bank = inb(iobase+BSR); + + /* Disable interrupts */ + switch_bank(iobase, BANK0); + outb(0, iobase+IER); + + /* Restore bank register */ + outb(bank, iobase+BSR); + + spin_unlock_irqrestore(&self->lock, flags); + free_irq(self->io.irq, self->netdev); + disable_dma(self->io.dma); + } self->io.suspended = 1; + rtnl_unlock(); + + return 0; } - -static void nsc_ircc_wakeup(struct nsc_ircc_cb *self) + +static int nsc_ircc_resume(struct platform_device *dev) { + struct nsc_ircc_cb *self = platform_get_drvdata(dev); + unsigned long flags; + if (!self->io.suspended) - return; + return 0; + IRDA_DEBUG(1, "%s, Waking up\n", driver_name); + + rtnl_lock(); nsc_ircc_setup(&self->io); - nsc_ircc_net_open(self->netdev); - - IRDA_MESSAGE("%s, Waking up\n", driver_name); + nsc_ircc_init_dongle_interface(self->io.fir_base, self->io.dongle_id); + if (netif_running(self->netdev)) { + if (request_irq(self->io.irq, nsc_ircc_interrupt, 0, + self->netdev->name, self->netdev)) { + IRDA_WARNING("%s, unable to allocate irq=%d\n", + driver_name, self->io.irq); + + /* + * Don't fail resume process, just kill this + * network interface + */ + unregister_netdevice(self->netdev); + } else { + 
spin_lock_irqsave(&self->lock, flags); + nsc_ircc_change_speed(self, self->io.speed); + spin_unlock_irqrestore(&self->lock, flags); + netif_device_attach(self->netdev); + } + + } else { + spin_lock_irqsave(&self->lock, flags); + nsc_ircc_change_speed(self, 9600); + spin_unlock_irqrestore(&self->lock, flags); + } self->io.suspended = 0; -} + rtnl_unlock(); -static int nsc_ircc_pmproc(struct pm_dev *dev, pm_request_t rqst, void *data) -{ - struct nsc_ircc_cb *self = (struct nsc_ircc_cb*) dev->data; - if (self) { - switch (rqst) { - case PM_SUSPEND: - nsc_ircc_suspend(self); - break; - case PM_RESUME: - nsc_ircc_wakeup(self); - break; - } - } - return 0; + return 0; } MODULE_AUTHOR("Dag Brattli "); diff -puN drivers/net/irda/nsc-ircc.h~git-net drivers/net/irda/nsc-ircc.h --- devel/drivers/net/irda/nsc-ircc.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/nsc-ircc.h 2006-03-17 23:03:48.000000000 -0800 @@ -269,7 +269,7 @@ struct nsc_ircc_cb { __u32 new_speed; int index; /* Instance index */ - struct pm_dev *dev; + struct platform_device *pldev; }; static inline void switch_bank(int iobase, int bank) diff -puN drivers/net/irda/sir_dongle.c~git-net drivers/net/irda/sir_dongle.c --- devel/drivers/net/irda/sir_dongle.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/sir_dongle.c 2006-03-17 23:03:48.000000000 -0800 @@ -16,6 +16,7 @@ #include #include #include +#include #include @@ -28,7 +29,7 @@ */ static LIST_HEAD(dongle_list); /* list of registered dongle drivers */ -static DECLARE_MUTEX(dongle_list_lock); /* protects the list */ +static DEFINE_MUTEX(dongle_list_lock); /* protects the list */ int irda_register_dongle(struct dongle_driver *new) { @@ -38,25 +39,25 @@ int irda_register_dongle(struct dongle_d IRDA_DEBUG(0, "%s : registering dongle \"%s\" (%d).\n", __FUNCTION__, new->driver_name, new->type); - down(&dongle_list_lock); + mutex_lock(&dongle_list_lock); list_for_each(entry, &dongle_list) { drv = 
list_entry(entry, struct dongle_driver, dongle_list); if (new->type == drv->type) { - up(&dongle_list_lock); + mutex_unlock(&dongle_list_lock); return -EEXIST; } } list_add(&new->dongle_list, &dongle_list); - up(&dongle_list_lock); + mutex_unlock(&dongle_list_lock); return 0; } EXPORT_SYMBOL(irda_register_dongle); int irda_unregister_dongle(struct dongle_driver *drv) { - down(&dongle_list_lock); + mutex_lock(&dongle_list_lock); list_del(&drv->dongle_list); - up(&dongle_list_lock); + mutex_unlock(&dongle_list_lock); return 0; } EXPORT_SYMBOL(irda_unregister_dongle); @@ -75,7 +76,7 @@ int sirdev_get_dongle(struct sir_dev *de return -EBUSY; /* serialize access to the list of registered dongles */ - down(&dongle_list_lock); + mutex_lock(&dongle_list_lock); list_for_each(entry, &dongle_list) { drv = list_entry(entry, struct dongle_driver, dongle_list); @@ -109,14 +110,14 @@ int sirdev_get_dongle(struct sir_dev *de if (!drv->open || (err=drv->open(dev))!=0) goto out_reject; /* failed to open driver */ - up(&dongle_list_lock); + mutex_unlock(&dongle_list_lock); return 0; out_reject: dev->dongle_drv = NULL; module_put(drv->owner); out_unlock: - up(&dongle_list_lock); + mutex_unlock(&dongle_list_lock); return err; } diff -puN /dev/null drivers/net/irda/toim3232-sir.c --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/drivers/net/irda/toim3232-sir.c 2006-03-17 23:03:48.000000000 -0800 @@ -0,0 +1,375 @@ +/********************************************************************* + * + * Filename: toim3232-sir.c + * Version: 1.0 + * Description: Implementation of dongles based on the Vishay/Temic + * TOIM3232 SIR Endec chipset. Currently only the + * IRWave IR320ST-2 is tested, although it should work + * with any TOIM3232 or TOIM4232 chipset based RS232 + * dongle with minimal modification. + * Based heavily on the Tekram driver (tekram.c), + * with thanks to Dag Brattli and Martin Diehl. + * Status: Experimental. 
+ * Author: David Basden + * Created at: Thu Feb 09 23:47:32 2006 + * + * Copyright (c) 2006 David Basden. + * Copyright (c) 1998-1999 Dag Brattli, + * Copyright (c) 2002 Martin Diehl, + * All Rights Reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + * + * Neither Dag Brattli nor University of Tromsø admit liability nor + * provide warranty for any of this software. This material is + * provided "AS-IS" and at no charge. + * + ********************************************************************/ + +/* + * This driver has currently only been tested on the IRWave IR320ST-2 + * + * PROTOCOL: + * + * The protocol for talking to the TOIM3232 is quite easy, and is + * designed to interface with RS232 with only level converters. The + * BR/~D line on the chip is brought high to signal 'command mode', + * where a command byte is sent to select the baudrate of the RS232 + * interface and the pulse length of the IRDA output. When BR/~D + * is brought low, the dongle then changes to the selected baudrate, + * and the RS232 interface is used for data until BR/~D is brought + * high again. The initial speed for the TOIMx232 after RESET is + * 9600 baud. The baudrate for command-mode is the last selected + * baud-rate, or 9600 after a RESET. + * + * The dongle I have (below) adds some extra hardware on the front end, + * but this is mostly directed towards parasitic power from the RS232 + * line rather than changing very much about how to communicate with + * the TOIM3232. + * + * The protocol to talk to the TOIM4232 chipset seems to be almost + * identical to the TOIM3232 (and the 4232 datasheet is more detailed) + * so this code will probably work on that as well, although I haven't + * tested it on that hardware. 
+ * + * Target dongle variations that might be common: + * + * DTR and RTS function: + * The data sheet for the 4232 has a sample implementation that hooks the + * DTR and RTS lines to the RESET and BaudRate/~Data lines of the + * chip (through line-converters). Given both DTR and RTS would have to + * be held low in normal operation, and the TOIMx232 requires +5V to + * signal ground, most dongle designers would almost certainly choose + * an implementation that kept at least one of DTR or RTS high in + * normal operation to provide power to the dongle, but this will likely + * vary between designs. + * + * User specified command bits: + * There are two user-controllable output lines from the TOIMx232 that + * can be set low or high by setting the appropriate bits in the + * high-nibble of the command byte (when setting speed and pulse length). + * These might be used to switch on and off added hardware or extra + * dongle features. + * + * + * Target hardware: IRWave IR320ST-2 + * + * The IRWave IR320ST-2 is a simple dongle based on the Vishay/Temic + * TOIM3232 SIR Endec and the Vishay/Temic TFDS4500 SIR IRDA transceiver. + * It uses a hex inverter and some discrete components to buffer and + * line convert the RS232 down to 5V. + * + * The dongle is powered through a voltage regulator, fed by a large + * capacitor. To switch the dongle on, DTR is brought high to charge + * the capacitor and drive the voltage regulator. DTR isn't associated + * with any control lines on the TOIM3232. Parasitic power is also taken + * from the RTS, TD and RD lines when brought high, but through resistors. + * When DTR is low, the circuit might lose power even with RTS high. + * + * RTS is inverted and attached to the BR/~D input pin. When RTS + * is high, BR/~D is low, and the TOIM3232 is in the normal 'data' mode. + * When RTS is brought low, BR/~D is high, and the TOIM3232 is in 'command + * mode'. + * + * For some unknown reason, the RESET line isn't actually connected + * to anything. 
This means to reset the dongle to get it to a known + * state (9600 baud) you must drop DTR and RTS low, wait for the power + * capacitor to discharge, and then bring DTR (and RTS for data mode) + * high again, and wait for the capacitor to charge, the power supply + * to stabilise, and the oscillator clock to stabilise. + * + * Fortunately, if the current baudrate is known, the chipset can + * easily change speed by entering command mode without having to + * reset the dongle first. + * + * Major Components: + * + * - Vishay/Temic TOIM3232 SIR Endec to change RS232 pulse timings + * to IRDA pulse timings + * - 3.6864MHz crystal to drive TOIM3232 clock oscillator + * - DM74LS04M Inverting Hex line buffer for RS232 input buffering + * and level conversion + * - PJ2951AC 150mA voltage regulator + * - Vishay/Temic TFDS4500 SIR IRDA front-end transceiver + * + */ + +#include +#include +#include + +#include + +#include "sir-dev.h" + +static int toim3232delay = 150; /* default is 150 ms */ +module_param(toim3232delay, int, 0); +MODULE_PARM_DESC(toim3232delay, "toim3232 dongle write complete delay"); + +#if 0 +static int toim3232flipdtr = 0; /* default is DTR high to reset */ +module_param(toim3232flipdtr, int, 0); +MODULE_PARM_DESC(toim3232flipdtr, "toim3232 dongle invert DTR (Reset)"); + +static int toim3232fliprts = 0; /* default is RTS high for baud change */ +module_param(toim3232fliprts, int, 0); +MODULE_PARM_DESC(toim3232fliprts, "toim3232 dongle invert RTS (BR/~D)"); +#endif + +static int toim3232_open(struct sir_dev *); +static int toim3232_close(struct sir_dev *); +static int toim3232_change_speed(struct sir_dev *, unsigned); +static int toim3232_reset(struct sir_dev *); + +#define TOIM3232_115200 0x00 +#define TOIM3232_57600 0x01 +#define TOIM3232_38400 0x02 +#define TOIM3232_19200 0x03 +#define TOIM3232_9600 0x06 +#define TOIM3232_2400 0x0A + +#define TOIM3232_PW 0x10 /* Pulse select bit */ + +static struct dongle_driver toim3232 = { + .owner = THIS_MODULE, + 
.driver_name = "Vishay TOIM3232", + .type = IRDA_TOIM3232_DONGLE, + .open = toim3232_open, + .close = toim3232_close, + .reset = toim3232_reset, + .set_speed = toim3232_change_speed, +}; + +static int __init toim3232_sir_init(void) +{ + if (toim3232delay < 1 || toim3232delay > 500) + toim3232delay = 200; + IRDA_DEBUG(1, "%s - using %d ms delay\n", + toim3232.driver_name, toim3232delay); + return irda_register_dongle(&toim3232); +} + +static void __exit toim3232_sir_cleanup(void) +{ + irda_unregister_dongle(&toim3232); +} + +static int toim3232_open(struct sir_dev *dev) +{ + struct qos_info *qos = &dev->qos; + + IRDA_DEBUG(2, "%s()\n", __FUNCTION__); + + /* Pull the lines high to start with. + * + * For the IR320ST-2, we need to charge the main supply capacitor to + * switch the device on. We keep DTR high throughout to do this. + * When RTS, TD and RD are high, they will also trickle-charge the + * cap. RTS is high for data transmission, and low for baud rate select. + * -- DGB + */ + sirdev_set_dtr_rts(dev, TRUE, TRUE); + + /* The TOIM3232 supports many speeds between 1200bps and 115200bps. + * We really only care about those supported by the IRDA spec, but + * 38400 seems to be implemented in many places */ + qos->baud_rate.bits &= IR_2400|IR_9600|IR_19200|IR_38400|IR_57600|IR_115200; + + /* From the tekram driver. Not sure what a reasonable value is -- DGB */ + qos->min_turn_time.bits = 0x01; /* Needs at least 10 ms */ + irda_qos_bits_to_value(qos); + + /* irda thread waits 50 msec for power settling */ + + return 0; +} + +static int toim3232_close(struct sir_dev *dev) +{ + IRDA_DEBUG(2, "%s()\n", __FUNCTION__); + + /* Power off dongle */ + sirdev_set_dtr_rts(dev, FALSE, FALSE); + + return 0; +} + +/* + * Function toim3232_change_speed (dev, speed) + * + * Set the speed for the TOIM3232 based dongle. Warning, this + * function must be called with a process context! + * + * Algorithm + * 1. 
keep DTR high but clear RTS to bring into baud programming mode + * 2. wait at least 7us to enter programming mode + * 3. send control word to set baud rate and timing + * 4. wait at least 1us + * 5. bring RTS high to enter DATA mode (RS232 is passed through to transceiver) + * 6. should take effect immediately (although probably worth waiting) + */ + +#define TOIM3232_STATE_WAIT_SPEED (SIRDEV_STATE_DONGLE_SPEED + 1) + +static int toim3232_change_speed(struct sir_dev *dev, unsigned speed) +{ + unsigned state = dev->fsm.substate; + unsigned delay = 0; + u8 byte; + static int ret = 0; + + IRDA_DEBUG(2, "%s()\n", __FUNCTION__); + + switch(state) { + case SIRDEV_STATE_DONGLE_SPEED: + + /* Figure out what we are going to send as a control byte */ + switch (speed) { + case 2400: + byte = TOIM3232_PW|TOIM3232_2400; + break; + default: + speed = 9600; + ret = -EINVAL; + /* fall thru */ + case 9600: + byte = TOIM3232_PW|TOIM3232_9600; + break; + case 19200: + byte = TOIM3232_PW|TOIM3232_19200; + break; + case 38400: + byte = TOIM3232_PW|TOIM3232_38400; + break; + case 57600: + byte = TOIM3232_PW|TOIM3232_57600; + break; + case 115200: + byte = TOIM3232_115200; + break; + } + + /* Set DTR, Clear RTS: Go into baud programming mode */ + sirdev_set_dtr_rts(dev, TRUE, FALSE); + + /* Wait at least 7us */ + udelay(14); + + /* Write control byte */ + sirdev_raw_write(dev, &byte, 1); + + dev->speed = speed; + + state = TOIM3232_STATE_WAIT_SPEED; + delay = toim3232delay; + break; + + case TOIM3232_STATE_WAIT_SPEED: + /* Have transmitted control byte * Wait for 'at least 1us' */ + udelay(14); + + /* Set DTR, Set RTS: Go into normal data mode */ + sirdev_set_dtr_rts(dev, TRUE, TRUE); + + /* Wait (TODO: check this is needed) */ + udelay(50); + break; + + default: + printk(KERN_ERR "%s - undefined state %d\n", __FUNCTION__, state); + ret = -EINVAL; + break; + } + + dev->fsm.substate = state; + return (delay > 0) ? 
delay : ret; +} + +/* + * Function toim3232_reset (dev) + * + * This function resets the toim3232 dongle. Warning, this function + * must be called with a process context!! + * + * What we should do is: + * 0. Pull RESET high + * 1. Wait for at least 7us + * 2. Pull RESET low + * 3. Wait for at least 7us + * 4. Pull BR/~D high + * 5. Wait for at least 7us + * 6. Send control byte to set baud rate + * 7. Wait at least 1us after stop bit + * 8. Pull BR/~D low + * 9. Should then be in data mode + * + * Because the IR320ST-2 doesn't have the RESET line connected for some reason, + * we'll have to do something else. + * + * The default speed after a RESET is 9600, so let's try just bringing it up in + * data mode after switching it off, waiting for the supply capacitor to + * discharge, and then switching it back on. This isn't actually pulling RESET + * high, but it seems to have the same effect. + * + * This behaviour will probably work on dongles that have the RESET line connected, + * but if not, add a flag for the IR320ST-2, and implement the above-listed proper + * behaviour. + * + * RTS is inverted and then fed to BR/~D, so to put it in programming mode, we + * need to pull RTS low. + */ + +static int toim3232_reset(struct sir_dev *dev) +{ + IRDA_DEBUG(2, "%s()\n", __FUNCTION__); + + /* Switch off both DTR and RTS to switch off dongle */ + sirdev_set_dtr_rts(dev, FALSE, FALSE); + + /* Should sleep a while. 
This might be evil doing it this way.*/ + set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(msecs_to_jiffies(50)); + + /* Set DTR, Set RTS (data mode) */ + sirdev_set_dtr_rts(dev, TRUE, TRUE); + + /* Wait at least 10 ms for power to stabilize again */ + set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(msecs_to_jiffies(10)); + + /* Speed should now be 9600 */ + dev->speed = 9600; + + return 0; +} + +MODULE_AUTHOR("David Basden "); +MODULE_DESCRIPTION("Vishay/Temic TOIM3232 based dongle driver"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("irda-dongle-12"); /* IRDA_TOIM3232_DONGLE */ + +module_init(toim3232_sir_init); +module_exit(toim3232_sir_cleanup); diff -puN drivers/net/irda/vlsi_ir.c~git-net drivers/net/irda/vlsi_ir.c --- devel/drivers/net/irda/vlsi_ir.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/vlsi_ir.c 2006-03-17 23:03:48.000000000 -0800 @@ -1887,7 +1887,7 @@ static int __init vlsi_mod_init(void) vlsi_proc_root->owner = THIS_MODULE; } - ret = pci_module_init(&vlsi_irda_driver); + ret = pci_register_driver(&vlsi_irda_driver); if (ret && vlsi_proc_root) remove_proc_entry(PROC_DIR, NULL); diff -puN drivers/net/ppp_generic.c~git-net drivers/net/ppp_generic.c --- devel/drivers/net/ppp_generic.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/ppp_generic.c 2006-03-17 23:03:48.000000000 -0800 @@ -1691,8 +1691,8 @@ ppp_receive_nonmp_frame(struct ppp *ppp, || ppp->npmode[npi] != NPMODE_PASS) { kfree_skb(skb); } else { - skb_pull(skb, 2); /* chop off protocol */ - skb_postpull_rcsum(skb, skb->data - 2, 2); + /* chop off protocol */ + skb_pull_rcsum(skb, 2); skb->dev = ppp->dev; skb->protocol = htons(npindex_to_ethertype[npi]); skb->mac.raw = skb->data; diff -puN drivers/net/pppoe.c~git-net drivers/net/pppoe.c --- devel/drivers/net/pppoe.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/pppoe.c 2006-03-17 23:03:48.000000000 -0800 @@ -337,8 +337,7 @@ static int 
pppoe_rcv_core(struct sock *s if (sk->sk_state & PPPOX_BOUND) { struct pppoe_hdr *ph = (struct pppoe_hdr *) skb->nh.raw; int len = ntohs(ph->length); - skb_pull(skb, sizeof(struct pppoe_hdr)); - skb_postpull_rcsum(skb, ph, sizeof(*ph)); + skb_pull_rcsum(skb, sizeof(struct pppoe_hdr)); if (pskb_trim_rcsum(skb, len)) goto abort_kfree; diff -puN drivers/net/sungem.c~git-net drivers/net/sungem.c --- devel/drivers/net/sungem.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/sungem.c 2006-03-17 23:03:48.000000000 -0800 @@ -55,6 +55,7 @@ #include #include #include +#include #include #include @@ -2284,7 +2285,7 @@ static void gem_reset_task(void *data) { struct gem *gp = (struct gem *) data; - down(&gp->pm_sem); + mutex_lock(&gp->pm_mutex); netif_poll_disable(gp->dev); @@ -2311,7 +2312,7 @@ static void gem_reset_task(void *data) netif_poll_enable(gp->dev); - up(&gp->pm_sem); + mutex_unlock(&gp->pm_mutex); } @@ -2320,14 +2321,14 @@ static int gem_open(struct net_device *d struct gem *gp = dev->priv; int rc = 0; - down(&gp->pm_sem); + mutex_lock(&gp->pm_mutex); /* We need the cell enabled */ if (!gp->asleep) rc = gem_do_start(dev); gp->opened = (rc == 0); - up(&gp->pm_sem); + mutex_unlock(&gp->pm_mutex); return rc; } @@ -2340,13 +2341,13 @@ static int gem_close(struct net_device * * our caller (dev_close) already did it for us */ - down(&gp->pm_sem); + mutex_lock(&gp->pm_mutex); gp->opened = 0; if (!gp->asleep) gem_do_stop(dev, 0); - up(&gp->pm_sem); + mutex_unlock(&gp->pm_mutex); return 0; } @@ -2358,7 +2359,7 @@ static int gem_suspend(struct pci_dev *p struct gem *gp = dev->priv; unsigned long flags; - down(&gp->pm_sem); + mutex_lock(&gp->pm_mutex); netif_poll_disable(dev); @@ -2391,11 +2392,11 @@ static int gem_suspend(struct pci_dev *p /* Stop the link timer */ del_timer_sync(&gp->link_timer); - /* Now we release the semaphore to not block the reset task who + /* Now we release the mutex to not block the reset task who * can take it too. 
We are marked asleep, so there will be no * conflict here */ - up(&gp->pm_sem); + mutex_unlock(&gp->pm_mutex); /* Wait for a pending reset task to complete */ while (gp->reset_task_pending) @@ -2424,7 +2425,7 @@ static int gem_resume(struct pci_dev *pd printk(KERN_INFO "%s: resuming\n", dev->name); - down(&gp->pm_sem); + mutex_lock(&gp->pm_mutex); /* Keep the cell enabled during the entire operation, no need to * take a lock here tho since nothing else can happen while we are @@ -2440,7 +2441,7 @@ static int gem_resume(struct pci_dev *pd * still asleep, a new sleep cycle may bring it back */ gem_put_cell(gp); - up(&gp->pm_sem); + mutex_unlock(&gp->pm_mutex); return 0; } pci_set_master(gp->pdev); @@ -2486,7 +2487,7 @@ static int gem_resume(struct pci_dev *pd netif_poll_enable(dev); - up(&gp->pm_sem); + mutex_unlock(&gp->pm_mutex); return 0; } @@ -2591,7 +2592,7 @@ static int gem_change_mtu(struct net_dev return 0; } - down(&gp->pm_sem); + mutex_lock(&gp->pm_mutex); spin_lock_irq(&gp->lock); spin_lock(&gp->tx_lock); dev->mtu = new_mtu; @@ -2602,7 +2603,7 @@ static int gem_change_mtu(struct net_dev } spin_unlock(&gp->tx_lock); spin_unlock_irq(&gp->lock); - up(&gp->pm_sem); + mutex_unlock(&gp->pm_mutex); return 0; } @@ -2771,10 +2772,10 @@ static int gem_ioctl(struct net_device * int rc = -EOPNOTSUPP; unsigned long flags; - /* Hold the PM semaphore while doing ioctl's or we may collide + /* Hold the PM mutex while doing ioctl's or we may collide * with power management. 
*/ - down(&gp->pm_sem); + mutex_lock(&gp->pm_mutex); spin_lock_irqsave(&gp->lock, flags); gem_get_cell(gp); @@ -2812,7 +2813,7 @@ static int gem_ioctl(struct net_device * gem_put_cell(gp); spin_unlock_irqrestore(&gp->lock, flags); - up(&gp->pm_sem); + mutex_unlock(&gp->pm_mutex); return rc; } @@ -3033,7 +3034,7 @@ static int __devinit gem_init_one(struct spin_lock_init(&gp->lock); spin_lock_init(&gp->tx_lock); - init_MUTEX(&gp->pm_sem); + mutex_init(&gp->pm_mutex); init_timer(&gp->link_timer); gp->link_timer.function = gem_link_timer; diff -puN drivers/net/sungem.h~git-net drivers/net/sungem.h --- devel/drivers/net/sungem.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/sungem.h 2006-03-17 23:03:48.000000000 -0800 @@ -980,15 +980,15 @@ struct gem { int tx_new, tx_old; unsigned int has_wol : 1; /* chip supports wake-on-lan */ - unsigned int asleep : 1; /* chip asleep, protected by pm_sem */ + unsigned int asleep : 1; /* chip asleep, protected by pm_mutex */ unsigned int asleep_wol : 1; /* was asleep with WOL enabled */ - unsigned int opened : 1; /* driver opened, protected by pm_sem */ + unsigned int opened : 1; /* driver opened, protected by pm_mutex */ unsigned int running : 1; /* chip running, protected by lock */ /* cell enable count, protected by lock */ int cell_enabled; - struct semaphore pm_sem; + struct mutex pm_mutex; u32 msg_enable; u32 status; diff -puN drivers/net/tg3.c~git-net drivers/net/tg3.c --- devel/drivers/net/tg3.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/tg3.c 2006-03-17 23:03:48.000000000 -0800 @@ -69,8 +69,8 @@ #define DRV_MODULE_NAME "tg3" #define PFX DRV_MODULE_NAME ": " -#define DRV_MODULE_VERSION "3.49" -#define DRV_MODULE_RELDATE "Feb 2, 2006" +#define DRV_MODULE_VERSION "3.52" +#define DRV_MODULE_RELDATE "Mar 06, 2006" #define TG3_DEF_MAC_MODE 0 #define TG3_DEF_RX_MODE 0 @@ -221,10 +221,22 @@ static struct pci_device_id tg3_pci_tbl[ PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, { 
PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5753F, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, + { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5754, + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, + { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5754M, + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, + { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5787, + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, + { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5787M, + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5714, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, + { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5714S, + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5715, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, + { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5715S, + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5780, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5780S, @@ -534,6 +546,9 @@ static void tg3_enable_ints(struct tg3 * (tp->misc_host_ctrl & ~MISC_HOST_CTRL_MASK_PCI_INT)); tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, (tp->last_tag << 24)); + if (tp->tg3_flags2 & TG3_FLG2_1SHOT_MSI) + tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, + (tp->last_tag << 24)); tg3_cond_int(tp); } @@ -1038,9 +1053,11 @@ static void tg3_frob_aux_power(struct tg struct net_device *dev_peer; dev_peer = pci_get_drvdata(tp->pdev_peer); + /* remove_one() may have been run on the peer. 
*/ if (!dev_peer) - BUG(); - tp_peer = netdev_priv(dev_peer); + tp_peer = tp; + else + tp_peer = netdev_priv(dev_peer); } if ((tp->tg3_flags & TG3_FLAG_WOL_ENABLE) != 0 || @@ -1131,7 +1148,7 @@ static int tg3_halt_cpu(struct tg3 *, u3 static int tg3_nvram_lock(struct tg3 *); static void tg3_nvram_unlock(struct tg3 *); -static int tg3_set_power_state(struct tg3 *tp, int state) +static int tg3_set_power_state(struct tg3 *tp, pci_power_t state) { u32 misc_host_ctrl; u16 power_control, power_caps; @@ -1150,7 +1167,7 @@ static int tg3_set_power_state(struct tg power_control |= PCI_PM_CTRL_PME_STATUS; power_control &= ~(PCI_PM_CTRL_STATE_MASK); switch (state) { - case 0: + case PCI_D0: power_control |= 0; pci_write_config_word(tp->pdev, pm + PCI_PM_CTRL, @@ -1163,15 +1180,15 @@ static int tg3_set_power_state(struct tg return 0; - case 1: + case PCI_D1: power_control |= 1; break; - case 2: + case PCI_D2: power_control |= 2; break; - case 3: + case PCI_D3hot: power_control |= 3; break; @@ -2680,6 +2697,12 @@ static int tg3_setup_fiber_mii_phy(struc err |= tg3_readphy(tp, MII_BMSR, &bmsr); err |= tg3_readphy(tp, MII_BMSR, &bmsr); + if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5714) { + if (tr32(MAC_TX_STATUS) & TX_STATUS_LINK_UP) + bmsr |= BMSR_LSTATUS; + else + bmsr &= ~BMSR_LSTATUS; + } err |= tg3_readphy(tp, MII_BMCR, &bmcr); @@ -2748,6 +2771,13 @@ static int tg3_setup_fiber_mii_phy(struc bmcr = new_bmcr; err |= tg3_readphy(tp, MII_BMSR, &bmsr); err |= tg3_readphy(tp, MII_BMSR, &bmsr); + if (GET_ASIC_REV(tp->pci_chip_rev_id) == + ASIC_REV_5714) { + if (tr32(MAC_TX_STATUS) & TX_STATUS_LINK_UP) + bmsr |= BMSR_LSTATUS; + else + bmsr &= ~BMSR_LSTATUS; + } tp->tg3_flags2 &= ~TG3_FLG2_PARALLEL_DETECT; } } @@ -3338,6 +3368,23 @@ static inline void tg3_full_unlock(struc spin_unlock_bh(&tp->lock); } +/* One-shot MSI handler - Chip automatically disables interrupt + * after sending MSI so driver doesn't have to do it. 
+ */ +static irqreturn_t tg3_msi_1shot(int irq, void *dev_id, struct pt_regs *regs) +{ + struct net_device *dev = dev_id; + struct tg3 *tp = netdev_priv(dev); + + prefetch(tp->hw_status); + prefetch(&tp->rx_rcb[tp->rx_rcb_ptr]); + + if (likely(!tg3_irq_sync(tp))) + netif_rx_schedule(dev); /* schedule NAPI poll */ + + return IRQ_HANDLED; +} + /* MSI ISR - No need to check for interrupt sharing and no need to * flush status block and interrupt mailbox. PCI ordering rules * guarantee that MSI will arrive after the status block. @@ -3628,11 +3675,139 @@ static void tg3_set_txd(struct tg3 *tp, txd->vlan_tag = vlan_tag << TXD_VLAN_TAG_SHIFT; } +/* hard_start_xmit for devices that don't have any bugs and + * support TG3_FLG2_HW_TSO_2 only. + */ static int tg3_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct tg3 *tp = netdev_priv(dev); dma_addr_t mapping; u32 len, entry, base_flags, mss; + + len = skb_headlen(skb); + + /* No BH disabling for tx_lock here. We are running in BH disabled + * context and TX reclaim runs via tp->poll inside of a software + * interrupt. Furthermore, IRQ processing runs lockless so we have + * no IRQ context deadlocks to worry about either. Rejoice! + */ + if (!spin_trylock(&tp->tx_lock)) + return NETDEV_TX_LOCKED; + + if (unlikely(TX_BUFFS_AVAIL(tp) <= (skb_shinfo(skb)->nr_frags + 1))) { + if (!netif_queue_stopped(dev)) { + netif_stop_queue(dev); + + /* This is a hard error, log it. */ + printk(KERN_ERR PFX "%s: BUG! 
Tx Ring full when " + "queue awake!\n", dev->name); + } + spin_unlock(&tp->tx_lock); + return NETDEV_TX_BUSY; + } + + entry = tp->tx_prod; + base_flags = 0; +#if TG3_TSO_SUPPORT != 0 + mss = 0; + if (skb->len > (tp->dev->mtu + ETH_HLEN) && + (mss = skb_shinfo(skb)->tso_size) != 0) { + int tcp_opt_len, ip_tcp_len; + + if (skb_header_cloned(skb) && + pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) { + dev_kfree_skb(skb); + goto out_unlock; + } + + tcp_opt_len = ((skb->h.th->doff - 5) * 4); + ip_tcp_len = (skb->nh.iph->ihl * 4) + sizeof(struct tcphdr); + + base_flags |= (TXD_FLAG_CPU_PRE_DMA | + TXD_FLAG_CPU_POST_DMA); + + skb->nh.iph->check = 0; + skb->nh.iph->tot_len = htons(mss + ip_tcp_len + tcp_opt_len); + + skb->h.th->check = 0; + + mss |= (ip_tcp_len + tcp_opt_len) << 9; + } + else if (skb->ip_summed == CHECKSUM_HW) + base_flags |= TXD_FLAG_TCPUDP_CSUM; +#else + mss = 0; + if (skb->ip_summed == CHECKSUM_HW) + base_flags |= TXD_FLAG_TCPUDP_CSUM; +#endif +#if TG3_VLAN_TAG_USED + if (tp->vlgrp != NULL && vlan_tx_tag_present(skb)) + base_flags |= (TXD_FLAG_VLAN | + (vlan_tx_tag_get(skb) << 16)); +#endif + + /* Queue skb data, a.k.a. the main skb fragment. */ + mapping = pci_map_single(tp->pdev, skb->data, len, PCI_DMA_TODEVICE); + + tp->tx_buffers[entry].skb = skb; + pci_unmap_addr_set(&tp->tx_buffers[entry], mapping, mapping); + + tg3_set_txd(tp, entry, mapping, len, base_flags, + (skb_shinfo(skb)->nr_frags == 0) | (mss << 1)); + + entry = NEXT_TX(entry); + + /* Now loop through additional data fragments, and queue them. 
*/ + if (skb_shinfo(skb)->nr_frags > 0) { + unsigned int i, last; + + last = skb_shinfo(skb)->nr_frags - 1; + for (i = 0; i <= last; i++) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; + + len = frag->size; + mapping = pci_map_page(tp->pdev, + frag->page, + frag->page_offset, + len, PCI_DMA_TODEVICE); + + tp->tx_buffers[entry].skb = NULL; + pci_unmap_addr_set(&tp->tx_buffers[entry], mapping, mapping); + + tg3_set_txd(tp, entry, mapping, len, + base_flags, (i == last) | (mss << 1)); + + entry = NEXT_TX(entry); + } + } + + /* Packets are ready, update Tx producer idx local and on card. */ + tw32_tx_mbox((MAILBOX_SNDHOST_PROD_IDX_0 + TG3_64BIT_REG_LOW), entry); + + tp->tx_prod = entry; + if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1)) { + netif_stop_queue(dev); + if (TX_BUFFS_AVAIL(tp) > TG3_TX_WAKEUP_THRESH) + netif_wake_queue(tp->dev); + } + +out_unlock: + mmiowb(); + spin_unlock(&tp->tx_lock); + + dev->trans_start = jiffies; + + return NETDEV_TX_OK; +} + +/* hard_start_xmit for devices that have the 4G bug and/or 40-bit bug and + * support TG3_FLG2_HW_TSO_1 or firmware TSO only. + */ +static int tg3_start_xmit_dma_bug(struct sk_buff *skb, struct net_device *dev) +{ + struct tg3 *tp = netdev_priv(dev); + dma_addr_t mapping; + u32 len, entry, base_flags, mss; int would_hit_hwbug; len = skb_headlen(skb); @@ -4369,6 +4544,10 @@ static int tg3_chip_reset(struct tg3 *tp tp->nvram_lock_cnt = 0; } + if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5752 || + GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5787) + tw32(GRC_FASTBOOT_PC, 0); + /* * We must avoid the readl() that normally takes place. 
* It locks machines, causes machine checks, and other @@ -5518,6 +5697,9 @@ static int tg3_set_mac_addr(struct net_d memcpy(dev->dev_addr, addr->sa_data, dev->addr_len); + if (!netif_running(dev)) + return 0; + spin_lock_bh(&tp->lock); __tg3_set_mac_addr(tp); spin_unlock_bh(&tp->lock); @@ -5585,6 +5767,9 @@ static int tg3_reset_hw(struct tg3 *tp) tg3_abort_hw(tp, 1); } + if (tp->tg3_flags2 & TG3_FLG2_MII_SERDES) + tg3_phy_reset(tp); + err = tg3_chip_reset(tp); if (err) return err; @@ -5993,6 +6178,10 @@ static int tg3_reset_hw(struct tg3 *tp) } } + /* Enable host coalescing bug fix */ + if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5787) + val |= (1 << 29); + tw32_f(WDMAC_MODE, val); udelay(40); @@ -6097,6 +6286,17 @@ static int tg3_reset_hw(struct tg3 *tp) tp->tg3_flags2 |= TG3_FLG2_HW_AUTONEG; } + if ((tp->tg3_flags2 & TG3_FLG2_MII_SERDES) && + (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5714)) { + u32 tmp; + + tmp = tr32(SERDES_RX_CTRL); + tw32(SERDES_RX_CTRL, tmp | SERDES_RX_SIG_DETECT); + tp->grc_local_ctrl &= ~GRC_LCLCTRL_USE_EXT_SIG_DETECT; + tp->grc_local_ctrl |= GRC_LCLCTRL_USE_SIG_DETECT; + tw32(GRC_LOCAL_CTRL, tp->grc_local_ctrl); + } + err = tg3_setup_phy(tp, 1); if (err) return err; @@ -6175,7 +6375,7 @@ static int tg3_init_hw(struct tg3 *tp) int err; /* Force the chip into D0. 
*/ - err = tg3_set_power_state(tp, 0); + err = tg3_set_power_state(tp, PCI_D0); if (err) goto out; @@ -6331,6 +6531,26 @@ static void tg3_timer(unsigned long __op add_timer(&tp->timer); } +static int tg3_request_irq(struct tg3 *tp) +{ + irqreturn_t (*fn)(int, void *, struct pt_regs *); + unsigned long flags; + struct net_device *dev = tp->dev; + + if (tp->tg3_flags2 & TG3_FLG2_USING_MSI) { + fn = tg3_msi; + if (tp->tg3_flags2 & TG3_FLG2_1SHOT_MSI) + fn = tg3_msi_1shot; + flags = SA_SAMPLE_RANDOM; + } else { + fn = tg3_interrupt; + if (tp->tg3_flags & TG3_FLAG_TAGGED_STATUS) + fn = tg3_interrupt_tagged; + flags = SA_SHIRQ | SA_SAMPLE_RANDOM; + } + return (request_irq(tp->pdev->irq, fn, flags, dev->name, dev)); +} + static int tg3_test_interrupt(struct tg3 *tp) { struct net_device *dev = tp->dev; @@ -6367,16 +6587,7 @@ static int tg3_test_interrupt(struct tg3 free_irq(tp->pdev->irq, dev); - if (tp->tg3_flags2 & TG3_FLG2_USING_MSI) - err = request_irq(tp->pdev->irq, tg3_msi, - SA_SAMPLE_RANDOM, dev->name, dev); - else { - irqreturn_t (*fn)(int, void *, struct pt_regs *)=tg3_interrupt; - if (tp->tg3_flags & TG3_FLAG_TAGGED_STATUS) - fn = tg3_interrupt_tagged; - err = request_irq(tp->pdev->irq, fn, - SA_SHIRQ | SA_SAMPLE_RANDOM, dev->name, dev); - } + err = tg3_request_irq(tp); if (err) return err; @@ -6428,14 +6639,7 @@ static int tg3_test_msi(struct tg3 *tp) tp->tg3_flags2 &= ~TG3_FLG2_USING_MSI; - { - irqreturn_t (*fn)(int, void *, struct pt_regs *)=tg3_interrupt; - if (tp->tg3_flags & TG3_FLAG_TAGGED_STATUS) - fn = tg3_interrupt_tagged; - - err = request_irq(tp->pdev->irq, fn, - SA_SHIRQ | SA_SAMPLE_RANDOM, dev->name, dev); - } + err = tg3_request_irq(tp); if (err) return err; @@ -6462,6 +6666,10 @@ static int tg3_open(struct net_device *d tg3_full_lock(tp, 0); + err = tg3_set_power_state(tp, PCI_D0); + if (err) + return err; + tg3_disable_ints(tp); tp->tg3_flags &= ~TG3_FLAG_INIT_COMPLETE; @@ -6476,7 +6684,9 @@ static int tg3_open(struct net_device *d if 
((tp->tg3_flags2 & TG3_FLG2_5750_PLUS) && (GET_CHIP_REV(tp->pci_chip_rev_id) != CHIPREV_5750_AX) && - (GET_CHIP_REV(tp->pci_chip_rev_id) != CHIPREV_5750_BX)) { + (GET_CHIP_REV(tp->pci_chip_rev_id) != CHIPREV_5750_BX) && + !((GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5714) && + (tp->pdev_peer == tp->pdev))) { /* All MSI supporting chips should support tagged * status. Assert that this is the case. */ @@ -6491,17 +6701,7 @@ static int tg3_open(struct net_device *d tp->tg3_flags2 |= TG3_FLG2_USING_MSI; } } - if (tp->tg3_flags2 & TG3_FLG2_USING_MSI) - err = request_irq(tp->pdev->irq, tg3_msi, - SA_SAMPLE_RANDOM, dev->name, dev); - else { - irqreturn_t (*fn)(int, void *, struct pt_regs *)=tg3_interrupt; - if (tp->tg3_flags & TG3_FLAG_TAGGED_STATUS) - fn = tg3_interrupt_tagged; - - err = request_irq(tp->pdev->irq, fn, - SA_SHIRQ | SA_SAMPLE_RANDOM, dev->name, dev); - } + err = tg3_request_irq(tp); if (err) { if (tp->tg3_flags2 & TG3_FLG2_USING_MSI) { @@ -6566,6 +6766,14 @@ static int tg3_open(struct net_device *d return err; } + + if (tp->tg3_flags2 & TG3_FLG2_USING_MSI) { + if (tp->tg3_flags2 & TG3_FLG2_1SHOT_MSI) { + u32 val = tr32(0x7c04); + + tw32(0x7c04, val | (1 << 29)); + } + } } tg3_full_lock(tp, 0); @@ -6839,7 +7047,6 @@ static int tg3_close(struct net_device * tp->tg3_flags &= ~(TG3_FLAG_INIT_COMPLETE | TG3_FLAG_GOT_SERDES_FLOWCTL); - netif_carrier_off(tp->dev); tg3_full_unlock(tp); @@ -6856,6 +7063,10 @@ static int tg3_close(struct net_device * tg3_free_consistent(tp); + tg3_set_power_state(tp, PCI_D3hot); + + netif_carrier_off(tp->dev); + return 0; } @@ -7150,6 +7361,9 @@ static void tg3_set_rx_mode(struct net_d { struct tg3 *tp = netdev_priv(dev); + if (!netif_running(dev)) + return; + tg3_full_lock(tp, 0); __tg3_set_rx_mode(dev); tg3_full_unlock(tp); @@ -7174,6 +7388,9 @@ static void tg3_get_regs(struct net_devi memset(p, 0, TG3_REGDUMP_LEN); + if (tp->link_config.phy_is_low_power) + return; + tg3_full_lock(tp, 0); #define __GET_REG32(reg) (*(p)++ = 
tr32(reg)) @@ -7240,6 +7457,7 @@ static int tg3_get_eeprom_len(struct net } static int tg3_nvram_read(struct tg3 *tp, u32 offset, u32 *val); +static int tg3_nvram_read_swab(struct tg3 *tp, u32 offset, u32 *val); static int tg3_get_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, u8 *data) { @@ -7248,6 +7466,9 @@ static int tg3_get_eeprom(struct net_dev u8 *pd; u32 i, offset, len, val, b_offset, b_count; + if (tp->link_config.phy_is_low_power) + return -EAGAIN; + offset = eeprom->offset; len = eeprom->len; eeprom->len = 0; @@ -7309,6 +7530,9 @@ static int tg3_set_eeprom(struct net_dev u32 offset, len, b_offset, odd_len, start, end; u8 *buf; + if (tp->link_config.phy_is_low_power) + return -EAGAIN; + if (eeprom->magic != TG3_EEPROM_MAGIC) return -EINVAL; @@ -7442,6 +7666,7 @@ static void tg3_get_drvinfo(struct net_d strcpy(info->driver, DRV_MODULE_NAME); strcpy(info->version, DRV_MODULE_VERSION); + strcpy(info->fw_version, tp->fw_ver); strcpy(info->bus_info, pci_name(tp->pdev)); } @@ -7536,11 +7761,20 @@ static void tg3_get_ringparam(struct net ering->rx_max_pending = TG3_RX_RING_SIZE - 1; ering->rx_mini_max_pending = 0; - ering->rx_jumbo_max_pending = TG3_RX_JUMBO_RING_SIZE - 1; + if (tp->tg3_flags & TG3_FLAG_JUMBO_RING_ENABLE) + ering->rx_jumbo_max_pending = TG3_RX_JUMBO_RING_SIZE - 1; + else + ering->rx_jumbo_max_pending = 0; + + ering->tx_max_pending = TG3_TX_RING_SIZE - 1; ering->rx_pending = tp->rx_pending; ering->rx_mini_pending = 0; - ering->rx_jumbo_pending = tp->rx_jumbo_pending; + if (tp->tg3_flags & TG3_FLAG_JUMBO_RING_ENABLE) + ering->rx_jumbo_pending = tp->rx_jumbo_pending; + else + ering->rx_jumbo_pending = 0; + ering->tx_pending = tp->tx_pending; } @@ -7661,10 +7895,10 @@ static int tg3_set_tx_csum(struct net_de return 0; } - if (data) - dev->features |= NETIF_F_IP_CSUM; + if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5787) + ethtool_op_set_tx_hw_csum(dev, data); else - dev->features &= ~NETIF_F_IP_CSUM; + ethtool_op_set_tx_csum(dev, 
data); return 0; } @@ -7734,29 +7968,52 @@ static void tg3_get_ethtool_stats (struc } #define NVRAM_TEST_SIZE 0x100 +#define NVRAM_SELFBOOT_FORMAT1_SIZE 0x14 static int tg3_test_nvram(struct tg3 *tp) { - u32 *buf, csum; - int i, j, err = 0; + u32 *buf, csum, magic; + int i, j, err = 0, size; + + if (tg3_nvram_read_swab(tp, 0, &magic) != 0) + return -EIO; + + if (magic == TG3_EEPROM_MAGIC) + size = NVRAM_TEST_SIZE; + else if ((magic & 0xff000000) == 0xa5000000) { + if ((magic & 0xe00000) == 0x200000) + size = NVRAM_SELFBOOT_FORMAT1_SIZE; + else + return 0; + } else + return -EIO; - buf = kmalloc(NVRAM_TEST_SIZE, GFP_KERNEL); + buf = kmalloc(size, GFP_KERNEL); if (buf == NULL) return -ENOMEM; - for (i = 0, j = 0; i < NVRAM_TEST_SIZE; i += 4, j++) { + err = -EIO; + for (i = 0, j = 0; i < size; i += 4, j++) { u32 val; if ((err = tg3_nvram_read(tp, i, &val)) != 0) break; buf[j] = cpu_to_le32(val); } - if (i < NVRAM_TEST_SIZE) + if (i < size) goto out; - err = -EIO; - if (cpu_to_be32(buf[0]) != TG3_EEPROM_MAGIC) - goto out; + /* Selfboot format */ + if (cpu_to_be32(buf[0]) != TG3_EEPROM_MAGIC) { + u8 *buf8 = (u8 *) buf, csum8 = 0; + + for (i = 0; i < size; i++) + csum8 += buf8[i]; + + if (csum8 == 0) + return 0; + return -EIO; + } /* Bootstrap checksum at offset 0x10 */ csum = calc_crc((unsigned char *) buf, 0x10); @@ -8050,14 +8307,24 @@ static int tg3_test_memory(struct tg3 *t { 0x00008000, 0x02000}, { 0x00010000, 0x0e000}, { 0xffffffff, 0x00000} + }, mem_tbl_5755[] = { + { 0x00000200, 0x00008}, + { 0x00004000, 0x00800}, + { 0x00006000, 0x00800}, + { 0x00008000, 0x02000}, + { 0x00010000, 0x0c000}, + { 0xffffffff, 0x00000} }; struct mem_entry *mem_tbl; int err = 0; int i; - if (tp->tg3_flags2 & TG3_FLG2_5705_PLUS) - mem_tbl = mem_tbl_5705; - else + if (tp->tg3_flags2 & TG3_FLG2_5705_PLUS) { + if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5787) + mem_tbl = mem_tbl_5755; + else + mem_tbl = mem_tbl_5705; + } else mem_tbl = mem_tbl_570x; for (i = 0; mem_tbl[i].offset != 
0xffffffff; i++) { @@ -8229,6 +8496,9 @@ static void tg3_self_test(struct net_dev { struct tg3 *tp = netdev_priv(dev); + if (tp->link_config.phy_is_low_power) + tg3_set_power_state(tp, PCI_D0); + memset(data, 0, sizeof(u64) * TG3_NUM_TEST); if (tg3_test_nvram(tp) != 0) { @@ -8257,6 +8527,9 @@ static void tg3_self_test(struct net_dev if (!err) tg3_nvram_unlock(tp); + if (tp->tg3_flags2 & TG3_FLG2_MII_SERDES) + tg3_phy_reset(tp); + if (tg3_test_registers(tp) != 0) { etest->flags |= ETH_TEST_FL_FAILED; data[2] = 1; @@ -8286,6 +8559,9 @@ static void tg3_self_test(struct net_dev tg3_full_unlock(tp); } + if (tp->link_config.phy_is_low_power) + tg3_set_power_state(tp, PCI_D3hot); + } static int tg3_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) @@ -8305,6 +8581,9 @@ static int tg3_ioctl(struct net_device * if (tp->tg3_flags2 & TG3_FLG2_PHY_SERDES) break; /* We have no PHY */ + if (tp->link_config.phy_is_low_power) + return -EAGAIN; + spin_lock_bh(&tp->lock); err = tg3_readphy(tp, data->reg_num & 0x1f, &mii_regval); spin_unlock_bh(&tp->lock); @@ -8321,6 +8600,9 @@ static int tg3_ioctl(struct net_device * if (!capable(CAP_NET_ADMIN)) return -EPERM; + if (tp->link_config.phy_is_low_power) + return -EAGAIN; + spin_lock_bh(&tp->lock); err = tg3_writephy(tp, data->reg_num & 0x1f, data->val_in); spin_unlock_bh(&tp->lock); @@ -8464,14 +8746,14 @@ static struct ethtool_ops tg3_ethtool_op static void __devinit tg3_get_eeprom_size(struct tg3 *tp) { - u32 cursize, val; + u32 cursize, val, magic; tp->nvram_size = EEPROM_CHIP_SIZE; - if (tg3_nvram_read(tp, 0, &val) != 0) + if (tg3_nvram_read_swab(tp, 0, &magic) != 0) return; - if (swab32(val) != TG3_EEPROM_MAGIC) + if ((magic != TG3_EEPROM_MAGIC) && ((magic & 0xff000000) != 0xa5000000)) return; /* @@ -8479,13 +8761,13 @@ static void __devinit tg3_get_eeprom_siz * When we encounter our validation signature, we know the addressing * has wrapped around, and thus have our chip size. 
*/ - cursize = 0x800; + cursize = 0x10; while (cursize < tp->nvram_size) { - if (tg3_nvram_read(tp, cursize, &val) != 0) + if (tg3_nvram_read_swab(tp, cursize, &val) != 0) return; - if (swab32(val) == TG3_EEPROM_MAGIC) + if (val == magic) break; cursize <<= 1; @@ -8498,6 +8780,15 @@ static void __devinit tg3_get_nvram_size { u32 val; + if (tg3_nvram_read_swab(tp, 0, &val) != 0) + return; + + /* Selfboot format */ + if (val != TG3_EEPROM_MAGIC) { + tg3_get_eeprom_size(tp); + return; + } + if (tg3_nvram_read(tp, 0xf0, &val) == 0) { if (val != 0) { tp->nvram_size = (val >> 16) * 1024; @@ -8621,6 +8912,44 @@ static void __devinit tg3_get_5752_nvram } } +static void __devinit tg3_get_5787_nvram_info(struct tg3 *tp) +{ + u32 nvcfg1; + + nvcfg1 = tr32(NVRAM_CFG1); + + switch (nvcfg1 & NVRAM_CFG1_5752VENDOR_MASK) { + case FLASH_5787VENDOR_ATMEL_EEPROM_64KHZ: + case FLASH_5787VENDOR_ATMEL_EEPROM_376KHZ: + case FLASH_5787VENDOR_MICRO_EEPROM_64KHZ: + case FLASH_5787VENDOR_MICRO_EEPROM_376KHZ: + tp->nvram_jedecnum = JEDEC_ATMEL; + tp->tg3_flags |= TG3_FLAG_NVRAM_BUFFERED; + tp->nvram_pagesize = ATMEL_AT24C512_CHIP_SIZE; + + nvcfg1 &= ~NVRAM_CFG1_COMPAT_BYPASS; + tw32(NVRAM_CFG1, nvcfg1); + break; + case FLASH_5752VENDOR_ATMEL_FLASH_BUFFERED: + case FLASH_5755VENDOR_ATMEL_FLASH_1: + case FLASH_5755VENDOR_ATMEL_FLASH_2: + case FLASH_5755VENDOR_ATMEL_FLASH_3: + tp->nvram_jedecnum = JEDEC_ATMEL; + tp->tg3_flags |= TG3_FLAG_NVRAM_BUFFERED; + tp->tg3_flags2 |= TG3_FLG2_FLASH; + tp->nvram_pagesize = 264; + break; + case FLASH_5752VENDOR_ST_M45PE10: + case FLASH_5752VENDOR_ST_M45PE20: + case FLASH_5752VENDOR_ST_M45PE40: + tp->nvram_jedecnum = JEDEC_ST; + tp->tg3_flags |= TG3_FLAG_NVRAM_BUFFERED; + tp->tg3_flags2 |= TG3_FLG2_FLASH; + tp->nvram_pagesize = 256; + break; + } +} + /* Chips other than 5700/5701 use the NVRAM for fetching info. 
*/ static void __devinit tg3_nvram_init(struct tg3 *tp) { @@ -8656,6 +8985,8 @@ static void __devinit tg3_nvram_init(str if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5752) tg3_get_5752_nvram_info(tp); + else if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5787) + tg3_get_5787_nvram_info(tp); else tg3_get_nvram_info(tp); @@ -8725,6 +9056,34 @@ static int tg3_nvram_exec_cmd(struct tg3 return 0; } +static u32 tg3_nvram_phys_addr(struct tg3 *tp, u32 addr) +{ + if ((tp->tg3_flags & TG3_FLAG_NVRAM) && + (tp->tg3_flags & TG3_FLAG_NVRAM_BUFFERED) && + (tp->tg3_flags2 & TG3_FLG2_FLASH) && + (tp->nvram_jedecnum == JEDEC_ATMEL)) + + addr = ((addr / tp->nvram_pagesize) << + ATMEL_AT45DB0X1B_PAGE_POS) + + (addr % tp->nvram_pagesize); + + return addr; +} + +static u32 tg3_nvram_logical_addr(struct tg3 *tp, u32 addr) +{ + if ((tp->tg3_flags & TG3_FLAG_NVRAM) && + (tp->tg3_flags & TG3_FLAG_NVRAM_BUFFERED) && + (tp->tg3_flags2 & TG3_FLG2_FLASH) && + (tp->nvram_jedecnum == JEDEC_ATMEL)) + + addr = ((addr >> ATMEL_AT45DB0X1B_PAGE_POS) * + tp->nvram_pagesize) + + (addr & ((1 << ATMEL_AT45DB0X1B_PAGE_POS) - 1)); + + return addr; +} + static int tg3_nvram_read(struct tg3 *tp, u32 offset, u32 *val) { int ret; @@ -8737,14 +9096,7 @@ static int tg3_nvram_read(struct tg3 *tp if (!(tp->tg3_flags & TG3_FLAG_NVRAM)) return tg3_nvram_read_using_eeprom(tp, offset, val); - if ((tp->tg3_flags & TG3_FLAG_NVRAM_BUFFERED) && - (tp->tg3_flags2 & TG3_FLG2_FLASH) && - (tp->nvram_jedecnum == JEDEC_ATMEL)) { - - offset = ((offset / tp->nvram_pagesize) << - ATMEL_AT45DB0X1B_PAGE_POS) + - (offset % tp->nvram_pagesize); - } + offset = tg3_nvram_phys_addr(tp, offset); if (offset > NVRAM_ADDR_MSK) return -EINVAL; @@ -8769,6 +9121,16 @@ static int tg3_nvram_read(struct tg3 *tp return ret; } +static int tg3_nvram_read_swab(struct tg3 *tp, u32 offset, u32 *val) +{ + int err; + u32 tmp; + + err = tg3_nvram_read(tp, offset, &tmp); + *val = swab32(tmp); + return err; +} + static int 
tg3_nvram_write_block_using_eeprom(struct tg3 *tp, u32 offset, u32 len, u8 *buf) { @@ -8921,15 +9283,7 @@ static int tg3_nvram_write_block_buffere page_off = offset % tp->nvram_pagesize; - if ((tp->tg3_flags2 & TG3_FLG2_FLASH) && - (tp->nvram_jedecnum == JEDEC_ATMEL)) { - - phy_addr = ((offset / tp->nvram_pagesize) << - ATMEL_AT45DB0X1B_PAGE_POS) + page_off; - } - else { - phy_addr = offset; - } + phy_addr = tg3_nvram_phys_addr(tp, offset); tw32(NVRAM_ADDR, phy_addr); @@ -8944,6 +9298,7 @@ static int tg3_nvram_write_block_buffere nvram_cmd |= NVRAM_CMD_LAST; if ((GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5752) && + (GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5787) && (tp->nvram_jedecnum == JEDEC_ST) && (nvram_cmd & NVRAM_CMD_FIRST)) { @@ -9343,6 +9698,7 @@ static void __devinit tg3_read_partno(st { unsigned char vpd_data[256]; int i; + u32 magic; if (tp->tg3_flags2 & TG3_FLG2_SUN_570X) { /* Sun decided not to put the necessary bits in the @@ -9352,16 +9708,43 @@ static void __devinit tg3_read_partno(st return; } - for (i = 0; i < 256; i += 4) { - u32 tmp; + if (tg3_nvram_read_swab(tp, 0x0, &magic)) + return; - if (tg3_nvram_read(tp, 0x100 + i, &tmp)) - goto out_not_found; + if (magic == TG3_EEPROM_MAGIC) { + for (i = 0; i < 256; i += 4) { + u32 tmp; + + if (tg3_nvram_read(tp, 0x100 + i, &tmp)) + goto out_not_found; + + vpd_data[i + 0] = ((tmp >> 0) & 0xff); + vpd_data[i + 1] = ((tmp >> 8) & 0xff); + vpd_data[i + 2] = ((tmp >> 16) & 0xff); + vpd_data[i + 3] = ((tmp >> 24) & 0xff); + } + } else { + int vpd_cap; - vpd_data[i + 0] = ((tmp >> 0) & 0xff); - vpd_data[i + 1] = ((tmp >> 8) & 0xff); - vpd_data[i + 2] = ((tmp >> 16) & 0xff); - vpd_data[i + 3] = ((tmp >> 24) & 0xff); + vpd_cap = pci_find_capability(tp->pdev, PCI_CAP_ID_VPD); + for (i = 0; i < 256; i += 4) { + u32 tmp, j = 0; + u16 tmp16; + + pci_write_config_word(tp->pdev, vpd_cap + PCI_VPD_ADDR, + i); + while (j++ < 100) { + pci_read_config_word(tp->pdev, vpd_cap + + PCI_VPD_ADDR, &tmp16); + if (tmp16 
& 0x8000) + break; + msleep(1); + } + pci_read_config_dword(tp->pdev, vpd_cap + PCI_VPD_DATA, + &tmp); + tmp = cpu_to_le32(tmp); + memcpy(&vpd_data[i], &tmp, 4); + } } /* Now parse and find the part number. */ @@ -9408,6 +9791,46 @@ out_not_found: strcpy(tp->board_part_number, "none"); } +static void __devinit tg3_read_fw_ver(struct tg3 *tp) +{ + u32 val, offset, start; + + if (tg3_nvram_read_swab(tp, 0, &val)) + return; + + if (val != TG3_EEPROM_MAGIC) + return; + + if (tg3_nvram_read_swab(tp, 0xc, &offset) || + tg3_nvram_read_swab(tp, 0x4, &start)) + return; + + offset = tg3_nvram_logical_addr(tp, offset); + if (tg3_nvram_read_swab(tp, offset, &val)) + return; + + if ((val & 0xfc000000) == 0x0c000000) { + u32 ver_offset, addr; + int i; + + if (tg3_nvram_read_swab(tp, offset + 4, &val) || + tg3_nvram_read_swab(tp, offset + 8, &ver_offset)) + return; + + if (val != 0) + return; + + addr = offset + ver_offset - start; + for (i = 0; i < 16; i += 4) { + if (tg3_nvram_read(tp, addr + i, &val)) + return; + + val = cpu_to_le32(val); + memcpy(tp->fw_ver + i, &val, 4); + } + } +} + #ifdef CONFIG_SPARC64 static int __devinit tg3_is_sun_570X(struct tg3 *tp) { @@ -9575,6 +9998,7 @@ static int __devinit tg3_get_invariants( if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5750 || GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5752 || + GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5787 || (tp->tg3_flags2 & TG3_FLG2_5780_CLASS)) tp->tg3_flags2 |= TG3_FLG2_5750_PLUS; @@ -9582,12 +10006,18 @@ static int __devinit tg3_get_invariants( (tp->tg3_flags2 & TG3_FLG2_5750_PLUS)) tp->tg3_flags2 |= TG3_FLG2_5705_PLUS; - if (tp->tg3_flags2 & TG3_FLG2_5750_PLUS) - tp->tg3_flags2 |= TG3_FLG2_HW_TSO; + if (tp->tg3_flags2 & TG3_FLG2_5750_PLUS) { + if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5787) { + tp->tg3_flags2 |= TG3_FLG2_HW_TSO_2; + tp->tg3_flags2 |= TG3_FLG2_1SHOT_MSI; + } else + tp->tg3_flags2 |= TG3_FLG2_HW_TSO_1; + } if (GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5705 && 
GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5750 && - GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5752) + GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5752 && + GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5787) tp->tg3_flags2 |= TG3_FLG2_JUMBO_CAPABLE; if (pci_find_capability(tp->pdev, PCI_CAP_ID_EXP) != 0) @@ -9744,7 +10174,7 @@ static int __devinit tg3_get_invariants( tp->grc_local_ctrl |= GRC_LCLCTRL_GPIO_OE3; /* Force the chip into D0. */ - err = tg3_set_power_state(tp, 0); + err = tg3_set_power_state(tp, PCI_D0); if (err) { printk(KERN_ERR PFX "(%s) transition to D0 failed\n", pci_name(tp->pdev)); @@ -9797,7 +10227,8 @@ static int __devinit tg3_get_invariants( if (tp->pci_chip_rev_id == CHIPREV_ID_5704_A0) tp->tg3_flags2 |= TG3_FLG2_PHY_5704_A0_BUG; - if (tp->tg3_flags2 & TG3_FLG2_5705_PLUS) + if ((tp->tg3_flags2 & TG3_FLG2_5705_PLUS) && + (GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5787)) tp->tg3_flags2 |= TG3_FLG2_PHY_BER_BUG; tp->coalesce_mode = 0; @@ -9897,6 +10328,7 @@ static int __devinit tg3_get_invariants( } tg3_read_partno(tp); + tg3_read_fw_ver(tp); if (tp->tg3_flags2 & TG3_FLG2_PHY_SERDES) { tp->tg3_flags &= ~TG3_FLAG_USE_MI_INTERRUPT; @@ -9932,10 +10364,13 @@ static int __devinit tg3_get_invariants( else tp->tg3_flags &= ~TG3_FLAG_POLL_SERDES; - /* It seems all chips can get confused if TX buffers + /* All chips before 5787 can get confused if TX buffers * straddle the 4GB address boundary in some cases. 
*/ - tp->dev->hard_start_xmit = tg3_start_xmit; + if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5787) + tp->dev->hard_start_xmit = tg3_start_xmit; + else + tp->dev->hard_start_xmit = tg3_start_xmit_dma_bug; tp->rx_offset = 2; if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701 && @@ -10456,7 +10891,6 @@ static void __devinit tg3_init_link_conf tp->link_config.speed = SPEED_INVALID; tp->link_config.duplex = DUPLEX_INVALID; tp->link_config.autoneg = AUTONEG_ENABLE; - netif_carrier_off(tp->dev); tp->link_config.active_speed = SPEED_INVALID; tp->link_config.active_duplex = DUPLEX_INVALID; tp->link_config.phy_is_low_power = 0; @@ -10515,6 +10949,7 @@ static char * __devinit tg3_phy_string(s case PHY_ID_BCM5752: return "5752"; case PHY_ID_BCM5714: return "5714"; case PHY_ID_BCM5780: return "5780"; + case PHY_ID_BCM5787: return "5787"; case PHY_ID_BCM8002: return "8002/serdes"; case 0: return "serdes"; default: return "unknown"; @@ -10812,11 +11247,12 @@ static int __devinit tg3_init_one(struct tp->tg3_flags2 |= TG3_FLG2_TSO_CAPABLE; } - /* TSO is off by default, user can enable using ethtool. */ -#if 0 - if (tp->tg3_flags2 & TG3_FLG2_TSO_CAPABLE) + /* TSO is on by default on chips that support hardware TSO. + * Firmware TSO on older chips gives lower performance, so it + * is off by default, but can be enabled using ethtool. + */ + if (tp->tg3_flags2 & TG3_FLG2_HW_TSO) dev->features |= NETIF_F_TSO; -#endif #endif @@ -10860,7 +11296,11 @@ static int __devinit tg3_init_one(struct * checksumming. 
*/ if ((tp->tg3_flags & TG3_FLAG_BROKEN_CHECKSUMS) == 0) { - dev->features |= NETIF_F_SG | NETIF_F_IP_CSUM; + if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5787) + dev->features |= NETIF_F_HW_CSUM; + else + dev->features |= NETIF_F_IP_CSUM; + dev->features |= NETIF_F_SG; tp->tg3_flags |= TG3_FLAG_RX_CHECKSUMS; } else tp->tg3_flags &= ~TG3_FLAG_RX_CHECKSUMS; @@ -10911,6 +11351,8 @@ static int __devinit tg3_init_one(struct printk(KERN_INFO "%s: dma_rwctrl[%08x]\n", dev->name, tp->dma_rwctrl); + netif_carrier_off(tp->dev); + return 0; err_out_iounmap: @@ -11006,7 +11448,7 @@ static int tg3_resume(struct pci_dev *pd pci_restore_state(tp->pdev); - err = tg3_set_power_state(tp, 0); + err = tg3_set_power_state(tp, PCI_D0); if (err) return err; diff -puN drivers/net/tg3.h~git-net drivers/net/tg3.h --- devel/drivers/net/tg3.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/tg3.h 2006-03-17 23:03:48.000000000 -0800 @@ -138,6 +138,7 @@ #define ASIC_REV_5752 0x06 #define ASIC_REV_5780 0x08 #define ASIC_REV_5714 0x09 +#define ASIC_REV_5787 0x0b #define GET_CHIP_REV(CHIP_REV_ID) ((CHIP_REV_ID) >> 8) #define CHIPREV_5700_AX 0x70 #define CHIPREV_5700_BX 0x71 @@ -1393,6 +1394,7 @@ #define GRC_MDI_CTRL 0x00006844 #define GRC_SEEPROM_DELAY 0x00006848 /* 0x684c --> 0x6c00 unused */ +#define GRC_FASTBOOT_PC 0x00006894 /* 5752, 5755, 5787 */ /* 0x6c00 --> 0x7000 unused */ @@ -1436,6 +1438,13 @@ #define FLASH_5752VENDOR_ST_M45PE10 0x02400000 #define FLASH_5752VENDOR_ST_M45PE20 0x02400002 #define FLASH_5752VENDOR_ST_M45PE40 0x02400001 +#define FLASH_5755VENDOR_ATMEL_FLASH_1 0x03400001 +#define FLASH_5755VENDOR_ATMEL_FLASH_2 0x03400002 +#define FLASH_5755VENDOR_ATMEL_FLASH_3 0x03400000 +#define FLASH_5787VENDOR_ATMEL_EEPROM_64KHZ 0x03000003 +#define FLASH_5787VENDOR_ATMEL_EEPROM_376KHZ 0x03000002 +#define FLASH_5787VENDOR_MICRO_EEPROM_64KHZ 0x03000000 +#define FLASH_5787VENDOR_MICRO_EEPROM_376KHZ 0x02000000 #define NVRAM_CFG1_5752PAGE_SIZE_MASK 0x70000000 #define 
FLASH_5752PAGE_SIZE_256 0x00000000 #define FLASH_5752PAGE_SIZE_512 0x10000000 @@ -2184,7 +2193,7 @@ struct tg3 { #define TG3_FLG2_PHY_SERDES 0x00002000 #define TG3_FLG2_CAPACITIVE_COUPLING 0x00004000 #define TG3_FLG2_FLASH 0x00008000 -#define TG3_FLG2_HW_TSO 0x00010000 +#define TG3_FLG2_HW_TSO_1 0x00010000 #define TG3_FLG2_SERDES_PREEMPHASIS 0x00020000 #define TG3_FLG2_5705_PLUS 0x00040000 #define TG3_FLG2_5750_PLUS 0x00080000 @@ -2197,6 +2206,9 @@ struct tg3 { #define TG3_FLG2_PARALLEL_DETECT 0x01000000 #define TG3_FLG2_ICH_WORKAROUND 0x02000000 #define TG3_FLG2_5780_CLASS 0x04000000 +#define TG3_FLG2_HW_TSO_2 0x08000000 +#define TG3_FLG2_HW_TSO (TG3_FLG2_HW_TSO_1 | TG3_FLG2_HW_TSO_2) +#define TG3_FLG2_1SHOT_MSI 0x10000000 u32 split_mode_max_reqs; #define SPLIT_MODE_5704_MAX_REQ 3 @@ -2246,6 +2258,7 @@ struct tg3 { #define PHY_ID_BCM5752 0x60008100 #define PHY_ID_BCM5714 0x60008340 #define PHY_ID_BCM5780 0x60008350 +#define PHY_ID_BCM5787 0xbc050ce0 #define PHY_ID_BCM8002 0x60010140 #define PHY_ID_INVALID 0xffffffff #define PHY_ID_REV_MASK 0x0000000f @@ -2257,6 +2270,7 @@ struct tg3 { u32 led_ctrl; char board_part_number[24]; + char fw_ver[16]; u32 nic_sram_data_cfg; u32 pci_clock_ctrl; struct pci_dev *pdev_peer; @@ -2270,7 +2284,8 @@ struct tg3 { (X) == PHY_ID_BCM5703 || (X) == PHY_ID_BCM5704 || \ (X) == PHY_ID_BCM5705 || (X) == PHY_ID_BCM5750 || \ (X) == PHY_ID_BCM5752 || (X) == PHY_ID_BCM5714 || \ - (X) == PHY_ID_BCM5780 || (X) == PHY_ID_BCM8002) + (X) == PHY_ID_BCM5780 || (X) == PHY_ID_BCM5787 || \ + (X) == PHY_ID_BCM8002) struct tg3_hw_stats *hw_stats; dma_addr_t stats_mapping; diff -puN drivers/net/wan/sbni.c~git-net drivers/net/wan/sbni.c --- devel/drivers/net/wan/sbni.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/drivers/net/wan/sbni.c 2006-03-17 23:03:48.000000000 -0800 @@ -1495,8 +1495,7 @@ module_param(skip_pci_probe, bool, 0); MODULE_LICENSE("GPL"); -int -init_module( void ) +int __init init_module( void ) { struct net_device *dev; int 
err;
diff -puN include/asm-alpha/socket.h~git-net include/asm-alpha/socket.h
--- devel/include/asm-alpha/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-alpha/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -51,6 +51,7 @@
 #define SCM_TIMESTAMP SO_TIMESTAMP
 #define SO_PEERSEC 30
+#define SO_PASSSEC 31
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION 19
diff -puN include/asm-arm26/socket.h~git-net include/asm-arm26/socket.h
--- devel/include/asm-arm26/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-arm26/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -48,5 +48,6 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 32
 #endif /* _ASM_SOCKET_H */
diff -puN include/asm-arm/socket.h~git-net include/asm-arm/socket.h
--- devel/include/asm-arm/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-arm/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -48,5 +48,6 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 34
 #endif /* _ASM_SOCKET_H */
diff -puN include/asm-cris/socket.h~git-net include/asm-cris/socket.h
--- devel/include/asm-cris/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-cris/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -50,6 +50,7 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 32
 #endif /* _ASM_SOCKET_H */
diff -puN include/asm-frv/socket.h~git-net include/asm-frv/socket.h
--- devel/include/asm-frv/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-frv/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -48,6 +48,7 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 32
 #endif /* _ASM_SOCKET_H */
diff -puN include/asm-h8300/socket.h~git-net include/asm-h8300/socket.h
--- devel/include/asm-h8300/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-h8300/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -48,5 +48,6 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 32
 #endif /* _ASM_SOCKET_H */
diff -puN include/asm-i386/socket.h~git-net include/asm-i386/socket.h
--- devel/include/asm-i386/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-i386/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -48,5 +48,6 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 34
 #endif /* _ASM_SOCKET_H */
diff -puN include/asm-ia64/socket.h~git-net include/asm-ia64/socket.h
--- devel/include/asm-ia64/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-ia64/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -57,5 +57,6 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 34
 #endif /* _ASM_IA64_SOCKET_H */
diff -puN include/asm-m32r/socket.h~git-net include/asm-m32r/socket.h
--- devel/include/asm-m32r/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-m32r/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -48,5 +48,6 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 32
 #endif /* _ASM_M32R_SOCKET_H */
diff -puN include/asm-m68k/socket.h~git-net include/asm-m68k/socket.h
--- devel/include/asm-m68k/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-m68k/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -48,5 +48,6 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 32
 #endif /* _ASM_SOCKET_H */
diff -puN include/asm-mips/socket.h~git-net include/asm-mips/socket.h
--- devel/include/asm-mips/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-mips/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -69,6 +69,7 @@ To add: #define SO_REUSEPORT 0x0200 /* A
 #define SO_PEERSEC 30
 #define SO_SNDBUFFORCE 31
 #define SO_RCVBUFFORCE 33
+#define SO_PASSSEC 34
 #ifdef __KERNEL__
diff -puN include/asm-parisc/socket.h~git-net include/asm-parisc/socket.h
--- devel/include/asm-parisc/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-parisc/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -48,5 +48,6 @@
 #define SO_ACCEPTCONN 0x401c
 #define SO_PEERSEC 0x401d
+#define SO_PASSSEC 0x401e
 #endif /* _ASM_SOCKET_H */
diff -puN include/asm-powerpc/socket.h~git-net include/asm-powerpc/socket.h
--- devel/include/asm-powerpc/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-powerpc/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -55,5 +55,6 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 34
 #endif /* _ASM_POWERPC_SOCKET_H */
diff -puN include/asm-s390/socket.h~git-net include/asm-s390/socket.h
--- devel/include/asm-s390/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-s390/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -56,5 +56,6 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 32
 #endif /* _ASM_SOCKET_H */
diff -puN include/asm-sh/socket.h~git-net include/asm-sh/socket.h
--- devel/include/asm-sh/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-sh/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -48,5 +48,6 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 32
 #endif /* __ASM_SH_SOCKET_H */
diff -puN include/asm-sparc64/socket.h~git-net include/asm-sparc64/socket.h
--- devel/include/asm-sparc64/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/asm-sparc64/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -48,6 +48,7 @@
 #define SCM_TIMESTAMP SO_TIMESTAMP
 #define SO_PEERSEC 0x001e
+#define SO_PASSSEC 0x001f
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION 0x5001
diff -puN include/asm-sparc/socket.h~git-net include/asm-sparc/socket.h
--- devel/include/asm-sparc/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++
devel-akpm/include/asm-sparc/socket.h 2006-03-17 23:03:48.000000000 -0800 @@ -47,7 +47,8 @@ #define SO_TIMESTAMP 0x001d #define SCM_TIMESTAMP SO_TIMESTAMP -#define SO_PEERSEC 0x100e +#define SO_PEERSEC 0x001e +#define SO_PASSSEC 0x001f /* Security levels - as per NRL IPv6 - don't actually do anything */ #define SO_SECURITY_AUTHENTICATION 0x5001 diff -puN include/asm-v850/socket.h~git-net include/asm-v850/socket.h --- devel/include/asm-v850/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/asm-v850/socket.h 2006-03-17 23:03:48.000000000 -0800 @@ -48,5 +48,6 @@ #define SO_ACCEPTCONN 30 #define SO_PEERSEC 31 +#define SO_PASSSEC 32 #endif /* __V850_SOCKET_H__ */ diff -puN include/asm-x86_64/socket.h~git-net include/asm-x86_64/socket.h --- devel/include/asm-x86_64/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/asm-x86_64/socket.h 2006-03-17 23:03:48.000000000 -0800 @@ -48,5 +48,6 @@ #define SO_ACCEPTCONN 30 #define SO_PEERSEC 31 +#define SO_PASSSEC 34 #endif /* _ASM_SOCKET_H */ diff -puN include/asm-xtensa/socket.h~git-net include/asm-xtensa/socket.h --- devel/include/asm-xtensa/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/asm-xtensa/socket.h 2006-03-17 23:03:48.000000000 -0800 @@ -59,5 +59,6 @@ #define SO_ACCEPTCONN 30 #define SO_PEERSEC 31 +#define SO_PASSSEC 32 #endif /* _XTENSA_SOCKET_H */ diff -puN include/linux/dccp.h~git-net include/linux/dccp.h --- devel/include/linux/dccp.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/dccp.h 2006-03-17 23:03:48.000000000 -0800 @@ -18,7 +18,7 @@ * @dccph_seq - sequence number high or low order 24 bits, depends on dccph_x */ struct dccp_hdr { - __u16 dccph_sport, + __be16 dccph_sport, dccph_dport; __u8 dccph_doff; #if defined(__LITTLE_ENDIAN_BITFIELD) @@ -32,18 +32,18 @@ struct dccp_hdr { #endif __u16 dccph_checksum; #if defined(__LITTLE_ENDIAN_BITFIELD) - __u32 dccph_x:1, + __u8 dccph_x:1, dccph_type:4, - 
dccph_reserved:3, - dccph_seq:24; + dccph_reserved:3; #elif defined(__BIG_ENDIAN_BITFIELD) - __u32 dccph_reserved:3, + __u8 dccph_reserved:3, dccph_type:4, - dccph_x:1, - dccph_seq:24; + dccph_x:1; #else #error "Adjust your defines" #endif + __u8 dccph_seq2; + __be16 dccph_seq; }; /** @@ -52,7 +52,7 @@ struct dccp_hdr { * @dccph_seq_low - low 24 bits of a 48 bit seq packet */ struct dccp_hdr_ext { - __u32 dccph_seq_low; + __be32 dccph_seq_low; }; /** @@ -62,7 +62,7 @@ struct dccp_hdr_ext { * @dccph_req_options - list of options (must be a multiple of 32 bits */ struct dccp_hdr_request { - __u32 dccph_req_service; + __be32 dccph_req_service; }; /** * struct dccp_hdr_ack_bits - acknowledgment bits common to most packets @@ -71,9 +71,9 @@ struct dccp_hdr_request { * @dccph_resp_ack_nr_low - 48 bit ack number low order bits, contains GSR */ struct dccp_hdr_ack_bits { - __u32 dccph_reserved1:8, - dccph_ack_nr_high:24; - __u32 dccph_ack_nr_low; + __be16 dccph_reserved1; + __be16 dccph_ack_nr_high; + __be32 dccph_ack_nr_low; }; /** * struct dccp_hdr_response - Conection initiation response header @@ -85,7 +85,7 @@ struct dccp_hdr_ack_bits { */ struct dccp_hdr_response { struct dccp_hdr_ack_bits dccph_resp_ack; - __u32 dccph_resp_service; + __be32 dccph_resp_service; }; /** @@ -154,6 +154,10 @@ enum { DCCPO_MANDATORY = 1, DCCPO_MIN_RESERVED = 3, DCCPO_MAX_RESERVED = 31, + DCCPO_CHANGE_L = 32, + DCCPO_CONFIRM_L = 33, + DCCPO_CHANGE_R = 34, + DCCPO_CONFIRM_R = 35, DCCPO_NDP_COUNT = 37, DCCPO_ACK_VECTOR_0 = 38, DCCPO_ACK_VECTOR_1 = 39, @@ -168,7 +172,9 @@ enum { /* DCCP features */ enum { DCCPF_RESERVED = 0, + DCCPF_CCID = 1, DCCPF_SEQUENCE_WINDOW = 3, + DCCPF_ACK_RATIO = 5, DCCPF_SEND_ACK_VECTOR = 6, DCCPF_SEND_NDP_COUNT = 7, /* 10-127 reserved */ @@ -176,9 +182,18 @@ enum { DCCPF_MAX_CCID_SPECIFIC = 255, }; +/* this structure is argument to DCCP_SOCKOPT_CHANGE_X */ +struct dccp_so_feat { + __u8 dccpsf_feat; + __u8 *dccpsf_val; + __u8 dccpsf_len; +}; + /* DCCP socket options 
*/ #define DCCP_SOCKOPT_PACKET_SIZE 1 #define DCCP_SOCKOPT_SERVICE 2 +#define DCCP_SOCKOPT_CHANGE_L 3 +#define DCCP_SOCKOPT_CHANGE_R 4 #define DCCP_SOCKOPT_CCID_RX_INFO 128 #define DCCP_SOCKOPT_CCID_TX_INFO 192 @@ -254,16 +269,12 @@ static inline unsigned int dccp_basic_hd static inline __u64 dccp_hdr_seq(const struct sk_buff *skb) { const struct dccp_hdr *dh = dccp_hdr(skb); -#if defined(__LITTLE_ENDIAN_BITFIELD) - __u64 seq_nr = ntohl(dh->dccph_seq << 8); -#elif defined(__BIG_ENDIAN_BITFIELD) - __u64 seq_nr = ntohl(dh->dccph_seq); -#else -#error "Adjust your defines" -#endif + __u64 seq_nr = ntohs(dh->dccph_seq); if (dh->dccph_x != 0) seq_nr = (seq_nr << 32) + ntohl(dccp_hdrx(skb)->dccph_seq_low); + else + seq_nr += (u32)dh->dccph_seq2 << 16; return seq_nr; } @@ -281,13 +292,7 @@ static inline struct dccp_hdr_ack_bits * static inline u64 dccp_hdr_ack_seq(const struct sk_buff *skb) { const struct dccp_hdr_ack_bits *dhack = dccp_hdr_ack_bits(skb); -#if defined(__LITTLE_ENDIAN_BITFIELD) - return (((u64)ntohl(dhack->dccph_ack_nr_high << 8)) << 32) + ntohl(dhack->dccph_ack_nr_low); -#elif defined(__BIG_ENDIAN_BITFIELD) - return (((u64)ntohl(dhack->dccph_ack_nr_high)) << 32) + ntohl(dhack->dccph_ack_nr_low); -#else -#error "Adjust your defines" -#endif + return ((u64)ntohs(dhack->dccph_ack_nr_high) << 32) + ntohl(dhack->dccph_ack_nr_low); } static inline struct dccp_hdr_response *dccp_hdr_response(struct sk_buff *skb) @@ -314,38 +319,60 @@ static inline unsigned int dccp_hdr_len( /* initial values for each feature */ #define DCCPF_INITIAL_SEQUENCE_WINDOW 100 -/* FIXME: for now we're using CCID 3 (TFRC) */ -#define DCCPF_INITIAL_CCID 3 -#define DCCPF_INITIAL_SEND_ACK_VECTOR 0 +#define DCCPF_INITIAL_ACK_RATIO 2 +#define DCCPF_INITIAL_CCID 2 +#define DCCPF_INITIAL_SEND_ACK_VECTOR 1 /* FIXME: for now we're default to 1 but it should really be 0 */ #define DCCPF_INITIAL_SEND_NDP_COUNT 1 #define DCCP_NDP_LIMIT 0xFFFFFF /** - * struct dccp_options - option values for a DCCP 
connection - * @dccpo_sequence_window - Sequence Window Feature (section 7.5.2) - * @dccpo_ccid - Congestion Control Id (CCID) (section 10) - * @dccpo_send_ack_vector - Send Ack Vector Feature (section 11.5) - * @dccpo_send_ndp_count - Send NDP Count Feature (7.7.2) + * struct dccp_minisock - Minimal DCCP connection representation + * + * Will be used to pass the state from dccp_request_sock to dccp_sock. + * + * @dccpms_sequence_window - Sequence Window Feature (section 7.5.2) + * @dccpms_ccid - Congestion Control Id (CCID) (section 10) + * @dccpms_send_ack_vector - Send Ack Vector Feature (section 11.5) + * @dccpms_send_ndp_count - Send NDP Count Feature (7.7.2) */ -struct dccp_options { - __u64 dccpo_sequence_window; - __u8 dccpo_rx_ccid; - __u8 dccpo_tx_ccid; - __u8 dccpo_send_ack_vector; - __u8 dccpo_send_ndp_count; +struct dccp_minisock { + __u64 dccpms_sequence_window; + __u8 dccpms_rx_ccid; + __u8 dccpms_tx_ccid; + __u8 dccpms_send_ack_vector; + __u8 dccpms_send_ndp_count; + __u8 dccpms_ack_ratio; + struct list_head dccpms_pending; + struct list_head dccpms_conf; +}; + +struct dccp_opt_conf { + __u8 *dccpoc_val; + __u8 dccpoc_len; +}; + +struct dccp_opt_pend { + struct list_head dccpop_node; + __u8 dccpop_type; + __u8 dccpop_feat; + __u8 *dccpop_val; + __u8 dccpop_len; + int dccpop_conf; + struct dccp_opt_conf *dccpop_sc; }; -extern void __dccp_options_init(struct dccp_options *dccpo); -extern void dccp_options_init(struct dccp_options *dccpo); +extern void __dccp_minisock_init(struct dccp_minisock *dmsk); +extern void dccp_minisock_init(struct dccp_minisock *dmsk); + extern int dccp_parse_options(struct sock *sk, struct sk_buff *skb); struct dccp_request_sock { struct inet_request_sock dreq_inet_rsk; __u64 dreq_iss; __u64 dreq_isr; - __u32 dreq_service; + __be32 dreq_service; }; static inline struct dccp_request_sock *dccp_rsk(const struct request_sock *req) @@ -373,13 +400,13 @@ enum dccp_role { struct dccp_service_list { __u32 dccpsl_nr; - __u32 
dccpsl_list[0]; + __be32 dccpsl_list[0]; }; #define DCCP_SERVICE_INVALID_VALUE htonl((__u32)-1) static inline int dccp_list_has_service(const struct dccp_service_list *sl, - const u32 service) + const __be32 service) { if (likely(sl != NULL)) { u32 i = sl->dccpsl_nr; @@ -425,17 +452,17 @@ struct dccp_sock { __u64 dccps_gss; __u64 dccps_gsr; __u64 dccps_gar; - __u32 dccps_service; + __be32 dccps_service; struct dccp_service_list *dccps_service_list; struct timeval dccps_timestamp_time; __u32 dccps_timestamp_echo; __u32 dccps_packet_size; + __u16 dccps_l_ack_ratio; + __u16 dccps_r_ack_ratio; unsigned long dccps_ndp_count; __u32 dccps_mss_cache; - struct dccp_options dccps_options; + struct dccp_minisock dccps_minisock; struct dccp_ackvec *dccps_hc_rx_ackvec; - void *dccps_hc_rx_ccid_private; - void *dccps_hc_tx_ccid_private; struct ccid *dccps_hc_rx_ccid; struct ccid *dccps_hc_tx_ccid; struct dccp_options_received dccps_options_received; @@ -450,6 +477,11 @@ static inline struct dccp_sock *dccp_sk( return (struct dccp_sock *)sk; } +static inline struct dccp_minisock *dccp_msk(const struct sock *sk) +{ + return (struct dccp_minisock *)&dccp_sk(sk)->dccps_minisock; +} + static inline int dccp_service_not_initialized(const struct sock *sk) { return dccp_sk(sk)->dccps_service == DCCP_SERVICE_INVALID_VALUE; diff -puN include/linux/dn.h~git-net include/linux/dn.h --- devel/include/linux/dn.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/dn.h 2006-03-17 23:03:48.000000000 -0800 @@ -71,17 +71,17 @@ struct dn_naddr { - unsigned short a_len; - unsigned char a_addr[DN_MAXADDL]; + __le16 a_len; + __u8 a_addr[DN_MAXADDL]; /* Two bytes little endian */ }; struct sockaddr_dn { - unsigned short sdn_family; - unsigned char sdn_flags; - unsigned char sdn_objnum; - unsigned short sdn_objnamel; - unsigned char sdn_objname[DN_MAXOBJL]; + __u16 sdn_family; + __u8 sdn_flags; + __u8 sdn_objnum; + __le16 sdn_objnamel; + __u8 sdn_objname[DN_MAXOBJL]; struct dn_naddr 
sdn_add; }; #define sdn_nodeaddrl sdn_add.a_len /* Node address length */ @@ -93,38 +93,38 @@ struct sockaddr_dn * DECnet set/get DSO_CONDATA, DSO_DISDATA (optional data) structure */ struct optdata_dn { - unsigned short opt_status; /* Extended status return */ + __le16 opt_status; /* Extended status return */ #define opt_sts opt_status - unsigned short opt_optl; /* Length of user data */ - unsigned char opt_data[16]; /* User data */ + __le16 opt_optl; /* Length of user data */ + __u8 opt_data[16]; /* User data */ }; struct accessdata_dn { - unsigned char acc_accl; - unsigned char acc_acc[DN_MAXACCL]; - unsigned char acc_passl; - unsigned char acc_pass[DN_MAXACCL]; - unsigned char acc_userl; - unsigned char acc_user[DN_MAXACCL]; + __u8 acc_accl; + __u8 acc_acc[DN_MAXACCL]; + __u8 acc_passl; + __u8 acc_pass[DN_MAXACCL]; + __u8 acc_userl; + __u8 acc_user[DN_MAXACCL]; }; /* * DECnet logical link information structure */ struct linkinfo_dn { - unsigned short idn_segsize; /* Segment size for link */ - unsigned char idn_linkstate; /* Logical link state */ + __le16 idn_segsize; /* Segment size for link */ + __u8 idn_linkstate; /* Logical link state */ }; /* * Ethernet address format (for DECnet) */ union etheraddress { - unsigned char dne_addr[6]; /* Full ethernet address */ + __u8 dne_addr[6]; /* Full ethernet address */ struct { - unsigned char dne_hiord[4]; /* DECnet HIORD prefix */ - unsigned char dne_nodeaddr[2]; /* DECnet node address */ + __u8 dne_hiord[4]; /* DECnet HIORD prefix */ + __u8 dne_nodeaddr[2]; /* DECnet node address */ } dne_remote; }; @@ -133,7 +133,7 @@ union etheraddress { * DECnet physical socket address format */ struct dn_addr { - unsigned short dna_family; /* AF_DECnet */ + __le16 dna_family; /* AF_DECnet */ union etheraddress dna_netaddr; /* DECnet ethernet address */ }; diff -puN include/linux/icmpv6.h~git-net include/linux/icmpv6.h --- devel/include/linux/icmpv6.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ 
devel-akpm/include/linux/icmpv6.h 2006-03-17 23:03:48.000000000 -0800 @@ -40,14 +40,16 @@ struct icmp6hdr { struct icmpv6_nd_ra { __u8 hop_limit; #if defined(__LITTLE_ENDIAN_BITFIELD) - __u8 reserved:6, + __u8 reserved:4, + router_pref:2, other:1, managed:1; #elif defined(__BIG_ENDIAN_BITFIELD) __u8 managed:1, other:1, - reserved:6; + router_pref:2, + reserved:4; #else #error "Please fix " #endif @@ -70,8 +72,13 @@ struct icmp6hdr { #define icmp6_addrconf_managed icmp6_dataun.u_nd_ra.managed #define icmp6_addrconf_other icmp6_dataun.u_nd_ra.other #define icmp6_rt_lifetime icmp6_dataun.u_nd_ra.rt_lifetime +#define icmp6_router_pref icmp6_dataun.u_nd_ra.router_pref }; +#define ICMPV6_ROUTER_PREF_LOW 0x3 +#define ICMPV6_ROUTER_PREF_MEDIUM 0x0 +#define ICMPV6_ROUTER_PREF_HIGH 0x1 +#define ICMPV6_ROUTER_PREF_INVALID 0x2 #define ICMPV6_DEST_UNREACH 1 #define ICMPV6_PKT_TOOBIG 2 diff -puN include/linux/if.h~git-net include/linux/if.h --- devel/include/linux/if.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/if.h 2006-03-17 23:03:48.000000000 -0800 @@ -33,7 +33,7 @@ #define IFF_LOOPBACK 0x8 /* is a loopback net */ #define IFF_POINTOPOINT 0x10 /* interface is has p-p link */ #define IFF_NOTRAILERS 0x20 /* avoid use of trailers */ -#define IFF_RUNNING 0x40 /* interface running and carrier ok */ +#define IFF_RUNNING 0x40 /* interface RFC2863 OPER_UP */ #define IFF_NOARP 0x80 /* no ARP protocol */ #define IFF_PROMISC 0x100 /* receive all packets */ #define IFF_ALLMULTI 0x200 /* receive all multicast packets*/ @@ -43,12 +43,16 @@ #define IFF_MULTICAST 0x1000 /* Supports multicast */ -#define IFF_VOLATILE (IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|IFF_MASTER|IFF_SLAVE|IFF_RUNNING) - #define IFF_PORTSEL 0x2000 /* can set media type */ #define IFF_AUTOMEDIA 0x4000 /* auto media select active */ #define IFF_DYNAMIC 0x8000 /* dialup device with changing addresses*/ +#define IFF_LOWER_UP 0x10000 /* driver signals L1 up */ +#define IFF_DORMANT 0x20000 /* 
driver signals dormant */ + +#define IFF_VOLATILE (IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|\ + IFF_MASTER|IFF_SLAVE|IFF_RUNNING|IFF_LOWER_UP|IFF_DORMANT) + /* Private (from user) interface flags (netdevice->priv_flags). */ #define IFF_802_1Q_VLAN 0x1 /* 802.1Q VLAN device. */ #define IFF_EBRIDGE 0x2 /* Ethernet bridging device. */ @@ -83,6 +87,22 @@ #define IF_PROTO_FR_ETH_PVC 0x200B #define IF_PROTO_RAW 0x200C /* RAW Socket */ +/* RFC 2863 operational status */ +enum { + IF_OPER_UNKNOWN, + IF_OPER_NOTPRESENT, + IF_OPER_DOWN, + IF_OPER_LOWERLAYERDOWN, + IF_OPER_TESTING, + IF_OPER_DORMANT, + IF_OPER_UP, +}; + +/* link modes */ +enum { + IF_LINK_MODE_DEFAULT, + IF_LINK_MODE_DORMANT, /* limit upward transition to dormant */ +}; /* * Device mapping structure. I'd just gone off and designed a diff -puN include/linux/inetdevice.h~git-net include/linux/inetdevice.h --- devel/include/linux/inetdevice.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/inetdevice.h 2006-03-17 23:03:48.000000000 -0800 @@ -25,6 +25,7 @@ struct ipv4_devconf int arp_filter; int arp_announce; int arp_ignore; + int arp_accept; int medium_id; int no_xfrm; int no_policy; diff -puN include/linux/in.h~git-net include/linux/in.h --- devel/include/linux/in.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/in.h 2006-03-17 23:03:48.000000000 -0800 @@ -72,6 +72,7 @@ struct in_addr { #define IP_FREEBIND 15 #define IP_IPSEC_POLICY 16 #define IP_XFRM_POLICY 17 +#define IP_PASSSEC 18 /* BSD compatibility */ #define IP_RECVRETOPTS IP_RETOPTS diff -puN include/linux/ipv6.h~git-net include/linux/ipv6.h --- devel/include/linux/ipv6.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/ipv6.h 2006-03-17 23:03:48.000000000 -0800 @@ -145,6 +145,15 @@ struct ipv6_devconf { __s32 max_desync_factor; #endif __s32 max_addresses; + __s32 accept_ra_defrtr; + __s32 accept_ra_pinfo; +#ifdef CONFIG_IPV6_ROUTER_PREF + __s32 accept_ra_rtr_pref; + __s32 
rtr_probe_interval; +#ifdef CONFIG_IPV6_ROUTE_INFO + __s32 accept_ra_rt_info_max_plen; +#endif +#endif void *sysctl; }; @@ -167,6 +176,11 @@ enum { DEVCONF_MAX_DESYNC_FACTOR, DEVCONF_MAX_ADDRESSES, DEVCONF_FORCE_MLD_VERSION, + DEVCONF_ACCEPT_RA_DEFRTR, + DEVCONF_ACCEPT_RA_PINFO, + DEVCONF_ACCEPT_RA_RTR_PREF, + DEVCONF_RTR_PROBE_INTERVAL, + DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN, DEVCONF_MAX }; diff -puN include/linux/ipv6_route.h~git-net include/linux/ipv6_route.h --- devel/include/linux/ipv6_route.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/ipv6_route.h 2006-03-17 23:03:48.000000000 -0800 @@ -23,12 +23,22 @@ #define RTF_NONEXTHOP 0x00200000 /* route with no nexthop */ #define RTF_EXPIRES 0x00400000 +#define RTF_ROUTEINFO 0x00800000 /* route information - RA */ + #define RTF_CACHE 0x01000000 /* cache entry */ #define RTF_FLOW 0x02000000 /* flow significant route */ #define RTF_POLICY 0x04000000 /* policy route */ +#define RTF_PREF(pref) ((pref) << 27) +#define RTF_PREF_MASK 0x18000000 + #define RTF_LOCAL 0x80000000 +#ifdef __KERNEL__ +#define IPV6_EXTRACT_PREF(flag) (((flag) & RTF_PREF_MASK) >> 27) +#define IPV6_DECODE_PREF(pref) ((pref) ^ 2) /* 1:low,2:med,3:high */ +#endif + struct in6_rtmsg { struct in6_addr rtmsg_dst; struct in6_addr rtmsg_src; diff -puN include/linux/irda.h~git-net include/linux/irda.h --- devel/include/linux/irda.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/irda.h 2006-03-17 23:03:48.000000000 -0800 @@ -76,6 +76,7 @@ typedef enum { IRDA_MCP2120_DONGLE = 9, IRDA_ACT200L_DONGLE = 10, IRDA_MA600_DONGLE = 11, + IRDA_TOIM3232_DONGLE = 12, } IRDA_DONGLE; /* Protocol types to be used for SOCK_DGRAM */ diff -puN include/linux/list.h~git-net include/linux/list.h --- devel/include/linux/list.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/list.h 2006-03-17 23:03:48.000000000 -0800 @@ -411,6 +411,17 @@ static inline void list_splice_init(stru pos = 
list_entry(pos->member.next, typeof(*pos), member)) /** + * list_for_each_entry_from - iterate over list of given type + * continuing from existing point + * @pos: the type * to use as a loop counter. + * @head: the head for your list. + * @member: the name of the list_struct within the struct. + */ +#define list_for_each_entry_from(pos, head, member) \ + for (; prefetch(pos->member.next), &pos->member != (head); \ + pos = list_entry(pos->member.next, typeof(*pos), member)) + +/** * list_for_each_entry_safe - iterate over list of given type safe against removal of list entry * @pos: the type * to use as a loop counter. * @n: another type * to use as temporary storage @@ -438,6 +449,19 @@ static inline void list_splice_init(stru pos = n, n = list_entry(n->member.next, typeof(*n), member)) /** + * list_for_each_entry_safe_from - iterate over list of given type + * from existing point safe against removal of list entry + * @pos: the type * to use as a loop counter. + * @n: another type * to use as temporary storage + * @head: the head for your list. + * @member: the name of the list_struct within the struct. + */ +#define list_for_each_entry_safe_from(pos, n, head, member) \ + for (n = list_entry(pos->member.next, typeof(*pos), member); \ + &pos->member != (head); \ + pos = n, n = list_entry(n->member.next, typeof(*n), member)) + +/** * list_for_each_entry_safe_reverse - iterate backwards over list of given type safe against * removal of list entry * @pos: the type * to use as a loop counter. 
diff -puN include/linux/netdevice.h~git-net include/linux/netdevice.h --- devel/include/linux/netdevice.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/netdevice.h 2006-03-17 23:03:48.000000000 -0800 @@ -230,7 +230,8 @@ enum netdev_state_t __LINK_STATE_SCHED, __LINK_STATE_NOCARRIER, __LINK_STATE_RX_SCHED, - __LINK_STATE_LINKWATCH_PENDING + __LINK_STATE_LINKWATCH_PENDING, + __LINK_STATE_DORMANT, }; @@ -335,11 +336,14 @@ struct net_device */ - unsigned short flags; /* interface flags (a la BSD) */ + unsigned int flags; /* interface flags (a la BSD) */ unsigned short gflags; unsigned short priv_flags; /* Like 'flags' but invisible to userspace. */ unsigned short padded; /* How much padding added by alloc_netdev() */ + unsigned char operstate; /* RFC2863 operstate */ + unsigned char link_mode; /* mapping policy to operstate */ + unsigned mtu; /* interface MTU value */ unsigned short type; /* interface hardware type */ unsigned short hard_header_len; /* hardware hdr length */ @@ -708,12 +712,18 @@ static inline void dev_put(struct net_de atomic_dec(&dev->refcnt); } -#define __dev_put(dev) atomic_dec(&(dev)->refcnt) -#define dev_hold(dev) atomic_inc(&(dev)->refcnt) +static inline void dev_hold(struct net_device *dev) +{ + atomic_inc(&dev->refcnt); +} /* Carrier loss detection, dial on demand. The functions netif_carrier_on * and _off may be called from IRQ context, but it is caller * who is responsible for serialization of these calls. + * + * The name carrier is inappropriate, these functions should really be + * called netif_lowerlayer_*() because they represent the state of any + * kind of lower layer not just hardware media. 
*/ extern void linkwatch_fire_event(struct net_device *dev); @@ -729,6 +739,29 @@ extern void netif_carrier_on(struct net_ extern void netif_carrier_off(struct net_device *dev); +static inline void netif_dormant_on(struct net_device *dev) +{ + if (!test_and_set_bit(__LINK_STATE_DORMANT, &dev->state)) + linkwatch_fire_event(dev); +} + +static inline void netif_dormant_off(struct net_device *dev) +{ + if (test_and_clear_bit(__LINK_STATE_DORMANT, &dev->state)) + linkwatch_fire_event(dev); +} + +static inline int netif_dormant(const struct net_device *dev) +{ + return test_bit(__LINK_STATE_DORMANT, &dev->state); +} + + +static inline int netif_oper_up(const struct net_device *dev) { + return (dev->operstate == IF_OPER_UP || + dev->operstate == IF_OPER_UNKNOWN /* backward compat */); +} + /* Hot-plugging. */ static inline int netif_device_present(struct net_device *dev) { diff -puN include/linux/netfilter_bridge.h~git-net include/linux/netfilter_bridge.h --- devel/include/linux/netfilter_bridge.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter_bridge.h 2006-03-17 23:03:48.000000000 -0800 @@ -47,22 +47,6 @@ enum nf_br_hook_priorities { #define BRNF_BRIDGED 0x08 #define BRNF_NF_BRIDGE_PREROUTING 0x10 -static inline -struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb) -{ - struct nf_bridge_info **nf_bridge = &(skb->nf_bridge); - - if ((*nf_bridge = kmalloc(sizeof(**nf_bridge), GFP_ATOMIC)) != NULL) { - atomic_set(&(*nf_bridge)->use, 1); - (*nf_bridge)->mask = 0; - (*nf_bridge)->physindev = (*nf_bridge)->physoutdev = NULL; -#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE) - (*nf_bridge)->netoutdev = NULL; -#endif - } - - return *nf_bridge; -} /* Only used in br_forward.c */ static inline @@ -77,17 +61,6 @@ void nf_bridge_maybe_copy_header(struct } } -static inline -void nf_bridge_save_header(struct sk_buff *skb) -{ - int header_size = 16; - - if (skb->protocol == __constant_htons(ETH_P_8021Q)) - header_size = 18; 
- - memcpy(skb->nf_bridge->data, skb->data - header_size, header_size); -} - /* This is called by the IP fragmenting code and it ensures there is * enough room for the encapsulating header (if there is one). */ static inline diff -puN include/linux/netfilter.h~git-net include/linux/netfilter.h --- devel/include/linux/netfilter.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter.h 2006-03-17 23:03:48.000000000 -0800 @@ -80,10 +80,14 @@ struct nf_sockopt_ops int set_optmin; int set_optmax; int (*set)(struct sock *sk, int optval, void __user *user, unsigned int len); + int (*compat_set)(struct sock *sk, int optval, + void __user *user, unsigned int len); int get_optmin; int get_optmax; int (*get)(struct sock *sk, int optval, void __user *user, int *len); + int (*compat_get)(struct sock *sk, int optval, + void __user *user, int *len); /* Number of users inside set() or get(). */ unsigned int use; @@ -246,6 +250,11 @@ int nf_setsockopt(struct sock *sk, int p int nf_getsockopt(struct sock *sk, int pf, int optval, char __user *opt, int *len); +int compat_nf_setsockopt(struct sock *sk, int pf, int optval, + char __user *opt, int len); +int compat_nf_getsockopt(struct sock *sk, int pf, int optval, + char __user *opt, int *len); + /* Packet queuing */ struct nf_queue_handler { int (*outfn)(struct sk_buff *skb, struct nf_info *info, diff -puN include/linux/netfilter_ipv4/ip_nat.h~git-net include/linux/netfilter_ipv4/ip_nat.h --- devel/include/linux/netfilter_ipv4/ip_nat.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter_ipv4/ip_nat.h 2006-03-17 23:03:48.000000000 -0800 @@ -23,7 +23,7 @@ struct ip_nat_seq { * modification (if any) */ u_int32_t correction_pos; /* sequence number offset before and after last modification */ - int32_t offset_before, offset_after; + int16_t offset_before, offset_after; }; /* Single range specification. 
*/ diff -puN include/linux/netfilter_ipv4/ipt_policy.h~git-net include/linux/netfilter_ipv4/ipt_policy.h --- devel/include/linux/netfilter_ipv4/ipt_policy.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter_ipv4/ipt_policy.h 2006-03-17 23:03:48.000000000 -0800 @@ -1,58 +1,21 @@ #ifndef _IPT_POLICY_H #define _IPT_POLICY_H -#define IPT_POLICY_MAX_ELEM 4 +#define IPT_POLICY_MAX_ELEM XT_POLICY_MAX_ELEM -enum ipt_policy_flags -{ - IPT_POLICY_MATCH_IN = 0x1, - IPT_POLICY_MATCH_OUT = 0x2, - IPT_POLICY_MATCH_NONE = 0x4, - IPT_POLICY_MATCH_STRICT = 0x8, -}; - -enum ipt_policy_modes -{ - IPT_POLICY_MODE_TRANSPORT, - IPT_POLICY_MODE_TUNNEL -}; - -struct ipt_policy_spec -{ - u_int8_t saddr:1, - daddr:1, - proto:1, - mode:1, - spi:1, - reqid:1; -}; - -union ipt_policy_addr -{ - struct in_addr a4; - struct in6_addr a6; -}; - -struct ipt_policy_elem -{ - union ipt_policy_addr saddr; - union ipt_policy_addr smask; - union ipt_policy_addr daddr; - union ipt_policy_addr dmask; - u_int32_t spi; - u_int32_t reqid; - u_int8_t proto; - u_int8_t mode; - - struct ipt_policy_spec match; - struct ipt_policy_spec invert; -}; - -struct ipt_policy_info -{ - struct ipt_policy_elem pol[IPT_POLICY_MAX_ELEM]; - u_int16_t flags; - u_int16_t len; -}; +/* ipt_policy_flags */ +#define IPT_POLICY_MATCH_IN XT_POLICY_MATCH_IN +#define IPT_POLICY_MATCH_OUT XT_POLICY_MATCH_OUT +#define IPT_POLICY_MATCH_NONE XT_POLICY_MATCH_NONE +#define IPT_POLICY_MATCH_STRICT XT_POLICY_MATCH_STRICT + +/* ipt_policy_modes */ +#define IPT_POLICY_MODE_TRANSPORT XT_POLICY_MODE_TRANSPORT +#define IPT_POLICY_MODE_TUNNEL XT_POLICY_MODE_TUNNEL + +#define ipt_policy_spec xt_policy_spec +#define ipt_policy_addr xt_policy_addr +#define ipt_policy_elem xt_policy_elem +#define ipt_policy_info xt_policy_info #endif /* _IPT_POLICY_H */ diff -puN include/linux/netfilter_ipv6/ip6t_policy.h~git-net include/linux/netfilter_ipv6/ip6t_policy.h --- devel/include/linux/netfilter_ipv6/ip6t_policy.h~git-net 
2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter_ipv6/ip6t_policy.h 2006-03-17 23:03:48.000000000 -0800 @@ -1,58 +1,21 @@ #ifndef _IP6T_POLICY_H #define _IP6T_POLICY_H -#define IP6T_POLICY_MAX_ELEM 4 +#define IP6T_POLICY_MAX_ELEM XT_POLICY_MAX_ELEM -enum ip6t_policy_flags -{ - IP6T_POLICY_MATCH_IN = 0x1, - IP6T_POLICY_MATCH_OUT = 0x2, - IP6T_POLICY_MATCH_NONE = 0x4, - IP6T_POLICY_MATCH_STRICT = 0x8, -}; - -enum ip6t_policy_modes -{ - IP6T_POLICY_MODE_TRANSPORT, - IP6T_POLICY_MODE_TUNNEL -}; - -struct ip6t_policy_spec -{ - u_int8_t saddr:1, - daddr:1, - proto:1, - mode:1, - spi:1, - reqid:1; -}; - -union ip6t_policy_addr -{ - struct in_addr a4; - struct in6_addr a6; -}; - -struct ip6t_policy_elem -{ - union ip6t_policy_addr saddr; - union ip6t_policy_addr smask; - union ip6t_policy_addr daddr; - union ip6t_policy_addr dmask; - u_int32_t spi; - u_int32_t reqid; - u_int8_t proto; - u_int8_t mode; - - struct ip6t_policy_spec match; - struct ip6t_policy_spec invert; -}; - -struct ip6t_policy_info -{ - struct ip6t_policy_elem pol[IP6T_POLICY_MAX_ELEM]; - u_int16_t flags; - u_int16_t len; -}; +/* ip6t_policy_flags */ +#define IP6T_POLICY_MATCH_IN XT_POLICY_MATCH_IN +#define IP6T_POLICY_MATCH_OUT XT_POLICY_MATCH_OUT +#define IP6T_POLICY_MATCH_NONE XT_POLICY_MATCH_NONE +#define IP6T_POLICY_MATCH_STRICT XT_POLICY_MATCH_STRICT + +/* ip6t_policy_modes */ +#define IP6T_POLICY_MODE_TRANSPORT XT_POLICY_MODE_TRANSPORT +#define IP6T_POLICY_MODE_TUNNEL XT_POLICY_MODE_TUNNEL + +#define ip6t_policy_spec xt_policy_spec +#define ip6t_policy_addr xt_policy_addr +#define ip6t_policy_elem xt_policy_elem +#define ip6t_policy_info xt_policy_info #endif /* _IP6T_POLICY_H */ diff -puN include/linux/netfilter/nfnetlink.h~git-net include/linux/netfilter/nfnetlink.h --- devel/include/linux/netfilter/nfnetlink.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter/nfnetlink.h 2006-03-17 23:03:48.000000000 -0800 @@ -164,6 +164,7 @@ extern void 
nfattr_parse(struct nfattr * __res; \ }) +extern int nfnetlink_has_listeners(unsigned int group); extern int nfnetlink_send(struct sk_buff *skb, u32 pid, unsigned group, int echo); extern int nfnetlink_unicast(struct sk_buff *skb, u_int32_t pid, int flags); diff -puN include/linux/netfilter/nfnetlink_log.h~git-net include/linux/netfilter/nfnetlink_log.h --- devel/include/linux/netfilter/nfnetlink_log.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter/nfnetlink_log.h 2006-03-17 23:03:48.000000000 -0800 @@ -47,6 +47,8 @@ enum nfulnl_attr_type { NFULA_PAYLOAD, /* opaque data payload */ NFULA_PREFIX, /* string prefix */ NFULA_UID, /* user id of socket */ + NFULA_SEQ, /* instance-local sequence number */ + NFULA_SEQ_GLOBAL, /* global sequence number */ __NFULA_MAX }; @@ -77,6 +79,7 @@ enum nfulnl_attr_config { NFULA_CFG_NLBUFSIZ, /* u_int32_t buffer size */ NFULA_CFG_TIMEOUT, /* u_int32_t in 1/100 s */ NFULA_CFG_QTHRESH, /* u_int32_t */ + NFULA_CFG_FLAGS, /* u_int16_t */ __NFULA_CFG_MAX }; #define NFULA_CFG_MAX (__NFULA_CFG_MAX -1) @@ -85,4 +88,7 @@ enum nfulnl_attr_config { #define NFULNL_COPY_META 0x01 #define NFULNL_COPY_PACKET 0x02 +#define NFULNL_CFG_F_SEQ 0x0001 +#define NFULNL_CFG_F_SEQ_GLOBAL 0x0002 + #endif /* _NFNETLINK_LOG_H */ diff -puN include/linux/netfilter/x_tables.h~git-net include/linux/netfilter/x_tables.h --- devel/include/linux/netfilter/x_tables.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter/x_tables.h 2006-03-17 23:03:48.000000000 -0800 @@ -92,8 +92,6 @@ struct xt_match const char name[XT_FUNCTION_MAXNAMELEN-1]; - u_int8_t revision; - /* Return true or false: return FALSE and set *hotdrop = 1 to force immediate packet drop. 
*/ /* Arguments changed since 2.6.9, as this must now handle @@ -102,6 +100,7 @@ struct xt_match int (*match)(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -111,15 +110,25 @@ struct xt_match /* Should return true or false. */ int (*checkentry)(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask); /* Called when entry of this type deleted. */ - void (*destroy)(void *matchinfo, unsigned int matchinfosize); + void (*destroy)(const struct xt_match *match, void *matchinfo, + unsigned int matchinfosize); /* Set this to THIS_MODULE if you are a module, otherwise NULL */ struct module *me; + + char *table; + unsigned int matchsize; + unsigned int hooks; + unsigned short proto; + + unsigned short family; + u_int8_t revision; }; /* Registration hooks for targets. */ @@ -129,8 +138,6 @@ struct xt_target const char name[XT_FUNCTION_MAXNAMELEN-1]; - u_int8_t revision; - /* Returns verdict. Argument order changed since 2.6.9, as this must now handle non-linear skbs, using skb_copy_bits and skb_ip_make_writable. */ @@ -138,6 +145,7 @@ struct xt_target const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userdata); @@ -147,15 +155,25 @@ struct xt_target /* Should return true or false. */ int (*checkentry)(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask); /* Called when entry of this type deleted. 
*/ - void (*destroy)(void *targinfo, unsigned int targinfosize); + void (*destroy)(const struct xt_target *target, void *targinfo, + unsigned int targinfosize); /* Set this to THIS_MODULE if you are a module, otherwise NULL */ struct module *me; + + char *table; + unsigned int targetsize; + unsigned int hooks; + unsigned short proto; + + unsigned short family; + u_int8_t revision; }; /* Furniture shopping... */ @@ -207,6 +225,13 @@ extern void xt_unregister_target(int af, extern int xt_register_match(int af, struct xt_match *target); extern void xt_unregister_match(int af, struct xt_match *target); +extern int xt_check_match(const struct xt_match *match, unsigned short family, + unsigned int size, const char *table, unsigned int hook, + unsigned short proto, int inv_proto); +extern int xt_check_target(const struct xt_target *target, unsigned short family, + unsigned int size, const char *table, unsigned int hook, + unsigned short proto, int inv_proto); + extern int xt_register_table(struct xt_table *table, struct xt_table_info *bootstrap, struct xt_table_info *newinfo); diff -puN /dev/null include/linux/netfilter/xt_policy.h --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/include/linux/netfilter/xt_policy.h 2006-03-17 23:03:48.000000000 -0800 @@ -0,0 +1,58 @@ +#ifndef _XT_POLICY_H +#define _XT_POLICY_H + +#define XT_POLICY_MAX_ELEM 4 + +enum xt_policy_flags +{ + XT_POLICY_MATCH_IN = 0x1, + XT_POLICY_MATCH_OUT = 0x2, + XT_POLICY_MATCH_NONE = 0x4, + XT_POLICY_MATCH_STRICT = 0x8, +}; + +enum xt_policy_modes +{ + XT_POLICY_MODE_TRANSPORT, + XT_POLICY_MODE_TUNNEL +}; + +struct xt_policy_spec +{ + u_int8_t saddr:1, + daddr:1, + proto:1, + mode:1, + spi:1, + reqid:1; +}; + +union xt_policy_addr +{ + struct in_addr a4; + struct in6_addr a6; +}; + +struct xt_policy_elem +{ + union xt_policy_addr saddr; + union xt_policy_addr smask; + union xt_policy_addr daddr; + union xt_policy_addr dmask; + u_int32_t spi; + u_int32_t reqid; + u_int8_t proto; + u_int8_t 
mode; + + struct xt_policy_spec match; + struct xt_policy_spec invert; +}; + +struct xt_policy_info +{ + struct xt_policy_elem pol[XT_POLICY_MAX_ELEM]; + u_int16_t flags; + u_int16_t len; +}; + +#endif /* _XT_POLICY_H */ diff -puN include/linux/net.h~git-net include/linux/net.h --- devel/include/linux/net.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/net.h 2006-03-17 23:03:48.000000000 -0800 @@ -62,6 +62,7 @@ typedef enum { #define SOCK_ASYNC_WAITDATA 1 #define SOCK_NOSPACE 2 #define SOCK_PASSCRED 3 +#define SOCK_PASSSEC 4 #ifndef ARCH_HAS_SOCKET_TYPES /** @@ -149,6 +150,10 @@ struct proto_ops { int optname, char __user *optval, int optlen); int (*getsockopt)(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen); + int (*compat_setsockopt)(struct socket *sock, int level, + int optname, char __user *optval, int optlen); + int (*compat_getsockopt)(struct socket *sock, int level, + int optname, char __user *optval, int __user *optlen); int (*sendmsg) (struct kiocb *iocb, struct socket *sock, struct msghdr *m, size_t total_len); int (*recvmsg) (struct kiocb *iocb, struct socket *sock, diff -puN include/linux/netlink.h~git-net include/linux/netlink.h --- devel/include/linux/netlink.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/netlink.h 2006-03-17 23:03:48.000000000 -0800 @@ -151,6 +151,7 @@ struct netlink_skb_parms extern struct sock *netlink_kernel_create(int unit, unsigned int groups, void (*input)(struct sock *sk, int len), struct module *module); extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err); +extern int netlink_has_listeners(struct sock *sk, unsigned int group); extern int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 pid, int nonblock); extern int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, __u32 pid, __u32 group, gfp_t allocation); diff -puN include/linux/pci_ids.h~git-net include/linux/pci_ids.h --- 
devel/include/linux/pci_ids.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/pci_ids.h 2006-03-17 23:03:48.000000000 -0800 @@ -1860,16 +1860,22 @@ #define PCI_DEVICE_ID_TIGON3_5705M 0x165d #define PCI_DEVICE_ID_TIGON3_5705M_2 0x165e #define PCI_DEVICE_ID_TIGON3_5714 0x1668 +#define PCI_DEVICE_ID_TIGON3_5714S 0x1669 #define PCI_DEVICE_ID_TIGON3_5780 0x166a #define PCI_DEVICE_ID_TIGON3_5780S 0x166b #define PCI_DEVICE_ID_TIGON3_5705F 0x166e +#define PCI_DEVICE_ID_TIGON3_5754M 0x1672 #define PCI_DEVICE_ID_TIGON3_5750 0x1676 #define PCI_DEVICE_ID_TIGON3_5751 0x1677 #define PCI_DEVICE_ID_TIGON3_5715 0x1678 +#define PCI_DEVICE_ID_TIGON3_5715S 0x1679 +#define PCI_DEVICE_ID_TIGON3_5754 0x167a #define PCI_DEVICE_ID_TIGON3_5750M 0x167c #define PCI_DEVICE_ID_TIGON3_5751M 0x167d #define PCI_DEVICE_ID_TIGON3_5751F 0x167e +#define PCI_DEVICE_ID_TIGON3_5787M 0x1693 #define PCI_DEVICE_ID_TIGON3_5782 0x1696 +#define PCI_DEVICE_ID_TIGON3_5787 0x169b #define PCI_DEVICE_ID_TIGON3_5788 0x169c #define PCI_DEVICE_ID_TIGON3_5789 0x169d #define PCI_DEVICE_ID_TIGON3_5702X 0x16a6 diff -puN include/linux/rtnetlink.h~git-net include/linux/rtnetlink.h --- devel/include/linux/rtnetlink.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/linux/rtnetlink.h 2006-03-17 23:03:48.000000000 -0800 @@ -199,6 +199,7 @@ enum #define RTPROT_BIRD 12 /* BIRD */ #define RTPROT_DNROUTED 13 /* DECnet routing daemon */ #define RTPROT_XORP 14 /* XORP */ +#define RTPROT_NTK 15 /* Netsukuku */ /* rtm_scope @@ -733,6 +734,8 @@ enum #define IFLA_MAP IFLA_MAP IFLA_WEIGHT, #define IFLA_WEIGHT IFLA_WEIGHT + IFLA_OPERSTATE, + IFLA_LINKMODE, __IFLA_MAX }; @@ -905,6 +908,7 @@ struct tcamsg #ifdef __KERNEL__ #include +#include extern size_t rtattr_strlcpy(char *dest, const struct rtattr *rta, size_t size); static __inline__ int rtattr_strcmp(const struct rtattr *rta, const char *str) @@ -1036,24 +1040,17 @@ __rta_reserve(struct sk_buff *skb, int a extern void rtmsg_ifinfo(int type, 
struct net_device *dev, unsigned change);
 
-extern struct semaphore rtnl_sem;
-
-#define rtnl_shlock() down(&rtnl_sem)
-#define rtnl_shlock_nowait() down_trylock(&rtnl_sem)
-
-#define rtnl_shunlock() do { up(&rtnl_sem); \
-		if (rtnl && rtnl->sk_receive_queue.qlen) \
-			rtnl->sk_data_ready(rtnl, 0); \
-	} while(0)
-
+/* RTNL is used as a global lock for all changes to network configuration */
 extern void rtnl_lock(void);
-extern int rtnl_lock_interruptible(void);
 extern void rtnl_unlock(void);
+extern int rtnl_trylock(void);
+
 extern void rtnetlink_init(void);
+extern void __rtnl_unlock(void);
 
 #define ASSERT_RTNL() do { \
-	if (unlikely(down_trylock(&rtnl_sem) == 0)) { \
-		up(&rtnl_sem); \
+	if (unlikely(rtnl_trylock())) { \
+		rtnl_unlock(); \
 		printk(KERN_ERR "RTNL: assertion failed at %s (%d)\n", \
 			__FILE__, __LINE__); \
 		dump_stack(); \
diff -puN include/linux/security.h~git-net include/linux/security.h
--- devel/include/linux/security.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/linux/security.h 2006-03-17 23:03:48.000000000 -0800
@@ -1293,7 +1293,8 @@ struct security_operations {
 	int (*socket_setsockopt) (struct socket * sock, int level, int optname);
 	int (*socket_shutdown) (struct socket * sock, int how);
 	int (*socket_sock_rcv_skb) (struct sock * sk, struct sk_buff * skb);
-	int (*socket_getpeersec) (struct socket *sock, char __user *optval, int __user *optlen, unsigned len);
+	int (*socket_getpeersec_stream) (struct socket *sock, char __user *optval, int __user *optlen, unsigned len);
+	int (*socket_getpeersec_dgram) (struct sk_buff *skb, char **secdata, u32 *seclen);
 	int (*sk_alloc_security) (struct sock *sk, int family, gfp_t priority);
 	void (*sk_free_security) (struct sock *sk);
 	unsigned int (*sk_getsid) (struct sock *sk, struct flowi *fl, u8 dir);
@@ -2661,6 +2662,15 @@ static inline void securityfs_remove(str
 
 #endif /* CONFIG_SECURITY */
 
+#ifdef CONFIG_SECURITY_SELINUX
+extern int security_sid_to_context (u32 sid, char **scontext, u32
*scontext_len);
+#else
+static inline int security_sid_to_context (u32 sid, char **scontext, u32 *scontext_len)
+{
+	return -EOPNOTSUPP;
+}
+#endif /* CONFIG_SECURITY_SELINUX */
+
 #ifdef CONFIG_SECURITY_NETWORK
 static inline int security_unix_stream_connect(struct socket * sock,
 					struct socket * other,
@@ -2768,10 +2778,16 @@ static inline int security_sock_rcv_skb
 	return security_ops->socket_sock_rcv_skb (sk, skb);
 }
 
-static inline int security_socket_getpeersec(struct socket *sock, char __user *optval,
-					int __user *optlen, unsigned len)
+static inline int security_socket_getpeersec_stream(struct socket *sock, char __user *optval,
+					int __user *optlen, unsigned len)
+{
+	return security_ops->socket_getpeersec_stream(sock, optval, optlen, len);
+}
+
+static inline int security_socket_getpeersec_dgram(struct sk_buff *skb, char **secdata,
+					u32 *seclen)
 {
-	return security_ops->socket_getpeersec(sock, optval, optlen, len);
+	return security_ops->socket_getpeersec_dgram(skb, secdata, seclen);
 }
 
 static inline int security_sk_alloc(struct sock *sk, int family, gfp_t priority)
@@ -2890,8 +2906,14 @@ static inline int security_sock_rcv_skb
 	return 0;
 }
 
-static inline int security_socket_getpeersec(struct socket *sock, char __user *optval,
-					int __user *optlen, unsigned len)
+static inline int security_socket_getpeersec_stream(struct socket *sock, char __user *optval,
+					int __user *optlen, unsigned len)
+{
+	return -ENOPROTOOPT;
+}
+
+static inline int security_socket_getpeersec_dgram(struct sk_buff *skb, char **secdata,
+					u32 *seclen)
 {
 	return -ENOPROTOOPT;
 }
diff -puN include/linux/skbuff.h~git-net include/linux/skbuff.h
--- devel/include/linux/skbuff.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/linux/skbuff.h 2006-03-17 23:03:48.000000000 -0800
@@ -270,7 +270,6 @@ struct sk_buff {
 
 	void (*destructor)(struct sk_buff *skb);
 #ifdef CONFIG_NETFILTER
-	__u32 nfmark;
 	struct nf_conntrack *nfct;
 #if defined(CONFIG_NF_CONNTRACK) ||
defined(CONFIG_NF_CONNTRACK_MODULE)
 	struct sk_buff *nfct_reasm;
@@ -278,6 +277,7 @@ struct sk_buff {
 #ifdef CONFIG_BRIDGE_NETFILTER
 	struct nf_bridge_info *nf_bridge;
 #endif
+	__u32 nfmark;
 #endif /* CONFIG_NETFILTER */
 #ifdef CONFIG_NET_SCHED
 	__u16 tc_index; /* traffic control index */
@@ -304,6 +304,7 @@ struct sk_buff {
 
 #include
 
+extern void kfree_skb(struct sk_buff *skb);
 extern void __kfree_skb(struct sk_buff *skb);
 extern struct sk_buff *__alloc_skb(unsigned int size,
 				gfp_t priority, int fclone);
@@ -404,22 +405,6 @@ static inline struct sk_buff *skb_get(st
  */
 
 /**
- * kfree_skb - free an sk_buff
- * @skb: buffer to free
- *
- * Drop a reference to the buffer and free it if the usage count has
- * hit zero.
- */
-static inline void kfree_skb(struct sk_buff *skb)
-{
-	if (likely(atomic_read(&skb->users) == 1))
-		smp_rmb();
-	else if (likely(!atomic_dec_and_test(&skb->users)))
-		return;
-	__kfree_skb(skb);
-}
-
-/**
  * skb_cloned - is the buffer a clone
  * @skb: buffer to check
  *
@@ -1174,12 +1159,14 @@ static inline int skb_linearize(struct s
  */
 
 static inline void skb_postpull_rcsum(struct sk_buff *skb,
-					const void *start, int len)
+					const void *start, unsigned int len)
 {
 	if (skb->ip_summed == CHECKSUM_HW)
 		skb->csum = csum_sub(skb->csum, csum_partial(start, len, 0));
 }
 
+unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len);
+
 /**
  * pskb_trim_rcsum - trim received skb and update checksum
  * @skb: buffer to trim
@@ -1351,16 +1338,6 @@ static inline void nf_conntrack_put_reas
 		kfree_skb(skb);
 }
 #endif
-static inline void nf_reset(struct sk_buff *skb)
-{
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
-	nf_conntrack_put_reasm(skb->nfct_reasm);
-	skb->nfct_reasm = NULL;
-#endif
-}
-
 #ifdef CONFIG_BRIDGE_NETFILTER
 static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge)
 {
@@ -1373,6 +1350,20 @@ static inline void nf_bridge_get(struct
 		atomic_inc(&nf_bridge->use);
 }
 #endif
/* CONFIG_BRIDGE_NETFILTER */
+static inline void nf_reset(struct sk_buff *skb)
+{
+	nf_conntrack_put(skb->nfct);
+	skb->nfct = NULL;
+#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+	nf_conntrack_put_reasm(skb->nfct_reasm);
+	skb->nfct_reasm = NULL;
+#endif
+#ifdef CONFIG_BRIDGE_NETFILTER
+	nf_bridge_put(skb->nf_bridge);
+	skb->nf_bridge = NULL;
+#endif
+}
+
 #else /* CONFIG_NETFILTER */
 static inline void nf_reset(struct sk_buff *skb) {}
 #endif /* CONFIG_NETFILTER */
diff -puN include/linux/socket.h~git-net include/linux/socket.h
--- devel/include/linux/socket.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/linux/socket.h 2006-03-17 23:03:48.000000000 -0800
@@ -150,6 +150,7 @@ __KINLINE struct cmsghdr * cmsg_nxthdr (
 
 #define SCM_RIGHTS 0x01 /* rw: access rights (array of int) */
 #define SCM_CREDENTIALS 0x02 /* rw: struct ucred */
+#define SCM_SECURITY 0x03 /* rw: security label */
 
 struct ucred {
 	__u32 pid;
diff -puN include/linux/sunrpc/svcsock.h~git-net include/linux/sunrpc/svcsock.h
--- devel/include/linux/sunrpc/svcsock.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/linux/sunrpc/svcsock.h 2006-03-17 23:03:48.000000000 -0800
@@ -36,7 +36,7 @@ struct svc_sock {
 
 	struct list_head sk_deferred; /* deferred requests that need to
 					 * be revisted */
-	struct semaphore sk_sem; /* to serialize sending data */
+	struct mutex sk_mutex; /* to serialize sending data */
 
 	int (*sk_recvfrom)(struct svc_rqst *rqstp);
 	int (*sk_sendto)(struct svc_rqst *rqstp);
diff -puN include/linux/sysctl.h~git-net include/linux/sysctl.h
--- devel/include/linux/sysctl.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/linux/sysctl.h 2006-03-17 23:03:48.000000000 -0800
@@ -211,6 +211,7 @@ enum
 	NET_SCTP=17,
 	NET_LLC=18,
 	NET_NETFILTER=19,
+	NET_DCCP=20,
 };
 
 /* /proc/sys/kernel/random */
@@ -261,6 +262,8 @@ enum
 	NET_CORE_DEV_WEIGHT=17,
 	NET_CORE_SOMAXCONN=18,
 	NET_CORE_BUDGET=19,
+	NET_CORE_AEVENT_ETIME=20,
+
NET_CORE_AEVENT_RSEQTH=21,
 };
 
 /* /proc/sys/net/ethernet */
@@ -397,6 +400,9 @@ enum
 	NET_TCP_CONG_CONTROL=110,
 	NET_TCP_ABC=111,
 	NET_IPV4_IPFRAG_MAX_DIST=112,
+	NET_TCP_MTU_PROBING=113,
+	NET_TCP_BASE_MSS=114,
+	NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS=115,
 };
 
 enum {
@@ -451,6 +457,7 @@ enum
 	NET_IPV4_CONF_ARP_ANNOUNCE=18,
 	NET_IPV4_CONF_ARP_IGNORE=19,
 	NET_IPV4_CONF_PROMOTE_SECONDARIES=20,
+	NET_IPV4_CONF_ARP_ACCEPT=21,
 	__NET_IPV4_CONF_MAX
 };
 
@@ -531,6 +538,11 @@ enum {
 	NET_IPV6_MAX_DESYNC_FACTOR=15,
 	NET_IPV6_MAX_ADDRESSES=16,
 	NET_IPV6_FORCE_MLD_VERSION=17,
+	NET_IPV6_ACCEPT_RA_DEFRTR=18,
+	NET_IPV6_ACCEPT_RA_PINFO=19,
+	NET_IPV6_ACCEPT_RA_RTR_PREF=20,
+	NET_IPV6_RTR_PROBE_INTERVAL=21,
+	NET_IPV6_ACCEPT_RA_RT_INFO_MAX_PLEN=22,
 	__NET_IPV6_MAX
 };
 
@@ -562,6 +574,21 @@ enum {
 	__NET_NEIGH_MAX
 };
 
+/* /proc/sys/net/dccp */
+enum {
+	NET_DCCP_DEFAULT=1,
+};
+
+/* /proc/sys/net/dccp/default */
+enum {
+	NET_DCCP_DEFAULT_SEQ_WINDOW = 1,
+	NET_DCCP_DEFAULT_RX_CCID = 2,
+	NET_DCCP_DEFAULT_TX_CCID = 3,
+	NET_DCCP_DEFAULT_ACK_RATIO = 4,
+	NET_DCCP_DEFAULT_SEND_ACKVEC = 5,
+	NET_DCCP_DEFAULT_SEND_NDP = 6,
+};
+
 /* /proc/sys/net/ipx */
 enum {
 	NET_IPX_PPROP_BROADCASTING=1,
diff -puN include/linux/tcp.h~git-net include/linux/tcp.h
--- devel/include/linux/tcp.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/linux/tcp.h 2006-03-17 23:03:48.000000000 -0800
@@ -343,6 +343,12 @@ struct tcp_sock {
 		__u32 seq;
 		__u32 time;
 	} rcvq_space;
+
+/* TCP-specific MTU probe information.
*/
+	struct {
+		__u32 probe_seq_start;
+		__u32 probe_seq_end;
+	} mtu_probe;
 };
 
 static inline struct tcp_sock *tcp_sk(const struct sock *sk)
diff -puN include/linux/xfrm.h~git-net include/linux/xfrm.h
--- devel/include/linux/xfrm.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/linux/xfrm.h 2006-03-17 23:03:48.000000000 -0800
@@ -156,6 +156,10 @@ enum {
 	XFRM_MSG_FLUSHPOLICY,
 #define XFRM_MSG_FLUSHPOLICY XFRM_MSG_FLUSHPOLICY
 
+	XFRM_MSG_NEWAE,
+#define XFRM_MSG_NEWAE XFRM_MSG_NEWAE
+	XFRM_MSG_GETAE,
+#define XFRM_MSG_GETAE XFRM_MSG_GETAE
 	__XFRM_MSG_MAX
 };
 #define XFRM_MSG_MAX (__XFRM_MSG_MAX - 1)
@@ -194,6 +198,21 @@ struct xfrm_encap_tmpl {
 	xfrm_address_t encap_oa;
 };
 
+/* AEVENT flags */
+enum xfrm_ae_ftype_t {
+	XFRM_AE_UNSPEC,
+	XFRM_AE_RTHR=1, /* replay threshold*/
+	XFRM_AE_RVAL=2, /* replay value */
+	XFRM_AE_LVAL=4, /* lifetime value */
+	XFRM_AE_ETHR=8, /* expiry timer threshold */
+	XFRM_AE_CR=16, /* Event cause is replay update */
+	XFRM_AE_CE=32, /* Event cause is timer expiry */
+	XFRM_AE_CU=64, /* Event cause is policy update */
+	__XFRM_AE_MAX
+
+#define XFRM_AE_MAX (__XFRM_AE_MAX - 1)
+};
+
 /* Netlink message attributes.
*/
 enum xfrm_attr_type_t {
 	XFRMA_UNSPEC,
@@ -205,6 +224,10 @@ enum xfrm_attr_type_t {
 	XFRMA_SA,
 	XFRMA_POLICY,
 	XFRMA_SEC_CTX, /* struct xfrm_sec_ctx */
+	XFRMA_LTIME_VAL,
+	XFRMA_REPLAY_VAL,
+	XFRMA_REPLAY_THRESH,
+	XFRMA_ETIMER_THRESH,
 	__XFRMA_MAX
 
 #define XFRMA_MAX (__XFRMA_MAX - 1)
@@ -235,6 +258,11 @@ struct xfrm_usersa_id {
 	__u8 proto;
 };
 
+struct xfrm_aevent_id {
+	struct xfrm_usersa_id sa_id;
+	__u32 flags;
+};
+
 struct xfrm_userspi_info {
 	struct xfrm_usersa_info info;
 	__u32 min;
@@ -306,6 +334,8 @@ enum xfrm_nlgroups {
 #define XFRMNLGRP_SA XFRMNLGRP_SA
 	XFRMNLGRP_POLICY,
 #define XFRMNLGRP_POLICY XFRMNLGRP_POLICY
+	XFRMNLGRP_AEVENTS,
+#define XFRMNLGRP_AEVENTS XFRMNLGRP_AEVENTS
 	__XFRMNLGRP_MAX
 };
 #define XFRMNLGRP_MAX (__XFRMNLGRP_MAX - 1)
diff -puN include/net/af_unix.h~git-net include/net/af_unix.h
--- devel/include/net/af_unix.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/af_unix.h 2006-03-17 23:03:48.000000000 -0800
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include
 #include
 
 extern void unix_inflight(struct file *fp);
@@ -53,10 +54,12 @@ struct unix_address {
 struct unix_skb_parms {
 	struct ucred creds; /* Skb credentials */
 	struct scm_fp_list *fp; /* Passed files */
+	u32 sid; /* Security ID */
 };
 
 #define UNIXCB(skb) (*(struct unix_skb_parms*)&((skb)->cb))
 #define UNIXCREDS(skb) (&UNIXCB((skb)).creds)
+#define UNIXSID(skb) (&UNIXCB((skb)).sid)
 
 #define unix_state_rlock(s) spin_lock(&unix_sk(s)->lock)
 #define unix_state_runlock(s) spin_unlock(&unix_sk(s)->lock)
@@ -71,7 +74,7 @@ struct unix_sock {
 	struct unix_address *addr;
 	struct dentry *dentry;
 	struct vfsmount *mnt;
-	struct semaphore readsem;
+	struct mutex readlock;
 	struct sock *peer;
 	struct sock *other;
 	struct sock *gc_tree;
diff -puN include/net/dn_dev.h~git-net include/net/dn_dev.h
--- devel/include/net/dn_dev.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/dn_dev.h 2006-03-17 23:03:48.000000000 -0800
@@ -7,11 +7,11 @@ struct dn_dev;
 struct dn_ifaddr
{
 	struct dn_ifaddr *ifa_next;
 	struct dn_dev *ifa_dev;
-	dn_address ifa_local;
-	dn_address ifa_address;
-	unsigned char ifa_flags;
-	unsigned char ifa_scope;
-	char ifa_label[IFNAMSIZ];
+	__le16 ifa_local;
+	__le16 ifa_address;
+	__u8 ifa_flags;
+	__u8 ifa_scope;
+	char ifa_label[IFNAMSIZ];
 };
 
 #define DN_DEV_S_RU 0 /* Run - working normally */
@@ -91,7 +91,7 @@ struct dn_dev {
 	struct timer_list timer;
 	unsigned long t3;
 	struct neigh_parms *neigh_parms;
-	unsigned char addr[ETH_ALEN];
+	__u8 addr[ETH_ALEN];
 	struct neighbour *router; /* Default router on circuit */
 	struct neighbour *peer; /* Peer on pointopoint links */
 	unsigned long uptime; /* Time device went up in jiffies */
@@ -99,56 +99,56 @@ struct dn_dev {
 
 struct dn_short_packet
 {
-	unsigned char msgflg;
-	unsigned short dstnode;
-	unsigned short srcnode;
-	unsigned char forward;
+	__u8 msgflg;
+	__le16 dstnode;
+	__le16 srcnode;
+	__u8 forward;
 } __attribute__((packed));
 
 struct dn_long_packet
 {
-	unsigned char msgflg;
-	unsigned char d_area;
-	unsigned char d_subarea;
-	unsigned char d_id[6];
-	unsigned char s_area;
-	unsigned char s_subarea;
-	unsigned char s_id[6];
-	unsigned char nl2;
-	unsigned char visit_ct;
-	unsigned char s_class;
-	unsigned char pt;
+	__u8 msgflg;
+	__u8 d_area;
+	__u8 d_subarea;
+	__u8 d_id[6];
+	__u8 s_area;
+	__u8 s_subarea;
+	__u8 s_id[6];
+	__u8 nl2;
+	__u8 visit_ct;
+	__u8 s_class;
+	__u8 pt;
 } __attribute__((packed));
 
 /*------------------------- DRP - Routing messages ---------------------*/
 
 struct endnode_hello_message
 {
-	unsigned char msgflg;
-	unsigned char tiver[3];
-	unsigned char id[6];
-	unsigned char iinfo;
-	unsigned short blksize;
-	unsigned char area;
-	unsigned char seed[8];
-	unsigned char neighbor[6];
-	unsigned short timer;
-	unsigned char mpd;
-	unsigned char datalen;
-	unsigned char data[2];
+	__u8 msgflg;
+	__u8 tiver[3];
+	__u8 id[6];
+	__u8 iinfo;
+	__le16 blksize;
+	__u8 area;
+	__u8 seed[8];
+	__u8 neighbor[6];
+	__le16 timer;
+	__u8 mpd;
+	__u8 datalen;
+
__u8 data[2];
 } __attribute__((packed));
 
 struct rtnode_hello_message
 {
-	unsigned char msgflg;
-	unsigned char tiver[3];
-	unsigned char id[6];
-	unsigned char iinfo;
-	unsigned short blksize;
-	unsigned char priority;
-	unsigned char area;
-	unsigned short timer;
-	unsigned char mpd;
+	__u8 msgflg;
+	__u8 tiver[3];
+	__u8 id[6];
+	__u8 iinfo;
+	__le16 blksize;
+	__u8 priority;
+	__u8 area;
+	__le16 timer;
+	__u8 mpd;
 } __attribute__((packed));
 
 
@@ -169,12 +169,12 @@ extern void dn_dev_down(struct net_devic
 
 extern int dn_dev_set_default(struct net_device *dev, int force);
 extern struct net_device *dn_dev_get_default(void);
-extern int dn_dev_bind_default(dn_address *addr);
+extern int dn_dev_bind_default(__le16 *addr);
 
 extern int register_dnaddr_notifier(struct notifier_block *nb);
 extern int unregister_dnaddr_notifier(struct notifier_block *nb);
 
-static inline int dn_dev_islocal(struct net_device *dev, dn_address addr)
+static inline int dn_dev_islocal(struct net_device *dev, __le16 addr)
 {
 	struct dn_dev *dn_db = dev->dn_ptr;
 	struct dn_ifaddr *ifa;
diff -puN include/net/dn_fib.h~git-net include/net/dn_fib.h
--- devel/include/net/dn_fib.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/dn_fib.h 2006-03-17 23:03:48.000000000 -0800
@@ -37,7 +37,7 @@ struct dn_fib_nh {
 	int nh_weight;
 	int nh_power;
 	int nh_oif;
-	u32 nh_gw;
+	__le16 nh_gw;
 };
 
 struct dn_fib_info {
@@ -48,7 +48,7 @@ struct dn_fib_info {
 	int fib_dead;
 	unsigned fib_flags;
 	int fib_protocol;
-	dn_address fib_prefsrc;
+	__le16 fib_prefsrc;
 	__u32 fib_priority;
 	__u32 fib_metrics[RTAX_MAX];
 #define dn_fib_mtu fib_metrics[RTAX_MTU-1]
@@ -71,15 +71,15 @@ struct dn_fib_info {
 #define DN_FIB_RES_OIF(res) (DN_FIB_RES_NH(res).nh_oif)
 
 typedef struct {
-	u16 datum;
+	__le16 datum;
 } dn_fib_key_t;
 
 typedef struct {
-	u16 datum;
+	__le16 datum;
 } dn_fib_hash_t;
 
 typedef struct {
-	u16 datum;
+	__u16 datum;
 } dn_fib_idx_t;
 
 struct dn_fib_node {
@@ -126,11 +126,11 @@ extern int dn_fib_semantic_match(int
typ
 				const struct flowi *fl,
 				struct dn_fib_res *res);
 extern void dn_fib_release_info(struct dn_fib_info *fi);
-extern u16 dn_fib_get_attr16(struct rtattr *attr, int attrlen, int type);
+extern __le16 dn_fib_get_attr16(struct rtattr *attr, int attrlen, int type);
 extern void dn_fib_flush(void);
 extern void dn_fib_select_multipath(const struct flowi *fl,
 				struct dn_fib_res *res);
-extern int dn_fib_sync_down(dn_address local, struct net_device *dev,
+extern int dn_fib_sync_down(__le16 local, struct net_device *dev,
 				int force);
 extern int dn_fib_sync_up(struct net_device *dev);
 
@@ -148,8 +148,8 @@ extern void dn_fib_table_cleanup(void);
 extern void dn_fib_rules_init(void);
 extern void dn_fib_rules_cleanup(void);
 extern void dn_fib_rule_put(struct dn_fib_rule *);
-extern __u16 dn_fib_rules_policy(__u16 saddr, struct dn_fib_res *res, unsigned *flags);
-extern unsigned dnet_addr_type(__u16 addr);
+extern __le16 dn_fib_rules_policy(__le16 saddr, struct dn_fib_res *res, unsigned *flags);
+extern unsigned dnet_addr_type(__le16 addr);
 extern int dn_fib_lookup(const struct flowi *fl, struct dn_fib_res *res);
 
 /*
@@ -194,10 +194,10 @@ extern struct dn_fib_table *dn_fib_table
 
 #endif /* CONFIG_DECNET_ROUTER */
 
-static inline u16 dnet_make_mask(int n)
+static inline __le16 dnet_make_mask(int n)
 {
 	if (n)
-		return htons(~((1<<(16-n))-1));
+		return dn_htons(~((1<<(16-n))-1));
 	return 0;
 }
 
diff -puN include/net/dn.h~git-net include/net/dn.h
--- devel/include/net/dn.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/dn.h 2006-03-17 23:03:48.000000000 -0800
@@ -6,10 +6,8 @@
 #include
 #include
 
-typedef unsigned short dn_address;
-
-#define dn_ntohs(x) le16_to_cpu((unsigned short)(x))
-#define dn_htons(x) cpu_to_le16((unsigned short)(x))
+#define dn_ntohs(x) le16_to_cpu(x)
+#define dn_htons(x) cpu_to_le16(x)
 
 struct dn_scp /* Session Control Port */
 {
@@ -31,36 +29,36 @@ struct dn_scp
 #define DN_CL 15 /* Closed */
 #define DN_CN 16 /* Closed Notification */
 
-	unsigned
short addrloc;
-	unsigned short addrrem;
-	unsigned short numdat;
-	unsigned short numoth;
-	unsigned short numoth_rcv;
-	unsigned short numdat_rcv;
-	unsigned short ackxmt_dat;
-	unsigned short ackxmt_oth;
-	unsigned short ackrcv_dat;
-	unsigned short ackrcv_oth;
-	unsigned char flowrem_sw;
-	unsigned char flowloc_sw;
+	__le16 addrloc;
+	__le16 addrrem;
+	__u16 numdat;
+	__u16 numoth;
+	__u16 numoth_rcv;
+	__u16 numdat_rcv;
+	__u16 ackxmt_dat;
+	__u16 ackxmt_oth;
+	__u16 ackrcv_dat;
+	__u16 ackrcv_oth;
+	__u8 flowrem_sw;
+	__u8 flowloc_sw;
 #define DN_SEND 2
 #define DN_DONTSEND 1
 #define DN_NOCHANGE 0
-	unsigned short flowrem_dat;
-	unsigned short flowrem_oth;
-	unsigned short flowloc_dat;
-	unsigned short flowloc_oth;
-	unsigned char services_rem;
-	unsigned char services_loc;
-	unsigned char info_rem;
-	unsigned char info_loc;
-
-	unsigned short segsize_rem;
-	unsigned short segsize_loc;
-
-	unsigned char nonagle;
-	unsigned char multi_ireq;
-	unsigned char accept_mode;
+	__u16 flowrem_dat;
+	__u16 flowrem_oth;
+	__u16 flowloc_dat;
+	__u16 flowloc_oth;
+	__u8 services_rem;
+	__u8 services_loc;
+	__u8 info_rem;
+	__u8 info_loc;
+
+	__u16 segsize_rem;
+	__u16 segsize_loc;
+
+	__u8 nonagle;
+	__u8 multi_ireq;
+	__u8 accept_mode;
 	unsigned long seg_total; /* Running total of current segment */
 
 	struct optdata_dn conndata_in;
@@ -160,40 +158,41 @@ static inline struct dn_scp *DN_SK(struc
  */
 #define DN_SKB_CB(skb) ((struct dn_skb_cb *)(skb)->cb)
 struct dn_skb_cb {
-	unsigned short dst;
-	unsigned short src;
-	unsigned short hops;
-	unsigned short dst_port;
-	unsigned short src_port;
-	unsigned char services;
-	unsigned char info;
-	unsigned char rt_flags;
-	unsigned char nsp_flags;
-	unsigned short segsize;
-	unsigned short segnum;
-	unsigned short xmit_count;
+	__le16 dst;
+	__le16 src;
+	__u16 hops;
+	__le16 dst_port;
+	__le16 src_port;
+	__u8 services;
+	__u8 info;
+	__u8 rt_flags;
+	__u8 nsp_flags;
+	__u16 segsize;
+	__u16 segnum;
+	__u16 xmit_count;
 	unsigned long
stamp;
 	int iif;
 };
 
-static inline dn_address dn_eth2dn(unsigned char *ethaddr)
+static inline __le16 dn_eth2dn(unsigned char *ethaddr)
 {
-	return ethaddr[4] | (ethaddr[5] << 8);
+	return dn_htons(ethaddr[4] | (ethaddr[5] << 8));
 }
 
-static inline dn_address dn_saddr2dn(struct sockaddr_dn *saddr)
+static inline __le16 dn_saddr2dn(struct sockaddr_dn *saddr)
 {
-	return *(dn_address *)saddr->sdn_nodeaddr;
+	return *(__le16 *)saddr->sdn_nodeaddr;
 }
 
-static inline void dn_dn2eth(unsigned char *ethaddr, dn_address addr)
+static inline void dn_dn2eth(unsigned char *ethaddr, __le16 addr)
 {
+	__u16 a = dn_ntohs(addr);
 	ethaddr[0] = 0xAA;
 	ethaddr[1] = 0x00;
 	ethaddr[2] = 0x04;
 	ethaddr[3] = 0x00;
-	ethaddr[4] = (unsigned char)(addr & 0xff);
-	ethaddr[5] = (unsigned char)(addr >> 8);
+	ethaddr[4] = (__u8)(a & 0xff);
+	ethaddr[5] = (__u8)(a >> 8);
 }
 
 static inline void dn_sk_ports_copy(struct flowi *fl, struct dn_scp *scp)
@@ -202,7 +201,7 @@ static inline void dn_sk_ports_copy(stru
 	fl->uli_u.dnports.dport = scp->addrrem;
 	fl->uli_u.dnports.objnum = scp->addr.sdn_objnum;
 	if (fl->uli_u.dnports.objnum == 0) {
-		fl->uli_u.dnports.objnamel = scp->addr.sdn_objnamel;
+		fl->uli_u.dnports.objnamel = (__u8)dn_ntohs(scp->addr.sdn_objnamel);
 		memcpy(fl->uli_u.dnports.objname, scp->addr.sdn_objname, 16);
 	}
 }
@@ -217,7 +216,7 @@ extern unsigned dn_mss_from_pmtu(struct
 extern struct sock *dn_sklist_find_listener(struct sockaddr_dn *addr);
 extern struct sock *dn_find_by_skb(struct sk_buff *skb);
 #define DN_ASCBUF_LEN 9
-extern char *dn_addr2asc(dn_address, char *);
+extern char *dn_addr2asc(__u16, char *);
 extern int dn_destroy_timer(struct sock *sk);
 
 extern int dn_sockaddr2username(struct sockaddr_dn *addr, unsigned char *buf, unsigned char type);
@@ -226,7 +225,7 @@ extern int dn_username2sockaddr(unsigned
 extern void dn_start_slow_timer(struct sock *sk);
 extern void dn_stop_slow_timer(struct sock *sk);
 
-extern dn_address decnet_address;
+extern __le16 decnet_address;
 extern int
decnet_debug_level;
 extern int decnet_time_wait;
 extern int decnet_dn_count;
diff -puN include/net/dn_neigh.h~git-net include/net/dn_neigh.h
--- devel/include/net/dn_neigh.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/dn_neigh.h 2006-03-17 23:03:48.000000000 -0800
@@ -7,13 +7,13 @@
  */
 struct dn_neigh {
 	struct neighbour n;
-	dn_address addr;
+	__le16 addr;
 	unsigned long flags;
 #define DN_NDFLAG_R1 0x0001 /* Router L1 */
 #define DN_NDFLAG_R2 0x0002 /* Router L2 */
 #define DN_NDFLAG_P3 0x0004 /* Phase III Node */
 	unsigned long blksize;
-	unsigned char priority;
+	__u8 priority;
 };
 
 extern void dn_neigh_init(void);
diff -puN include/net/dn_nsp.h~git-net include/net/dn_nsp.h
--- devel/include/net/dn_nsp.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/dn_nsp.h 2006-03-17 23:03:48.000000000 -0800
@@ -72,77 +72,77 @@ extern struct sk_buff *dn_alloc_send_skb
 
 struct nsp_data_seg_msg
 {
-	unsigned char msgflg;
-	unsigned short dstaddr;
-	unsigned short srcaddr;
+	__u8 msgflg;
+	__le16 dstaddr;
+	__le16 srcaddr;
 } __attribute__((packed));
 
 struct nsp_data_opt_msg
 {
-	unsigned short acknum;
-	unsigned short segnum;
-	unsigned short lsflgs;
+	__le16 acknum;
+	__le16 segnum;
+	__le16 lsflgs;
 } __attribute__((packed));
 
 struct nsp_data_opt_msg1
 {
-	unsigned short acknum;
-	unsigned short segnum;
+	__le16 acknum;
+	__le16 segnum;
 } __attribute__((packed));
 
 /* Acknowledgment Message (data/other data) */
 struct nsp_data_ack_msg
 {
-	unsigned char msgflg;
-	unsigned short dstaddr;
-	unsigned short srcaddr;
-	unsigned short acknum;
+	__u8 msgflg;
+	__le16 dstaddr;
+	__le16 srcaddr;
+	__le16 acknum;
 } __attribute__((packed));
 
 /* Connect Acknowledgment Message */
 struct nsp_conn_ack_msg
 {
-	unsigned char msgflg;
-	unsigned short dstaddr;
+	__u8 msgflg;
+	__le16 dstaddr;
 } __attribute__((packed));
 
 /* Connect Initiate/Retransmit Initiate/Connect Confirm */
 struct nsp_conn_init_msg
 {
-	unsigned char msgflg;
+	__u8 msgflg;
 #define NSP_CI 0x18 /*
Connect Initiate */
 #define NSP_RCI 0x68 /* Retrans. Conn Init */
-	unsigned short dstaddr;
-	unsigned short srcaddr;
-	unsigned char services;
+	__le16 dstaddr;
+	__le16 srcaddr;
+	__u8 services;
 #define NSP_FC_NONE 0x00 /* Flow Control None */
 #define NSP_FC_SRC 0x04 /* Seg Req. Count */
 #define NSP_FC_SCMC 0x08 /* Sess. Control Mess */
 #define NSP_FC_MASK 0x0c /* FC type mask */
-	unsigned char info;
-	unsigned short segsize;
+	__u8 info;
+	__le16 segsize;
 } __attribute__((packed));
 
 /* Disconnect Initiate/Disconnect Confirm */
 struct nsp_disconn_init_msg
 {
-	unsigned char msgflg;
-	unsigned short dstaddr;
-	unsigned short srcaddr;
-	unsigned short reason;
+	__u8 msgflg;
+	__le16 dstaddr;
+	__le16 srcaddr;
+	__le16 reason;
 } __attribute__((packed));
 
 struct srcobj_fmt
 {
-	char format;
-	unsigned char task;
-	unsigned short grpcode;
-	unsigned short usrcode;
-	char dlen;
+	__u8 format;
+	__u8 task;
+	__le16 grpcode;
+	__le16 usrcode;
+	__u8 dlen;
 } __attribute__((packed));
 
 /*
@@ -150,7 +150,7 @@ struct srcobj_fmt
  * numbers used in NSP. Similar in operation to the functions
  * of the same name in TCP.
*/
-static __inline__ int dn_before(unsigned short seq1, unsigned short seq2)
+static __inline__ int dn_before(__u16 seq1, __u16 seq2)
 {
 	seq1 &= 0x0fff;
 	seq2 &= 0x0fff;
@@ -159,7 +159,7 @@ static __inline__ int dn_before(unsigned
 }
 
 
-static __inline__ int dn_after(unsigned short seq1, unsigned short seq2)
+static __inline__ int dn_after(__u16 seq1, __u16 seq2)
 {
 	seq1 &= 0x0fff;
 	seq2 &= 0x0fff;
@@ -167,23 +167,23 @@ static __inline__ int dn_after(unsigned
 	return (int)((seq2 - seq1) & 0x0fff) > 2048;
 }
 
-static __inline__ int dn_equal(unsigned short seq1, unsigned short seq2)
+static __inline__ int dn_equal(__u16 seq1, __u16 seq2)
 {
 	return ((seq1 ^ seq2) & 0x0fff) == 0;
 }
 
-static __inline__ int dn_before_or_equal(unsigned short seq1, unsigned short seq2)
+static __inline__ int dn_before_or_equal(__u16 seq1, __u16 seq2)
 {
 	return (dn_before(seq1, seq2) || dn_equal(seq1, seq2));
 }
 
-static __inline__ void seq_add(unsigned short *seq, unsigned short off)
+static __inline__ void seq_add(__u16 *seq, __u16 off)
 {
 	(*seq) += off;
 	(*seq) &= 0x0fff;
 }
 
-static __inline__ int seq_next(unsigned short seq1, unsigned short seq2)
+static __inline__ int seq_next(__u16 seq1, __u16 seq2)
 {
 	return dn_equal(seq1 + 1, seq2);
 }
@@ -191,7 +191,7 @@ static __inline__ int seq_next(unsigned
 /*
  * Can we delay the ack ?
  */
-static __inline__ int sendack(unsigned short seq)
+static __inline__ int sendack(__u16 seq)
 {
 	return (int)((seq & 0x1000) ?
0 : 1);
 }
diff -puN include/net/dn_route.h~git-net include/net/dn_route.h
--- devel/include/net/dn_route.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/dn_route.h 2006-03-17 23:03:48.000000000 -0800
@@ -71,12 +71,12 @@ struct dn_route {
 		struct dn_route *rt_next;
 	} u;
 
-	__u16 rt_saddr;
-	__u16 rt_daddr;
-	__u16 rt_gateway;
-	__u16 rt_local_src; /* Source used for forwarding packets */
-	__u16 rt_src_map;
-	__u16 rt_dst_map;
+	__le16 rt_saddr;
+	__le16 rt_daddr;
+	__le16 rt_gateway;
+	__le16 rt_local_src; /* Source used for forwarding packets */
+	__le16 rt_src_map;
+	__le16 rt_dst_map;
 
 	unsigned rt_flags;
 	unsigned rt_type;
diff -puN include/net/flow.h~git-net include/net/flow.h
--- devel/include/net/flow.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/flow.h 2006-03-17 23:03:48.000000000 -0800
@@ -30,8 +30,8 @@ struct flowi {
 		} ip6_u;
 
 		struct {
-			__u16 daddr;
-			__u16 saddr;
+			__le16 daddr;
+			__le16 saddr;
 			__u32 fwmark;
 			__u8 scope;
 		} dn_u;
@@ -64,8 +64,8 @@ struct flowi {
 		} icmpt;
 
 		struct {
-			__u16 sport;
-			__u16 dport;
+			__le16 sport;
+			__le16 dport;
 			__u8 objnum;
 			__u8 objnamel; /* Not 16 bits since max val is 16 */
 			__u8 objname[16]; /* Not zero terminated */
diff -puN include/net/if_inet6.h~git-net include/net/if_inet6.h
--- devel/include/net/if_inet6.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/if_inet6.h 2006-03-17 23:03:48.000000000 -0800
@@ -180,11 +180,8 @@ struct inet6_dev
 
 #ifdef CONFIG_IPV6_PRIVACY
 	u8 rndid[8];
-	u8 entropy[8];
 	struct timer_list regen_timer;
 	struct inet6_ifaddr *tempaddr_list;
-	__u8 work_eui64[8];
-	__u8 work_digest[16];
 #endif
 
 	struct neigh_parms *nd_parms;
diff -puN include/net/inet_connection_sock.h~git-net include/net/inet_connection_sock.h
--- devel/include/net/inet_connection_sock.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/inet_connection_sock.h 2006-03-17 23:03:48.000000000 -0800
@@ -50,6 +50,12 @@ struct
inet_connection_sock_af_ops {
 				char __user *optval, int optlen);
 	int (*getsockopt)(struct sock *sk, int level, int optname,
 				char __user *optval, int __user *optlen);
+	int (*compat_setsockopt)(struct sock *sk,
+				int level, int optname,
+				char __user *optval, int optlen);
+	int (*compat_getsockopt)(struct sock *sk,
+				int level, int optname,
+				char __user *optval, int __user *optlen);
 	void (*addr2sockaddr)(struct sock *sk, struct sockaddr *);
 	int sockaddr_len;
 };
@@ -72,6 +78,7 @@ struct inet_connection_sock_af_ops {
  * @icsk_probes_out: unanswered 0 window probes
  * @icsk_ext_hdr_len: Network protocol overhead (IP/IPv6 options)
  * @icsk_ack: Delayed ACK control data
+ * @icsk_mtup; MTU probing control data
  */
 struct inet_connection_sock {
 	/* inet_sock has to be the first member! */
@@ -104,6 +111,16 @@ struct inet_connection_sock {
 		__u16 last_seg_size; /* Size of last incoming segment */
 		__u16 rcv_mss; /* MSS used for delayed ACK decisions */
 	} icsk_ack;
+	struct {
+		int enabled;
+
+		/* Range of MTUs to search */
+		int search_high;
+		int search_low;
+
+		/* Information on the current probe.
*/
+		int probe_size;
+	} icsk_mtup;
 	u32 icsk_ca_priv[16];
 #define ICSK_CA_PRIV_SIZE (16 * sizeof(u32))
 };
@@ -310,4 +327,13 @@ extern void inet_csk_listen_stop(struct
 
 extern void inet_csk_addr2sockaddr(struct sock *sk, struct sockaddr *uaddr);
 
+extern int inet_csk_ctl_sock_create(struct socket **sock,
+				unsigned short family,
+				unsigned short type,
+				unsigned char protocol);
+
+extern int inet_csk_compat_getsockopt(struct sock *sk, int level, int optname,
+				char __user *optval, int __user *optlen);
+extern int inet_csk_compat_setsockopt(struct sock *sk, int level, int optname,
+				char __user *optval, int optlen);
 #endif /* _INET_CONNECTION_SOCK_H */
diff -puN include/net/ip6_route.h~git-net include/net/ip6_route.h
--- devel/include/net/ip6_route.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/ip6_route.h 2006-03-17 23:03:48.000000000 -0800
@@ -7,6 +7,23 @@
 #define IP6_RT_PRIO_KERN 512
 #define IP6_RT_FLOW_MASK 0x00ff
 
+struct route_info {
+	__u8 type;
+	__u8 length;
+	__u8 prefix_len;
+#if defined(__BIG_ENDIAN_BITFIELD)
+	__u8 reserved_h:3,
+		route_pref:2,
+		reserved_l:3;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	__u8 reserved_l:3,
+		route_pref:2,
+		reserved_h:3;
+#endif
+	__u32 lifetime;
+	__u8 prefix[0]; /* 0,8 or 16 */
+};
+
 #ifdef __KERNEL__
 
 #include
@@ -87,11 +104,14 @@ extern struct rt6_info *addrconf_dst_all
 extern struct rt6_info * rt6_get_dflt_router(struct in6_addr *addr,
 				struct net_device *dev);
 extern struct rt6_info * rt6_add_dflt_router(struct in6_addr *gwaddr,
-				struct net_device *dev);
+				struct net_device *dev,
+				unsigned int pref);
 
 extern void rt6_purge_dflt_routers(void);
 
-extern void rt6_reset_dflt_pointer(struct rt6_info *rt);
+extern int rt6_route_rcv(struct net_device *dev,
+				u8 *opt, int len,
+				struct in6_addr *gwaddr);
 
 extern void rt6_redirect(struct in6_addr *dest,
 				struct in6_addr *saddr,
diff -puN include/net/ip.h~git-net include/net/ip.h
--- devel/include/net/ip.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++
devel-akpm/include/net/ip.h 2006-03-17 23:03:48.000000000 -0800 @@ -356,6 +356,10 @@ extern void ip_cmsg_recv(struct msghdr * extern int ip_cmsg_send(struct msghdr *msg, struct ipcm_cookie *ipc); extern int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen); extern int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); +extern int compat_ip_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen); +extern int compat_ip_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen); extern int ip_ra_control(struct sock *sk, unsigned char on, void (*destructor)(struct sock *)); extern int ip_recv_error(struct sock *sk, struct msghdr *msg, int len); diff -puN include/net/ipv6.h~git-net include/net/ipv6.h --- devel/include/net/ipv6.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/net/ipv6.h 2006-03-17 23:03:48.000000000 -0800 @@ -282,6 +282,18 @@ static inline int ipv6_addr_cmp(const st return memcmp((const void *) a1, (const void *) a2, sizeof(struct in6_addr)); } +static inline int +ipv6_masked_addr_cmp(const struct in6_addr *a1, const struct in6_addr *m, + const struct in6_addr *a2) +{ + unsigned int i; + + for (i = 0; i < 4; i++) + if ((a1->s6_addr32[i] ^ a2->s6_addr32[i]) & m->s6_addr32[i]) + return 1; + return 0; +} + static inline void ipv6_addr_copy(struct in6_addr *a1, const struct in6_addr *a2) { memcpy((void *) a1, (const void *) a2, sizeof(struct in6_addr)); @@ -508,6 +520,16 @@ extern int ipv6_getsockopt(struct sock int optname, char __user *optval, int __user *optlen); +extern int compat_ipv6_setsockopt(struct sock *sk, + int level, + int optname, + char __user *optval, + int optlen); +extern int compat_ipv6_getsockopt(struct sock *sk, + int level, + int optname, + char __user *optval, + int __user *optlen); extern void ipv6_packet_init(void); diff -puN include/net/llc.h~git-net 
include/net/llc.h --- devel/include/net/llc.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/net/llc.h 2006-03-17 23:03:48.000000000 -0800 @@ -71,7 +71,7 @@ extern int llc_rcv(struct sk_buff *skb, struct packet_type *pt, struct net_device *orig_dev); extern int llc_mac_hdr_init(struct sk_buff *skb, - unsigned char *sa, unsigned char *da); + const unsigned char *sa, const unsigned char *da); extern void llc_add_pack(int type, void (*handler)(struct llc_sap *sap, struct sk_buff *skb)); diff -puN include/net/ndisc.h~git-net include/net/ndisc.h --- devel/include/net/ndisc.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/net/ndisc.h 2006-03-17 23:03:48.000000000 -0800 @@ -22,6 +22,8 @@ enum { ND_OPT_PREFIX_INFO = 3, /* RFC2461 */ ND_OPT_REDIRECT_HDR = 4, /* RFC2461 */ ND_OPT_MTU = 5, /* RFC2461 */ + __ND_OPT_ARRAY_MAX, + ND_OPT_ROUTE_INFO = 24, /* RFC4191 */ __ND_OPT_MAX }; diff -puN include/net/neighbour.h~git-net include/net/neighbour.h --- devel/include/net/neighbour.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/net/neighbour.h 2006-03-17 23:03:48.000000000 -0800 @@ -68,6 +68,7 @@ struct neigh_parms struct net_device *dev; struct neigh_parms *next; int (*neigh_setup)(struct neighbour *); + void (*neigh_destructor)(struct neighbour *); struct neigh_table *tbl; void *sysctl_table; @@ -145,7 +146,6 @@ struct neighbour struct neigh_ops { int family; - void (*destructor)(struct neighbour *); void (*solicit)(struct neighbour *, struct sk_buff*); void (*error_report)(struct neighbour *, struct sk_buff*); int (*output)(struct sk_buff*); diff -puN include/net/netfilter/nf_conntrack.h~git-net include/net/netfilter/nf_conntrack.h --- devel/include/net/netfilter/nf_conntrack.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/include/net/netfilter/nf_conntrack.h 2006-03-17 23:03:48.000000000 -0800 @@ -67,6 +67,18 @@ do { \ struct nf_conntrack_helper; +/* nf_conn feature for connections that have a helper */ 
+struct nf_conn_help {
+	/* Helper. if any */
+	struct nf_conntrack_helper *helper;
+
+	union nf_conntrack_help help;
+
+	/* Current number of expected connections */
+	unsigned int expecting;
+};
+
+
 #include
 struct nf_conn
 {
@@ -81,6 +93,9 @@ struct nf_conn
 	/* Have we seen traffic both ways yet? (bitset) */
 	unsigned long status;
+	/* If we were expected by an expectation, this will be it */
+	struct nf_conn *master;
+
 	/* Timer function; drops refcnt when it goes off. */
 	struct timer_list timeout;
@@ -88,38 +103,22 @@ struct nf_conn
 	/* Accounting Information (same cache line as other written members) */
 	struct ip_conntrack_counter counters[IP_CT_DIR_MAX];
 #endif
-	/* If we were expected by an expectation, this will be it */
-	struct nf_conn *master;
-
-	/* Current number of expected connections */
-	unsigned int expecting;
 	/* Unique ID that identifies this conntrack*/
 	unsigned int id;
-	/* Helper. if any */
-	struct nf_conntrack_helper *helper;
-
 	/* features - nat, helper, ... used by allocating system */
 	u_int32_t features;
-	/* Storage reserved for other modules: */
-
-	union nf_conntrack_proto proto;
-
 #if defined(CONFIG_NF_CONNTRACK_MARK)
 	u_int32_t mark;
 #endif
-	/* These members are dynamically allocated. */
-
-	union nf_conntrack_help *help;
+	/* Storage reserved for other modules: */
+	union nf_conntrack_proto proto;
-	/* Layer 3 dependent members. (ex: NAT) */
-	union {
-		struct nf_conntrack_ipv4 *ipv4;
-	} l3proto;
-	void *data[0];
+	/* features dynamically at the end: helper, nat (both optional) */
+	char data[0];
 };
 struct nf_conntrack_expect
@@ -373,10 +372,23 @@ nf_conntrack_expect_event(enum ip_conntr
 #define NF_CT_F_NUM 4
 extern int
-nf_conntrack_register_cache(u_int32_t features, const char *name, size_t size,
-		int (*init_conntrack)(struct nf_conn *, u_int32_t));
+nf_conntrack_register_cache(u_int32_t features, const char *name, size_t size);
 extern void
 nf_conntrack_unregister_cache(u_int32_t features);
+/* valid combinations:
+ * basic: nf_conn, nf_conn .. nf_conn_help
+ * nat: nf_conn .. nf_conn_nat, nf_conn .. nf_conn_nat, nf_conn help
+ */
+static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct)
+{
+	unsigned int offset = sizeof(struct nf_conn);
+
+	if (!(ct->features & NF_CT_F_HELP))
+		return NULL;
+
+	return (struct nf_conn_help *) ((void *)ct + offset);
+}
+
 #endif /* __KERNEL__ */
 #endif /* _NF_CONNTRACK_H */
diff -puN include/net/scm.h~git-net include/net/scm.h
--- devel/include/net/scm.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/scm.h 2006-03-17 23:03:48.000000000 -0800
@@ -3,6 +3,7 @@
 #include
 #include
+#include
 /* Well, we should have at least one descriptor open
  * to accept passed FDs 8)
@@ -19,6 +20,7 @@ struct scm_cookie
 {
 	struct ucred creds; /* Skb credentials */
 	struct scm_fp_list *fp; /* Passed files */
+	u32 sid; /* Passed security ID */
 	unsigned long seq; /* Connection seqno */
 };
@@ -27,6 +29,10 @@ extern void scm_detach_fds_compat(struct
 extern int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm);
 extern void __scm_destroy(struct scm_cookie *scm);
 extern struct scm_fp_list * scm_fp_dup(struct scm_fp_list *fpl);
+extern int scm_send(struct socket *sock, struct msghdr *msg,
+		struct scm_cookie *scm);
+extern void scm_recv(struct socket *sock, struct msghdr *msg,
+		struct scm_cookie *scm, int flags);
 static __inline__ void scm_destroy(struct scm_cookie *scm)
 {
@@ -34,38 +40,5 @@ static __inline__ void scm_destroy(struc
 	__scm_destroy(scm);
 }
-static __inline__ int scm_send(struct socket *sock, struct msghdr *msg,
-		struct scm_cookie *scm)
-{
-	memset(scm, 0, sizeof(*scm));
-	scm->creds.uid = current->uid;
-	scm->creds.gid = current->gid;
-	scm->creds.pid = current->tgid;
-	if (msg->msg_controllen <= 0)
-		return 0;
-	return __scm_send(sock, msg, scm);
-}
-
-static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
-		struct scm_cookie *scm, int flags)
-{
-	if (!msg->msg_control)
-	{
-		if (test_bit(SOCK_PASSCRED, &sock->flags) || scm->fp)
-			msg->msg_flags |= MSG_CTRUNC;
-		scm_destroy(scm);
-		return;
-	}
-
-	if (test_bit(SOCK_PASSCRED, &sock->flags))
-		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(scm->creds), &scm->creds);
-
-	if (!scm->fp)
-		return;
-
-	scm_detach_fds(msg, scm);
-}
-
-
 #endif /* __LINUX_NET_SCM_H */
diff -puN include/net/sctp/structs.h~git-net include/net/sctp/structs.h
--- devel/include/net/sctp/structs.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/sctp/structs.h 2006-03-17 23:03:48.000000000 -0800
@@ -514,6 +514,16 @@ struct sctp_af {
 				int optname,
 				char __user *optval,
 				int __user *optlen);
+	int (*compat_setsockopt) (struct sock *sk,
+				int level,
+				int optname,
+				char __user *optval,
+				int optlen);
+	int (*compat_getsockopt) (struct sock *sk,
+				int level,
+				int optname,
+				char __user *optval,
+				int __user *optlen);
 	struct dst_entry *(*get_dst) (struct sctp_association *asoc,
 				union sctp_addr *daddr,
 				union sctp_addr *saddr);
diff -puN include/net/sock.h~git-net include/net/sock.h
--- devel/include/net/sock.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/sock.h 2006-03-17 23:03:48.000000000 -0800
@@ -520,6 +520,14 @@ struct proto {
 	int (*getsockopt)(struct sock *sk, int level,
 				int optname, char __user *optval,
 				int __user *option);
+	int (*compat_setsockopt)(struct sock *sk,
+				int level,
+				int optname, char __user *optval,
+				int optlen);
+	int (*compat_getsockopt)(struct sock *sk,
+				int level,
+				int optname, char __user *optval,
+				int __user *option);
 	int (*sendmsg)(struct kiocb *iocb, struct sock *sk,
 				struct msghdr *msg, size_t len);
 	int (*recvmsg)(struct kiocb *iocb, struct sock *sk,
@@ -816,6 +824,10 @@ extern int sock_common_recvmsg(struct ki
 		struct msghdr *msg, size_t size, int flags);
 extern int sock_common_setsockopt(struct socket *sock, int level, int optname,
 		char __user *optval, int optlen);
+extern int compat_sock_common_getsockopt(struct socket *sock, int level,
+		int optname, char __user *optval, int __user *optlen);
+extern int compat_sock_common_setsockopt(struct socket *sock, int level,
+		int optname, char __user *optval, int optlen);
 extern void sk_common_release(struct sock *sk);
diff -puN include/net/tcp.h~git-net include/net/tcp.h
--- devel/include/net/tcp.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/tcp.h 2006-03-17 23:03:48.000000000 -0800
@@ -60,6 +60,9 @@ extern void tcp_time_wait(struct sock *s
 /* Minimal RCV_MSS. */
 #define TCP_MIN_RCVMSS 536U
+/* The least MTU to use for probing */
+#define TCP_BASE_MSS 512
+
 /* After receiving this amount of duplicate ACKs fast retransmit starts. */
 #define TCP_FASTRETRANS_THRESH 3
@@ -219,6 +222,9 @@ extern int sysctl_tcp_nometrics_save;
 extern int sysctl_tcp_moderate_rcvbuf;
 extern int sysctl_tcp_tso_win_divisor;
 extern int sysctl_tcp_abc;
+extern int sysctl_tcp_mtu_probing;
+extern int sysctl_tcp_base_mss;
+extern int sysctl_tcp_workaround_signed_windows;
 extern atomic_t tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
@@ -347,6 +353,12 @@ extern int tcp_getsockopt(struct sock
 extern int tcp_setsockopt(struct sock *sk, int level,
 				int optname, char __user *optval,
 				int optlen);
+extern int compat_tcp_getsockopt(struct sock *sk,
+				int level, int optname,
+				char __user *optval, int __user *optlen);
+extern int compat_tcp_setsockopt(struct sock *sk,
+				int level, int optname,
+				char __user *optval, int optlen);
 extern void tcp_set_keepalive(struct sock *sk, int val);
 extern int tcp_recvmsg(struct kiocb *iocb, struct sock *sk,
 				struct msghdr *msg,
@@ -447,6 +459,10 @@ extern int tcp_read_sock(struct sock *sk
 extern void tcp_initialize_rcv_mss(struct sock *sk);
+extern int tcp_mtu_to_mss(struct sock *sk, int pmtu);
+extern int tcp_mss_to_mtu(struct sock *sk, int mss);
+extern void tcp_mtup_init(struct sock *sk);
+
 static inline void __tcp_fast_path_on(struct tcp_sock *tp, u32 snd_wnd)
 {
 	tp->pred_flags = htonl((tp->tcp_header_len << 26) |
diff -puN include/net/xfrm.h~git-net include/net/xfrm.h
--- devel/include/net/xfrm.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/include/net/xfrm.h 2006-03-17 23:03:48.000000000 -0800
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -20,7 +21,11 @@
 #define XFRM_ALIGN8(len) (((len) + 7) & ~7)
-extern struct semaphore xfrm_cfg_sem;
+extern struct sock *xfrm_nl;
+extern u32 sysctl_xfrm_aevent_etime;
+extern u32 sysctl_xfrm_aevent_rseqth;
+
+extern struct mutex xfrm_cfg_mutex;
 /* Organization of SPD aka "XFRM rules"
    ------------------------------------
@@ -135,6 +140,16 @@ struct xfrm_state
 	/* State for replay detection */
 	struct xfrm_replay_state replay;
+	/* Replay detection state at the time we sent the last notification */
+	struct xfrm_replay_state preplay;
+
+	/* Replay detection notification settings */
+	u32 replay_maxage;
+	u32 replay_maxdiff;
+
+	/* Replay detection notification timer */
+	struct timer_list rtimer;
+
 	/* Statistics */
 	struct xfrm_stats stats;
@@ -169,6 +184,7 @@ struct km_event
 		u32 hard;
 		u32 proto;
 		u32 byid;
+		u32 aevent;
 	} data;
 	u32 seq;
@@ -199,10 +215,13 @@ extern int xfrm_policy_register_afinfo(s
 extern int xfrm_policy_unregister_afinfo(struct xfrm_policy_afinfo *afinfo);
 extern void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c);
 extern void km_state_notify(struct xfrm_state *x, struct km_event *c);
-
 #define XFRM_ACQ_EXPIRES 30
 struct xfrm_tmpl;
+extern int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol);
+extern void km_state_expired(struct xfrm_state *x, int hard, u32 pid);
+extern int __xfrm_state_delete(struct xfrm_state *x);
+
 struct xfrm_state_afinfo {
 	unsigned short family;
 	rwlock_t lock;
@@ -305,7 +324,21 @@ struct xfrm_policy
 	struct xfrm_tmpl xfrm_vec[XFRM_MAX_DEPTH];
 };
-#define XFRM_KM_TIMEOUT 30
+#define XFRM_KM_TIMEOUT 30
+/* which seqno */
+#define XFRM_REPLAY_SEQ 1
+#define XFRM_REPLAY_OSEQ 2
+#define XFRM_REPLAY_SEQ_MASK 3
+/* what happened */
+#define XFRM_REPLAY_UPDATE XFRM_AE_CR
+#define XFRM_REPLAY_TIMEOUT XFRM_AE_CE
+
+/* default aevent timeout in units of 100ms */
+#define XFRM_AE_ETIME 10
+/* Async Event timer multiplier */
+#define XFRM_AE_ETH_M 10
+/* default seq threshold size */
+#define XFRM_AE_SEQT_SIZE 2
 struct xfrm_mgr
 {
@@ -865,6 +898,7 @@ extern int xfrm_state_delete(struct xfrm
 extern void xfrm_state_flush(u8 proto);
 extern int xfrm_replay_check(struct xfrm_state *x, u32 seq);
 extern void xfrm_replay_advance(struct xfrm_state *x, u32 seq);
+extern void xfrm_replay_notify(struct xfrm_state *x, int event);
 extern int xfrm_state_check(struct xfrm_state *x, struct sk_buff *skb);
 extern int xfrm_state_mtu(struct xfrm_state *x, int mtu);
 extern int xfrm_init_state(struct xfrm_state *x);
@@ -924,7 +958,7 @@ extern void xfrm_init_pmtu(struct dst_en
 extern wait_queue_head_t km_waitq;
 extern int km_new_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr, u16 sport);
-extern void km_policy_expired(struct xfrm_policy *pol, int dir, int hard);
+extern void km_policy_expired(struct xfrm_policy *pol, int dir, int hard, u32 pid);
 extern void xfrm_input_init(void);
 extern int xfrm_parse_spi(struct sk_buff *skb, u8 nexthdr, u32 *spi, u32 *seq);
@@ -965,4 +999,24 @@ static inline int xfrm_policy_id2dir(u32
 	return index & 7;
 }
+static inline int xfrm_aevent_is_on(void)
+{
+	struct sock *nlsk;
+	int ret = 0;
+
+	rcu_read_lock();
+	nlsk = rcu_dereference(xfrm_nl);
+	if (nlsk)
+		ret = netlink_has_listeners(nlsk, XFRMNLGRP_AEVENTS);
+	rcu_read_unlock();
+	return ret;
+}
+
+static inline void xfrm_aevent_doreplay(struct xfrm_state *x)
+{
+	if (xfrm_aevent_is_on())
+		xfrm_replay_notify(x, XFRM_REPLAY_UPDATE);
+}
+
+
 #endif /* _NET_XFRM_H */
diff -puN net/8021q/vlan.c~git-net net/8021q/vlan.c
--- devel/net/8021q/vlan.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/8021q/vlan.c 2006-03-17 23:03:48.000000000 -0800
@@ -69,7 +69,7 @@ static struct packet_type vlan_packet_ty
 /* Bits of netdev state that are propagated from real device to virtual */
 #define VLAN_LINK_STATE_MASK \
-	((1<<__LINK_STATE_PRESENT)|(1<<__LINK_STATE_NOCARRIER))
+	((1<<__LINK_STATE_PRESENT)|(1<<__LINK_STATE_NOCARRIER)|(1<<__LINK_STATE_DORMANT))
 /* End of global variables definitions. */
@@ -344,6 +344,26 @@ static void vlan_setup(struct net_device
 	new_dev->do_ioctl = vlan_dev_ioctl;
 }
+static void vlan_transfer_operstate(const struct net_device *dev, struct net_device *vlandev)
+{
+	/* Have to respect userspace enforced dormant state
+	 * of real device, also must allow supplicant running
+	 * on VLAN device
+	 */
+	if (dev->operstate == IF_OPER_DORMANT)
+		netif_dormant_on(vlandev);
+	else
+		netif_dormant_off(vlandev);
+
+	if (netif_carrier_ok(dev)) {
+		if (!netif_carrier_ok(vlandev))
+			netif_carrier_on(vlandev);
+	} else {
+		if (netif_carrier_ok(vlandev))
+			netif_carrier_off(vlandev);
+	}
+}
+
 /* Attach a VLAN device to a mac address (ie Ethernet Card).
  * Returns the device that was created, or NULL if there was
  * an error of some kind.
@@ -450,7 +470,7 @@ static struct net_device *register_vlan_
 	new_dev->flags = real_dev->flags;
 	new_dev->flags &= ~IFF_UP;
-	new_dev->state = real_dev->state & VLAN_LINK_STATE_MASK;
+	new_dev->state = real_dev->state & ~(1<<__LINK_STATE_START);
 	/* need 4 bytes for extra VLAN header info,
 	 * hope the underlying device can handle it.
@@ -498,6 +518,10 @@ static struct net_device *register_vlan_
 	if (register_netdevice(new_dev))
 		goto out_free_newdev;
+	new_dev->iflink = real_dev->ifindex;
+	vlan_transfer_operstate(real_dev, new_dev);
+	linkwatch_fire_event(new_dev); /* _MUST_ call rfc2863_policy() */
+
 	/* So, got the sucker initialized, now lets place
 	 * it into our local structure.
 	 */
@@ -573,25 +597,12 @@ static int vlan_device_event(struct noti
 	switch (event) {
 	case NETDEV_CHANGE:
 		/* Propagate real device state to vlan devices */
-		flgs = dev->state & VLAN_LINK_STATE_MASK;
 		for (i = 0; i < VLAN_GROUP_ARRAY_LEN; i++) {
 			vlandev = grp->vlan_devices[i];
 			if (!vlandev)
 				continue;
-			if (netif_carrier_ok(dev)) {
-				if (!netif_carrier_ok(vlandev))
-					netif_carrier_on(vlandev);
-			} else {
-				if (netif_carrier_ok(vlandev))
-					netif_carrier_off(vlandev);
-			}
-
-			if ((vlandev->state & VLAN_LINK_STATE_MASK) != flgs) {
-				vlandev->state = (vlandev->state &~ VLAN_LINK_STATE_MASK)
-					| flgs;
-				netdev_state_change(vlandev);
-			}
+			vlan_transfer_operstate(dev, vlandev);
 		}
 		break;
diff -puN net/8021q/vlan_dev.c~git-net net/8021q/vlan_dev.c
--- devel/net/8021q/vlan_dev.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/8021q/vlan_dev.c 2006-03-17 23:03:48.000000000 -0800
@@ -163,10 +163,8 @@ int vlan_skb_recv(struct sk_buff *skb, s
 	stats->rx_packets++;
 	stats->rx_bytes += skb->len;
-	skb_pull(skb, VLAN_HLEN); /* take off the VLAN header (4 bytes currently) */
-
-	/* Need to correct hardware checksum */
-	skb_postpull_rcsum(skb, vhdr, VLAN_HLEN);
+	/* Take off the VLAN header (4 bytes currently) */
+	skb_pull_rcsum(skb, VLAN_HLEN);
 	/* Ok, lets check to make sure the device (dev) we
 	 * came in on is what this VLAN is attached to.
diff -puN net/802/psnap.c~git-net net/802/psnap.c
--- devel/net/802/psnap.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/802/psnap.c 2006-03-17 23:03:48.000000000 -0800
@@ -59,10 +59,8 @@ static int snap_rcv(struct sk_buff *skb,
 	proto = find_snap_client(skb->h.raw);
 	if (proto) {
 		/* Pass the frame on. */
-		u8 *hdr = skb->data;
 		skb->h.raw += 5;
-		skb_pull(skb, 5);
-		skb_postpull_rcsum(skb, hdr, 5);
+		skb_pull_rcsum(skb, 5);
 		rc = proto->rcvfunc(skb, dev, &snap_packet_type, orig_dev);
 	} else {
 		skb->sk = NULL;
diff -puN net/atm/clip.c~git-net net/atm/clip.c
--- devel/net/atm/clip.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/atm/clip.c 2006-03-17 23:03:48.000000000 -0800
@@ -289,7 +289,6 @@ static void clip_neigh_error(struct neig
 static struct neigh_ops clip_neigh_ops = {
 	.family = AF_INET,
-	.destructor = clip_neigh_destroy,
 	.solicit = clip_neigh_solicit,
 	.error_report = clip_neigh_error,
 	.output = dev_queue_xmit,
@@ -347,6 +346,7 @@ static struct neigh_table clip_tbl = {
 	/* parameters are copied from ARP ... */
 	.parms = {
 		.tbl = &clip_tbl,
+		.neigh_destructor = clip_neigh_destroy,
 		.base_reachable_time = 30 * HZ,
 		.retrans_time = 1 * HZ,
 		.gc_staletime = 60 * HZ,
diff -puN net/atm/common.c~git-net net/atm/common.c
--- devel/net/atm/common.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/atm/common.c 2006-03-17 23:03:48.000000000 -0800
@@ -451,12 +451,12 @@ int vcc_connect(struct socket *sock, int
 		dev = try_then_request_module(atm_dev_lookup(itf), "atm-device-%d", itf);
 	} else {
 		dev = NULL;
-		down(&atm_dev_mutex);
+		mutex_lock(&atm_dev_mutex);
 		if (!list_empty(&atm_devs)) {
 			dev = list_entry(atm_devs.next, struct atm_dev, dev_list);
 			atm_dev_hold(dev);
 		}
-		up(&atm_dev_mutex);
+		mutex_unlock(&atm_dev_mutex);
 	}
 	if (!dev)
 		return -ENODEV;
diff -puN net/atm/ioctl.c~git-net net/atm/ioctl.c
--- devel/net/atm/ioctl.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/atm/ioctl.c 2006-03-17 23:03:48.000000000 -0800
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "resources.h"
@@ -25,22 +26,22 @@
 #include "common.h"
-static DECLARE_MUTEX(ioctl_mutex);
+static DEFINE_MUTEX(ioctl_mutex);
 static LIST_HEAD(ioctl_list);
 void register_atm_ioctl(struct atm_ioctl *ioctl)
 {
-	down(&ioctl_mutex);
+	mutex_lock(&ioctl_mutex);
 	list_add_tail(&ioctl->list, &ioctl_list);
-	up(&ioctl_mutex);
+	mutex_unlock(&ioctl_mutex);
 }
 void deregister_atm_ioctl(struct atm_ioctl *ioctl)
 {
-	down(&ioctl_mutex);
+	mutex_lock(&ioctl_mutex);
 	list_del(&ioctl->list);
-	up(&ioctl_mutex);
+	mutex_unlock(&ioctl_mutex);
 }
 EXPORT_SYMBOL(register_atm_ioctl);
@@ -137,7 +138,7 @@ int vcc_ioctl(struct socket *sock, unsig
 	error = -ENOIOCTLCMD;
-	down(&ioctl_mutex);
+	mutex_lock(&ioctl_mutex);
 	list_for_each(pos, &ioctl_list) {
 		struct atm_ioctl * ic = list_entry(pos, struct atm_ioctl, list);
 		if (try_module_get(ic->owner)) {
@@ -147,7 +148,7 @@ int vcc_ioctl(struct socket *sock, unsig
 				break;
 		}
 	}
-	up(&ioctl_mutex);
+	mutex_unlock(&ioctl_mutex);
 	if (error != -ENOIOCTLCMD)
 		goto done;
diff -puN net/atm/resources.c~git-net net/atm/resources.c
--- devel/net/atm/resources.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/atm/resources.c 2006-03-17 23:03:48.000000000 -0800
@@ -18,6 +18,8 @@
 #include
 #include
 #include
+#include
+
 #include /* for struct sock */
 #include "common.h"
@@ -26,7 +28,7 @@
 LIST_HEAD(atm_devs);
-DECLARE_MUTEX(atm_dev_mutex);
+DEFINE_MUTEX(atm_dev_mutex);
 static struct atm_dev *__alloc_atm_dev(const char *type)
 {
@@ -65,9 +67,9 @@ struct atm_dev *atm_dev_lookup(int numbe
 {
 	struct atm_dev *dev;
-	down(&atm_dev_mutex);
+	mutex_lock(&atm_dev_mutex);
 	dev = __atm_dev_lookup(number);
-	up(&atm_dev_mutex);
+	mutex_unlock(&atm_dev_mutex);
 	return dev;
 }
@@ -83,11 +85,11 @@ struct atm_dev *atm_dev_register(const c
 			type);
 		return NULL;
 	}
-	down(&atm_dev_mutex);
+	mutex_lock(&atm_dev_mutex);
 	if (number != -1) {
 		if ((inuse = __atm_dev_lookup(number))) {
 			atm_dev_put(inuse);
-			up(&atm_dev_mutex);
+			mutex_unlock(&atm_dev_mutex);
 			kfree(dev);
 			return NULL;
 		}
@@ -112,12 +114,12 @@ struct atm_dev *atm_dev_register(const c
 		printk(KERN_ERR "atm_dev_register: "
 			"atm_proc_dev_register failed for dev %s\n",
 			type);
-		up(&atm_dev_mutex);
+		mutex_unlock(&atm_dev_mutex);
 		kfree(dev);
 		return NULL;
 	}
 	list_add_tail(&dev->dev_list, &atm_devs);
-	up(&atm_dev_mutex);
+	mutex_unlock(&atm_dev_mutex);
 	return dev;
 }
@@ -133,9 +135,9 @@ void atm_dev_deregister(struct atm_dev *
 	 * with same number can appear, such we need deregister proc,
 	 * release async all vccs and remove them from vccs list too
 	 */
-	down(&atm_dev_mutex);
+	mutex_lock(&atm_dev_mutex);
 	list_del(&dev->dev_list);
-	up(&atm_dev_mutex);
+	mutex_unlock(&atm_dev_mutex);
 	atm_dev_release_vccs(dev);
 	atm_proc_dev_deregister(dev);
@@ -196,16 +198,16 @@ int atm_dev_ioctl(unsigned int cmd, void
 			return -EFAULT;
 		if (get_user(len, &iobuf->length))
 			return -EFAULT;
-		down(&atm_dev_mutex);
+		mutex_lock(&atm_dev_mutex);
 		list_for_each(p, &atm_devs)
 			size += sizeof(int);
 		if (size > len) {
-			up(&atm_dev_mutex);
+			mutex_unlock(&atm_dev_mutex);
 			return -E2BIG;
 		}
 		tmp_buf = kmalloc(size, GFP_ATOMIC);
 		if (!tmp_buf) {
-			up(&atm_dev_mutex);
+			mutex_unlock(&atm_dev_mutex);
 			return -ENOMEM;
 		}
 		tmp_p = tmp_buf;
@@ -213,7 +215,7 @@ int atm_dev_ioctl(unsigned int cmd, void
 			dev = list_entry(p, struct atm_dev, dev_list);
 			*tmp_p++ = dev->number;
 		}
-		up(&atm_dev_mutex);
+		mutex_unlock(&atm_dev_mutex);
 		error = ((copy_to_user(buf, tmp_buf, size)) ||
 			put_user(size, &iobuf->length))
 			? -EFAULT : 0;
@@ -400,13 +402,13 @@ static __inline__ void *dev_get_idx(loff
 void *atm_dev_seq_start(struct seq_file *seq, loff_t *pos)
 {
-	down(&atm_dev_mutex);
+	mutex_lock(&atm_dev_mutex);
 	return *pos ? dev_get_idx(*pos) : (void *) 1;
 }
 void atm_dev_seq_stop(struct seq_file *seq, void *v)
 {
-	up(&atm_dev_mutex);
+	mutex_unlock(&atm_dev_mutex);
 }
 void *atm_dev_seq_next(struct seq_file *seq, void *v, loff_t *pos)
diff -puN net/atm/resources.h~git-net net/atm/resources.h
--- devel/net/atm/resources.h~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/atm/resources.h 2006-03-17 23:03:48.000000000 -0800
@@ -8,10 +8,11 @@
 #include
 #include
+#include
 extern struct list_head atm_devs;
-extern struct semaphore atm_dev_mutex;
+extern struct mutex atm_dev_mutex;
 int atm_dev_ioctl(unsigned int cmd, void __user *arg);
diff -puN net/bluetooth/rfcomm/core.c~git-net net/bluetooth/rfcomm/core.c
--- devel/net/bluetooth/rfcomm/core.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/bluetooth/rfcomm/core.c 2006-03-17 23:03:48.000000000 -0800
@@ -37,6 +37,8 @@
 #include
 #include
 #include
+#include
+
 #include
 #include
 #include
@@ -57,9 +59,9 @@ static unsigned int l2cap_mtu = RFCOMM_M
 static struct task_struct *rfcomm_thread;
-static DECLARE_MUTEX(rfcomm_sem);
-#define rfcomm_lock() down(&rfcomm_sem);
-#define rfcomm_unlock() up(&rfcomm_sem);
+static DEFINE_MUTEX(rfcomm_mutex);
+#define rfcomm_lock() mutex_lock(&rfcomm_mutex)
+#define rfcomm_unlock() mutex_unlock(&rfcomm_mutex)
 static unsigned long rfcomm_event;
diff -puN net/bridge/br.c~git-net net/bridge/br.c
--- devel/net/bridge/br.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/bridge/br.c 2006-03-17 23:03:48.000000000 -0800
@@ -19,13 +19,23 @@
 #include
 #include
 #include
+#include
+#include
 #include "br_private.h"
 int (*br_should_route_hook) (struct sk_buff **pskb) = NULL;
+static struct llc_sap *br_stp_sap;
+
 static int __init br_init(void)
 {
+	br_stp_sap = llc_sap_open(LLC_SAP_BSPAN, br_stp_rcv);
+	if (!br_stp_sap) {
+		printk(KERN_ERR "bridge: can't register sap for STP\n");
+		return -EBUSY;
+	}
+
 	br_fdb_init();
 #ifdef CONFIG_BRIDGE_NETFILTER
@@ -45,6 +55,8 @@ static int __init br_init(void)
 static void __exit br_deinit(void)
 {
+	llc_sap_close(br_stp_sap);
+
 #ifdef CONFIG_BRIDGE_NETFILTER
 	br_netfilter_fini();
 #endif
diff -puN net/bridge/br_device.c~git-net net/bridge/br_device.c
--- devel/net/bridge/br_device.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/bridge/br_device.c 2006-03-17 23:03:48.000000000 -0800
@@ -27,6 +27,7 @@ static struct net_device_stats *br_dev_g
 	return &br->statistics;
 }
+/* net device transmit always called with no BH (preempt_disabled) */
 int br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct net_bridge *br = netdev_priv(dev);
@@ -39,7 +40,6 @@ int br_dev_xmit(struct sk_buff *skb, str
 	skb->mac.raw = skb->data;
 	skb_pull(skb, ETH_HLEN);
-	rcu_read_lock();
 	if (dest[0] & 1)
 		br_flood_deliver(br, skb, 0);
 	else if ((dst = __br_fdb_get(br, dest)) != NULL)
@@ -47,7 +47,6 @@ int br_dev_xmit(struct sk_buff *skb, str
 	else
 		br_flood_deliver(br, skb, 0);
-	rcu_read_unlock();
 	return 0;
 }
diff -puN net/bridge/br_fdb.c~git-net net/bridge/br_fdb.c
--- devel/net/bridge/br_fdb.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/bridge/br_fdb.c 2006-03-17 23:03:48.000000000 -0800
@@ -341,7 +341,6 @@ void br_fdb_update(struct net_bridge *br
 	if (hold_time(br) == 0)
 		return;
-	rcu_read_lock();
 	fdb = fdb_find(head, addr);
 	if (likely(fdb)) {
 		/* attempt to update an entry for a local interface */
@@ -356,13 +355,12 @@ void br_fdb_update(struct net_bridge *br
 			fdb->ageing_timer = jiffies;
 		}
 	} else {
-		spin_lock_bh(&br->hash_lock);
+		spin_lock(&br->hash_lock);
 		if (!fdb_find(head, addr))
 			fdb_create(head, source, addr, 0);
 		/* else we lose race and someone else inserts
 		 * it first, don't bother updating
 		 */
-		spin_unlock_bh(&br->hash_lock);
+		spin_unlock(&br->hash_lock);
 	}
-	rcu_read_unlock();
 }
diff -puN net/bridge/br_if.c~git-net net/bridge/br_if.c
--- devel/net/bridge/br_if.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/bridge/br_if.c 2006-03-17 23:03:48.000000000 -0800
@@ -210,7 +210,8 @@ static struct net_device *new_bridge_dev
 	br->bridge_id.prio[0] = 0x80;
 	br->bridge_id.prio[1] = 0x00;
-	memset(br->bridge_id.addr, 0, ETH_ALEN);
+
+	memcpy(br->group_addr, br_group_address, ETH_ALEN);
 	br->feature_mask = dev->features;
 	br->stp_enabled = 0;
@@ -237,12 +238,11 @@ static int find_portno(struct net_bridge
 	struct net_bridge_port *p;
 	unsigned long *inuse;
-	inuse = kmalloc(BITS_TO_LONGS(BR_MAX_PORTS)*sizeof(unsigned long),
+	inuse = kcalloc(BITS_TO_LONGS(BR_MAX_PORTS), sizeof(unsigned long),
 			GFP_KERNEL);
 	if (!inuse)
 		return -ENOMEM;
-	memset(inuse, 0, BITS_TO_LONGS(BR_MAX_PORTS)*sizeof(unsigned long));
 	set_bit(0, inuse); /* zero is reserved */
 	list_for_each_entry(p, &br->port_list, list) {
 		set_bit(p->port_no, inuse);
@@ -264,11 +264,10 @@ static struct net_bridge_port *new_nbp(s
 	if (index < 0)
 		return ERR_PTR(index);
-	p = kmalloc(sizeof(*p), GFP_KERNEL);
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
 	if (p == NULL)
 		return ERR_PTR(-ENOMEM);
-	memset(p, 0, sizeof(*p));
 	p->br = br;
 	dev_hold(dev);
 	p->dev = dev;
diff -puN net/bridge/br_input.c~git-net net/bridge/br_input.c
--- devel/net/bridge/br_input.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/bridge/br_input.c 2006-03-17 23:03:48.000000000 -0800
@@ -19,13 +19,8 @@
 #include
 #include "br_private.h"
-const unsigned char bridge_ula[6] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };
-
-static int br_pass_frame_up_finish(struct sk_buff *skb)
-{
-	netif_receive_skb(skb);
-	return 0;
-}
+/* Bridge group multicast address 802.1d (pg 51). */
+const u8 br_group_address[ETH_ALEN] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };
 static void br_pass_frame_up(struct net_bridge *br, struct sk_buff *skb)
 {
@@ -38,7 +33,7 @@ static void br_pass_frame_up(struct net_
 	skb->dev = br->dev;
 	NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, indev, NULL,
-		br_pass_frame_up_finish);
+		netif_receive_skb);
 }
 /* note: already called with rcu_read_lock (preempt_disabled) */
@@ -100,6 +95,25 @@ drop:
 	goto out;
 }
+/* note: already called with rcu_read_lock (preempt_disabled) */
+static int br_handle_local_finish(struct sk_buff *skb)
+{
+	struct net_bridge_port *p = rcu_dereference(skb->dev->br_port);
+
+	if (p && p->state != BR_STATE_DISABLED)
+		br_fdb_update(p->br, p, eth_hdr(skb)->h_source);
+
+	return 0; /* process further */
+}
+
+/* Does address match the link local multicast address.
+ * 01:80:c2:00:00:0X
+ */
+static inline int is_link_local(const unsigned char *dest)
+{
+	return memcmp(dest, br_group_address, 5) == 0 && (dest[5] & 0xf0) == 0;
+}
+
 /*
  * Called via br_handle_frame_hook.
  * Return 0 if *pskb should be processed furthur
@@ -117,15 +131,10 @@ int br_handle_frame(struct net_bridge_po
 	if (!is_valid_ether_addr(eth_hdr(skb)->h_source))
 		goto err;
-	if (p->br->stp_enabled &&
-	    !memcmp(dest, bridge_ula, 5) &&
-	    !(dest[5] & 0xF0)) {
-		if (!dest[5]) {
-			NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
-				NULL, br_stp_handle_bpdu);
-			return 1;
-		}
-		goto err;
+	if (unlikely(is_link_local(dest))) {
+		skb->pkt_type = PACKET_HOST;
+		return NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
+			NULL, br_handle_local_finish) != 0;
 	}
 	if (p->state == BR_STATE_FORWARDING || p->state == BR_STATE_LEARNING) {
diff -puN net/bridge/br_netfilter.c~git-net net/bridge/br_netfilter.c
--- devel/net/bridge/br_netfilter.c~git-net 2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/bridge/br_netfilter.c 2006-03-17 23:03:48.000000000 -0800
@@ -61,15 +61,25 @@ static int brnf_filter_vlan_tagged = 1;
 #define brnf_filter_vlan_tagged 1
 #endif
-#define IS_VLAN_IP (skb->protocol == __constant_htons(ETH_P_8021Q) && \
-	hdr->h_vlan_encapsulated_proto == __constant_htons(ETH_P_IP) && \
-	brnf_filter_vlan_tagged)
-#define IS_VLAN_IPV6 (skb->protocol == __constant_htons(ETH_P_8021Q) && \
-	hdr->h_vlan_encapsulated_proto == __constant_htons(ETH_P_IPV6) && \
-	brnf_filter_vlan_tagged)
-#define IS_VLAN_ARP (skb->protocol == __constant_htons(ETH_P_8021Q) && \
-	hdr->h_vlan_encapsulated_proto == __constant_htons(ETH_P_ARP) && \
-	brnf_filter_vlan_tagged)
+static __be16 inline vlan_proto(const struct sk_buff *skb)
+{
+	return vlan_eth_hdr(skb)->h_vlan_encapsulated_proto;
+}
+
+#define IS_VLAN_IP(skb) \
+	(skb->protocol == htons(ETH_P_8021Q) && \
+	 vlan_proto(skb) == htons(ETH_P_IP) && \
+	 brnf_filter_vlan_tagged)
+
+#define IS_VLAN_IPV6(skb) \
+	(skb->protocol == htons(ETH_P_8021Q) && \
+	 vlan_proto(skb) == htons(ETH_P_IPV6) &&\
+	 brnf_filter_vlan_tagged)
+
+#define IS_VLAN_ARP(skb) \
+	(skb->protocol == htons(ETH_P_8021Q) && \
+	 vlan_proto(skb) == htons(ETH_P_ARP) && \
+	 brnf_filter_vlan_tagged)
 /* We need these fake structures to make netfilter happy --
  * lots of places assume that skb->dst != NULL, which isn't
@@ -103,6 +113,25 @@ static inline struct net_device *bridge_
 	return port ? port->br->dev : NULL;
 }
+static inline struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb)
+{
+	skb->nf_bridge = kzalloc(sizeof(struct nf_bridge_info), GFP_ATOMIC);
+	if (likely(skb->nf_bridge))
+		atomic_set(&(skb->nf_bridge->use), 1);
+
+	return skb->nf_bridge;
+}
+
+static inline void nf_bridge_save_header(struct sk_buff *skb)
+{
+	int header_size = 16;
+
+	if (skb->protocol == htons(ETH_P_8021Q))
+		header_size = 18;
+
+	memcpy(skb->nf_bridge->data, skb->data - header_size, header_size);
+}
+
 /* PF_BRIDGE/PRE_ROUTING *********************************************/
 /* Undo the changes made for ip6tables PREROUTING and continue the
  * bridge PRE_ROUTING hook. */
@@ -120,7 +149,7 @@ static int br_nf_pre_routing_finish_ipv6
 	dst_hold(skb->dst);
 	skb->dev = nf_bridge->physindev;
-	if (skb->protocol == __constant_htons(ETH_P_8021Q)) {
+	if (skb->protocol == htons(ETH_P_8021Q)) {
 		skb_push(skb, VLAN_HLEN);
 		skb->nh.raw -= VLAN_HLEN;
 	}
@@ -136,7 +165,7 @@ static void __br_dnat_complain(void)
 	if (jiffies - last_complaint >= 5 * HZ) {
 		printk(KERN_WARNING "Performing cross-bridge DNAT requires IP "
-			"forwarding to be enabled\n");
+		       "forwarding to be enabled\n");
 		last_complaint = jiffies;
 	}
 }
@@ -196,7 +225,7 @@ static int br_nf_pre_routing_finish_brid
 	if (!skb->dev)
 		kfree_skb(skb);
 	else {
-		if (skb->protocol == __constant_htons(ETH_P_8021Q)) {
+		if (skb->protocol == htons(ETH_P_8021Q)) {
 			skb_pull(skb, VLAN_HLEN);
 			skb->nh.raw += VLAN_HLEN;
 		}
@@ -218,12 +247,17 @@ static int br_nf_pre_routing_finish(stru
 	nf_bridge->mask ^= BRNF_NF_BRIDGE_PREROUTING;
 	if (dnat_took_place(skb)) {
-		if (ip_route_input(skb, iph->daddr, iph->saddr, iph->tos,
-		    dev)) {
+		if (ip_route_input(skb, iph->daddr, iph->saddr, iph->tos, dev)) {
 			struct rtable *rt;
-			struct flowi fl = { .nl_u =
-			{ .ip4_u = { .daddr = iph->daddr, .saddr = 0 ,
-				     .tos = RT_TOS(iph->tos)} }, .proto = 0};
+			struct flowi fl = {
+				.nl_u = {
+					.ip4_u = {
+						 .daddr = iph->daddr,
+						 .saddr = 0,
+						 .tos = RT_TOS(iph->tos) },
+				},
+				.proto = 0,
+			};
 			if (!ip_route_output_key(&rt, &fl)) {
 				/* - Bridged-and-DNAT'ed traffic doesn't
@@ -247,7 +281,7 @@ bridged_dnat:
 				nf_bridge->mask |= BRNF_BRIDGED_DNAT;
 				skb->dev = nf_bridge->physindev;
 				if (skb->protocol ==
-				    __constant_htons(ETH_P_8021Q)) {
+				    htons(ETH_P_8021Q)) {
 					skb_push(skb, VLAN_HLEN);
 					skb->nh.raw -= VLAN_HLEN;
 				}
@@ -257,8 +291,7 @@ bridged_dnat:
 					       1);
 				return 0;
 			}
-			memcpy(eth_hdr(skb)->h_dest, dev->dev_addr,
-			       ETH_ALEN);
+			memcpy(eth_hdr(skb)->h_dest, dev->dev_addr, ETH_ALEN);
 			skb->pkt_type = PACKET_HOST;
 		}
 	} else {
@@ -267,7 +300,7 @@ bridged_dnat:
 	}
 	skb->dev = nf_bridge->physindev;
-	if (skb->protocol == __constant_htons(ETH_P_8021Q)) {
+	if (skb->protocol == htons(ETH_P_8021Q)) {
 		skb_push(skb, VLAN_HLEN);
 		skb->nh.raw -= VLAN_HLEN;
 	}
@@ -297,10 +330,10 @@ static struct net_device *setup_pre_rout
 /* We only check the length.
A bridge shouldn't do any hop-by-hop stuff anyway */ static int check_hbh_len(struct sk_buff *skb) { - unsigned char *raw = (u8*)(skb->nh.ipv6h+1); + unsigned char *raw = (u8 *) (skb->nh.ipv6h + 1); u32 pkt_len; int off = raw - skb->nh.raw; - int len = (raw[1]+1)<<3; + int len = (raw[1] + 1) << 3; if ((raw + len) - skb->data > skb_headlen(skb)) goto bad; @@ -309,7 +342,7 @@ static int check_hbh_len(struct sk_buff len -= 2; while (len > 0) { - int optlen = skb->nh.raw[off+1]+2; + int optlen = skb->nh.raw[off + 1] + 2; switch (skb->nh.raw[off]) { case IPV6_TLV_PAD0: @@ -320,16 +353,16 @@ static int check_hbh_len(struct sk_buff break; case IPV6_TLV_JUMBO: - if (skb->nh.raw[off+1] != 4 || (off&3) != 2) + if (skb->nh.raw[off + 1] != 4 || (off & 3) != 2) goto bad; - pkt_len = ntohl(*(u32*)(skb->nh.raw+off+2)); + pkt_len = ntohl(*(u32 *) (skb->nh.raw + off + 2)); if (pkt_len <= IPV6_MAXPLEN || skb->nh.ipv6h->payload_len) goto bad; if (pkt_len > skb->len - sizeof(struct ipv6hdr)) goto bad; if (pskb_trim_rcsum(skb, - pkt_len+sizeof(struct ipv6hdr))) + pkt_len + sizeof(struct ipv6hdr))) goto bad; break; default: @@ -350,12 +383,13 @@ bad: /* Replicate the checks that IPv6 does on packet reception and pass the packet * to ip6tables, which doesn't support NAT, so things are fairly simple. 
*/ static unsigned int br_nf_pre_routing_ipv6(unsigned int hook, - struct sk_buff *skb, const struct net_device *in, - const struct net_device *out, int (*okfn)(struct sk_buff *)) + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) { struct ipv6hdr *hdr; u32 pkt_len; - struct nf_bridge_info *nf_bridge; if (skb->len < sizeof(struct ipv6hdr)) goto inhdr_error; @@ -381,10 +415,10 @@ static unsigned int br_nf_pre_routing_ip } } if (hdr->nexthdr == NEXTHDR_HOP && check_hbh_len(skb)) - goto inhdr_error; + goto inhdr_error; - nf_bridge_put(skb->nf_bridge); - if ((nf_bridge = nf_bridge_alloc(skb)) == NULL) + nf_bridge_put(skb->nf_bridge); + if (!nf_bridge_alloc(skb)) return NF_DROP; if (!setup_pre_routing(skb)) return NF_DROP; @@ -412,10 +446,8 @@ static unsigned int br_nf_pre_routing(un struct iphdr *iph; __u32 len; struct sk_buff *skb = *pskb; - struct nf_bridge_info *nf_bridge; - struct vlan_ethhdr *hdr = vlan_eth_hdr(*pskb); - if (skb->protocol == __constant_htons(ETH_P_IPV6) || IS_VLAN_IPV6) { + if (skb->protocol == htons(ETH_P_IPV6) || IS_VLAN_IPV6(skb)) { #ifdef CONFIG_SYSCTL if (!brnf_call_ip6tables) return NF_ACCEPT; @@ -423,10 +455,8 @@ static unsigned int br_nf_pre_routing(un if ((skb = skb_share_check(*pskb, GFP_ATOMIC)) == NULL) goto out; - if (skb->protocol == __constant_htons(ETH_P_8021Q)) { - u8 *vhdr = skb->data; - skb_pull(skb, VLAN_HLEN); - skb_postpull_rcsum(skb, vhdr, VLAN_HLEN); + if (skb->protocol == htons(ETH_P_8021Q)) { + skb_pull_rcsum(skb, VLAN_HLEN); skb->nh.raw += VLAN_HLEN; } return br_nf_pre_routing_ipv6(hook, skb, in, out, okfn); @@ -436,16 +466,14 @@ static unsigned int br_nf_pre_routing(un return NF_ACCEPT; #endif - if (skb->protocol != __constant_htons(ETH_P_IP) && !IS_VLAN_IP) + if (skb->protocol != htons(ETH_P_IP) && !IS_VLAN_IP(skb)) return NF_ACCEPT; if ((skb = skb_share_check(*pskb, GFP_ATOMIC)) == NULL) goto out; - if (skb->protocol == __constant_htons(ETH_P_8021Q)) 
{ - u8 *vhdr = skb->data; - skb_pull(skb, VLAN_HLEN); - skb_postpull_rcsum(skb, vhdr, VLAN_HLEN); + if (skb->protocol == htons(ETH_P_8021Q)) { + skb_pull_rcsum(skb, VLAN_HLEN); skb->nh.raw += VLAN_HLEN; } @@ -456,15 +484,15 @@ static unsigned int br_nf_pre_routing(un if (iph->ihl < 5 || iph->version != 4) goto inhdr_error; - if (!pskb_may_pull(skb, 4*iph->ihl)) + if (!pskb_may_pull(skb, 4 * iph->ihl)) goto inhdr_error; iph = skb->nh.iph; - if (ip_fast_csum((__u8 *)iph, iph->ihl) != 0) + if (ip_fast_csum((__u8 *) iph, iph->ihl) != 0) goto inhdr_error; len = ntohs(iph->tot_len); - if (skb->len < len || len < 4*iph->ihl) + if (skb->len < len || len < 4 * iph->ihl) goto inhdr_error; if (skb->len > len) { @@ -473,8 +501,8 @@ static unsigned int br_nf_pre_routing(un skb->ip_summed = CHECKSUM_NONE; } - nf_bridge_put(skb->nf_bridge); - if ((nf_bridge = nf_bridge_alloc(skb)) == NULL) + nf_bridge_put(skb->nf_bridge); + if (!nf_bridge_alloc(skb)) return NF_DROP; if (!setup_pre_routing(skb)) return NF_DROP; @@ -486,7 +514,7 @@ static unsigned int br_nf_pre_routing(un return NF_STOLEN; inhdr_error: -// IP_INC_STATS_BH(IpInHdrErrors); +// IP_INC_STATS_BH(IpInHdrErrors); out: return NF_DROP; } @@ -500,8 +528,9 @@ out: * register an IPv4 PRE_ROUTING 'sabotage' hook that will * prevent this from happening. 
*/ static unsigned int br_nf_local_in(unsigned int hook, struct sk_buff **pskb, - const struct net_device *in, const struct net_device *out, - int (*okfn)(struct sk_buff *)) + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) { struct sk_buff *skb = *pskb; @@ -513,15 +542,13 @@ static unsigned int br_nf_local_in(unsig return NF_ACCEPT; } - /* PF_BRIDGE/FORWARD *************************************************/ static int br_nf_forward_finish(struct sk_buff *skb) { struct nf_bridge_info *nf_bridge = skb->nf_bridge; struct net_device *in; - struct vlan_ethhdr *hdr = vlan_eth_hdr(skb); - if (skb->protocol != __constant_htons(ETH_P_ARP) && !IS_VLAN_ARP) { + if (skb->protocol != htons(ETH_P_ARP) && !IS_VLAN_ARP(skb)) { in = nf_bridge->physindev; if (nf_bridge->mask & BRNF_PKT_TYPE) { skb->pkt_type = PACKET_OTHERHOST; @@ -530,12 +557,12 @@ static int br_nf_forward_finish(struct s } else { in = *((struct net_device **)(skb->cb)); } - if (skb->protocol == __constant_htons(ETH_P_8021Q)) { + if (skb->protocol == htons(ETH_P_8021Q)) { skb_push(skb, VLAN_HLEN); skb->nh.raw -= VLAN_HLEN; } NF_HOOK_THRESH(PF_BRIDGE, NF_BR_FORWARD, skb, in, - skb->dev, br_forward_finish, 1); + skb->dev, br_forward_finish, 1); return 0; } @@ -545,12 +572,12 @@ static int br_nf_forward_finish(struct s * because of the physdev module. For ARP, indev and outdev are the * bridge ports. 
*/ static unsigned int br_nf_forward_ip(unsigned int hook, struct sk_buff **pskb, - const struct net_device *in, const struct net_device *out, - int (*okfn)(struct sk_buff *)) + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) { struct sk_buff *skb = *pskb; struct nf_bridge_info *nf_bridge; - struct vlan_ethhdr *hdr = vlan_eth_hdr(skb); struct net_device *parent; int pf; @@ -561,12 +588,12 @@ static unsigned int br_nf_forward_ip(uns if (!parent) return NF_DROP; - if (skb->protocol == __constant_htons(ETH_P_IP) || IS_VLAN_IP) + if (skb->protocol == htons(ETH_P_IP) || IS_VLAN_IP(skb)) pf = PF_INET; else pf = PF_INET6; - if (skb->protocol == __constant_htons(ETH_P_8021Q)) { + if (skb->protocol == htons(ETH_P_8021Q)) { skb_pull(*pskb, VLAN_HLEN); (*pskb)->nh.raw += VLAN_HLEN; } @@ -588,11 +615,11 @@ static unsigned int br_nf_forward_ip(uns } static unsigned int br_nf_forward_arp(unsigned int hook, struct sk_buff **pskb, - const struct net_device *in, const struct net_device *out, - int (*okfn)(struct sk_buff *)) + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) { struct sk_buff *skb = *pskb; - struct vlan_ethhdr *hdr = vlan_eth_hdr(skb); struct net_device **d = (struct net_device **)(skb->cb); #ifdef CONFIG_SYSCTL @@ -600,15 +627,15 @@ static unsigned int br_nf_forward_arp(un return NF_ACCEPT; #endif - if (skb->protocol != __constant_htons(ETH_P_ARP)) { - if (!IS_VLAN_ARP) + if (skb->protocol != htons(ETH_P_ARP)) { + if (!IS_VLAN_ARP(skb)) return NF_ACCEPT; skb_pull(*pskb, VLAN_HLEN); (*pskb)->nh.raw += VLAN_HLEN; } if (skb->nh.arph->ar_pln != 4) { - if (IS_VLAN_ARP) { + if (IS_VLAN_ARP(skb)) { skb_push(*pskb, VLAN_HLEN); (*pskb)->nh.raw -= VLAN_HLEN; } @@ -621,17 +648,16 @@ static unsigned int br_nf_forward_arp(un return NF_STOLEN; } - /* PF_BRIDGE/LOCAL_OUT ***********************************************/ static int br_nf_local_out_finish(struct sk_buff *skb) { - if 
(skb->protocol == __constant_htons(ETH_P_8021Q)) { + if (skb->protocol == htons(ETH_P_8021Q)) { skb_push(skb, VLAN_HLEN); skb->nh.raw -= VLAN_HLEN; } NF_HOOK_THRESH(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev, - br_forward_finish, NF_BR_PRI_FIRST + 1); + br_forward_finish, NF_BR_PRI_FIRST + 1); return 0; } @@ -657,19 +683,19 @@ static int br_nf_local_out_finish(struct * even routed packets that didn't arrive on a bridge interface have their * nf_bridge->physindev set. */ static unsigned int br_nf_local_out(unsigned int hook, struct sk_buff **pskb, - const struct net_device *in, const struct net_device *out, - int (*okfn)(struct sk_buff *)) + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) { struct net_device *realindev, *realoutdev; struct sk_buff *skb = *pskb; struct nf_bridge_info *nf_bridge; - struct vlan_ethhdr *hdr = vlan_eth_hdr(skb); int pf; if (!skb->nf_bridge) return NF_ACCEPT; - if (skb->protocol == __constant_htons(ETH_P_IP) || IS_VLAN_IP) + if (skb->protocol == htons(ETH_P_IP) || IS_VLAN_IP(skb)) pf = PF_INET; else pf = PF_INET6; @@ -695,7 +721,7 @@ static unsigned int br_nf_local_out(unsi skb->pkt_type = PACKET_OTHERHOST; nf_bridge->mask ^= BRNF_PKT_TYPE; } - if (skb->protocol == __constant_htons(ETH_P_8021Q)) { + if (skb->protocol == htons(ETH_P_8021Q)) { skb_push(skb, VLAN_HLEN); skb->nh.raw -= VLAN_HLEN; } @@ -713,14 +739,14 @@ static unsigned int br_nf_local_out(unsi if (nf_bridge->netoutdev) realoutdev = nf_bridge->netoutdev; #endif - if (skb->protocol == __constant_htons(ETH_P_8021Q)) { + if (skb->protocol == htons(ETH_P_8021Q)) { skb_pull(skb, VLAN_HLEN); (*pskb)->nh.raw += VLAN_HLEN; } /* IP forwarded traffic has a physindev, locally * generated traffic hasn't. 
*/ if (realindev != NULL) { - if (!(nf_bridge->mask & BRNF_DONT_TAKE_PARENT) ) { + if (!(nf_bridge->mask & BRNF_DONT_TAKE_PARENT)) { struct net_device *parent = bridge_parent(realindev); if (parent) realindev = parent; @@ -742,12 +768,12 @@ out: /* PF_BRIDGE/POST_ROUTING ********************************************/ static unsigned int br_nf_post_routing(unsigned int hook, struct sk_buff **pskb, - const struct net_device *in, const struct net_device *out, - int (*okfn)(struct sk_buff *)) + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) { struct sk_buff *skb = *pskb; struct nf_bridge_info *nf_bridge = (*pskb)->nf_bridge; - struct vlan_ethhdr *hdr = vlan_eth_hdr(skb); struct net_device *realoutdev = bridge_parent(skb->dev); int pf; @@ -756,7 +782,7 @@ static unsigned int br_nf_post_routing(u * keep the check just to be sure... */ if (skb->mac.raw < skb->head || skb->mac.raw + ETH_HLEN > skb->data) { printk(KERN_CRIT "br_netfilter: Argh!! br_nf_post_routing: " - "bad mac.raw pointer."); + "bad mac.raw pointer."); goto print_error; } #endif @@ -767,7 +793,7 @@ static unsigned int br_nf_post_routing(u if (!realoutdev) return NF_DROP; - if (skb->protocol == __constant_htons(ETH_P_IP) || IS_VLAN_IP) + if (skb->protocol == htons(ETH_P_IP) || IS_VLAN_IP(skb)) pf = PF_INET; else pf = PF_INET6; @@ -786,7 +812,7 @@ static unsigned int br_nf_post_routing(u nf_bridge->mask |= BRNF_PKT_TYPE; } - if (skb->protocol == __constant_htons(ETH_P_8021Q)) { + if (skb->protocol == htons(ETH_P_8021Q)) { skb_pull(skb, VLAN_HLEN); skb->nh.raw += VLAN_HLEN; } @@ -798,7 +824,7 @@ static unsigned int br_nf_post_routing(u realoutdev = nf_bridge->netoutdev; #endif NF_HOOK(pf, NF_IP_POST_ROUTING, skb, NULL, realoutdev, - br_dev_queue_push_xmit); + br_dev_queue_push_xmit); return NF_STOLEN; @@ -810,18 +836,18 @@ print_error: printk("[%s]", realoutdev->name); } printk(" head:%p, raw:%p, data:%p\n", skb->head, skb->mac.raw, - skb->data); + skb->data); 
return NF_ACCEPT; #endif } - /* IP/SABOTAGE *****************************************************/ /* Don't hand locally destined packets to PF_INET(6)/PRE_ROUTING * for the second time. */ static unsigned int ip_sabotage_in(unsigned int hook, struct sk_buff **pskb, - const struct net_device *in, const struct net_device *out, - int (*okfn)(struct sk_buff *)) + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) { if ((*pskb)->nf_bridge && !((*pskb)->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING)) { @@ -835,18 +861,18 @@ static unsigned int ip_sabotage_in(unsig * and PF_INET(6)/POST_ROUTING until we have done the forwarding * decision in the bridge code and have determined nf_bridge->physoutdev. */ static unsigned int ip_sabotage_out(unsigned int hook, struct sk_buff **pskb, - const struct net_device *in, const struct net_device *out, - int (*okfn)(struct sk_buff *)) + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) { struct sk_buff *skb = *pskb; if ((out->hard_start_xmit == br_dev_xmit && - okfn != br_nf_forward_finish && - okfn != br_nf_local_out_finish && - okfn != br_dev_queue_push_xmit) + okfn != br_nf_forward_finish && + okfn != br_nf_local_out_finish && okfn != br_dev_queue_push_xmit) #if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE) || ((out->priv_flags & IFF_802_1Q_VLAN) && - VLAN_DEV_INFO(out)->real_dev->hard_start_xmit == br_dev_xmit) + VLAN_DEV_INFO(out)->real_dev->hard_start_xmit == br_dev_xmit) #endif ) { struct nf_bridge_info *nf_bridge; @@ -971,8 +997,8 @@ static struct nf_hook_ops br_nf_ops[] = #ifdef CONFIG_SYSCTL static -int brnf_sysctl_call_tables(ctl_table *ctl, int write, struct file * filp, - void __user *buffer, size_t *lenp, loff_t *ppos) +int brnf_sysctl_call_tables(ctl_table * ctl, int write, struct file *filp, + void __user * buffer, size_t * lenp, loff_t * ppos) { int ret; @@ -1059,7 +1085,8 @@ int br_netfilter_init(void) #ifdef 
CONFIG_SYSCTL brnf_sysctl_header = register_sysctl_table(brnf_net_table, 0); if (brnf_sysctl_header == NULL) { - printk(KERN_WARNING "br_netfilter: can't register to sysctl.\n"); + printk(KERN_WARNING + "br_netfilter: can't register to sysctl.\n"); for (i = 0; i < ARRAY_SIZE(br_nf_ops); i++) nf_unregister_hook(&br_nf_ops[i]); return -EFAULT; diff -puN net/bridge/br_private.h~git-net net/bridge/br_private.h --- devel/net/bridge/br_private.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/bridge/br_private.h 2006-03-17 23:03:48.000000000 -0800 @@ -109,6 +109,7 @@ struct net_bridge unsigned long bridge_hello_time; unsigned long bridge_forward_delay; + u8 group_addr[ETH_ALEN]; u16 root_port; unsigned char stp_enabled; unsigned char topology_change; @@ -122,7 +123,7 @@ struct net_bridge }; extern struct notifier_block br_device_notifier; -extern const unsigned char bridge_ula[6]; +extern const u8 br_group_address[ETH_ALEN]; /* called under bridge lock */ static inline int br_is_root_bridge(const struct net_bridge *br) @@ -217,7 +218,8 @@ extern void br_stp_set_path_cost(struct extern ssize_t br_show_bridge_id(char *buf, const struct bridge_id *id); /* br_stp_bpdu.c */ -extern int br_stp_handle_bpdu(struct sk_buff *skb); +extern int br_stp_rcv(struct sk_buff *skb, struct net_device *dev, + struct packet_type *pt, struct net_device *orig_dev); /* br_stp_timer.c */ extern void br_stp_timer_init(struct net_bridge *br); diff -puN net/bridge/br_stp_bpdu.c~git-net net/bridge/br_stp_bpdu.c --- devel/net/bridge/br_stp_bpdu.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/bridge/br_stp_bpdu.c 2006-03-17 23:03:48.000000000 -0800 @@ -15,158 +15,162 @@ #include #include +#include +#include +#include +#include #include "br_private.h" #include "br_private_stp.h" -#define JIFFIES_TO_TICKS(j) (((j) << 8) / HZ) -#define TICKS_TO_JIFFIES(j) (((j) * HZ) >> 8) +#define STP_HZ 256 -static void br_send_bpdu(struct net_bridge_port *p, unsigned char *data, int 
length) +#define LLC_RESERVE sizeof(struct llc_pdu_un) + +static void br_send_bpdu(struct net_bridge_port *p, + const unsigned char *data, int length) { - struct net_device *dev; struct sk_buff *skb; - int size; if (!p->br->stp_enabled) return; - size = length + 2*ETH_ALEN + 2; - if (size < 60) - size = 60; - - dev = p->dev; - - if ((skb = dev_alloc_skb(size)) == NULL) { - printk(KERN_INFO "br: memory squeeze!\n"); + skb = dev_alloc_skb(length+LLC_RESERVE); + if (!skb) return; - } - skb->dev = dev; + skb->dev = p->dev; skb->protocol = htons(ETH_P_802_2); - skb->mac.raw = skb_put(skb, size); - memcpy(skb->mac.raw, bridge_ula, ETH_ALEN); - memcpy(skb->mac.raw+ETH_ALEN, dev->dev_addr, ETH_ALEN); - skb->mac.raw[2*ETH_ALEN] = 0; - skb->mac.raw[2*ETH_ALEN+1] = length; - skb->nh.raw = skb->mac.raw + 2*ETH_ALEN + 2; - memcpy(skb->nh.raw, data, length); - memset(skb->nh.raw + length, 0xa5, size - length - 2*ETH_ALEN - 2); + + skb_reserve(skb, LLC_RESERVE); + memcpy(__skb_put(skb, length), data, length); + + llc_pdu_header_init(skb, LLC_PDU_TYPE_U, LLC_SAP_BSPAN, + LLC_SAP_BSPAN, LLC_PDU_CMD); + llc_pdu_init_as_ui_cmd(skb); + + llc_mac_hdr_init(skb, p->dev->dev_addr, p->br->group_addr); NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev, dev_queue_xmit); } -static __inline__ void br_set_ticks(unsigned char *dest, int jiff) +static inline void br_set_ticks(unsigned char *dest, int j) { - __u16 ticks; + unsigned long ticks = (STP_HZ * j)/ HZ; - ticks = JIFFIES_TO_TICKS(jiff); - dest[0] = (ticks >> 8) & 0xFF; - dest[1] = ticks & 0xFF; + *((__be16 *) dest) = htons(ticks); } -static __inline__ int br_get_ticks(unsigned char *dest) +static inline int br_get_ticks(const unsigned char *src) { - return TICKS_TO_JIFFIES((dest[0] << 8) | dest[1]); + unsigned long ticks = ntohs(*(__be16 *)src); + + return (ticks * HZ + STP_HZ - 1) / STP_HZ; } /* called under bridge lock */ void br_send_config_bpdu(struct net_bridge_port *p, struct br_config_bpdu *bpdu) { - unsigned char buf[38]; + 
unsigned char buf[35]; - buf[0] = 0x42; - buf[1] = 0x42; - buf[2] = 0x03; - buf[3] = 0; - buf[4] = 0; - buf[5] = 0; - buf[6] = BPDU_TYPE_CONFIG; - buf[7] = (bpdu->topology_change ? 0x01 : 0) | + buf[0] = 0; + buf[1] = 0; + buf[2] = 0; + buf[3] = BPDU_TYPE_CONFIG; + buf[4] = (bpdu->topology_change ? 0x01 : 0) | (bpdu->topology_change_ack ? 0x80 : 0); - buf[8] = bpdu->root.prio[0]; - buf[9] = bpdu->root.prio[1]; - buf[10] = bpdu->root.addr[0]; - buf[11] = bpdu->root.addr[1]; - buf[12] = bpdu->root.addr[2]; - buf[13] = bpdu->root.addr[3]; - buf[14] = bpdu->root.addr[4]; - buf[15] = bpdu->root.addr[5]; - buf[16] = (bpdu->root_path_cost >> 24) & 0xFF; - buf[17] = (bpdu->root_path_cost >> 16) & 0xFF; - buf[18] = (bpdu->root_path_cost >> 8) & 0xFF; - buf[19] = bpdu->root_path_cost & 0xFF; - buf[20] = bpdu->bridge_id.prio[0]; - buf[21] = bpdu->bridge_id.prio[1]; - buf[22] = bpdu->bridge_id.addr[0]; - buf[23] = bpdu->bridge_id.addr[1]; - buf[24] = bpdu->bridge_id.addr[2]; - buf[25] = bpdu->bridge_id.addr[3]; - buf[26] = bpdu->bridge_id.addr[4]; - buf[27] = bpdu->bridge_id.addr[5]; - buf[28] = (bpdu->port_id >> 8) & 0xFF; - buf[29] = bpdu->port_id & 0xFF; - - br_set_ticks(buf+30, bpdu->message_age); - br_set_ticks(buf+32, bpdu->max_age); - br_set_ticks(buf+34, bpdu->hello_time); - br_set_ticks(buf+36, bpdu->forward_delay); + buf[5] = bpdu->root.prio[0]; + buf[6] = bpdu->root.prio[1]; + buf[7] = bpdu->root.addr[0]; + buf[8] = bpdu->root.addr[1]; + buf[9] = bpdu->root.addr[2]; + buf[10] = bpdu->root.addr[3]; + buf[11] = bpdu->root.addr[4]; + buf[12] = bpdu->root.addr[5]; + buf[13] = (bpdu->root_path_cost >> 24) & 0xFF; + buf[14] = (bpdu->root_path_cost >> 16) & 0xFF; + buf[15] = (bpdu->root_path_cost >> 8) & 0xFF; + buf[16] = bpdu->root_path_cost & 0xFF; + buf[17] = bpdu->bridge_id.prio[0]; + buf[18] = bpdu->bridge_id.prio[1]; + buf[19] = bpdu->bridge_id.addr[0]; + buf[20] = bpdu->bridge_id.addr[1]; + buf[21] = bpdu->bridge_id.addr[2]; + buf[22] = bpdu->bridge_id.addr[3]; + 
buf[23] = bpdu->bridge_id.addr[4]; + buf[24] = bpdu->bridge_id.addr[5]; + buf[25] = (bpdu->port_id >> 8) & 0xFF; + buf[26] = bpdu->port_id & 0xFF; + + br_set_ticks(buf+27, bpdu->message_age); + br_set_ticks(buf+29, bpdu->max_age); + br_set_ticks(buf+31, bpdu->hello_time); + br_set_ticks(buf+33, bpdu->forward_delay); - br_send_bpdu(p, buf, 38); + br_send_bpdu(p, buf, 35); } /* called under bridge lock */ void br_send_tcn_bpdu(struct net_bridge_port *p) { - unsigned char buf[7]; + unsigned char buf[4]; - buf[0] = 0x42; - buf[1] = 0x42; - buf[2] = 0x03; - buf[3] = 0; - buf[4] = 0; - buf[5] = 0; - buf[6] = BPDU_TYPE_TCN; + buf[0] = 0; + buf[1] = 0; + buf[2] = 0; + buf[3] = BPDU_TYPE_TCN; br_send_bpdu(p, buf, 7); } -static const unsigned char header[6] = {0x42, 0x42, 0x03, 0x00, 0x00, 0x00}; - -/* NO locks, but rcu_read_lock (preempt_disabled) */ -int br_stp_handle_bpdu(struct sk_buff *skb) -{ - struct net_bridge_port *p = rcu_dereference(skb->dev->br_port); +/* + * Called from llc. + * + * NO locks, but rcu_read_lock (preempt_disabled) + */ +int br_stp_rcv(struct sk_buff *skb, struct net_device *dev, + struct packet_type *pt, struct net_device *orig_dev) +{ + const struct llc_pdu_un *pdu = llc_pdu_un_hdr(skb); + const unsigned char *dest = eth_hdr(skb)->h_dest; + struct net_bridge_port *p = rcu_dereference(dev->br_port); struct net_bridge *br; - unsigned char *buf; + const unsigned char *buf; if (!p) goto err; - br = p->br; - spin_lock(&br->lock); + if (pdu->ssap != LLC_SAP_BSPAN + || pdu->dsap != LLC_SAP_BSPAN + || pdu->ctrl_1 != LLC_PDU_TYPE_U) + goto err; - if (p->state == BR_STATE_DISABLED || !(br->dev->flags & IFF_UP)) - goto out; + if (!pskb_may_pull(skb, 4)) + goto err; - /* insert into forwarding database after filtering to avoid spoofing */ - br_fdb_update(br, p, eth_hdr(skb)->h_source); + /* compare of protocol id and version */ + buf = skb->data; + if (buf[0] != 0 || buf[1] != 0 || buf[2] != 0) + goto err; + + br = p->br; + spin_lock(&br->lock); - if 
(!br->stp_enabled) + if (p->state == BR_STATE_DISABLED + || !br->stp_enabled + || !(br->dev->flags & IFF_UP)) goto out; - /* need at least the 802 and STP headers */ - if (!pskb_may_pull(skb, sizeof(header)+1) || - memcmp(skb->data, header, sizeof(header))) + if (compare_ether_addr(dest, br->group_addr) != 0) goto out; - buf = skb_pull(skb, sizeof(header)); + buf = skb_pull(skb, 3); if (buf[0] == BPDU_TYPE_CONFIG) { struct br_config_bpdu bpdu; if (!pskb_may_pull(skb, 32)) - goto out; + goto out; buf = skb->data; bpdu.topology_change = (buf[1] & 0x01) ? 1 : 0; diff -puN net/bridge/br_stp_timer.c~git-net net/bridge/br_stp_timer.c --- devel/net/bridge/br_stp_timer.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/bridge/br_stp_timer.c 2006-03-17 23:03:48.000000000 -0800 @@ -39,13 +39,13 @@ static void br_hello_timer_expired(unsig struct net_bridge *br = (struct net_bridge *)arg; pr_debug("%s: hello timer expired\n", br->dev->name); - spin_lock_bh(&br->lock); + spin_lock(&br->lock); if (br->dev->flags & IFF_UP) { br_config_bpdu_generation(br); mod_timer(&br->hello_timer, jiffies + br->hello_time); } - spin_unlock_bh(&br->lock); + spin_unlock(&br->lock); } static void br_message_age_timer_expired(unsigned long arg) @@ -71,7 +71,7 @@ static void br_message_age_timer_expired * running when we are the root bridge. So.. this was_root * check is redundant. I'm leaving it in for now, though. 
*/ - spin_lock_bh(&br->lock); + spin_lock(&br->lock); if (p->state == BR_STATE_DISABLED) goto unlock; was_root = br_is_root_bridge(br); @@ -82,7 +82,7 @@ static void br_message_age_timer_expired if (br_is_root_bridge(br) && !was_root) br_become_root_bridge(br); unlock: - spin_unlock_bh(&br->lock); + spin_unlock(&br->lock); } static void br_forward_delay_timer_expired(unsigned long arg) @@ -92,7 +92,7 @@ static void br_forward_delay_timer_expir pr_debug("%s: %d(%s) forward delay timer\n", br->dev->name, p->port_no, p->dev->name); - spin_lock_bh(&br->lock); + spin_lock(&br->lock); if (p->state == BR_STATE_LISTENING) { p->state = BR_STATE_LEARNING; mod_timer(&p->forward_delay_timer, @@ -103,7 +103,7 @@ static void br_forward_delay_timer_expir br_topology_change_detection(br); } br_log_state(p); - spin_unlock_bh(&br->lock); + spin_unlock(&br->lock); } static void br_tcn_timer_expired(unsigned long arg) @@ -111,13 +111,13 @@ static void br_tcn_timer_expired(unsigne struct net_bridge *br = (struct net_bridge *) arg; pr_debug("%s: tcn timer expired\n", br->dev->name); - spin_lock_bh(&br->lock); + spin_lock(&br->lock); if (br->dev->flags & IFF_UP) { br_transmit_tcn(br); mod_timer(&br->tcn_timer,jiffies + br->bridge_hello_time); } - spin_unlock_bh(&br->lock); + spin_unlock(&br->lock); } static void br_topology_change_timer_expired(unsigned long arg) @@ -125,10 +125,10 @@ static void br_topology_change_timer_exp struct net_bridge *br = (struct net_bridge *) arg; pr_debug("%s: topo change timer expired\n", br->dev->name); - spin_lock_bh(&br->lock); + spin_lock(&br->lock); br->topology_change_detected = 0; br->topology_change = 0; - spin_unlock_bh(&br->lock); + spin_unlock(&br->lock); } static void br_hold_timer_expired(unsigned long arg) @@ -138,45 +138,36 @@ static void br_hold_timer_expired(unsign pr_debug("%s: %d(%s) hold timer expired\n", p->br->dev->name, p->port_no, p->dev->name); - spin_lock_bh(&p->br->lock); + spin_lock(&p->br->lock); if (p->config_pending) 
br_transmit_config(p); - spin_unlock_bh(&p->br->lock); -} - -static inline void br_timer_init(struct timer_list *timer, - void (*_function)(unsigned long), - unsigned long _data) -{ - init_timer(timer); - timer->function = _function; - timer->data = _data; + spin_unlock(&p->br->lock); } void br_stp_timer_init(struct net_bridge *br) { - br_timer_init(&br->hello_timer, br_hello_timer_expired, + setup_timer(&br->hello_timer, br_hello_timer_expired, (unsigned long) br); - br_timer_init(&br->tcn_timer, br_tcn_timer_expired, + setup_timer(&br->tcn_timer, br_tcn_timer_expired, (unsigned long) br); - br_timer_init(&br->topology_change_timer, + setup_timer(&br->topology_change_timer, br_topology_change_timer_expired, (unsigned long) br); - br_timer_init(&br->gc_timer, br_fdb_cleanup, (unsigned long) br); + setup_timer(&br->gc_timer, br_fdb_cleanup, (unsigned long) br); } void br_stp_port_timer_init(struct net_bridge_port *p) { - br_timer_init(&p->message_age_timer, br_message_age_timer_expired, + setup_timer(&p->message_age_timer, br_message_age_timer_expired, (unsigned long) p); - br_timer_init(&p->forward_delay_timer, br_forward_delay_timer_expired, + setup_timer(&p->forward_delay_timer, br_forward_delay_timer_expired, (unsigned long) p); - br_timer_init(&p->hold_timer, br_hold_timer_expired, + setup_timer(&p->hold_timer, br_hold_timer_expired, (unsigned long) p); } diff -puN net/bridge/br_sysfs_br.c~git-net net/bridge/br_sysfs_br.c --- devel/net/bridge/br_sysfs_br.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/bridge/br_sysfs_br.c 2006-03-17 23:03:48.000000000 -0800 @@ -242,6 +242,54 @@ static ssize_t show_gc_timer(struct clas } static CLASS_DEVICE_ATTR(gc_timer, S_IRUGO, show_gc_timer, NULL); +static ssize_t show_group_addr(struct class_device *cd, char *buf) +{ + struct net_bridge *br = to_bridge(cd); + return sprintf(buf, "%x:%x:%x:%x:%x:%x\n", + br->group_addr[0], br->group_addr[1], + br->group_addr[2], br->group_addr[3], + br->group_addr[4], 
br->group_addr[5]); +} + +static ssize_t store_group_addr(struct class_device *cd, const char *buf, + size_t len) +{ + struct net_bridge *br = to_bridge(cd); + unsigned new_addr[6]; + int i; + + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + + if (sscanf(buf, "%x:%x:%x:%x:%x:%x", + &new_addr[0], &new_addr[1], &new_addr[2], + &new_addr[3], &new_addr[4], &new_addr[5]) != 6) + return -EINVAL; + + /* Must be 01:80:c2:00:00:0X */ + for (i = 0; i < 5; i++) + if (new_addr[i] != br_group_address[i]) + return -EINVAL; + + if (new_addr[5] & ~0xf) + return -EINVAL; + + if (new_addr[5] == 1 /* 802.3x Pause address */ + || new_addr[5] == 2 /* 802.3ad Slow protocols */ + || new_addr[5] == 3) /* 802.1X PAE address */ + return -EINVAL; + + spin_lock_bh(&br->lock); + for (i = 0; i < 6; i++) + br->group_addr[i] = new_addr[i]; + spin_unlock_bh(&br->lock); + return len; +} + +static CLASS_DEVICE_ATTR(group_addr, S_IRUGO | S_IWUSR, + show_group_addr, store_group_addr); + + static struct attribute *bridge_attrs[] = { &class_device_attr_forward_delay.attr, &class_device_attr_hello_time.attr, @@ -259,6 +307,7 @@ static struct attribute *bridge_attrs[] &class_device_attr_tcn_timer.attr, &class_device_attr_topology_change_timer.attr, &class_device_attr_gc_timer.attr, + &class_device_attr_group_addr.attr, NULL }; diff -puN net/bridge/Kconfig~git-net net/bridge/Kconfig --- devel/net/bridge/Kconfig~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/bridge/Kconfig 2006-03-17 23:03:48.000000000 -0800 @@ -4,6 +4,7 @@ config BRIDGE tristate "802.1d Ethernet Bridging" + select LLC ---help--- If you say Y here, then your Linux box will be able to act as an Ethernet bridge, which means that the different Ethernet segments it diff -puN net/bridge/netfilter/ebtables.c~git-net net/bridge/netfilter/ebtables.c --- devel/net/bridge/netfilter/ebtables.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/bridge/netfilter/ebtables.c 2006-03-17 23:03:48.000000000 -0800 @@ -35,6 +35,7 
@@ #define ASSERT_READ_LOCK(x) #define ASSERT_WRITE_LOCK(x) #include +#include #if 0 /* use this for remote debugging @@ -81,7 +82,7 @@ static void print_string(char *str) -static DECLARE_MUTEX(ebt_mutex); +static DEFINE_MUTEX(ebt_mutex); static LIST_HEAD(ebt_tables); static LIST_HEAD(ebt_targets); static LIST_HEAD(ebt_matches); @@ -296,18 +297,18 @@ letscontinue: /* If it succeeds, returns element and locks mutex */ static inline void * find_inlist_lock_noload(struct list_head *head, const char *name, int *error, - struct semaphore *mutex) + struct mutex *mutex) { void *ret; - *error = down_interruptible(mutex); + *error = mutex_lock_interruptible(mutex); if (*error != 0) return NULL; ret = list_named_find(head, name); if (!ret) { *error = -ENOENT; - up(mutex); + mutex_unlock(mutex); } return ret; } @@ -317,7 +318,7 @@ find_inlist_lock_noload(struct list_head #else static void * find_inlist_lock(struct list_head *head, const char *name, const char *prefix, - int *error, struct semaphore *mutex) + int *error, struct mutex *mutex) { void *ret; @@ -331,25 +332,25 @@ find_inlist_lock(struct list_head *head, #endif static inline struct ebt_table * -find_table_lock(const char *name, int *error, struct semaphore *mutex) +find_table_lock(const char *name, int *error, struct mutex *mutex) { return find_inlist_lock(&ebt_tables, name, "ebtable_", error, mutex); } static inline struct ebt_match * -find_match_lock(const char *name, int *error, struct semaphore *mutex) +find_match_lock(const char *name, int *error, struct mutex *mutex) { return find_inlist_lock(&ebt_matches, name, "ebt_", error, mutex); } static inline struct ebt_watcher * -find_watcher_lock(const char *name, int *error, struct semaphore *mutex) +find_watcher_lock(const char *name, int *error, struct mutex *mutex) { return find_inlist_lock(&ebt_watchers, name, "ebt_", error, mutex); } static inline struct ebt_target * -find_target_lock(const char *name, int *error, struct semaphore *mutex) 
+find_target_lock(const char *name, int *error, struct mutex *mutex) { return find_inlist_lock(&ebt_targets, name, "ebt_", error, mutex); } @@ -369,10 +370,10 @@ ebt_check_match(struct ebt_entry_match * return ret; m->u.match = match; if (!try_module_get(match->me)) { - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); return -ENOENT; } - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); if (match->check && match->check(name, hookmask, e, m->data, m->match_size) != 0) { BUGPRINT("match->check failed\n"); @@ -398,10 +399,10 @@ ebt_check_watcher(struct ebt_entry_watch return ret; w->u.watcher = watcher; if (!try_module_get(watcher->me)) { - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); return -ENOENT; } - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); if (watcher->check && watcher->check(name, hookmask, e, w->data, w->watcher_size) != 0) { BUGPRINT("watcher->check failed\n"); @@ -638,11 +639,11 @@ ebt_check_entry(struct ebt_entry *e, str if (!target) goto cleanup_watchers; if (!try_module_get(target->me)) { - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); ret = -ENOENT; goto cleanup_watchers; } - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); t->u.target = target; if (t->u.target == &ebt_standard_target) { @@ -1015,7 +1016,7 @@ static int do_replace(void __user *user, t->private = newinfo; write_unlock_bh(&t->lock); - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); /* so, a user can change the chains while having messed up her counter allocation. 
Only reason why this is done is because this way the lock is held only once, while this doesn't bring the kernel into a @@ -1045,7 +1046,7 @@ static int do_replace(void __user *user, return ret; free_unlock: - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); free_iterate: EBT_ENTRY_ITERATE(newinfo->entries, newinfo->entries_size, ebt_cleanup_entry, NULL); @@ -1068,69 +1069,69 @@ int ebt_register_target(struct ebt_targe { int ret; - ret = down_interruptible(&ebt_mutex); + ret = mutex_lock_interruptible(&ebt_mutex); if (ret != 0) return ret; if (!list_named_insert(&ebt_targets, target)) { - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); return -EEXIST; } - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); return 0; } void ebt_unregister_target(struct ebt_target *target) { - down(&ebt_mutex); + mutex_lock(&ebt_mutex); LIST_DELETE(&ebt_targets, target); - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); } int ebt_register_match(struct ebt_match *match) { int ret; - ret = down_interruptible(&ebt_mutex); + ret = mutex_lock_interruptible(&ebt_mutex); if (ret != 0) return ret; if (!list_named_insert(&ebt_matches, match)) { - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); return -EEXIST; } - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); return 0; } void ebt_unregister_match(struct ebt_match *match) { - down(&ebt_mutex); + mutex_lock(&ebt_mutex); LIST_DELETE(&ebt_matches, match); - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); } int ebt_register_watcher(struct ebt_watcher *watcher) { int ret; - ret = down_interruptible(&ebt_mutex); + ret = mutex_lock_interruptible(&ebt_mutex); if (ret != 0) return ret; if (!list_named_insert(&ebt_watchers, watcher)) { - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); return -EEXIST; } - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); return 0; } void ebt_unregister_watcher(struct ebt_watcher *watcher) { - down(&ebt_mutex); + mutex_lock(&ebt_mutex); LIST_DELETE(&ebt_watchers, watcher); - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); } int ebt_register_table(struct ebt_table 
*table) @@ -1178,7 +1179,7 @@ int ebt_register_table(struct ebt_table table->private = newinfo; rwlock_init(&table->lock); - ret = down_interruptible(&ebt_mutex); + ret = mutex_lock_interruptible(&ebt_mutex); if (ret != 0) goto free_chainstack; @@ -1194,10 +1195,10 @@ int ebt_register_table(struct ebt_table goto free_unlock; } list_prepend(&ebt_tables, table); - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); return 0; free_unlock: - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); free_chainstack: if (newinfo->chainstack) { for_each_cpu(i) @@ -1218,9 +1219,9 @@ void ebt_unregister_table(struct ebt_tab BUGPRINT("Request to unregister NULL table!!!\n"); return; } - down(&ebt_mutex); + mutex_lock(&ebt_mutex); LIST_DELETE(&ebt_tables, table); - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); vfree(table->private->entries); if (table->private->chainstack) { for_each_cpu(i) @@ -1281,7 +1282,7 @@ static int update_counters(void __user * write_unlock_bh(&t->lock); ret = 0; unlock_mutex: - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); free_tmp: vfree(tmp); return ret; @@ -1328,7 +1329,7 @@ static inline int ebt_make_names(struct return 0; } -/* called with ebt_mutex down */ +/* called with ebt_mutex locked */ static int copy_everything_to_user(struct ebt_table *t, void __user *user, int *len, int cmd) { @@ -1440,7 +1441,7 @@ static int do_ebt_get_ctl(struct sock *s case EBT_SO_GET_INIT_INFO: if (*len != sizeof(struct ebt_replace)){ ret = -EINVAL; - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); break; } if (cmd == EBT_SO_GET_INFO) { @@ -1452,7 +1453,7 @@ static int do_ebt_get_ctl(struct sock *s tmp.entries_size = t->table->entries_size; tmp.valid_hooks = t->table->valid_hooks; } - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); if (copy_to_user(user, &tmp, *len) != 0){ BUGPRINT("c2u Didn't work\n"); ret = -EFAULT; @@ -1464,11 +1465,11 @@ static int do_ebt_get_ctl(struct sock *s case EBT_SO_GET_ENTRIES: case EBT_SO_GET_INIT_ENTRIES: ret = copy_everything_to_user(t, user, len, cmd); - 
up(&ebt_mutex); + mutex_unlock(&ebt_mutex); break; default: - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); ret = -EINVAL; } @@ -1476,17 +1477,23 @@ static int do_ebt_get_ctl(struct sock *s } static struct nf_sockopt_ops ebt_sockopts = -{ { NULL, NULL }, PF_INET, EBT_BASE_CTL, EBT_SO_SET_MAX + 1, do_ebt_set_ctl, - EBT_BASE_CTL, EBT_SO_GET_MAX + 1, do_ebt_get_ctl, 0, NULL +{ + .pf = PF_INET, + .set_optmin = EBT_BASE_CTL, + .set_optmax = EBT_SO_SET_MAX + 1, + .set = do_ebt_set_ctl, + .get_optmin = EBT_BASE_CTL, + .get_optmax = EBT_SO_GET_MAX + 1, + .get = do_ebt_get_ctl, }; static int __init init(void) { int ret; - down(&ebt_mutex); + mutex_lock(&ebt_mutex); list_named_insert(&ebt_targets, &ebt_standard_target); - up(&ebt_mutex); + mutex_unlock(&ebt_mutex); if ((ret = nf_register_sockopt(&ebt_sockopts)) < 0) return ret; diff -puN net/compat.c~git-net net/compat.c --- devel/net/compat.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/compat.c 2006-03-17 23:03:48.000000000 -0800 @@ -416,7 +416,7 @@ struct compat_sock_fprog { compat_uptr_t filter; /* struct sock_filter * */ }; -static int do_set_attach_filter(int fd, int level, int optname, +static int do_set_attach_filter(struct socket *sock, int level, int optname, char __user *optval, int optlen) { struct compat_sock_fprog __user *fprog32 = (struct compat_sock_fprog __user *)optval; @@ -432,11 +432,12 @@ static int do_set_attach_filter(int fd, __put_user(compat_ptr(ptr), &kfprog->filter)) return -EFAULT; - return sys_setsockopt(fd, level, optname, (char __user *)kfprog, + return sock_setsockopt(sock, level, optname, (char __user *)kfprog, sizeof(struct sock_fprog)); } -static int do_set_sock_timeout(int fd, int level, int optname, char __user *optval, int optlen) +static int do_set_sock_timeout(struct socket *sock, int level, + int optname, char __user *optval, int optlen) { struct compat_timeval __user *up = (struct compat_timeval __user *) optval; struct timeval ktime; @@ -451,30 +452,61 @@ static int 
do_set_sock_timeout(int fd, i return -EFAULT; old_fs = get_fs(); set_fs(KERNEL_DS); - err = sys_setsockopt(fd, level, optname, (char *) &ktime, sizeof(ktime)); + err = sock_setsockopt(sock, level, optname, (char *) &ktime, sizeof(ktime)); set_fs(old_fs); return err; } +static int compat_sock_setsockopt(struct socket *sock, int level, int optname, + char __user *optval, int optlen) +{ + if (optname == SO_ATTACH_FILTER) + return do_set_attach_filter(sock, level, optname, + optval, optlen); + if (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO) + return do_set_sock_timeout(sock, level, optname, optval, optlen); + + return sock_setsockopt(sock, level, optname, optval, optlen); +} + asmlinkage long compat_sys_setsockopt(int fd, int level, int optname, char __user *optval, int optlen) { + int err; + struct socket *sock; + /* SO_SET_REPLACE seems to be the same in all levels */ if (optname == IPT_SO_SET_REPLACE) return do_netfilter_replace(fd, level, optname, optval, optlen); - if (level == SOL_SOCKET && optname == SO_ATTACH_FILTER) - return do_set_attach_filter(fd, level, optname, - optval, optlen); - if (level == SOL_SOCKET && - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) - return do_set_sock_timeout(fd, level, optname, optval, optlen); - return sys_setsockopt(fd, level, optname, optval, optlen); + if (optlen < 0) + return -EINVAL; + + if ((sock = sockfd_lookup(fd, &err))!=NULL) + { + err = security_socket_setsockopt(sock,level,optname); + if (err) { + sockfd_put(sock); + return err; + } + + if (level == SOL_SOCKET) + err = compat_sock_setsockopt(sock, level, + optname, optval, optlen); + else if (sock->ops->compat_setsockopt) + err = sock->ops->compat_setsockopt(sock, level, + optname, optval, optlen); + else + err = sock->ops->setsockopt(sock, level, + optname, optval, optlen); + sockfd_put(sock); + } + return err; } -static int do_get_sock_timeout(int fd, int level, int optname, +static int do_get_sock_timeout(struct socket *sock, int level, int optname, 
char __user *optval, int __user *optlen) { struct compat_timeval __user *up; @@ -490,7 +522,7 @@ static int do_get_sock_timeout(int fd, i len = sizeof(ktime); old_fs = get_fs(); set_fs(KERNEL_DS); - err = sys_getsockopt(fd, level, optname, (char *) &ktime, &len); + err = sock_getsockopt(sock, level, optname, (char *) &ktime, &len); set_fs(old_fs); if (!err) { @@ -503,15 +535,42 @@ static int do_get_sock_timeout(int fd, i return err; } -asmlinkage long compat_sys_getsockopt(int fd, int level, int optname, +static int compat_sock_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen) { - if (level == SOL_SOCKET && - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) - return do_get_sock_timeout(fd, level, optname, optval, optlen); - return sys_getsockopt(fd, level, optname, optval, optlen); + if (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO) + return do_get_sock_timeout(sock, level, optname, optval, optlen); + return sock_getsockopt(sock, level, optname, optval, optlen); } +asmlinkage long compat_sys_getsockopt(int fd, int level, int optname, + char __user *optval, int __user *optlen) +{ + int err; + struct socket *sock; + + if ((sock = sockfd_lookup(fd, &err))!=NULL) + { + err = security_socket_getsockopt(sock, level, + optname); + if (err) { + sockfd_put(sock); + return err; + } + + if (level == SOL_SOCKET) + err = compat_sock_getsockopt(sock, level, + optname, optval, optlen); + else if (sock->ops->compat_getsockopt) + err = sock->ops->compat_getsockopt(sock, level, + optname, optval, optlen); + else + err = sock->ops->getsockopt(sock, level, + optname, optval, optlen); + sockfd_put(sock); + } + return err; +} /* Argument list sizes for compat_sys_socketcall */ #define AL(x) ((x) * sizeof(u32)) static unsigned char nas[18]={AL(0),AL(3),AL(3),AL(3),AL(2),AL(3), diff -puN net/core/dev.c~git-net net/core/dev.c --- devel/net/core/dev.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/core/dev.c 2006-03-17 
23:03:48.000000000 -0800 @@ -81,6 +81,7 @@ #include #include #include +#include #include #include #include @@ -1760,8 +1761,7 @@ static void net_rx_action(struct softirq if (dev->quota <= 0 || dev->poll(dev, &budget)) { netpoll_poll_unlock(have); local_irq_disable(); - list_del(&dev->poll_list); - list_add_tail(&dev->poll_list, &queue->poll_list); + list_move_tail(&dev->poll_list, &queue->poll_list); if (dev->quota < 0) dev->quota += dev->weight; else @@ -2181,12 +2181,20 @@ unsigned dev_get_flags(const struct net_ flags = (dev->flags & ~(IFF_PROMISC | IFF_ALLMULTI | - IFF_RUNNING)) | + IFF_RUNNING | + IFF_LOWER_UP | + IFF_DORMANT)) | (dev->gflags & (IFF_PROMISC | IFF_ALLMULTI)); - if (netif_running(dev) && netif_carrier_ok(dev)) - flags |= IFF_RUNNING; + if (netif_running(dev)) { + if (netif_oper_up(dev)) + flags |= IFF_RUNNING; + if (netif_carrier_ok(dev)) + flags |= IFF_LOWER_UP; + if (netif_dormant(dev)) + flags |= IFF_DORMANT; + } return flags; } @@ -2465,9 +2473,9 @@ int dev_ioctl(unsigned int cmd, void __u */ if (cmd == SIOCGIFCONF) { - rtnl_shlock(); + rtnl_lock(); ret = dev_ifconf((char __user *) arg); - rtnl_shunlock(); + rtnl_unlock(); return ret; } if (cmd == SIOCGIFNAME) @@ -2876,7 +2884,7 @@ static void netdev_wait_allrefs(struct n rebroadcast_time = warning_time = jiffies; while (atomic_read(&dev->refcnt) != 0) { if (time_after(jiffies, rebroadcast_time + 1 * HZ)) { - rtnl_shlock(); + rtnl_lock(); /* Rebroadcast unregister notification */ notifier_call_chain(&netdev_chain, @@ -2893,7 +2901,7 @@ static void netdev_wait_allrefs(struct n linkwatch_run_queue(); } - rtnl_shunlock(); + __rtnl_unlock(); rebroadcast_time = jiffies; } @@ -2931,7 +2939,7 @@ static void netdev_wait_allrefs(struct n * 2) Since we run with the RTNL semaphore not held, we can sleep * safely in order to wait for the netdev refcnt to drop to zero. 
*/
-static DECLARE_MUTEX(net_todo_run_mutex);
+static DEFINE_MUTEX(net_todo_run_mutex);
 void netdev_run_todo(void)
 {
 	struct list_head list = LIST_HEAD_INIT(list);
@@ -2939,7 +2947,7 @@ void netdev_run_todo(void)
 
 
 	/* Need to guard against multiple cpu's getting out of order. */
-	down(&net_todo_run_mutex);
+	mutex_lock(&net_todo_run_mutex);
 
 	/* Not safe to do outside the semaphore. We must not return
 	 * until all unregister events invoked by the local processor
@@ -2996,7 +3004,7 @@ void netdev_run_todo(void)
 	}
 
 out:
-	up(&net_todo_run_mutex);
+	mutex_unlock(&net_todo_run_mutex);
 }
 
 /**
diff -puN net/core/flow.c~git-net net/core/flow.c
--- devel/net/core/flow.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/core/flow.c	2006-03-17 23:03:48.000000000 -0800
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -287,11 +288,11 @@ static void flow_cache_flush_per_cpu(voi
 void flow_cache_flush(void)
 {
 	struct flow_flush_info info;
-	static DECLARE_MUTEX(flow_flush_sem);
+	static DEFINE_MUTEX(flow_flush_sem);
 
 	/* Don't want cpus going down or up during this. */
 	lock_cpu_hotplug();
-	down(&flow_flush_sem);
+	mutex_lock(&flow_flush_sem);
 	atomic_set(&info.cpuleft, num_online_cpus());
 	init_completion(&info.completion);
 
@@ -301,7 +302,7 @@ void flow_cache_flush(void)
 	local_bh_enable();
 
 	wait_for_completion(&info.completion);
-	up(&flow_flush_sem);
+	mutex_unlock(&flow_flush_sem);
 	unlock_cpu_hotplug();
 }
 
diff -puN net/core/link_watch.c~git-net net/core/link_watch.c
--- devel/net/core/link_watch.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/core/link_watch.c	2006-03-17 23:03:48.000000000 -0800
@@ -49,6 +49,45 @@ struct lw_event {
 /* Avoid kmalloc() for most systems */
 static struct lw_event singleevent;
 
+static unsigned char default_operstate(const struct net_device *dev)
+{
+	if (!netif_carrier_ok(dev))
+		return (dev->ifindex != dev->iflink ?
+ IF_OPER_LOWERLAYERDOWN : IF_OPER_DOWN); + + if (netif_dormant(dev)) + return IF_OPER_DORMANT; + + return IF_OPER_UP; +} + + +static void rfc2863_policy(struct net_device *dev) +{ + unsigned char operstate = default_operstate(dev); + + if (operstate == dev->operstate) + return; + + write_lock_bh(&dev_base_lock); + + switch(dev->link_mode) { + case IF_LINK_MODE_DORMANT: + if (operstate == IF_OPER_UP) + operstate = IF_OPER_DORMANT; + break; + + case IF_LINK_MODE_DEFAULT: + default: + break; + }; + + dev->operstate = operstate; + + write_unlock_bh(&dev_base_lock); +} + + /* Must be called with the rtnl semaphore held */ void linkwatch_run_queue(void) { @@ -74,6 +113,7 @@ void linkwatch_run_queue(void) */ clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state); + rfc2863_policy(dev); if (dev->flags & IFF_UP) { if (netif_carrier_ok(dev)) { WARN_ON(dev->qdisc_sleeping == &noop_qdisc); @@ -99,9 +139,9 @@ static void linkwatch_event(void *dummy) linkwatch_nextevent = jiffies + HZ; clear_bit(LW_RUNNING, &linkwatch_flags); - rtnl_shlock(); + rtnl_lock(); linkwatch_run_queue(); - rtnl_shunlock(); + rtnl_unlock(); } diff -puN net/core/neighbour.c~git-net net/core/neighbour.c --- devel/net/core/neighbour.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/core/neighbour.c 2006-03-17 23:03:48.000000000 -0800 @@ -586,8 +586,8 @@ void neigh_destroy(struct neighbour *nei kfree(hh); } - if (neigh->ops && neigh->ops->destructor) - (neigh->ops->destructor)(neigh); + if (neigh->parms->neigh_destructor) + (neigh->parms->neigh_destructor)(neigh); skb_queue_purge(&neigh->arp_queue); @@ -750,11 +750,13 @@ static void neigh_timer_handler(unsigned neigh->used + neigh->parms->delay_probe_time)) { NEIGH_PRINTK2("neigh %p is delayed.\n", neigh); neigh->nud_state = NUD_DELAY; + neigh->updated = jiffies; neigh_suspect(neigh); next = now + neigh->parms->delay_probe_time; } else { NEIGH_PRINTK2("neigh %p is suspected.\n", neigh); neigh->nud_state = NUD_STALE; + neigh->updated = 
jiffies; neigh_suspect(neigh); } } else if (state & NUD_DELAY) { @@ -762,11 +764,13 @@ static void neigh_timer_handler(unsigned neigh->confirmed + neigh->parms->delay_probe_time)) { NEIGH_PRINTK2("neigh %p is now reachable.\n", neigh); neigh->nud_state = NUD_REACHABLE; + neigh->updated = jiffies; neigh_connect(neigh); next = neigh->confirmed + neigh->parms->reachable_time; } else { NEIGH_PRINTK2("neigh %p is probed.\n", neigh); neigh->nud_state = NUD_PROBE; + neigh->updated = jiffies; atomic_set(&neigh->probes, 0); next = now + neigh->parms->retrans_time; } @@ -780,6 +784,7 @@ static void neigh_timer_handler(unsigned struct sk_buff *skb; neigh->nud_state = NUD_FAILED; + neigh->updated = jiffies; notify = 1; NEIGH_CACHE_STAT_INC(neigh->tbl, res_failed); NEIGH_PRINTK2("neigh %p is failed.\n", neigh); @@ -843,10 +848,12 @@ int __neigh_event_send(struct neighbour if (neigh->parms->mcast_probes + neigh->parms->app_probes) { atomic_set(&neigh->probes, neigh->parms->ucast_probes); neigh->nud_state = NUD_INCOMPLETE; + neigh->updated = jiffies; neigh_hold(neigh); neigh_add_timer(neigh, now + 1); } else { neigh->nud_state = NUD_FAILED; + neigh->updated = jiffies; write_unlock_bh(&neigh->lock); if (skb) @@ -857,6 +864,7 @@ int __neigh_event_send(struct neighbour NEIGH_PRINTK2("neigh %p is delayed.\n", neigh); neigh_hold(neigh); neigh->nud_state = NUD_DELAY; + neigh->updated = jiffies; neigh_add_timer(neigh, jiffies + neigh->parms->delay_probe_time); } diff -puN net/core/netpoll.c~git-net net/core/netpoll.c --- devel/net/core/netpoll.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/core/netpoll.c 2006-03-17 23:03:48.000000000 -0800 @@ -669,14 +669,14 @@ int netpoll_setup(struct netpoll *np) printk(KERN_INFO "%s: device %s not up yet, forcing it\n", np->name, np->dev_name); - rtnl_shlock(); + rtnl_lock(); if (dev_change_flags(ndev, ndev->flags | IFF_UP) < 0) { printk(KERN_ERR "%s: failed to open %s\n", np->name, np->dev_name); - rtnl_shunlock(); + rtnl_unlock(); 
goto release; } - rtnl_shunlock(); + rtnl_unlock(); atleast = jiffies + HZ/10; atmost = jiffies + 4*HZ; diff -puN net/core/net-sysfs.c~git-net net/core/net-sysfs.c --- devel/net/core/net-sysfs.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/core/net-sysfs.c 2006-03-17 23:03:48.000000000 -0800 @@ -91,6 +91,7 @@ NETDEVICE_SHOW(iflink, fmt_dec); NETDEVICE_SHOW(ifindex, fmt_dec); NETDEVICE_SHOW(features, fmt_long_hex); NETDEVICE_SHOW(type, fmt_dec); +NETDEVICE_SHOW(link_mode, fmt_dec); /* use same locking rules as GIFHWADDR ioctl's */ static ssize_t format_addr(char *buf, const unsigned char *addr, int len) @@ -133,6 +134,43 @@ static ssize_t show_carrier(struct class return -EINVAL; } +static ssize_t show_dormant(struct class_device *dev, char *buf) +{ + struct net_device *netdev = to_net_dev(dev); + + if (netif_running(netdev)) + return sprintf(buf, fmt_dec, !!netif_dormant(netdev)); + + return -EINVAL; +} + +static const char *operstates[] = { + "unknown", + "notpresent", /* currently unused */ + "down", + "lowerlayerdown", + "testing", /* currently unused */ + "dormant", + "up" +}; + +static ssize_t show_operstate(struct class_device *dev, char *buf) +{ + const struct net_device *netdev = to_net_dev(dev); + unsigned char operstate; + + read_lock(&dev_base_lock); + operstate = netdev->operstate; + if (!netif_running(netdev)) + operstate = IF_OPER_DOWN; + read_unlock(&dev_base_lock); + + if (operstate >= sizeof(operstates)) + return -EINVAL; /* should not happen */ + + return sprintf(buf, "%s\n", operstates[operstate]); +} + /* read-write attributes */ NETDEVICE_SHOW(mtu, fmt_dec); @@ -190,9 +228,12 @@ static struct class_device_attribute net __ATTR(ifindex, S_IRUGO, show_ifindex, NULL), __ATTR(features, S_IRUGO, show_features, NULL), __ATTR(type, S_IRUGO, show_type, NULL), + __ATTR(link_mode, S_IRUGO, show_link_mode, NULL), __ATTR(address, S_IRUGO, show_address, NULL), __ATTR(broadcast, S_IRUGO, show_broadcast, NULL), __ATTR(carrier, S_IRUGO, 
show_carrier, NULL), + __ATTR(dormant, S_IRUGO, show_dormant, NULL), + __ATTR(operstate, S_IRUGO, show_operstate, NULL), __ATTR(mtu, S_IRUGO | S_IWUSR, show_mtu, store_mtu), __ATTR(flags, S_IRUGO | S_IWUSR, show_flags, store_flags), __ATTR(tx_queue_len, S_IRUGO | S_IWUSR, show_tx_queue_len, diff -puN net/core/pktgen.c~git-net net/core/pktgen.c --- devel/net/core/pktgen.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/core/pktgen.c 2006-03-17 23:03:48.000000000 -0800 @@ -113,6 +113,7 @@ #include #include #include +#include #include #include #include @@ -125,6 +126,7 @@ #include #include #include +#include #include #include #include @@ -149,38 +151,34 @@ #include #include #include -#include /* do_div */ +#include /* do_div */ #include - -#define VERSION "pktgen v2.63: Packet Generator for packet performance testing.\n" +#define VERSION "pktgen v2.66: Packet Generator for packet performance testing.\n" /* #define PG_DEBUG(a) a */ -#define PG_DEBUG(a) +#define PG_DEBUG(a) /* The buckets are exponential in 'width' */ #define LAT_BUCKETS_MAX 32 #define IP_NAME_SZ 32 /* Device flag bits */ -#define F_IPSRC_RND (1<<0) /* IP-Src Random */ -#define F_IPDST_RND (1<<1) /* IP-Dst Random */ -#define F_UDPSRC_RND (1<<2) /* UDP-Src Random */ -#define F_UDPDST_RND (1<<3) /* UDP-Dst Random */ -#define F_MACSRC_RND (1<<4) /* MAC-Src Random */ -#define F_MACDST_RND (1<<5) /* MAC-Dst Random */ -#define F_TXSIZE_RND (1<<6) /* Transmit size is random */ -#define F_IPV6 (1<<7) /* Interface in IPV6 Mode */ +#define F_IPSRC_RND (1<<0) /* IP-Src Random */ +#define F_IPDST_RND (1<<1) /* IP-Dst Random */ +#define F_UDPSRC_RND (1<<2) /* UDP-Src Random */ +#define F_UDPDST_RND (1<<3) /* UDP-Dst Random */ +#define F_MACSRC_RND (1<<4) /* MAC-Src Random */ +#define F_MACDST_RND (1<<5) /* MAC-Dst Random */ +#define F_TXSIZE_RND (1<<6) /* Transmit size is random */ +#define F_IPV6 (1<<7) /* Interface in IPV6 Mode */ /* Thread control flag bits */ -#define T_TERMINATE (1<<0) -#define 
T_STOP (1<<1) /* Stop run */ -#define T_RUN (1<<2) /* Start run */ -#define T_REMDEV (1<<3) /* Remove all devs */ - -/* Locks */ -#define thread_lock() down(&pktgen_sem) -#define thread_unlock() up(&pktgen_sem) +#define T_TERMINATE (1<<0) +#define T_STOP (1<<1) /* Stop run */ +#define T_RUN (1<<2) /* Start run */ +#define T_REMDEVALL (1<<3) /* Remove all devs */ +#define T_REMDEV (1<<4) /* Remove one dev */ /* If lock -- can be removed after some work */ #define if_lock(t) spin_lock(&(t->if_lock)); @@ -194,10 +192,9 @@ static struct proc_dir_entry *pg_proc_di #define MAX_CFLOWS 65536 -struct flow_state -{ - __u32 cur_daddr; - int count; +struct flow_state { + __u32 cur_daddr; + int count; }; struct pktgen_dev { @@ -206,141 +203,144 @@ struct pktgen_dev { * Try to keep frequent/infrequent used vars. separated. */ - char ifname[IFNAMSIZ]; - char result[512]; + char ifname[IFNAMSIZ]; + char result[512]; - struct pktgen_thread* pg_thread; /* the owner */ - struct pktgen_dev *next; /* Used for chaining in the thread's run-queue */ + struct pktgen_thread *pg_thread; /* the owner */ + struct list_head list; /* Used for chaining in the thread's run-queue */ - int running; /* if this changes to false, the test will stop */ - - /* If min != max, then we will either do a linear iteration, or - * we will do a random selection from within the range. - */ - __u32 flags; - - int min_pkt_size; /* = ETH_ZLEN; */ - int max_pkt_size; /* = ETH_ZLEN; */ - int nfrags; - __u32 delay_us; /* Default delay */ - __u32 delay_ns; - __u64 count; /* Default No packets to send */ - __u64 sofar; /* How many pkts we've sent so far */ - __u64 tx_bytes; /* How many bytes we've transmitted */ - __u64 errors; /* Errors when trying to transmit, pkts will be re-sent */ - - /* runtime counters relating to clone_skb */ - __u64 next_tx_us; /* timestamp of when to tx next */ - __u32 next_tx_ns; - - __u64 allocated_skbs; - __u32 clone_count; - int last_ok; /* Was last skb sent? 
- * Or a failed transmit of some sort? This will keep - * sequence numbers in order, for example. - */ - __u64 started_at; /* micro-seconds */ - __u64 stopped_at; /* micro-seconds */ - __u64 idle_acc; /* micro-seconds */ - __u32 seq_num; - - int clone_skb; /* Use multiple SKBs during packet gen. If this number - * is greater than 1, then that many copies of the same - * packet will be sent before a new packet is allocated. - * For instance, if you want to send 1024 identical packets - * before creating a new packet, set clone_skb to 1024. - */ - - char dst_min[IP_NAME_SZ]; /* IP, ie 1.2.3.4 */ - char dst_max[IP_NAME_SZ]; /* IP, ie 1.2.3.4 */ - char src_min[IP_NAME_SZ]; /* IP, ie 1.2.3.4 */ - char src_max[IP_NAME_SZ]; /* IP, ie 1.2.3.4 */ - - struct in6_addr in6_saddr; - struct in6_addr in6_daddr; - struct in6_addr cur_in6_daddr; - struct in6_addr cur_in6_saddr; + int running; /* if this changes to false, the test will stop */ + + /* If min != max, then we will either do a linear iteration, or + * we will do a random selection from within the range. + */ + __u32 flags; + int removal_mark; /* non-zero => the device is marked for + * removal by worker thread */ + + int min_pkt_size; /* = ETH_ZLEN; */ + int max_pkt_size; /* = ETH_ZLEN; */ + int nfrags; + __u32 delay_us; /* Default delay */ + __u32 delay_ns; + __u64 count; /* Default No packets to send */ + __u64 sofar; /* How many pkts we've sent so far */ + __u64 tx_bytes; /* How many bytes we've transmitted */ + __u64 errors; /* Errors when trying to transmit, pkts will be re-sent */ + + /* runtime counters relating to clone_skb */ + __u64 next_tx_us; /* timestamp of when to tx next */ + __u32 next_tx_ns; + + __u64 allocated_skbs; + __u32 clone_count; + int last_ok; /* Was last skb sent? + * Or a failed transmit of some sort? This will keep + * sequence numbers in order, for example. 
+ */ + __u64 started_at; /* micro-seconds */ + __u64 stopped_at; /* micro-seconds */ + __u64 idle_acc; /* micro-seconds */ + __u32 seq_num; + + int clone_skb; /* Use multiple SKBs during packet gen. If this number + * is greater than 1, then that many copies of the same + * packet will be sent before a new packet is allocated. + * For instance, if you want to send 1024 identical packets + * before creating a new packet, set clone_skb to 1024. + */ + + char dst_min[IP_NAME_SZ]; /* IP, ie 1.2.3.4 */ + char dst_max[IP_NAME_SZ]; /* IP, ie 1.2.3.4 */ + char src_min[IP_NAME_SZ]; /* IP, ie 1.2.3.4 */ + char src_max[IP_NAME_SZ]; /* IP, ie 1.2.3.4 */ + + struct in6_addr in6_saddr; + struct in6_addr in6_daddr; + struct in6_addr cur_in6_daddr; + struct in6_addr cur_in6_saddr; /* For ranges */ - struct in6_addr min_in6_daddr; - struct in6_addr max_in6_daddr; - struct in6_addr min_in6_saddr; - struct in6_addr max_in6_saddr; - - /* If we're doing ranges, random or incremental, then this - * defines the min/max for those ranges. 
- */ - __u32 saddr_min; /* inclusive, source IP address */ - __u32 saddr_max; /* exclusive, source IP address */ - __u32 daddr_min; /* inclusive, dest IP address */ - __u32 daddr_max; /* exclusive, dest IP address */ - - __u16 udp_src_min; /* inclusive, source UDP port */ - __u16 udp_src_max; /* exclusive, source UDP port */ - __u16 udp_dst_min; /* inclusive, dest UDP port */ - __u16 udp_dst_max; /* exclusive, dest UDP port */ - - __u32 src_mac_count; /* How many MACs to iterate through */ - __u32 dst_mac_count; /* How many MACs to iterate through */ - - unsigned char dst_mac[ETH_ALEN]; - unsigned char src_mac[ETH_ALEN]; - - __u32 cur_dst_mac_offset; - __u32 cur_src_mac_offset; - __u32 cur_saddr; - __u32 cur_daddr; - __u16 cur_udp_dst; - __u16 cur_udp_src; - __u32 cur_pkt_size; - - __u8 hh[14]; - /* = { - 0x00, 0x80, 0xC8, 0x79, 0xB3, 0xCB, - - We fill in SRC address later - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, - 0x08, 0x00 - }; - */ - __u16 pad; /* pad out the hh struct to an even 16 bytes */ - - struct sk_buff* skb; /* skb we are to transmit next, mainly used for when we - * are transmitting the same one multiple times - */ - struct net_device* odev; /* The out-going device. Note that the device should - * have it's pg_info pointer pointing back to this - * device. This will be set when the user specifies - * the out-going device name (not when the inject is - * started as it used to do.) - */ + struct in6_addr min_in6_daddr; + struct in6_addr max_in6_daddr; + struct in6_addr min_in6_saddr; + struct in6_addr max_in6_saddr; + + /* If we're doing ranges, random or incremental, then this + * defines the min/max for those ranges. 
+ */ + __u32 saddr_min; /* inclusive, source IP address */ + __u32 saddr_max; /* exclusive, source IP address */ + __u32 daddr_min; /* inclusive, dest IP address */ + __u32 daddr_max; /* exclusive, dest IP address */ + + __u16 udp_src_min; /* inclusive, source UDP port */ + __u16 udp_src_max; /* exclusive, source UDP port */ + __u16 udp_dst_min; /* inclusive, dest UDP port */ + __u16 udp_dst_max; /* exclusive, dest UDP port */ + + __u32 src_mac_count; /* How many MACs to iterate through */ + __u32 dst_mac_count; /* How many MACs to iterate through */ + + unsigned char dst_mac[ETH_ALEN]; + unsigned char src_mac[ETH_ALEN]; + + __u32 cur_dst_mac_offset; + __u32 cur_src_mac_offset; + __u32 cur_saddr; + __u32 cur_daddr; + __u16 cur_udp_dst; + __u16 cur_udp_src; + __u32 cur_pkt_size; + + __u8 hh[14]; + /* = { + 0x00, 0x80, 0xC8, 0x79, 0xB3, 0xCB, + + We fill in SRC address later + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x08, 0x00 + }; + */ + __u16 pad; /* pad out the hh struct to an even 16 bytes */ + + struct sk_buff *skb; /* skb we are to transmit next, mainly used for when we + * are transmitting the same one multiple times + */ + struct net_device *odev; /* The out-going device. Note that the device should + * have it's pg_info pointer pointing back to this + * device. This will be set when the user specifies + * the out-going device name (not when the inject is + * started as it used to do.) 
+ */ struct flow_state *flows; - unsigned cflows; /* Concurrent flows (config) */ - unsigned lflow; /* Flow length (config) */ - unsigned nflows; /* accumulated flows (stats) */ + unsigned cflows; /* Concurrent flows (config) */ + unsigned lflow; /* Flow length (config) */ + unsigned nflows; /* accumulated flows (stats) */ }; struct pktgen_hdr { - __u32 pgh_magic; - __u32 seq_num; + __u32 pgh_magic; + __u32 seq_num; __u32 tv_sec; __u32 tv_usec; }; struct pktgen_thread { - spinlock_t if_lock; - struct pktgen_dev *if_list; /* All device here */ - struct pktgen_thread* next; - char name[32]; - char result[512]; - u32 max_before_softirq; /* We'll call do_softirq to prevent starvation. */ - - /* Field for thread to receive "posted" events terminate, stop ifs etc.*/ + spinlock_t if_lock; + struct list_head if_list; /* All device here */ + struct list_head th_list; + int removed; + char name[32]; + char result[512]; + u32 max_before_softirq; /* We'll call do_softirq to prevent starvation. */ + + /* Field for thread to receive "posted" events terminate, stop ifs etc. */ - u32 control; + u32 control; int pid; int cpu; - wait_queue_head_t queue; + wait_queue_head_t queue; }; #define REMOVE 1 @@ -364,77 +364,76 @@ struct pktgen_thread { */ static inline s64 divremdi3(s64 x, s64 y, int type) { - u64 a = (x < 0) ? -x : x; - u64 b = (y < 0) ? -y : y; - u64 res = 0, d = 1; - - if (b > 0) { - while (b < a) { - b <<= 1; - d <<= 1; - } - } - - do { - if ( a >= b ) { - a -= b; - res += d; - } - b >>= 1; - d >>= 1; - } - while (d); - - if (PG_DIV == type) { - return (((x ^ y) & (1ll<<63)) == 0) ? res : -(s64)res; - } - else { - return ((x & (1ll<<63)) == 0) ? a : -(s64)a; - } + u64 a = (x < 0) ? -x : x; + u64 b = (y < 0) ? -y : y; + u64 res = 0, d = 1; + + if (b > 0) { + while (b < a) { + b <<= 1; + d <<= 1; + } + } + + do { + if (a >= b) { + a -= b; + res += d; + } + b >>= 1; + d >>= 1; + } + while (d); + + if (PG_DIV == type) { + return (((x ^ y) & (1ll << 63)) == 0) ? 
res : -(s64) res; + } else { + return ((x & (1ll << 63)) == 0) ? a : -(s64) a; + } } /* End of hacks to deal with 64-bit math on x86 */ /** Convert to milliseconds */ -static inline __u64 tv_to_ms(const struct timeval* tv) +static inline __u64 tv_to_ms(const struct timeval *tv) { - __u64 ms = tv->tv_usec / 1000; - ms += (__u64)tv->tv_sec * (__u64)1000; - return ms; + __u64 ms = tv->tv_usec / 1000; + ms += (__u64) tv->tv_sec * (__u64) 1000; + return ms; } - /** Convert to micro-seconds */ -static inline __u64 tv_to_us(const struct timeval* tv) +static inline __u64 tv_to_us(const struct timeval *tv) { - __u64 us = tv->tv_usec; - us += (__u64)tv->tv_sec * (__u64)1000000; - return us; + __u64 us = tv->tv_usec; + us += (__u64) tv->tv_sec * (__u64) 1000000; + return us; } -static inline __u64 pg_div(__u64 n, __u32 base) { - __u64 tmp = n; - do_div(tmp, base); - /* printk("pktgen: pg_div, n: %llu base: %d rv: %llu\n", - n, base, tmp); */ - return tmp; +static inline __u64 pg_div(__u64 n, __u32 base) +{ + __u64 tmp = n; + do_div(tmp, base); + /* printk("pktgen: pg_div, n: %llu base: %d rv: %llu\n", + n, base, tmp); */ + return tmp; } -static inline __u64 pg_div64(__u64 n, __u64 base) +static inline __u64 pg_div64(__u64 n, __u64 base) { - __u64 tmp = n; + __u64 tmp = n; /* * How do we know if the architecture we are running on * supports division with 64 bit base? 
* */ -#if defined(__sparc_v9__) || defined(__powerpc64__) || defined(__alpha__) || defined(__x86_64__) || defined(__ia64__) +#if defined(__sparc_v9__) || defined(__powerpc64__) || defined(__alpha__) || defined(__x86_64__) || defined(__ia64__) - do_div(tmp, base); + do_div(tmp, base); #else - tmp = divremdi3(n, base, PG_DIV); + tmp = divremdi3(n, base, PG_DIV); #endif - return tmp; + return tmp; } static inline u32 pktgen_random(void) @@ -448,51 +447,51 @@ static inline u32 pktgen_random(void) #endif } -static inline __u64 getCurMs(void) +static inline __u64 getCurMs(void) { - struct timeval tv; - do_gettimeofday(&tv); - return tv_to_ms(&tv); + struct timeval tv; + do_gettimeofday(&tv); + return tv_to_ms(&tv); } -static inline __u64 getCurUs(void) +static inline __u64 getCurUs(void) { - struct timeval tv; - do_gettimeofday(&tv); - return tv_to_us(&tv); + struct timeval tv; + do_gettimeofday(&tv); + return tv_to_us(&tv); } -static inline __u64 tv_diff(const struct timeval* a, const struct timeval* b) +static inline __u64 tv_diff(const struct timeval *a, const struct timeval *b) { - return tv_to_us(a) - tv_to_us(b); + return tv_to_us(a) - tv_to_us(b); } - /* old include end */ static char version[] __initdata = VERSION; -static int pktgen_remove_device(struct pktgen_thread* t, struct pktgen_dev *i); -static int pktgen_add_device(struct pktgen_thread* t, const char* ifname); -static struct pktgen_dev *pktgen_find_dev(struct pktgen_thread* t, const char* ifname); +static int pktgen_remove_device(struct pktgen_thread *t, struct pktgen_dev *i); +static int pktgen_add_device(struct pktgen_thread *t, const char *ifname); +static struct pktgen_dev *pktgen_find_dev(struct pktgen_thread *t, + const char *ifname); static int pktgen_device_event(struct notifier_block *, unsigned long, void *); static void pktgen_run_all_threads(void); static void pktgen_stop_all_threads_ifs(void); static int pktgen_stop_device(struct pktgen_dev *pkt_dev); -static void pktgen_stop(struct 
pktgen_thread* t); +static void pktgen_stop(struct pktgen_thread *t); static void pktgen_clear_counters(struct pktgen_dev *pkt_dev); -static struct pktgen_dev *pktgen_NN_threads(const char* dev_name, int remove); -static unsigned int scan_ip6(const char *s,char ip[16]); -static unsigned int fmt_ip6(char *s,const char ip[16]); +static int pktgen_mark_device(const char *ifname); +static unsigned int scan_ip6(const char *s, char ip[16]); +static unsigned int fmt_ip6(char *s, const char ip[16]); /* Module parameters, defaults. */ -static int pg_count_d = 1000; /* 1000 pkts by default */ +static int pg_count_d = 1000; /* 1000 pkts by default */ static int pg_delay_d; static int pg_clone_skb_d; static int debug; -static DECLARE_MUTEX(pktgen_sem); -static struct pktgen_thread *pktgen_threads = NULL; +static DEFINE_MUTEX(pktgen_thread_lock); +static LIST_HEAD(pktgen_threads); static struct notifier_block pktgen_notifier_block = { .notifier_call = pktgen_device_event, @@ -504,21 +503,21 @@ static struct notifier_block pktgen_noti */ static int pgctrl_show(struct seq_file *seq, void *v) -{ +{ seq_puts(seq, VERSION); return 0; } -static ssize_t pgctrl_write(struct file* file,const char __user * buf, - size_t count, loff_t *ppos) +static ssize_t pgctrl_write(struct file *file, const char __user * buf, + size_t count, loff_t * ppos) { int err = 0; char data[128]; - if (!capable(CAP_NET_ADMIN)){ - err = -EPERM; + if (!capable(CAP_NET_ADMIN)) { + err = -EPERM; goto out; - } + } if (count > sizeof(data)) count = sizeof(data); @@ -526,22 +525,22 @@ static ssize_t pgctrl_write(struct file* if (copy_from_user(data, buf, count)) { err = -EFAULT; goto out; - } - data[count-1] = 0; /* Make string */ + } + data[count - 1] = 0; /* Make string */ - if (!strcmp(data, "stop")) + if (!strcmp(data, "stop")) pktgen_stop_all_threads_ifs(); - else if (!strcmp(data, "start")) + else if (!strcmp(data, "start")) pktgen_run_all_threads(); - else + else printk("pktgen: Unknown command: %s\n", data); 
err = count; - out: - return err; +out: + return err; } static int pgctrl_open(struct inode *inode, struct file *file) @@ -550,147 +549,158 @@ static int pgctrl_open(struct inode *ino } static struct file_operations pktgen_fops = { - .owner = THIS_MODULE, - .open = pgctrl_open, - .read = seq_read, - .llseek = seq_lseek, - .write = pgctrl_write, - .release = single_release, + .owner = THIS_MODULE, + .open = pgctrl_open, + .read = seq_read, + .llseek = seq_lseek, + .write = pgctrl_write, + .release = single_release, }; static int pktgen_if_show(struct seq_file *seq, void *v) { int i; - struct pktgen_dev *pkt_dev = seq->private; - __u64 sa; - __u64 stopped; - __u64 now = getCurUs(); - - seq_printf(seq, "Params: count %llu min_pkt_size: %u max_pkt_size: %u\n", - (unsigned long long) pkt_dev->count, - pkt_dev->min_pkt_size, pkt_dev->max_pkt_size); - - seq_printf(seq, " frags: %d delay: %u clone_skb: %d ifname: %s\n", - pkt_dev->nfrags, 1000*pkt_dev->delay_us+pkt_dev->delay_ns, pkt_dev->clone_skb, pkt_dev->ifname); - - seq_printf(seq, " flows: %u flowlen: %u\n", pkt_dev->cflows, pkt_dev->lflow); + struct pktgen_dev *pkt_dev = seq->private; + __u64 sa; + __u64 stopped; + __u64 now = getCurUs(); + + seq_printf(seq, + "Params: count %llu min_pkt_size: %u max_pkt_size: %u\n", + (unsigned long long)pkt_dev->count, pkt_dev->min_pkt_size, + pkt_dev->max_pkt_size); + + seq_printf(seq, + " frags: %d delay: %u clone_skb: %d ifname: %s\n", + pkt_dev->nfrags, + 1000 * pkt_dev->delay_us + pkt_dev->delay_ns, + pkt_dev->clone_skb, pkt_dev->ifname); + seq_printf(seq, " flows: %u flowlen: %u\n", pkt_dev->cflows, + pkt_dev->lflow); - if(pkt_dev->flags & F_IPV6) { + if (pkt_dev->flags & F_IPV6) { char b1[128], b2[128], b3[128]; - fmt_ip6(b1, pkt_dev->in6_saddr.s6_addr); - fmt_ip6(b2, pkt_dev->min_in6_saddr.s6_addr); - fmt_ip6(b3, pkt_dev->max_in6_saddr.s6_addr); - seq_printf(seq, " saddr: %s min_saddr: %s max_saddr: %s\n", b1, b2, b3); - - fmt_ip6(b1, pkt_dev->in6_daddr.s6_addr); - 
fmt_ip6(b2, pkt_dev->min_in6_daddr.s6_addr); - fmt_ip6(b3, pkt_dev->max_in6_daddr.s6_addr); - seq_printf(seq, " daddr: %s min_daddr: %s max_daddr: %s\n", b1, b2, b3); - - } - else - seq_printf(seq," dst_min: %s dst_max: %s\n src_min: %s src_max: %s\n", - pkt_dev->dst_min, pkt_dev->dst_max, pkt_dev->src_min, pkt_dev->src_max); + fmt_ip6(b1, pkt_dev->in6_saddr.s6_addr); + fmt_ip6(b2, pkt_dev->min_in6_saddr.s6_addr); + fmt_ip6(b3, pkt_dev->max_in6_saddr.s6_addr); + seq_printf(seq, + " saddr: %s min_saddr: %s max_saddr: %s\n", b1, + b2, b3); + + fmt_ip6(b1, pkt_dev->in6_daddr.s6_addr); + fmt_ip6(b2, pkt_dev->min_in6_daddr.s6_addr); + fmt_ip6(b3, pkt_dev->max_in6_daddr.s6_addr); + seq_printf(seq, + " daddr: %s min_daddr: %s max_daddr: %s\n", b1, + b2, b3); + + } else + seq_printf(seq, + " dst_min: %s dst_max: %s\n src_min: %s src_max: %s\n", + pkt_dev->dst_min, pkt_dev->dst_max, pkt_dev->src_min, + pkt_dev->src_max); seq_puts(seq, " src_mac: "); if (is_zero_ether_addr(pkt_dev->src_mac)) - for (i = 0; i < 6; i++) - seq_printf(seq, "%02X%s", pkt_dev->odev->dev_addr[i], i == 5 ? " " : ":"); - else - for (i = 0; i < 6; i++) - seq_printf(seq, "%02X%s", pkt_dev->src_mac[i], i == 5 ? " " : ":"); - - seq_printf(seq, "dst_mac: "); - for (i = 0; i < 6; i++) - seq_printf(seq, "%02X%s", pkt_dev->dst_mac[i], i == 5 ? "\n" : ":"); - - seq_printf(seq, " udp_src_min: %d udp_src_max: %d udp_dst_min: %d udp_dst_max: %d\n", - pkt_dev->udp_src_min, pkt_dev->udp_src_max, pkt_dev->udp_dst_min, - pkt_dev->udp_dst_max); + for (i = 0; i < 6; i++) + seq_printf(seq, "%02X%s", pkt_dev->odev->dev_addr[i], + i == 5 ? " " : ":"); + else + for (i = 0; i < 6; i++) + seq_printf(seq, "%02X%s", pkt_dev->src_mac[i], + i == 5 ? " " : ":"); + + seq_printf(seq, "dst_mac: "); + for (i = 0; i < 6; i++) + seq_printf(seq, "%02X%s", pkt_dev->dst_mac[i], + i == 5 ? 
"\n" : ":"); + + seq_printf(seq, + " udp_src_min: %d udp_src_max: %d udp_dst_min: %d udp_dst_max: %d\n", + pkt_dev->udp_src_min, pkt_dev->udp_src_max, + pkt_dev->udp_dst_min, pkt_dev->udp_dst_max); - seq_printf(seq, " src_mac_count: %d dst_mac_count: %d \n Flags: ", + seq_printf(seq, + " src_mac_count: %d dst_mac_count: %d \n Flags: ", pkt_dev->src_mac_count, pkt_dev->dst_mac_count); + if (pkt_dev->flags & F_IPV6) + seq_printf(seq, "IPV6 "); + + if (pkt_dev->flags & F_IPSRC_RND) + seq_printf(seq, "IPSRC_RND "); + + if (pkt_dev->flags & F_IPDST_RND) + seq_printf(seq, "IPDST_RND "); - if (pkt_dev->flags & F_IPV6) - seq_printf(seq, "IPV6 "); + if (pkt_dev->flags & F_TXSIZE_RND) + seq_printf(seq, "TXSIZE_RND "); - if (pkt_dev->flags & F_IPSRC_RND) - seq_printf(seq, "IPSRC_RND "); + if (pkt_dev->flags & F_UDPSRC_RND) + seq_printf(seq, "UDPSRC_RND "); - if (pkt_dev->flags & F_IPDST_RND) - seq_printf(seq, "IPDST_RND "); - - if (pkt_dev->flags & F_TXSIZE_RND) - seq_printf(seq, "TXSIZE_RND "); - - if (pkt_dev->flags & F_UDPSRC_RND) - seq_printf(seq, "UDPSRC_RND "); - - if (pkt_dev->flags & F_UDPDST_RND) - seq_printf(seq, "UDPDST_RND "); - - if (pkt_dev->flags & F_MACSRC_RND) - seq_printf(seq, "MACSRC_RND "); - - if (pkt_dev->flags & F_MACDST_RND) - seq_printf(seq, "MACDST_RND "); - - - seq_puts(seq, "\n"); - - sa = pkt_dev->started_at; - stopped = pkt_dev->stopped_at; - if (pkt_dev->running) - stopped = now; /* not really stopped, more like last-running-at */ - - seq_printf(seq, "Current:\n pkts-sofar: %llu errors: %llu\n started: %lluus stopped: %lluus idle: %lluus\n", - (unsigned long long) pkt_dev->sofar, - (unsigned long long) pkt_dev->errors, - (unsigned long long) sa, - (unsigned long long) stopped, - (unsigned long long) pkt_dev->idle_acc); + if (pkt_dev->flags & F_UDPDST_RND) + seq_printf(seq, "UDPDST_RND "); - seq_printf(seq, " seq_num: %d cur_dst_mac_offset: %d cur_src_mac_offset: %d\n", + if (pkt_dev->flags & F_MACSRC_RND) + seq_printf(seq, "MACSRC_RND "); + + if 
(pkt_dev->flags & F_MACDST_RND) + seq_printf(seq, "MACDST_RND "); + + seq_puts(seq, "\n"); + + sa = pkt_dev->started_at; + stopped = pkt_dev->stopped_at; + if (pkt_dev->running) + stopped = now; /* not really stopped, more like last-running-at */ + + seq_printf(seq, + "Current:\n pkts-sofar: %llu errors: %llu\n started: %lluus stopped: %lluus idle: %lluus\n", + (unsigned long long)pkt_dev->sofar, + (unsigned long long)pkt_dev->errors, (unsigned long long)sa, + (unsigned long long)stopped, + (unsigned long long)pkt_dev->idle_acc); + + seq_printf(seq, + " seq_num: %d cur_dst_mac_offset: %d cur_src_mac_offset: %d\n", pkt_dev->seq_num, pkt_dev->cur_dst_mac_offset, pkt_dev->cur_src_mac_offset); - if(pkt_dev->flags & F_IPV6) { + if (pkt_dev->flags & F_IPV6) { char b1[128], b2[128]; - fmt_ip6(b1, pkt_dev->cur_in6_daddr.s6_addr); - fmt_ip6(b2, pkt_dev->cur_in6_saddr.s6_addr); - seq_printf(seq, " cur_saddr: %s cur_daddr: %s\n", b2, b1); - } - else - seq_printf(seq, " cur_saddr: 0x%x cur_daddr: 0x%x\n", + fmt_ip6(b1, pkt_dev->cur_in6_daddr.s6_addr); + fmt_ip6(b2, pkt_dev->cur_in6_saddr.s6_addr); + seq_printf(seq, " cur_saddr: %s cur_daddr: %s\n", b2, b1); + } else + seq_printf(seq, " cur_saddr: 0x%x cur_daddr: 0x%x\n", pkt_dev->cur_saddr, pkt_dev->cur_daddr); - - seq_printf(seq, " cur_udp_dst: %d cur_udp_src: %d\n", + seq_printf(seq, " cur_udp_dst: %d cur_udp_src: %d\n", pkt_dev->cur_udp_dst, pkt_dev->cur_udp_src); - seq_printf(seq, " flows: %u\n", pkt_dev->nflows); + seq_printf(seq, " flows: %u\n", pkt_dev->nflows); if (pkt_dev->result[0]) - seq_printf(seq, "Result: %s\n", pkt_dev->result); + seq_printf(seq, "Result: %s\n", pkt_dev->result); else - seq_printf(seq, "Result: Idle\n"); + seq_printf(seq, "Result: Idle\n"); return 0; } - -static int count_trail_chars(const char __user *user_buffer, unsigned int maxlen) +static int count_trail_chars(const char __user * user_buffer, + unsigned int maxlen) { int i; for (i = 0; i < maxlen; i++) { - char c; - if (get_user(c, 
&user_buffer[i])) - return -EFAULT; - switch (c) { + char c; + if (get_user(c, &user_buffer[i])) + return -EFAULT; + switch (c) { case '\"': case '\n': case '\r': @@ -706,34 +716,34 @@ done: return i; } -static unsigned long num_arg(const char __user *user_buffer, unsigned long maxlen, - unsigned long *num) +static unsigned long num_arg(const char __user * user_buffer, + unsigned long maxlen, unsigned long *num) { int i = 0; *num = 0; - - for(; i < maxlen; i++) { - char c; - if (get_user(c, &user_buffer[i])) - return -EFAULT; - if ((c >= '0') && (c <= '9')) { + + for (; i < maxlen; i++) { + char c; + if (get_user(c, &user_buffer[i])) + return -EFAULT; + if ((c >= '0') && (c <= '9')) { *num *= 10; - *num += c -'0'; + *num += c - '0'; } else break; } return i; } -static int strn_len(const char __user *user_buffer, unsigned int maxlen) +static int strn_len(const char __user * user_buffer, unsigned int maxlen) { int i = 0; - for(; i < maxlen; i++) { - char c; - if (get_user(c, &user_buffer[i])) - return -EFAULT; - switch (c) { + for (; i < maxlen; i++) { + char c; + if (get_user(c, &user_buffer[i])) + return -EFAULT; + switch (c) { case '\"': case '\n': case '\r': @@ -746,119 +756,133 @@ static int strn_len(const char __user *u }; } done_str: - return i; } -static ssize_t pktgen_if_write(struct file *file, const char __user *user_buffer, - size_t count, loff_t *offset) +static ssize_t pktgen_if_write(struct file *file, + const char __user * user_buffer, size_t count, + loff_t * offset) { - struct seq_file *seq = (struct seq_file *) file->private_data; - struct pktgen_dev *pkt_dev = seq->private; + struct seq_file *seq = (struct seq_file *)file->private_data; + struct pktgen_dev *pkt_dev = seq->private; int i = 0, max, len; char name[16], valstr[32]; unsigned long value = 0; - char* pg_result = NULL; - int tmp = 0; + char *pg_result = NULL; + int tmp = 0; char buf[128]; - - pg_result = &(pkt_dev->result[0]); - + + pg_result = &(pkt_dev->result[0]); + if (count < 1) { 
printk("pktgen: wrong command format\n"); return -EINVAL; } - + max = count - i; tmp = count_trail_chars(&user_buffer[i], max); - if (tmp < 0) { + if (tmp < 0) { printk("pktgen: illegal format\n"); - return tmp; + return tmp; } - i += tmp; - + i += tmp; + /* Read variable name */ len = strn_len(&user_buffer[i], sizeof(name) - 1); - if (len < 0) { return len; } + if (len < 0) { + return len; + } memset(name, 0, sizeof(name)); - if (copy_from_user(name, &user_buffer[i], len) ) + if (copy_from_user(name, &user_buffer[i], len)) return -EFAULT; i += len; - - max = count -i; + + max = count - i; len = count_trail_chars(&user_buffer[i], max); - if (len < 0) - return len; - + if (len < 0) + return len; + i += len; if (debug) { - char tb[count + 1]; - if (copy_from_user(tb, user_buffer, count)) + char tb[count + 1]; + if (copy_from_user(tb, user_buffer, count)) return -EFAULT; - tb[count] = 0; + tb[count] = 0; printk("pktgen: %s,%lu buffer -:%s:-\n", name, - (unsigned long) count, tb); - } + (unsigned long)count, tb); + } if (!strcmp(name, "min_pkt_size")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; - if (value < 14+20+8) - value = 14+20+8; - if (value != pkt_dev->min_pkt_size) { - pkt_dev->min_pkt_size = value; - pkt_dev->cur_pkt_size = value; - } - sprintf(pg_result, "OK: min_pkt_size=%u", pkt_dev->min_pkt_size); + if (value < 14 + 20 + 8) + value = 14 + 20 + 8; + if (value != pkt_dev->min_pkt_size) { + pkt_dev->min_pkt_size = value; + pkt_dev->cur_pkt_size = value; + } + sprintf(pg_result, "OK: min_pkt_size=%u", + pkt_dev->min_pkt_size); return count; } - if (!strcmp(name, "max_pkt_size")) { + if (!strcmp(name, "max_pkt_size")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; - if (value < 14+20+8) - value = 14+20+8; - if (value != pkt_dev->max_pkt_size) { - pkt_dev->max_pkt_size = value; - pkt_dev->cur_pkt_size = value; - 
} - sprintf(pg_result, "OK: max_pkt_size=%u", pkt_dev->max_pkt_size); + if (value < 14 + 20 + 8) + value = 14 + 20 + 8; + if (value != pkt_dev->max_pkt_size) { + pkt_dev->max_pkt_size = value; + pkt_dev->cur_pkt_size = value; + } + sprintf(pg_result, "OK: max_pkt_size=%u", + pkt_dev->max_pkt_size); return count; } - /* Shortcut for min = max */ + /* Shortcut for min = max */ if (!strcmp(name, "pkt_size")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; - if (value < 14+20+8) - value = 14+20+8; - if (value != pkt_dev->min_pkt_size) { - pkt_dev->min_pkt_size = value; - pkt_dev->max_pkt_size = value; - pkt_dev->cur_pkt_size = value; - } + if (value < 14 + 20 + 8) + value = 14 + 20 + 8; + if (value != pkt_dev->min_pkt_size) { + pkt_dev->min_pkt_size = value; + pkt_dev->max_pkt_size = value; + pkt_dev->cur_pkt_size = value; + } sprintf(pg_result, "OK: pkt_size=%u", pkt_dev->min_pkt_size); return count; } - if (!strcmp(name, "debug")) { + if (!strcmp(name, "debug")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; - debug = value; + debug = value; sprintf(pg_result, "OK: debug=%u", debug); return count; } - if (!strcmp(name, "frags")) { + if (!strcmp(name, "frags")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; pkt_dev->nfrags = value; sprintf(pg_result, "OK: frags=%u", pkt_dev->nfrags); @@ -866,7 +890,9 @@ static ssize_t pktgen_if_write(struct fi } if (!strcmp(name, "delay")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; if (value == 0x7FFFFFFF) { pkt_dev->delay_us = 0x7FFFFFFF; @@ -875,308 +901,347 @@ static ssize_t pktgen_if_write(struct fi pkt_dev->delay_us = value / 1000; pkt_dev->delay_ns = value % 1000; } - sprintf(pg_result, "OK: delay=%u", 
1000*pkt_dev->delay_us+pkt_dev->delay_ns); + sprintf(pg_result, "OK: delay=%u", + 1000 * pkt_dev->delay_us + pkt_dev->delay_ns); return count; } - if (!strcmp(name, "udp_src_min")) { + if (!strcmp(name, "udp_src_min")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; - if (value != pkt_dev->udp_src_min) { - pkt_dev->udp_src_min = value; - pkt_dev->cur_udp_src = value; - } + if (value != pkt_dev->udp_src_min) { + pkt_dev->udp_src_min = value; + pkt_dev->cur_udp_src = value; + } sprintf(pg_result, "OK: udp_src_min=%u", pkt_dev->udp_src_min); return count; } - if (!strcmp(name, "udp_dst_min")) { + if (!strcmp(name, "udp_dst_min")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; - if (value != pkt_dev->udp_dst_min) { - pkt_dev->udp_dst_min = value; - pkt_dev->cur_udp_dst = value; - } + if (value != pkt_dev->udp_dst_min) { + pkt_dev->udp_dst_min = value; + pkt_dev->cur_udp_dst = value; + } sprintf(pg_result, "OK: udp_dst_min=%u", pkt_dev->udp_dst_min); return count; } - if (!strcmp(name, "udp_src_max")) { + if (!strcmp(name, "udp_src_max")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; - if (value != pkt_dev->udp_src_max) { - pkt_dev->udp_src_max = value; - pkt_dev->cur_udp_src = value; - } + if (value != pkt_dev->udp_src_max) { + pkt_dev->udp_src_max = value; + pkt_dev->cur_udp_src = value; + } sprintf(pg_result, "OK: udp_src_max=%u", pkt_dev->udp_src_max); return count; } - if (!strcmp(name, "udp_dst_max")) { + if (!strcmp(name, "udp_dst_max")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; - if (value != pkt_dev->udp_dst_max) { - pkt_dev->udp_dst_max = value; - pkt_dev->cur_udp_dst = value; - } + if (value != pkt_dev->udp_dst_max) { + pkt_dev->udp_dst_max = value; + 
pkt_dev->cur_udp_dst = value; + } sprintf(pg_result, "OK: udp_dst_max=%u", pkt_dev->udp_dst_max); return count; } if (!strcmp(name, "clone_skb")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; - pkt_dev->clone_skb = value; - + pkt_dev->clone_skb = value; + sprintf(pg_result, "OK: clone_skb=%d", pkt_dev->clone_skb); return count; } if (!strcmp(name, "count")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; pkt_dev->count = value; sprintf(pg_result, "OK: count=%llu", - (unsigned long long) pkt_dev->count); + (unsigned long long)pkt_dev->count); return count; } if (!strcmp(name, "src_mac_count")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; if (pkt_dev->src_mac_count != value) { - pkt_dev->src_mac_count = value; - pkt_dev->cur_src_mac_offset = 0; - } - sprintf(pg_result, "OK: src_mac_count=%d", pkt_dev->src_mac_count); + pkt_dev->src_mac_count = value; + pkt_dev->cur_src_mac_offset = 0; + } + sprintf(pg_result, "OK: src_mac_count=%d", + pkt_dev->src_mac_count); return count; } if (!strcmp(name, "dst_mac_count")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; if (pkt_dev->dst_mac_count != value) { - pkt_dev->dst_mac_count = value; - pkt_dev->cur_dst_mac_offset = 0; - } - sprintf(pg_result, "OK: dst_mac_count=%d", pkt_dev->dst_mac_count); + pkt_dev->dst_mac_count = value; + pkt_dev->cur_dst_mac_offset = 0; + } + sprintf(pg_result, "OK: dst_mac_count=%d", + pkt_dev->dst_mac_count); return count; } if (!strcmp(name, "flag")) { - char f[32]; - memset(f, 0, 32); + char f[32]; + memset(f, 0, 32); len = strn_len(&user_buffer[i], sizeof(f) - 1); - if (len < 0) { return len; } + if (len < 0) { + return len; + } if (copy_from_user(f, &user_buffer[i], len)) return -EFAULT; i += len; - 
if (strcmp(f, "IPSRC_RND") == 0) - pkt_dev->flags |= F_IPSRC_RND; - - else if (strcmp(f, "!IPSRC_RND") == 0) - pkt_dev->flags &= ~F_IPSRC_RND; - - else if (strcmp(f, "TXSIZE_RND") == 0) - pkt_dev->flags |= F_TXSIZE_RND; - - else if (strcmp(f, "!TXSIZE_RND") == 0) - pkt_dev->flags &= ~F_TXSIZE_RND; - - else if (strcmp(f, "IPDST_RND") == 0) - pkt_dev->flags |= F_IPDST_RND; - - else if (strcmp(f, "!IPDST_RND") == 0) - pkt_dev->flags &= ~F_IPDST_RND; - - else if (strcmp(f, "UDPSRC_RND") == 0) - pkt_dev->flags |= F_UDPSRC_RND; - - else if (strcmp(f, "!UDPSRC_RND") == 0) - pkt_dev->flags &= ~F_UDPSRC_RND; - - else if (strcmp(f, "UDPDST_RND") == 0) - pkt_dev->flags |= F_UDPDST_RND; - - else if (strcmp(f, "!UDPDST_RND") == 0) - pkt_dev->flags &= ~F_UDPDST_RND; - - else if (strcmp(f, "MACSRC_RND") == 0) - pkt_dev->flags |= F_MACSRC_RND; - - else if (strcmp(f, "!MACSRC_RND") == 0) - pkt_dev->flags &= ~F_MACSRC_RND; - - else if (strcmp(f, "MACDST_RND") == 0) - pkt_dev->flags |= F_MACDST_RND; - - else if (strcmp(f, "!MACDST_RND") == 0) - pkt_dev->flags &= ~F_MACDST_RND; - - else { - sprintf(pg_result, "Flag -:%s:- unknown\nAvailable flags, (prepend ! 
to un-set flag):\n%s", - f, - "IPSRC_RND, IPDST_RND, TXSIZE_RND, UDPSRC_RND, UDPDST_RND, MACSRC_RND, MACDST_RND\n"); - return count; - } + if (strcmp(f, "IPSRC_RND") == 0) + pkt_dev->flags |= F_IPSRC_RND; + + else if (strcmp(f, "!IPSRC_RND") == 0) + pkt_dev->flags &= ~F_IPSRC_RND; + + else if (strcmp(f, "TXSIZE_RND") == 0) + pkt_dev->flags |= F_TXSIZE_RND; + + else if (strcmp(f, "!TXSIZE_RND") == 0) + pkt_dev->flags &= ~F_TXSIZE_RND; + + else if (strcmp(f, "IPDST_RND") == 0) + pkt_dev->flags |= F_IPDST_RND; + + else if (strcmp(f, "!IPDST_RND") == 0) + pkt_dev->flags &= ~F_IPDST_RND; + + else if (strcmp(f, "UDPSRC_RND") == 0) + pkt_dev->flags |= F_UDPSRC_RND; + + else if (strcmp(f, "!UDPSRC_RND") == 0) + pkt_dev->flags &= ~F_UDPSRC_RND; + + else if (strcmp(f, "UDPDST_RND") == 0) + pkt_dev->flags |= F_UDPDST_RND; + + else if (strcmp(f, "!UDPDST_RND") == 0) + pkt_dev->flags &= ~F_UDPDST_RND; + + else if (strcmp(f, "MACSRC_RND") == 0) + pkt_dev->flags |= F_MACSRC_RND; + + else if (strcmp(f, "!MACSRC_RND") == 0) + pkt_dev->flags &= ~F_MACSRC_RND; + + else if (strcmp(f, "MACDST_RND") == 0) + pkt_dev->flags |= F_MACDST_RND; + + else if (strcmp(f, "!MACDST_RND") == 0) + pkt_dev->flags &= ~F_MACDST_RND; + + else { + sprintf(pg_result, + "Flag -:%s:- unknown\nAvailable flags, (prepend ! 
to un-set flag):\n%s", + f, + "IPSRC_RND, IPDST_RND, TXSIZE_RND, UDPSRC_RND, UDPDST_RND, MACSRC_RND, MACDST_RND\n"); + return count; + } sprintf(pg_result, "OK: flags=0x%x", pkt_dev->flags); return count; } if (!strcmp(name, "dst_min") || !strcmp(name, "dst")) { len = strn_len(&user_buffer[i], sizeof(pkt_dev->dst_min) - 1); - if (len < 0) { return len; } + if (len < 0) { + return len; + } - if (copy_from_user(buf, &user_buffer[i], len)) + if (copy_from_user(buf, &user_buffer[i], len)) return -EFAULT; - buf[len] = 0; - if (strcmp(buf, pkt_dev->dst_min) != 0) { - memset(pkt_dev->dst_min, 0, sizeof(pkt_dev->dst_min)); - strncpy(pkt_dev->dst_min, buf, len); - pkt_dev->daddr_min = in_aton(pkt_dev->dst_min); - pkt_dev->cur_daddr = pkt_dev->daddr_min; - } - if(debug) - printk("pktgen: dst_min set to: %s\n", pkt_dev->dst_min); - i += len; + buf[len] = 0; + if (strcmp(buf, pkt_dev->dst_min) != 0) { + memset(pkt_dev->dst_min, 0, sizeof(pkt_dev->dst_min)); + strncpy(pkt_dev->dst_min, buf, len); + pkt_dev->daddr_min = in_aton(pkt_dev->dst_min); + pkt_dev->cur_daddr = pkt_dev->daddr_min; + } + if (debug) + printk("pktgen: dst_min set to: %s\n", + pkt_dev->dst_min); + i += len; sprintf(pg_result, "OK: dst_min=%s", pkt_dev->dst_min); return count; } if (!strcmp(name, "dst_max")) { len = strn_len(&user_buffer[i], sizeof(pkt_dev->dst_max) - 1); - if (len < 0) { return len; } + if (len < 0) { + return len; + } - if (copy_from_user(buf, &user_buffer[i], len)) + if (copy_from_user(buf, &user_buffer[i], len)) return -EFAULT; - buf[len] = 0; - if (strcmp(buf, pkt_dev->dst_max) != 0) { - memset(pkt_dev->dst_max, 0, sizeof(pkt_dev->dst_max)); - strncpy(pkt_dev->dst_max, buf, len); - pkt_dev->daddr_max = in_aton(pkt_dev->dst_max); - pkt_dev->cur_daddr = pkt_dev->daddr_max; - } - if(debug) - printk("pktgen: dst_max set to: %s\n", pkt_dev->dst_max); + buf[len] = 0; + if (strcmp(buf, pkt_dev->dst_max) != 0) { + memset(pkt_dev->dst_max, 0, sizeof(pkt_dev->dst_max)); + strncpy(pkt_dev->dst_max, 
buf, len); + pkt_dev->daddr_max = in_aton(pkt_dev->dst_max); + pkt_dev->cur_daddr = pkt_dev->daddr_max; + } + if (debug) + printk("pktgen: dst_max set to: %s\n", + pkt_dev->dst_max); i += len; sprintf(pg_result, "OK: dst_max=%s", pkt_dev->dst_max); return count; } if (!strcmp(name, "dst6")) { len = strn_len(&user_buffer[i], sizeof(buf) - 1); - if (len < 0) return len; + if (len < 0) + return len; pkt_dev->flags |= F_IPV6; - if (copy_from_user(buf, &user_buffer[i], len)) + if (copy_from_user(buf, &user_buffer[i], len)) return -EFAULT; - buf[len] = 0; + buf[len] = 0; scan_ip6(buf, pkt_dev->in6_daddr.s6_addr); - fmt_ip6(buf, pkt_dev->in6_daddr.s6_addr); + fmt_ip6(buf, pkt_dev->in6_daddr.s6_addr); ipv6_addr_copy(&pkt_dev->cur_in6_daddr, &pkt_dev->in6_daddr); - if(debug) + if (debug) printk("pktgen: dst6 set to: %s\n", buf); - i += len; + i += len; sprintf(pg_result, "OK: dst6=%s", buf); return count; } if (!strcmp(name, "dst6_min")) { len = strn_len(&user_buffer[i], sizeof(buf) - 1); - if (len < 0) return len; + if (len < 0) + return len; pkt_dev->flags |= F_IPV6; - if (copy_from_user(buf, &user_buffer[i], len)) + if (copy_from_user(buf, &user_buffer[i], len)) return -EFAULT; - buf[len] = 0; + buf[len] = 0; scan_ip6(buf, pkt_dev->min_in6_daddr.s6_addr); - fmt_ip6(buf, pkt_dev->min_in6_daddr.s6_addr); + fmt_ip6(buf, pkt_dev->min_in6_daddr.s6_addr); - ipv6_addr_copy(&pkt_dev->cur_in6_daddr, &pkt_dev->min_in6_daddr); - if(debug) + ipv6_addr_copy(&pkt_dev->cur_in6_daddr, + &pkt_dev->min_in6_daddr); + if (debug) printk("pktgen: dst6_min set to: %s\n", buf); - i += len; + i += len; sprintf(pg_result, "OK: dst6_min=%s", buf); return count; } if (!strcmp(name, "dst6_max")) { len = strn_len(&user_buffer[i], sizeof(buf) - 1); - if (len < 0) return len; + if (len < 0) + return len; pkt_dev->flags |= F_IPV6; - if (copy_from_user(buf, &user_buffer[i], len)) + if (copy_from_user(buf, &user_buffer[i], len)) return -EFAULT; - buf[len] = 0; + buf[len] = 0; scan_ip6(buf, 
pkt_dev->max_in6_daddr.s6_addr); - fmt_ip6(buf, pkt_dev->max_in6_daddr.s6_addr); + fmt_ip6(buf, pkt_dev->max_in6_daddr.s6_addr); - if(debug) + if (debug) printk("pktgen: dst6_max set to: %s\n", buf); - i += len; + i += len; sprintf(pg_result, "OK: dst6_max=%s", buf); return count; } if (!strcmp(name, "src6")) { len = strn_len(&user_buffer[i], sizeof(buf) - 1); - if (len < 0) return len; + if (len < 0) + return len; pkt_dev->flags |= F_IPV6; - if (copy_from_user(buf, &user_buffer[i], len)) + if (copy_from_user(buf, &user_buffer[i], len)) return -EFAULT; - buf[len] = 0; + buf[len] = 0; scan_ip6(buf, pkt_dev->in6_saddr.s6_addr); - fmt_ip6(buf, pkt_dev->in6_saddr.s6_addr); + fmt_ip6(buf, pkt_dev->in6_saddr.s6_addr); ipv6_addr_copy(&pkt_dev->cur_in6_saddr, &pkt_dev->in6_saddr); - if(debug) + if (debug) printk("pktgen: src6 set to: %s\n", buf); - - i += len; + + i += len; sprintf(pg_result, "OK: src6=%s", buf); return count; } if (!strcmp(name, "src_min")) { len = strn_len(&user_buffer[i], sizeof(pkt_dev->src_min) - 1); - if (len < 0) { return len; } - if (copy_from_user(buf, &user_buffer[i], len)) + if (len < 0) { + return len; + } + if (copy_from_user(buf, &user_buffer[i], len)) return -EFAULT; - buf[len] = 0; - if (strcmp(buf, pkt_dev->src_min) != 0) { - memset(pkt_dev->src_min, 0, sizeof(pkt_dev->src_min)); - strncpy(pkt_dev->src_min, buf, len); - pkt_dev->saddr_min = in_aton(pkt_dev->src_min); - pkt_dev->cur_saddr = pkt_dev->saddr_min; - } - if(debug) - printk("pktgen: src_min set to: %s\n", pkt_dev->src_min); + buf[len] = 0; + if (strcmp(buf, pkt_dev->src_min) != 0) { + memset(pkt_dev->src_min, 0, sizeof(pkt_dev->src_min)); + strncpy(pkt_dev->src_min, buf, len); + pkt_dev->saddr_min = in_aton(pkt_dev->src_min); + pkt_dev->cur_saddr = pkt_dev->saddr_min; + } + if (debug) + printk("pktgen: src_min set to: %s\n", + pkt_dev->src_min); i += len; sprintf(pg_result, "OK: src_min=%s", pkt_dev->src_min); return count; } if (!strcmp(name, "src_max")) { len = 
strn_len(&user_buffer[i], sizeof(pkt_dev->src_max) - 1); - if (len < 0) { return len; } - if (copy_from_user(buf, &user_buffer[i], len)) + if (len < 0) { + return len; + } + if (copy_from_user(buf, &user_buffer[i], len)) return -EFAULT; - buf[len] = 0; - if (strcmp(buf, pkt_dev->src_max) != 0) { - memset(pkt_dev->src_max, 0, sizeof(pkt_dev->src_max)); - strncpy(pkt_dev->src_max, buf, len); - pkt_dev->saddr_max = in_aton(pkt_dev->src_max); - pkt_dev->cur_saddr = pkt_dev->saddr_max; - } - if(debug) - printk("pktgen: src_max set to: %s\n", pkt_dev->src_max); + buf[len] = 0; + if (strcmp(buf, pkt_dev->src_max) != 0) { + memset(pkt_dev->src_max, 0, sizeof(pkt_dev->src_max)); + strncpy(pkt_dev->src_max, buf, len); + pkt_dev->saddr_max = in_aton(pkt_dev->src_max); + pkt_dev->cur_saddr = pkt_dev->saddr_max; + } + if (debug) + printk("pktgen: src_max set to: %s\n", + pkt_dev->src_max); i += len; sprintf(pg_result, "OK: src_max=%s", pkt_dev->src_max); return count; @@ -1186,15 +1251,17 @@ static ssize_t pktgen_if_write(struct fi unsigned char old_dmac[ETH_ALEN]; unsigned char *m = pkt_dev->dst_mac; memcpy(old_dmac, pkt_dev->dst_mac, ETH_ALEN); - + len = strn_len(&user_buffer[i], sizeof(valstr) - 1); - if (len < 0) { return len; } + if (len < 0) { + return len; + } memset(valstr, 0, sizeof(valstr)); - if( copy_from_user(valstr, &user_buffer[i], len)) + if (copy_from_user(valstr, &user_buffer[i], len)) return -EFAULT; i += len; - for(*m = 0;*v && m < pkt_dev->dst_mac + 6; v++) { + for (*m = 0; *v && m < pkt_dev->dst_mac + 6; v++) { if (*v >= '0' && *v <= '9') { *m *= 16; *m += *v - '0'; @@ -1216,7 +1283,7 @@ static ssize_t pktgen_if_write(struct fi /* Set up Dest MAC */ if (compare_ether_addr(old_dmac, pkt_dev->dst_mac)) memcpy(&(pkt_dev->hh[0]), pkt_dev->dst_mac, ETH_ALEN); - + sprintf(pg_result, "OK: dstmac"); return count; } @@ -1225,13 +1292,15 @@ static ssize_t pktgen_if_write(struct fi unsigned char *m = pkt_dev->src_mac; len = strn_len(&user_buffer[i], sizeof(valstr) - 
1); - if (len < 0) { return len; } + if (len < 0) { + return len; + } memset(valstr, 0, sizeof(valstr)); - if( copy_from_user(valstr, &user_buffer[i], len)) + if (copy_from_user(valstr, &user_buffer[i], len)) return -EFAULT; i += len; - for(*m = 0;*v && m < pkt_dev->src_mac + 6; v++) { + for (*m = 0; *v && m < pkt_dev->src_mac + 6; v++) { if (*v >= '0' && *v <= '9') { *m *= 16; *m += *v - '0'; @@ -1248,21 +1317,23 @@ static ssize_t pktgen_if_write(struct fi m++; *m = 0; } - } + } - sprintf(pg_result, "OK: srcmac"); + sprintf(pg_result, "OK: srcmac"); return count; } - if (!strcmp(name, "clear_counters")) { - pktgen_clear_counters(pkt_dev); - sprintf(pg_result, "OK: Clearing counters.\n"); - return count; - } + if (!strcmp(name, "clear_counters")) { + pktgen_clear_counters(pkt_dev); + sprintf(pg_result, "OK: Clearing counters.\n"); + return count; + } if (!strcmp(name, "flows")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; if (value > MAX_CFLOWS) value = MAX_CFLOWS; @@ -1274,13 +1345,15 @@ static ssize_t pktgen_if_write(struct fi if (!strcmp(name, "flowlen")) { len = num_arg(&user_buffer[i], 10, &value); - if (len < 0) { return len; } + if (len < 0) { + return len; + } i += len; pkt_dev->lflow = value; sprintf(pg_result, "OK: flowlen=%u", pkt_dev->lflow); return count; } - + sprintf(pkt_dev->result, "No such parameter \"%s\"", name); return -EINVAL; } @@ -1291,35 +1364,35 @@ static int pktgen_if_open(struct inode * } static struct file_operations pktgen_if_fops = { - .owner = THIS_MODULE, - .open = pktgen_if_open, - .read = seq_read, - .llseek = seq_lseek, - .write = pktgen_if_write, - .release = single_release, + .owner = THIS_MODULE, + .open = pktgen_if_open, + .read = seq_read, + .llseek = seq_lseek, + .write = pktgen_if_write, + .release = single_release, }; static int pktgen_thread_show(struct seq_file *seq, void *v) { - struct pktgen_thread *t = seq->private; - struct pktgen_dev 
*pkt_dev = NULL; + struct pktgen_thread *t = seq->private; + struct pktgen_dev *pkt_dev; BUG_ON(!t); seq_printf(seq, "Name: %s max_before_softirq: %d\n", - t->name, t->max_before_softirq); + t->name, t->max_before_softirq); - seq_printf(seq, "Running: "); - - if_lock(t); - for(pkt_dev = t->if_list;pkt_dev; pkt_dev = pkt_dev->next) - if(pkt_dev->running) + seq_printf(seq, "Running: "); + + if_lock(t); + list_for_each_entry(pkt_dev, &t->if_list, list) + if (pkt_dev->running) seq_printf(seq, "%s ", pkt_dev->ifname); - - seq_printf(seq, "\nStopped: "); - for(pkt_dev = t->if_list;pkt_dev; pkt_dev = pkt_dev->next) - if(!pkt_dev->running) + seq_printf(seq, "\nStopped: "); + + list_for_each_entry(pkt_dev, &t->if_list, list) + if (!pkt_dev->running) seq_printf(seq, "%s ", pkt_dev->ifname); if (t->result[0]) @@ -1327,30 +1400,30 @@ static int pktgen_thread_show(struct seq else seq_printf(seq, "\nResult: NA\n"); - if_unlock(t); + if_unlock(t); return 0; } static ssize_t pktgen_thread_write(struct file *file, - const char __user *user_buffer, - size_t count, loff_t *offset) + const char __user * user_buffer, + size_t count, loff_t * offset) { - struct seq_file *seq = (struct seq_file *) file->private_data; - struct pktgen_thread *t = seq->private; + struct seq_file *seq = (struct seq_file *)file->private_data; + struct pktgen_thread *t = seq->private; int i = 0, max, len, ret; char name[40]; - char *pg_result; - unsigned long value = 0; + char *pg_result; + unsigned long value = 0; if (count < 1) { - // sprintf(pg_result, "Wrong command format"); + // sprintf(pg_result, "Wrong command format"); return -EINVAL; } max = count - i; - len = count_trail_chars(&user_buffer[i], max); - if (len < 0) + len = count_trail_chars(&user_buffer[i], max); + if (len < 0) return len; i += len; @@ -1358,26 +1431,25 @@ static ssize_t pktgen_thread_write(struc /* Read variable name */ len = strn_len(&user_buffer[i], sizeof(name) - 1); - if (len < 0) + if (len < 0) return len; - + memset(name, 0, 
sizeof(name)); if (copy_from_user(name, &user_buffer[i], len)) return -EFAULT; i += len; - max = count -i; + max = count - i; len = count_trail_chars(&user_buffer[i], max); - if (len < 0) + if (len < 0) return len; i += len; if (debug) - printk("pktgen: t=%s, count=%lu\n", name, - (unsigned long) count); + printk("pktgen: t=%s, count=%lu\n", name, (unsigned long)count); - if(!t) { + if (!t) { printk("pktgen: ERROR: No thread\n"); ret = -EINVAL; goto out; @@ -1385,48 +1457,47 @@ static ssize_t pktgen_thread_write(struc pg_result = &(t->result[0]); - if (!strcmp(name, "add_device")) { - char f[32]; - memset(f, 0, 32); + if (!strcmp(name, "add_device")) { + char f[32]; + memset(f, 0, 32); len = strn_len(&user_buffer[i], sizeof(f) - 1); - if (len < 0) { - ret = len; + if (len < 0) { + ret = len; goto out; } - if( copy_from_user(f, &user_buffer[i], len) ) + if (copy_from_user(f, &user_buffer[i], len)) return -EFAULT; i += len; - thread_lock(); - pktgen_add_device(t, f); - thread_unlock(); - ret = count; - sprintf(pg_result, "OK: add_device=%s", f); + mutex_lock(&pktgen_thread_lock); + pktgen_add_device(t, f); + mutex_unlock(&pktgen_thread_lock); + ret = count; + sprintf(pg_result, "OK: add_device=%s", f); goto out; } - if (!strcmp(name, "rem_device_all")) { - thread_lock(); - t->control |= T_REMDEV; - thread_unlock(); - schedule_timeout_interruptible(msecs_to_jiffies(125)); /* Propagate thread->control */ + if (!strcmp(name, "rem_device_all")) { + mutex_lock(&pktgen_thread_lock); + t->control |= T_REMDEVALL; + mutex_unlock(&pktgen_thread_lock); + schedule_timeout_interruptible(msecs_to_jiffies(125)); /* Propagate thread->control */ ret = count; - sprintf(pg_result, "OK: rem_device_all"); + sprintf(pg_result, "OK: rem_device_all"); goto out; } - if (!strcmp(name, "max_before_softirq")) { - len = num_arg(&user_buffer[i], 10, &value); - thread_lock(); - t->max_before_softirq = value; - thread_unlock(); - ret = count; - sprintf(pg_result, "OK: max_before_softirq=%lu", 
value); + if (!strcmp(name, "max_before_softirq")) { + len = num_arg(&user_buffer[i], 10, &value); + mutex_lock(&pktgen_thread_lock); + t->max_before_softirq = value; + mutex_unlock(&pktgen_thread_lock); + ret = count; + sprintf(pg_result, "OK: max_before_softirq=%lu", value); goto out; } ret = -EINVAL; - out: - +out: return ret; } @@ -1436,47 +1507,78 @@ static int pktgen_thread_open(struct ino } static struct file_operations pktgen_thread_fops = { - .owner = THIS_MODULE, - .open = pktgen_thread_open, - .read = seq_read, - .llseek = seq_lseek, - .write = pktgen_thread_write, - .release = single_release, + .owner = THIS_MODULE, + .open = pktgen_thread_open, + .read = seq_read, + .llseek = seq_lseek, + .write = pktgen_thread_write, + .release = single_release, }; /* Think find or remove for NN */ -static struct pktgen_dev *__pktgen_NN_threads(const char* ifname, int remove) +static struct pktgen_dev *__pktgen_NN_threads(const char *ifname, int remove) { struct pktgen_thread *t; struct pktgen_dev *pkt_dev = NULL; - t = pktgen_threads; - - while (t) { + list_for_each_entry(t, &pktgen_threads, th_list) { pkt_dev = pktgen_find_dev(t, ifname); if (pkt_dev) { - if(remove) { - if_lock(t); - pktgen_remove_device(t, pkt_dev); - if_unlock(t); - } + if (remove) { + if_lock(t); + pkt_dev->removal_mark = 1; + t->control |= T_REMDEV; + if_unlock(t); + } break; } - t = t->next; } - return pkt_dev; + return pkt_dev; } -static struct pktgen_dev *pktgen_NN_threads(const char* ifname, int remove) +/* + * mark a device for removal + */ +static int pktgen_mark_device(const char *ifname) { struct pktgen_dev *pkt_dev = NULL; - thread_lock(); - pkt_dev = __pktgen_NN_threads(ifname, remove); - thread_unlock(); - return pkt_dev; + const int max_tries = 10, msec_per_try = 125; + int i = 0; + int ret = 0; + + mutex_lock(&pktgen_thread_lock); + PG_DEBUG(printk("pktgen: pktgen_mark_device marking %s for removal\n", + ifname)); + + while (1) { + + pkt_dev = __pktgen_NN_threads(ifname, REMOVE); + 
if (pkt_dev == NULL) + break; /* success */ + + mutex_unlock(&pktgen_thread_lock); + PG_DEBUG(printk("pktgen: pktgen_mark_device waiting for %s " + "to disappear....\n", ifname)); + schedule_timeout_interruptible(msecs_to_jiffies(msec_per_try)); + mutex_lock(&pktgen_thread_lock); + + if (++i >= max_tries) { + printk("pktgen_mark_device: timed out after waiting " + "%d msec for device %s to be removed\n", + msec_per_try * i, ifname); + ret = 1; + break; + } + + } + + mutex_unlock(&pktgen_thread_lock); + + return ret; } -static int pktgen_device_event(struct notifier_block *unused, unsigned long event, void *ptr) +static int pktgen_device_event(struct notifier_block *unused, + unsigned long event, void *ptr) { struct net_device *dev = (struct net_device *)(ptr); @@ -1491,9 +1593,9 @@ static int pktgen_device_event(struct no case NETDEV_UP: /* Ignore for now */ break; - + case NETDEV_UNREGISTER: - pktgen_NN_threads(dev->name, REMOVE); + pktgen_mark_device(dev->name); break; }; @@ -1502,15 +1604,16 @@ static int pktgen_device_event(struct no /* Associate pktgen_dev with a device. 
*/ -static struct net_device* pktgen_setup_dev(struct pktgen_dev *pkt_dev) { +static struct net_device *pktgen_setup_dev(struct pktgen_dev *pkt_dev) +{ struct net_device *odev; /* Clean old setups */ if (pkt_dev->odev) { dev_put(pkt_dev->odev); - pkt_dev->odev = NULL; - } + pkt_dev->odev = NULL; + } odev = dev_get_by_name(pkt_dev->ifname); @@ -1519,7 +1622,8 @@ static struct net_device* pktgen_setup_d goto out; } if (odev->type != ARPHRD_ETHER) { - printk("pktgen: not an ethernet device: \"%s\"\n", pkt_dev->ifname); + printk("pktgen: not an ethernet device: \"%s\"\n", + pkt_dev->ifname); goto out_put; } if (!netif_running(odev)) { @@ -1527,13 +1631,13 @@ static struct net_device* pktgen_setup_d goto out_put; } pkt_dev->odev = odev; - - return pkt_dev->odev; + + return pkt_dev->odev; out_put: dev_put(odev); out: - return NULL; + return NULL; } @@ -1543,59 +1647,64 @@ out: static void pktgen_setup_inject(struct pktgen_dev *pkt_dev) { /* Try once more, just in case it works now. */ - if (!pkt_dev->odev) - pktgen_setup_dev(pkt_dev); - - if (!pkt_dev->odev) { - printk("pktgen: ERROR: pkt_dev->odev == NULL in setup_inject.\n"); - sprintf(pkt_dev->result, "ERROR: pkt_dev->odev == NULL in setup_inject.\n"); - return; - } - - /* Default to the interface's mac if not explicitly set. */ + if (!pkt_dev->odev) + pktgen_setup_dev(pkt_dev); + + if (!pkt_dev->odev) { + printk("pktgen: ERROR: pkt_dev->odev == NULL in setup_inject.\n"); + sprintf(pkt_dev->result, + "ERROR: pkt_dev->odev == NULL in setup_inject.\n"); + return; + } + + /* Default to the interface's mac if not explicitly set. 
*/ if (is_zero_ether_addr(pkt_dev->src_mac)) - memcpy(&(pkt_dev->hh[6]), pkt_dev->odev->dev_addr, ETH_ALEN); + memcpy(&(pkt_dev->hh[6]), pkt_dev->odev->dev_addr, ETH_ALEN); - /* Set up Dest MAC */ + /* Set up Dest MAC */ memcpy(&(pkt_dev->hh[0]), pkt_dev->dst_mac, ETH_ALEN); - /* Set up pkt size */ - pkt_dev->cur_pkt_size = pkt_dev->min_pkt_size; - - if(pkt_dev->flags & F_IPV6) { + /* Set up pkt size */ + pkt_dev->cur_pkt_size = pkt_dev->min_pkt_size; + + if (pkt_dev->flags & F_IPV6) { /* * Skip this automatic address setting until locks or functions * gets exported */ #ifdef NOTNOW - int i, set = 0, err=1; + int i, set = 0, err = 1; struct inet6_dev *idev; - for(i=0; i< IN6_ADDR_HSIZE; i++) - if(pkt_dev->cur_in6_saddr.s6_addr[i]) { + for (i = 0; i < IN6_ADDR_HSIZE; i++) + if (pkt_dev->cur_in6_saddr.s6_addr[i]) { set = 1; break; } - if(!set) { - + if (!set) { + /* * Use linklevel address if unconfigured. * * use ipv6_get_lladdr if/when it's get exported */ - read_lock(&addrconf_lock); if ((idev = __in6_dev_get(pkt_dev->odev)) != NULL) { struct inet6_ifaddr *ifp; read_lock_bh(&idev->lock); - for (ifp=idev->addr_list; ifp; ifp=ifp->if_next) { - if (ifp->scope == IFA_LINK && !(ifp->flags&IFA_F_TENTATIVE)) { - ipv6_addr_copy(&pkt_dev->cur_in6_saddr, &ifp->addr); + for (ifp = idev->addr_list; ifp; + ifp = ifp->if_next) { + if (ifp->scope == IFA_LINK + && !(ifp-> + flags & IFA_F_TENTATIVE)) { + ipv6_addr_copy(&pkt_dev-> + cur_in6_saddr, + &ifp->addr); err = 0; break; } @@ -1603,28 +1712,28 @@ static void pktgen_setup_inject(struct p read_unlock_bh(&idev->lock); } read_unlock(&addrconf_lock); - if(err) printk("pktgen: ERROR: IPv6 link address not availble.\n"); + if (err) + printk("pktgen: ERROR: IPv6 link address not availble.\n"); } #endif - } - else { + } else { pkt_dev->saddr_min = 0; pkt_dev->saddr_max = 0; if (strlen(pkt_dev->src_min) == 0) { - - struct in_device *in_dev; + + struct in_device *in_dev; rcu_read_lock(); in_dev = __in_dev_get_rcu(pkt_dev->odev); if 
(in_dev) { if (in_dev->ifa_list) { - pkt_dev->saddr_min = in_dev->ifa_list->ifa_address; + pkt_dev->saddr_min = + in_dev->ifa_list->ifa_address; pkt_dev->saddr_max = pkt_dev->saddr_min; } } rcu_read_unlock(); - } - else { + } else { pkt_dev->saddr_min = in_aton(pkt_dev->src_min); pkt_dev->saddr_max = in_aton(pkt_dev->src_max); } @@ -1632,13 +1741,13 @@ static void pktgen_setup_inject(struct p pkt_dev->daddr_min = in_aton(pkt_dev->dst_min); pkt_dev->daddr_max = in_aton(pkt_dev->dst_max); } - /* Initialize current values. */ - pkt_dev->cur_dst_mac_offset = 0; - pkt_dev->cur_src_mac_offset = 0; - pkt_dev->cur_saddr = pkt_dev->saddr_min; - pkt_dev->cur_daddr = pkt_dev->daddr_min; - pkt_dev->cur_udp_dst = pkt_dev->udp_dst_min; - pkt_dev->cur_udp_src = pkt_dev->udp_src_min; + /* Initialize current values. */ + pkt_dev->cur_dst_mac_offset = 0; + pkt_dev->cur_src_mac_offset = 0; + pkt_dev->cur_saddr = pkt_dev->saddr_min; + pkt_dev->cur_daddr = pkt_dev->daddr_min; + pkt_dev->cur_udp_dst = pkt_dev->udp_dst_min; + pkt_dev->cur_udp_src = pkt_dev->udp_src_min; pkt_dev->nflows = 0; } @@ -1651,7 +1760,7 @@ static void spin(struct pktgen_dev *pkt_ printk(KERN_INFO "sleeping for %d\n", (int)(spin_until_us - now)); while (now < spin_until_us) { /* TODO: optimize sleeping behavior */ - if (spin_until_us - now > jiffies_to_usecs(1)+1) + if (spin_until_us - now > jiffies_to_usecs(1) + 1) schedule_timeout_interruptible(1); else if (spin_until_us - now > 100) { do_softirq(); @@ -1667,102 +1776,110 @@ static void spin(struct pktgen_dev *pkt_ pkt_dev->idle_acc += now - start; } - /* Increment/randomize headers according to flags and current values * for IP src/dest, UDP src/dst port, MAC-Addr src/dst */ -static void mod_cur_headers(struct pktgen_dev *pkt_dev) { - __u32 imn; - __u32 imx; - int flow = 0; +static void mod_cur_headers(struct pktgen_dev *pkt_dev) +{ + __u32 imn; + __u32 imx; + int flow = 0; - if(pkt_dev->cflows) { + if (pkt_dev->cflows) { flow = pktgen_random() % 
pkt_dev->cflows; - + if (pkt_dev->flows[flow].count > pkt_dev->lflow) pkt_dev->flows[flow].count = 0; - } - + } /* Deal with source MAC */ - if (pkt_dev->src_mac_count > 1) { - __u32 mc; - __u32 tmp; - - if (pkt_dev->flags & F_MACSRC_RND) - mc = pktgen_random() % (pkt_dev->src_mac_count); - else { - mc = pkt_dev->cur_src_mac_offset++; - if (pkt_dev->cur_src_mac_offset > pkt_dev->src_mac_count) - pkt_dev->cur_src_mac_offset = 0; - } - - tmp = pkt_dev->src_mac[5] + (mc & 0xFF); - pkt_dev->hh[11] = tmp; - tmp = (pkt_dev->src_mac[4] + ((mc >> 8) & 0xFF) + (tmp >> 8)); - pkt_dev->hh[10] = tmp; - tmp = (pkt_dev->src_mac[3] + ((mc >> 16) & 0xFF) + (tmp >> 8)); - pkt_dev->hh[9] = tmp; - tmp = (pkt_dev->src_mac[2] + ((mc >> 24) & 0xFF) + (tmp >> 8)); - pkt_dev->hh[8] = tmp; - tmp = (pkt_dev->src_mac[1] + (tmp >> 8)); - pkt_dev->hh[7] = tmp; - } - - /* Deal with Destination MAC */ - if (pkt_dev->dst_mac_count > 1) { - __u32 mc; - __u32 tmp; - - if (pkt_dev->flags & F_MACDST_RND) - mc = pktgen_random() % (pkt_dev->dst_mac_count); - - else { - mc = pkt_dev->cur_dst_mac_offset++; - if (pkt_dev->cur_dst_mac_offset > pkt_dev->dst_mac_count) { - pkt_dev->cur_dst_mac_offset = 0; - } - } - - tmp = pkt_dev->dst_mac[5] + (mc & 0xFF); - pkt_dev->hh[5] = tmp; - tmp = (pkt_dev->dst_mac[4] + ((mc >> 8) & 0xFF) + (tmp >> 8)); - pkt_dev->hh[4] = tmp; - tmp = (pkt_dev->dst_mac[3] + ((mc >> 16) & 0xFF) + (tmp >> 8)); - pkt_dev->hh[3] = tmp; - tmp = (pkt_dev->dst_mac[2] + ((mc >> 24) & 0xFF) + (tmp >> 8)); - pkt_dev->hh[2] = tmp; - tmp = (pkt_dev->dst_mac[1] + (tmp >> 8)); - pkt_dev->hh[1] = tmp; - } - - if (pkt_dev->udp_src_min < pkt_dev->udp_src_max) { - if (pkt_dev->flags & F_UDPSRC_RND) - pkt_dev->cur_udp_src = ((pktgen_random() % (pkt_dev->udp_src_max - pkt_dev->udp_src_min)) + pkt_dev->udp_src_min); + if (pkt_dev->src_mac_count > 1) { + __u32 mc; + __u32 tmp; - else { + if (pkt_dev->flags & F_MACSRC_RND) + mc = pktgen_random() % (pkt_dev->src_mac_count); + else { + mc = 
pkt_dev->cur_src_mac_offset++; + if (pkt_dev->cur_src_mac_offset > + pkt_dev->src_mac_count) + pkt_dev->cur_src_mac_offset = 0; + } + + tmp = pkt_dev->src_mac[5] + (mc & 0xFF); + pkt_dev->hh[11] = tmp; + tmp = (pkt_dev->src_mac[4] + ((mc >> 8) & 0xFF) + (tmp >> 8)); + pkt_dev->hh[10] = tmp; + tmp = (pkt_dev->src_mac[3] + ((mc >> 16) & 0xFF) + (tmp >> 8)); + pkt_dev->hh[9] = tmp; + tmp = (pkt_dev->src_mac[2] + ((mc >> 24) & 0xFF) + (tmp >> 8)); + pkt_dev->hh[8] = tmp; + tmp = (pkt_dev->src_mac[1] + (tmp >> 8)); + pkt_dev->hh[7] = tmp; + } + + /* Deal with Destination MAC */ + if (pkt_dev->dst_mac_count > 1) { + __u32 mc; + __u32 tmp; + + if (pkt_dev->flags & F_MACDST_RND) + mc = pktgen_random() % (pkt_dev->dst_mac_count); + + else { + mc = pkt_dev->cur_dst_mac_offset++; + if (pkt_dev->cur_dst_mac_offset > + pkt_dev->dst_mac_count) { + pkt_dev->cur_dst_mac_offset = 0; + } + } + + tmp = pkt_dev->dst_mac[5] + (mc & 0xFF); + pkt_dev->hh[5] = tmp; + tmp = (pkt_dev->dst_mac[4] + ((mc >> 8) & 0xFF) + (tmp >> 8)); + pkt_dev->hh[4] = tmp; + tmp = (pkt_dev->dst_mac[3] + ((mc >> 16) & 0xFF) + (tmp >> 8)); + pkt_dev->hh[3] = tmp; + tmp = (pkt_dev->dst_mac[2] + ((mc >> 24) & 0xFF) + (tmp >> 8)); + pkt_dev->hh[2] = tmp; + tmp = (pkt_dev->dst_mac[1] + (tmp >> 8)); + pkt_dev->hh[1] = tmp; + } + + if (pkt_dev->udp_src_min < pkt_dev->udp_src_max) { + if (pkt_dev->flags & F_UDPSRC_RND) + pkt_dev->cur_udp_src = + ((pktgen_random() % + (pkt_dev->udp_src_max - pkt_dev->udp_src_min)) + + pkt_dev->udp_src_min); + + else { pkt_dev->cur_udp_src++; if (pkt_dev->cur_udp_src >= pkt_dev->udp_src_max) pkt_dev->cur_udp_src = pkt_dev->udp_src_min; - } - } + } + } - if (pkt_dev->udp_dst_min < pkt_dev->udp_dst_max) { - if (pkt_dev->flags & F_UDPDST_RND) { - pkt_dev->cur_udp_dst = ((pktgen_random() % (pkt_dev->udp_dst_max - pkt_dev->udp_dst_min)) + pkt_dev->udp_dst_min); - } - else { + if (pkt_dev->udp_dst_min < pkt_dev->udp_dst_max) { + if (pkt_dev->flags & F_UDPDST_RND) { + pkt_dev->cur_udp_dst = + 
((pktgen_random() % + (pkt_dev->udp_dst_max - pkt_dev->udp_dst_min)) + + pkt_dev->udp_dst_min); + } else { pkt_dev->cur_udp_dst++; - if (pkt_dev->cur_udp_dst >= pkt_dev->udp_dst_max) + if (pkt_dev->cur_udp_dst >= pkt_dev->udp_dst_max) pkt_dev->cur_udp_dst = pkt_dev->udp_dst_min; - } - } + } + } if (!(pkt_dev->flags & F_IPV6)) { - if ((imn = ntohl(pkt_dev->saddr_min)) < (imx = ntohl(pkt_dev->saddr_max))) { + if ((imn = ntohl(pkt_dev->saddr_min)) < (imx = + ntohl(pkt_dev-> + saddr_max))) { __u32 t; - if (pkt_dev->flags & F_IPSRC_RND) + if (pkt_dev->flags & F_IPSRC_RND) t = ((pktgen_random() % (imx - imn)) + imn); else { t = ntohl(pkt_dev->cur_saddr); @@ -1773,25 +1890,32 @@ static void mod_cur_headers(struct pktge } pkt_dev->cur_saddr = htonl(t); } - + if (pkt_dev->cflows && pkt_dev->flows[flow].count != 0) { pkt_dev->cur_daddr = pkt_dev->flows[flow].cur_daddr; } else { - if ((imn = ntohl(pkt_dev->daddr_min)) < (imx = ntohl(pkt_dev->daddr_max))) { + if ((imn = ntohl(pkt_dev->daddr_min)) < (imx = + ntohl(pkt_dev-> + daddr_max))) + { __u32 t; if (pkt_dev->flags & F_IPDST_RND) { - t = ((pktgen_random() % (imx - imn)) + imn); + t = ((pktgen_random() % (imx - imn)) + + imn); t = htonl(t); - while( LOOPBACK(t) || MULTICAST(t) || BADCLASS(t) || ZERONET(t) || LOCAL_MCAST(t) ) { - t = ((pktgen_random() % (imx - imn)) + imn); + while (LOOPBACK(t) || MULTICAST(t) + || BADCLASS(t) || ZERONET(t) + || LOCAL_MCAST(t)) { + t = ((pktgen_random() % + (imx - imn)) + imn); t = htonl(t); } pkt_dev->cur_daddr = t; } - + else { t = ntohl(pkt_dev->cur_daddr); t++; @@ -1801,60 +1925,59 @@ static void mod_cur_headers(struct pktge pkt_dev->cur_daddr = htonl(t); } } - if(pkt_dev->cflows) { - pkt_dev->flows[flow].cur_daddr = pkt_dev->cur_daddr; + if (pkt_dev->cflows) { + pkt_dev->flows[flow].cur_daddr = + pkt_dev->cur_daddr; pkt_dev->nflows++; } } - } - else /* IPV6 * */ - { - if(pkt_dev->min_in6_daddr.s6_addr32[0] == 0 && - pkt_dev->min_in6_daddr.s6_addr32[1] == 0 && - 
pkt_dev->min_in6_daddr.s6_addr32[2] == 0 && - pkt_dev->min_in6_daddr.s6_addr32[3] == 0); + } else { /* IPV6 * */ + + if (pkt_dev->min_in6_daddr.s6_addr32[0] == 0 && + pkt_dev->min_in6_daddr.s6_addr32[1] == 0 && + pkt_dev->min_in6_daddr.s6_addr32[2] == 0 && + pkt_dev->min_in6_daddr.s6_addr32[3] == 0) ; else { int i; /* Only random destinations yet */ - for(i=0; i < 4; i++) { + for (i = 0; i < 4; i++) { pkt_dev->cur_in6_daddr.s6_addr32[i] = - ((pktgen_random() | - pkt_dev->min_in6_daddr.s6_addr32[i]) & - pkt_dev->max_in6_daddr.s6_addr32[i]); - } - } + ((pktgen_random() | + pkt_dev->min_in6_daddr.s6_addr32[i]) & + pkt_dev->max_in6_daddr.s6_addr32[i]); + } + } } - if (pkt_dev->min_pkt_size < pkt_dev->max_pkt_size) { - __u32 t; - if (pkt_dev->flags & F_TXSIZE_RND) { - t = ((pktgen_random() % (pkt_dev->max_pkt_size - pkt_dev->min_pkt_size)) - + pkt_dev->min_pkt_size); - } - else { + if (pkt_dev->min_pkt_size < pkt_dev->max_pkt_size) { + __u32 t; + if (pkt_dev->flags & F_TXSIZE_RND) { + t = ((pktgen_random() % + (pkt_dev->max_pkt_size - pkt_dev->min_pkt_size)) + + pkt_dev->min_pkt_size); + } else { t = pkt_dev->cur_pkt_size + 1; - if (t > pkt_dev->max_pkt_size) + if (t > pkt_dev->max_pkt_size) t = pkt_dev->min_pkt_size; - } - pkt_dev->cur_pkt_size = t; - } + } + pkt_dev->cur_pkt_size = t; + } pkt_dev->flows[flow].count++; } - -static struct sk_buff *fill_packet_ipv4(struct net_device *odev, - struct pktgen_dev *pkt_dev) +static struct sk_buff *fill_packet_ipv4(struct net_device *odev, + struct pktgen_dev *pkt_dev) { struct sk_buff *skb = NULL; __u8 *eth; struct udphdr *udph; int datalen, iplen; struct iphdr *iph; - struct pktgen_hdr *pgh = NULL; - + struct pktgen_hdr *pgh = NULL; + /* Update any of the values, used when we're incrementing various * fields. 
*/ @@ -1875,47 +1998,47 @@ static struct sk_buff *fill_packet_ipv4( udph = (struct udphdr *)skb_put(skb, sizeof(struct udphdr)); memcpy(eth, pkt_dev->hh, 12); - *(u16*)&eth[12] = __constant_htons(ETH_P_IP); + *(u16 *) & eth[12] = __constant_htons(ETH_P_IP); - datalen = pkt_dev->cur_pkt_size - 14 - 20 - 8; /* Eth + IPh + UDPh */ - if (datalen < sizeof(struct pktgen_hdr)) + datalen = pkt_dev->cur_pkt_size - 14 - 20 - 8; /* Eth + IPh + UDPh */ + if (datalen < sizeof(struct pktgen_hdr)) datalen = sizeof(struct pktgen_hdr); - + udph->source = htons(pkt_dev->cur_udp_src); udph->dest = htons(pkt_dev->cur_udp_dst); - udph->len = htons(datalen + 8); /* DATA + udphdr */ - udph->check = 0; /* No checksum */ + udph->len = htons(datalen + 8); /* DATA + udphdr */ + udph->check = 0; /* No checksum */ iph->ihl = 5; iph->version = 4; iph->ttl = 32; iph->tos = 0; - iph->protocol = IPPROTO_UDP; /* UDP */ + iph->protocol = IPPROTO_UDP; /* UDP */ iph->saddr = pkt_dev->cur_saddr; iph->daddr = pkt_dev->cur_daddr; iph->frag_off = 0; iplen = 20 + 8 + datalen; iph->tot_len = htons(iplen); iph->check = 0; - iph->check = ip_fast_csum((void *) iph, iph->ihl); + iph->check = ip_fast_csum((void *)iph, iph->ihl); skb->protocol = __constant_htons(ETH_P_IP); - skb->mac.raw = ((u8 *)iph) - 14; + skb->mac.raw = ((u8 *) iph) - 14; skb->dev = odev; skb->pkt_type = PACKET_HOST; - if (pkt_dev->nfrags <= 0) - pgh = (struct pktgen_hdr *)skb_put(skb, datalen); + if (pkt_dev->nfrags <= 0) + pgh = (struct pktgen_hdr *)skb_put(skb, datalen); else { int frags = pkt_dev->nfrags; int i; - pgh = (struct pktgen_hdr*)(((char*)(udph)) + 8); - + pgh = (struct pktgen_hdr *)(((char *)(udph)) + 8); + if (frags > MAX_SKB_FRAGS) frags = MAX_SKB_FRAGS; - if (datalen > frags*PAGE_SIZE) { - skb_put(skb, datalen-frags*PAGE_SIZE); - datalen = frags*PAGE_SIZE; + if (datalen > frags * PAGE_SIZE) { + skb_put(skb, datalen - frags * PAGE_SIZE); + datalen = frags * PAGE_SIZE; } i = 0; @@ -1924,7 +2047,7 @@ static struct sk_buff
*fill_packet_ipv4( skb_shinfo(skb)->frags[i].page = page; skb_shinfo(skb)->frags[i].page_offset = 0; skb_shinfo(skb)->frags[i].size = - (datalen < PAGE_SIZE ? datalen : PAGE_SIZE); + (datalen < PAGE_SIZE ? datalen : PAGE_SIZE); datalen -= skb_shinfo(skb)->frags[i].size; skb->len += skb_shinfo(skb)->frags[i].size; skb->data_len += skb_shinfo(skb)->frags[i].size; @@ -1944,30 +2067,33 @@ static struct sk_buff *fill_packet_ipv4( skb_shinfo(skb)->frags[i - 1].size -= rem; - skb_shinfo(skb)->frags[i] = skb_shinfo(skb)->frags[i - 1]; + skb_shinfo(skb)->frags[i] = + skb_shinfo(skb)->frags[i - 1]; get_page(skb_shinfo(skb)->frags[i].page); - skb_shinfo(skb)->frags[i].page = skb_shinfo(skb)->frags[i - 1].page; - skb_shinfo(skb)->frags[i].page_offset += skb_shinfo(skb)->frags[i - 1].size; + skb_shinfo(skb)->frags[i].page = + skb_shinfo(skb)->frags[i - 1].page; + skb_shinfo(skb)->frags[i].page_offset += + skb_shinfo(skb)->frags[i - 1].size; skb_shinfo(skb)->frags[i].size = rem; i++; skb_shinfo(skb)->nr_frags = i; } } - /* Stamp the time, and sequence number, convert them to network byte order */ + /* Stamp the time, and sequence number, convert them to network byte order */ + + if (pgh) { + struct timeval timestamp; + + pgh->pgh_magic = htonl(PKTGEN_MAGIC); + pgh->seq_num = htonl(pkt_dev->seq_num); + + do_gettimeofday(&timestamp); + pgh->tv_sec = htonl(timestamp.tv_sec); + pgh->tv_usec = htonl(timestamp.tv_usec); + } + pkt_dev->seq_num++; - if (pgh) { - struct timeval timestamp; - - pgh->pgh_magic = htonl(PKTGEN_MAGIC); - pgh->seq_num = htonl(pkt_dev->seq_num); - - do_gettimeofday(&timestamp); - pgh->tv_sec = htonl(timestamp.tv_sec); - pgh->tv_usec = htonl(timestamp.tv_usec); - } - pkt_dev->seq_num++; - return skb; } @@ -1980,23 +2106,24 @@ static struct sk_buff *fill_packet_ipv4( * --ro */ -static unsigned int scan_ip6(const char *s,char ip[16]) +static unsigned int scan_ip6(const char *s, char ip[16]) { unsigned int i; - unsigned int len=0; + unsigned int len = 0; unsigned long u; char
suffix[16]; - unsigned int prefixlen=0; - unsigned int suffixlen=0; + unsigned int prefixlen = 0; + unsigned int suffixlen = 0; __u32 tmp; - for (i=0; i<16; i++) ip[i]=0; + for (i = 0; i < 16; i++) + ip[i] = 0; for (;;) { if (*s == ':') { len++; - if (s[1] == ':') { /* Found "::", skip to part 2 */ - s+=2; + if (s[1] == ':') { /* Found "::", skip to part 2 */ + s += 2; len++; break; } @@ -2004,129 +2131,149 @@ static unsigned int scan_ip6(const char } { char *tmp; - u=simple_strtoul(s,&tmp,16); - i=tmp-s; + u = simple_strtoul(s, &tmp, 16); + i = tmp - s; } - if (!i) return 0; - if (prefixlen==12 && s[i]=='.') { + if (!i) + return 0; + if (prefixlen == 12 && s[i] == '.') { /* the last 4 bytes may be written as IPv4 address */ tmp = in_aton(s); - memcpy((struct in_addr*)(ip+12), &tmp, sizeof(tmp)); - return i+len; + memcpy((struct in_addr *)(ip + 12), &tmp, sizeof(tmp)); + return i + len; } ip[prefixlen++] = (u >> 8); ip[prefixlen++] = (u & 255); - s += i; len += i; - if (prefixlen==16) + s += i; + len += i; + if (prefixlen == 16) return len; } /* part 2, after "::" */ for (;;) { if (*s == ':') { - if (suffixlen==0) + if (suffixlen == 0) break; s++; len++; - } else if (suffixlen!=0) + } else if (suffixlen != 0) break; { char *tmp; - u=simple_strtol(s,&tmp,16); - i=tmp-s; + u = simple_strtol(s, &tmp, 16); + i = tmp - s; } if (!i) { - if (*s) len--; + if (*s) + len--; break; } - if (suffixlen+prefixlen<=12 && s[i]=='.') { + if (suffixlen + prefixlen <= 12 && s[i] == '.') { tmp = in_aton(s); - memcpy((struct in_addr*)(suffix+suffixlen), &tmp, sizeof(tmp)); - suffixlen+=4; - len+=strlen(s); + memcpy((struct in_addr *)(suffix + suffixlen), &tmp, + sizeof(tmp)); + suffixlen += 4; + len += strlen(s); break; } suffix[suffixlen++] = (u >> 8); suffix[suffixlen++] = (u & 255); - s += i; len += i; - if (prefixlen+suffixlen==16) + s += i; + len += i; + if (prefixlen + suffixlen == 16) break; } - for (i=0; i9?hexdigit+'a'-10:hexdigit+'0'; +static char tohex(char hexdigit) +{ + 
return hexdigit > 9 ? hexdigit + 'a' - 10 : hexdigit + '0'; } -static int fmt_xlong(char* s,unsigned int i) { - char* bak=s; - *s=tohex((i>>12)&0xf); if (s!=bak || *s!='0') ++s; - *s=tohex((i>>8)&0xf); if (s!=bak || *s!='0') ++s; - *s=tohex((i>>4)&0xf); if (s!=bak || *s!='0') ++s; - *s=tohex(i&0xf); - return s-bak+1; +static int fmt_xlong(char *s, unsigned int i) +{ + char *bak = s; + *s = tohex((i >> 12) & 0xf); + if (s != bak || *s != '0') + ++s; + *s = tohex((i >> 8) & 0xf); + if (s != bak || *s != '0') + ++s; + *s = tohex((i >> 4) & 0xf); + if (s != bak || *s != '0') + ++s; + *s = tohex(i & 0xf); + return s - bak + 1; } -static unsigned int fmt_ip6(char *s,const char ip[16]) { +static unsigned int fmt_ip6(char *s, const char ip[16]) +{ unsigned int len; unsigned int i; unsigned int temp; unsigned int compressing; int j; - len = 0; compressing = 0; - for (j=0; j<16; j+=2) { + len = 0; + compressing = 0; + for (j = 0; j < 16; j += 2) { #ifdef V4MAPPEDPREFIX - if (j==12 && !memcmp(ip,V4mappedprefix,12)) { - inet_ntoa_r(*(struct in_addr*)(ip+12),s); - temp=strlen(s); - return len+temp; + if (j == 12 && !memcmp(ip, V4mappedprefix, 12)) { + inet_ntoa_r(*(struct in_addr *)(ip + 12), s); + temp = strlen(s); + return len + temp; } #endif - temp = ((unsigned long) (unsigned char) ip[j] << 8) + - (unsigned long) (unsigned char) ip[j+1]; + temp = ((unsigned long)(unsigned char)ip[j] << 8) + + (unsigned long)(unsigned char)ip[j + 1]; if (temp == 0) { if (!compressing) { - compressing=1; - if (j==0) { - *s++=':'; ++len; + compressing = 1; + if (j == 0) { + *s++ = ':'; + ++len; } } } else { if (compressing) { - compressing=0; - *s++=':'; ++len; + compressing = 0; + *s++ = ':'; + ++len; } - i = fmt_xlong(s,temp); len += i; s += i; - if (j<14) { + i = fmt_xlong(s, temp); + len += i; + s += i; + if (j < 14) { *s++ = ':'; ++len; } } } if (compressing) { - *s++=':'; ++len; + *s++ = ':'; + ++len; } - *s=0; + *s = 0; return len; } -static struct sk_buff *fill_packet_ipv6(struct 
net_device *odev, - struct pktgen_dev *pkt_dev) +static struct sk_buff *fill_packet_ipv6(struct net_device *odev, + struct pktgen_dev *pkt_dev) { struct sk_buff *skb = NULL; __u8 *eth; struct udphdr *udph; int datalen; struct ipv6hdr *iph; - struct pktgen_hdr *pgh = NULL; + struct pktgen_hdr *pgh = NULL; /* Update any of the values, used when we're incrementing various * fields. @@ -2147,23 +2294,23 @@ static struct sk_buff *fill_packet_ipv6( udph = (struct udphdr *)skb_put(skb, sizeof(struct udphdr)); memcpy(eth, pkt_dev->hh, 12); - *(u16*)&eth[12] = __constant_htons(ETH_P_IPV6); + *(u16 *) & eth[12] = __constant_htons(ETH_P_IPV6); - datalen = pkt_dev->cur_pkt_size-14- - sizeof(struct ipv6hdr)-sizeof(struct udphdr); /* Eth + IPh + UDPh */ + datalen = pkt_dev->cur_pkt_size - 14 - sizeof(struct ipv6hdr) - sizeof(struct udphdr); /* Eth + IPh + UDPh */ - if (datalen < sizeof(struct pktgen_hdr)) { + if (datalen < sizeof(struct pktgen_hdr)) { datalen = sizeof(struct pktgen_hdr); if (net_ratelimit()) - printk(KERN_INFO "pktgen: increased datalen to %d\n", datalen); + printk(KERN_INFO "pktgen: increased datalen to %d\n", + datalen); } udph->source = htons(pkt_dev->cur_udp_src); udph->dest = htons(pkt_dev->cur_udp_dst); - udph->len = htons(datalen + sizeof(struct udphdr)); - udph->check = 0; /* No checksum */ + udph->len = htons(datalen + sizeof(struct udphdr)); + udph->check = 0; /* No checksum */ - *(u32*)iph = __constant_htonl(0x60000000); /* Version + flow */ + *(u32 *) iph = __constant_htonl(0x60000000); /* Version + flow */ iph->hop_limit = 32; @@ -2173,24 +2320,24 @@ static struct sk_buff *fill_packet_ipv6( ipv6_addr_copy(&iph->daddr, &pkt_dev->cur_in6_daddr); ipv6_addr_copy(&iph->saddr, &pkt_dev->cur_in6_saddr); - skb->mac.raw = ((u8 *)iph) - 14; + skb->mac.raw = ((u8 *) iph) - 14; skb->protocol = __constant_htons(ETH_P_IPV6); skb->dev = odev; skb->pkt_type = PACKET_HOST; - if (pkt_dev->nfrags <= 0) - pgh = (struct pktgen_hdr *)skb_put(skb, datalen); + if 
(pkt_dev->nfrags <= 0) + pgh = (struct pktgen_hdr *)skb_put(skb, datalen); else { int frags = pkt_dev->nfrags; int i; - pgh = (struct pktgen_hdr*)(((char*)(udph)) + 8); - + pgh = (struct pktgen_hdr *)(((char *)(udph)) + 8); + if (frags > MAX_SKB_FRAGS) frags = MAX_SKB_FRAGS; - if (datalen > frags*PAGE_SIZE) { - skb_put(skb, datalen-frags*PAGE_SIZE); - datalen = frags*PAGE_SIZE; + if (datalen > frags * PAGE_SIZE) { + skb_put(skb, datalen - frags * PAGE_SIZE); + datalen = frags * PAGE_SIZE; } i = 0; @@ -2199,7 +2346,7 @@ static struct sk_buff *fill_packet_ipv6( skb_shinfo(skb)->frags[i].page = page; skb_shinfo(skb)->frags[i].page_offset = 0; skb_shinfo(skb)->frags[i].size = - (datalen < PAGE_SIZE ? datalen : PAGE_SIZE); + (datalen < PAGE_SIZE ? datalen : PAGE_SIZE); datalen -= skb_shinfo(skb)->frags[i].size; skb->len += skb_shinfo(skb)->frags[i].size; skb->data_len += skb_shinfo(skb)->frags[i].size; @@ -2219,305 +2366,333 @@ static struct sk_buff *fill_packet_ipv6( skb_shinfo(skb)->frags[i - 1].size -= rem; - skb_shinfo(skb)->frags[i] = skb_shinfo(skb)->frags[i - 1]; + skb_shinfo(skb)->frags[i] = + skb_shinfo(skb)->frags[i - 1]; get_page(skb_shinfo(skb)->frags[i].page); - skb_shinfo(skb)->frags[i].page = skb_shinfo(skb)->frags[i - 1].page; - skb_shinfo(skb)->frags[i].page_offset += skb_shinfo(skb)->frags[i - 1].size; + skb_shinfo(skb)->frags[i].page = + skb_shinfo(skb)->frags[i - 1].page; + skb_shinfo(skb)->frags[i].page_offset += + skb_shinfo(skb)->frags[i - 1].size; skb_shinfo(skb)->frags[i].size = rem; i++; skb_shinfo(skb)->nr_frags = i; } } - /* Stamp the time, and sequence number, convert them to network byte order */ + /* Stamp the time, and sequence number, convert them to network byte order */ /* should we update cloned packets too ? 
*/ - if (pgh) { - struct timeval timestamp; - - pgh->pgh_magic = htonl(PKTGEN_MAGIC); - pgh->seq_num = htonl(pkt_dev->seq_num); - - do_gettimeofday(&timestamp); - pgh->tv_sec = htonl(timestamp.tv_sec); - pgh->tv_usec = htonl(timestamp.tv_usec); - } - pkt_dev->seq_num++; - + if (pgh) { + struct timeval timestamp; + + pgh->pgh_magic = htonl(PKTGEN_MAGIC); + pgh->seq_num = htonl(pkt_dev->seq_num); + + do_gettimeofday(&timestamp); + pgh->tv_sec = htonl(timestamp.tv_sec); + pgh->tv_usec = htonl(timestamp.tv_usec); + } + pkt_dev->seq_num++; + return skb; } -static inline struct sk_buff *fill_packet(struct net_device *odev, - struct pktgen_dev *pkt_dev) +static inline struct sk_buff *fill_packet(struct net_device *odev, + struct pktgen_dev *pkt_dev) { - if(pkt_dev->flags & F_IPV6) + if (pkt_dev->flags & F_IPV6) return fill_packet_ipv6(odev, pkt_dev); else return fill_packet_ipv4(odev, pkt_dev); } -static void pktgen_clear_counters(struct pktgen_dev *pkt_dev) +static void pktgen_clear_counters(struct pktgen_dev *pkt_dev) { - pkt_dev->seq_num = 1; - pkt_dev->idle_acc = 0; + pkt_dev->seq_num = 1; + pkt_dev->idle_acc = 0; pkt_dev->sofar = 0; - pkt_dev->tx_bytes = 0; - pkt_dev->errors = 0; + pkt_dev->tx_bytes = 0; + pkt_dev->errors = 0; } /* Set up structure for sending pkts, clear counters */ static void pktgen_run(struct pktgen_thread *t) { - struct pktgen_dev *pkt_dev = NULL; + struct pktgen_dev *pkt_dev; int started = 0; PG_DEBUG(printk("pktgen: entering pktgen_run. %p\n", t)); if_lock(t); - for (pkt_dev = t->if_list; pkt_dev; pkt_dev = pkt_dev->next ) { + list_for_each_entry(pkt_dev, &t->if_list, list) { /* * setup odev and create initial packet. */ pktgen_setup_inject(pkt_dev); - if(pkt_dev->odev) { + if (pkt_dev->odev) { pktgen_clear_counters(pkt_dev); - pkt_dev->running = 1; /* Cranke yeself! */ + pkt_dev->running = 1; /* Cranke yeself! 
*/ pkt_dev->skb = NULL; pkt_dev->started_at = getCurUs(); - pkt_dev->next_tx_us = getCurUs(); /* Transmit immediately */ + pkt_dev->next_tx_us = getCurUs(); /* Transmit immediately */ pkt_dev->next_tx_ns = 0; - + strcpy(pkt_dev->result, "Starting"); started++; - } - else + } else strcpy(pkt_dev->result, "Error starting"); } if_unlock(t); - if(started) t->control &= ~(T_STOP); + if (started) + t->control &= ~(T_STOP); } static void pktgen_stop_all_threads_ifs(void) { - struct pktgen_thread *t = pktgen_threads; + struct pktgen_thread *t; - PG_DEBUG(printk("pktgen: entering pktgen_stop_all_threads.\n")); + PG_DEBUG(printk("pktgen: entering pktgen_stop_all_threads_ifs.\n")); - thread_lock(); - while(t) { - pktgen_stop(t); - t = t->next; - } - thread_unlock(); + mutex_lock(&pktgen_thread_lock); + + list_for_each_entry(t, &pktgen_threads, th_list) + t->control |= T_STOP; + + mutex_unlock(&pktgen_thread_lock); } -static int thread_is_running(struct pktgen_thread *t ) +static int thread_is_running(struct pktgen_thread *t) { - struct pktgen_dev *next; - int res = 0; + struct pktgen_dev *pkt_dev; + int res = 0; - for(next=t->if_list; next; next=next->next) { - if(next->running) { + list_for_each_entry(pkt_dev, &t->if_list, list) + if (pkt_dev->running) { res = 1; break; } - } - return res; + return res; } -static int pktgen_wait_thread_run(struct pktgen_thread *t ) +static int pktgen_wait_thread_run(struct pktgen_thread *t) { - if_lock(t); + if_lock(t); - while(thread_is_running(t)) { + while (thread_is_running(t)) { - if_unlock(t); + if_unlock(t); - msleep_interruptible(100); + msleep_interruptible(100); - if (signal_pending(current)) - goto signal; - if_lock(t); - } - if_unlock(t); - return 1; - signal: - return 0; + if (signal_pending(current)) + goto signal; + if_lock(t); + } + if_unlock(t); + return 1; +signal: + return 0; } static int pktgen_wait_all_threads_run(void) { - struct pktgen_thread *t = pktgen_threads; + struct pktgen_thread *t; int sig = 1; - - while (t) { 
+ + mutex_lock(&pktgen_thread_lock); + + list_for_each_entry(t, &pktgen_threads, th_list) { sig = pktgen_wait_thread_run(t); - if( sig == 0 ) break; - thread_lock(); - t=t->next; - thread_unlock(); - } - if(sig == 0) { - thread_lock(); - while (t) { - t->control |= (T_STOP); - t=t->next; - } - thread_unlock(); + if (sig == 0) + break; } + + if (sig == 0) + list_for_each_entry(t, &pktgen_threads, th_list) + t->control |= (T_STOP); + + mutex_unlock(&pktgen_thread_lock); return sig; } static void pktgen_run_all_threads(void) { - struct pktgen_thread *t = pktgen_threads; + struct pktgen_thread *t; PG_DEBUG(printk("pktgen: entering pktgen_run_all_threads.\n")); - thread_lock(); + mutex_lock(&pktgen_thread_lock); - while(t) { + list_for_each_entry(t, &pktgen_threads, th_list) t->control |= (T_RUN); - t = t->next; - } - thread_unlock(); - schedule_timeout_interruptible(msecs_to_jiffies(125)); /* Propagate thread->control */ - + mutex_unlock(&pktgen_thread_lock); + + schedule_timeout_interruptible(msecs_to_jiffies(125)); /* Propagate thread->control */ + pktgen_wait_all_threads_run(); } - static void show_results(struct pktgen_dev *pkt_dev, int nr_frags) { - __u64 total_us, bps, mbps, pps, idle; - char *p = pkt_dev->result; + __u64 total_us, bps, mbps, pps, idle; + char *p = pkt_dev->result; + + total_us = pkt_dev->stopped_at - pkt_dev->started_at; + + idle = pkt_dev->idle_acc; - total_us = pkt_dev->stopped_at - pkt_dev->started_at; + p += sprintf(p, "OK: %llu(c%llu+d%llu) usec, %llu (%dbyte,%dfrags)\n", + (unsigned long long)total_us, + (unsigned long long)(total_us - idle), + (unsigned long long)idle, + (unsigned long long)pkt_dev->sofar, + pkt_dev->cur_pkt_size, nr_frags); - idle = pkt_dev->idle_acc; + pps = pkt_dev->sofar * USEC_PER_SEC; - p += sprintf(p, "OK: %llu(c%llu+d%llu) usec, %llu (%dbyte,%dfrags)\n", - (unsigned long long) total_us, - (unsigned long long)(total_us - idle), - (unsigned long long) idle, - (unsigned long long) pkt_dev->sofar, - 
pkt_dev->cur_pkt_size, nr_frags); - - pps = pkt_dev->sofar * USEC_PER_SEC; - - while ((total_us >> 32) != 0) { - pps >>= 1; - total_us >>= 1; - } - - do_div(pps, total_us); - - bps = pps * 8 * pkt_dev->cur_pkt_size; - - mbps = bps; - do_div(mbps, 1000000); - p += sprintf(p, " %llupps %lluMb/sec (%llubps) errors: %llu", - (unsigned long long) pps, - (unsigned long long) mbps, - (unsigned long long) bps, - (unsigned long long) pkt_dev->errors); + while ((total_us >> 32) != 0) { + pps >>= 1; + total_us >>= 1; + } + + do_div(pps, total_us); + + bps = pps * 8 * pkt_dev->cur_pkt_size; + + mbps = bps; + do_div(mbps, 1000000); + p += sprintf(p, " %llupps %lluMb/sec (%llubps) errors: %llu", + (unsigned long long)pps, + (unsigned long long)mbps, + (unsigned long long)bps, + (unsigned long long)pkt_dev->errors); } - /* Set stopped-at timer, remove from running list, do counters & statistics */ -static int pktgen_stop_device(struct pktgen_dev *pkt_dev) +static int pktgen_stop_device(struct pktgen_dev *pkt_dev) { - - if (!pkt_dev->running) { - printk("pktgen: interface: %s is already stopped\n", pkt_dev->ifname); - return -EINVAL; - } + int nr_frags = pkt_dev->skb ? 
skb_shinfo(pkt_dev->skb)->nr_frags : -1; - pkt_dev->stopped_at = getCurUs(); - pkt_dev->running = 0; + if (!pkt_dev->running) { + printk("pktgen: interface: %s is already stopped\n", + pkt_dev->ifname); + return -EINVAL; + } - show_results(pkt_dev, skb_shinfo(pkt_dev->skb)->nr_frags); + pkt_dev->stopped_at = getCurUs(); + pkt_dev->running = 0; - if (pkt_dev->skb) - kfree_skb(pkt_dev->skb); + show_results(pkt_dev, nr_frags); - pkt_dev->skb = NULL; - - return 0; + return 0; } -static struct pktgen_dev *next_to_run(struct pktgen_thread *t ) +static struct pktgen_dev *next_to_run(struct pktgen_thread *t) { - struct pktgen_dev *next, *best = NULL; - + struct pktgen_dev *pkt_dev, *best = NULL; + if_lock(t); - for(next=t->if_list; next ; next=next->next) { - if(!next->running) continue; - if(best == NULL) best=next; - else if ( next->next_tx_us < best->next_tx_us) - best = next; + list_for_each_entry(pkt_dev, &t->if_list, list) { + if (!pkt_dev->running) + continue; + if (best == NULL) + best = pkt_dev; + else if (pkt_dev->next_tx_us < best->next_tx_us) + best = pkt_dev; } if_unlock(t); - return best; + return best; } -static void pktgen_stop(struct pktgen_thread *t) { - struct pktgen_dev *next = NULL; +static void pktgen_stop(struct pktgen_thread *t) +{ + struct pktgen_dev *pkt_dev; + + PG_DEBUG(printk("pktgen: entering pktgen_stop\n")); - PG_DEBUG(printk("pktgen: entering pktgen_stop.\n")); + if_lock(t); - if_lock(t); + list_for_each_entry(pkt_dev, &t->if_list, list) { + pktgen_stop_device(pkt_dev); + if (pkt_dev->skb) + kfree_skb(pkt_dev->skb); - for(next=t->if_list; next; next=next->next) - pktgen_stop_device(next); + pkt_dev->skb = NULL; + } - if_unlock(t); + if_unlock(t); } -static void pktgen_rem_all_ifs(struct pktgen_thread *t) +/* + * one of our devices needs to be removed - find it + * and remove it + */ +static void pktgen_rem_one_if(struct pktgen_thread *t) { - struct pktgen_dev *cur, *next = NULL; - - /* Remove all devices, free mem */ - - if_lock(t); + 
struct list_head *q, *n; + struct pktgen_dev *cur; + + PG_DEBUG(printk("pktgen: entering pktgen_rem_one_if\n")); + + if_lock(t); + + list_for_each_safe(q, n, &t->if_list) { + cur = list_entry(q, struct pktgen_dev, list); + + if (!cur->removal_mark) + continue; + + if (cur->skb) + kfree_skb(cur->skb); + cur->skb = NULL; - for(cur=t->if_list; cur; cur=next) { - next = cur->next; pktgen_remove_device(t, cur); + + break; } - if_unlock(t); + if_unlock(t); } -static void pktgen_rem_thread(struct pktgen_thread *t) +static void pktgen_rem_all_ifs(struct pktgen_thread *t) { - /* Remove from the thread list */ + struct list_head *q, *n; + struct pktgen_dev *cur; - struct pktgen_thread *tmp = pktgen_threads; + /* Remove all devices, free mem */ - remove_proc_entry(t->name, pg_proc_dir); + PG_DEBUG(printk("pktgen: entering pktgen_rem_all_ifs\n")); + if_lock(t); - thread_lock(); + list_for_each_safe(q, n, &t->if_list) { + cur = list_entry(q, struct pktgen_dev, list); - if (tmp == t) - pktgen_threads = tmp->next; - else { - while (tmp) { - if (tmp->next == t) { - tmp->next = t->next; - t->next = NULL; - break; - } - tmp = tmp->next; - } + if (cur->skb) + kfree_skb(cur->skb); + cur->skb = NULL; + + pktgen_remove_device(t, cur); } - thread_unlock(); + + if_unlock(t); +} + +static void pktgen_rem_thread(struct pktgen_thread *t) +{ + /* Remove from the thread list */ + + remove_proc_entry(t->name, pg_proc_dir); + + mutex_lock(&pktgen_thread_lock); + + list_del(&t->th_list); + + mutex_unlock(&pktgen_thread_lock); } static __inline__ void pktgen_xmit(struct pktgen_dev *pkt_dev) @@ -2527,7 +2702,7 @@ static __inline__ void pktgen_xmit(struc int ret; odev = pkt_dev->odev; - + if (pkt_dev->delay_us || pkt_dev->delay_ns) { u64 now; @@ -2544,67 +2719,71 @@ static __inline__ void pktgen_xmit(struc goto out; } } - + if (netif_queue_stopped(odev) || need_resched()) { idle_start = getCurUs(); - + if (!netif_running(odev)) { pktgen_stop_device(pkt_dev); + if (pkt_dev->skb) + 
kfree_skb(pkt_dev->skb); + pkt_dev->skb = NULL; goto out; } - if (need_resched()) + if (need_resched()) schedule(); - + pkt_dev->idle_acc += getCurUs() - idle_start; - + if (netif_queue_stopped(odev)) { - pkt_dev->next_tx_us = getCurUs(); /* TODO */ + pkt_dev->next_tx_us = getCurUs(); /* TODO */ pkt_dev->next_tx_ns = 0; - goto out; /* Try the next interface */ + goto out; /* Try the next interface */ } } - + if (pkt_dev->last_ok || !pkt_dev->skb) { - if ((++pkt_dev->clone_count >= pkt_dev->clone_skb ) || (!pkt_dev->skb)) { + if ((++pkt_dev->clone_count >= pkt_dev->clone_skb) + || (!pkt_dev->skb)) { /* build a new pkt */ - if (pkt_dev->skb) + if (pkt_dev->skb) kfree_skb(pkt_dev->skb); - + pkt_dev->skb = fill_packet(odev, pkt_dev); if (pkt_dev->skb == NULL) { printk("pktgen: ERROR: couldn't allocate skb in fill_packet.\n"); schedule(); - pkt_dev->clone_count--; /* back out increment, OOM */ + pkt_dev->clone_count--; /* back out increment, OOM */ goto out; } pkt_dev->allocated_skbs++; - pkt_dev->clone_count = 0; /* reset counter */ + pkt_dev->clone_count = 0; /* reset counter */ } } - + spin_lock_bh(&odev->xmit_lock); if (!netif_queue_stopped(odev)) { atomic_inc(&(pkt_dev->skb->users)); -retry_now: + retry_now: ret = odev->hard_start_xmit(pkt_dev->skb, odev); if (likely(ret == NETDEV_TX_OK)) { - pkt_dev->last_ok = 1; + pkt_dev->last_ok = 1; pkt_dev->sofar++; pkt_dev->seq_num++; pkt_dev->tx_bytes += pkt_dev->cur_pkt_size; - - } else if (ret == NETDEV_TX_LOCKED + + } else if (ret == NETDEV_TX_LOCKED && (odev->features & NETIF_F_LLTX)) { cpu_relax(); goto retry_now; - } else { /* Retry it next time */ - + } else { /* Retry it next time */ + atomic_dec(&(pkt_dev->skb->users)); - + if (debug && net_ratelimit()) printk(KERN_INFO "pktgen: Hard xmit error\n"); - + pkt_dev->errors++; pkt_dev->last_ok = 0; } @@ -2619,16 +2798,16 @@ retry_now: pkt_dev->next_tx_us++; pkt_dev->next_tx_ns -= 1000; } - } + } - else { /* Retry it next time */ - pkt_dev->last_ok = 0; - 
pkt_dev->next_tx_us = getCurUs(); /* TODO */ + else { /* Retry it next time */ + pkt_dev->last_ok = 0; + pkt_dev->next_tx_us = getCurUs(); /* TODO */ pkt_dev->next_tx_ns = 0; - } + } spin_unlock_bh(&odev->xmit_lock); - + /* If pkt_dev->count is zero, then run forever */ if ((pkt_dev->count != 0) && (pkt_dev->sofar >= pkt_dev->count)) { if (atomic_read(&(pkt_dev->skb->users)) != 1) { @@ -2641,72 +2820,74 @@ retry_now: } pkt_dev->idle_acc += getCurUs() - idle_start; } - + /* Done with this */ pktgen_stop_device(pkt_dev); - } - out:; - } + if (pkt_dev->skb) + kfree_skb(pkt_dev->skb); + pkt_dev->skb = NULL; + } +out:; +} /* * Main loop of the thread goes here */ -static void pktgen_thread_worker(struct pktgen_thread *t) +static void pktgen_thread_worker(struct pktgen_thread *t) { DEFINE_WAIT(wait); - struct pktgen_dev *pkt_dev = NULL; + struct pktgen_dev *pkt_dev = NULL; int cpu = t->cpu; sigset_t tmpsig; u32 max_before_softirq; - u32 tx_since_softirq = 0; + u32 tx_since_softirq = 0; daemonize("pktgen/%d", cpu); - /* Block all signals except SIGKILL, SIGSTOP and SIGTERM */ + /* Block all signals except SIGKILL, SIGSTOP and SIGTERM */ - spin_lock_irq(&current->sighand->siglock); - tmpsig = current->blocked; - siginitsetinv(&current->blocked, - sigmask(SIGKILL) | - sigmask(SIGSTOP)| - sigmask(SIGTERM)); + spin_lock_irq(&current->sighand->siglock); + tmpsig = current->blocked; + siginitsetinv(&current->blocked, + sigmask(SIGKILL) | sigmask(SIGSTOP) | sigmask(SIGTERM)); - recalc_sigpending(); - spin_unlock_irq(&current->sighand->siglock); + recalc_sigpending(); + spin_unlock_irq(&current->sighand->siglock); /* Migrate to the right CPU */ set_cpus_allowed(current, cpumask_of_cpu(cpu)); - if (smp_processor_id() != cpu) - BUG(); + if (smp_processor_id() != cpu) + BUG(); init_waitqueue_head(&t->queue); t->control &= ~(T_TERMINATE); t->control &= ~(T_RUN); t->control &= ~(T_STOP); + t->control &= ~(T_REMDEVALL); t->control &= ~(T_REMDEV); - t->pid = current->pid; + t->pid = current->pid; - 
PG_DEBUG(printk("pktgen: starting pktgen/%d: pid=%d\n", cpu, current->pid)); + PG_DEBUG(printk("pktgen: starting pktgen/%d: pid=%d\n", cpu, current->pid)); max_before_softirq = t->max_before_softirq; - - __set_current_state(TASK_INTERRUPTIBLE); - mb(); - while (1) { - + __set_current_state(TASK_INTERRUPTIBLE); + mb(); + + while (1) { + __set_current_state(TASK_RUNNING); /* * Get next dev to xmit -- if any. */ - pkt_dev = next_to_run(t); - - if (pkt_dev) { + pkt_dev = next_to_run(t); + + if (pkt_dev) { pktgen_xmit(pkt_dev); @@ -2724,115 +2905,125 @@ static void pktgen_thread_worker(struct } } else { prepare_to_wait(&(t->queue), &wait, TASK_INTERRUPTIBLE); - schedule_timeout(HZ/10); + schedule_timeout(HZ / 10); finish_wait(&(t->queue), &wait); } - /* + /* * Back from sleep, either due to the timeout or signal. * We check if we have any "posted" work for us. */ - if (t->control & T_TERMINATE || signal_pending(current)) - /* we received a request to terminate ourself */ - break; - + if (t->control & T_TERMINATE || signal_pending(current)) + /* we received a request to terminate ourself */ + break; - if(t->control & T_STOP) { + if (t->control & T_STOP) { pktgen_stop(t); t->control &= ~(T_STOP); } - if(t->control & T_RUN) { + if (t->control & T_RUN) { pktgen_run(t); t->control &= ~(T_RUN); } - if(t->control & T_REMDEV) { + if (t->control & T_REMDEVALL) { pktgen_rem_all_ifs(t); + t->control &= ~(T_REMDEVALL); + } + + if (t->control & T_REMDEV) { + pktgen_rem_one_if(t); t->control &= ~(T_REMDEV); } - if (need_resched()) + if (need_resched()) schedule(); - } + } - PG_DEBUG(printk("pktgen: %s stopping all device\n", t->name)); - pktgen_stop(t); + PG_DEBUG(printk("pktgen: %s stopping all device\n", t->name)); + pktgen_stop(t); - PG_DEBUG(printk("pktgen: %s removing all device\n", t->name)); - pktgen_rem_all_ifs(t); + PG_DEBUG(printk("pktgen: %s removing all device\n", t->name)); + pktgen_rem_all_ifs(t); - PG_DEBUG(printk("pktgen: %s removing thread.\n", t->name)); - 
pktgen_rem_thread(t); + PG_DEBUG(printk("pktgen: %s removing thread.\n", t->name)); + pktgen_rem_thread(t); + + t->removed = 1; } -static struct pktgen_dev *pktgen_find_dev(struct pktgen_thread *t, const char* ifname) +static struct pktgen_dev *pktgen_find_dev(struct pktgen_thread *t, + const char *ifname) { - struct pktgen_dev *pkt_dev = NULL; - if_lock(t); + struct pktgen_dev *p, *pkt_dev = NULL; + if_lock(t); - for(pkt_dev=t->if_list; pkt_dev; pkt_dev = pkt_dev->next ) { - if (strncmp(pkt_dev->ifname, ifname, IFNAMSIZ) == 0) { - break; - } - } + list_for_each_entry(p, &t->if_list, list) + if (strncmp(p->ifname, ifname, IFNAMSIZ) == 0) { + pkt_dev = p; + break; + } - if_unlock(t); - PG_DEBUG(printk("pktgen: find_dev(%s) returning %p\n", ifname,pkt_dev)); - return pkt_dev; + if_unlock(t); + PG_DEBUG(printk("pktgen: find_dev(%s) returning %p\n", ifname, pkt_dev)); + return pkt_dev; } /* * Adds a dev at front of if_list. */ -static int add_dev_to_thread(struct pktgen_thread *t, struct pktgen_dev *pkt_dev) +static int add_dev_to_thread(struct pktgen_thread *t, + struct pktgen_dev *pkt_dev) { int rv = 0; - - if_lock(t); - if (pkt_dev->pg_thread) { - printk("pktgen: ERROR: already assigned to a thread.\n"); - rv = -EBUSY; - goto out; - } - pkt_dev->next =t->if_list; t->if_list=pkt_dev; - pkt_dev->pg_thread = t; + if_lock(t); + + if (pkt_dev->pg_thread) { + printk("pktgen: ERROR: already assigned to a thread.\n"); + rv = -EBUSY; + goto out; + } + + list_add(&pkt_dev->list, &t->if_list); + pkt_dev->pg_thread = t; pkt_dev->running = 0; - out: - if_unlock(t); - return rv; +out: + if_unlock(t); + return rv; } /* Called under thread lock */ -static int pktgen_add_device(struct pktgen_thread *t, const char* ifname) +static int pktgen_add_device(struct pktgen_thread *t, const char *ifname) { - struct pktgen_dev *pkt_dev; + struct pktgen_dev *pkt_dev; struct proc_dir_entry *pe; - + /* We don't allow a device to be on several threads */ pkt_dev = __pktgen_NN_threads(ifname, 
FIND); if (pkt_dev) { - printk("pktgen: ERROR: interface already used.\n"); - return -EBUSY; - } + printk("pktgen: ERROR: interface already used.\n"); + return -EBUSY; + } pkt_dev = kzalloc(sizeof(struct pktgen_dev), GFP_KERNEL); if (!pkt_dev) return -ENOMEM; - pkt_dev->flows = vmalloc(MAX_CFLOWS*sizeof(struct flow_state)); + pkt_dev->flows = vmalloc(MAX_CFLOWS * sizeof(struct flow_state)); if (pkt_dev->flows == NULL) { kfree(pkt_dev); return -ENOMEM; } - memset(pkt_dev->flows, 0, MAX_CFLOWS*sizeof(struct flow_state)); + memset(pkt_dev->flows, 0, MAX_CFLOWS * sizeof(struct flow_state)); + pkt_dev->removal_mark = 0; pkt_dev->min_pkt_size = ETH_ZLEN; pkt_dev->max_pkt_size = ETH_ZLEN; pkt_dev->nfrags = 0; @@ -2841,14 +3032,14 @@ static int pktgen_add_device(struct pktg pkt_dev->delay_ns = pg_delay_d % 1000; pkt_dev->count = pg_count_d; pkt_dev->sofar = 0; - pkt_dev->udp_src_min = 9; /* sink port */ + pkt_dev->udp_src_min = 9; /* sink port */ pkt_dev->udp_src_max = 9; pkt_dev->udp_dst_min = 9; pkt_dev->udp_dst_max = 9; strncpy(pkt_dev->ifname, ifname, IFNAMSIZ); - if (! 
pktgen_setup_dev(pkt_dev)) { + if (!pktgen_setup_dev(pkt_dev)) { printk("pktgen: ERROR: pktgen_setup_dev failed.\n"); if (pkt_dev->flows) vfree(pkt_dev->flows); @@ -2871,65 +3062,74 @@ static int pktgen_add_device(struct pktg return add_dev_to_thread(t, pkt_dev); } -static struct pktgen_thread * __init pktgen_find_thread(const char* name) +static struct pktgen_thread *__init pktgen_find_thread(const char *name) { - struct pktgen_thread *t = NULL; + struct pktgen_thread *t; - thread_lock(); + mutex_lock(&pktgen_thread_lock); - t = pktgen_threads; - while (t) { - if (strcmp(t->name, name) == 0) - break; + list_for_each_entry(t, &pktgen_threads, th_list) + if (strcmp(t->name, name) == 0) { + mutex_unlock(&pktgen_thread_lock); + return t; + } - t = t->next; - } - thread_unlock(); - return t; + mutex_unlock(&pktgen_thread_lock); + return NULL; } -static int __init pktgen_create_thread(const char* name, int cpu) +static int __init pktgen_create_thread(const char *name, int cpu) { - struct pktgen_thread *t = NULL; + int err; + struct pktgen_thread *t = NULL; struct proc_dir_entry *pe; - if (strlen(name) > 31) { - printk("pktgen: ERROR: Thread name cannot be more than 31 characters.\n"); - return -EINVAL; - } - - if (pktgen_find_thread(name)) { - printk("pktgen: ERROR: thread: %s already exists\n", name); - return -EINVAL; - } - - t = kzalloc(sizeof(struct pktgen_thread), GFP_KERNEL); - if (!t) { - printk("pktgen: ERROR: out of memory, can't create new thread.\n"); - return -ENOMEM; - } + if (strlen(name) > 31) { + printk("pktgen: ERROR: Thread name cannot be more than 31 characters.\n"); + return -EINVAL; + } - strcpy(t->name, name); - spin_lock_init(&t->if_lock); + if (pktgen_find_thread(name)) { + printk("pktgen: ERROR: thread: %s already exists\n", name); + return -EINVAL; + } + + t = kzalloc(sizeof(struct pktgen_thread), GFP_KERNEL); + if (!t) { + printk("pktgen: ERROR: out of memory, can't create new thread.\n"); + return -ENOMEM; + } + + strcpy(t->name, name); + 
spin_lock_init(&t->if_lock); t->cpu = cpu; - - pe = create_proc_entry(t->name, 0600, pg_proc_dir); - if (!pe) { - printk("pktgen: cannot create %s/%s procfs entry.\n", + + pe = create_proc_entry(t->name, 0600, pg_proc_dir); + if (!pe) { + printk("pktgen: cannot create %s/%s procfs entry.\n", PG_PROC_DIR, t->name); - kfree(t); - return -EINVAL; - } + kfree(t); + return -EINVAL; + } pe->proc_fops = &pktgen_thread_fops; pe->data = t; - t->next = pktgen_threads; - pktgen_threads = t; + INIT_LIST_HEAD(&t->if_list); + + list_add_tail(&t->th_list, &pktgen_threads); - if (kernel_thread((void *) pktgen_thread_worker, (void *) t, - CLONE_FS | CLONE_FILES | CLONE_SIGHAND) < 0) + t->removed = 0; + + err = kernel_thread((void *)pktgen_thread_worker, (void *)t, + CLONE_FS | CLONE_FILES | CLONE_SIGHAND); + if (err < 0) { printk("pktgen: kernel_thread() failed for cpu %d\n", t->cpu); + remove_proc_entry(t->name, pg_proc_dir); + list_del(&t->th_list); + kfree(t); + return err; + } return 0; } @@ -2937,55 +3137,52 @@ static int __init pktgen_create_thread(c /* * Removes a device from the thread if_list. 
*/ -static void _rem_dev_from_if_list(struct pktgen_thread *t, struct pktgen_dev *pkt_dev) +static void _rem_dev_from_if_list(struct pktgen_thread *t, + struct pktgen_dev *pkt_dev) { - struct pktgen_dev *i, *prev = NULL; - - i = t->if_list; + struct list_head *q, *n; + struct pktgen_dev *p; - while(i) { - if(i == pkt_dev) { - if(prev) prev->next = i->next; - else t->if_list = NULL; - break; - } - prev = i; - i=i->next; + list_for_each_safe(q, n, &t->if_list) { + p = list_entry(q, struct pktgen_dev, list); + if (p == pkt_dev) + list_del(&p->list); } } -static int pktgen_remove_device(struct pktgen_thread *t, struct pktgen_dev *pkt_dev) +static int pktgen_remove_device(struct pktgen_thread *t, + struct pktgen_dev *pkt_dev) { PG_DEBUG(printk("pktgen: remove_device pkt_dev=%p\n", pkt_dev)); - if (pkt_dev->running) { - printk("pktgen:WARNING: trying to remove a running interface, stopping it now.\n"); - pktgen_stop_device(pkt_dev); - } - - /* Dis-associate from the interface */ + if (pkt_dev->running) { + printk("pktgen:WARNING: trying to remove a running interface, stopping it now.\n"); + pktgen_stop_device(pkt_dev); + } + + /* Dis-associate from the interface */ if (pkt_dev->odev) { dev_put(pkt_dev->odev); - pkt_dev->odev = NULL; - } - + pkt_dev->odev = NULL; + } + /* And update the thread if_list */ _rem_dev_from_if_list(t, pkt_dev); - /* Clean up proc file system */ + /* Clean up proc file system */ remove_proc_entry(pkt_dev->ifname, pg_proc_dir); if (pkt_dev->flows) vfree(pkt_dev->flows); kfree(pkt_dev); - return 0; + return 0; } -static int __init pg_init(void) +static int __init pg_init(void) { int cpu; struct proc_dir_entry *pe; @@ -2998,50 +3195,65 @@ static int __init pg_init(void) pg_proc_dir->owner = THIS_MODULE; pe = create_proc_entry(PGCTRL, 0600, pg_proc_dir); - if (pe == NULL) { - printk("pktgen: ERROR: cannot create %s procfs entry.\n", PGCTRL); + if (pe == NULL) { + printk("pktgen: ERROR: cannot create %s procfs entry.\n", + PGCTRL); 
proc_net_remove(PG_PROC_DIR); - return -EINVAL; - } + return -EINVAL; + } - pe->proc_fops = &pktgen_fops; - pe->data = NULL; + pe->proc_fops = &pktgen_fops; + pe->data = NULL; /* Register us to receive netdevice events */ register_netdevice_notifier(&pktgen_notifier_block); - + for_each_online_cpu(cpu) { + int err; char buf[30]; - sprintf(buf, "kpktgend_%i", cpu); - pktgen_create_thread(buf, cpu); - } - return 0; + sprintf(buf, "kpktgend_%i", cpu); + err = pktgen_create_thread(buf, cpu); + if (err) + printk("pktgen: WARNING: Cannot create thread for cpu %d (%d)\n", + cpu, err); + } + + if (list_empty(&pktgen_threads)) { + printk("pktgen: ERROR: Initialization failed for all threads\n"); + unregister_netdevice_notifier(&pktgen_notifier_block); + remove_proc_entry(PGCTRL, pg_proc_dir); + proc_net_remove(PG_PROC_DIR); + return -ENODEV; + } + + return 0; } static void __exit pg_cleanup(void) { + struct pktgen_thread *t; + struct list_head *q, *n; wait_queue_head_t queue; init_waitqueue_head(&queue); - /* Stop all interfaces & threads */ + /* Stop all interfaces & threads */ - while (pktgen_threads) { - struct pktgen_thread *t = pktgen_threads; - pktgen_threads->control |= (T_TERMINATE); + list_for_each_safe(q, n, &pktgen_threads) { + t = list_entry(q, struct pktgen_thread, th_list); + t->control |= (T_TERMINATE); - wait_event_interruptible_timeout(queue, (t != pktgen_threads), HZ); - } + wait_event_interruptible_timeout(queue, (t->removed == 1), HZ); + } - /* Un-register us from receiving netdevice events */ + /* Un-register us from receiving netdevice events */ unregister_netdevice_notifier(&pktgen_notifier_block); - /* Clean up proc file system */ + /* Clean up proc file system */ remove_proc_entry(PGCTRL, pg_proc_dir); proc_net_remove(PG_PROC_DIR); } - module_init(pg_init); module_exit(pg_cleanup); diff -puN net/core/rtnetlink.c~git-net net/core/rtnetlink.c --- devel/net/core/rtnetlink.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/core/rtnetlink.c 
2006-03-17 23:03:48.000000000 -0800
@@ -35,6 +35,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -51,25 +52,31 @@
 #include
 #include

-DECLARE_MUTEX(rtnl_sem);
+static DEFINE_MUTEX(rtnl_mutex);

 void rtnl_lock(void)
 {
-	rtnl_shlock();
+	mutex_lock(&rtnl_mutex);
 }

-int rtnl_lock_interruptible(void)
+void __rtnl_unlock(void)
 {
-	return down_interruptible(&rtnl_sem);
+	mutex_unlock(&rtnl_mutex);
 }
-
+
 void rtnl_unlock(void)
 {
-	rtnl_shunlock();
-
+	mutex_unlock(&rtnl_mutex);
+	if (rtnl && rtnl->sk_receive_queue.qlen)
+		rtnl->sk_data_ready(rtnl, 0);
 	netdev_run_todo();
 }

+int rtnl_trylock(void)
+{
+	return mutex_trylock(&rtnl_mutex);
+}
+
 int rtattr_parse(struct rtattr *tb[], int maxattr, struct rtattr *rta, int len)
 {
 	memset(tb, 0, sizeof(struct rtattr*)*maxattr);
@@ -179,6 +186,33 @@ rtattr_failure:
 }


+static void set_operstate(struct net_device *dev, unsigned char transition)
+{
+	unsigned char operstate = dev->operstate;
+
+	switch(transition) {
+	case IF_OPER_UP:
+		if ((operstate == IF_OPER_DORMANT ||
+		     operstate == IF_OPER_UNKNOWN) &&
+		    !netif_dormant(dev))
+			operstate = IF_OPER_UP;
+		break;
+
+	case IF_OPER_DORMANT:
+		if (operstate == IF_OPER_UP ||
+		    operstate == IF_OPER_UNKNOWN)
+			operstate = IF_OPER_DORMANT;
+		break;
+	};
+
+	if (dev->operstate != operstate) {
+		write_lock_bh(&dev_base_lock);
+		dev->operstate = operstate;
+		write_unlock_bh(&dev_base_lock);
+		netdev_state_change(dev);
+	}
+}
+
 static int rtnetlink_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 				 int type, u32 pid, u32 seq, u32 change,
 				 unsigned int flags)
@@ -209,6 +243,13 @@ static int rtnetlink_fill_ifinfo(struct
 	}

 	if (1) {
+		u8 operstate = netif_running(dev)?dev->operstate:IF_OPER_DOWN;
+		u8 link_mode = dev->link_mode;
+		RTA_PUT(skb, IFLA_OPERSTATE, sizeof(operstate), &operstate);
+		RTA_PUT(skb, IFLA_LINKMODE, sizeof(link_mode), &link_mode);
+	}
+
+	if (1) {
 		struct rtnl_link_ifmap map = {
 			.mem_start = dev->mem_start,
 			.mem_end = dev->mem_end,
@@ -399,6 +440,22 @@ static int do_setlink(struct sk_buff *sk
 		dev->weight = *((u32 *) RTA_DATA(ida[IFLA_WEIGHT - 1]));
 	}

+	if (ida[IFLA_OPERSTATE - 1]) {
+		if (ida[IFLA_OPERSTATE - 1]->rta_len != RTA_LENGTH(sizeof(u8)))
+			goto out;
+
+		set_operstate(dev, *((u8 *) RTA_DATA(ida[IFLA_OPERSTATE - 1])));
+	}
+
+	if (ida[IFLA_LINKMODE - 1]) {
+		if (ida[IFLA_LINKMODE - 1]->rta_len != RTA_LENGTH(sizeof(u8)))
+			goto out;
+
+		write_lock_bh(&dev_base_lock);
+		dev->link_mode = *((u8 *) RTA_DATA(ida[IFLA_LINKMODE - 1]));
+		write_unlock_bh(&dev_base_lock);
+	}
+
 	if (ifm->ifi_index >= 0 && ida[IFLA_IFNAME - 1]) {
 		char ifname[IFNAMSIZ];

@@ -575,9 +632,9 @@ static void rtnetlink_rcv(struct sock *s
 	unsigned int qlen = 0;

 	do {
-		rtnl_lock();
+		mutex_lock(&rtnl_mutex);
 		netlink_run_queue(sk, &qlen, &rtnetlink_rcv_msg);
-		up(&rtnl_sem);
+		mutex_unlock(&rtnl_mutex);

 		netdev_run_todo();
 	} while (qlen);
@@ -654,6 +711,5 @@ EXPORT_SYMBOL(rtnetlink_links);
 EXPORT_SYMBOL(rtnetlink_put_metrics);
 EXPORT_SYMBOL(rtnl);
 EXPORT_SYMBOL(rtnl_lock);
-EXPORT_SYMBOL(rtnl_lock_interruptible);
-EXPORT_SYMBOL(rtnl_sem);
+EXPORT_SYMBOL(rtnl_trylock);
 EXPORT_SYMBOL(rtnl_unlock);
diff -puN net/core/scm.c~git-net net/core/scm.c
--- devel/net/core/scm.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/core/scm.c	2006-03-17 23:03:48.000000000 -0800
@@ -284,8 +284,57 @@ struct scm_fp_list *scm_fp_dup(struct sc
 	return new_fpl;
 }

+int scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm)
+{
+	struct task_struct *p = current;
+	scm->creds = (struct ucred) {
+		.uid = p->uid,
+		.gid = p->gid,
+		.pid = p->tgid
+	};
+	scm->fp = NULL;
+	scm->sid = security_sk_sid(sock->sk, NULL, 0);
+	scm->seq = 0;
+	if (msg->msg_controllen <= 0)
+		return 0;
+	return __scm_send(sock, msg, scm);
+}
+
+void scm_recv(struct socket *sock, struct msghdr *msg,
+	      struct scm_cookie *scm, int flags)
+{
+	char *scontext;
+	int scontext_len, err;
+
+	if (!msg->msg_control) {
+		if (test_bit(SOCK_PASSCRED, &sock->flags) || scm->fp)
+			msg->msg_flags |= MSG_CTRUNC;
+		scm_destroy(scm);
+		return;
+	}
+
+	if (test_bit(SOCK_PASSCRED, &sock->flags))
+		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS,
+			 sizeof(scm->creds), &scm->creds);
+
+	if (test_bit(SOCK_PASSSEC, &sock->flags)) {
+		err = security_sid_to_context(scm->sid, &scontext,
+					      &scontext_len);
+		if (!err)
+			put_cmsg(msg, SOL_SOCKET, SCM_SECURITY,
+				 scontext_len, scontext);
+	}
+
+	if (!scm->fp)
+		return;
+
+	scm_detach_fds(msg, scm);
+}
+
 EXPORT_SYMBOL(__scm_destroy);
 EXPORT_SYMBOL(__scm_send);
+EXPORT_SYMBOL(scm_send);
+EXPORT_SYMBOL(scm_recv);
 EXPORT_SYMBOL(put_cmsg);
 EXPORT_SYMBOL(scm_detach_fds);
 EXPORT_SYMBOL(scm_fp_dup);
diff -puN net/core/skbuff.c~git-net net/core/skbuff.c
--- devel/net/core/skbuff.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/core/skbuff.c	2006-03-17 23:03:48.000000000 -0800
@@ -356,6 +356,24 @@ void __kfree_skb(struct sk_buff *skb)
 }

 /**
+ * kfree_skb - free an sk_buff
+ * @skb: buffer to free
+ *
+ * Drop a reference to the buffer and free it if the usage count has
+ * hit zero.
+ */
+void kfree_skb(struct sk_buff *skb)
+{
+	if (unlikely(!skb))
+		return;
+	if (likely(atomic_read(&skb->users) == 1))
+		smp_rmb();
+	else if (likely(!atomic_dec_and_test(&skb->users)))
+		return;
+	__kfree_skb(skb);
+}
+
+/**
  * skb_clone - duplicate an sk_buff
  * @skb: buffer to clone
  * @gfp_mask: allocation priority
@@ -1777,6 +1795,29 @@ int skb_append_datato_frags(struct sock
 	return 0;
 }

+/**
+ * skb_pull_rcsum - pull skb and update receive checksum
+ * @skb: buffer to update
+ * @start: start of data before pull
+ * @len: length of data pulled
+ *
+ * This function performs an skb_pull on the packet and updates
+ * update the CHECKSUM_HW checksum. It should be used on receive
+ * path processing instead of skb_pull unless you know that the
+ * checksum difference is zero (e.g., a valid IP header) or you
+ * are setting ip_summed to CHECKSUM_NONE.
+ */
+unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len)
+{
+	BUG_ON(len > skb->len);
+	skb->len -= len;
+	BUG_ON(skb->len < skb->data_len);
+	skb_postpull_rcsum(skb, skb->data, len);
+	return skb->data += len;
+}
+
+EXPORT_SYMBOL_GPL(skb_pull_rcsum);
+
 void __init skb_init(void)
 {
 	skbuff_head_cache = kmem_cache_create("skbuff_head_cache",
@@ -1799,6 +1840,7 @@ void __init skb_init(void)

 EXPORT_SYMBOL(___pskb_trim);
 EXPORT_SYMBOL(__kfree_skb);
+EXPORT_SYMBOL(kfree_skb);
 EXPORT_SYMBOL(__pskb_pull_tail);
 EXPORT_SYMBOL(__alloc_skb);
 EXPORT_SYMBOL(pskb_copy);
diff -puN net/core/sock.c~git-net net/core/sock.c
--- devel/net/core/sock.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/core/sock.c	2006-03-17 23:03:48.000000000 -0800
@@ -457,6 +457,13 @@ set_rcvbuf:
 			ret = -ENONET;
 			break;

+		case SO_PASSSEC:
+			if (valbool)
+				set_bit(SOCK_PASSSEC, &sock->flags);
+			else
+				clear_bit(SOCK_PASSSEC, &sock->flags);
+			break;
+
 		/* We implement the SO_SNDLOWAT etc to
 		   not be settable (1003.1g 5.3) */
 		default:
@@ -615,8 +622,12 @@ int sock_getsockopt(struct socket *sock,
 			v.val = sk->sk_state == TCP_LISTEN;
 			break;

+		case SO_PASSSEC:
+			v.val = test_bit(SOCK_PASSSEC, &sock->flags) ? 1 : 0;
+			break;
+
 		case SO_PEERSEC:
-			return security_socket_getpeersec(sock, optval, optlen, len);
+			return security_socket_getpeersec_stream(sock, optval, optlen, len);

 		default:
 			return(-ENOPROTOOPT);
@@ -1385,6 +1396,20 @@ int sock_common_getsockopt(struct socket

 EXPORT_SYMBOL(sock_common_getsockopt);

+#ifdef CONFIG_COMPAT
+int compat_sock_common_getsockopt(struct socket *sock, int level, int optname,
+				  char __user *optval, int __user *optlen)
+{
+	struct sock *sk = sock->sk;
+
+	if (sk->sk_prot->compat_setsockopt != NULL)
+		return sk->sk_prot->compat_getsockopt(sk, level, optname,
+						      optval, optlen);
+	return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen);
+}
+EXPORT_SYMBOL(compat_sock_common_getsockopt);
+#endif
+
 int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock,
 			struct msghdr *msg, size_t size, int flags)
 {
@@ -1414,6 +1439,20 @@ int sock_common_setsockopt(struct socket

 EXPORT_SYMBOL(sock_common_setsockopt);

+#ifdef CONFIG_COMPAT
+int compat_sock_common_setsockopt(struct socket *sock, int level, int optname,
+				  char __user *optval, int optlen)
+{
+	struct sock *sk = sock->sk;
+
+	if (sk->sk_prot->compat_setsockopt != NULL)
+		return sk->sk_prot->compat_setsockopt(sk, level, optname,
+						      optval, optlen);
+	return sk->sk_prot->setsockopt(sk, level, optname, optval, optlen);
+}
+EXPORT_SYMBOL(compat_sock_common_setsockopt);
+#endif
+
 void sk_common_release(struct sock *sk)
 {
 	if (sk->sk_prot->destroy)
diff -puN net/core/sysctl_net_core.c~git-net net/core/sysctl_net_core.c
--- devel/net/core/sysctl_net_core.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/core/sysctl_net_core.c	2006-03-17 23:03:48.000000000 -0800
@@ -26,6 +26,11 @@ extern int sysctl_core_destroy_delay;
 extern char sysctl_divert_version[];
 #endif /* CONFIG_NET_DIVERT */

+#ifdef CONFIG_XFRM
+extern u32 sysctl_xfrm_aevent_etime;
+extern u32 sysctl_xfrm_aevent_rseqth;
+#endif
+
 ctl_table core_table[] = {
 #ifdef CONFIG_NET
 	{
@@ -111,6 +116,24 @@ ctl_table core_table[] = {
 		.proc_handler = &proc_dostring
 	},
 #endif /* CONFIG_NET_DIVERT */
+#ifdef CONFIG_XFRM
+	{
+		.ctl_name = NET_CORE_AEVENT_ETIME,
+		.procname = "xfrm_aevent_etime",
+		.data = &sysctl_xfrm_aevent_etime,
+		.maxlen = sizeof(u32),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec
+	},
+	{
+		.ctl_name = NET_CORE_AEVENT_RSEQTH,
+		.procname = "xfrm_aevent_rseqth",
+		.data = &sysctl_xfrm_aevent_rseqth,
+		.maxlen = sizeof(u32),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec
+	},
+#endif /* CONFIG_XFRM */
 #endif /* CONFIG_NET */
 	{
 		.ctl_name = NET_CORE_SOMAXCONN,
diff -puN net/dccp/ackvec.c~git-net net/dccp/ackvec.c
--- devel/net/dccp/ackvec.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/dccp/ackvec.c	2006-03-17 23:03:48.000000000 -0800
@@ -13,36 +13,83 @@
 #include "dccp.h"

 #include
+#include
+#include
+#include
 #include
+#include

 #include

+static kmem_cache_t *dccp_ackvec_slab;
+static kmem_cache_t *dccp_ackvec_record_slab;
+
+static struct dccp_ackvec_record *dccp_ackvec_record_new(void)
+{
+	struct dccp_ackvec_record *avr =
+			kmem_cache_alloc(dccp_ackvec_record_slab, GFP_ATOMIC);
+
+	if (avr != NULL)
+		INIT_LIST_HEAD(&avr->dccpavr_node);
+
+	return avr;
+}
+
+static void dccp_ackvec_record_delete(struct dccp_ackvec_record *avr)
+{
+	if (unlikely(avr == NULL))
+		return;
+	/* Check if deleting a linked record */
+	WARN_ON(!list_empty(&avr->dccpavr_node));
+	kmem_cache_free(dccp_ackvec_record_slab, avr);
+}
+
+static void dccp_ackvec_insert_avr(struct dccp_ackvec *av,
+				   struct dccp_ackvec_record *avr)
+{
+	/*
+	 * AVRs are sorted by seqno. Since we are sending them in order, we
+	 * just add the AVR at the head of the list.
+	 * -sorbo.
+	 */
+	if (!list_empty(&av->dccpav_records)) {
+		const struct dccp_ackvec_record *head =
+					list_entry(av->dccpav_records.next,
+						   struct dccp_ackvec_record,
+						   dccpavr_node);
+		BUG_ON(before48(avr->dccpavr_ack_seqno,
+				head->dccpavr_ack_seqno));
+	}
+
+	list_add(&avr->dccpavr_node, &av->dccpav_records);
+}
+
 int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb)
 {
 	struct dccp_sock *dp = dccp_sk(sk);
+#ifdef CONFIG_IP_DCCP_DEBUG
+	const char *debug_prefix = dp->dccps_role == DCCP_ROLE_CLIENT ?
+				"CLIENT tx: " : "server tx: ";
+#endif
 	struct dccp_ackvec *av = dp->dccps_hc_rx_ackvec;
 	int len = av->dccpav_vec_len + 2;
 	struct timeval now;
 	u32 elapsed_time;
 	unsigned char *to, *from;
+	struct dccp_ackvec_record *avr;
+
+	if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN)
+		return -1;

 	dccp_timestamp(sk, &now);
 	elapsed_time = timeval_delta(&now, &av->dccpav_time) / 10;

-	if (elapsed_time != 0)
-		dccp_insert_option_elapsed_time(sk, skb, elapsed_time);
-
-	if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN)
+	if (elapsed_time != 0 &&
+	    dccp_insert_option_elapsed_time(sk, skb, elapsed_time))
 		return -1;

-	/*
-	 * XXX: now we have just one ack vector sent record, so
-	 * we have to wait for it to be cleared.
-	 *
-	 * Of course this is not acceptable, but this is just for
-	 * basic testing now.
-	 */
-	if (av->dccpav_ack_seqno != DCCP_MAX_SEQNO + 1)
+	avr = dccp_ackvec_record_new();
+	if (avr == NULL)
 		return -1;

 	DCCP_SKB_CB(skb)->dccpd_opt_len += len;
@@ -55,8 +102,8 @@ int dccp_insert_option_ackvec(struct soc
 	from = av->dccpav_buf + av->dccpav_buf_head;

 	/* Check if buf_head wraps */
-	if ((int)av->dccpav_buf_head + len > av->dccpav_vec_len) {
-		const u32 tailsize = av->dccpav_vec_len - av->dccpav_buf_head;
+	if ((int)av->dccpav_buf_head + len > DCCP_MAX_ACKVEC_LEN) {
+		const u32 tailsize = DCCP_MAX_ACKVEC_LEN - av->dccpav_buf_head;

 		memcpy(to, from, tailsize);
 		to += tailsize;
@@ -73,45 +120,37 @@ int dccp_insert_option_ackvec(struct soc
 	 * sequence number it used for the ack packet; ack_ptr will equal
 	 * buf_head; ack_ackno will equal buf_ackno; and ack_nonce will
 	 * equal buf_nonce.
-	 *
-	 * This implemention uses just one ack record for now.
 	 */
-	av->dccpav_ack_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
-	av->dccpav_ack_ptr = av->dccpav_buf_head;
-	av->dccpav_ack_ackno = av->dccpav_buf_ackno;
-	av->dccpav_ack_nonce = av->dccpav_buf_nonce;
-	av->dccpav_sent_len = av->dccpav_vec_len;
+	avr->dccpavr_ack_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
+	avr->dccpavr_ack_ptr = av->dccpav_buf_head;
+	avr->dccpavr_ack_ackno = av->dccpav_buf_ackno;
+	avr->dccpavr_ack_nonce = av->dccpav_buf_nonce;
+	avr->dccpavr_sent_len = av->dccpav_vec_len;
+
+	dccp_ackvec_insert_avr(av, avr);

 	dccp_pr_debug("%sACK Vector 0, len=%d, ack_seqno=%llu, "
 		      "ack_ackno=%llu\n",
-		      debug_prefix, av->dccpav_sent_len,
-		      (unsigned long long)av->dccpav_ack_seqno,
-		      (unsigned long long)av->dccpav_ack_ackno);
-	return -1;
+		      debug_prefix, avr->dccpavr_sent_len,
+		      (unsigned long long)avr->dccpavr_ack_seqno,
+		      (unsigned long long)avr->dccpavr_ack_ackno);
+	return 0;
 }

-struct dccp_ackvec *dccp_ackvec_alloc(const unsigned int len,
-				      const gfp_t priority)
+struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority)
 {
-	struct dccp_ackvec *av;
-
-	BUG_ON(len == 0);
-
-	if (len > DCCP_MAX_ACKVEC_LEN)
-		return NULL;
+	struct dccp_ackvec *av = kmem_cache_alloc(dccp_ackvec_slab, priority);

-	av = kmalloc(sizeof(*av) + len, priority);
 	if (av != NULL) {
-		av->dccpav_buf_len = len;
 		av->dccpav_buf_head =
-			av->dccpav_buf_tail = av->dccpav_buf_len - 1;
-		av->dccpav_buf_ackno =
-			av->dccpav_ack_ackno = av->dccpav_ack_seqno = ~0LLU;
+			av->dccpav_buf_tail = DCCP_MAX_ACKVEC_LEN - 1;
+		av->dccpav_buf_ackno = DCCP_MAX_SEQNO + 1;
 		av->dccpav_buf_nonce = av->dccpav_buf_nonce = 0;
 		av->dccpav_ack_ptr = 0;
 		av->dccpav_time.tv_sec = 0;
 		av->dccpav_time.tv_usec = 0;
 		av->dccpav_sent_len = av->dccpav_vec_len = 0;
+		INIT_LIST_HEAD(&av->dccpav_records);
 	}

 	return av;
@@ -119,7 +158,20 @@ struct dccp_ackvec *dccp_ackvec_alloc(co

 void dccp_ackvec_free(struct dccp_ackvec *av)
 {
-	kfree(av);
+	if (unlikely(av == NULL))
+		return;
+
+	if (!list_empty(&av->dccpav_records)) {
+		struct dccp_ackvec_record *avr, *next;
+
+		list_for_each_entry_safe(avr, next, &av->dccpav_records,
+					 dccpavr_node) {
+			list_del_init(&avr->dccpavr_node);
+			dccp_ackvec_record_delete(avr);
+		}
+	}
+
+	kmem_cache_free(dccp_ackvec_slab, av);
 }

 static inline u8 dccp_ackvec_state(const struct dccp_ackvec *av,
@@ -146,7 +198,7 @@ static inline int dccp_ackvec_set_buf_he
 	unsigned int gap;
 	long new_head;

-	if (av->dccpav_vec_len + packets > av->dccpav_buf_len)
+	if (av->dccpav_vec_len + packets > DCCP_MAX_ACKVEC_LEN)
 		return -ENOBUFS;

 	gap = packets - 1;
@@ -158,7 +210,7 @@ static inline int dccp_ackvec_set_buf_he
 			       gap + new_head + 1);
 			gap = -new_head;
 		}
-		new_head += av->dccpav_buf_len;
+		new_head += DCCP_MAX_ACKVEC_LEN;
 	}

 	av->dccpav_buf_head = new_head;
@@ -251,7 +303,7 @@ int dccp_ackvec_add(struct dccp_ackvec *
 				goto out_duplicate;

 			delta -= len + 1;
-			if (++index == av->dccpav_buf_len)
+			if (++index == DCCP_MAX_ACKVEC_LEN)
 				index = 0;
 		}
 	}
@@ -259,7 +311,6 @@ int dccp_ackvec_add(struct dccp_ackvec *
 	av->dccpav_buf_ackno = ackno;
 	dccp_timestamp(sk, &av->dccpav_time);
 out:
-	dccp_pr_debug("");
 	return 0;

 out_duplicate:
@@ -297,44 +348,50 @@ void dccp_ackvec_print(const struct dccp
 }
 #endif

-static void dccp_ackvec_throw_away_ack_record(struct dccp_ackvec *av)
+static void dccp_ackvec_throw_record(struct dccp_ackvec *av,
+				     struct dccp_ackvec_record *avr)
 {
-	/*
-	 * As we're keeping track of the ack vector size (dccpav_vec_len) and
-	 * the sent ack vector size (dccpav_sent_len) we don't need
-	 * dccpav_buf_tail at all, but keep this code here as in the future
-	 * we'll implement a vector of ack records, as suggested in
-	 * draft-ietf-dccp-spec-11.txt Appendix A. -acme
-	 */
-#if 0
-	u32 new_buf_tail = av->dccpav_ack_ptr + 1;
-	if (new_buf_tail >= av->dccpav_vec_len)
-		new_buf_tail -= av->dccpav_vec_len;
-	av->dccpav_buf_tail = new_buf_tail;
-#endif
-	av->dccpav_vec_len -= av->dccpav_sent_len;
+	struct dccp_ackvec_record *next;
+
+	av->dccpav_buf_tail = avr->dccpavr_ack_ptr - 1;
+	if (av->dccpav_buf_tail == 0)
+		av->dccpav_buf_tail = DCCP_MAX_ACKVEC_LEN - 1;
+
+	av->dccpav_vec_len -= avr->dccpavr_sent_len;
+
+	/* free records */
+	list_for_each_entry_safe_from(avr, next, &av->dccpav_records,
+				      dccpavr_node) {
+		list_del_init(&avr->dccpavr_node);
+		dccp_ackvec_record_delete(avr);
+	}
 }

 void dccp_ackvec_check_rcv_ackno(struct dccp_ackvec *av, struct sock *sk,
 				 const u64 ackno)
 {
-	/* Check if we actually sent an ACK vector */
-	if (av->dccpav_ack_seqno == DCCP_MAX_SEQNO + 1)
-		return;
+	struct dccp_ackvec_record *avr;

-	if (ackno == av->dccpav_ack_seqno) {
+	/*
+	 * If we traverse backwards, it should be faster when we have large
+	 * windows. We will be receiving ACKs for stuff we sent a while back
+	 * -sorbo.
+	 */
+	list_for_each_entry_reverse(avr, &av->dccpav_records, dccpavr_node) {
+		if (ackno == avr->dccpavr_ack_seqno) {
 #ifdef CONFIG_IP_DCCP_DEBUG
-		struct dccp_sock *dp = dccp_sk(sk);
-		const char *debug_prefix = dp->dccps_role == DCCP_ROLE_CLIENT ?
-					"CLIENT rx ack: " : "server rx ack: ";
+			struct dccp_sock *dp = dccp_sk(sk);
+			const char *debug_prefix = dp->dccps_role == DCCP_ROLE_CLIENT ?
+						"CLIENT rx ack: " : "server rx ack: ";
 #endif
-		dccp_pr_debug("%sACK packet 0, len=%d, ack_seqno=%llu, "
-			      "ack_ackno=%llu, ACKED!\n",
-			      debug_prefix, 1,
-			      (unsigned long long)av->dccpav_ack_seqno,
-			      (unsigned long long)av->dccpav_ack_ackno);
-		dccp_ackvec_throw_away_ack_record(av);
-		av->dccpav_ack_seqno = DCCP_MAX_SEQNO + 1;
+			dccp_pr_debug("%sACK packet 0, len=%d, ack_seqno=%llu, "
+				      "ack_ackno=%llu, ACKED!\n",
+				      debug_prefix, 1,
+				      (unsigned long long)avr->dccpavr_ack_seqno,
+				      (unsigned long long)avr->dccpavr_ack_ackno);
+			dccp_ackvec_throw_record(av, avr);
+			break;
+		}
 	}
 }

@@ -344,28 +401,20 @@ static void dccp_ackvec_check_rcv_ackvec
 					 const unsigned char *vector)
 {
 	unsigned char i;
+	struct dccp_ackvec_record *avr;

 	/* Check if we actually sent an ACK vector */
-	if (av->dccpav_ack_seqno == DCCP_MAX_SEQNO + 1)
-		return;
-	/*
-	 * We're in the receiver half connection, so if the received an ACK
-	 * vector ackno (e.g. 50) before dccpav_ack_seqno (e.g. 52), we're
-	 * not interested.
-	 *
-	 * Extra explanation with example:
-	 *
-	 * if we received an ACK vector with ackno 50, it can only be acking
-	 * 50, 49, 48, etc, not 52 (the seqno for the ACK vector we sent).
-	 */
-	/* dccp_pr_debug("is %llu < %llu? ", ackno, av->dccpav_ack_seqno); */
-	if (before48(ackno, av->dccpav_ack_seqno)) {
-		/* dccp_pr_debug_cat("yes\n"); */
+	if (list_empty(&av->dccpav_records))
 		return;
-	}
-	/* dccp_pr_debug_cat("no\n"); */

 	i = len;
+	/*
+	 * XXX
+	 * I think it might be more efficient to work backwards. See comment on
+	 * rcv_ackno. -sorbo.
+	 */
+	avr = list_entry(av->dccpav_records.next, struct dccp_ackvec_record,
+			 dccpavr_node);
 	while (i--) {
 		const u8 rl = *vector & DCCP_ACKVEC_LEN_MASK;
 		u64 ackno_end_rl;
@@ -373,14 +422,20 @@ static void dccp_ackvec_check_rcv_ackvec
 		dccp_set_seqno(&ackno_end_rl, ackno - rl);

 		/*
-		 * dccp_pr_debug("is %llu <= %llu <= %llu? ", ackno_end_rl,
-		 * av->dccpav_ack_seqno, ackno);
+		 * If our AVR sequence number is greater than the ack, go
+		 * forward in the AVR list until it is not so.
 		 */
-		if (between48(av->dccpav_ack_seqno, ackno_end_rl, ackno)) {
+		list_for_each_entry_from(avr, &av->dccpav_records,
+					 dccpavr_node) {
+			if (!after48(avr->dccpavr_ack_seqno, ackno))
+				goto found;
+		}
+		/* End of the dccpav_records list, not found, exit */
+		break;
+found:
+		if (between48(avr->dccpavr_ack_seqno, ackno_end_rl, ackno)) {
 			const u8 state = (*vector &
 					  DCCP_ACKVEC_STATE_MASK) >> 6;
-			/* dccp_pr_debug_cat("yes\n"); */
-
 			if (state != DCCP_ACKVEC_STATE_NOT_RECEIVED) {
 #ifdef CONFIG_IP_DCCP_DEBUG
 				struct dccp_sock *dp = dccp_sk(sk);
@@ -393,19 +448,16 @@ static void dccp_ackvec_check_rcv_ackvec
 					      "ACKED!\n",
 					      debug_prefix, len,
 					      (unsigned long long)
-					      av->dccpav_ack_seqno,
+					      avr->dccpavr_ack_seqno,
 					      (unsigned long long)
-					      av->dccpav_ack_ackno);
-				dccp_ackvec_throw_away_ack_record(av);
+					      avr->dccpavr_ack_ackno);
+				dccp_ackvec_throw_record(av, avr);
 			}
 			/*
-			 * If dccpav_ack_seqno was not received, no problem
-			 * we'll send another ACK vector.
+			 * If it wasn't received, continue scanning... we might
+			 * find another one.
 			 */
-			av->dccpav_ack_seqno = DCCP_MAX_SEQNO + 1;
-			break;
 		}
-		/* dccp_pr_debug_cat("no\n"); */

 		dccp_set_seqno(&ackno, ackno_end_rl - 1);
 		++vector;
@@ -424,3 +476,43 @@ int dccp_ackvec_parse(struct sock *sk, c
 				     len, value);
 	return 0;
 }
+
+static char dccp_ackvec_slab_msg[] __initdata =
+	KERN_CRIT "DCCP: Unable to create ack vectors slab caches\n";
+
+int __init dccp_ackvec_init(void)
+{
+	dccp_ackvec_slab = kmem_cache_create("dccp_ackvec",
+					     sizeof(struct dccp_ackvec), 0,
+					     SLAB_HWCACHE_ALIGN, NULL, NULL);
+	if (dccp_ackvec_slab == NULL)
+		goto out_err;
+
+	dccp_ackvec_record_slab =
+			kmem_cache_create("dccp_ackvec_record",
+					  sizeof(struct dccp_ackvec_record),
+					  0, SLAB_HWCACHE_ALIGN, NULL, NULL);
+	if (dccp_ackvec_record_slab == NULL)
+		goto out_destroy_slab;
+
+	return 0;
+
+out_destroy_slab:
+	kmem_cache_destroy(dccp_ackvec_slab);
+	dccp_ackvec_slab = NULL;
+out_err:
+	printk(dccp_ackvec_slab_msg);
+	return -ENOBUFS;
+}
+
+void dccp_ackvec_exit(void)
+{
+	if (dccp_ackvec_slab != NULL) {
+		kmem_cache_destroy(dccp_ackvec_slab);
+		dccp_ackvec_slab = NULL;
+	}
+	if (dccp_ackvec_record_slab != NULL) {
+		kmem_cache_destroy(dccp_ackvec_record_slab);
+		dccp_ackvec_record_slab = NULL;
+	}
+}
diff -puN net/dccp/ackvec.h~git-net net/dccp/ackvec.h
--- devel/net/dccp/ackvec.h~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/dccp/ackvec.h	2006-03-17 23:03:48.000000000 -0800
@@ -13,6 +13,7 @@

 #include
 #include
+#include
 #include
 #include

@@ -42,39 +43,57 @@
  * Ack Vectors it has recently sent. For each packet sent carrying an
  * Ack Vector, it remembers four variables:
  *
- * @dccpav_ack_seqno - the Sequence Number used for the packet
- *		       (HC-Receiver seqno)
  * @dccpav_ack_ptr - the value of buf_head at the time of acknowledgement.
- * @dccpav_ack_ackno - the Acknowledgement Number used for the packet
- *		       (HC-Sender seqno)
+ * @dccpav_records - list of dccp_ackvec_record
  * @dccpav_ack_nonce - the one-bit sum of the ECN Nonces for all State 0.
  *
- * @dccpav_buf_len - circular buffer length
  * @dccpav_time - the time in usecs
  * @dccpav_buf - circular buffer of acknowledgeable packets
  */
 struct dccp_ackvec {
 	u64 dccpav_buf_ackno;
-	u64 dccpav_ack_seqno;
-	u64 dccpav_ack_ackno;
+	struct list_head dccpav_records;
 	struct timeval dccpav_time;
 	u8 dccpav_buf_head;
 	u8 dccpav_buf_tail;
 	u8 dccpav_ack_ptr;
 	u8 dccpav_sent_len;
 	u8 dccpav_vec_len;
-	u8 dccpav_buf_len;
 	u8 dccpav_buf_nonce;
 	u8 dccpav_ack_nonce;
-	u8 dccpav_buf[0];
+	u8 dccpav_buf[DCCP_MAX_ACKVEC_LEN];
+};
+
+/** struct dccp_ackvec_record - ack vector record
+ *
+ * ACK vector record as defined in Appendix A of spec.
+ *
+ * The list is sorted by dccpavr_ack_seqno
+ *
+ * @dccpavr_node - node in dccpav_records
+ * @dccpavr_ack_seqno - sequence number of the packet this record was sent on
+ * @dccpavr_ack_ackno - sequence number being acknowledged
+ * @dccpavr_ack_ptr - pointer into dccpav_buf where this record starts
+ * @dccpavr_ack_nonce - dccpav_ack_nonce at the time this record was sent
+ * @dccpavr_sent_len - lenght of the record in dccpav_buf
+ */
+struct dccp_ackvec_record {
+	struct list_head dccpavr_node;
+	u64 dccpavr_ack_seqno;
+	u64 dccpavr_ack_ackno;
+	u8 dccpavr_ack_ptr;
+	u8 dccpavr_ack_nonce;
+	u8 dccpavr_sent_len;
 };

 struct sock;
 struct sk_buff;

 #ifdef CONFIG_IP_DCCP_ACKVEC
-extern struct dccp_ackvec *dccp_ackvec_alloc(unsigned int len,
-					     const gfp_t priority);
+extern int dccp_ackvec_init(void);
+extern void dccp_ackvec_exit(void);
+
+extern struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority);
 extern void dccp_ackvec_free(struct dccp_ackvec *av);

 extern int dccp_ackvec_add(struct dccp_ackvec *av, const struct sock *sk,
@@ -92,8 +111,16 @@ static inline int dccp_ackvec_pending(co
 	return av->dccpav_sent_len != av->dccpav_vec_len;
 }
 #else /* CONFIG_IP_DCCP_ACKVEC */
-static inline struct dccp_ackvec *dccp_ackvec_alloc(unsigned int len,
-						    const gfp_t priority)
+static inline int dccp_ackvec_init(void)
+{
+	return 0;
+}
+
+static inline void dccp_ackvec_exit(void)
+{
+}
+
+static inline struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority)
 {
 	return NULL;
 }
diff -puN net/dccp/ccid.c~git-net net/dccp/ccid.c
--- devel/net/dccp/ccid.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/dccp/ccid.c	2006-03-17 23:03:48.000000000 -0800
@@ -13,7 +13,7 @@

 #include "ccid.h"

-static struct ccid *ccids[CCID_MAX];
+static struct ccid_operations *ccids[CCID_MAX];
 #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
 static atomic_t ccids_lockct = ATOMIC_INIT(0);
 static DEFINE_SPINLOCK(ccids_lock);
@@ -55,85 +55,202 @@ static inline void ccids_read_unlock(voi
 #define ccids_read_unlock() do { } while(0)
 #endif

-int ccid_register(struct ccid *ccid)
+static kmem_cache_t *ccid_kmem_cache_create(int obj_size, const char *fmt,...)
 {
-	int err;
+	kmem_cache_t *slab;
+	char slab_name_fmt[32], *slab_name;
+	va_list args;
+
+	va_start(args, fmt);
+	vsnprintf(slab_name_fmt, sizeof(slab_name_fmt), fmt, args);
+	va_end(args);
+
+	slab_name = kstrdup(slab_name_fmt, GFP_KERNEL);
+	if (slab_name == NULL)
+		return NULL;
+	slab = kmem_cache_create(slab_name, sizeof(struct ccid) + obj_size, 0,
+				 SLAB_HWCACHE_ALIGN, NULL, NULL);
+	if (slab == NULL)
+		kfree(slab_name);
+	return slab;
+}
+
+static void ccid_kmem_cache_destroy(kmem_cache_t *slab)
+{
+	if (slab != NULL) {
+		const char *name = kmem_cache_name(slab);
+
+		kmem_cache_destroy(slab);
+		kfree(name);
+	}
+}
+
+int ccid_register(struct ccid_operations *ccid_ops)
+{
+	int err = -ENOBUFS;
+
+	ccid_ops->ccid_hc_rx_slab =
+			ccid_kmem_cache_create(ccid_ops->ccid_hc_rx_obj_size,
+					       "%s_hc_rx_sock",
+					       ccid_ops->ccid_name);
+	if (ccid_ops->ccid_hc_rx_slab == NULL)
+		goto out;

-	if (ccid->ccid_init == NULL)
-		return -1;
+	ccid_ops->ccid_hc_tx_slab =
+			ccid_kmem_cache_create(ccid_ops->ccid_hc_tx_obj_size,
+					       "%s_hc_tx_sock",
+					       ccid_ops->ccid_name);
+	if (ccid_ops->ccid_hc_tx_slab == NULL)
+		goto out_free_rx_slab;

 	ccids_write_lock();
 	err = -EEXIST;
-	if (ccids[ccid->ccid_id] == NULL) {
-		ccids[ccid->ccid_id] = ccid;
+	if (ccids[ccid_ops->ccid_id] == NULL) {
+		ccids[ccid_ops->ccid_id] = ccid_ops;
 		err = 0;
 	}
 	ccids_write_unlock();
-	if (err == 0)
-		pr_info("CCID: Registered CCID %d (%s)\n",
-			ccid->ccid_id, ccid->ccid_name);
+	if (err != 0)
+		goto out_free_tx_slab;
+
+	pr_info("CCID: Registered CCID %d (%s)\n",
+		ccid_ops->ccid_id, ccid_ops->ccid_name);
+out:
 	return err;
+out_free_tx_slab:
+	ccid_kmem_cache_destroy(ccid_ops->ccid_hc_tx_slab);
+	ccid_ops->ccid_hc_tx_slab = NULL;
+	goto out;
+out_free_rx_slab:
+	ccid_kmem_cache_destroy(ccid_ops->ccid_hc_rx_slab);
+	ccid_ops->ccid_hc_rx_slab = NULL;
+	goto out;
 }

 EXPORT_SYMBOL_GPL(ccid_register);

-int ccid_unregister(struct ccid *ccid)
+int ccid_unregister(struct ccid_operations *ccid_ops)
 {
 	ccids_write_lock();
-	ccids[ccid->ccid_id] = NULL;
+	ccids[ccid_ops->ccid_id] = NULL;
 	ccids_write_unlock();
+
+	ccid_kmem_cache_destroy(ccid_ops->ccid_hc_tx_slab);
+	ccid_ops->ccid_hc_tx_slab = NULL;
+	ccid_kmem_cache_destroy(ccid_ops->ccid_hc_rx_slab);
+	ccid_ops->ccid_hc_rx_slab = NULL;
+
 	pr_info("CCID: Unregistered CCID %d (%s)\n",
-		ccid->ccid_id, ccid->ccid_name);
+		ccid_ops->ccid_id, ccid_ops->ccid_name);
 	return 0;
 }

 EXPORT_SYMBOL_GPL(ccid_unregister);

-struct ccid *ccid_init(unsigned char id, struct sock *sk)
+struct ccid *ccid_new(unsigned char id, struct sock *sk, int rx, gfp_t gfp)
 {
-	struct ccid *ccid;
+	struct ccid_operations *ccid_ops;
+	struct ccid *ccid = NULL;

+	ccids_read_lock();
 #ifdef CONFIG_KMOD
-	if (ccids[id] == NULL)
+	if (ccids[id] == NULL) {
+		/* We only try to load if in process context */
+		ccids_read_unlock();
+		if (gfp & GFP_ATOMIC)
+			goto out;
 		request_module("net-dccp-ccid-%d", id);
+		ccids_read_lock();
+	}
 #endif
-	ccids_read_lock();
+	ccid_ops = ccids[id];
+	if (ccid_ops == NULL)
+		goto out_unlock;

-	ccid = ccids[id];
-	if (ccid == NULL)
-		goto out;
+	if (!try_module_get(ccid_ops->ccid_owner))
+		goto out_unlock;

-	if (!try_module_get(ccid->ccid_owner))
-		goto out_err;
+	ccids_read_unlock();

-	if (ccid->ccid_init(sk) != 0)
+	ccid = kmem_cache_alloc(rx ? ccid_ops->ccid_hc_rx_slab :
+				     ccid_ops->ccid_hc_tx_slab, gfp);
+	if (ccid == NULL)
 		goto out_module_put;
+	ccid->ccid_ops = ccid_ops;
+	if (rx) {
+		memset(ccid + 1, 0, ccid_ops->ccid_hc_rx_obj_size);
+		if (ccid->ccid_ops->ccid_hc_rx_init != NULL &&
+		    ccid->ccid_ops->ccid_hc_rx_init(ccid, sk) != 0)
+			goto out_free_ccid;
+	} else {
+		memset(ccid + 1, 0, ccid_ops->ccid_hc_tx_obj_size);
+		if (ccid->ccid_ops->ccid_hc_tx_init != NULL &&
+		    ccid->ccid_ops->ccid_hc_tx_init(ccid, sk) != 0)
+			goto out_free_ccid;
+	}
 out:
-	ccids_read_unlock();
 	return ccid;
-out_module_put:
-	module_put(ccid->ccid_owner);
-out_err:
+out_unlock:
+	ccids_read_unlock();
+	goto out;
+out_free_ccid:
+	kmem_cache_free(rx ? ccid_ops->ccid_hc_rx_slab :
+			ccid_ops->ccid_hc_tx_slab, ccid);
 	ccid = NULL;
+out_module_put:
+	module_put(ccid_ops->ccid_owner);
 	goto out;
 }

-EXPORT_SYMBOL_GPL(ccid_init);
+EXPORT_SYMBOL_GPL(ccid_new);
+
+struct ccid *ccid_hc_rx_new(unsigned char id, struct sock *sk, gfp_t gfp)
+{
+	return ccid_new(id, sk, 1, gfp);
+}
+
+EXPORT_SYMBOL_GPL(ccid_hc_rx_new);
+
+struct ccid *ccid_hc_tx_new(unsigned char id,struct sock *sk, gfp_t gfp)
+{
+	return ccid_new(id, sk, 0, gfp);
+}
+
+EXPORT_SYMBOL_GPL(ccid_hc_tx_new);

-void ccid_exit(struct ccid *ccid, struct sock *sk)
+static void ccid_delete(struct ccid *ccid, struct sock *sk, int rx)
 {
+	struct ccid_operations *ccid_ops;
+
 	if (ccid == NULL)
 		return;

+	ccid_ops = ccid->ccid_ops;
+	if (rx) {
+		if (ccid_ops->ccid_hc_rx_exit != NULL)
+			ccid_ops->ccid_hc_rx_exit(sk);
+		kmem_cache_free(ccid_ops->ccid_hc_rx_slab, ccid);
+	} else {
+		if (ccid_ops->ccid_hc_tx_exit != NULL)
+			ccid_ops->ccid_hc_tx_exit(sk);
+		kmem_cache_free(ccid_ops->ccid_hc_tx_slab, ccid);
+	}
 	ccids_read_lock();
+	if (ccids[ccid_ops->ccid_id] != NULL)
+		module_put(ccid_ops->ccid_owner);
+	ccids_read_unlock();
+}

-	if (ccids[ccid->ccid_id] != NULL) {
-		if (ccid->ccid_exit != NULL)
-			ccid->ccid_exit(sk);
-		module_put(ccid->ccid_owner);
-	}
+void ccid_hc_rx_delete(struct ccid *ccid, struct sock *sk)
+{
+	ccid_delete(ccid, sk, 1);
+}

-	ccids_read_unlock();
+EXPORT_SYMBOL_GPL(ccid_hc_rx_delete);
+
+void ccid_hc_tx_delete(struct ccid *ccid, struct sock *sk)
+{
+	ccid_delete(ccid, sk, 0);
 }

-EXPORT_SYMBOL_GPL(ccid_exit);
+EXPORT_SYMBOL_GPL(ccid_hc_tx_delete);
diff -puN net/dccp/ccid.h~git-net net/dccp/ccid.h
--- devel/net/dccp/ccid.h~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/dccp/ccid.h	2006-03-17 23:03:48.000000000 -0800
@@ -23,14 +23,16 @@

 struct tcp_info;

-struct ccid {
+struct ccid_operations {
 	unsigned char ccid_id;
 	const char *ccid_name;
 	struct module *ccid_owner;
-	int (*ccid_init)(struct sock *sk);
-	void (*ccid_exit)(struct sock *sk);
-	int (*ccid_hc_rx_init)(struct sock *sk);
-	int (*ccid_hc_tx_init)(struct sock *sk);
+	kmem_cache_t *ccid_hc_rx_slab;
+	__u32 ccid_hc_rx_obj_size;
+	kmem_cache_t *ccid_hc_tx_slab;
+	__u32 ccid_hc_tx_obj_size;
+	int (*ccid_hc_rx_init)(struct ccid *ccid, struct sock *sk);
+	int (*ccid_hc_tx_init)(struct ccid *ccid, struct sock *sk);
 	void (*ccid_hc_rx_exit)(struct sock *sk);
 	void (*ccid_hc_tx_exit)(struct sock *sk);
 	void (*ccid_hc_rx_packet_recv)(struct sock *sk,
@@ -39,9 +41,9 @@ struct ccid {
 					   unsigned char option,
 					   unsigned char len, u16 idx,
 					   unsigned char* value);
-	void (*ccid_hc_rx_insert_options)(struct sock *sk,
+	int (*ccid_hc_rx_insert_options)(struct sock *sk,
 					  struct sk_buff *skb);
-	void (*ccid_hc_tx_insert_options)(struct sock *sk,
+	int (*ccid_hc_tx_insert_options)(struct sock *sk,
 					  struct sk_buff *skb);
 	void (*ccid_hc_tx_packet_recv)(struct sock *sk,
 				       struct sk_buff *skb);
@@ -67,75 +69,58 @@ struct ccid {
 				      int __user *optlen);
 };

-extern int ccid_register(struct ccid *ccid);
-extern int ccid_unregister(struct ccid *ccid);
+extern int ccid_register(struct ccid_operations *ccid_ops);
+extern int ccid_unregister(struct ccid_operations *ccid_ops);

-extern struct ccid *ccid_init(unsigned char id, struct sock *sk);
-extern void ccid_exit(struct ccid *ccid, struct sock *sk);
+struct ccid {
+	struct ccid_operations *ccid_ops;
+	char ccid_priv[0];
+};

-static inline void __ccid_get(struct ccid *ccid)
+static inline void *ccid_priv(const struct ccid *ccid)
 {
-	__module_get(ccid->ccid_owner);
+	return (void *)ccid->ccid_priv;
 }

+extern struct ccid *ccid_new(unsigned char id, struct sock *sk, int rx,
+			     gfp_t gfp);
+
+extern struct ccid *ccid_hc_rx_new(unsigned char id, struct sock *sk,
+				   gfp_t gfp);
+extern struct ccid *ccid_hc_tx_new(unsigned char id, struct sock *sk,
+				   gfp_t gfp);
+
+extern void ccid_hc_rx_delete(struct ccid *ccid, struct sock *sk);
+extern void ccid_hc_tx_delete(struct ccid *ccid, struct sock *sk);
+
 static inline int ccid_hc_tx_send_packet(struct ccid *ccid, struct sock *sk,
 					 struct sk_buff *skb, int len)
 {
 	int rc = 0;
-	if (ccid->ccid_hc_tx_send_packet != NULL)
-		rc = ccid->ccid_hc_tx_send_packet(sk, skb, len);
+	if (ccid->ccid_ops->ccid_hc_tx_send_packet != NULL)
+		rc = ccid->ccid_ops->ccid_hc_tx_send_packet(sk, skb, len);
 	return rc;
 }

 static inline void ccid_hc_tx_packet_sent(struct ccid *ccid, struct sock *sk,
 					  int more, int len)
 {
-	if (ccid->ccid_hc_tx_packet_sent != NULL)
-		ccid->ccid_hc_tx_packet_sent(sk, more, len);
+	if (ccid->ccid_ops->ccid_hc_tx_packet_sent != NULL)
+		ccid->ccid_ops->ccid_hc_tx_packet_sent(sk, more, len);
 }
-
-static inline int ccid_hc_rx_init(struct ccid *ccid, struct sock *sk)
-{
-	int rc = 0;
-	if (ccid->ccid_hc_rx_init != NULL)
-		rc = ccid->ccid_hc_rx_init(sk);
-	return rc;
-}
-
-static inline int ccid_hc_tx_init(struct ccid *ccid, struct sock *sk)
-{
-	int rc = 0;
-	if (ccid->ccid_hc_tx_init != NULL)
-		rc = ccid->ccid_hc_tx_init(sk);
-	return rc;
-}
-
-static inline void ccid_hc_rx_exit(struct ccid *ccid, struct sock *sk)
-{
-	if (ccid != NULL && ccid->ccid_hc_rx_exit != NULL &&
-	    dccp_sk(sk)->dccps_hc_rx_ccid_private != NULL)
-		ccid->ccid_hc_rx_exit(sk);
-}
-
-static inline void ccid_hc_tx_exit(struct ccid *ccid, struct sock *sk)
-{
-	if (ccid != NULL && ccid->ccid_hc_tx_exit != NULL &&
-	    dccp_sk(sk)->dccps_hc_tx_ccid_private != NULL)
-		ccid->ccid_hc_tx_exit(sk);
-}
-
+
 static inline void ccid_hc_rx_packet_recv(struct ccid *ccid, struct sock *sk,
 					  struct sk_buff *skb)
 {
-	if (ccid->ccid_hc_rx_packet_recv != NULL)
-		ccid->ccid_hc_rx_packet_recv(sk, skb);
+	if (ccid->ccid_ops->ccid_hc_rx_packet_recv != NULL)
+		ccid->ccid_ops->ccid_hc_rx_packet_recv(sk, skb);
 }

 static inline void ccid_hc_tx_packet_recv(struct ccid *ccid, struct sock *sk,
 					  struct sk_buff *skb)
 {
-	if (ccid->ccid_hc_tx_packet_recv != NULL)
-		ccid->ccid_hc_tx_packet_recv(sk, skb);
+	if (ccid->ccid_ops->ccid_hc_tx_packet_recv != NULL)
+		ccid->ccid_ops->ccid_hc_tx_packet_recv(sk, skb);
 }

 static inline int ccid_hc_tx_parse_options(struct ccid *ccid, struct sock *sk,
@@ -144,8 +129,8 @@ static inline int ccid_hc_tx_parse_optio
 					   unsigned char* value)
 {
 	int rc = 0;
-	if (ccid->ccid_hc_tx_parse_options != NULL)
-		rc = ccid->ccid_hc_tx_parse_options(sk, option, len, idx,
+	if (ccid->ccid_ops->ccid_hc_tx_parse_options != NULL)
+		rc = ccid->ccid_ops->ccid_hc_tx_parse_options(sk, option, len, idx,
 						    value);
 	return rc;
 }
@@ -156,37 +141,39 @@ static inline int ccid_hc_rx_parse_optio
 					   unsigned char* value)
 {
 	int rc = 0;
-	if (ccid->ccid_hc_rx_parse_options != NULL)
-		rc = ccid->ccid_hc_rx_parse_options(sk, option, len, idx, value);
+	if (ccid->ccid_ops->ccid_hc_rx_parse_options != NULL)
+		rc = ccid->ccid_ops->ccid_hc_rx_parse_options(sk, option, len, idx, value);
 	return rc;
 }

-static inline void ccid_hc_tx_insert_options(struct ccid *ccid, struct sock *sk,
-					     struct sk_buff *skb)
+static inline int ccid_hc_tx_insert_options(struct ccid *ccid, struct sock *sk,
+					    struct sk_buff *skb)
 {
-	if (ccid->ccid_hc_tx_insert_options != NULL)
-		ccid->ccid_hc_tx_insert_options(sk, skb);
+	if (ccid->ccid_ops->ccid_hc_tx_insert_options != NULL)
+		return ccid->ccid_ops->ccid_hc_tx_insert_options(sk, skb);
+	return 0;
 }

-static inline void ccid_hc_rx_insert_options(struct ccid *ccid, struct sock *sk,
-					     struct sk_buff *skb)
+static inline int ccid_hc_rx_insert_options(struct ccid *ccid, struct sock *sk,
+					    struct sk_buff *skb)
 {
-	if (ccid->ccid_hc_rx_insert_options != NULL)
-		ccid->ccid_hc_rx_insert_options(sk, skb);
+	if (ccid->ccid_ops->ccid_hc_rx_insert_options != NULL)
+		return ccid->ccid_ops->ccid_hc_rx_insert_options(sk, skb);
+	return 0;
 }

 static inline void ccid_hc_rx_get_info(struct ccid *ccid, struct sock *sk,
 				       struct tcp_info *info)
 {
-	if (ccid->ccid_hc_rx_get_info != NULL)
-		ccid->ccid_hc_rx_get_info(sk, info);
+	if (ccid->ccid_ops->ccid_hc_rx_get_info != NULL)
+		ccid->ccid_ops->ccid_hc_rx_get_info(sk, info);
 }

 static inline void ccid_hc_tx_get_info(struct ccid *ccid, struct sock *sk,
 				       struct tcp_info *info)
 {
-	if (ccid->ccid_hc_tx_get_info != NULL)
-		ccid->ccid_hc_tx_get_info(sk, info);
+	if (ccid->ccid_ops->ccid_hc_tx_get_info != NULL)
+		ccid->ccid_ops->ccid_hc_tx_get_info(sk, info);
 }

 static inline int ccid_hc_rx_getsockopt(struct ccid *ccid, struct sock *sk,
@@ -194,8 +181,8 @@ static inline int ccid_hc_rx_getsockopt(
 					u32 __user *optval, int __user *optlen)
 {
 	int rc = -ENOPROTOOPT;
-	if (ccid->ccid_hc_rx_getsockopt != NULL)
-		rc = ccid->ccid_hc_rx_getsockopt(sk, optname, len,
+	if (ccid->ccid_ops->ccid_hc_rx_getsockopt != NULL)
+		rc = ccid->ccid_ops->ccid_hc_rx_getsockopt(sk, optname, len,
 						 optval, optlen);
 	return rc;
 }
@@ -205,8 +192,8 @@ static inline int ccid_hc_tx_getsockopt(
 					u32 __user *optval, int __user *optlen)
 {
 	int rc = -ENOPROTOOPT;
-	if (ccid->ccid_hc_tx_getsockopt != NULL)
-		rc = ccid->ccid_hc_tx_getsockopt(sk, optname, len,
+	if (ccid->ccid_ops->ccid_hc_tx_getsockopt != NULL)
+		rc = ccid->ccid_ops->ccid_hc_tx_getsockopt(sk, optname, len,
 						 optval, optlen);
 	return rc;
 }
diff -puN /dev/null net/dccp/ccids/ccid2.c
--- /dev/null	2003-09-15 06:40:47.000000000 -0700
+++
devel-akpm/net/dccp/ccids/ccid2.c 2006-03-17 23:03:48.000000000 -0800 @@ -0,0 +1,779 @@ +/* + * net/dccp/ccids/ccid2.c + * + * Copyright (c) 2005, 2006 Andrea Bittau + * + * Changes to meet Linux coding standards, and DCCP infrastructure fixes. + * + * Copyright (c) 2006 Arnaldo Carvalho de Melo + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +/* + * This implementation should follow: draft-ietf-dccp-ccid2-10.txt + * + * BUGS: + * - sequence number wrapping + * - jiffies wrapping + */ + +#include +#include "../ccid.h" +#include "../dccp.h" +#include "ccid2.h" + +static int ccid2_debug; + +#undef CCID2_DEBUG +#ifdef CCID2_DEBUG +#define ccid2_pr_debug(format, a...) \ + do { if (ccid2_debug) \ + printk(KERN_DEBUG "%s: " format, __FUNCTION__, ##a); \ + } while (0) +#else +#define ccid2_pr_debug(format, a...) 
+#endif + +static const int ccid2_seq_len = 128; + +#ifdef CCID2_DEBUG +static void ccid2_hc_tx_check_sanity(const struct ccid2_hc_tx_sock *hctx) +{ + int len = 0; + int pipe = 0; + struct ccid2_seq *seqp = hctx->ccid2hctx_seqh; + + /* there is data in the chain */ + if (seqp != hctx->ccid2hctx_seqt) { + seqp = seqp->ccid2s_prev; + len++; + if (!seqp->ccid2s_acked) + pipe++; + + while (seqp != hctx->ccid2hctx_seqt) { + struct ccid2_seq *prev = seqp->ccid2s_prev; + + len++; + if (!prev->ccid2s_acked) + pipe++; + + /* packets are sent sequentially */ + BUG_ON(seqp->ccid2s_seq <= prev->ccid2s_seq); + BUG_ON(seqp->ccid2s_sent < prev->ccid2s_sent); + BUG_ON(len > ccid2_seq_len); + + seqp = prev; + } + } + + BUG_ON(pipe != hctx->ccid2hctx_pipe); + ccid2_pr_debug("len of chain=%d\n", len); + + do { + seqp = seqp->ccid2s_prev; + len++; + BUG_ON(len > ccid2_seq_len); + } while (seqp != hctx->ccid2hctx_seqh); + + BUG_ON(len != ccid2_seq_len); + ccid2_pr_debug("total len=%d\n", len); +} +#else +#define ccid2_hc_tx_check_sanity(hctx) do {} while (0) +#endif + +static int ccid2_hc_tx_send_packet(struct sock *sk, + struct sk_buff *skb, int len) +{ + struct ccid2_hc_tx_sock *hctx; + + switch (DCCP_SKB_CB(skb)->dccpd_type) { + case 0: /* XXX data packets from userland come through like this */ + case DCCP_PKT_DATA: + case DCCP_PKT_DATAACK: + break; + /* No congestion control on other packets */ + default: + return 0; + } + + hctx = ccid2_hc_tx_sk(sk); + + ccid2_pr_debug("pipe=%d cwnd=%d\n", hctx->ccid2hctx_pipe, + hctx->ccid2hctx_cwnd); + + if (hctx->ccid2hctx_pipe < hctx->ccid2hctx_cwnd) { + /* OK we can send... make sure previous packet was sent off */ + if (!hctx->ccid2hctx_sendwait) { + hctx->ccid2hctx_sendwait = 1; + return 0; + } + } + + return 100; /* XXX */ +} + +static void ccid2_change_l_ack_ratio(struct sock *sk, int val) +{ + struct dccp_sock *dp = dccp_sk(sk); + /* + * XXX I don't really agree with val != 2. If cwnd is 1, ack ratio + * should be 1... 
it shouldn't be allowed to become 2. + * -sorbo. + */ + if (val != 2) { + const struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + int max = hctx->ccid2hctx_cwnd / 2; + + /* round up */ + if (hctx->ccid2hctx_cwnd & 1) + max++; + + if (val > max) + val = max; + } + + ccid2_pr_debug("changing local ack ratio to %d\n", val); + WARN_ON(val <= 0); + dp->dccps_l_ack_ratio = val; +} + +static void ccid2_change_cwnd(struct sock *sk, int val) +{ + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + + if (val == 0) + val = 1; + + /* XXX do we need to change ack ratio? */ + ccid2_pr_debug("change cwnd to %d\n", val); + + BUG_ON(val < 1); + hctx->ccid2hctx_cwnd = val; +} + +static void ccid2_start_rto_timer(struct sock *sk); + +static void ccid2_hc_tx_rto_expire(unsigned long data) +{ + struct sock *sk = (struct sock *)data; + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + long s; + + bh_lock_sock(sk); + if (sock_owned_by_user(sk)) { + sk_reset_timer(sk, &hctx->ccid2hctx_rtotimer, + jiffies + HZ / 5); + goto out; + } + + ccid2_pr_debug("RTO_EXPIRE\n"); + + ccid2_hc_tx_check_sanity(hctx); + + /* back-off timer */ + hctx->ccid2hctx_rto <<= 1; + + s = hctx->ccid2hctx_rto / HZ; + if (s > 60) + hctx->ccid2hctx_rto = 60 * HZ; + + ccid2_start_rto_timer(sk); + + /* adjust pipe, cwnd etc */ + hctx->ccid2hctx_pipe = 0; + hctx->ccid2hctx_ssthresh = hctx->ccid2hctx_cwnd >> 1; + if (hctx->ccid2hctx_ssthresh < 2) + hctx->ccid2hctx_ssthresh = 2; + ccid2_change_cwnd(sk, 1); + + /* clear state about stuff we sent */ + hctx->ccid2hctx_seqt = hctx->ccid2hctx_seqh; + hctx->ccid2hctx_ssacks = 0; + hctx->ccid2hctx_acks = 0; + hctx->ccid2hctx_sent = 0; + + /* clear ack ratio state. 
*/ + hctx->ccid2hctx_arsent = 0; + hctx->ccid2hctx_ackloss = 0; + hctx->ccid2hctx_rpseq = 0; + hctx->ccid2hctx_rpdupack = -1; + ccid2_change_l_ack_ratio(sk, 1); + ccid2_hc_tx_check_sanity(hctx); +out: + bh_unlock_sock(sk); + sock_put(sk); +} + +static void ccid2_start_rto_timer(struct sock *sk) +{ + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + + ccid2_pr_debug("setting RTO timeout=%ld\n", hctx->ccid2hctx_rto); + + BUG_ON(timer_pending(&hctx->ccid2hctx_rtotimer)); + sk_reset_timer(sk, &hctx->ccid2hctx_rtotimer, + jiffies + hctx->ccid2hctx_rto); +} + +static void ccid2_hc_tx_packet_sent(struct sock *sk, int more, int len) +{ + struct dccp_sock *dp = dccp_sk(sk); + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + u64 seq; + + ccid2_hc_tx_check_sanity(hctx); + + BUG_ON(!hctx->ccid2hctx_sendwait); + hctx->ccid2hctx_sendwait = 0; + hctx->ccid2hctx_pipe++; + BUG_ON(hctx->ccid2hctx_pipe < 0); + + /* There is an issue. What if another packet is sent between + * packet_send() and packet_sent(). Then the sequence number would be + * wrong. + * -sorbo. + */ + seq = dp->dccps_gss; + + hctx->ccid2hctx_seqh->ccid2s_seq = seq; + hctx->ccid2hctx_seqh->ccid2s_acked = 0; + hctx->ccid2hctx_seqh->ccid2s_sent = jiffies; + hctx->ccid2hctx_seqh = hctx->ccid2hctx_seqh->ccid2s_next; + + ccid2_pr_debug("cwnd=%d pipe=%d\n", hctx->ccid2hctx_cwnd, + hctx->ccid2hctx_pipe); + + if (hctx->ccid2hctx_seqh == hctx->ccid2hctx_seqt) { + /* XXX allocate more space */ + WARN_ON(1); + } + + hctx->ccid2hctx_sent++; + + /* Ack Ratio. Need to maintain a concept of how many windows we sent */ + hctx->ccid2hctx_arsent++; + /* We had an ack loss in this window... */ + if (hctx->ccid2hctx_ackloss) { + if (hctx->ccid2hctx_arsent >= hctx->ccid2hctx_cwnd) { + hctx->ccid2hctx_arsent = 0; + hctx->ccid2hctx_ackloss = 0; + } + } else { + /* No acks lost up to now... 
*/ + /* decrease ack ratio if enough packets were sent */ + if (dp->dccps_l_ack_ratio > 1) { + /* XXX don't calculate denominator each time */ + int denom = dp->dccps_l_ack_ratio * dp->dccps_l_ack_ratio - + dp->dccps_l_ack_ratio; + + denom = hctx->ccid2hctx_cwnd * hctx->ccid2hctx_cwnd / denom; + + if (hctx->ccid2hctx_arsent >= denom) { + ccid2_change_l_ack_ratio(sk, dp->dccps_l_ack_ratio - 1); + hctx->ccid2hctx_arsent = 0; + } + } else { + /* we can't increase ack ratio further [1] */ + hctx->ccid2hctx_arsent = 0; /* or maybe set it to cwnd*/ + } + } + + /* setup RTO timer */ + if (!timer_pending(&hctx->ccid2hctx_rtotimer)) + ccid2_start_rto_timer(sk); + +#ifdef CCID2_DEBUG + ccid2_pr_debug("pipe=%d\n", hctx->ccid2hctx_pipe); + ccid2_pr_debug("Sent: seq=%llu\n", seq); + do { + struct ccid2_seq *seqp = hctx->ccid2hctx_seqt; + + while (seqp != hctx->ccid2hctx_seqh) { + ccid2_pr_debug("out seq=%llu acked=%d time=%lu\n", + seqp->ccid2s_seq, seqp->ccid2s_acked, + seqp->ccid2s_sent); + seqp = seqp->ccid2s_next; + } + } while (0); + ccid2_pr_debug("=========\n"); + ccid2_hc_tx_check_sanity(hctx); +#endif +} + +/* XXX Lame code duplication! + * returns -1 if none was found. + * else returns the next offset to use in the function call. 
+ */ +static int ccid2_ackvector(struct sock *sk, struct sk_buff *skb, int offset, + unsigned char **vec, unsigned char *veclen) +{ + const struct dccp_hdr *dh = dccp_hdr(skb); + unsigned char *options = (unsigned char *)dh + dccp_hdr_len(skb); + unsigned char *opt_ptr; + const unsigned char *opt_end = (unsigned char *)dh + + (dh->dccph_doff * 4); + unsigned char opt, len; + unsigned char *value; + + BUG_ON(offset < 0); + options += offset; + opt_ptr = options; + if (opt_ptr >= opt_end) + return -1; + + while (opt_ptr != opt_end) { + opt = *opt_ptr++; + len = 0; + value = NULL; + + /* Check if this isn't a single byte option */ + if (opt > DCCPO_MAX_RESERVED) { + if (opt_ptr == opt_end) + goto out_invalid_option; + + len = *opt_ptr++; + if (len < 3) + goto out_invalid_option; + /* + * Remove the type and len fields, leaving + * just the value size + */ + len -= 2; + value = opt_ptr; + opt_ptr += len; + + if (opt_ptr > opt_end) + goto out_invalid_option; + } + + switch (opt) { + case DCCPO_ACK_VECTOR_0: + case DCCPO_ACK_VECTOR_1: + *vec = value; + *veclen = len; + return offset + (opt_ptr - options); + } + } + + return -1; + +out_invalid_option: + BUG_ON(1); /* should never happen... options were previously parsed ! 
*/ + return -1; +} + +static void ccid2_hc_tx_kill_rto_timer(struct sock *sk) +{ + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + + sk_stop_timer(sk, &hctx->ccid2hctx_rtotimer); + ccid2_pr_debug("deleted RTO timer\n"); +} + +static inline void ccid2_new_ack(struct sock *sk, + struct ccid2_seq *seqp, + unsigned int *maxincr) +{ + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + + /* slow start */ + if (hctx->ccid2hctx_cwnd < hctx->ccid2hctx_ssthresh) { + hctx->ccid2hctx_acks = 0; + + /* We can increase cwnd at most maxincr [ack_ratio/2] */ + if (*maxincr) { + /* increase every 2 acks */ + hctx->ccid2hctx_ssacks++; + if (hctx->ccid2hctx_ssacks == 2) { + ccid2_change_cwnd(sk, hctx->ccid2hctx_cwnd + 1); + hctx->ccid2hctx_ssacks = 0; + *maxincr = *maxincr - 1; + } + } else { + /* increased cwnd enough for this single ack */ + hctx->ccid2hctx_ssacks = 0; + } + } else { + hctx->ccid2hctx_ssacks = 0; + hctx->ccid2hctx_acks++; + + if (hctx->ccid2hctx_acks >= hctx->ccid2hctx_cwnd) { + ccid2_change_cwnd(sk, hctx->ccid2hctx_cwnd + 1); + hctx->ccid2hctx_acks = 0; + } + } + + /* update RTO */ + if (hctx->ccid2hctx_srtt == -1 || + (jiffies - hctx->ccid2hctx_lastrtt) >= hctx->ccid2hctx_srtt) { + unsigned long r = jiffies - seqp->ccid2s_sent; + int s; + + /* first measurement */ + if (hctx->ccid2hctx_srtt == -1) { + ccid2_pr_debug("R: %lu Time=%lu seq=%llu\n", + r, jiffies, seqp->ccid2s_seq); + hctx->ccid2hctx_srtt = r; + hctx->ccid2hctx_rttvar = r >> 1; + } else { + /* RTTVAR */ + long tmp = hctx->ccid2hctx_srtt - r; + if (tmp < 0) + tmp *= -1; + + tmp >>= 2; + hctx->ccid2hctx_rttvar *= 3; + hctx->ccid2hctx_rttvar >>= 2; + hctx->ccid2hctx_rttvar += tmp; + + /* SRTT */ + hctx->ccid2hctx_srtt *= 7; + hctx->ccid2hctx_srtt >>= 3; + tmp = r >> 3; + hctx->ccid2hctx_srtt += tmp; + } + s = hctx->ccid2hctx_rttvar << 2; + /* clock granularity is 1 when based on jiffies */ + if (!s) + s = 1; + hctx->ccid2hctx_rto = hctx->ccid2hctx_srtt + s; + + /* must be at least a second */ + s 
= hctx->ccid2hctx_rto / HZ; + /* DCCP doesn't require this [but I like it cuz my code sux] */ +#if 1 + if (s < 1) + hctx->ccid2hctx_rto = HZ; +#endif + /* max 60 seconds */ + if (s > 60) + hctx->ccid2hctx_rto = HZ * 60; + + hctx->ccid2hctx_lastrtt = jiffies; + + ccid2_pr_debug("srtt: %ld rttvar: %ld rto: %ld (HZ=%d) R=%lu\n", + hctx->ccid2hctx_srtt, hctx->ccid2hctx_rttvar, + hctx->ccid2hctx_rto, HZ, r); + hctx->ccid2hctx_sent = 0; + } + + /* we got a new ack, so re-start RTO timer */ + ccid2_hc_tx_kill_rto_timer(sk); + ccid2_start_rto_timer(sk); +} + +static void ccid2_hc_tx_dec_pipe(struct sock *sk) +{ + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + + hctx->ccid2hctx_pipe--; + BUG_ON(hctx->ccid2hctx_pipe < 0); + + if (hctx->ccid2hctx_pipe == 0) + ccid2_hc_tx_kill_rto_timer(sk); +} + +static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb) +{ + struct dccp_sock *dp = dccp_sk(sk); + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + u64 ackno, seqno; + struct ccid2_seq *seqp; + unsigned char *vector; + unsigned char veclen; + int offset = 0; + int done = 0; + int loss = 0; + unsigned int maxincr = 0; + + ccid2_hc_tx_check_sanity(hctx); + /* check reverse path congestion */ + seqno = DCCP_SKB_CB(skb)->dccpd_seq; + + /* XXX this whole "algorithm" is broken. Need to fix it to keep track + * of the seqnos of the dupacks so that rpseq and rpdupack are correct + * -sorbo. 
+ */ + /* need to bootstrap */ + if (hctx->ccid2hctx_rpdupack == -1) { + hctx->ccid2hctx_rpdupack = 0; + hctx->ccid2hctx_rpseq = seqno; + } else { + /* check if packet is consecutive */ + if ((hctx->ccid2hctx_rpseq + 1) == seqno) + hctx->ccid2hctx_rpseq++; + /* it's a later packet */ + else if (after48(seqno, hctx->ccid2hctx_rpseq)) { + hctx->ccid2hctx_rpdupack++; + + /* check if we got enough dupacks */ + if (hctx->ccid2hctx_rpdupack >= + hctx->ccid2hctx_numdupack) { + hctx->ccid2hctx_rpdupack = -1; /* XXX lame */ + hctx->ccid2hctx_rpseq = 0; + + ccid2_change_l_ack_ratio(sk, dp->dccps_l_ack_ratio << 1); + } + } + } + + /* check forward path congestion */ + /* still didn't send out new data packets */ + if (hctx->ccid2hctx_seqh == hctx->ccid2hctx_seqt) + return; + + switch (DCCP_SKB_CB(skb)->dccpd_type) { + case DCCP_PKT_ACK: + case DCCP_PKT_DATAACK: + break; + default: + return; + } + + ackno = DCCP_SKB_CB(skb)->dccpd_ack_seq; + seqp = hctx->ccid2hctx_seqh->ccid2s_prev; + + /* If in slow-start, cwnd can increase at most Ack Ratio / 2 packets for + * this single ack. I round up. + * -sorbo. + */ + maxincr = dp->dccps_l_ack_ratio >> 1; + maxincr++; + + /* go through all ack vectors */ + while ((offset = ccid2_ackvector(sk, skb, offset, + &vector, &veclen)) != -1) { + /* go through this ack vector */ + while (veclen--) { + const u8 rl = *vector & DCCP_ACKVEC_LEN_MASK; + u64 ackno_end_rl; + + dccp_set_seqno(&ackno_end_rl, ackno - rl); + ccid2_pr_debug("ackvec start:%llu end:%llu\n", ackno, + ackno_end_rl); + /* if the seqno we are analyzing is larger than the + * current ackno, then move towards the tail of our + * seqnos. 
+ */ + while (after48(seqp->ccid2s_seq, ackno)) { + if (seqp == hctx->ccid2hctx_seqt) { + done = 1; + break; + } + seqp = seqp->ccid2s_prev; + } + if (done) + break; + + /* check all seqnos in the range of the vector + * run length + */ + while (between48(seqp->ccid2s_seq,ackno_end_rl,ackno)) { + const u8 state = (*vector & + DCCP_ACKVEC_STATE_MASK) >> 6; + + /* new packet received or marked */ + if (state != DCCP_ACKVEC_STATE_NOT_RECEIVED && + !seqp->ccid2s_acked) { + if (state == + DCCP_ACKVEC_STATE_ECN_MARKED) { + loss = 1; + } else + ccid2_new_ack(sk, seqp, + &maxincr); + + seqp->ccid2s_acked = 1; + ccid2_pr_debug("Got ack for %llu\n", + seqp->ccid2s_seq); + ccid2_hc_tx_dec_pipe(sk); + } + if (seqp == hctx->ccid2hctx_seqt) { + done = 1; + break; + } + seqp = seqp->ccid2s_next; + } + if (done) + break; + + + dccp_set_seqno(&ackno, ackno_end_rl - 1); + vector++; + } + if (done) + break; + } + + /* The state about what is acked should be correct now + * Check for NUMDUPACK + */ + seqp = hctx->ccid2hctx_seqh->ccid2s_prev; + done = 0; + while (1) { + if (seqp->ccid2s_acked) { + done++; + if (done == hctx->ccid2hctx_numdupack) + break; + } + if (seqp == hctx->ccid2hctx_seqt) + break; + seqp = seqp->ccid2s_prev; + } + + /* If there are at least 3 acknowledgements, anything unacknowledged + * below the last sequence number is considered lost + */ + if (done == hctx->ccid2hctx_numdupack) { + struct ccid2_seq *last_acked = seqp; + + /* check for lost packets */ + while (1) { + if (!seqp->ccid2s_acked) { + loss = 1; + ccid2_hc_tx_dec_pipe(sk); + } + if (seqp == hctx->ccid2hctx_seqt) + break; + seqp = seqp->ccid2s_prev; + } + + hctx->ccid2hctx_seqt = last_acked; + } + + /* trim acked packets in tail */ + while (hctx->ccid2hctx_seqt != hctx->ccid2hctx_seqh) { + if (!hctx->ccid2hctx_seqt->ccid2s_acked) + break; + + hctx->ccid2hctx_seqt = hctx->ccid2hctx_seqt->ccid2s_next; + } + + if (loss) { + /* XXX do bit shifts guarantee a 0 as the new bit? 
*/ + ccid2_change_cwnd(sk, hctx->ccid2hctx_cwnd >> 1); + hctx->ccid2hctx_ssthresh = hctx->ccid2hctx_cwnd; + if (hctx->ccid2hctx_ssthresh < 2) + hctx->ccid2hctx_ssthresh = 2; + } + + ccid2_hc_tx_check_sanity(hctx); +} + +static int ccid2_hc_tx_init(struct ccid *ccid, struct sock *sk) +{ + struct ccid2_hc_tx_sock *hctx = ccid_priv(ccid); + int seqcount = ccid2_seq_len; + int i; + + /* XXX init variables with proper values */ + hctx->ccid2hctx_cwnd = 1; + hctx->ccid2hctx_ssthresh = 10; + hctx->ccid2hctx_numdupack = 3; + + /* XXX init ~ to window size... */ + hctx->ccid2hctx_seqbuf = kmalloc(sizeof(*hctx->ccid2hctx_seqbuf) * + seqcount, gfp_any()); + if (hctx->ccid2hctx_seqbuf == NULL) + return -ENOMEM; + + for (i = 0; i < (seqcount - 1); i++) { + hctx->ccid2hctx_seqbuf[i].ccid2s_next = + &hctx->ccid2hctx_seqbuf[i + 1]; + hctx->ccid2hctx_seqbuf[i + 1].ccid2s_prev = + &hctx->ccid2hctx_seqbuf[i]; + } + hctx->ccid2hctx_seqbuf[seqcount - 1].ccid2s_next = + hctx->ccid2hctx_seqbuf; + hctx->ccid2hctx_seqbuf->ccid2s_prev = + &hctx->ccid2hctx_seqbuf[seqcount - 1]; + + hctx->ccid2hctx_seqh = hctx->ccid2hctx_seqbuf; + hctx->ccid2hctx_seqt = hctx->ccid2hctx_seqh; + hctx->ccid2hctx_sent = 0; + hctx->ccid2hctx_rto = 3 * HZ; + hctx->ccid2hctx_srtt = -1; + hctx->ccid2hctx_rttvar = -1; + hctx->ccid2hctx_lastrtt = 0; + hctx->ccid2hctx_rpdupack = -1; + + hctx->ccid2hctx_rtotimer.function = &ccid2_hc_tx_rto_expire; + hctx->ccid2hctx_rtotimer.data = (unsigned long)sk; + init_timer(&hctx->ccid2hctx_rtotimer); + + ccid2_hc_tx_check_sanity(hctx); + return 0; +} + +static void ccid2_hc_tx_exit(struct sock *sk) +{ + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + + ccid2_hc_tx_kill_rto_timer(sk); + kfree(hctx->ccid2hctx_seqbuf); + hctx->ccid2hctx_seqbuf = NULL; +} + +static void ccid2_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) +{ + const struct dccp_sock *dp = dccp_sk(sk); + struct ccid2_hc_rx_sock *hcrx = ccid2_hc_rx_sk(sk); + + switch (DCCP_SKB_CB(skb)->dccpd_type) { + case 
DCCP_PKT_DATA: + case DCCP_PKT_DATAACK: + hcrx->ccid2hcrx_data++; + if (hcrx->ccid2hcrx_data >= dp->dccps_r_ack_ratio) { + dccp_send_ack(sk); + hcrx->ccid2hcrx_data = 0; + } + break; + } +} + +static struct ccid_operations ccid2 = { + .ccid_id = 2, + .ccid_name = "ccid2", + .ccid_owner = THIS_MODULE, + .ccid_hc_tx_obj_size = sizeof(struct ccid2_hc_tx_sock), + .ccid_hc_tx_init = ccid2_hc_tx_init, + .ccid_hc_tx_exit = ccid2_hc_tx_exit, + .ccid_hc_tx_send_packet = ccid2_hc_tx_send_packet, + .ccid_hc_tx_packet_sent = ccid2_hc_tx_packet_sent, + .ccid_hc_tx_packet_recv = ccid2_hc_tx_packet_recv, + .ccid_hc_rx_obj_size = sizeof(struct ccid2_hc_rx_sock), + .ccid_hc_rx_packet_recv = ccid2_hc_rx_packet_recv, +}; + +module_param(ccid2_debug, int, 0444); +MODULE_PARM_DESC(ccid2_debug, "Enable debug messages"); + +static __init int ccid2_module_init(void) +{ + return ccid_register(&ccid2); +} +module_init(ccid2_module_init); + +static __exit void ccid2_module_exit(void) +{ + ccid_unregister(&ccid2); +} +module_exit(ccid2_module_exit); + +MODULE_AUTHOR("Andrea Bittau "); +MODULE_DESCRIPTION("DCCP TCP-Like (CCID2) CCID"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("net-dccp-ccid-2"); diff -puN /dev/null net/dccp/ccids/ccid2.h --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/net/dccp/ccids/ccid2.h 2006-03-17 23:03:48.000000000 -0800 @@ -0,0 +1,85 @@ +/* + * net/dccp/ccids/ccid2.h + * + * Copyright (c) 2005 Andrea Bittau + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#ifndef _DCCP_CCID2_H_
+#define _DCCP_CCID2_H_
+
+#include
+#include
+#include
+#include "../ccid.h"
+
+struct sock;
+
+struct ccid2_seq {
+	u64 ccid2s_seq;
+	unsigned long ccid2s_sent;
+	int ccid2s_acked;
+	struct ccid2_seq *ccid2s_prev;
+	struct ccid2_seq *ccid2s_next;
+};
+
+/** struct ccid2_hc_tx_sock - CCID2 TX half connection
+ *
+ * @ccid2hctx_ssacks - ACKs recv in slow start
+ * @ccid2hctx_acks - ACKS recv in AI phase
+ * @ccid2hctx_sent - packets sent in this window
+ * @ccid2hctx_lastrtt -time RTT was last measured
+ * @ccid2hctx_arsent - packets sent [ack ratio]
+ * @ccid2hctx_ackloss - ack was lost in this win
+ * @ccid2hctx_rpseq - last consecutive seqno
+ * @ccid2hctx_rpdupack - dupacks since rpseq
+*/
+struct ccid2_hc_tx_sock {
+	int ccid2hctx_cwnd;
+	int ccid2hctx_ssacks;
+	int ccid2hctx_acks;
+	int ccid2hctx_ssthresh;
+	int ccid2hctx_pipe;
+	int ccid2hctx_numdupack;
+	struct ccid2_seq *ccid2hctx_seqbuf;
+	struct ccid2_seq *ccid2hctx_seqh;
+	struct ccid2_seq *ccid2hctx_seqt;
+	long ccid2hctx_rto;
+	long ccid2hctx_srtt;
+	long ccid2hctx_rttvar;
+	int ccid2hctx_sent;
+	unsigned long ccid2hctx_lastrtt;
+	struct timer_list ccid2hctx_rtotimer;
+	unsigned long ccid2hctx_arsent;
+	int ccid2hctx_ackloss;
+	u64 ccid2hctx_rpseq;
+	int ccid2hctx_rpdupack;
+	int ccid2hctx_sendwait;
+};
+
+struct ccid2_hc_rx_sock {
+	int ccid2hcrx_data;
+};
+
+static inline struct ccid2_hc_tx_sock *ccid2_hc_tx_sk(const struct sock *sk)
+{
+	return ccid_priv(dccp_sk(sk)->dccps_hc_tx_ccid);
+}
+
+static inline struct ccid2_hc_rx_sock *ccid2_hc_rx_sk(const struct sock *sk)
+{
+	return ccid_priv(dccp_sk(sk)->dccps_hc_rx_ccid);
+}
+#endif /* _DCCP_CCID2_H_ */
diff -puN net/dccp/ccids/ccid3.c~git-net net/dccp/ccids/ccid3.c
--- devel/net/dccp/ccids/ccid3.c~git-net 2006-03-17
23:03:46.000000000 -0800
+++ devel-akpm/net/dccp/ccids/ccid3.c 2006-03-17 23:03:48.000000000 -0800
@@ -46,7 +46,7 @@
  * Reason for maths here is to avoid 32 bit overflow when a is big.
  * With this we get close to the limit.
  */
-static inline u32 usecs_div(const u32 a, const u32 b)
+static u32 usecs_div(const u32 a, const u32 b)
 {
 	const u32 div = a < (UINT_MAX / (USEC_PER_SEC / 10)) ? 10 :
 			a < (UINT_MAX / (USEC_PER_SEC / 50)) ? 50 :
@@ -76,15 +76,6 @@ static struct dccp_tx_hist *ccid3_tx_his
 static struct dccp_rx_hist *ccid3_rx_hist;
 static struct dccp_li_hist *ccid3_li_hist;
 
-static int ccid3_init(struct sock *sk)
-{
-	return 0;
-}
-
-static void ccid3_exit(struct sock *sk)
-{
-}
-
 /* TFRC sender states */
 enum ccid3_hc_tx_states {
 	TFRC_SSTATE_NO_SENT = 1,
@@ -107,8 +98,8 @@ static const char *ccid3_tx_state_name(e
 }
 #endif
 
-static inline void ccid3_hc_tx_set_state(struct sock *sk,
-					 enum ccid3_hc_tx_states state)
+static void ccid3_hc_tx_set_state(struct sock *sk,
+				  enum ccid3_hc_tx_states state)
 {
 	struct ccid3_hc_tx_sock *hctx = ccid3_hc_tx_sk(sk);
 	enum ccid3_hc_tx_states oldstate = hctx->ccid3hctx_state;
@@ -316,8 +307,6 @@ static int ccid3_hc_tx_send_packet(struc
 
 	switch (hctx->ccid3hctx_state) {
 	case TFRC_SSTATE_NO_SENT:
-		hctx->ccid3hctx_no_feedback_timer.function = ccid3_hc_tx_no_feedback_timer;
-		hctx->ccid3hctx_no_feedback_timer.data = (unsigned long)sk;
 		sk_reset_timer(sk, &hctx->ccid3hctx_no_feedback_timer,
 			       jiffies + usecs_to_jiffies(TFRC_INITIAL_TIMEOUT));
 		hctx->ccid3hctx_last_win_count = 0;
@@ -585,16 +574,15 @@ static void ccid3_hc_tx_packet_recv(stru
 	}
 }
 
-static void ccid3_hc_tx_insert_options(struct sock *sk, struct sk_buff *skb)
+static int ccid3_hc_tx_insert_options(struct sock *sk, struct sk_buff *skb)
 {
 	const struct ccid3_hc_tx_sock *hctx = ccid3_hc_tx_sk(sk);
 
 	BUG_ON(hctx == NULL);
 
-	if (!(sk->sk_state == DCCP_OPEN || sk->sk_state == DCCP_PARTOPEN))
-		return;
-
-	DCCP_SKB_CB(skb)->dccpd_ccval = hctx->ccid3hctx_last_win_count;
+	if (sk->sk_state ==
DCCP_OPEN || sk->sk_state == DCCP_PARTOPEN)
+		DCCP_SKB_CB(skb)->dccpd_ccval = hctx->ccid3hctx_last_win_count;
+	return 0;
 }
 
 static int ccid3_hc_tx_parse_options(struct sock *sk, unsigned char option,
@@ -626,7 +614,7 @@ static int ccid3_hc_tx_parse_options(str
 				       __FUNCTION__, dccp_role(sk), sk);
 			rc = -EINVAL;
 		} else {
-			opt_recv->ccid3or_loss_event_rate = ntohl(*(u32 *)value);
+			opt_recv->ccid3or_loss_event_rate = ntohl(*(__be32 *)value);
 			ccid3_pr_debug("%s, sk=%p, LOSS_EVENT_RATE=%u\n",
 				       dccp_role(sk), sk,
 				       opt_recv->ccid3or_loss_event_rate);
@@ -647,7 +635,7 @@ static int ccid3_hc_tx_parse_options(str
 				       __FUNCTION__, dccp_role(sk), sk);
 			rc = -EINVAL;
 		} else {
-			opt_recv->ccid3or_receive_rate = ntohl(*(u32 *)value);
+			opt_recv->ccid3or_receive_rate = ntohl(*(__be32 *)value);
 			ccid3_pr_debug("%s, sk=%p, RECEIVE_RATE=%u\n",
 				       dccp_role(sk), sk,
 				       opt_recv->ccid3or_receive_rate);
@@ -658,17 +646,10 @@ static int ccid3_hc_tx_parse_options(str
 	return rc;
 }
 
-static int ccid3_hc_tx_init(struct sock *sk)
+static int ccid3_hc_tx_init(struct ccid *ccid, struct sock *sk)
 {
 	struct dccp_sock *dp = dccp_sk(sk);
-	struct ccid3_hc_tx_sock *hctx;
-
-	dp->dccps_hc_tx_ccid_private = kmalloc(sizeof(*hctx), gfp_any());
-	if (dp->dccps_hc_tx_ccid_private == NULL)
-		return -ENOMEM;
-
-	hctx = ccid3_hc_tx_sk(sk);
-	memset(hctx, 0, sizeof(*hctx));
+	struct ccid3_hc_tx_sock *hctx = ccid_priv(ccid);
 
 	if (dp->dccps_packet_size >= TFRC_MIN_PACKET_SIZE &&
 	    dp->dccps_packet_size <= TFRC_MAX_PACKET_SIZE)
@@ -681,6 +662,9 @@ static int ccid3_hc_tx_init(struct sock
 	hctx->ccid3hctx_t_rto = USEC_PER_SEC;
 	hctx->ccid3hctx_state = TFRC_SSTATE_NO_SENT;
 	INIT_LIST_HEAD(&hctx->ccid3hctx_hist);
+
+	hctx->ccid3hctx_no_feedback_timer.function = ccid3_hc_tx_no_feedback_timer;
+	hctx->ccid3hctx_no_feedback_timer.data = (unsigned long)sk;
 	init_timer(&hctx->ccid3hctx_no_feedback_timer);
 
 	return 0;
@@ -688,7 +672,6 @@ static int ccid3_hc_tx_init(struct sock
 
 static void ccid3_hc_tx_exit(struct sock *sk)
 {
-	struct dccp_sock *dp
= dccp_sk(sk);
 	struct ccid3_hc_tx_sock *hctx = ccid3_hc_tx_sk(sk);
 
 	BUG_ON(hctx == NULL);
@@ -698,9 +681,6 @@ static void ccid3_hc_tx_exit(struct sock
 
 	/* Empty packet history */
 	dccp_tx_hist_purge(ccid3_tx_hist, &hctx->ccid3hctx_hist);
-
-	kfree(dp->dccps_hc_tx_ccid_private);
-	dp->dccps_hc_tx_ccid_private = NULL;
 }
 
 /*
@@ -727,8 +707,8 @@ static const char *ccid3_rx_state_name(e
 }
 #endif
 
-static inline void ccid3_hc_rx_set_state(struct sock *sk,
-					 enum ccid3_hc_rx_states state)
+static void ccid3_hc_rx_set_state(struct sock *sk,
+				  enum ccid3_hc_rx_states state)
 {
 	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
 	enum ccid3_hc_rx_states oldstate = hcrx->ccid3hcrx_state;
@@ -793,31 +773,35 @@ static void ccid3_hc_rx_send_feedback(st
 	dccp_send_ack(sk);
 }
 
-static void ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb)
+static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb)
 {
 	const struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
-	u32 x_recv, pinv;
+	__be32 x_recv, pinv;
 
 	BUG_ON(hcrx == NULL);
 
 	if (!(sk->sk_state == DCCP_OPEN || sk->sk_state == DCCP_PARTOPEN))
-		return;
+		return 0;
 
 	DCCP_SKB_CB(skb)->dccpd_ccval = hcrx->ccid3hcrx_last_counter;
 
 	if (dccp_packet_without_ack(skb))
-		return;
-
-	if (hcrx->ccid3hcrx_elapsed_time != 0)
-		dccp_insert_option_elapsed_time(sk, skb,
-						hcrx->ccid3hcrx_elapsed_time);
-	dccp_insert_option_timestamp(sk, skb);
+		return 0;
+
 	x_recv = htonl(hcrx->ccid3hcrx_x_recv);
 	pinv = htonl(hcrx->ccid3hcrx_pinv);
-	dccp_insert_option(sk, skb, TFRC_OPT_LOSS_EVENT_RATE,
-			   &pinv, sizeof(pinv));
-	dccp_insert_option(sk, skb, TFRC_OPT_RECEIVE_RATE,
-			   &x_recv, sizeof(x_recv));
+
+	if ((hcrx->ccid3hcrx_elapsed_time != 0 &&
+	     dccp_insert_option_elapsed_time(sk, skb,
+					     hcrx->ccid3hcrx_elapsed_time)) ||
+	    dccp_insert_option_timestamp(sk, skb) ||
+	    dccp_insert_option(sk, skb, TFRC_OPT_LOSS_EVENT_RATE,
+			       &pinv, sizeof(pinv)) ||
+	    dccp_insert_option(sk, skb, TFRC_OPT_RECEIVE_RATE,
+			       &x_recv, sizeof(x_recv)))
+		return -1;
+
+
return 0; } /* calculate first loss interval @@ -1047,20 +1031,13 @@ static void ccid3_hc_rx_packet_recv(stru } } -static int ccid3_hc_rx_init(struct sock *sk) +static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) { struct dccp_sock *dp = dccp_sk(sk); - struct ccid3_hc_rx_sock *hcrx; + struct ccid3_hc_rx_sock *hcrx = ccid_priv(ccid); ccid3_pr_debug("%s, sk=%p\n", dccp_role(sk), sk); - dp->dccps_hc_rx_ccid_private = kmalloc(sizeof(*hcrx), gfp_any()); - if (dp->dccps_hc_rx_ccid_private == NULL) - return -ENOMEM; - - hcrx = ccid3_hc_rx_sk(sk); - memset(hcrx, 0, sizeof(*hcrx)); - if (dp->dccps_packet_size >= TFRC_MIN_PACKET_SIZE && dp->dccps_packet_size <= TFRC_MAX_PACKET_SIZE) hcrx->ccid3hcrx_s = dp->dccps_packet_size; @@ -1079,7 +1056,6 @@ static int ccid3_hc_rx_init(struct sock static void ccid3_hc_rx_exit(struct sock *sk) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); - struct dccp_sock *dp = dccp_sk(sk); BUG_ON(hcrx == NULL); @@ -1090,9 +1066,6 @@ static void ccid3_hc_rx_exit(struct sock /* Empty loss interval history */ dccp_li_hist_purge(ccid3_li_hist, &hcrx->ccid3hcrx_li_hist); - - kfree(dp->dccps_hc_rx_ccid_private); - dp->dccps_hc_rx_ccid_private = NULL; } static void ccid3_hc_rx_get_info(struct sock *sk, struct tcp_info *info) @@ -1178,12 +1151,11 @@ static int ccid3_hc_tx_getsockopt(struct return 0; } -static struct ccid ccid3 = { +static struct ccid_operations ccid3 = { .ccid_id = 3, .ccid_name = "ccid3", .ccid_owner = THIS_MODULE, - .ccid_init = ccid3_init, - .ccid_exit = ccid3_exit, + .ccid_hc_tx_obj_size = sizeof(struct ccid3_hc_tx_sock), .ccid_hc_tx_init = ccid3_hc_tx_init, .ccid_hc_tx_exit = ccid3_hc_tx_exit, .ccid_hc_tx_send_packet = ccid3_hc_tx_send_packet, @@ -1191,6 +1163,7 @@ static struct ccid ccid3 = { .ccid_hc_tx_packet_recv = ccid3_hc_tx_packet_recv, .ccid_hc_tx_insert_options = ccid3_hc_tx_insert_options, .ccid_hc_tx_parse_options = ccid3_hc_tx_parse_options, + .ccid_hc_rx_obj_size = sizeof(struct ccid3_hc_rx_sock), 
.ccid_hc_rx_init = ccid3_hc_rx_init, .ccid_hc_rx_exit = ccid3_hc_rx_exit, .ccid_hc_rx_insert_options = ccid3_hc_rx_insert_options, @@ -1241,15 +1214,6 @@ module_init(ccid3_module_init); static __exit void ccid3_module_exit(void) { -#ifdef CONFIG_IP_DCCP_UNLOAD_HACK - /* - * Hack to use while developing, so that we get rid of the control - * sock, that is what keeps a refcount on dccp.ko -acme - */ - extern void dccp_ctl_sock_exit(void); - - dccp_ctl_sock_exit(); -#endif ccid_unregister(&ccid3); if (ccid3_tx_hist != NULL) { diff -puN net/dccp/ccids/ccid3.h~git-net net/dccp/ccids/ccid3.h --- devel/net/dccp/ccids/ccid3.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/ccids/ccid3.h 2006-03-17 23:03:48.000000000 -0800 @@ -41,6 +41,7 @@ #include #include #include +#include "../ccid.h" #define TFRC_MIN_PACKET_SIZE 16 #define TFRC_STD_PACKET_SIZE 256 @@ -135,12 +136,12 @@ struct ccid3_hc_rx_sock { static inline struct ccid3_hc_tx_sock *ccid3_hc_tx_sk(const struct sock *sk) { - return dccp_sk(sk)->dccps_hc_tx_ccid_private; + return ccid_priv(dccp_sk(sk)->dccps_hc_tx_ccid); } static inline struct ccid3_hc_rx_sock *ccid3_hc_rx_sk(const struct sock *sk) { - return dccp_sk(sk)->dccps_hc_rx_ccid_private; + return ccid_priv(dccp_sk(sk)->dccps_hc_rx_ccid); } #endif /* _DCCP_CCID3_H_ */ diff -puN net/dccp/ccids/Kconfig~git-net net/dccp/ccids/Kconfig --- devel/net/dccp/ccids/Kconfig~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/ccids/Kconfig 2006-03-17 23:03:48.000000000 -0800 @@ -1,9 +1,39 @@ menu "DCCP CCIDs Configuration (EXPERIMENTAL)" depends on IP_DCCP && EXPERIMENTAL +config IP_DCCP_CCID2 + tristate "CCID2 (TCP-Like) (EXPERIMENTAL)" + depends on IP_DCCP + def_tristate IP_DCCP + select IP_DCCP_ACKVEC + ---help--- + CCID 2, TCP-like Congestion Control, denotes Additive Increase, + Multiplicative Decrease (AIMD) congestion control with behavior + modelled directly on TCP, including congestion window, slow start, + timeouts, and so forth 
[RFC 2581]. CCID 2 achieves maximum + bandwidth over the long term, consistent with the use of end-to-end + congestion control, but halves its congestion window in response to + each congestion event. This leads to the abrupt rate changes + typical of TCP. Applications should use CCID 2 if they prefer + maximum bandwidth utilization to steadiness of rate. This is often + the case for applications that are not playing their data directly + to the user. For example, a hypothetical application that + transferred files over DCCP, using application-level retransmissions + for lost packets, would prefer CCID 2 to CCID 3. On-line games may + also prefer CCID 2. + + CCID 2 is further described in: + http://www.icir.org/kohler/dccp/draft-ietf-dccp-ccid2-10.txt + + This text was extracted from: + http://www.icir.org/kohler/dccp/draft-ietf-dccp-spec-13.txt + + If in doubt, say M. + config IP_DCCP_CCID3 - tristate "CCID3 (TFRC) (EXPERIMENTAL)" + tristate "CCID3 (TCP-Friendly) (EXPERIMENTAL)" depends on IP_DCCP + def_tristate IP_DCCP ---help--- CCID 3 denotes TCP-Friendly Rate Control (TFRC), an equation-based rate-controlled congestion control mechanism. TFRC is designed to @@ -15,10 +45,15 @@ config IP_DCCP_CCID3 suitable than CCID 2 for applications such streaming media where a relatively smooth sending rate is of importance. - CCID 3 is further described in [CCID 3 PROFILE]. The TFRC - congestion control algorithms were initially described in RFC 3448. + CCID 3 is further described in: + + http://www.icir.org/kohler/dccp/draft-ietf-dccp-ccid3-11.txt. + + The TFRC congestion control algorithms were initially described in + RFC 3448. - This text was extracted from draft-ietf-dccp-spec-11.txt. + This text was extracted from: + http://www.icir.org/kohler/dccp/draft-ietf-dccp-spec-13.txt If in doubt, say M. 
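Note on the CCID 2 help text above: it describes AIMD behaviour (slow start, additive increase, and a halved congestion window on each congestion event). The following is a minimal sketch of that window arithmetic only, assuming a packet-counted window. The names `aimd_state`, `aimd_init`, `aimd_on_ack` and `aimd_on_congestion` are hypothetical and are not taken from ccid2.c; timeouts, ack vectors and the Ack Ratio feature are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical state for illustration; not the kernel's CCID 2 socket struct. */
struct aimd_state {
	uint32_t cwnd;      /* congestion window, in packets */
	uint32_t ssthresh;  /* slow-start threshold, in packets */
	uint32_t acked;     /* acks counted toward the next additive increase */
};

static void aimd_init(struct aimd_state *s)
{
	s->cwnd = 1;
	s->ssthresh = UINT32_MAX;  /* stay in slow start until the first loss */
	s->acked = 0;
}

/* Called once per newly acknowledged packet. */
static void aimd_on_ack(struct aimd_state *s)
{
	if (s->cwnd < s->ssthresh) {
		s->cwnd++;             /* slow start: grow by one per ack */
		return;
	}
	if (++s->acked >= s->cwnd) {   /* additive increase: one per full window */
		s->acked = 0;
		s->cwnd++;
	}
}

/* Called once per congestion event (loss or ECN mark). */
static void aimd_on_congestion(struct aimd_state *s)
{
	s->cwnd /= 2;                  /* multiplicative decrease */
	if (s->cwnd < 1)
		s->cwnd = 1;
	s->ssthresh = s->cwnd;
	s->acked = 0;
}
```

The halving in `aimd_on_congestion()` is what produces the abrupt rate changes the help text mentions. CCID 3 avoids them by deriving its sending rate from the TFRC throughput equation instead of a window.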
diff -puN net/dccp/ccids/Makefile~git-net net/dccp/ccids/Makefile --- devel/net/dccp/ccids/Makefile~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/ccids/Makefile 2006-03-17 23:03:48.000000000 -0800 @@ -2,4 +2,8 @@ obj-$(CONFIG_IP_DCCP_CCID3) += dccp_ccid dccp_ccid3-y := ccid3.o +obj-$(CONFIG_IP_DCCP_CCID2) += dccp_ccid2.o + +dccp_ccid2-y := ccid2.o + obj-y += lib/ diff -puN net/dccp/dccp.h~git-net net/dccp/dccp.h --- devel/net/dccp/dccp.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/dccp.h 2006-03-17 23:03:48.000000000 -0800 @@ -59,8 +59,6 @@ extern void dccp_time_wait(struct sock * #define DCCP_RTO_MAX ((unsigned)(120 * HZ)) /* FIXME: using TCP value */ -extern struct proto dccp_prot; - /* is seq1 < seq2 ? */ static inline int before48(const u64 seq1, const u64 seq2) { @@ -120,7 +118,6 @@ DECLARE_SNMP_STAT(struct dccp_mib, dccp_ extern int dccp_retransmit_skb(struct sock *sk, struct sk_buff *skb); -extern int dccp_send_response(struct sock *sk); extern void dccp_send_ack(struct sock *sk); extern void dccp_send_delayed_ack(struct sock *sk); extern void dccp_send_sync(struct sock *sk, const u64 seq, @@ -140,53 +137,8 @@ extern unsigned int dccp_sync_mss(struct extern const char *dccp_packet_name(const int type); extern const char *dccp_state_name(const int state); -static inline void dccp_set_state(struct sock *sk, const int state) -{ - const int oldstate = sk->sk_state; - - dccp_pr_debug("%s(%p) %-10.10s -> %s\n", - dccp_role(sk), sk, - dccp_state_name(oldstate), dccp_state_name(state)); - WARN_ON(state == oldstate); - - switch (state) { - case DCCP_OPEN: - if (oldstate != DCCP_OPEN) - DCCP_INC_STATS(DCCP_MIB_CURRESTAB); - break; - - case DCCP_CLOSED: - if (oldstate == DCCP_CLOSING || oldstate == DCCP_OPEN) - DCCP_INC_STATS(DCCP_MIB_ESTABRESETS); - - sk->sk_prot->unhash(sk); - if (inet_csk(sk)->icsk_bind_hash != NULL && - !(sk->sk_userlocks & SOCK_BINDPORT_LOCK)) - inet_put_port(&dccp_hashinfo, sk); - /* fall through */ - 
default: - if (oldstate == DCCP_OPEN) - DCCP_DEC_STATS(DCCP_MIB_CURRESTAB); - } - - /* Change state AFTER socket is unhashed to avoid closed - * socket sitting in hash tables. - */ - sk->sk_state = state; -} - -static inline void dccp_done(struct sock *sk) -{ - dccp_set_state(sk, DCCP_CLOSED); - dccp_clear_xmit_timers(sk); - - sk->sk_shutdown = SHUTDOWN_MASK; - - if (!sock_flag(sk, SOCK_DEAD)) - sk->sk_state_change(sk); - else - inet_csk_destroy_sock(sk); -} +extern void dccp_set_state(struct sock *sk, const int state); +extern void dccp_done(struct sock *sk); static inline void dccp_openreq_init(struct request_sock *req, struct dccp_sock *dp, @@ -209,10 +161,6 @@ extern struct sock *dccp_create_openreq_ extern int dccp_v4_do_rcv(struct sock *sk, struct sk_buff *skb); -extern void dccp_v4_err(struct sk_buff *skb, u32); - -extern int dccp_v4_rcv(struct sk_buff *skb); - extern struct sock *dccp_v4_request_recv_sock(struct sock *sk, struct sk_buff *skb, struct request_sock *req, @@ -228,24 +176,30 @@ extern int dccp_rcv_state_process(struct extern int dccp_rcv_established(struct sock *sk, struct sk_buff *skb, const struct dccp_hdr *dh, const unsigned len); -extern int dccp_v4_init_sock(struct sock *sk); -extern int dccp_v4_destroy_sock(struct sock *sk); +extern int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized); +extern int dccp_destroy_sock(struct sock *sk); extern void dccp_close(struct sock *sk, long timeout); extern struct sk_buff *dccp_make_response(struct sock *sk, struct dst_entry *dst, struct request_sock *req); -extern struct sk_buff *dccp_make_reset(struct sock *sk, - struct dst_entry *dst, - enum dccp_reset_codes code); extern int dccp_connect(struct sock *sk); extern int dccp_disconnect(struct sock *sk, int flags); +extern void dccp_hash(struct sock *sk); extern void dccp_unhash(struct sock *sk); extern int dccp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); extern int dccp_setsockopt(struct 
sock *sk, int level, int optname, char __user *optval, int optlen); +#ifdef CONFIG_COMPAT +extern int compat_dccp_getsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int __user *optlen); +extern int compat_dccp_setsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int optlen); +#endif extern int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg); extern int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t size); @@ -262,15 +216,14 @@ extern int dccp_v4_connect(struct soc int addr_len); extern int dccp_v4_checksum(const struct sk_buff *skb, - const u32 saddr, const u32 daddr); + const __be32 saddr, const __be32 daddr); -extern int dccp_v4_send_reset(struct sock *sk, - enum dccp_reset_codes code); +extern int dccp_send_reset(struct sock *sk, enum dccp_reset_codes code); extern void dccp_send_close(struct sock *sk, const int active); extern int dccp_invalid_packet(struct sk_buff *skb); static inline int dccp_bad_service_code(const struct sock *sk, - const __u32 service) + const __be32 service) { const struct dccp_sock *dp = dccp_sk(sk); @@ -334,41 +287,29 @@ static inline void dccp_hdr_set_seq(stru { struct dccp_hdr_ext *dhx = (struct dccp_hdr_ext *)((void *)dh + sizeof(*dh)); - -#if defined(__LITTLE_ENDIAN_BITFIELD) - dh->dccph_seq = htonl((gss >> 32)) >> 8; -#elif defined(__BIG_ENDIAN_BITFIELD) - dh->dccph_seq = htonl((gss >> 32)); -#else -#error "Adjust your defines" -#endif + dh->dccph_seq2 = 0; + dh->dccph_seq = htons((gss >> 32) & 0xfffff); dhx->dccph_seq_low = htonl(gss & 0xffffffff); } static inline void dccp_hdr_set_ack(struct dccp_hdr_ack_bits *dhack, const u64 gsr) { -#if defined(__LITTLE_ENDIAN_BITFIELD) - dhack->dccph_ack_nr_high = htonl((gsr >> 32)) >> 8; -#elif defined(__BIG_ENDIAN_BITFIELD) - dhack->dccph_ack_nr_high = htonl((gsr >> 32)); -#else -#error "Adjust your defines" -#endif + dhack->dccph_reserved1 = 0; + dhack->dccph_ack_nr_high = htons(gsr >> 32); 
dhack->dccph_ack_nr_low = htonl(gsr & 0xffffffff); } static inline void dccp_update_gsr(struct sock *sk, u64 seq) { struct dccp_sock *dp = dccp_sk(sk); + const struct dccp_minisock *dmsk = dccp_msk(sk); dp->dccps_gsr = seq; dccp_set_seqno(&dp->dccps_swl, - (dp->dccps_gsr + 1 - - (dp->dccps_options.dccpo_sequence_window / 4))); + dp->dccps_gsr + 1 - (dmsk->dccpms_sequence_window / 4)); dccp_set_seqno(&dp->dccps_swh, - (dp->dccps_gsr + - (3 * dp->dccps_options.dccpo_sequence_window) / 4)); + dp->dccps_gsr + (3 * dmsk->dccpms_sequence_window) / 4); } static inline void dccp_update_gss(struct sock *sk, u64 seq) @@ -378,7 +319,7 @@ static inline void dccp_update_gss(struc dp->dccps_awh = dp->dccps_gss = seq; dccp_set_seqno(&dp->dccps_awl, (dp->dccps_gss - - dp->dccps_options.dccpo_sequence_window + 1)); + dccp_msk(sk)->dccpms_sequence_window + 1)); } static inline int dccp_ack_pending(const struct sock *sk) @@ -386,24 +327,22 @@ static inline int dccp_ack_pending(const const struct dccp_sock *dp = dccp_sk(sk); return dp->dccps_timestamp_echo != 0 || #ifdef CONFIG_IP_DCCP_ACKVEC - (dp->dccps_options.dccpo_send_ack_vector && + (dccp_msk(sk)->dccpms_send_ack_vector && dccp_ackvec_pending(dp->dccps_hc_rx_ackvec)) || #endif inet_csk_ack_scheduled(sk); } -extern void dccp_insert_options(struct sock *sk, struct sk_buff *skb); -extern void dccp_insert_option_elapsed_time(struct sock *sk, +extern int dccp_insert_options(struct sock *sk, struct sk_buff *skb); +extern int dccp_insert_option_elapsed_time(struct sock *sk, struct sk_buff *skb, u32 elapsed_time); -extern void dccp_insert_option_timestamp(struct sock *sk, +extern int dccp_insert_option_timestamp(struct sock *sk, struct sk_buff *skb); -extern void dccp_insert_option(struct sock *sk, struct sk_buff *skb, +extern int dccp_insert_option(struct sock *sk, struct sk_buff *skb, unsigned char option, const void *value, unsigned char len); -extern struct socket *dccp_ctl_socket; - extern void dccp_timestamp(const struct sock 
*sk, struct timeval *tv); static inline suseconds_t timeval_usecs(const struct timeval *tv) @@ -444,4 +383,18 @@ static inline void timeval_sub_usecs(str } } +#ifdef CONFIG_SYSCTL +extern int dccp_sysctl_init(void); +extern void dccp_sysctl_exit(void); +#else +static inline int dccp_sysctl_init(void) +{ + return 0; +} + +static inline void dccp_sysctl_exit(void) +{ +} +#endif + #endif /* _DCCP_H */ diff -puN net/dccp/diag.c~git-net net/dccp/diag.c --- devel/net/dccp/diag.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/diag.c 2006-03-17 23:03:48.000000000 -0800 @@ -30,7 +30,7 @@ static void dccp_get_info(struct sock *s info->tcpi_backoff = icsk->icsk_backoff; info->tcpi_pmtu = icsk->icsk_pmtu_cookie; - if (dp->dccps_options.dccpo_send_ack_vector) + if (dccp_msk(sk)->dccpms_send_ack_vector) info->tcpi_options |= TCPI_OPT_SACK; ccid_hc_rx_get_info(dp->dccps_hc_rx_ccid, sk, info); diff -puN /dev/null net/dccp/feat.c --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/net/dccp/feat.c 2006-03-17 23:03:48.000000000 -0800 @@ -0,0 +1,586 @@ +/* + * net/dccp/feat.c + * + * An implementation of the DCCP protocol + * Andrea Bittau + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include +#include + +#include "dccp.h" +#include "ccid.h" +#include "feat.h" + +#define DCCP_FEAT_SP_NOAGREE (-123) + +int dccp_feat_change(struct dccp_minisock *dmsk, u8 type, u8 feature, + u8 *val, u8 len, gfp_t gfp) +{ + struct dccp_opt_pend *opt; + + dccp_pr_debug("feat change type=%d feat=%d\n", type, feature); + + /* XXX sanity check feat change request */ + + /* check if that feature is already being negotiated */ + list_for_each_entry(opt, &dmsk->dccpms_pending, dccpop_node) { + /* ok we found a negotiation for this option already */ + if (opt->dccpop_feat == feature && opt->dccpop_type == type) { + dccp_pr_debug("Replacing old\n"); + /* replace */ + BUG_ON(opt->dccpop_val == NULL); + kfree(opt->dccpop_val); + opt->dccpop_val = val; + opt->dccpop_len = len; + opt->dccpop_conf = 0; + return 0; + } + } + + /* negotiation for a new feature */ + opt = kmalloc(sizeof(*opt), gfp); + if (opt == NULL) + return -ENOMEM; + + opt->dccpop_type = type; + opt->dccpop_feat = feature; + opt->dccpop_len = len; + opt->dccpop_val = val; + opt->dccpop_conf = 0; + opt->dccpop_sc = NULL; + + BUG_ON(opt->dccpop_val == NULL); + + list_add_tail(&opt->dccpop_node, &dmsk->dccpms_pending); + return 0; +} + +EXPORT_SYMBOL_GPL(dccp_feat_change); + +static int dccp_feat_update_ccid(struct sock *sk, u8 type, u8 new_ccid_nr) +{ + struct dccp_sock *dp = dccp_sk(sk); + struct dccp_minisock *dmsk = dccp_msk(sk); + /* figure out if we are changing our CCID or the peer's */ + const int rx = type == DCCPO_CHANGE_R; + const u8 ccid_nr = rx ? dmsk->dccpms_rx_ccid : dmsk->dccpms_tx_ccid; + struct ccid *new_ccid; + + /* Check if nothing is being changed. 
*/ + if (ccid_nr == new_ccid_nr) + return 0; + + new_ccid = ccid_new(new_ccid_nr, sk, rx, GFP_ATOMIC); + if (new_ccid == NULL) + return -ENOMEM; + + if (rx) { + ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk); + dp->dccps_hc_rx_ccid = new_ccid; + dmsk->dccpms_rx_ccid = new_ccid_nr; + } else { + ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk); + dp->dccps_hc_tx_ccid = new_ccid; + dmsk->dccpms_tx_ccid = new_ccid_nr; + } + + return 0; +} + +/* XXX taking only u8 vals */ +static int dccp_feat_update(struct sock *sk, u8 type, u8 feat, u8 val) +{ + dccp_pr_debug("changing [%d] feat %d to %d\n", type, feat, val); + + switch (feat) { + case DCCPF_CCID: + return dccp_feat_update_ccid(sk, type, val); + default: + dccp_pr_debug("IMPLEMENT changing [%d] feat %d to %d\n", + type, feat, val); + break; + } + return 0; +} + +static int dccp_feat_reconcile(struct sock *sk, struct dccp_opt_pend *opt, + u8 *rpref, u8 rlen) +{ + struct dccp_sock *dp = dccp_sk(sk); + u8 *spref, slen, *res = NULL; + int i, j, rc, agree = 1; + + BUG_ON(rpref == NULL); + + /* check if we are the black sheep */ + if (dp->dccps_role == DCCP_ROLE_CLIENT) { + spref = rpref; + slen = rlen; + rpref = opt->dccpop_val; + rlen = opt->dccpop_len; + } else { + spref = opt->dccpop_val; + slen = opt->dccpop_len; + } + /* + * Now we have server preference list in spref and client preference in + * rpref + */ + BUG_ON(spref == NULL); + BUG_ON(rpref == NULL); + + /* FIXME sanity check vals */ + + /* Are values in any order? XXX Lame "algorithm" here */ + /* XXX assume values are 1 byte */ + for (i = 0; i < slen; i++) { + for (j = 0; j < rlen; j++) { + if (spref[i] == rpref[j]) { + res = &spref[i]; + break; + } + } + if (res) + break; + } + + /* we didn't agree on anything */ + if (res == NULL) { + /* confirm previous value */ + switch (opt->dccpop_feat) { + case DCCPF_CCID: + /* XXX did i get this right? 
=P */ + if (opt->dccpop_type == DCCPO_CHANGE_L) + res = &dccp_msk(sk)->dccpms_tx_ccid; + else + res = &dccp_msk(sk)->dccpms_rx_ccid; + break; + + default: + WARN_ON(1); /* XXX implement res */ + return -EFAULT; + } + + dccp_pr_debug("Don't agree... reconfirming %d\n", *res); + agree = 0; /* this is used for mandatory options... */ + } + + /* need to put result and our preference list */ + /* XXX assume 1 byte vals */ + rlen = 1 + opt->dccpop_len; + rpref = kmalloc(rlen, GFP_ATOMIC); + if (rpref == NULL) + return -ENOMEM; + + *rpref = *res; + memcpy(&rpref[1], opt->dccpop_val, opt->dccpop_len); + + /* put it in the "confirm queue" */ + if (opt->dccpop_sc == NULL) { + opt->dccpop_sc = kmalloc(sizeof(*opt->dccpop_sc), GFP_ATOMIC); + if (opt->dccpop_sc == NULL) { + kfree(rpref); + return -ENOMEM; + } + } else { + /* recycle the confirm slot */ + BUG_ON(opt->dccpop_sc->dccpoc_val == NULL); + kfree(opt->dccpop_sc->dccpoc_val); + dccp_pr_debug("recycling confirm slot\n"); + } + memset(opt->dccpop_sc, 0, sizeof(*opt->dccpop_sc)); + + opt->dccpop_sc->dccpoc_val = rpref; + opt->dccpop_sc->dccpoc_len = rlen; + + /* update the option on our side [we are about to send the confirm] */ + rc = dccp_feat_update(sk, opt->dccpop_type, opt->dccpop_feat, *res); + if (rc) { + kfree(opt->dccpop_sc->dccpoc_val); + kfree(opt->dccpop_sc); + opt->dccpop_sc = 0; + return rc; + } + + dccp_pr_debug("Will confirm %d\n", *rpref); + + /* say we want to change to X but we just got a confirm X, suppress our + * change + */ + if (!opt->dccpop_conf) { + if (*opt->dccpop_val == *res) + opt->dccpop_conf = 1; + dccp_pr_debug("won't ask for change of same feature\n"); + } + + return agree ? 0 : DCCP_FEAT_SP_NOAGREE; /* used for mandatory opts */ +} + +static int dccp_feat_sp(struct sock *sk, u8 type, u8 feature, u8 *val, u8 len) +{ + struct dccp_minisock *dmsk = dccp_msk(sk); + struct dccp_opt_pend *opt; + int rc = 1; + u8 t; + + /* + * We received a CHANGE. 
We gotta match it against our own preference + * list. If we got a CHANGE_R it means it's a change for us, so we need + * to compare our CHANGE_L list. + */ + if (type == DCCPO_CHANGE_L) + t = DCCPO_CHANGE_R; + else + t = DCCPO_CHANGE_L; + + /* find our preference list for this feature */ + list_for_each_entry(opt, &dmsk->dccpms_pending, dccpop_node) { + if (opt->dccpop_type != t || opt->dccpop_feat != feature) + continue; + + /* find the winner from the two preference lists */ + rc = dccp_feat_reconcile(sk, opt, val, len); + break; + } + + /* We didn't deal with the change. This can happen if we have no + * preference list for the feature. In fact, it just shouldn't + * happen---if we understand a feature, we should have a preference list + * with at least the default value. + */ + BUG_ON(rc == 1); + + return rc; +} + +static int dccp_feat_nn(struct sock *sk, u8 type, u8 feature, u8 *val, u8 len) +{ + struct dccp_opt_pend *opt; + struct dccp_minisock *dmsk = dccp_msk(sk); + u8 *copy; + int rc; + + /* NN features must be change L */ + if (type == DCCPO_CHANGE_R) { + dccp_pr_debug("received CHANGE_R %d for NN feat %d\n", + type, feature); + return -EFAULT; + } + + /* XXX sanity check opt val */ + + /* copy option so we can confirm it */ + opt = kzalloc(sizeof(*opt), GFP_ATOMIC); + if (opt == NULL) + return -ENOMEM; + + copy = kmalloc(len, GFP_ATOMIC); + if (copy == NULL) { + kfree(opt); + return -ENOMEM; + } + memcpy(copy, val, len); + + opt->dccpop_type = DCCPO_CONFIRM_R; /* NN can only confirm R */ + opt->dccpop_feat = feature; + opt->dccpop_val = copy; + opt->dccpop_len = len; + + /* change feature */ + rc = dccp_feat_update(sk, type, feature, *val); + if (rc) { + kfree(opt->dccpop_val); + kfree(opt); + return rc; + } + + dccp_pr_debug("Confirming NN feature %d (val=%d)\n", feature, *copy); + list_add_tail(&opt->dccpop_node, &dmsk->dccpms_conf); + + return 0; +} + +static void dccp_feat_empty_confirm(struct dccp_minisock *dmsk, + u8 type, u8 feature) +{ + /* XXX 
check if other confirms for that are queued and recycle slot */ + struct dccp_opt_pend *opt = kzalloc(sizeof(*opt), GFP_ATOMIC); + + if (opt == NULL) { + /* XXX what do we do? Ignoring should be fine. It's a change + * after all =P + */ + return; + } + + opt->dccpop_type = type == DCCPO_CHANGE_L ? DCCPO_CONFIRM_R : + DCCPO_CONFIRM_L; + opt->dccpop_feat = feature; + opt->dccpop_val = 0; + opt->dccpop_len = 0; + + /* change feature */ + dccp_pr_debug("Empty confirm feature %d type %d\n", feature, type); + list_add_tail(&opt->dccpop_node, &dmsk->dccpms_conf); +} + +static void dccp_feat_flush_confirm(struct sock *sk) +{ + struct dccp_minisock *dmsk = dccp_msk(sk); + /* Check if there is anything to confirm in the first place */ + int yes = !list_empty(&dmsk->dccpms_conf); + + if (!yes) { + struct dccp_opt_pend *opt; + + list_for_each_entry(opt, &dmsk->dccpms_pending, dccpop_node) { + if (opt->dccpop_conf) { + yes = 1; + break; + } + } + } + + if (!yes) + return; + + /* OK there is something to confirm... */ + /* XXX check if packet is in flight? Send delayed ack?? */ + if (sk->sk_state == DCCP_OPEN) + dccp_send_ack(sk); +} + +int dccp_feat_change_recv(struct sock *sk, u8 type, u8 feature, u8 *val, u8 len) +{ + int rc; + + dccp_pr_debug("got feat change type=%d feat=%d\n", type, feature); + + /* figure out if it's SP or NN feature */ + switch (feature) { + /* deal with SP features */ + case DCCPF_CCID: + rc = dccp_feat_sp(sk, type, feature, val, len); + break; + + /* deal with NN features */ + case DCCPF_ACK_RATIO: + rc = dccp_feat_nn(sk, type, feature, val, len); + break; + + /* XXX implement other features */ + default: + rc = -EFAULT; + break; + } + + /* check if there were problems changing features */ + if (rc) { + /* If we don't agree on SP, we sent a confirm for old value. 
+ * However we propagate rc to caller in case option was + * mandatory + */ + if (rc != DCCP_FEAT_SP_NOAGREE) + dccp_feat_empty_confirm(dccp_msk(sk), type, feature); + } + + /* generate the confirm [if required] */ + dccp_feat_flush_confirm(sk); + + return rc; +} + +EXPORT_SYMBOL_GPL(dccp_feat_change_recv); + +int dccp_feat_confirm_recv(struct sock *sk, u8 type, u8 feature, + u8 *val, u8 len) +{ + u8 t; + struct dccp_opt_pend *opt; + struct dccp_minisock *dmsk = dccp_msk(sk); + int rc = 1; + int all_confirmed = 1; + + dccp_pr_debug("got feat confirm type=%d feat=%d\n", type, feature); + + /* XXX sanity check type & feat */ + + /* locate our change request */ + t = type == DCCPO_CONFIRM_L ? DCCPO_CHANGE_R : DCCPO_CHANGE_L; + + list_for_each_entry(opt, &dmsk->dccpms_pending, dccpop_node) { + if (!opt->dccpop_conf && opt->dccpop_type == t && + opt->dccpop_feat == feature) { + /* we found it */ + /* XXX do sanity check */ + + opt->dccpop_conf = 1; + + /* We got a confirmation---change the option */ + dccp_feat_update(sk, opt->dccpop_type, + opt->dccpop_feat, *val); + + dccp_pr_debug("feat %d type %d confirmed %d\n", + feature, type, *val); + rc = 0; + break; + } + + if (!opt->dccpop_conf) + all_confirmed = 0; + } + + /* fix re-transmit timer */ + /* XXX gotta make sure that no option negotiation occurs during + * connection shutdown. Consider that the CLOSEREQ is sent and timer is + * on. if all options are confirmed it might kill timer which should + * remain alive until close is received. 
+ */ + if (all_confirmed) { + dccp_pr_debug("clear feat negotiation timer %p\n", sk); + inet_csk_clear_xmit_timer(sk, ICSK_TIME_RETRANS); + } + + if (rc) + dccp_pr_debug("feat %d type %d never requested\n", + feature, type); + return 0; +} + +EXPORT_SYMBOL_GPL(dccp_feat_confirm_recv); + +void dccp_feat_clean(struct dccp_minisock *dmsk) +{ + struct dccp_opt_pend *opt, *next; + + list_for_each_entry_safe(opt, next, &dmsk->dccpms_pending, + dccpop_node) { + BUG_ON(opt->dccpop_val == NULL); + kfree(opt->dccpop_val); + + if (opt->dccpop_sc != NULL) { + BUG_ON(opt->dccpop_sc->dccpoc_val == NULL); + kfree(opt->dccpop_sc->dccpoc_val); + kfree(opt->dccpop_sc); + } + + kfree(opt); + } + INIT_LIST_HEAD(&dmsk->dccpms_pending); + + list_for_each_entry_safe(opt, next, &dmsk->dccpms_conf, dccpop_node) { + BUG_ON(opt == NULL); + if (opt->dccpop_val != NULL) + kfree(opt->dccpop_val); + kfree(opt); + } + INIT_LIST_HEAD(&dmsk->dccpms_conf); +} + +EXPORT_SYMBOL_GPL(dccp_feat_clean); + +/* this is to be called only when a listening sock creates its child. It is + * assumed by the function---the confirm is not duplicated, but rather it is + * "passed on". 
+ */ +int dccp_feat_clone(struct sock *oldsk, struct sock *newsk) +{ + struct dccp_minisock *olddmsk = dccp_msk(oldsk); + struct dccp_minisock *newdmsk = dccp_msk(newsk); + struct dccp_opt_pend *opt; + int rc = 0; + + INIT_LIST_HEAD(&newdmsk->dccpms_pending); + INIT_LIST_HEAD(&newdmsk->dccpms_conf); + + list_for_each_entry(opt, &olddmsk->dccpms_pending, dccpop_node) { + struct dccp_opt_pend *newopt; + /* copy the value of the option */ + u8 *val = kmalloc(opt->dccpop_len, GFP_ATOMIC); + + if (val == NULL) + goto out_clean; + memcpy(val, opt->dccpop_val, opt->dccpop_len); + + newopt = kmalloc(sizeof(*newopt), GFP_ATOMIC); + if (newopt == NULL) { + kfree(val); + goto out_clean; + } + + /* insert the option */ + memcpy(newopt, opt, sizeof(*newopt)); + newopt->dccpop_val = val; + list_add_tail(&newopt->dccpop_node, &newdmsk->dccpms_pending); + + /* XXX what happens with backlogs and multiple connections at + * once... + */ + /* the master socket no longer needs to worry about confirms */ + opt->dccpop_sc = 0; /* it's not a memleak---new socket has it */ + + /* reset state for a new socket */ + opt->dccpop_conf = 0; + } + + /* XXX not doing anything about the conf queue */ + +out: + return rc; + +out_clean: + dccp_feat_clean(newdmsk); + rc = -ENOMEM; + goto out; +} + +EXPORT_SYMBOL_GPL(dccp_feat_clone); + +static int __dccp_feat_init(struct dccp_minisock *dmsk, u8 type, u8 feat, + u8 *val, u8 len) +{ + int rc = -ENOMEM; + u8 *copy = kmalloc(len, GFP_KERNEL); + + if (copy != NULL) { + memcpy(copy, val, len); + rc = dccp_feat_change(dmsk, type, feat, copy, len, GFP_KERNEL); + if (rc) + kfree(copy); + } + return rc; +} + +int dccp_feat_init(struct dccp_minisock *dmsk) +{ + int rc; + + INIT_LIST_HEAD(&dmsk->dccpms_pending); + INIT_LIST_HEAD(&dmsk->dccpms_conf); + + /* CCID L */ + rc = __dccp_feat_init(dmsk, DCCPO_CHANGE_L, DCCPF_CCID, + &dmsk->dccpms_tx_ccid, 1); + if (rc) + goto out; + + /* CCID R */ + rc = __dccp_feat_init(dmsk, DCCPO_CHANGE_R, DCCPF_CCID, + 
&dmsk->dccpms_rx_ccid, 1); + if (rc) + goto out; + + /* Ack ratio */ + rc = __dccp_feat_init(dmsk, DCCPO_CHANGE_L, DCCPF_ACK_RATIO, + &dmsk->dccpms_ack_ratio, 1); +out: + return rc; +} + +EXPORT_SYMBOL_GPL(dccp_feat_init); diff -puN /dev/null net/dccp/feat.h --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/net/dccp/feat.h 2006-03-17 23:03:48.000000000 -0800 @@ -0,0 +1,29 @@ +#ifndef _DCCP_FEAT_H +#define _DCCP_FEAT_H +/* + * net/dccp/feat.h + * + * An implementation of the DCCP protocol + * Copyright (c) 2005 Andrea Bittau + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include + +struct sock; +struct dccp_minisock; + +extern int dccp_feat_change(struct dccp_minisock *dmsk, u8 type, u8 feature, + u8 *val, u8 len, gfp_t gfp); +extern int dccp_feat_change_recv(struct sock *sk, u8 type, u8 feature, + u8 *val, u8 len); +extern int dccp_feat_confirm_recv(struct sock *sk, u8 type, u8 feature, + u8 *val, u8 len); +extern void dccp_feat_clean(struct dccp_minisock *dmsk); +extern int dccp_feat_clone(struct sock *oldsk, struct sock *newsk); +extern int dccp_feat_init(struct dccp_minisock *dmsk); + +#endif /* _DCCP_FEAT_H */ diff -puN net/dccp/input.c~git-net net/dccp/input.c --- devel/net/dccp/input.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/input.c 2006-03-17 23:03:48.000000000 -0800 @@ -32,7 +32,7 @@ static void dccp_fin(struct sock *sk, st static void dccp_rcv_close(struct sock *sk, struct sk_buff *skb) { - dccp_v4_send_reset(sk, DCCP_RESET_CODE_CLOSED); + dccp_send_reset(sk, DCCP_RESET_CODE_CLOSED); dccp_fin(sk, skb); dccp_set_state(sk, DCCP_CLOSED); sk_wake_async(sk, 1, POLL_HUP); @@ -56,11 +56,11 @@ static void dccp_rcv_closereq(struct soc dccp_send_close(sk, 0); } -static inline void dccp_event_ack_recv(struct sock *sk, struct sk_buff *skb) +static void 
dccp_event_ack_recv(struct sock *sk, struct sk_buff *skb) { struct dccp_sock *dp = dccp_sk(sk); - if (dp->dccps_options.dccpo_send_ack_vector) + if (dccp_msk(sk)->dccpms_send_ack_vector) dccp_ackvec_check_rcv_ackno(dp->dccps_hc_rx_ackvec, sk, DCCP_SKB_CB(skb)->dccpd_ack_seq); } @@ -151,9 +151,8 @@ static int dccp_check_seqno(struct sock return 0; } -static inline int __dccp_rcv_established(struct sock *sk, struct sk_buff *skb, - const struct dccp_hdr *dh, - const unsigned len) +static int __dccp_rcv_established(struct sock *sk, struct sk_buff *skb, + const struct dccp_hdr *dh, const unsigned len) { struct dccp_sock *dp = dccp_sk(sk); @@ -247,7 +246,7 @@ int dccp_rcv_established(struct sock *sk if (DCCP_SKB_CB(skb)->dccpd_ack_seq != DCCP_PKT_WITHOUT_ACK_SEQ) dccp_event_ack_recv(sk, skb); - if (dp->dccps_options.dccpo_send_ack_vector && + if (dccp_msk(sk)->dccpms_send_ack_vector && dccp_ackvec_add(dp->dccps_hc_rx_ackvec, sk, DCCP_SKB_CB(skb)->dccpd_seq, DCCP_ACKVEC_STATE_RECEIVED)) @@ -300,7 +299,10 @@ static int dccp_rcv_request_sent_state_p goto out_invalid_packet; } - if (dp->dccps_options.dccpo_send_ack_vector && + if (dccp_parse_options(sk, skb)) + goto out_invalid_packet; + + if (dccp_msk(sk)->dccpms_send_ack_vector && dccp_ackvec_add(dp->dccps_hc_rx_ackvec, sk, DCCP_SKB_CB(skb)->dccpd_seq, DCCP_ACKVEC_STATE_RECEIVED)) @@ -321,14 +323,6 @@ static int dccp_rcv_request_sent_state_p dccp_set_seqno(&dp->dccps_swl, max48(dp->dccps_swl, dp->dccps_isr)); - if (ccid_hc_rx_init(dp->dccps_hc_rx_ccid, sk) != 0 || - ccid_hc_tx_init(dp->dccps_hc_tx_ccid, sk) != 0) { - ccid_hc_rx_exit(dp->dccps_hc_rx_ccid, sk); - ccid_hc_tx_exit(dp->dccps_hc_tx_ccid, sk); - /* FIXME: send appropriate RESET code */ - goto out_invalid_packet; - } - dccp_sync_mss(sk, icsk->icsk_pmtu_cookie); /* @@ -492,7 +486,7 @@ int dccp_rcv_state_process(struct sock * if (dcb->dccpd_ack_seq != DCCP_PKT_WITHOUT_ACK_SEQ) dccp_event_ack_recv(sk, skb); - if (dp->dccps_options.dccpo_send_ack_vector && + if 
(dccp_msk(sk)->dccpms_send_ack_vector && dccp_ackvec_add(dp->dccps_hc_rx_ackvec, sk, DCCP_SKB_CB(skb)->dccpd_seq, DCCP_ACKVEC_STATE_RECEIVED)) diff -puN net/dccp/ipv4.c~git-net net/dccp/ipv4.c --- devel/net/dccp/ipv4.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/ipv4.c 2006-03-17 23:03:48.000000000 -0800 @@ -18,8 +18,10 @@ #include #include +#include #include #include +#include #include #include #include @@ -28,14 +30,14 @@ #include "ackvec.h" #include "ccid.h" #include "dccp.h" +#include "feat.h" -struct inet_hashinfo __cacheline_aligned dccp_hashinfo = { - .lhash_lock = RW_LOCK_UNLOCKED, - .lhash_users = ATOMIC_INIT(0), - .lhash_wait = __WAIT_QUEUE_HEAD_INITIALIZER(dccp_hashinfo.lhash_wait), -}; - -EXPORT_SYMBOL_GPL(dccp_hashinfo); +/* + * This is the global socket data structure used for responding to + * the Out-of-the-blue (OOTB) packets. A control sock will be created + * for this socket at the initialization time. + */ +static struct socket *dccp_v4_ctl_socket; static int dccp_v4_get_port(struct sock *sk, const unsigned short snum) { @@ -43,18 +45,6 @@ static int dccp_v4_get_port(struct sock inet_csk_bind_conflict); } -static void dccp_v4_hash(struct sock *sk) -{ - inet_hash(&dccp_hashinfo, sk); -} - -void dccp_unhash(struct sock *sk) -{ - inet_unhash(&dccp_hashinfo, sk); -} - -EXPORT_SYMBOL_GPL(dccp_unhash); - int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) { struct inet_sock *inet = inet_sk(sk); @@ -207,11 +197,12 @@ static inline void dccp_do_pmtu_discover } /* else let the usual retransmit timer handle it */ } -static void dccp_v4_ctl_send_ack(struct sk_buff *rxskb) +static void dccp_v4_reqsk_send_ack(struct sk_buff *rxskb, + struct request_sock *req) { int err; struct dccp_hdr *rxdh = dccp_hdr(rxskb), *dh; - const int dccp_hdr_ack_len = sizeof(struct dccp_hdr) + + const u32 dccp_hdr_ack_len = sizeof(struct dccp_hdr) + sizeof(struct dccp_hdr_ext) + sizeof(struct dccp_hdr_ack_bits); struct sk_buff *skb; 
@@ -219,12 +210,12 @@ static void dccp_v4_ctl_send_ack(struct if (((struct rtable *)rxskb->dst)->rt_type != RTN_LOCAL) return; - skb = alloc_skb(MAX_DCCP_HEADER + 15, GFP_ATOMIC); + skb = alloc_skb(dccp_v4_ctl_socket->sk->sk_prot->max_header, GFP_ATOMIC); if (skb == NULL) return; /* Reserve space for headers. */ - skb_reserve(skb, MAX_DCCP_HEADER); + skb_reserve(skb, dccp_v4_ctl_socket->sk->sk_prot->max_header); skb->dst = dst_clone(rxskb->dst); @@ -243,11 +234,11 @@ static void dccp_v4_ctl_send_ack(struct dccp_hdr_set_ack(dccp_hdr_ack_bits(skb), DCCP_SKB_CB(rxskb)->dccpd_seq); - bh_lock_sock(dccp_ctl_socket->sk); - err = ip_build_and_send_pkt(skb, dccp_ctl_socket->sk, + bh_lock_sock(dccp_v4_ctl_socket->sk); + err = ip_build_and_send_pkt(skb, dccp_v4_ctl_socket->sk, rxskb->nh.iph->daddr, rxskb->nh.iph->saddr, NULL); - bh_unlock_sock(dccp_ctl_socket->sk); + bh_unlock_sock(dccp_v4_ctl_socket->sk); if (err == NET_XMIT_CN || err == 0) { DCCP_INC_STATS_BH(DCCP_MIB_OUTSEGS); @@ -255,12 +246,6 @@ static void dccp_v4_ctl_send_ack(struct } } -static void dccp_v4_reqsk_send_ack(struct sk_buff *skb, - struct request_sock *req) -{ - dccp_v4_ctl_send_ack(skb); -} - static int dccp_v4_send_response(struct sock *sk, struct request_sock *req, struct dst_entry *dst) { @@ -275,7 +260,10 @@ static int dccp_v4_send_response(struct skb = dccp_make_response(sk, dst, req); if (skb != NULL) { const struct inet_request_sock *ireq = inet_rsk(req); + struct dccp_hdr *dh = dccp_hdr(skb); + dh->dccph_checksum = dccp_v4_checksum(skb, ireq->loc_addr, + ireq->rmt_addr); memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); err = ip_build_and_send_pkt(skb, sk, ireq->loc_addr, ireq->rmt_addr, @@ -301,7 +289,7 @@ out: * check at all. A more general error queue to queue errors for later handling * is probably better. 
*/ -void dccp_v4_err(struct sk_buff *skb, u32 info) +static void dccp_v4_err(struct sk_buff *skb, u32 info) { const struct iphdr *iph = (struct iphdr *)skb->data; const struct dccp_hdr *dh = (struct dccp_hdr *)(skb->data + @@ -456,32 +444,6 @@ void dccp_v4_send_check(struct sock *sk, EXPORT_SYMBOL_GPL(dccp_v4_send_check); -int dccp_v4_send_reset(struct sock *sk, enum dccp_reset_codes code) -{ - struct sk_buff *skb; - /* - * FIXME: what if rebuild_header fails? - * Should we be doing a rebuild_header here? - */ - int err = inet_sk_rebuild_header(sk); - - if (err != 0) - return err; - - skb = dccp_make_reset(sk, sk->sk_dst_cache, code); - if (skb != NULL) { - const struct inet_sock *inet = inet_sk(sk); - - memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); - err = ip_build_and_send_pkt(skb, sk, - inet->saddr, inet->daddr, NULL); - if (err == NET_XMIT_CN) - err = 0; - } - - return err; -} - static inline u64 dccp_v4_init_sequence(const struct sock *sk, const struct sk_buff *skb) { @@ -497,9 +459,9 @@ int dccp_v4_conn_request(struct sock *sk struct dccp_sock dp; struct request_sock *req; struct dccp_request_sock *dreq; - const __u32 saddr = skb->nh.iph->saddr; - const __u32 daddr = skb->nh.iph->daddr; - const __u32 service = dccp_hdr_request(skb)->dccph_req_service; + const __be32 saddr = skb->nh.iph->saddr; + const __be32 daddr = skb->nh.iph->daddr; + const __be32 service = dccp_hdr_request(skb)->dccph_req_service; struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb); __u8 reset_code = DCCP_RESET_CODE_TOO_BUSY; @@ -535,7 +497,8 @@ int dccp_v4_conn_request(struct sock *sk if (req == NULL) goto drop; - /* FIXME: process options */ + if (dccp_parse_options(sk, skb)) + goto drop; dccp_openreq_init(req, &dp, skb); @@ -660,8 +623,8 @@ static struct sock *dccp_v4_hnd_req(stru return sk; } -int dccp_v4_checksum(const struct sk_buff *skb, const u32 saddr, - const u32 daddr) +int dccp_v4_checksum(const struct sk_buff *skb, const __be32 saddr, + const __be32 daddr) { const struct 
dccp_hdr* dh = dccp_hdr(skb); int checksum_len; @@ -680,8 +643,10 @@ int dccp_v4_checksum(const struct sk_buf IPPROTO_DCCP, tmp); } +EXPORT_SYMBOL_GPL(dccp_v4_checksum); + static int dccp_v4_verify_checksum(struct sk_buff *skb, - const u32 saddr, const u32 daddr) + const __be32 saddr, const __be32 daddr) { struct dccp_hdr *dh = dccp_hdr(skb); int checksum_len; @@ -741,16 +706,17 @@ static void dccp_v4_ctl_send_reset(struc if (((struct rtable *)rxskb->dst)->rt_type != RTN_LOCAL) return; - dst = dccp_v4_route_skb(dccp_ctl_socket->sk, rxskb); + dst = dccp_v4_route_skb(dccp_v4_ctl_socket->sk, rxskb); if (dst == NULL) return; - skb = alloc_skb(MAX_DCCP_HEADER + 15, GFP_ATOMIC); + skb = alloc_skb(dccp_v4_ctl_socket->sk->sk_prot->max_header, + GFP_ATOMIC); if (skb == NULL) goto out; /* Reserve space for headers. */ - skb_reserve(skb, MAX_DCCP_HEADER); + skb_reserve(skb, dccp_v4_ctl_socket->sk->sk_prot->max_header); skb->dst = dst_clone(dst); skb->h.raw = skb_push(skb, dccp_hdr_reset_len); @@ -778,11 +744,11 @@ static void dccp_v4_ctl_send_reset(struc dh->dccph_checksum = dccp_v4_checksum(skb, rxskb->nh.iph->saddr, rxskb->nh.iph->daddr); - bh_lock_sock(dccp_ctl_socket->sk); - err = ip_build_and_send_pkt(skb, dccp_ctl_socket->sk, + bh_lock_sock(dccp_v4_ctl_socket->sk); + err = ip_build_and_send_pkt(skb, dccp_v4_ctl_socket->sk, rxskb->nh.iph->daddr, rxskb->nh.iph->saddr, NULL); - bh_unlock_sock(dccp_ctl_socket->sk); + bh_unlock_sock(dccp_v4_ctl_socket->sk); if (err == NET_XMIT_CN || err == 0) { DCCP_INC_STATS_BH(DCCP_MIB_OUTSEGS); @@ -912,7 +878,7 @@ int dccp_invalid_packet(struct sk_buff * EXPORT_SYMBOL_GPL(dccp_invalid_packet); /* this is called when real data arrives */ -int dccp_v4_rcv(struct sk_buff *skb) +static int dccp_v4_rcv(struct sk_buff *skb) { const struct dccp_hdr *dh; struct sock *sk; @@ -1019,111 +985,37 @@ do_time_wait: goto no_dccp_socket; } -struct inet_connection_sock_af_ops dccp_ipv4_af_ops = { - .queue_xmit = ip_queue_xmit, - .send_check = 
dccp_v4_send_check, - .rebuild_header = inet_sk_rebuild_header, - .conn_request = dccp_v4_conn_request, - .syn_recv_sock = dccp_v4_request_recv_sock, - .net_header_len = sizeof(struct iphdr), - .setsockopt = ip_setsockopt, - .getsockopt = ip_getsockopt, - .addr2sockaddr = inet_csk_addr2sockaddr, - .sockaddr_len = sizeof(struct sockaddr_in), +static struct inet_connection_sock_af_ops dccp_ipv4_af_ops = { + .queue_xmit = ip_queue_xmit, + .send_check = dccp_v4_send_check, + .rebuild_header = inet_sk_rebuild_header, + .conn_request = dccp_v4_conn_request, + .syn_recv_sock = dccp_v4_request_recv_sock, + .net_header_len = sizeof(struct iphdr), + .setsockopt = ip_setsockopt, + .getsockopt = ip_getsockopt, + .addr2sockaddr = inet_csk_addr2sockaddr, + .sockaddr_len = sizeof(struct sockaddr_in), +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ip_setsockopt, + .compat_getsockopt = compat_ip_getsockopt, +#endif }; -int dccp_v4_init_sock(struct sock *sk) +static int dccp_v4_init_sock(struct sock *sk) { - struct dccp_sock *dp = dccp_sk(sk); - struct inet_connection_sock *icsk = inet_csk(sk); - static int dccp_ctl_socket_init = 1; - - dccp_options_init(&dp->dccps_options); - do_gettimeofday(&dp->dccps_epoch); + static __u8 dccp_v4_ctl_sock_initialized; + int err = dccp_init_sock(sk, dccp_v4_ctl_sock_initialized); - if (dp->dccps_options.dccpo_send_ack_vector) { - dp->dccps_hc_rx_ackvec = dccp_ackvec_alloc(DCCP_MAX_ACKVEC_LEN, - GFP_KERNEL); - if (dp->dccps_hc_rx_ackvec == NULL) - return -ENOMEM; + if (err == 0) { + if (unlikely(!dccp_v4_ctl_sock_initialized)) + dccp_v4_ctl_sock_initialized = 1; + inet_csk(sk)->icsk_af_ops = &dccp_ipv4_af_ops; } - /* - * FIXME: We're hardcoding the CCID, and doing this at this point makes - * the listening (master) sock get CCID control blocks, which is not - * necessary, but for now, to not mess with the test userspace apps, - * lets leave it here, later the real solution is to do this in a - * setsockopt(CCIDs-I-want/accept). 
-acme - */ - if (likely(!dccp_ctl_socket_init)) { - dp->dccps_hc_rx_ccid = ccid_init(dp->dccps_options.dccpo_rx_ccid, - sk); - dp->dccps_hc_tx_ccid = ccid_init(dp->dccps_options.dccpo_tx_ccid, - sk); - if (dp->dccps_hc_rx_ccid == NULL || - dp->dccps_hc_tx_ccid == NULL) { - ccid_exit(dp->dccps_hc_rx_ccid, sk); - ccid_exit(dp->dccps_hc_tx_ccid, sk); - if (dp->dccps_options.dccpo_send_ack_vector) { - dccp_ackvec_free(dp->dccps_hc_rx_ackvec); - dp->dccps_hc_rx_ackvec = NULL; - } - dp->dccps_hc_rx_ccid = dp->dccps_hc_tx_ccid = NULL; - return -ENOMEM; - } - } else - dccp_ctl_socket_init = 0; - - dccp_init_xmit_timers(sk); - icsk->icsk_rto = DCCP_TIMEOUT_INIT; - sk->sk_state = DCCP_CLOSED; - sk->sk_write_space = dccp_write_space; - icsk->icsk_af_ops = &dccp_ipv4_af_ops; - icsk->icsk_sync_mss = dccp_sync_mss; - dp->dccps_mss_cache = 536; - dp->dccps_role = DCCP_ROLE_UNDEFINED; - dp->dccps_service = DCCP_SERVICE_INVALID_VALUE; - - return 0; -} - -EXPORT_SYMBOL_GPL(dccp_v4_init_sock); - -int dccp_v4_destroy_sock(struct sock *sk) -{ - struct dccp_sock *dp = dccp_sk(sk); - - /* - * DCCP doesn't use sk_write_queue, just sk_send_head - * for retransmissions - */ - if (sk->sk_send_head != NULL) { - kfree_skb(sk->sk_send_head); - sk->sk_send_head = NULL; - } - - /* Clean up a referenced DCCP bind bucket. 
*/ - if (inet_csk(sk)->icsk_bind_hash != NULL) - inet_put_port(&dccp_hashinfo, sk); - - kfree(dp->dccps_service_list); - dp->dccps_service_list = NULL; - - ccid_hc_rx_exit(dp->dccps_hc_rx_ccid, sk); - ccid_hc_tx_exit(dp->dccps_hc_tx_ccid, sk); - if (dp->dccps_options.dccpo_send_ack_vector) { - dccp_ackvec_free(dp->dccps_hc_rx_ackvec); - dp->dccps_hc_rx_ackvec = NULL; - } - ccid_exit(dp->dccps_hc_rx_ccid, sk); - ccid_exit(dp->dccps_hc_tx_ccid, sk); - dp->dccps_hc_rx_ccid = dp->dccps_hc_tx_ccid = NULL; - - return 0; + return err; } -EXPORT_SYMBOL_GPL(dccp_v4_destroy_sock); - static void dccp_v4_reqsk_destructor(struct request_sock *req) { kfree(inet_rsk(req)->opt); @@ -1142,7 +1034,7 @@ static struct timewait_sock_ops dccp_tim .twsk_obj_size = sizeof(struct inet_timewait_sock), }; -struct proto dccp_prot = { +static struct proto dccp_v4_prot = { .name = "DCCP", .owner = THIS_MODULE, .close = dccp_close, @@ -1155,17 +1047,110 @@ struct proto dccp_prot = { .sendmsg = dccp_sendmsg, .recvmsg = dccp_recvmsg, .backlog_rcv = dccp_v4_do_rcv, - .hash = dccp_v4_hash, + .hash = dccp_hash, .unhash = dccp_unhash, .accept = inet_csk_accept, .get_port = dccp_v4_get_port, .shutdown = dccp_shutdown, - .destroy = dccp_v4_destroy_sock, + .destroy = dccp_destroy_sock, .orphan_count = &dccp_orphan_count, .max_header = MAX_DCCP_HEADER, .obj_size = sizeof(struct dccp_sock), .rsk_prot = &dccp_request_sock_ops, .twsk_prot = &dccp_timewait_sock_ops, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_dccp_setsockopt, + .compat_getsockopt = compat_dccp_getsockopt, +#endif +}; + +static struct net_protocol dccp_v4_protocol = { + .handler = dccp_v4_rcv, + .err_handler = dccp_v4_err, + .no_policy = 1, +}; + +static const struct proto_ops inet_dccp_ops = { + .family = PF_INET, + .owner = THIS_MODULE, + .release = inet_release, + .bind = inet_bind, + .connect = inet_stream_connect, + .socketpair = sock_no_socketpair, + .accept = inet_accept, + .getname = inet_getname, + /* FIXME: work on tcp_poll 
to rename it to inet_csk_poll */ + .poll = dccp_poll, + .ioctl = inet_ioctl, + /* FIXME: work on inet_listen to rename it to sock_common_listen */ + .listen = inet_dccp_listen, + .shutdown = inet_shutdown, + .setsockopt = sock_common_setsockopt, + .getsockopt = sock_common_getsockopt, + .sendmsg = inet_sendmsg, + .recvmsg = sock_common_recvmsg, + .mmap = sock_no_mmap, + .sendpage = sock_no_sendpage, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif +}; + +static struct inet_protosw dccp_v4_protosw = { + .type = SOCK_DCCP, + .protocol = IPPROTO_DCCP, + .prot = &dccp_v4_prot, + .ops = &inet_dccp_ops, + .capability = -1, + .no_check = 0, + .flags = INET_PROTOSW_ICSK, }; -EXPORT_SYMBOL_GPL(dccp_prot); +static int __init dccp_v4_init(void) +{ + int err = proto_register(&dccp_v4_prot, 1); + + if (err != 0) + goto out; + + err = inet_add_protocol(&dccp_v4_protocol, IPPROTO_DCCP); + if (err != 0) + goto out_proto_unregister; + + inet_register_protosw(&dccp_v4_protosw); + + err = inet_csk_ctl_sock_create(&dccp_v4_ctl_socket, PF_INET, + SOCK_DCCP, IPPROTO_DCCP); + if (err) + goto out_unregister_protosw; +out: + return err; +out_unregister_protosw: + inet_unregister_protosw(&dccp_v4_protosw); + inet_del_protocol(&dccp_v4_protocol, IPPROTO_DCCP); +out_proto_unregister: + proto_unregister(&dccp_v4_prot); + goto out; +} + +static void __exit dccp_v4_exit(void) +{ + inet_unregister_protosw(&dccp_v4_protosw); + inet_del_protocol(&dccp_v4_protocol, IPPROTO_DCCP); + proto_unregister(&dccp_v4_prot); +} + +module_init(dccp_v4_init); +module_exit(dccp_v4_exit); + +/* + * __stringify doesn't likes enums, so use SOCK_DCCP (6) and IPPROTO_DCCP (33) + * values directly, Also cover the case where the protocol is not specified, + * i.e. 
net-pf-PF_INET-proto-0-type-SOCK_DCCP + */ +MODULE_ALIAS("net-pf-" __stringify(PF_INET) "-proto-33-type-6"); +MODULE_ALIAS("net-pf-" __stringify(PF_INET) "-proto-0-type-6"); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Arnaldo Carvalho de Melo "); +MODULE_DESCRIPTION("DCCP - Datagram Congestion Controlled Protocol"); diff -puN net/dccp/ipv6.c~git-net net/dccp/ipv6.c --- devel/net/dccp/ipv6.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/ipv6.c 2006-03-17 23:03:48.000000000 -0800 @@ -1,6 +1,6 @@ /* * DCCP over IPv6 - * Linux INET6 implementation + * Linux INET6 implementation * * Based on net/dccp6/ipv6.c * @@ -33,6 +33,9 @@ #include "dccp.h" #include "ipv6.h" +/* Socket used for sending RSTs and ACKs */ +static struct socket *dccp_v6_ctl_socket; + static void dccp_v6_ctl_send_reset(struct sk_buff *skb); static void dccp_v6_reqsk_send_ack(struct sk_buff *skb, struct request_sock *req); @@ -53,7 +56,7 @@ static void dccp_v6_hash(struct sock *sk { if (sk->sk_state != DCCP_CLOSED) { if (inet_csk(sk)->icsk_af_ops == &dccp_ipv6_mapped) { - dccp_prot.hash(sk); + dccp_hash(sk); return; } local_bh_disable(); @@ -63,8 +66,8 @@ static void dccp_v6_hash(struct sock *sk } static inline u16 dccp_v6_check(struct dccp_hdr *dh, int len, - struct in6_addr *saddr, - struct in6_addr *daddr, + struct in6_addr *saddr, + struct in6_addr *daddr, unsigned long base) { return csum_ipv6_magic(saddr, daddr, len, IPPROTO_DCCP, base); @@ -79,17 +82,17 @@ static __u32 dccp_v6_init_sequence(struc skb->nh.ipv6h->saddr.s6_addr32, dh->dccph_dport, dh->dccph_sport); - else - return secure_dccp_sequence_number(skb->nh.iph->daddr, - skb->nh.iph->saddr, - dh->dccph_dport, - dh->dccph_sport); + + return secure_dccp_sequence_number(skb->nh.iph->daddr, + skb->nh.iph->saddr, + dh->dccph_dport, + dh->dccph_sport); } -static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr, +static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) { - struct sockaddr_in6 
*usin = (struct sockaddr_in6 *) uaddr; + struct sockaddr_in6 *usin = (struct sockaddr_in6 *)uaddr; struct inet_connection_sock *icsk = inet_csk(sk); struct inet_sock *inet = inet_sk(sk); struct ipv6_pinfo *np = inet6_sk(sk); @@ -102,10 +105,10 @@ static int dccp_v6_connect(struct sock * dp->dccps_role = DCCP_ROLE_CLIENT; - if (addr_len < SIN6_LEN_RFC2133) + if (addr_len < SIN6_LEN_RFC2133) return -EINVAL; - if (usin->sin6_family != AF_INET6) + if (usin->sin6_family != AF_INET6) return -EAFNOSUPPORT; memset(&fl, 0, sizeof(fl)); @@ -122,17 +125,15 @@ static int dccp_v6_connect(struct sock * fl6_sock_release(flowlabel); } } - /* - * connect() to INADDR_ANY means loopback (BSD'ism). - */ - - if (ipv6_addr_any(&usin->sin6_addr)) - usin->sin6_addr.s6_addr[15] = 0x1; + * connect() to INADDR_ANY means loopback (BSD'ism). + */ + if (ipv6_addr_any(&usin->sin6_addr)) + usin->sin6_addr.s6_addr[15] = 1; addr_type = ipv6_addr_type(&usin->sin6_addr); - if(addr_type & IPV6_ADDR_MULTICAST) + if (addr_type & IPV6_ADDR_MULTICAST) return -ENETUNREACH; if (addr_type & IPV6_ADDR_LINKLOCAL) { @@ -157,9 +158,8 @@ static int dccp_v6_connect(struct sock * np->flow_label = fl.fl6_flowlabel; /* - * DCCP over IPv4 + * DCCP over IPv4 */ - if (addr_type == IPV6_ADDR_MAPPED) { u32 exthdrlen = icsk->icsk_ext_hdr_len; struct sockaddr_in sin; @@ -177,7 +177,6 @@ static int dccp_v6_connect(struct sock * sk->sk_backlog_rcv = dccp_v4_do_rcv; err = dccp_v4_connect(sk, (struct sockaddr *)&sin, sizeof(sin)); - if (err) { icsk->icsk_ext_hdr_len = exthdrlen; icsk->icsk_af_ops = &dccp_ipv6_af_ops; @@ -203,8 +202,9 @@ static int dccp_v6_connect(struct sock * fl.fl_ip_dport = usin->sin6_port; fl.fl_ip_sport = inet->sport; - if (np->opt && np->opt->srcrt) { - struct rt0_hdr *rt0 = (struct rt0_hdr *)np->opt->srcrt; + if (np->opt != NULL && np->opt->srcrt != NULL) { + const struct rt0_hdr *rt0 = (struct rt0_hdr *)np->opt->srcrt; + ipv6_addr_copy(&final, &fl.fl6_dst); ipv6_addr_copy(&fl.fl6_dst, rt0->addr); 
final_p = &final; @@ -213,10 +213,12 @@ static int dccp_v6_connect(struct sock * err = ip6_dst_lookup(sk, &dst, &fl); if (err) goto failure; + if (final_p) ipv6_addr_copy(&fl.fl6_dst, final_p); - if ((err = xfrm_lookup(&dst, &fl, sk, 0)) < 0) + err = xfrm_lookup(&dst, &fl, sk, 0); + if (err < 0) goto failure; if (saddr == NULL) { @@ -231,7 +233,7 @@ static int dccp_v6_connect(struct sock * ip6_dst_store(sk, dst, NULL); icsk->icsk_ext_hdr_len = 0; - if (np->opt) + if (np->opt != NULL) icsk->icsk_ext_hdr_len = (np->opt->opt_flen + np->opt->opt_nflen); @@ -264,7 +266,7 @@ failure: } static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, - int type, int code, int offset, __u32 info) + int type, int code, int offset, __be32 info) { struct ipv6hdr *hdr = (struct ipv6hdr *)skb->data; const struct dccp_hdr *dh = (struct dccp_hdr *)(skb->data + offset); @@ -305,7 +307,6 @@ static void dccp_v6_err(struct sk_buff * /* icmp should have updated the destination cache entry */ dst = __sk_dst_check(sk, np->dst_cookie); - if (dst == NULL) { struct inet_sock *inet = inet_sk(sk); struct flowi fl; @@ -322,16 +323,17 @@ static void dccp_v6_err(struct sk_buff * fl.fl_ip_dport = inet->dport; fl.fl_ip_sport = inet->sport; - if ((err = ip6_dst_lookup(sk, &dst, &fl))) { + err = ip6_dst_lookup(sk, &dst, &fl); + if (err) { sk->sk_err_soft = -err; goto out; } - if ((err = xfrm_lookup(&dst, &fl, sk, 0)) < 0) { + err = xfrm_lookup(&dst, &fl, sk, 0); + if (err < 0) { sk->sk_err_soft = -err; goto out; } - } else dst_hold(dst); @@ -355,11 +357,12 @@ static void dccp_v6_err(struct sk_buff * req = inet6_csk_search_req(sk, &prev, dh->dccph_dport, &hdr->daddr, &hdr->saddr, inet6_iif(skb)); - if (!req) + if (req == NULL) goto out; - /* ICMPs are not backlogged, hence we cannot get - * an established socket here. + /* + * ICMPs are not backlogged, hence we cannot get an established + * socket here. 
*/ BUG_TRAP(req->sk == NULL); @@ -373,7 +376,7 @@ static void dccp_v6_err(struct sk_buff * case DCCP_REQUESTING: case DCCP_RESPOND: /* Cannot happen. - It can, it SYNs are crossed. --ANK */ + It can, it SYNs are crossed. --ANK */ if (!sock_owned_by_user(sk)) { DCCP_INC_STATS_BH(DCCP_MIB_ATTEMPTFAILS); sk->sk_err = err; @@ -382,7 +385,6 @@ static void dccp_v6_err(struct sk_buff * * (see connect in sock.c) */ sk->sk_error_report(sk); - dccp_done(sk); } else sk->sk_err_soft = err; @@ -428,14 +430,16 @@ static int dccp_v6_send_response(struct ireq6->pktopts) { struct sk_buff *pktopts = ireq6->pktopts; struct inet6_skb_parm *rxopt = IP6CB(pktopts); + if (rxopt->srcrt) opt = ipv6_invert_rthdr(sk, (struct ipv6_rt_hdr *)(pktopts->nh.raw + rxopt->srcrt)); } - if (opt && opt->srcrt) { - struct rt0_hdr *rt0 = (struct rt0_hdr *)opt->srcrt; + if (opt != NULL && opt->srcrt != NULL) { + const struct rt0_hdr *rt0 = (struct rt0_hdr *)opt->srcrt; + ipv6_addr_copy(&final, &fl.fl6_dst); ipv6_addr_copy(&fl.fl6_dst, rt0->addr); final_p = &final; @@ -444,15 +448,19 @@ static int dccp_v6_send_response(struct err = ip6_dst_lookup(sk, &dst, &fl); if (err) goto done; + if (final_p) ipv6_addr_copy(&fl.fl6_dst, final_p); - if ((err = xfrm_lookup(&dst, &fl, sk, 0)) < 0) + + err = xfrm_lookup(&dst, &fl, sk, 0); + if (err < 0) goto done; } skb = dccp_make_response(sk, dst, req); if (skb != NULL) { struct dccp_hdr *dh = dccp_hdr(skb); + dh->dccph_checksum = dccp_v6_check(dh, skb->len, &ireq6->loc_addr, &ireq6->rmt_addr, @@ -466,7 +474,7 @@ static int dccp_v6_send_response(struct } done: - if (opt && opt != np->opt) + if (opt != NULL && opt != np->opt) sock_kfree_s(sk, opt, opt->tot_len); dst_release(dst); return err; @@ -497,7 +505,7 @@ static void dccp_v6_send_check(struct so struct dccp_hdr *dh = dccp_hdr(skb); dh->dccph_checksum = csum_ipv6_magic(&np->saddr, &np->daddr, - len, IPPROTO_DCCP, + len, IPPROTO_DCCP, csum_partial((char *)dh, dh->dccph_doff << 2, skb->csum)); @@ -505,8 +513,8 @@ 
static void dccp_v6_send_check(struct so static void dccp_v6_ctl_send_reset(struct sk_buff *rxskb) { - struct dccp_hdr *rxdh = dccp_hdr(rxskb), *dh; - const int dccp_hdr_reset_len = sizeof(struct dccp_hdr) + + struct dccp_hdr *rxdh = dccp_hdr(rxskb), *dh; + const u32 dccp_hdr_reset_len = sizeof(struct dccp_hdr) + sizeof(struct dccp_hdr_ext) + sizeof(struct dccp_hdr_reset); struct sk_buff *skb; @@ -517,20 +525,14 @@ static void dccp_v6_ctl_send_reset(struc return; if (!ipv6_unicast_destination(rxskb)) - return; - - /* - * We need to grab some memory, and put together an RST, - * and then put it into the queue to be sent. - */ + return; - skb = alloc_skb(MAX_HEADER + sizeof(struct ipv6hdr) + - dccp_hdr_reset_len, GFP_ATOMIC); - if (skb == NULL) + skb = alloc_skb(dccp_v6_ctl_socket->sk->sk_prot->max_header, + GFP_ATOMIC); + if (skb == NULL) return; - skb_reserve(skb, MAX_HEADER + sizeof(struct ipv6hdr) + - dccp_hdr_reset_len); + skb_reserve(skb, dccp_v6_ctl_socket->sk->sk_prot->max_header); skb->h.raw = skb_push(skb, dccp_hdr_reset_len); dh = dccp_hdr(skb); @@ -568,7 +570,7 @@ static void dccp_v6_ctl_send_reset(struc /* sk = NULL, but it is safe for now. RST socket required. 
*/ if (!ip6_dst_lookup(NULL, &skb->dst, &fl)) { if (xfrm_lookup(&skb->dst, &fl, NULL, 0) >= 0) { - ip6_xmit(NULL, skb, &fl, NULL, 0); + ip6_xmit(dccp_v6_ctl_socket->sk, skb, &fl, NULL, 0); DCCP_INC_STATS_BH(DCCP_MIB_OUTSEGS); DCCP_INC_STATS_BH(DCCP_MIB_OUTRSTS); return; @@ -578,22 +580,22 @@ static void dccp_v6_ctl_send_reset(struc kfree_skb(skb); } -static void dccp_v6_ctl_send_ack(struct sk_buff *rxskb) +static void dccp_v6_reqsk_send_ack(struct sk_buff *rxskb, + struct request_sock *req) { struct flowi fl; struct dccp_hdr *rxdh = dccp_hdr(rxskb), *dh; - const int dccp_hdr_ack_len = sizeof(struct dccp_hdr) + + const u32 dccp_hdr_ack_len = sizeof(struct dccp_hdr) + sizeof(struct dccp_hdr_ext) + sizeof(struct dccp_hdr_ack_bits); struct sk_buff *skb; - skb = alloc_skb(MAX_HEADER + sizeof(struct ipv6hdr) + - dccp_hdr_ack_len, GFP_ATOMIC); + skb = alloc_skb(dccp_v6_ctl_socket->sk->sk_prot->max_header, + GFP_ATOMIC); if (skb == NULL) return; - skb_reserve(skb, MAX_HEADER + sizeof(struct ipv6hdr) + - dccp_hdr_ack_len); + skb_reserve(skb, dccp_v6_ctl_socket->sk->sk_prot->max_header); skb->h.raw = skb_push(skb, dccp_hdr_ack_len); dh = dccp_hdr(skb); @@ -605,7 +607,7 @@ static void dccp_v6_ctl_send_ack(struct dh->dccph_dport = rxdh->dccph_sport; dh->dccph_doff = dccp_hdr_ack_len / 4; dh->dccph_x = 1; - + dccp_hdr_set_seq(dh, DCCP_SKB_CB(rxskb)->dccpd_ack_seq); dccp_hdr_set_ack(dccp_hdr_ack_bits(skb), DCCP_SKB_CB(rxskb)->dccpd_seq); @@ -623,7 +625,7 @@ static void dccp_v6_ctl_send_ack(struct if (!ip6_dst_lookup(NULL, &skb->dst, &fl)) { if (xfrm_lookup(&skb->dst, &fl, NULL, 0) >= 0) { - ip6_xmit(NULL, skb, &fl, NULL, 0); + ip6_xmit(dccp_v6_ctl_socket->sk, skb, &fl, NULL, 0); DCCP_INC_STATS_BH(DCCP_MIB_OUTSEGS); return; } @@ -632,12 +634,6 @@ static void dccp_v6_ctl_send_ack(struct kfree_skb(skb); } -static void dccp_v6_reqsk_send_ack(struct sk_buff *skb, - struct request_sock *req) -{ - dccp_v6_ctl_send_ack(skb); -} - static struct sock *dccp_v6_hnd_req(struct sock 
*sk,struct sk_buff *skb) { const struct dccp_hdr *dh = dccp_hdr(skb); @@ -657,7 +653,6 @@ static struct sock *dccp_v6_hnd_req(stru &iph->saddr, dh->dccph_sport, &iph->daddr, ntohs(dh->dccph_dport), inet6_iif(skb)); - if (nsk != NULL) { if (nsk->sk_state != DCCP_TIME_WAIT) { bh_lock_sock(nsk); @@ -678,7 +673,7 @@ static int dccp_v6_conn_request(struct s struct dccp_request_sock *dreq; struct inet6_request_sock *ireq6; struct ipv6_pinfo *np = inet6_sk(sk); - const __u32 service = dccp_hdr_request(skb)->dccph_req_service; + const __be32 service = dccp_hdr_request(skb)->dccph_req_service; struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb); __u8 reset_code = DCCP_RESET_CODE_TOO_BUSY; @@ -686,17 +681,17 @@ static int dccp_v6_conn_request(struct s return dccp_v4_conn_request(sk, skb); if (!ipv6_unicast_destination(skb)) - goto drop; + goto drop; if (dccp_bad_service_code(sk, service)) { reset_code = DCCP_RESET_CODE_BAD_SERVICE_CODE; goto drop; } /* - * There are no SYN attacks on IPv6, yet... + * There are no SYN attacks on IPv6, yet... 
*/ if (inet_csk_reqsk_queue_is_full(sk)) - goto drop; + goto drop; if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1) goto drop; @@ -730,7 +725,7 @@ static int dccp_v6_conn_request(struct s ipv6_addr_type(&ireq6->rmt_addr) & IPV6_ADDR_LINKLOCAL) ireq6->iif = inet6_iif(skb); - /* + /* * Step 3: Process LISTEN state * * Set S.ISR, S.GSR, S.SWL, S.SWH from packet or Init Cookie @@ -774,9 +769,8 @@ static struct sock *dccp_v6_request_recv /* * v6 mapped */ - newsk = dccp_v4_request_recv_sock(sk, skb, req, dst); - if (newsk == NULL) + if (newsk == NULL) return NULL; newdp6 = (struct dccp6_sock *)newsk; @@ -822,9 +816,9 @@ static struct sock *dccp_v6_request_recv if (sk_acceptq_is_full(sk)) goto out_overflow; - if (np->rxopt.bits.osrcrt == 2 && - opt == NULL && ireq6->pktopts) { - struct inet6_skb_parm *rxopt = IP6CB(ireq6->pktopts); + if (np->rxopt.bits.osrcrt == 2 && opt == NULL && ireq6->pktopts) { + const struct inet6_skb_parm *rxopt = IP6CB(ireq6->pktopts); + if (rxopt->srcrt) opt = ipv6_invert_rthdr(sk, (struct ipv6_rt_hdr *)(ireq6->pktopts->nh.raw + @@ -838,8 +832,9 @@ static struct sock *dccp_v6_request_recv memset(&fl, 0, sizeof(fl)); fl.proto = IPPROTO_DCCP; ipv6_addr_copy(&fl.fl6_dst, &ireq6->rmt_addr); - if (opt && opt->srcrt) { - struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt; + if (opt != NULL && opt->srcrt != NULL) { + const struct rt0_hdr *rt0 = (struct rt0_hdr *)opt->srcrt; + ipv6_addr_copy(&final, &fl.fl6_dst); ipv6_addr_copy(&fl.fl6_dst, rt0->addr); final_p = &final; @@ -857,7 +852,7 @@ static struct sock *dccp_v6_request_recv if ((xfrm_lookup(&dst, &fl, sk, 0)) < 0) goto out; - } + } newsk = dccp_create_openreq_child(sk, req, skb); if (newsk == NULL) @@ -870,9 +865,8 @@ static struct sock *dccp_v6_request_recv */ ip6_dst_store(newsk, dst, NULL); - newsk->sk_route_caps = dst->dev->features & - ~(NETIF_F_IP_CSUM | NETIF_F_TSO); - + newsk->sk_route_caps = dst->dev->features & ~(NETIF_F_IP_CSUM | + NETIF_F_TSO); newdp6 = (struct 
dccp6_sock *)newsk; newinet = inet_sk(newsk); newinet->pinet6 = &newdp6->inet6; @@ -886,7 +880,7 @@ static struct sock *dccp_v6_request_recv ipv6_addr_copy(&newnp->rcv_saddr, &ireq6->loc_addr); newsk->sk_bound_dev_if = ireq6->iif; - /* Now IPv6 options... + /* Now IPv6 options... First: no IPv4 options. */ @@ -908,20 +902,20 @@ static struct sock *dccp_v6_request_recv newnp->mcast_oif = inet6_iif(skb); newnp->mcast_hops = skb->nh.ipv6h->hop_limit; - /* Clone native IPv6 options from listening socket (if any) - - Yes, keeping reference count would be much more clever, - but we make one more one thing there: reattach optmem - to newsk. + /* + * Clone native IPv6 options from listening socket (if any) + * + * Yes, keeping reference count would be much more clever, but we make + * one more one thing there: reattach optmem to newsk. */ - if (opt) { + if (opt != NULL) { newnp->opt = ipv6_dup_options(newsk, opt); if (opt != np->opt) sock_kfree_s(sk, opt, opt->tot_len); } inet_csk(newsk)->icsk_ext_hdr_len = 0; - if (newnp->opt) + if (newnp->opt != NULL) inet_csk(newsk)->icsk_ext_hdr_len = (newnp->opt->opt_nflen + newnp->opt->opt_flen); @@ -938,7 +932,7 @@ out_overflow: NET_INC_STATS_BH(LINUX_MIB_LISTENOVERFLOWS); out: NET_INC_STATS_BH(LINUX_MIB_LISTENDROPS); - if (opt && opt != np->opt) + if (opt != NULL && opt != np->opt) sock_kfree_s(sk, opt, opt->tot_len); dst_release(dst); return NULL; @@ -972,8 +966,8 @@ static int dccp_v6_do_rcv(struct sock *s goto discard; /* - * socket locking is here for SMP purposes as backlog rcv - * is currently called with bh processing disabled. + * socket locking is here for SMP purposes as backlog rcv is currently + * called with bh processing disabled. */ /* Do Stevens' IPV6_PKTOPTIONS. 
@@ -998,20 +992,20 @@ static int dccp_v6_do_rcv(struct sock *s return 0; } - if (sk->sk_state == DCCP_LISTEN) { + if (sk->sk_state == DCCP_LISTEN) { struct sock *nsk = dccp_v6_hnd_req(sk, skb); - if (!nsk) - goto discard; + if (nsk == NULL) + goto discard; /* * Queue it on the new socket if the new socket is active, * otherwise we just shortcircuit this and continue with * the new socket.. */ - if(nsk != sk) { + if (nsk != sk) { if (dccp_child_process(sk, nsk, skb)) goto reset; - if (opt_skb) + if (opt_skb != NULL) __kfree_skb(opt_skb); return 0; } @@ -1024,7 +1018,7 @@ static int dccp_v6_do_rcv(struct sock *s reset: dccp_v6_ctl_send_reset(skb); discard: - if (opt_skb) + if (opt_skb != NULL) __kfree_skb(opt_skb); kfree_skb(skb); return 0; @@ -1057,7 +1051,7 @@ static int dccp_v6_rcv(struct sk_buff ** dh->dccph_sport, &skb->nh.ipv6h->daddr, ntohs(dh->dccph_dport), inet6_iif(skb)); - /* + /* * Step 2: * If no socket ... * Generate Reset(No Connection) unless P.type == Reset @@ -1066,15 +1060,14 @@ static int dccp_v6_rcv(struct sk_buff ** if (sk == NULL) goto no_dccp_socket; - /* + /* * Step 2: * ... 
or S.state == TIMEWAIT, * Generate Reset(No Connection) unless P.type == Reset * Drop packet and return */ - if (sk->sk_state == DCCP_TIME_WAIT) - goto do_time_wait; + goto do_time_wait; if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) goto discard_and_relse; @@ -1113,32 +1106,40 @@ do_time_wait: } static struct inet_connection_sock_af_ops dccp_ipv6_af_ops = { - .queue_xmit = inet6_csk_xmit, - .send_check = dccp_v6_send_check, - .rebuild_header = inet6_sk_rebuild_header, - .conn_request = dccp_v6_conn_request, - .syn_recv_sock = dccp_v6_request_recv_sock, - .net_header_len = sizeof(struct ipv6hdr), - .setsockopt = ipv6_setsockopt, - .getsockopt = ipv6_getsockopt, - .addr2sockaddr = inet6_csk_addr2sockaddr, - .sockaddr_len = sizeof(struct sockaddr_in6) + .queue_xmit = inet6_csk_xmit, + .send_check = dccp_v6_send_check, + .rebuild_header = inet6_sk_rebuild_header, + .conn_request = dccp_v6_conn_request, + .syn_recv_sock = dccp_v6_request_recv_sock, + .net_header_len = sizeof(struct ipv6hdr), + .setsockopt = ipv6_setsockopt, + .getsockopt = ipv6_getsockopt, + .addr2sockaddr = inet6_csk_addr2sockaddr, + .sockaddr_len = sizeof(struct sockaddr_in6), +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ipv6_setsockopt, + .compat_getsockopt = compat_ipv6_getsockopt, +#endif }; /* * DCCP over IPv4 via INET6 API */ static struct inet_connection_sock_af_ops dccp_ipv6_mapped = { - .queue_xmit = ip_queue_xmit, - .send_check = dccp_v4_send_check, - .rebuild_header = inet_sk_rebuild_header, - .conn_request = dccp_v6_conn_request, - .syn_recv_sock = dccp_v6_request_recv_sock, - .net_header_len = sizeof(struct iphdr), - .setsockopt = ipv6_setsockopt, - .getsockopt = ipv6_getsockopt, - .addr2sockaddr = inet6_csk_addr2sockaddr, - .sockaddr_len = sizeof(struct sockaddr_in6) + .queue_xmit = ip_queue_xmit, + .send_check = dccp_v4_send_check, + .rebuild_header = inet_sk_rebuild_header, + .conn_request = dccp_v6_conn_request, + .syn_recv_sock = dccp_v6_request_recv_sock, + .net_header_len 
= sizeof(struct iphdr), + .setsockopt = ipv6_setsockopt, + .getsockopt = ipv6_getsockopt, + .addr2sockaddr = inet6_csk_addr2sockaddr, + .sockaddr_len = sizeof(struct sockaddr_in6), +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ipv6_setsockopt, + .compat_getsockopt = compat_ipv6_getsockopt, +#endif }; /* NOTE: A lot of things set to zero explicitly by call to @@ -1146,71 +1147,83 @@ static struct inet_connection_sock_af_op */ static int dccp_v6_init_sock(struct sock *sk) { - int err = dccp_v4_init_sock(sk); + static __u8 dccp_v6_ctl_sock_initialized; + int err = dccp_init_sock(sk, dccp_v6_ctl_sock_initialized); - if (err == 0) + if (err == 0) { + if (unlikely(!dccp_v6_ctl_sock_initialized)) + dccp_v6_ctl_sock_initialized = 1; inet_csk(sk)->icsk_af_ops = &dccp_ipv6_af_ops; + } return err; } static int dccp_v6_destroy_sock(struct sock *sk) { - dccp_v4_destroy_sock(sk); + dccp_destroy_sock(sk); return inet6_destroy_sock(sk); } static struct proto dccp_v6_prot = { - .name = "DCCPv6", - .owner = THIS_MODULE, - .close = dccp_close, - .connect = dccp_v6_connect, - .disconnect = dccp_disconnect, - .ioctl = dccp_ioctl, - .init = dccp_v6_init_sock, - .setsockopt = dccp_setsockopt, - .getsockopt = dccp_getsockopt, - .sendmsg = dccp_sendmsg, - .recvmsg = dccp_recvmsg, - .backlog_rcv = dccp_v6_do_rcv, - .hash = dccp_v6_hash, - .unhash = dccp_unhash, - .accept = inet_csk_accept, - .get_port = dccp_v6_get_port, - .shutdown = dccp_shutdown, - .destroy = dccp_v6_destroy_sock, - .orphan_count = &dccp_orphan_count, - .max_header = MAX_DCCP_HEADER, - .obj_size = sizeof(struct dccp6_sock), - .rsk_prot = &dccp6_request_sock_ops, - .twsk_prot = &dccp6_timewait_sock_ops, + .name = "DCCPv6", + .owner = THIS_MODULE, + .close = dccp_close, + .connect = dccp_v6_connect, + .disconnect = dccp_disconnect, + .ioctl = dccp_ioctl, + .init = dccp_v6_init_sock, + .setsockopt = dccp_setsockopt, + .getsockopt = dccp_getsockopt, + .sendmsg = dccp_sendmsg, + .recvmsg = dccp_recvmsg, + .backlog_rcv = 
dccp_v6_do_rcv, + .hash = dccp_v6_hash, + .unhash = dccp_unhash, + .accept = inet_csk_accept, + .get_port = dccp_v6_get_port, + .shutdown = dccp_shutdown, + .destroy = dccp_v6_destroy_sock, + .orphan_count = &dccp_orphan_count, + .max_header = MAX_DCCP_HEADER, + .obj_size = sizeof(struct dccp6_sock), + .rsk_prot = &dccp6_request_sock_ops, + .twsk_prot = &dccp6_timewait_sock_ops, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_dccp_setsockopt, + .compat_getsockopt = compat_dccp_getsockopt, +#endif }; static struct inet6_protocol dccp_v6_protocol = { - .handler = dccp_v6_rcv, - .err_handler = dccp_v6_err, - .flags = INET6_PROTO_NOPOLICY | INET6_PROTO_FINAL, + .handler = dccp_v6_rcv, + .err_handler = dccp_v6_err, + .flags = INET6_PROTO_NOPOLICY | INET6_PROTO_FINAL, }; static struct proto_ops inet6_dccp_ops = { - .family = PF_INET6, - .owner = THIS_MODULE, - .release = inet6_release, - .bind = inet6_bind, - .connect = inet_stream_connect, - .socketpair = sock_no_socketpair, - .accept = inet_accept, - .getname = inet6_getname, - .poll = dccp_poll, - .ioctl = inet6_ioctl, - .listen = inet_dccp_listen, - .shutdown = inet_shutdown, - .setsockopt = sock_common_setsockopt, - .getsockopt = sock_common_getsockopt, - .sendmsg = inet_sendmsg, - .recvmsg = sock_common_recvmsg, - .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, + .family = PF_INET6, + .owner = THIS_MODULE, + .release = inet6_release, + .bind = inet6_bind, + .connect = inet_stream_connect, + .socketpair = sock_no_socketpair, + .accept = inet_accept, + .getname = inet6_getname, + .poll = dccp_poll, + .ioctl = inet6_ioctl, + .listen = inet_dccp_listen, + .shutdown = inet_shutdown, + .setsockopt = sock_common_setsockopt, + .getsockopt = sock_common_getsockopt, + .sendmsg = inet_sendmsg, + .recvmsg = sock_common_recvmsg, + .mmap = sock_no_mmap, + .sendpage = sock_no_sendpage, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, 
+#endif }; static struct inet_protosw dccp_v6_protosw = { @@ -1234,8 +1247,16 @@ static int __init dccp_v6_init(void) goto out_unregister_proto; inet6_register_protosw(&dccp_v6_protosw); + + err = inet_csk_ctl_sock_create(&dccp_v6_ctl_socket, PF_INET6, + SOCK_DCCP, IPPROTO_DCCP); + if (err != 0) + goto out_unregister_protosw; out: return err; +out_unregister_protosw: + inet6_del_protocol(&dccp_v6_protocol, IPPROTO_DCCP); + inet6_unregister_protosw(&dccp_v6_protosw); out_unregister_proto: proto_unregister(&dccp_v6_prot); goto out; diff -puN net/dccp/Kconfig~git-net net/dccp/Kconfig --- devel/net/dccp/Kconfig~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/Kconfig 2006-03-17 23:03:48.000000000 -0800 @@ -24,6 +24,10 @@ config INET_DCCP_DIAG def_tristate y if (IP_DCCP = y && INET_DIAG = y) def_tristate m +config IP_DCCP_ACKVEC + depends on IP_DCCP + def_bool N + source "net/dccp/ccids/Kconfig" menu "DCCP Kernel Hacking" @@ -36,15 +40,6 @@ config IP_DCCP_DEBUG Just say N. -config IP_DCCP_UNLOAD_HACK - depends on IP_DCCP=m && IP_DCCP_CCID3=m - bool "DCCP control sock unload hack" - ---help--- - Enable this to be able to unload the dccp module when the it - has only one refcount held, the control sock one. Just execute - "rmmod dccp_ccid3 dccp" - - Just say N. 
endmenu endmenu diff -puN net/dccp/Makefile~git-net net/dccp/Makefile --- devel/net/dccp/Makefile~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/Makefile 2006-03-17 23:03:48.000000000 -0800 @@ -2,15 +2,18 @@ obj-$(CONFIG_IPV6) += dccp_ipv6.o dccp_ipv6-y := ipv6.o -obj-$(CONFIG_IP_DCCP) += dccp.o +obj-$(CONFIG_IP_DCCP) += dccp.o dccp_ipv4.o -dccp-y := ccid.o input.o ipv4.o minisocks.o options.o output.o proto.o \ - timer.o +dccp-y := ccid.o feat.o input.o minisocks.o options.o output.o proto.o timer.o + +dccp_ipv4-y := ipv4.o dccp-$(CONFIG_IP_DCCP_ACKVEC) += ackvec.o obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o +dccp-$(CONFIG_SYSCTL) += sysctl.o + dccp_diag-y := diag.o obj-y += ccids/ diff -puN net/dccp/minisocks.c~git-net net/dccp/minisocks.c --- devel/net/dccp/minisocks.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/minisocks.c 2006-03-17 23:03:48.000000000 -0800 @@ -22,6 +22,7 @@ #include "ackvec.h" #include "ccid.h" #include "dccp.h" +#include "feat.h" struct inet_timewait_death_row dccp_death_row = { .sysctl_max_tw_buckets = NR_FILE * 2, @@ -106,6 +107,7 @@ struct sock *dccp_create_openreq_child(s const struct dccp_request_sock *dreq = dccp_rsk(req); struct inet_connection_sock *newicsk = inet_csk(sk); struct dccp_sock *newdp = dccp_sk(newsk); + struct dccp_minisock *newdmsk = dccp_msk(newsk); newdp->dccps_role = DCCP_ROLE_SERVER; newdp->dccps_hc_rx_ackvec = NULL; @@ -114,27 +116,27 @@ struct sock *dccp_create_openreq_child(s newicsk->icsk_rto = DCCP_TIMEOUT_INIT; do_gettimeofday(&newdp->dccps_epoch); - if (newdp->dccps_options.dccpo_send_ack_vector) { + if (dccp_feat_clone(sk, newsk)) + goto out_free; + + if (newdmsk->dccpms_send_ack_vector) { newdp->dccps_hc_rx_ackvec = - dccp_ackvec_alloc(DCCP_MAX_ACKVEC_LEN, - GFP_ATOMIC); - /* - * XXX: We're using the same CCIDs set on the parent, - * i.e. sk_clone copied the master sock and left the - * CCID pointers for this child, that is why we do the - * __ccid_get calls. 
- */ + dccp_ackvec_alloc(GFP_ATOMIC); if (unlikely(newdp->dccps_hc_rx_ackvec == NULL)) goto out_free; } - if (unlikely(ccid_hc_rx_init(newdp->dccps_hc_rx_ccid, - newsk) != 0 || - ccid_hc_tx_init(newdp->dccps_hc_tx_ccid, - newsk) != 0)) { + newdp->dccps_hc_rx_ccid = + ccid_hc_rx_new(newdmsk->dccpms_rx_ccid, + newsk, GFP_ATOMIC); + newdp->dccps_hc_tx_ccid = + ccid_hc_tx_new(newdmsk->dccpms_tx_ccid, + newsk, GFP_ATOMIC); + if (unlikely(newdp->dccps_hc_rx_ccid == NULL || + newdp->dccps_hc_tx_ccid == NULL)) { dccp_ackvec_free(newdp->dccps_hc_rx_ackvec); - ccid_hc_rx_exit(newdp->dccps_hc_rx_ccid, newsk); - ccid_hc_tx_exit(newdp->dccps_hc_tx_ccid, newsk); + ccid_hc_rx_delete(newdp->dccps_hc_rx_ccid, newsk); + ccid_hc_tx_delete(newdp->dccps_hc_tx_ccid, newsk); out_free: /* It is still raw copy of parent, so invalidate * destructor and make plain sk_free() */ @@ -143,9 +145,6 @@ out_free: return NULL; } - __ccid_get(newdp->dccps_hc_rx_ccid); - __ccid_get(newdp->dccps_hc_tx_ccid); - /* * Step 3: Process LISTEN state * @@ -155,7 +154,7 @@ out_free: */ /* See dccp_v4_conn_request */ - newdp->dccps_options.dccpo_sequence_window = req->rcv_wnd; + newdmsk->dccpms_sequence_window = req->rcv_wnd; newdp->dccps_gar = newdp->dccps_isr = dreq->dreq_isr; dccp_update_gsr(newsk, dreq->dreq_isr); diff -puN net/dccp/options.c~git-net net/dccp/options.c --- devel/net/dccp/options.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/options.c 2006-03-17 23:03:48.000000000 -0800 @@ -21,19 +21,23 @@ #include "ackvec.h" #include "ccid.h" #include "dccp.h" +#include "feat.h" -/* stores the default values for new connection. 
may be changed with sysctl */ -static const struct dccp_options dccpo_default_values = { - .dccpo_sequence_window = DCCPF_INITIAL_SEQUENCE_WINDOW, - .dccpo_rx_ccid = DCCPF_INITIAL_CCID, - .dccpo_tx_ccid = DCCPF_INITIAL_CCID, - .dccpo_send_ack_vector = DCCPF_INITIAL_SEND_ACK_VECTOR, - .dccpo_send_ndp_count = DCCPF_INITIAL_SEND_NDP_COUNT, -}; - -void dccp_options_init(struct dccp_options *dccpo) -{ - memcpy(dccpo, &dccpo_default_values, sizeof(*dccpo)); +int dccp_feat_default_sequence_window = DCCPF_INITIAL_SEQUENCE_WINDOW; +int dccp_feat_default_rx_ccid = DCCPF_INITIAL_CCID; +int dccp_feat_default_tx_ccid = DCCPF_INITIAL_CCID; +int dccp_feat_default_ack_ratio = DCCPF_INITIAL_ACK_RATIO; +int dccp_feat_default_send_ack_vector = DCCPF_INITIAL_SEND_ACK_VECTOR; +int dccp_feat_default_send_ndp_count = DCCPF_INITIAL_SEND_NDP_COUNT; + +void dccp_minisock_init(struct dccp_minisock *dmsk) +{ + dmsk->dccpms_sequence_window = dccp_feat_default_sequence_window; + dmsk->dccpms_rx_ccid = dccp_feat_default_rx_ccid; + dmsk->dccpms_tx_ccid = dccp_feat_default_tx_ccid; + dmsk->dccpms_ack_ratio = dccp_feat_default_ack_ratio; + dmsk->dccpms_send_ack_vector = dccp_feat_default_send_ack_vector; + dmsk->dccpms_send_ndp_count = dccp_feat_default_send_ndp_count; } static u32 dccp_decode_value_var(const unsigned char *bf, const u8 len) @@ -69,9 +73,12 @@ int dccp_parse_options(struct sock *sk, unsigned char opt, len; unsigned char *value; u32 elapsed_time; + int rc; + int mandatory = 0; memset(opt_recv, 0, sizeof(*opt_recv)); + opt = len = 0; while (opt_ptr != opt_end) { opt = *opt_ptr++; len = 0; @@ -100,6 +107,12 @@ int dccp_parse_options(struct sock *sk, switch (opt) { case DCCPO_PADDING: break; + case DCCPO_MANDATORY: + if (mandatory) + goto out_invalid_option; + if (pkt_type != DCCP_PKT_DATA) + mandatory = 1; + break; case DCCPO_NDP_COUNT: if (len > 3) goto out_invalid_option; @@ -108,12 +121,37 @@ int dccp_parse_options(struct sock *sk, dccp_pr_debug("%sNDP count=%d\n", debug_prefix, 
opt_recv->dccpor_ndp); break; + case DCCPO_CHANGE_L: + /* fall through */ + case DCCPO_CHANGE_R: + if (len < 2) + goto out_invalid_option; + rc = dccp_feat_change_recv(sk, opt, *value, value + 1, + len - 1); + /* + * When there is a change error, change_recv is + * responsible for dealing with it. i.e. reply with an + * empty confirm. + * If the change was mandatory, then we need to die. + */ + if (rc && mandatory) + goto out_invalid_option; + break; + case DCCPO_CONFIRM_L: + /* fall through */ + case DCCPO_CONFIRM_R: + if (len < 2) + goto out_invalid_option; + if (dccp_feat_confirm_recv(sk, opt, *value, + value + 1, len - 1)) + goto out_invalid_option; + break; case DCCPO_ACK_VECTOR_0: case DCCPO_ACK_VECTOR_1: if (pkt_type == DCCP_PKT_DATA) - continue; + break; - if (dp->dccps_options.dccpo_send_ack_vector && + if (dccp_msk(sk)->dccpms_send_ack_vector && dccp_ackvec_parse(sk, skb, opt, value, len)) goto out_invalid_option; break; @@ -121,7 +159,7 @@ int dccp_parse_options(struct sock *sk, if (len != 4) goto out_invalid_option; - opt_recv->dccpor_timestamp = ntohl(*(u32 *)value); + opt_recv->dccpor_timestamp = ntohl(*(__be32 *)value); dp->dccps_timestamp_echo = opt_recv->dccpor_timestamp; dccp_timestamp(sk, &dp->dccps_timestamp_time); @@ -135,7 +173,7 @@ int dccp_parse_options(struct sock *sk, if (len != 4 && len != 6 && len != 8) goto out_invalid_option; - opt_recv->dccpor_timestamp_echo = ntohl(*(u32 *)value); + opt_recv->dccpor_timestamp_echo = ntohl(*(__be32 *)value); dccp_pr_debug("%sTIMESTAMP_ECHO=%u, len=%d, ackno=%llu, ", debug_prefix, @@ -149,9 +187,9 @@ int dccp_parse_options(struct sock *sk, break; if (len == 6) - elapsed_time = ntohs(*(u16 *)(value + 4)); + elapsed_time = ntohs(*(__be16 *)(value + 4)); else - elapsed_time = ntohl(*(u32 *)(value + 4)); + elapsed_time = ntohl(*(__be32 *)(value + 4)); /* Give precedence to the biggest ELAPSED_TIME */ if (elapsed_time > opt_recv->dccpor_elapsed_time) @@ -165,9 +203,9 @@ int dccp_parse_options(struct sock 
*sk, continue; if (len == 2) - elapsed_time = ntohs(*(u16 *)value); + elapsed_time = ntohs(*(__be16 *)value); else - elapsed_time = ntohl(*(u32 *)value); + elapsed_time = ntohl(*(__be32 *)value); if (elapsed_time > opt_recv->dccpor_elapsed_time) opt_recv->dccpor_elapsed_time = elapsed_time; @@ -208,8 +246,15 @@ int dccp_parse_options(struct sock *sk, sk, opt, len); break; } + + if (opt != DCCPO_MANDATORY) + mandatory = 0; } + /* mandatory was the last byte in option list -> reset connection */ + if (mandatory) + goto out_invalid_option; + return 0; out_invalid_option: @@ -219,6 +264,8 @@ out_invalid_option: return -1; } +EXPORT_SYMBOL_GPL(dccp_parse_options); + static void dccp_encode_value_var(const u32 value, unsigned char *to, const unsigned int len) { @@ -237,17 +284,14 @@ static inline int dccp_ndp_len(const int return likely(ndp <= 0xFF) ? 1 : ndp <= 0xFFFF ? 2 : 3; } -void dccp_insert_option(struct sock *sk, struct sk_buff *skb, +int dccp_insert_option(struct sock *sk, struct sk_buff *skb, const unsigned char option, const void *value, const unsigned char len) { unsigned char *to; - if (DCCP_SKB_CB(skb)->dccpd_opt_len + len + 2 > DCCP_MAX_OPT_LEN) { - LIMIT_NETDEBUG(KERN_INFO "DCCP: packet too small to insert " - "%d option!\n", option); - return; - } + if (DCCP_SKB_CB(skb)->dccpd_opt_len + len + 2 > DCCP_MAX_OPT_LEN) + return -1; DCCP_SKB_CB(skb)->dccpd_opt_len += len + 2; @@ -256,11 +300,12 @@ void dccp_insert_option(struct sock *sk, *to++ = len + 2; memcpy(to, value, len); + return 0; } EXPORT_SYMBOL_GPL(dccp_insert_option); -static void dccp_insert_option_ndp(struct sock *sk, struct sk_buff *skb) +static int dccp_insert_option_ndp(struct sock *sk, struct sk_buff *skb) { struct dccp_sock *dp = dccp_sk(sk); int ndp = dp->dccps_ndp_count; @@ -276,7 +321,7 @@ static void dccp_insert_option_ndp(struc const int len = ndp_len + 2; if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) - return; + return -1; DCCP_SKB_CB(skb)->dccpd_opt_len += len; @@ 
-285,6 +330,8 @@ static void dccp_insert_option_ndp(struc *ptr++ = len; dccp_encode_value_var(ndp, ptr, ndp_len); } + + return 0; } static inline int dccp_elapsed_time_len(const u32 elapsed_time) @@ -292,27 +339,18 @@ static inline int dccp_elapsed_time_len( return elapsed_time == 0 ? 0 : elapsed_time <= 0xFFFF ? 2 : 4; } -void dccp_insert_option_elapsed_time(struct sock *sk, - struct sk_buff *skb, - u32 elapsed_time) +int dccp_insert_option_elapsed_time(struct sock *sk, struct sk_buff *skb, + u32 elapsed_time) { -#ifdef CONFIG_IP_DCCP_DEBUG - struct dccp_sock *dp = dccp_sk(sk); - const char *debug_prefix = dp->dccps_role == DCCP_ROLE_CLIENT ? - "CLIENT TX opt: " : "server TX opt: "; -#endif const int elapsed_time_len = dccp_elapsed_time_len(elapsed_time); const int len = 2 + elapsed_time_len; unsigned char *to; if (elapsed_time_len == 0) - return; + return 0; - if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) { - LIMIT_NETDEBUG(KERN_INFO "DCCP: packet too small to " - "insert elapsed time!\n"); - return; - } + if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) + return -1; DCCP_SKB_CB(skb)->dccpd_opt_len += len; @@ -321,17 +359,14 @@ void dccp_insert_option_elapsed_time(str *to++ = len; if (elapsed_time_len == 2) { - const u16 var16 = htons((u16)elapsed_time); + const __be16 var16 = htons((u16)elapsed_time); memcpy(to, &var16, 2); } else { - const u32 var32 = htonl(elapsed_time); + const __be32 var32 = htonl(elapsed_time); memcpy(to, &var32, 4); } - dccp_pr_debug("%sELAPSED_TIME=%u, len=%d, seqno=%llu\n", - debug_prefix, elapsed_time, - len, - (unsigned long long) DCCP_SKB_CB(skb)->dccpd_seq); + return 0; } EXPORT_SYMBOL_GPL(dccp_insert_option_elapsed_time); @@ -352,32 +387,27 @@ void dccp_timestamp(const struct sock *s EXPORT_SYMBOL_GPL(dccp_timestamp); -void dccp_insert_option_timestamp(struct sock *sk, struct sk_buff *skb) +int dccp_insert_option_timestamp(struct sock *sk, struct sk_buff *skb) { struct timeval tv; - u32 now; - + __be32 
now; + dccp_timestamp(sk, &tv); - now = timeval_usecs(&tv) / 10; + now = htonl(timeval_usecs(&tv) / 10); /* yes this will overflow but that is the point as we want a * 10 usec 32 bit timer which mean it wraps every 11.9 hours */ - now = htonl(now); - dccp_insert_option(sk, skb, DCCPO_TIMESTAMP, &now, sizeof(now)); + return dccp_insert_option(sk, skb, DCCPO_TIMESTAMP, &now, sizeof(now)); } EXPORT_SYMBOL_GPL(dccp_insert_option_timestamp); -static void dccp_insert_option_timestamp_echo(struct sock *sk, - struct sk_buff *skb) +static int dccp_insert_option_timestamp_echo(struct sock *sk, + struct sk_buff *skb) { struct dccp_sock *dp = dccp_sk(sk); -#ifdef CONFIG_IP_DCCP_DEBUG - const char *debug_prefix = dp->dccps_role == DCCP_ROLE_CLIENT ? - "CLIENT TX opt: " : "server TX opt: "; -#endif struct timeval now; - u32 tstamp_echo; + __be32 tstamp_echo; u32 elapsed_time; int len, elapsed_time_len; unsigned char *to; @@ -387,11 +417,8 @@ static void dccp_insert_option_timestamp elapsed_time_len = dccp_elapsed_time_len(elapsed_time); len = 6 + elapsed_time_len; - if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) { - LIMIT_NETDEBUG(KERN_INFO "DCCP: packet too small to insert " - "timestamp echo!\n"); - return; - } + if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) + return -1; DCCP_SKB_CB(skb)->dccpd_opt_len += len; @@ -402,51 +429,149 @@ static void dccp_insert_option_timestamp tstamp_echo = htonl(dp->dccps_timestamp_echo); memcpy(to, &tstamp_echo, 4); to += 4; - + if (elapsed_time_len == 2) { - const u16 var16 = htons((u16)elapsed_time); + const __be16 var16 = htons((u16)elapsed_time); memcpy(to, &var16, 2); } else if (elapsed_time_len == 4) { - const u32 var32 = htonl(elapsed_time); + const __be32 var32 = htonl(elapsed_time); memcpy(to, &var32, 4); } - dccp_pr_debug("%sTIMESTAMP_ECHO=%u, len=%d, seqno=%llu\n", - debug_prefix, dp->dccps_timestamp_echo, - len, - (unsigned long long) DCCP_SKB_CB(skb)->dccpd_seq); - dp->dccps_timestamp_echo = 0; 
dp->dccps_timestamp_time.tv_sec = 0; dp->dccps_timestamp_time.tv_usec = 0; + return 0; +} + +static int dccp_insert_feat_opt(struct sk_buff *skb, u8 type, u8 feat, + u8 *val, u8 len) +{ + u8 *to; + + if (DCCP_SKB_CB(skb)->dccpd_opt_len + len + 3 > DCCP_MAX_OPT_LEN) { + LIMIT_NETDEBUG(KERN_INFO "DCCP: packet too small" + " to insert feature %d option!\n", feat); + return -1; + } + + DCCP_SKB_CB(skb)->dccpd_opt_len += len + 3; + + to = skb_push(skb, len + 3); + *to++ = type; + *to++ = len + 3; + *to++ = feat; + + if (len) + memcpy(to, val, len); + dccp_pr_debug("option %d feat %d len %d\n", type, feat, len); + + return 0; } -void dccp_insert_options(struct sock *sk, struct sk_buff *skb) +static int dccp_insert_options_feat(struct sock *sk, struct sk_buff *skb) { struct dccp_sock *dp = dccp_sk(sk); + struct dccp_minisock *dmsk = dccp_msk(sk); + struct dccp_opt_pend *opt, *next; + int change = 0; + + /* confirm any options [NN opts] */ + list_for_each_entry_safe(opt, next, &dmsk->dccpms_conf, dccpop_node) { + dccp_insert_feat_opt(skb, opt->dccpop_type, + opt->dccpop_feat, opt->dccpop_val, + opt->dccpop_len); + /* fear empty confirms */ + if (opt->dccpop_val) + kfree(opt->dccpop_val); + kfree(opt); + } + INIT_LIST_HEAD(&dmsk->dccpms_conf); + + /* see which features we need to send */ + list_for_each_entry(opt, &dmsk->dccpms_pending, dccpop_node) { + /* see if we need to send any confirm */ + if (opt->dccpop_sc) { + dccp_insert_feat_opt(skb, opt->dccpop_type + 1, + opt->dccpop_feat, + opt->dccpop_sc->dccpoc_val, + opt->dccpop_sc->dccpoc_len); + + BUG_ON(!opt->dccpop_sc->dccpoc_val); + kfree(opt->dccpop_sc->dccpoc_val); + kfree(opt->dccpop_sc); + opt->dccpop_sc = NULL; + } + + /* any option not confirmed, re-send it */ + if (!opt->dccpop_conf) { + dccp_insert_feat_opt(skb, opt->dccpop_type, + opt->dccpop_feat, opt->dccpop_val, + opt->dccpop_len); + change++; + } + } + + /* Retransmit timer. + * If this is the master listening sock, we don't set a timer on it. 
It + * should be fine because if the dude doesn't receive our RESPONSE + * [which will contain the CHANGE] he will send another REQUEST which + * will "retrnasmit" the change. + */ + if (change && dp->dccps_role != DCCP_ROLE_LISTEN) { + dccp_pr_debug("reset feat negotiation timer %p\n", sk); + + /* XXX don't reset the timer on re-transmissions. I.e. reset it + * only when sending new stuff i guess. Currently the timer + * never backs off because on re-transmission it just resets it! + */ + inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, + inet_csk(sk)->icsk_rto, DCCP_RTO_MAX); + } + + return 0; +} + +int dccp_insert_options(struct sock *sk, struct sk_buff *skb) +{ + struct dccp_sock *dp = dccp_sk(sk); + struct dccp_minisock *dmsk = dccp_msk(sk); DCCP_SKB_CB(skb)->dccpd_opt_len = 0; - if (dp->dccps_options.dccpo_send_ndp_count) - dccp_insert_option_ndp(sk, skb); + if (dmsk->dccpms_send_ndp_count && + dccp_insert_option_ndp(sk, skb)) + return -1; if (!dccp_packet_without_ack(skb)) { - if (dp->dccps_options.dccpo_send_ack_vector && - dccp_ackvec_pending(dp->dccps_hc_rx_ackvec)) - dccp_insert_option_ackvec(sk, skb); - if (dp->dccps_timestamp_echo != 0) - dccp_insert_option_timestamp_echo(sk, skb); + if (dmsk->dccpms_send_ack_vector && + dccp_ackvec_pending(dp->dccps_hc_rx_ackvec) && + dccp_insert_option_ackvec(sk, skb)) + return -1; + + if (dp->dccps_timestamp_echo != 0 && + dccp_insert_option_timestamp_echo(sk, skb)) + return -1; } if (dp->dccps_hc_rx_insert_options) { - ccid_hc_rx_insert_options(dp->dccps_hc_rx_ccid, sk, skb); + if (ccid_hc_rx_insert_options(dp->dccps_hc_rx_ccid, sk, skb)) + return -1; dp->dccps_hc_rx_insert_options = 0; } if (dp->dccps_hc_tx_insert_options) { - ccid_hc_tx_insert_options(dp->dccps_hc_tx_ccid, sk, skb); + if (ccid_hc_tx_insert_options(dp->dccps_hc_tx_ccid, sk, skb)) + return -1; dp->dccps_hc_tx_insert_options = 0; } + /* Feature negotiation */ + /* Data packets can't do feat negotiation */ + if (DCCP_SKB_CB(skb)->dccpd_type != 
DCCP_PKT_DATA && + DCCP_SKB_CB(skb)->dccpd_type != DCCP_PKT_DATAACK && + dccp_insert_options_feat(sk, skb)) + return -1; + /* XXX: insert other options when appropriate */ if (DCCP_SKB_CB(skb)->dccpd_opt_len != 0) { @@ -459,4 +584,6 @@ void dccp_insert_options(struct sock *sk DCCP_SKB_CB(skb)->dccpd_opt_len += padding; } } + + return 0; } diff -puN net/dccp/output.c~git-net net/dccp/output.c --- devel/net/dccp/output.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/output.c 2006-03-17 23:03:48.000000000 -0800 @@ -27,7 +27,7 @@ static inline void dccp_event_ack_sent(s inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK); } -static inline void dccp_skb_entail(struct sock *sk, struct sk_buff *skb) +static void dccp_skb_entail(struct sock *sk, struct sk_buff *skb) { skb_set_owner_w(skb, sk); WARN_ON(sk->sk_send_head); @@ -49,7 +49,7 @@ static int dccp_transmit_skb(struct sock struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb); struct dccp_hdr *dh; /* XXX For now we're using only 48 bits sequence numbers */ - const int dccp_header_size = sizeof(*dh) + + const u32 dccp_header_size = sizeof(*dh) + sizeof(struct dccp_hdr_ext) + dccp_packet_hdr_len(dcb->dccpd_type); int err, set_ack = 1; @@ -64,6 +64,10 @@ static int dccp_transmit_skb(struct sock case DCCP_PKT_DATAACK: break; + case DCCP_PKT_REQUEST: + set_ack = 0; + /* fall through */ + case DCCP_PKT_SYNC: case DCCP_PKT_SYNCACK: ackno = dcb->dccpd_seq; @@ -79,7 +83,11 @@ static int dccp_transmit_skb(struct sock } dcb->dccpd_seq = dp->dccps_gss; - dccp_insert_options(sk, skb); + + if (dccp_insert_options(sk, skb)) { + kfree_skb(skb); + return -EPROTO; + } skb->h.raw = skb_push(skb, dccp_header_size); dh = dccp_hdr(skb); @@ -275,17 +283,16 @@ struct sk_buff *dccp_make_response(struc { struct dccp_hdr *dh; struct dccp_request_sock *dreq; - const int dccp_header_size = sizeof(struct dccp_hdr) + + const u32 dccp_header_size = sizeof(struct dccp_hdr) + sizeof(struct dccp_hdr_ext) + sizeof(struct dccp_hdr_response); - 
struct sk_buff *skb = sock_wmalloc(sk, MAX_HEADER + DCCP_MAX_OPT_LEN + - dccp_header_size, 1, + struct sk_buff *skb = sock_wmalloc(sk, sk->sk_prot->max_header, 1, GFP_ATOMIC); if (skb == NULL) return NULL; /* Reserve space for headers. */ - skb_reserve(skb, MAX_HEADER + DCCP_MAX_OPT_LEN + dccp_header_size); + skb_reserve(skb, sk->sk_prot->max_header); skb->dst = dst_clone(dst); skb->csum = 0; @@ -293,7 +300,11 @@ struct sk_buff *dccp_make_response(struc dreq = dccp_rsk(req); DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_RESPONSE; DCCP_SKB_CB(skb)->dccpd_seq = dreq->dreq_iss; - dccp_insert_options(sk, skb); + + if (dccp_insert_options(sk, skb)) { + kfree_skb(skb); + return NULL; + } skb->h.raw = skb_push(skb, dccp_header_size); @@ -310,32 +321,28 @@ struct sk_buff *dccp_make_response(struc dccp_hdr_set_ack(dccp_hdr_ack_bits(skb), dreq->dreq_isr); dccp_hdr_response(skb)->dccph_resp_service = dreq->dreq_service; - dh->dccph_checksum = dccp_v4_checksum(skb, inet_rsk(req)->loc_addr, - inet_rsk(req)->rmt_addr); - DCCP_INC_STATS(DCCP_MIB_OUTSEGS); return skb; } EXPORT_SYMBOL_GPL(dccp_make_response); -struct sk_buff *dccp_make_reset(struct sock *sk, struct dst_entry *dst, - const enum dccp_reset_codes code) +static struct sk_buff *dccp_make_reset(struct sock *sk, struct dst_entry *dst, + const enum dccp_reset_codes code) { struct dccp_hdr *dh; struct dccp_sock *dp = dccp_sk(sk); - const int dccp_header_size = sizeof(struct dccp_hdr) + + const u32 dccp_header_size = sizeof(struct dccp_hdr) + sizeof(struct dccp_hdr_ext) + sizeof(struct dccp_hdr_reset); - struct sk_buff *skb = sock_wmalloc(sk, MAX_HEADER + DCCP_MAX_OPT_LEN + - dccp_header_size, 1, + struct sk_buff *skb = sock_wmalloc(sk, sk->sk_prot->max_header, 1, GFP_ATOMIC); if (skb == NULL) return NULL; /* Reserve space for headers. 
*/ - skb_reserve(skb, MAX_HEADER + DCCP_MAX_OPT_LEN + dccp_header_size); + skb_reserve(skb, sk->sk_prot->max_header); skb->dst = dst_clone(dst); skb->csum = 0; @@ -345,7 +352,11 @@ struct sk_buff *dccp_make_reset(struct s DCCP_SKB_CB(skb)->dccpd_reset_code = code; DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_RESET; DCCP_SKB_CB(skb)->dccpd_seq = dp->dccps_gss; - dccp_insert_options(sk, skb); + + if (dccp_insert_options(sk, skb)) { + kfree_skb(skb); + return NULL; + } skb->h.raw = skb_push(skb, dccp_header_size); @@ -362,14 +373,34 @@ struct sk_buff *dccp_make_reset(struct s dccp_hdr_set_ack(dccp_hdr_ack_bits(skb), dp->dccps_gsr); dccp_hdr_reset(skb)->dccph_reset_code = code; - - dh->dccph_checksum = dccp_v4_checksum(skb, inet_sk(sk)->saddr, - inet_sk(sk)->daddr); + inet_csk(sk)->icsk_af_ops->send_check(sk, skb->len, skb); DCCP_INC_STATS(DCCP_MIB_OUTSEGS); return skb; } +int dccp_send_reset(struct sock *sk, enum dccp_reset_codes code) +{ + /* + * FIXME: what if rebuild_header fails? + * Should we be doing a rebuild_header here? + */ + int err = inet_sk_rebuild_header(sk); + + if (err == 0) { + struct sk_buff *skb = dccp_make_reset(sk, sk->sk_dst_cache, + code); + if (skb != NULL) { + memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); + err = inet_csk(sk)->icsk_af_ops->queue_xmit(skb, 0); + if (err == NET_XMIT_CN) + err = 0; + } + } + + return err; +} + /* * Do all connect socket setups that can be done AF independent. */ @@ -405,12 +436,12 @@ int dccp_connect(struct sock *sk) dccp_connect_init(sk); - skb = alloc_skb(MAX_DCCP_HEADER + 15, sk->sk_allocation); + skb = alloc_skb(sk->sk_prot->max_header, sk->sk_allocation); if (unlikely(skb == NULL)) return -ENOBUFS; /* Reserve space for headers. */ - skb_reserve(skb, MAX_DCCP_HEADER); + skb_reserve(skb, sk->sk_prot->max_header); DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_REQUEST; skb->csum = 0; @@ -431,7 +462,8 @@ void dccp_send_ack(struct sock *sk) { /* If we have been reset, we may not send again. 
*/ if (sk->sk_state != DCCP_CLOSED) { - struct sk_buff *skb = alloc_skb(MAX_DCCP_HEADER, GFP_ATOMIC); + struct sk_buff *skb = alloc_skb(sk->sk_prot->max_header, + GFP_ATOMIC); if (skb == NULL) { inet_csk_schedule_ack(sk); @@ -443,7 +475,7 @@ void dccp_send_ack(struct sock *sk) } /* Reserve space for headers */ - skb_reserve(skb, MAX_DCCP_HEADER); + skb_reserve(skb, sk->sk_prot->max_header); skb->csum = 0; DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_ACK; dccp_transmit_skb(sk, skb); @@ -490,14 +522,14 @@ void dccp_send_sync(struct sock *sk, con * dccp_transmit_skb() will set the ownership to this * sock. */ - struct sk_buff *skb = alloc_skb(MAX_DCCP_HEADER, GFP_ATOMIC); + struct sk_buff *skb = alloc_skb(sk->sk_prot->max_header, GFP_ATOMIC); if (skb == NULL) /* FIXME: how to make sure the sync is sent? */ return; /* Reserve space for headers and prepare control bits. */ - skb_reserve(skb, MAX_DCCP_HEADER); + skb_reserve(skb, sk->sk_prot->max_header); skb->csum = 0; DCCP_SKB_CB(skb)->dccpd_type = pkt_type; DCCP_SKB_CB(skb)->dccpd_seq = seq; @@ -505,6 +537,8 @@ void dccp_send_sync(struct sock *sk, con dccp_transmit_skb(sk, skb); } +EXPORT_SYMBOL_GPL(dccp_send_sync); + /* * Send a DCCP_PKT_CLOSE/CLOSEREQ. The caller locks the socket for us. 
This * cannot be allowed to fail queueing a DCCP_PKT_CLOSE/CLOSEREQ frame under diff -puN net/dccp/proto.c~git-net net/dccp/proto.c --- devel/net/dccp/proto.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/proto.c 2006-03-17 23:03:48.000000000 -0800 @@ -23,9 +23,7 @@ #include #include -#include #include -#include #include #include @@ -37,6 +35,7 @@ #include "ccid.h" #include "dccp.h" +#include "feat.h" DEFINE_SNMP_STAT(struct dccp_mib, dccp_statistics) __read_mostly; @@ -46,12 +45,66 @@ atomic_t dccp_orphan_count = ATOMIC_INIT EXPORT_SYMBOL_GPL(dccp_orphan_count); -static struct net_protocol dccp_protocol = { - .handler = dccp_v4_rcv, - .err_handler = dccp_v4_err, - .no_policy = 1, +struct inet_hashinfo __cacheline_aligned dccp_hashinfo = { + .lhash_lock = RW_LOCK_UNLOCKED, + .lhash_users = ATOMIC_INIT(0), + .lhash_wait = __WAIT_QUEUE_HEAD_INITIALIZER(dccp_hashinfo.lhash_wait), }; +EXPORT_SYMBOL_GPL(dccp_hashinfo); + +void dccp_set_state(struct sock *sk, const int state) +{ + const int oldstate = sk->sk_state; + + dccp_pr_debug("%s(%p) %-10.10s -> %s\n", + dccp_role(sk), sk, + dccp_state_name(oldstate), dccp_state_name(state)); + WARN_ON(state == oldstate); + + switch (state) { + case DCCP_OPEN: + if (oldstate != DCCP_OPEN) + DCCP_INC_STATS(DCCP_MIB_CURRESTAB); + break; + + case DCCP_CLOSED: + if (oldstate == DCCP_CLOSING || oldstate == DCCP_OPEN) + DCCP_INC_STATS(DCCP_MIB_ESTABRESETS); + + sk->sk_prot->unhash(sk); + if (inet_csk(sk)->icsk_bind_hash != NULL && + !(sk->sk_userlocks & SOCK_BINDPORT_LOCK)) + inet_put_port(&dccp_hashinfo, sk); + /* fall through */ + default: + if (oldstate == DCCP_OPEN) + DCCP_DEC_STATS(DCCP_MIB_CURRESTAB); + } + + /* Change state AFTER socket is unhashed to avoid closed + * socket sitting in hash tables. 
+ */ + sk->sk_state = state; +} + +EXPORT_SYMBOL_GPL(dccp_set_state); + +void dccp_done(struct sock *sk) +{ + dccp_set_state(sk, DCCP_CLOSED); + dccp_clear_xmit_timers(sk); + + sk->sk_shutdown = SHUTDOWN_MASK; + + if (!sock_flag(sk, SOCK_DEAD)) + sk->sk_state_change(sk); + else + inet_csk_destroy_sock(sk); +} + +EXPORT_SYMBOL_GPL(dccp_done); + const char *dccp_packet_name(const int type) { static const char *dccp_packet_names[] = { @@ -96,6 +149,120 @@ const char *dccp_state_name(const int st EXPORT_SYMBOL_GPL(dccp_state_name); +void dccp_hash(struct sock *sk) +{ + inet_hash(&dccp_hashinfo, sk); +} + +EXPORT_SYMBOL_GPL(dccp_hash); + +void dccp_unhash(struct sock *sk) +{ + inet_unhash(&dccp_hashinfo, sk); +} + +EXPORT_SYMBOL_GPL(dccp_unhash); + +int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized) +{ + struct dccp_sock *dp = dccp_sk(sk); + struct dccp_minisock *dmsk = dccp_msk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + + dccp_minisock_init(&dp->dccps_minisock); + do_gettimeofday(&dp->dccps_epoch); + + /* + * FIXME: We're hardcoding the CCID, and doing this at this point makes + * the listening (master) sock get CCID control blocks, which is not + * necessary, but for now, to not mess with the test userspace apps, + * lets leave it here, later the real solution is to do this in a + * setsockopt(CCIDs-I-want/accept). 
-acme + */ + if (likely(ctl_sock_initialized)) { + int rc = dccp_feat_init(dmsk); + + if (rc) + return rc; + + if (dmsk->dccpms_send_ack_vector) { + dp->dccps_hc_rx_ackvec = dccp_ackvec_alloc(GFP_KERNEL); + if (dp->dccps_hc_rx_ackvec == NULL) + return -ENOMEM; + } + dp->dccps_hc_rx_ccid = ccid_hc_rx_new(dmsk->dccpms_rx_ccid, + sk, GFP_KERNEL); + dp->dccps_hc_tx_ccid = ccid_hc_tx_new(dmsk->dccpms_tx_ccid, + sk, GFP_KERNEL); + if (unlikely(dp->dccps_hc_rx_ccid == NULL || + dp->dccps_hc_tx_ccid == NULL)) { + ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk); + ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk); + if (dmsk->dccpms_send_ack_vector) { + dccp_ackvec_free(dp->dccps_hc_rx_ackvec); + dp->dccps_hc_rx_ackvec = NULL; + } + dp->dccps_hc_rx_ccid = dp->dccps_hc_tx_ccid = NULL; + return -ENOMEM; + } + } else { + /* control socket doesn't need feat nego */ + INIT_LIST_HEAD(&dmsk->dccpms_pending); + INIT_LIST_HEAD(&dmsk->dccpms_conf); + } + + dccp_init_xmit_timers(sk); + icsk->icsk_rto = DCCP_TIMEOUT_INIT; + sk->sk_state = DCCP_CLOSED; + sk->sk_write_space = dccp_write_space; + icsk->icsk_sync_mss = dccp_sync_mss; + dp->dccps_mss_cache = 536; + dp->dccps_role = DCCP_ROLE_UNDEFINED; + dp->dccps_service = DCCP_SERVICE_INVALID_VALUE; + dp->dccps_l_ack_ratio = dp->dccps_r_ack_ratio = 1; + + return 0; +} + +EXPORT_SYMBOL_GPL(dccp_init_sock); + +int dccp_destroy_sock(struct sock *sk) +{ + struct dccp_sock *dp = dccp_sk(sk); + struct dccp_minisock *dmsk = dccp_msk(sk); + + /* + * DCCP doesn't use sk_write_queue, just sk_send_head + * for retransmissions + */ + if (sk->sk_send_head != NULL) { + kfree_skb(sk->sk_send_head); + sk->sk_send_head = NULL; + } + + /* Clean up a referenced DCCP bind bucket. 
*/ + if (inet_csk(sk)->icsk_bind_hash != NULL) + inet_put_port(&dccp_hashinfo, sk); + + kfree(dp->dccps_service_list); + dp->dccps_service_list = NULL; + + if (dmsk->dccpms_send_ack_vector) { + dccp_ackvec_free(dp->dccps_hc_rx_ackvec); + dp->dccps_hc_rx_ackvec = NULL; + } + ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk); + ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk); + dp->dccps_hc_rx_ccid = dp->dccps_hc_tx_ccid = NULL; + + /* clean up feature negotiation state */ + dccp_feat_clean(dmsk); + + return 0; +} + +EXPORT_SYMBOL_GPL(dccp_destroy_sock); + static inline int dccp_listen_start(struct sock *sk) { struct dccp_sock *dp = dccp_sk(sk); @@ -220,7 +387,7 @@ int dccp_ioctl(struct sock *sk, int cmd, EXPORT_SYMBOL_GPL(dccp_ioctl); -static int dccp_setsockopt_service(struct sock *sk, const u32 service, +static int dccp_setsockopt_service(struct sock *sk, const __be32 service, char __user *optval, int optlen) { struct dccp_sock *dp = dccp_sk(sk); @@ -255,18 +422,46 @@ static int dccp_setsockopt_service(struc return 0; } -int dccp_setsockopt(struct sock *sk, int level, int optname, - char __user *optval, int optlen) +/* byte 1 is feature. 
the rest is the preference list */ +static int dccp_setsockopt_change(struct sock *sk, int type, + struct dccp_so_feat __user *optval) +{ + struct dccp_so_feat opt; + u8 *val; + int rc; + + if (copy_from_user(&opt, optval, sizeof(opt))) + return -EFAULT; + + val = kmalloc(opt.dccpsf_len, GFP_KERNEL); + if (!val) + return -ENOMEM; + + if (copy_from_user(val, opt.dccpsf_val, opt.dccpsf_len)) { + rc = -EFAULT; + goto out_free_val; + } + + rc = dccp_feat_change(dccp_msk(sk), type, opt.dccpsf_feat, + val, opt.dccpsf_len, GFP_KERNEL); + if (rc) + goto out_free_val; + +out: + return rc; + +out_free_val: + kfree(val); + goto out; +} + +static int do_dccp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) { struct dccp_sock *dp; int err; int val; - if (level != SOL_DCCP) - return inet_csk(sk)->icsk_af_ops->setsockopt(sk, level, - optname, optval, - optlen); - if (optlen < sizeof(int)) return -EINVAL; @@ -284,6 +479,25 @@ int dccp_setsockopt(struct sock *sk, int case DCCP_SOCKOPT_PACKET_SIZE: dp->dccps_packet_size = val; break; + + case DCCP_SOCKOPT_CHANGE_L: + if (optlen != sizeof(struct dccp_so_feat)) + err = -EINVAL; + else + err = dccp_setsockopt_change(sk, DCCPO_CHANGE_L, + (struct dccp_so_feat *) + optval); + break; + + case DCCP_SOCKOPT_CHANGE_R: + if (optlen != sizeof(struct dccp_so_feat)) + err = -EINVAL; + else + err = dccp_setsockopt_change(sk, DCCPO_CHANGE_R, + (struct dccp_so_feat *) + optval); + break; + default: err = -ENOPROTOOPT; break; @@ -293,10 +507,33 @@ int dccp_setsockopt(struct sock *sk, int return err; } +int dccp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + if (level != SOL_DCCP) + return inet_csk(sk)->icsk_af_ops->setsockopt(sk, level, + optname, optval, + optlen); + return do_dccp_setsockopt(sk, level, optname, optval, optlen); +} + EXPORT_SYMBOL_GPL(dccp_setsockopt); +#ifdef CONFIG_COMPAT +int compat_dccp_setsockopt(struct sock *sk, int level, int optname, + char 
__user *optval, int optlen) +{ + if (level != SOL_DCCP) + return inet_csk_compat_setsockopt(sk, level, optname, + optval, optlen); + return do_dccp_setsockopt(sk, level, optname, optval, optlen); +} + +EXPORT_SYMBOL_GPL(compat_dccp_setsockopt); +#endif + static int dccp_getsockopt_service(struct sock *sk, int len, - u32 __user *optval, + __be32 __user *optval, int __user *optlen) { const struct dccp_sock *dp = dccp_sk(sk); @@ -326,16 +563,12 @@ out: return err; } -int dccp_getsockopt(struct sock *sk, int level, int optname, +static int do_dccp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) { struct dccp_sock *dp; int val, len; - if (level != SOL_DCCP) - return inet_csk(sk)->icsk_af_ops->getsockopt(sk, level, - optname, optval, - optlen); if (get_user(len, optlen)) return -EFAULT; @@ -351,7 +584,7 @@ int dccp_getsockopt(struct sock *sk, int break; case DCCP_SOCKOPT_SERVICE: return dccp_getsockopt_service(sk, len, - (u32 __user *)optval, optlen); + (__be32 __user *)optval, optlen); case 128 ... 
191: return ccid_hc_rx_getsockopt(dp->dccps_hc_rx_ccid, sk, optname, len, (u32 __user *)optval, optlen); @@ -368,8 +601,31 @@ int dccp_getsockopt(struct sock *sk, int return 0; } +int dccp_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_DCCP) + return inet_csk(sk)->icsk_af_ops->getsockopt(sk, level, + optname, optval, + optlen); + return do_dccp_getsockopt(sk, level, optname, optval, optlen); +} + EXPORT_SYMBOL_GPL(dccp_getsockopt); +#ifdef CONFIG_COMPAT +int compat_dccp_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_DCCP) + return inet_csk_compat_getsockopt(sk, level, optname, + optval, optlen); + return do_dccp_getsockopt(sk, level, optname, optval, optlen); +} + +EXPORT_SYMBOL_GPL(compat_dccp_getsockopt); +#endif + int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len) { @@ -679,84 +935,7 @@ void dccp_shutdown(struct sock *sk, int EXPORT_SYMBOL_GPL(dccp_shutdown); -static const struct proto_ops inet_dccp_ops = { - .family = PF_INET, - .owner = THIS_MODULE, - .release = inet_release, - .bind = inet_bind, - .connect = inet_stream_connect, - .socketpair = sock_no_socketpair, - .accept = inet_accept, - .getname = inet_getname, - /* FIXME: work on tcp_poll to rename it to inet_csk_poll */ - .poll = dccp_poll, - .ioctl = inet_ioctl, - /* FIXME: work on inet_listen to rename it to sock_common_listen */ - .listen = inet_dccp_listen, - .shutdown = inet_shutdown, - .setsockopt = sock_common_setsockopt, - .getsockopt = sock_common_getsockopt, - .sendmsg = inet_sendmsg, - .recvmsg = sock_common_recvmsg, - .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, -}; - -extern struct net_proto_family inet_family_ops; - -static struct inet_protosw dccp_v4_protosw = { - .type = SOCK_DCCP, - .protocol = IPPROTO_DCCP, - .prot = &dccp_prot, - .ops = &inet_dccp_ops, - .capability = -1, - .no_check = 0, - .flags = 
INET_PROTOSW_ICSK, -}; - -/* - * This is the global socket data structure used for responding to - * the Out-of-the-blue (OOTB) packets. A control sock will be created - * for this socket at the initialization time. - */ -struct socket *dccp_ctl_socket; - -static char dccp_ctl_socket_err_msg[] __initdata = - KERN_ERR "DCCP: Failed to create the control socket.\n"; - -static int __init dccp_ctl_sock_init(void) -{ - int rc = sock_create_kern(PF_INET, SOCK_DCCP, IPPROTO_DCCP, - &dccp_ctl_socket); - if (rc < 0) - printk(dccp_ctl_socket_err_msg); - else { - dccp_ctl_socket->sk->sk_allocation = GFP_ATOMIC; - inet_sk(dccp_ctl_socket->sk)->uc_ttl = -1; - - /* Unhash it so that IP input processing does not even - * see it, we do not wish this socket to see incoming - * packets. - */ - dccp_ctl_socket->sk->sk_prot->unhash(dccp_ctl_socket->sk); - } - - return rc; -} - -#ifdef CONFIG_IP_DCCP_UNLOAD_HACK -void dccp_ctl_sock_exit(void) -{ - if (dccp_ctl_socket != NULL) { - sock_release(dccp_ctl_socket); - dccp_ctl_socket = NULL; - } -} - -EXPORT_SYMBOL_GPL(dccp_ctl_sock_exit); -#endif - -static int __init init_dccp_v4_mibs(void) +static int __init dccp_mib_init(void) { int rc = -ENOMEM; @@ -778,6 +957,13 @@ out_free_one: } +static void dccp_mib_exit(void) +{ + free_percpu(dccp_statistics[0]); + free_percpu(dccp_statistics[1]); + dccp_statistics[0] = dccp_statistics[1] = NULL; +} + static int thash_entries; module_param(thash_entries, int, 0444); MODULE_PARM_DESC(thash_entries, "Number of ehash buckets"); @@ -794,17 +980,14 @@ static int __init dccp_init(void) { unsigned long goal; int ehash_order, bhash_order, i; - int rc = proto_register(&dccp_prot, 1); - - if (rc) - goto out; + int rc = -ENOBUFS; dccp_hashinfo.bind_bucket_cachep = kmem_cache_create("dccp_bind_bucket", sizeof(struct inet_bind_bucket), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); if (!dccp_hashinfo.bind_bucket_cachep) - goto out_proto_unregister; + goto out; /* * Size and allocate the main established and bind bucket @@ 
-866,27 +1049,23 @@ static int __init dccp_init(void) INIT_HLIST_HEAD(&dccp_hashinfo.bhash[i].chain); } - if (init_dccp_v4_mibs()) + rc = dccp_mib_init(); + if (rc) goto out_free_dccp_bhash; - rc = -EAGAIN; - if (inet_add_protocol(&dccp_protocol, IPPROTO_DCCP)) - goto out_free_dccp_v4_mibs; - - inet_register_protosw(&dccp_v4_protosw); + rc = dccp_ackvec_init(); + if (rc) + goto out_free_dccp_mib; - rc = dccp_ctl_sock_init(); + rc = dccp_sysctl_init(); if (rc) - goto out_unregister_protosw; + goto out_ackvec_exit; out: return rc; -out_unregister_protosw: - inet_unregister_protosw(&dccp_v4_protosw); - inet_del_protocol(&dccp_protocol, IPPROTO_DCCP); -out_free_dccp_v4_mibs: - free_percpu(dccp_statistics[0]); - free_percpu(dccp_statistics[1]); - dccp_statistics[0] = dccp_statistics[1] = NULL; +out_ackvec_exit: + dccp_ackvec_exit(); +out_free_dccp_mib: + dccp_mib_exit(); out_free_dccp_bhash: free_pages((unsigned long)dccp_hashinfo.bhash, bhash_order); dccp_hashinfo.bhash = NULL; @@ -896,23 +1075,12 @@ out_free_dccp_ehash: out_free_bind_bucket_cachep: kmem_cache_destroy(dccp_hashinfo.bind_bucket_cachep); dccp_hashinfo.bind_bucket_cachep = NULL; -out_proto_unregister: - proto_unregister(&dccp_prot); goto out; } -static const char dccp_del_proto_err_msg[] __exitdata = - KERN_ERR "can't remove dccp net_protocol\n"; - static void __exit dccp_fini(void) { - inet_unregister_protosw(&dccp_v4_protosw); - - if (inet_del_protocol(&dccp_protocol, IPPROTO_DCCP) < 0) - printk(dccp_del_proto_err_msg); - - free_percpu(dccp_statistics[0]); - free_percpu(dccp_statistics[1]); + dccp_mib_exit(); free_pages((unsigned long)dccp_hashinfo.bhash, get_order(dccp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))); @@ -920,19 +1088,13 @@ static void __exit dccp_fini(void) get_order(dccp_hashinfo.ehash_size * sizeof(struct inet_ehash_bucket))); kmem_cache_destroy(dccp_hashinfo.bind_bucket_cachep); - proto_unregister(&dccp_prot); + dccp_ackvec_exit(); + dccp_sysctl_exit(); } 
module_init(dccp_init); module_exit(dccp_fini); -/* - * __stringify doesn't likes enums, so use SOCK_DCCP (6) and IPPROTO_DCCP (33) - * values directly, Also cover the case where the protocol is not specified, - * i.e. net-pf-PF_INET-proto-0-type-SOCK_DCCP - */ -MODULE_ALIAS("net-pf-" __stringify(PF_INET) "-proto-33-type-6"); -MODULE_ALIAS("net-pf-" __stringify(PF_INET) "-proto-0-type-6"); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Arnaldo Carvalho de Melo "); MODULE_DESCRIPTION("DCCP - Datagram Congestion Controlled Protocol"); diff -puN /dev/null net/dccp/sysctl.c --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/net/dccp/sysctl.c 2006-03-17 23:03:48.000000000 -0800 @@ -0,0 +1,124 @@ +/* + * net/dccp/sysctl.c + * + * An implementation of the DCCP protocol + * Arnaldo Carvalho de Melo + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License v2 + * as published by the Free Software Foundation. + */ + +#include +#include +#include + +#ifndef CONFIG_SYSCTL +#error This file should not be compiled without CONFIG_SYSCTL defined +#endif + +extern int dccp_feat_default_sequence_window; +extern int dccp_feat_default_rx_ccid; +extern int dccp_feat_default_tx_ccid; +extern int dccp_feat_default_ack_ratio; +extern int dccp_feat_default_send_ack_vector; +extern int dccp_feat_default_send_ndp_count; + +static struct ctl_table dccp_default_table[] = { + { + .ctl_name = NET_DCCP_DEFAULT_SEQ_WINDOW, + .procname = "seq_window", + .data = &dccp_feat_default_sequence_window, + .maxlen = sizeof(dccp_feat_default_sequence_window), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .ctl_name = NET_DCCP_DEFAULT_RX_CCID, + .procname = "rx_ccid", + .data = &dccp_feat_default_rx_ccid, + .maxlen = sizeof(dccp_feat_default_rx_ccid), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .ctl_name = NET_DCCP_DEFAULT_TX_CCID, + .procname = "tx_ccid", + .data = &dccp_feat_default_tx_ccid, + 
.maxlen = sizeof(dccp_feat_default_tx_ccid), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .ctl_name = NET_DCCP_DEFAULT_ACK_RATIO, + .procname = "ack_ratio", + .data = &dccp_feat_default_ack_ratio, + .maxlen = sizeof(dccp_feat_default_ack_ratio), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .ctl_name = NET_DCCP_DEFAULT_SEND_ACKVEC, + .procname = "send_ackvec", + .data = &dccp_feat_default_send_ack_vector, + .maxlen = sizeof(dccp_feat_default_send_ack_vector), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .ctl_name = NET_DCCP_DEFAULT_SEND_NDP, + .procname = "send_ndp", + .data = &dccp_feat_default_send_ndp_count, + .maxlen = sizeof(dccp_feat_default_send_ndp_count), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { .ctl_name = 0, } +}; + +static struct ctl_table dccp_table[] = { + { + .ctl_name = NET_DCCP_DEFAULT, + .procname = "default", + .mode = 0555, + .child = dccp_default_table, + }, + { .ctl_name = 0, }, +}; + +static struct ctl_table dccp_dir_table[] = { + { + .ctl_name = NET_DCCP, + .procname = "dccp", + .mode = 0555, + .child = dccp_table, + }, + { .ctl_name = 0, }, +}; + +static struct ctl_table dccp_root_table[] = { + { + .ctl_name = CTL_NET, + .procname = "net", + .mode = 0555, + .child = dccp_dir_table, + }, + { .ctl_name = 0, }, +}; + +static struct ctl_table_header *dccp_table_header; + +int __init dccp_sysctl_init(void) +{ + dccp_table_header = register_sysctl_table(dccp_root_table, 1); + + return dccp_table_header != NULL ? 0 : -ENOMEM; +} + +void dccp_sysctl_exit(void) +{ + if (dccp_table_header != NULL) { + unregister_sysctl_table(dccp_table_header); + dccp_table_header = NULL; + } +} diff -puN net/dccp/timer.c~git-net net/dccp/timer.c --- devel/net/dccp/timer.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/dccp/timer.c 2006-03-17 23:03:48.000000000 -0800 @@ -31,7 +31,7 @@ static void dccp_write_err(struct sock * sk->sk_err = sk->sk_err_soft ? 
: ETIMEDOUT; sk->sk_error_report(sk); - dccp_v4_send_reset(sk, DCCP_RESET_CODE_ABORTED); + dccp_send_reset(sk, DCCP_RESET_CODE_ABORTED); dccp_done(sk); DCCP_INC_STATS_BH(DCCP_MIB_ABORTONTIMEOUT); } @@ -141,6 +141,17 @@ static void dccp_retransmit_timer(struct { struct inet_connection_sock *icsk = inet_csk(sk); + /* retransmit timer is used for feature negotiation throughout + * connection. In this case, no packet is re-transmitted, but rather an + * ack is generated and pending changes are placed into its options. + */ + if (sk->sk_send_head == NULL) { + dccp_pr_debug("feat negotiation retransmit timeout %p\n", sk); + if (sk->sk_state == DCCP_OPEN) + dccp_send_ack(sk); + goto backoff; + } + /* * sk->sk_send_head has to have one skb with * DCCP_SKB_CB(skb)->dccpd_type set to one of the retransmittable DCCP @@ -177,6 +188,7 @@ static void dccp_retransmit_timer(struct goto out; } +backoff: icsk->icsk_backoff++; icsk->icsk_retransmits++; diff -puN net/decnet/af_decnet.c~git-net net/decnet/af_decnet.c --- devel/net/decnet/af_decnet.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/decnet/af_decnet.c 2006-03-17 23:03:48.000000000 -0800 @@ -172,7 +172,7 @@ static struct hlist_head *dn_find_list(s /* * Valid ports are those greater than zero and not already in use.
*/ -static int check_port(unsigned short port) +static int check_port(__le16 port) { struct sock *sk; struct hlist_node *node; @@ -661,7 +661,7 @@ disc_reject: } } -char *dn_addr2asc(dn_address addr, char *buf) +char *dn_addr2asc(__u16 addr, char *buf) { unsigned short node, area; @@ -801,7 +801,7 @@ static int dn_auto_bind(struct socket *s /* End of compatibility stuff */ scp->addr.sdn_add.a_len = dn_htons(2); - rv = dn_dev_bind_default((dn_address *)scp->addr.sdn_add.a_addr); + rv = dn_dev_bind_default((__le16 *)scp->addr.sdn_add.a_addr); if (rv == 0) { rv = dn_hash_sock(sk); if (rv) @@ -1021,7 +1021,7 @@ static void dn_user_copy(struct sk_buff opt->opt_optl = *ptr++; opt->opt_status = 0; memcpy(opt->opt_data, ptr, opt->opt_optl); - skb_pull(skb, opt->opt_optl + 1); + skb_pull(skb, dn_ntohs(opt->opt_optl) + 1); } @@ -1121,8 +1121,8 @@ static int dn_accept(struct socket *sock skb_pull(skb, dn_username2sockaddr(skb->data, skb->len, &(DN_SK(newsk)->addr), &type)); skb_pull(skb, dn_username2sockaddr(skb->data, skb->len, &(DN_SK(newsk)->peer), &type)); - *(dn_address *)(DN_SK(newsk)->peer.sdn_add.a_addr) = cb->src; - *(dn_address *)(DN_SK(newsk)->addr.sdn_add.a_addr) = cb->dst; + *(__le16 *)(DN_SK(newsk)->peer.sdn_add.a_addr) = cb->src; + *(__le16 *)(DN_SK(newsk)->addr.sdn_add.a_addr) = cb->dst; menuver = *skb->data; skb_pull(skb, 1); @@ -1365,7 +1365,7 @@ static int __dn_setsockopt(struct socket if (optlen != sizeof(struct optdata_dn)) return -EINVAL; - if (u.opt.opt_optl > 16) + if (dn_ntohs(u.opt.opt_optl) > 16) return -EINVAL; memcpy(&scp->conndata_out, &u.opt, optlen); @@ -1378,7 +1378,7 @@ static int __dn_setsockopt(struct socket if (optlen != sizeof(struct optdata_dn)) return -EINVAL; - if (u.opt.opt_optl > 16) + if (dn_ntohs(u.opt.opt_optl) > 16) return -EINVAL; memcpy(&scp->discdata_out, &u.opt, optlen); @@ -1693,7 +1693,7 @@ static int dn_recvmsg(struct kiocb *iocb if (rv) goto out; - if (flags & ~(MSG_PEEK|MSG_OOB|MSG_WAITALL|MSG_DONTWAIT|MSG_NOSIGNAL)) { + 
if (flags & ~(MSG_CMSG_COMPAT|MSG_PEEK|MSG_OOB|MSG_WAITALL|MSG_DONTWAIT|MSG_NOSIGNAL)) { rv = -EOPNOTSUPP; goto out; } diff -puN net/decnet/dn_dev.c~git-net net/decnet/dn_dev.c --- devel/net/decnet/dn_dev.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/decnet/dn_dev.c 2006-03-17 23:03:48.000000000 -0800 @@ -64,7 +64,7 @@ extern struct neigh_table dn_neigh_table /* * decnet_address is kept in network order. */ -dn_address decnet_address = 0; +__le16 decnet_address = 0; static DEFINE_RWLOCK(dndev_lock); static struct net_device *decnet_default_device; @@ -439,7 +439,7 @@ static void dn_dev_del_ifa(struct dn_dev *ifap = ifa1->ifa_next; if (dn_db->dev->type == ARPHRD_ETHER) { - if (ifa1->ifa_local != dn_htons(dn_eth2dn(dev->dev_addr))) { + if (ifa1->ifa_local != dn_eth2dn(dev->dev_addr)) { dn_dn2eth(mac_addr, ifa1->ifa_local); dev_mc_delete(dev, mac_addr, ETH_ALEN, 0); } @@ -470,7 +470,7 @@ static int dn_dev_insert_ifa(struct dn_d } if (dev->type == ARPHRD_ETHER) { - if (ifa->ifa_local != dn_htons(dn_eth2dn(dev->dev_addr))) { + if (ifa->ifa_local != dn_eth2dn(dev->dev_addr)) { dn_dn2eth(mac_addr, ifa->ifa_local); dev_mc_add(dev, mac_addr, ETH_ALEN, 0); dev_mc_upload(dev); @@ -561,7 +561,7 @@ int dn_dev_ioctl(unsigned int cmd, void switch(cmd) { case SIOCGIFADDR: - *((dn_address *)sdn->sdn_nodeaddr) = ifa->ifa_local; + *((__le16 *)sdn->sdn_nodeaddr) = ifa->ifa_local; goto rarok; case SIOCSIFADDR: @@ -804,7 +804,7 @@ done: return skb->len; } -static int dn_dev_get_first(struct net_device *dev, dn_address *addr) +static int dn_dev_get_first(struct net_device *dev, __le16 *addr) { struct dn_dev *dn_db = (struct dn_dev *)dev->dn_ptr; struct dn_ifaddr *ifa; @@ -830,7 +830,7 @@ out: * a sensible default. Eventually the routing code will take care of all the * nasties for us I hope. 
*/ -int dn_dev_bind_default(dn_address *addr) +int dn_dev_bind_default(__le16 *addr) { struct net_device *dev; int rv; @@ -853,7 +853,7 @@ static void dn_send_endnode_hello(struct { struct endnode_hello_message *msg; struct sk_buff *skb = NULL; - unsigned short int *pktlen; + __le16 *pktlen; struct dn_dev *dn_db = (struct dn_dev *)dev->dn_ptr; if ((skb = dn_alloc_skb(NULL, sizeof(*msg), GFP_ATOMIC)) == NULL) @@ -882,7 +882,7 @@ static void dn_send_endnode_hello(struct msg->datalen = 0x02; memset(msg->data, 0xAA, 2); - pktlen = (unsigned short *)skb_push(skb,2); + pktlen = (__le16 *)skb_push(skb,2); *pktlen = dn_htons(skb->len - 2); skb->nh.raw = skb->data; @@ -926,7 +926,7 @@ static void dn_send_router_hello(struct size_t size; unsigned char *ptr; unsigned char *i1, *i2; - unsigned short *pktlen; + __le16 *pktlen; char *src; if (mtu2blksize(dev) < (26 + 7)) @@ -955,11 +955,11 @@ static void dn_send_router_hello(struct ptr += ETH_ALEN; *ptr++ = dn_db->parms.forwarding == 1 ? DN_RT_INFO_L1RT : DN_RT_INFO_L2RT; - *((unsigned short *)ptr) = dn_htons(mtu2blksize(dev)); + *((__le16 *)ptr) = dn_htons(mtu2blksize(dev)); ptr += 2; *ptr++ = dn_db->parms.priority; /* Priority */ *ptr++ = 0; /* Area: Reserved */ - *((unsigned short *)ptr) = dn_htons((unsigned short)dn_db->parms.t3); + *((__le16 *)ptr) = dn_htons((unsigned short)dn_db->parms.t3); ptr += 2; *ptr++ = 0; /* MPD: Reserved */ i1 = ptr++; @@ -974,7 +974,7 @@ static void dn_send_router_hello(struct skb_trim(skb, (27 + *i2)); - pktlen = (unsigned short *)skb_push(skb, 2); + pktlen = (__le16 *)skb_push(skb, 2); *pktlen = dn_htons(skb->len - 2); skb->nh.raw = skb->data; @@ -1016,7 +1016,7 @@ static void dn_send_ptp_hello(struct net ptr = skb_put(skb, 2 + 4 + tdlen); *ptr++ = DN_RT_PKT_HELO; - *((dn_address *)ptr) = ifa->ifa_local; + *((__le16 *)ptr) = ifa->ifa_local; ptr += 2; *ptr++ = tdlen; @@ -1150,7 +1150,7 @@ struct dn_dev *dn_dev_create(struct net_ void dn_dev_up(struct net_device *dev) { struct dn_ifaddr *ifa; - 
dn_address addr = decnet_address; + __le16 addr = decnet_address; int maybe_default = 0; struct dn_dev *dn_db = (struct dn_dev *)dev->dn_ptr; @@ -1173,7 +1173,7 @@ void dn_dev_up(struct net_device *dev) if (dev->type == ARPHRD_ETHER) { if (memcmp(dev->dev_addr, dn_hiord, 4) != 0) return; - addr = dn_htons(dn_eth2dn(dev->dev_addr)); + addr = dn_eth2dn(dev->dev_addr); maybe_default = 1; } @@ -1385,8 +1385,8 @@ static int dn_dev_seq_show(struct seq_fi mtu2blksize(dev), dn_db->parms.priority, dn_db->parms.state, dn_db->parms.name, - dn_db->router ? dn_addr2asc(dn_ntohs(*(dn_address *)dn_db->router->primary_key), router_buf) : "", - dn_db->peer ? dn_addr2asc(dn_ntohs(*(dn_address *)dn_db->peer->primary_key), peer_buf) : ""); + dn_db->router ? dn_addr2asc(dn_ntohs(*(__le16 *)dn_db->router->primary_key), router_buf) : "", + dn_db->peer ? dn_addr2asc(dn_ntohs(*(__le16 *)dn_db->peer->primary_key), peer_buf) : ""); } return 0; } diff -puN net/decnet/dn_fib.c~git-net net/decnet/dn_fib.c --- devel/net/decnet/dn_fib.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/decnet/dn_fib.c 2006-03-17 23:03:48.000000000 -0800 @@ -143,11 +143,11 @@ static inline struct dn_fib_info *dn_fib return NULL; } -u16 dn_fib_get_attr16(struct rtattr *attr, int attrlen, int type) +__le16 dn_fib_get_attr16(struct rtattr *attr, int attrlen, int type) { while(RTA_OK(attr,attrlen)) { if (attr->rta_type == type) - return *(u16*)RTA_DATA(attr); + return *(__le16*)RTA_DATA(attr); attr = RTA_NEXT(attr, attrlen); } @@ -565,7 +565,7 @@ int dn_fib_dump(struct sk_buff *skb, str return skb->len; } -static void fib_magic(int cmd, int type, __u16 dst, int dst_len, struct dn_ifaddr *ifa) +static void fib_magic(int cmd, int type, __le16 dst, int dst_len, struct dn_ifaddr *ifa) { struct dn_fib_table *tb; struct { @@ -684,7 +684,7 @@ static int dn_fib_dnaddr_event(struct no return NOTIFY_DONE; } -int dn_fib_sync_down(dn_address local, struct net_device *dev, int force) +int dn_fib_sync_down(__le16 
local, struct net_device *dev, int force) { int ret = 0; int scope = RT_SCOPE_NOWHERE; diff -puN net/decnet/dn_neigh.c~git-net net/decnet/dn_neigh.c --- devel/net/decnet/dn_neigh.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/decnet/dn_neigh.c 2006-03-17 23:03:48.000000000 -0800 @@ -95,7 +95,7 @@ static struct neigh_ops dn_phase3_ops = struct neigh_table dn_neigh_table = { .family = PF_DECnet, .entry_size = sizeof(struct dn_neigh), - .key_len = sizeof(dn_address), + .key_len = sizeof(__le16), .hash = dn_neigh_hash, .constructor = dn_neigh_construct, .id = "dn_neigh_cache", @@ -123,7 +123,7 @@ struct neigh_table dn_neigh_table = { static u32 dn_neigh_hash(const void *pkey, const struct net_device *dev) { - return jhash_2words(*(dn_address *)pkey, 0, dn_neigh_table.hash_rnd); + return jhash_2words(*(__u16 *)pkey, 0, dn_neigh_table.hash_rnd); } static int dn_neigh_construct(struct neighbour *neigh) @@ -249,14 +249,14 @@ static int dn_long_output(struct sk_buff data = skb_push(skb, sizeof(struct dn_long_packet) + 3); lp = (struct dn_long_packet *)(data+3); - *((unsigned short *)data) = dn_htons(skb->len - 2); + *((__le16 *)data) = dn_htons(skb->len - 2); *(data + 2) = 1 | DN_RT_F_PF; /* Padding */ lp->msgflg = DN_RT_PKT_LONG|(cb->rt_flags&(DN_RT_F_IE|DN_RT_F_RQR|DN_RT_F_RTS)); lp->d_area = lp->d_subarea = 0; - dn_dn2eth(lp->d_id, dn_ntohs(cb->dst)); + dn_dn2eth(lp->d_id, cb->dst); lp->s_area = lp->s_subarea = 0; - dn_dn2eth(lp->s_id, dn_ntohs(cb->src)); + dn_dn2eth(lp->s_id, cb->src); lp->nl2 = 0; lp->visit_ct = cb->hops & 0x3f; lp->s_class = 0; @@ -293,7 +293,7 @@ static int dn_short_output(struct sk_buf } data = skb_push(skb, sizeof(struct dn_short_packet) + 2); - *((unsigned short *)data) = dn_htons(skb->len - 2); + *((__le16 *)data) = dn_htons(skb->len - 2); sp = (struct dn_short_packet *)(data+2); sp->msgflg = DN_RT_PKT_SHORT|(cb->rt_flags&(DN_RT_F_RQR|DN_RT_F_RTS)); @@ -335,7 +335,7 @@ static int dn_phase3_output(struct sk_bu } data = 
skb_push(skb, sizeof(struct dn_short_packet) + 2); - *((unsigned short *)data) = dn_htons(skb->len - 2); + *((__le16 *)data) = dn_htons(skb->len - 2); sp = (struct dn_short_packet *)(data + 2); sp->msgflg = DN_RT_PKT_SHORT|(cb->rt_flags&(DN_RT_F_RQR|DN_RT_F_RTS)); @@ -373,9 +373,9 @@ int dn_neigh_router_hello(struct sk_buff struct neighbour *neigh; struct dn_neigh *dn; struct dn_dev *dn_db; - dn_address src; + __le16 src; - src = dn_htons(dn_eth2dn(msg->id)); + src = dn_eth2dn(msg->id); neigh = __neigh_lookup(&dn_neigh_table, &src, skb->dev, 1); @@ -409,7 +409,7 @@ int dn_neigh_router_hello(struct sk_buff } /* Only use routers in our area */ - if ((dn_ntohs(src)>>10) == dn_ntohs((decnet_address)>>10)) { + if ((dn_ntohs(src)>>10) == (dn_ntohs((decnet_address))>>10)) { if (!dn_db->router) { dn_db->router = neigh_clone(neigh); } else { @@ -433,9 +433,9 @@ int dn_neigh_endnode_hello(struct sk_buf struct endnode_hello_message *msg = (struct endnode_hello_message *)skb->data; struct neighbour *neigh; struct dn_neigh *dn; - dn_address src; + __le16 src; - src = dn_htons(dn_eth2dn(msg->id)); + src = dn_eth2dn(msg->id); neigh = __neigh_lookup(&dn_neigh_table, &src, skb->dev, 1); diff -puN net/decnet/dn_nsp_in.c~git-net net/decnet/dn_nsp_in.c --- devel/net/decnet/dn_nsp_in.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/decnet/dn_nsp_in.c 2006-03-17 23:03:48.000000000 -0800 @@ -85,7 +85,7 @@ static void dn_log_martian(struct sk_buf if (decnet_log_martians && net_ratelimit()) { char *devname = skb->dev ? 
skb->dev->name : "???"; struct dn_skb_cb *cb = DN_SKB_CB(skb); - printk(KERN_INFO "DECnet: Martian packet (%s) dev=%s src=0x%04hx dst=0x%04hx srcport=0x%04hx dstport=0x%04hx\n", msg, devname, cb->src, cb->dst, cb->src_port, cb->dst_port); + printk(KERN_INFO "DECnet: Martian packet (%s) dev=%s src=0x%04hx dst=0x%04hx srcport=0x%04hx dstport=0x%04hx\n", msg, devname, dn_ntohs(cb->src), dn_ntohs(cb->dst), dn_ntohs(cb->src_port), dn_ntohs(cb->dst_port)); } } @@ -128,7 +128,7 @@ static void dn_ack(struct sock *sk, stru */ static int dn_process_ack(struct sock *sk, struct sk_buff *skb, int oth) { - unsigned short *ptr = (unsigned short *)skb->data; + __le16 *ptr = (__le16 *)skb->data; int len = 0; unsigned short ack; @@ -346,7 +346,7 @@ static void dn_nsp_conn_conf(struct sock ptr = skb->data; cb->services = *ptr++; cb->info = *ptr++; - cb->segsize = dn_ntohs(*(__u16 *)ptr); + cb->segsize = dn_ntohs(*(__le16 *)ptr); if ((scp->state == DN_CI) || (scp->state == DN_CD)) { scp->persist = 0; @@ -363,7 +363,7 @@ static void dn_nsp_conn_conf(struct sock if (skb->len > 0) { unsigned char dlen = *skb->data; if ((dlen <= 16) && (dlen <= skb->len)) { - scp->conndata_in.opt_optl = dlen; + scp->conndata_in.opt_optl = dn_htons((__u16)dlen); memcpy(scp->conndata_in.opt_data, skb->data + 1, dlen); } } @@ -397,17 +397,17 @@ static void dn_nsp_disc_init(struct sock if (skb->len < 2) goto out; - reason = dn_ntohs(*(__u16 *)skb->data); + reason = dn_ntohs(*(__le16 *)skb->data); skb_pull(skb, 2); - scp->discdata_in.opt_status = reason; + scp->discdata_in.opt_status = dn_htons(reason); scp->discdata_in.opt_optl = 0; memset(scp->discdata_in.opt_data, 0, 16); if (skb->len > 0) { unsigned char dlen = *skb->data; if ((dlen <= 16) && (dlen <= skb->len)) { - scp->discdata_in.opt_optl = dlen; + scp->discdata_in.opt_optl = dn_htons((__u16)dlen); memcpy(scp->discdata_in.opt_data, skb->data + 1, dlen); } } @@ -464,7 +464,7 @@ static void dn_nsp_disc_conf(struct sock if (skb->len != 2) goto out; - 
reason = dn_ntohs(*(__u16 *)skb->data); + reason = dn_ntohs(*(__le16 *)skb->data); sk->sk_state = TCP_CLOSE; @@ -513,7 +513,7 @@ static void dn_nsp_linkservice(struct so if (skb->len != 4) goto out; - segnum = dn_ntohs(*(__u16 *)ptr); + segnum = dn_ntohs(*(__le16 *)ptr); ptr += 2; lsflags = *(unsigned char *)ptr++; fcval = *ptr; @@ -621,7 +621,7 @@ static void dn_nsp_otherdata(struct sock if (skb->len < 2) goto out; - cb->segnum = segnum = dn_ntohs(*(__u16 *)skb->data); + cb->segnum = segnum = dn_ntohs(*(__le16 *)skb->data); skb_pull(skb, 2); if (seq_next(scp->numoth_rcv, segnum)) { @@ -649,7 +649,7 @@ static void dn_nsp_data(struct sock *sk, if (skb->len < 2) goto out; - cb->segnum = segnum = dn_ntohs(*(__u16 *)skb->data); + cb->segnum = segnum = dn_ntohs(*(__le16 *)skb->data); skb_pull(skb, 2); if (seq_next(scp->numdat_rcv, segnum)) { @@ -760,7 +760,7 @@ static int dn_nsp_rx_packet(struct sk_bu /* * Grab the destination address. */ - cb->dst_port = *(unsigned short *)ptr; + cb->dst_port = *(__le16 *)ptr; cb->src_port = 0; ptr += 2; @@ -768,7 +768,7 @@ static int dn_nsp_rx_packet(struct sk_bu * If not a connack, grab the source address too. */ if (pskb_may_pull(skb, 5)) { - cb->src_port = *(unsigned short *)ptr; + cb->src_port = *(__le16 *)ptr; ptr += 2; skb_pull(skb, 5); } @@ -778,7 +778,7 @@ static int dn_nsp_rx_packet(struct sk_bu * Swap src & dst and look up in the normal way. 
*/ if (unlikely(cb->rt_flags & DN_RT_F_RTS)) { - unsigned short tmp = cb->dst_port; + __le16 tmp = cb->dst_port; cb->dst_port = cb->src_port; cb->src_port = tmp; tmp = cb->dst; diff -puN net/decnet/dn_nsp_out.c~git-net net/decnet/dn_nsp_out.c --- devel/net/decnet/dn_nsp_out.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/decnet/dn_nsp_out.c 2006-03-17 23:03:48.000000000 -0800 @@ -287,26 +287,26 @@ int dn_nsp_xmit_timeout(struct sock *sk) return 0; } -static inline unsigned char *dn_mk_common_header(struct dn_scp *scp, struct sk_buff *skb, unsigned char msgflag, int len) +static inline __le16 *dn_mk_common_header(struct dn_scp *scp, struct sk_buff *skb, unsigned char msgflag, int len) { unsigned char *ptr = skb_push(skb, len); BUG_ON(len < 5); *ptr++ = msgflag; - *((unsigned short *)ptr) = scp->addrrem; + *((__le16 *)ptr) = scp->addrrem; ptr += 2; - *((unsigned short *)ptr) = scp->addrloc; + *((__le16 *)ptr) = scp->addrloc; ptr += 2; - return ptr; + return (__le16 __force *)ptr; } -static unsigned short *dn_mk_ack_header(struct sock *sk, struct sk_buff *skb, unsigned char msgflag, int hlen, int other) +static __le16 *dn_mk_ack_header(struct sock *sk, struct sk_buff *skb, unsigned char msgflag, int hlen, int other) { struct dn_scp *scp = DN_SK(sk); unsigned short acknum = scp->numdat_rcv & 0x0FFF; unsigned short ackcrs = scp->numoth_rcv & 0x0FFF; - unsigned short *ptr; + __le16 *ptr; BUG_ON(hlen < 9); @@ -325,7 +325,7 @@ static unsigned short *dn_mk_ack_header( /* Set "cross subchannel" bit in ackcrs */ ackcrs |= 0x2000; - ptr = (unsigned short *)dn_mk_common_header(scp, skb, msgflag, hlen); + ptr = (__le16 *)dn_mk_common_header(scp, skb, msgflag, hlen); *ptr++ = dn_htons(acknum); *ptr++ = dn_htons(ackcrs); @@ -333,11 +333,11 @@ static unsigned short *dn_mk_ack_header( return ptr; } -static unsigned short *dn_nsp_mk_data_header(struct sock *sk, struct sk_buff *skb, int oth) +static __le16 *dn_nsp_mk_data_header(struct sock *sk, struct sk_buff *skb, 
int oth) { struct dn_scp *scp = DN_SK(sk); struct dn_skb_cb *cb = DN_SKB_CB(skb); - unsigned short *ptr = dn_mk_ack_header(sk, skb, cb->nsp_flags, 11, oth); + __le16 *ptr = dn_mk_ack_header(sk, skb, cb->nsp_flags, 11, oth); if (unlikely(oth)) { cb->segnum = scp->numoth; @@ -524,9 +524,9 @@ void dn_send_conn_conf(struct sock *sk, struct dn_scp *scp = DN_SK(sk); struct sk_buff *skb = NULL; struct nsp_conn_init_msg *msg; - unsigned char len = scp->conndata_out.opt_optl; + __u8 len = (__u8)dn_ntohs(scp->conndata_out.opt_optl); - if ((skb = dn_alloc_skb(sk, 50 + scp->conndata_out.opt_optl, gfp)) == NULL) + if ((skb = dn_alloc_skb(sk, 50 + dn_ntohs(scp->conndata_out.opt_optl), gfp)) == NULL) return; msg = (struct nsp_conn_init_msg *)skb_put(skb, sizeof(*msg)); @@ -553,7 +553,7 @@ void dn_send_conn_conf(struct sock *sk, static __inline__ void dn_nsp_do_disc(struct sock *sk, unsigned char msgflg, unsigned short reason, gfp_t gfp, struct dst_entry *dst, - int ddl, unsigned char *dd, __u16 rem, __u16 loc) + int ddl, unsigned char *dd, __le16 rem, __le16 loc) { struct sk_buff *skb = NULL; int size = 7 + ddl + ((msgflg == NSP_DISCINIT) ? 1 : 0); @@ -561,7 +561,7 @@ static __inline__ void dn_nsp_do_disc(st if ((dst == NULL) || (rem == 0)) { if (net_ratelimit()) - printk(KERN_DEBUG "DECnet: dn_nsp_do_disc: BUG! Please report this to SteveW@ACM.org rem=%u dst=%p\n", (unsigned)rem, dst); + printk(KERN_DEBUG "DECnet: dn_nsp_do_disc: BUG! 
Please report this to SteveW@ACM.org rem=%u dst=%p\n", dn_ntohs(rem), dst); return; } @@ -570,11 +570,11 @@ static __inline__ void dn_nsp_do_disc(st msg = skb_put(skb, size); *msg++ = msgflg; - *(__u16 *)msg = rem; + *(__le16 *)msg = rem; msg += 2; - *(__u16 *)msg = loc; + *(__le16 *)msg = loc; msg += 2; - *(__u16 *)msg = dn_htons(reason); + *(__le16 *)msg = dn_htons(reason); msg += 2; if (msgflg == NSP_DISCINIT) *msg++ = ddl; @@ -600,10 +600,10 @@ void dn_nsp_send_disc(struct sock *sk, u int ddl = 0; if (msgflg == NSP_DISCINIT) - ddl = scp->discdata_out.opt_optl; + ddl = dn_ntohs(scp->discdata_out.opt_optl); if (reason == 0) - reason = scp->discdata_out.opt_status; + reason = dn_ntohs(scp->discdata_out.opt_status); dn_nsp_do_disc(sk, msgflg, reason, gfp, sk->sk_dst_cache, ddl, scp->discdata_out.opt_data, scp->addrrem, scp->addrloc); @@ -708,7 +708,7 @@ void dn_nsp_send_conninit(struct sock *s if (aux > 0) memcpy(skb_put(skb, aux), scp->accessdata.acc_acc, aux); - aux = scp->conndata_out.opt_optl; + aux = (__u8)dn_ntohs(scp->conndata_out.opt_optl); *skb_put(skb, 1) = aux; if (aux > 0) memcpy(skb_put(skb,aux), scp->conndata_out.opt_data, aux); diff -puN net/decnet/dn_route.c~git-net net/decnet/dn_route.c --- devel/net/decnet/dn_route.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/decnet/dn_route.c 2006-03-17 23:03:48.000000000 -0800 @@ -133,9 +133,9 @@ static struct dst_ops dn_dst_ops = { .entries = ATOMIC_INIT(0), }; -static __inline__ unsigned dn_hash(unsigned short src, unsigned short dst) +static __inline__ unsigned dn_hash(__le16 src, __le16 dst) { - unsigned short tmp = src ^ dst; + __u16 tmp = (__u16 __force)(src ^ dst); tmp ^= (tmp >> 3); tmp ^= (tmp >> 5); tmp ^= (tmp >> 10); @@ -149,8 +149,7 @@ static inline void dnrt_free(struct dn_r static inline void dnrt_drop(struct dn_route *rt) { - if (rt) - dst_release(&rt->u.dst); + dst_release(&rt->u.dst); call_rcu_bh(&rt->u.dst.rcu_head, dst_rcu_free); } @@ -379,9 +378,9 @@ static int 
dn_return_short(struct sk_buf { struct dn_skb_cb *cb; unsigned char *ptr; - dn_address *src; - dn_address *dst; - dn_address tmp; + __le16 *src; + __le16 *dst; + __le16 tmp; /* Add back headers */ skb_push(skb, skb->data - skb->nh.raw); @@ -394,9 +393,9 @@ static int dn_return_short(struct sk_buf ptr = skb->data + 2; *ptr++ = (cb->rt_flags & ~DN_RT_F_RQR) | DN_RT_F_RTS; - dst = (dn_address *)ptr; + dst = (__le16 *)ptr; ptr += 2; - src = (dn_address *)ptr; + src = (__le16 *)ptr; ptr += 2; *ptr = 0; /* Zero hop count */ @@ -475,7 +474,8 @@ static int dn_route_rx_packet(struct sk_ struct dn_skb_cb *cb = DN_SKB_CB(skb); printk(KERN_DEBUG "DECnet: dn_route_rx_packet: rt_flags=0x%02x dev=%s len=%d src=0x%04hx dst=0x%04hx err=%d type=%d\n", - (int)cb->rt_flags, devname, skb->len, cb->src, cb->dst, + (int)cb->rt_flags, devname, skb->len, + dn_ntohs(cb->src), dn_ntohs(cb->dst), err, skb->pkt_type); } @@ -505,7 +505,7 @@ static int dn_route_rx_long(struct sk_bu /* Destination info */ ptr += 2; - cb->dst = dn_htons(dn_eth2dn(ptr)); + cb->dst = dn_eth2dn(ptr); if (memcmp(ptr, dn_hiord_addr, 4) != 0) goto drop_it; ptr += 6; @@ -513,7 +513,7 @@ static int dn_route_rx_long(struct sk_bu /* Source info */ ptr += 2; - cb->src = dn_htons(dn_eth2dn(ptr)); + cb->src = dn_eth2dn(ptr); if (memcmp(ptr, dn_hiord_addr, 4) != 0) goto drop_it; ptr += 6; @@ -541,9 +541,9 @@ static int dn_route_rx_short(struct sk_b skb_pull(skb, 5); skb->h.raw = skb->data; - cb->dst = *(dn_address *)ptr; + cb->dst = *(__le16 *)ptr; ptr += 2; - cb->src = *(dn_address *)ptr; + cb->src = *(__le16 *)ptr; ptr += 2; cb->hops = *ptr & 0x3f; @@ -575,7 +575,7 @@ int dn_route_rcv(struct sk_buff *skb, st { struct dn_skb_cb *cb; unsigned char flags = 0; - __u16 len = dn_ntohs(*(__u16 *)skb->data); + __u16 len = dn_ntohs(*(__le16 *)skb->data); struct dn_dev *dn = (struct dn_dev *)dev->dn_ptr; unsigned char padlen = 0; @@ -782,7 +782,7 @@ static int dn_rt_bug(struct sk_buff *skb struct dn_skb_cb *cb = DN_SKB_CB(skb); 
printk(KERN_DEBUG "dn_rt_bug: skb from:%04x to:%04x\n", - cb->src, cb->dst); + dn_ntohs(cb->src), dn_ntohs(cb->dst)); } kfree_skb(skb); @@ -823,7 +823,7 @@ static int dn_rt_set_next_hop(struct dn_ return 0; } -static inline int dn_match_addr(__u16 addr1, __u16 addr2) +static inline int dn_match_addr(__le16 addr1, __le16 addr2) { __u16 tmp = dn_ntohs(addr1) ^ dn_ntohs(addr2); int match = 16; @@ -834,9 +834,9 @@ static inline int dn_match_addr(__u16 ad return match; } -static __u16 dnet_select_source(const struct net_device *dev, __u16 daddr, int scope) +static __le16 dnet_select_source(const struct net_device *dev, __le16 daddr, int scope) { - __u16 saddr = 0; + __le16 saddr = 0; struct dn_dev *dn_db = dev->dn_ptr; struct dn_ifaddr *ifa; int best_match = 0; @@ -861,14 +861,14 @@ static __u16 dnet_select_source(const st return saddr; } -static inline __u16 __dn_fib_res_prefsrc(struct dn_fib_res *res) +static inline __le16 __dn_fib_res_prefsrc(struct dn_fib_res *res) { return dnet_select_source(DN_FIB_RES_DEV(*res), DN_FIB_RES_GW(*res), res->scope); } -static inline __u16 dn_fib_rules_map_destination(__u16 daddr, struct dn_fib_res *res) +static inline __le16 dn_fib_rules_map_destination(__le16 daddr, struct dn_fib_res *res) { - __u16 mask = dnet_make_mask(res->prefixlen); + __le16 mask = dnet_make_mask(res->prefixlen); return (daddr&~mask)|res->fi->fib_nh->nh_gw; } @@ -892,12 +892,13 @@ static int dn_route_output_slow(struct d struct dn_fib_res res = { .fi = NULL, .type = RTN_UNICAST }; int err; int free_res = 0; - __u16 gateway = 0; + __le16 gateway = 0; if (decnet_debug_level & 16) printk(KERN_DEBUG "dn_route_output_slow: dst=%04x src=%04x mark=%d" - " iif=%d oif=%d\n", oldflp->fld_dst, oldflp->fld_src, + " iif=%d oif=%d\n", dn_ntohs(oldflp->fld_dst), + dn_ntohs(oldflp->fld_src), oldflp->fld_fwmark, loopback_dev.ifindex, oldflp->oif); /* If we have an output interface, verify its a DECnet device */ @@ -961,8 +962,9 @@ source_ok: if (decnet_debug_level & 16) 
printk(KERN_DEBUG "dn_route_output_slow: initial checks complete." - " dst=%o4x src=%04x oif=%d try_hard=%d\n", fl.fld_dst, - fl.fld_src, fl.oif, try_hard); + " dst=%o4x src=%04x oif=%d try_hard=%d\n", + dn_ntohs(fl.fld_dst), dn_ntohs(fl.fld_src), + fl.oif, try_hard); /* * N.B. If the kernel is compiled without router support then @@ -1218,8 +1220,8 @@ static int dn_route_input_slow(struct sk struct neighbour *neigh = NULL; unsigned hash; int flags = 0; - __u16 gateway = 0; - __u16 local_src = 0; + __le16 gateway = 0; + __le16 local_src = 0; struct flowi fl = { .nl_u = { .dn_u = { .daddr = cb->dst, .saddr = cb->src, @@ -1266,7 +1268,7 @@ static int dn_route_input_slow(struct sk res.type = RTN_LOCAL; flags |= RTCF_DIRECTSRC; } else { - __u16 src_map = fl.fld_src; + __le16 src_map = fl.fld_src; free_res = 1; out_dev = DN_FIB_RES_DEV(res); diff -puN net/decnet/dn_rules.c~git-net net/decnet/dn_rules.c --- devel/net/decnet/dn_rules.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/decnet/dn_rules.c 2006-03-17 23:03:48.000000000 -0800 @@ -27,6 +27,8 @@ #include #include #include +#include +#include #include #include #include @@ -39,18 +41,18 @@ struct dn_fib_rule { - struct dn_fib_rule *r_next; + struct hlist_node r_hlist; atomic_t r_clntref; u32 r_preference; unsigned char r_table; unsigned char r_action; unsigned char r_dst_len; unsigned char r_src_len; - dn_address r_src; - dn_address r_srcmask; - dn_address r_dst; - dn_address r_dstmask; - dn_address r_srcmap; + __le16 r_src; + __le16 r_srcmask; + __le16 r_dst; + __le16 r_dstmask; + __le16 r_srcmap; u8 r_flags; #ifdef CONFIG_DECNET_ROUTE_FWMARK u32 r_fwmark; @@ -58,6 +60,7 @@ struct dn_fib_rule int r_ifindex; char r_ifname[IFNAMSIZ]; int r_dead; + struct rcu_head rcu; }; static struct dn_fib_rule default_rule = { @@ -67,18 +70,17 @@ static struct dn_fib_rule default_rule = .r_action = RTN_UNICAST }; -static struct dn_fib_rule *dn_fib_rules = &default_rule; -static DEFINE_RWLOCK(dn_fib_rules_lock); - 
+static struct hlist_head dn_fib_rules; int dn_fib_rtm_delrule(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) { struct rtattr **rta = arg; struct rtmsg *rtm = NLMSG_DATA(nlh); - struct dn_fib_rule *r, **rp; + struct dn_fib_rule *r; + struct hlist_node *node; int err = -ESRCH; - for(rp=&dn_fib_rules; (r=*rp) != NULL; rp = &r->r_next) { + hlist_for_each_entry(r, node, &dn_fib_rules, r_hlist) { if ((!rta[RTA_SRC-1] || memcmp(RTA_DATA(rta[RTA_SRC-1]), &r->r_src, 2) == 0) && rtm->rtm_src_len == r->r_src_len && rtm->rtm_dst_len == r->r_dst_len && @@ -95,10 +97,8 @@ int dn_fib_rtm_delrule(struct sk_buff *s if (r == &default_rule) break; - write_lock_bh(&dn_fib_rules_lock); - *rp = r->r_next; + hlist_del_rcu(&r->r_hlist); r->r_dead = 1; - write_unlock_bh(&dn_fib_rules_lock); dn_fib_rule_put(r); err = 0; break; @@ -108,11 +108,17 @@ int dn_fib_rtm_delrule(struct sk_buff *s return err; } +static inline void dn_fib_rule_put_rcu(struct rcu_head *head) +{ + struct dn_fib_rule *r = container_of(head, struct dn_fib_rule, rcu); + kfree(r); +} + void dn_fib_rule_put(struct dn_fib_rule *r) { if (atomic_dec_and_test(&r->r_clntref)) { if (r->r_dead) - kfree(r); + call_rcu(&r->rcu, dn_fib_rule_put_rcu); else printk(KERN_DEBUG "Attempt to free alive dn_fib_rule\n"); } @@ -123,7 +129,8 @@ int dn_fib_rtm_newrule(struct sk_buff *s { struct rtattr **rta = arg; struct rtmsg *rtm = NLMSG_DATA(nlh); - struct dn_fib_rule *r, *new_r, **rp; + struct dn_fib_rule *r, *new_r, *last = NULL; + struct hlist_node *node = NULL; unsigned char table_id; if (rtm->rtm_src_len > 16 || rtm->rtm_dst_len > 16) @@ -149,6 +156,7 @@ int dn_fib_rtm_newrule(struct sk_buff *s if (!new_r) return -ENOMEM; memset(new_r, 0, sizeof(*new_r)); + if (rta[RTA_SRC-1]) memcpy(&new_r->r_src, RTA_DATA(rta[RTA_SRC-1]), 2); if (rta[RTA_DST-1]) @@ -179,27 +187,26 @@ int dn_fib_rtm_newrule(struct sk_buff *s } } - rp = &dn_fib_rules; + r = container_of(dn_fib_rules.first, struct dn_fib_rule, r_hlist); if (!new_r->r_preference) { 
- r = dn_fib_rules; - if (r && (r = r->r_next) != NULL) { - rp = &dn_fib_rules->r_next; + if (r && r->r_hlist.next != NULL) { + r = container_of(r->r_hlist.next, struct dn_fib_rule, r_hlist); if (r->r_preference) new_r->r_preference = r->r_preference - 1; } } - while((r=*rp) != NULL) { + hlist_for_each_entry(r, node, &dn_fib_rules, r_hlist) { if (r->r_preference > new_r->r_preference) break; - rp = &r->r_next; + last = r; } - - new_r->r_next = r; atomic_inc(&new_r->r_clntref); - write_lock_bh(&dn_fib_rules_lock); - *rp = new_r; - write_unlock_bh(&dn_fib_rules_lock); + + if (last) + hlist_add_after_rcu(&last->r_hlist, &new_r->r_hlist); + else + hlist_add_before_rcu(&new_r->r_hlist, &r->r_hlist); return 0; } @@ -208,12 +215,14 @@ int dn_fib_lookup(const struct flowi *fl { struct dn_fib_rule *r, *policy; struct dn_fib_table *tb; - dn_address saddr = flp->fld_src; - dn_address daddr = flp->fld_dst; + __le16 saddr = flp->fld_src; + __le16 daddr = flp->fld_dst; + struct hlist_node *node; int err; - read_lock(&dn_fib_rules_lock); - for(r = dn_fib_rules; r; r = r->r_next) { + rcu_read_lock(); + + hlist_for_each_entry_rcu(r, node, &dn_fib_rules, r_hlist) { if (((saddr^r->r_src) & r->r_srcmask) || ((daddr^r->r_dst) & r->r_dstmask) || #ifdef CONFIG_DECNET_ROUTE_FWMARK @@ -228,14 +237,14 @@ int dn_fib_lookup(const struct flowi *fl policy = r; break; case RTN_UNREACHABLE: - read_unlock(&dn_fib_rules_lock); + rcu_read_unlock(); return -ENETUNREACH; default: case RTN_BLACKHOLE: - read_unlock(&dn_fib_rules_lock); + rcu_read_unlock(); return -EINVAL; case RTN_PROHIBIT: - read_unlock(&dn_fib_rules_lock); + rcu_read_unlock(); return -EACCES; } @@ -246,20 +255,20 @@ int dn_fib_lookup(const struct flowi *fl res->r = policy; if (policy) atomic_inc(&policy->r_clntref); - read_unlock(&dn_fib_rules_lock); + rcu_read_unlock(); return 0; } if (err < 0 && err != -EAGAIN) { - read_unlock(&dn_fib_rules_lock); + rcu_read_unlock(); return err; } } - read_unlock(&dn_fib_rules_lock); + 
rcu_read_unlock(); return -ESRCH; } -unsigned dnet_addr_type(__u16 addr) +unsigned dnet_addr_type(__le16 addr) { struct flowi fl = { .nl_u = { .dn_u = { .daddr = addr } } }; struct dn_fib_res res; @@ -277,7 +286,7 @@ unsigned dnet_addr_type(__u16 addr) return ret; } -__u16 dn_fib_rules_policy(__u16 saddr, struct dn_fib_res *res, unsigned *flags) +__le16 dn_fib_rules_policy(__le16 saddr, struct dn_fib_res *res, unsigned *flags) { struct dn_fib_rule *r = res->r; @@ -297,27 +306,23 @@ __u16 dn_fib_rules_policy(__u16 saddr, s static void dn_fib_rules_detach(struct net_device *dev) { + struct hlist_node *node; struct dn_fib_rule *r; - for(r = dn_fib_rules; r; r = r->r_next) { - if (r->r_ifindex == dev->ifindex) { - write_lock_bh(&dn_fib_rules_lock); + hlist_for_each_entry(r, node, &dn_fib_rules, r_hlist) { + if (r->r_ifindex == dev->ifindex) r->r_ifindex = -1; - write_unlock_bh(&dn_fib_rules_lock); - } } } static void dn_fib_rules_attach(struct net_device *dev) { + struct hlist_node *node; struct dn_fib_rule *r; - for(r = dn_fib_rules; r; r = r->r_next) { - if (r->r_ifindex == -1 && strcmp(dev->name, r->r_ifname) == 0) { - write_lock_bh(&dn_fib_rules_lock); + hlist_for_each_entry(r, node, &dn_fib_rules, r_hlist) { + if (r->r_ifindex == -1 && strcmp(dev->name, r->r_ifname) == 0) r->r_ifindex = dev->ifindex; - write_unlock_bh(&dn_fib_rules_lock); - } } } @@ -387,18 +392,20 @@ rtattr_failure: int dn_fib_dump_rules(struct sk_buff *skb, struct netlink_callback *cb) { - int idx; + int idx = 0; int s_idx = cb->args[0]; struct dn_fib_rule *r; + struct hlist_node *node; - read_lock(&dn_fib_rules_lock); - for(r = dn_fib_rules, idx = 0; r; r = r->r_next, idx++) { + rcu_read_lock(); + hlist_for_each_entry(r, node, &dn_fib_rules, r_hlist) { if (idx < s_idx) continue; if (dn_fib_fill_rule(skb, r, cb, NLM_F_MULTI) < 0) break; + idx++; } - read_unlock(&dn_fib_rules_lock); + rcu_read_unlock(); cb->args[0] = idx; return skb->len; @@ -406,6 +413,8 @@ int dn_fib_dump_rules(struct sk_buff 
*sk void __init dn_fib_rules_init(void) { + INIT_HLIST_HEAD(&dn_fib_rules); + hlist_add_head(&default_rule.r_hlist, &dn_fib_rules); register_netdevice_notifier(&dn_fib_rules_notifier); } diff -puN net/decnet/dn_table.c~git-net net/decnet/dn_table.c --- devel/net/decnet/dn_table.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/decnet/dn_table.c 2006-03-17 23:03:48.000000000 -0800 @@ -46,7 +46,7 @@ struct dn_zone u32 dz_hashmask; #define DZ_HASHMASK(dz) ((dz)->dz_hashmask) int dz_order; - u16 dz_mask; + __le16 dz_mask; #define DZ_MASK(dz) ((dz)->dz_mask) }; @@ -84,14 +84,14 @@ static int dn_fib_hash_zombies; static inline dn_fib_idx_t dn_hash(dn_fib_key_t key, struct dn_zone *dz) { - u16 h = ntohs(key.datum)>>(16 - dz->dz_order); + u16 h = dn_ntohs(key.datum)>>(16 - dz->dz_order); h ^= (h >> 10); h ^= (h >> 6); h &= DZ_HASHMASK(dz); return *(dn_fib_idx_t *)&h; } -static inline dn_fib_key_t dz_key(u16 dst, struct dn_zone *dz) +static inline dn_fib_key_t dz_key(__le16 dst, struct dn_zone *dz) { dn_fib_key_t k; k.datum = dst & DZ_MASK(dz); @@ -250,7 +250,7 @@ static int dn_fib_nh_match(struct rtmsg for_nexthops(fi) { int attrlen = nhlen - sizeof(struct rtnexthop); - dn_address gw; + __le16 gw; if (attrlen < 0 || (nhlen -= nhp->rtnh_len) < 0) return -EINVAL; @@ -457,7 +457,7 @@ static int dn_fib_table_insert(struct dn dz_key_0(key); if (rta->rta_dst) { - dn_address dst; + __le16 dst; memcpy(&dst, rta->rta_dst, 2); if (dst & ~DZ_MASK(dz)) return -EINVAL; @@ -593,7 +593,7 @@ static int dn_fib_table_delete(struct dn dz_key_0(key); if (rta->rta_dst) { - dn_address dst; + __le16 dst; memcpy(&dst, rta->rta_dst, 2); if (dst & ~DZ_MASK(dz)) return -EINVAL; diff -puN net/decnet/sysctl_net_decnet.c~git-net net/decnet/sysctl_net_decnet.c --- devel/net/decnet/sysctl_net_decnet.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/decnet/sysctl_net_decnet.c 2006-03-17 23:03:48.000000000 -0800 @@ -86,9 +86,9 @@ static void strip_it(char *str) * Simple 
routine to parse an ascii DECnet address * into a network order address. */ -static int parse_addr(dn_address *addr, char *str) +static int parse_addr(__le16 *addr, char *str) { - dn_address area, node; + __u16 area, node; while(*str && !ISNUM(*str)) str++; @@ -139,7 +139,7 @@ static int dn_node_address_strategy(ctl_ void **context) { size_t len; - dn_address addr; + __le16 addr; if (oldval && oldlenp) { if (get_user(len, oldlenp)) @@ -147,14 +147,14 @@ static int dn_node_address_strategy(ctl_ if (len) { if (len != sizeof(unsigned short)) return -EINVAL; - if (put_user(decnet_address, (unsigned short __user *)oldval)) + if (put_user(decnet_address, (__le16 __user *)oldval)) return -EFAULT; } } if (newval && newlen) { if (newlen != sizeof(unsigned short)) return -EINVAL; - if (get_user(addr, (unsigned short __user *)newval)) + if (get_user(addr, (__le16 __user *)newval)) return -EFAULT; dn_dev_devices_off(); @@ -173,7 +173,7 @@ static int dn_node_address_handler(ctl_t { char addr[DN_ASCBUF_LEN]; size_t len; - dn_address dnaddr; + __le16 dnaddr; if (!*lenp || (*ppos && !write)) { *lenp = 0; diff -puN net/ipv4/af_inet.c~git-net net/ipv4/af_inet.c --- devel/net/ipv4/af_inet.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/af_inet.c 2006-03-17 23:03:48.000000000 -0800 @@ -788,45 +788,53 @@ int inet_ioctl(struct socket *sock, unsi } const struct proto_ops inet_stream_ops = { - .family = PF_INET, - .owner = THIS_MODULE, - .release = inet_release, - .bind = inet_bind, - .connect = inet_stream_connect, - .socketpair = sock_no_socketpair, - .accept = inet_accept, - .getname = inet_getname, - .poll = tcp_poll, - .ioctl = inet_ioctl, - .listen = inet_listen, - .shutdown = inet_shutdown, - .setsockopt = sock_common_setsockopt, - .getsockopt = sock_common_getsockopt, - .sendmsg = inet_sendmsg, - .recvmsg = sock_common_recvmsg, - .mmap = sock_no_mmap, - .sendpage = tcp_sendpage + .family = PF_INET, + .owner = THIS_MODULE, + .release = inet_release, + .bind = 
inet_bind, + .connect = inet_stream_connect, + .socketpair = sock_no_socketpair, + .accept = inet_accept, + .getname = inet_getname, + .poll = tcp_poll, + .ioctl = inet_ioctl, + .listen = inet_listen, + .shutdown = inet_shutdown, + .setsockopt = sock_common_setsockopt, + .getsockopt = sock_common_getsockopt, + .sendmsg = inet_sendmsg, + .recvmsg = sock_common_recvmsg, + .mmap = sock_no_mmap, + .sendpage = tcp_sendpage, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif }; const struct proto_ops inet_dgram_ops = { - .family = PF_INET, - .owner = THIS_MODULE, - .release = inet_release, - .bind = inet_bind, - .connect = inet_dgram_connect, - .socketpair = sock_no_socketpair, - .accept = sock_no_accept, - .getname = inet_getname, - .poll = udp_poll, - .ioctl = inet_ioctl, - .listen = sock_no_listen, - .shutdown = inet_shutdown, - .setsockopt = sock_common_setsockopt, - .getsockopt = sock_common_getsockopt, - .sendmsg = inet_sendmsg, - .recvmsg = sock_common_recvmsg, - .mmap = sock_no_mmap, - .sendpage = inet_sendpage, + .family = PF_INET, + .owner = THIS_MODULE, + .release = inet_release, + .bind = inet_bind, + .connect = inet_dgram_connect, + .socketpair = sock_no_socketpair, + .accept = sock_no_accept, + .getname = inet_getname, + .poll = udp_poll, + .ioctl = inet_ioctl, + .listen = sock_no_listen, + .shutdown = inet_shutdown, + .setsockopt = sock_common_setsockopt, + .getsockopt = sock_common_getsockopt, + .sendmsg = inet_sendmsg, + .recvmsg = sock_common_recvmsg, + .mmap = sock_no_mmap, + .sendpage = inet_sendpage, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif }; /* @@ -834,24 +842,28 @@ const struct proto_ops inet_dgram_ops = * udp_poll */ static const struct proto_ops inet_sockraw_ops = { - .family = PF_INET, - .owner = THIS_MODULE, - .release = inet_release, - .bind = inet_bind, - 
.connect = inet_dgram_connect, - .socketpair = sock_no_socketpair, - .accept = sock_no_accept, - .getname = inet_getname, - .poll = datagram_poll, - .ioctl = inet_ioctl, - .listen = sock_no_listen, - .shutdown = inet_shutdown, - .setsockopt = sock_common_setsockopt, - .getsockopt = sock_common_getsockopt, - .sendmsg = inet_sendmsg, - .recvmsg = sock_common_recvmsg, - .mmap = sock_no_mmap, - .sendpage = inet_sendpage, + .family = PF_INET, + .owner = THIS_MODULE, + .release = inet_release, + .bind = inet_bind, + .connect = inet_dgram_connect, + .socketpair = sock_no_socketpair, + .accept = sock_no_accept, + .getname = inet_getname, + .poll = datagram_poll, + .ioctl = inet_ioctl, + .listen = sock_no_listen, + .shutdown = inet_shutdown, + .setsockopt = sock_common_setsockopt, + .getsockopt = sock_common_getsockopt, + .sendmsg = inet_sendmsg, + .recvmsg = sock_common_recvmsg, + .mmap = sock_no_mmap, + .sendpage = inet_sendpage, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif }; static struct net_proto_family inet_family_ops = { diff -puN net/ipv4/ah4.c~git-net net/ipv4/ah4.c --- devel/net/ipv4/ah4.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/ah4.c 2006-03-17 23:03:48.000000000 -0800 @@ -97,6 +97,7 @@ static int ah_output(struct xfrm_state * ah->reserved = 0; ah->spi = x->id.spi; ah->seq_no = htonl(++x->replay.oseq); + xfrm_aevent_doreplay(x); ahp->icv(ahp, skb, ah->auth_data); top_iph->tos = iph->tos; diff -puN net/ipv4/arp.c~git-net net/ipv4/arp.c --- devel/net/ipv4/arp.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/arp.c 2006-03-17 23:03:48.000000000 -0800 @@ -879,16 +879,16 @@ static int arp_process(struct sk_buff *s n = __neigh_lookup(&arp_tbl, &sip, dev, 0); -#ifdef CONFIG_IP_ACCEPT_UNSOLICITED_ARP - /* Unsolicited ARP is not accepted by default. 
- It is possible, that this option should be enabled for some - devices (strip is candidate) - */ - if (n == NULL && - arp->ar_op == htons(ARPOP_REPLY) && - inet_addr_type(sip) == RTN_UNICAST) - n = __neigh_lookup(&arp_tbl, &sip, dev, -1); -#endif + if (ipv4_devconf.arp_accept) { + /* Unsolicited ARP is not accepted by default. + It is possible, that this option should be enabled for some + devices (strip is candidate) + */ + if (n == NULL && + arp->ar_op == htons(ARPOP_REPLY) && + inet_addr_type(sip) == RTN_UNICAST) + n = __neigh_lookup(&arp_tbl, &sip, dev, -1); + } if (n) { int state = NUD_REACHABLE; diff -puN net/ipv4/devinet.c~git-net net/ipv4/devinet.c --- devel/net/ipv4/devinet.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/devinet.c 2006-03-17 23:03:48.000000000 -0800 @@ -1394,6 +1394,14 @@ static struct devinet_sysctl_table { .proc_handler = &proc_dointvec, }, { + .ctl_name = NET_IPV4_CONF_ARP_ACCEPT, + .procname = "arp_accept", + .data = &ipv4_devconf.arp_accept, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { .ctl_name = NET_IPV4_CONF_NOXFRM, .procname = "disable_xfrm", .data = &ipv4_devconf.no_xfrm, diff -puN net/ipv4/esp4.c~git-net net/ipv4/esp4.c --- devel/net/ipv4/esp4.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/esp4.c 2006-03-17 23:03:48.000000000 -0800 @@ -90,6 +90,7 @@ static int esp_output(struct xfrm_state esph->spi = x->id.spi; esph->seq_no = htonl(++x->replay.oseq); + xfrm_aevent_doreplay(x); if (esp->conf.ivlen) crypto_cipher_set_iv(tfm, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); diff -puN net/ipv4/fib_rules.c~git-net net/ipv4/fib_rules.c --- devel/net/ipv4/fib_rules.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/fib_rules.c 2006-03-17 23:03:48.000000000 -0800 @@ -40,6 +40,8 @@ #include #include #include +#include +#include #include #include @@ -52,7 +54,7 @@ struct fib_rule { - struct fib_rule *r_next; + struct hlist_node hlist; 
atomic_t r_clntref; u32 r_preference; unsigned char r_table; @@ -75,6 +77,7 @@ struct fib_rule #endif char r_ifname[IFNAMSIZ]; int r_dead; + struct rcu_head rcu; }; static struct fib_rule default_rule = { @@ -85,7 +88,6 @@ static struct fib_rule default_rule = { }; static struct fib_rule main_rule = { - .r_next = &default_rule, .r_clntref = ATOMIC_INIT(2), .r_preference = 0x7FFE, .r_table = RT_TABLE_MAIN, @@ -93,23 +95,24 @@ static struct fib_rule main_rule = { }; static struct fib_rule local_rule = { - .r_next = &main_rule, .r_clntref = ATOMIC_INIT(2), .r_table = RT_TABLE_LOCAL, .r_action = RTN_UNICAST, }; -static struct fib_rule *fib_rules = &local_rule; -static DEFINE_RWLOCK(fib_rules_lock); +static struct hlist_head fib_rules; + +/* writer func called from netlink -- rtnl_sem hold*/ int inet_rtm_delrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg) { struct rtattr **rta = arg; struct rtmsg *rtm = NLMSG_DATA(nlh); - struct fib_rule *r, **rp; + struct fib_rule *r; + struct hlist_node *node; int err = -ESRCH; - for (rp=&fib_rules; (r=*rp) != NULL; rp=&r->r_next) { + hlist_for_each_entry(r, node, &fib_rules, hlist) { if ((!rta[RTA_SRC-1] || memcmp(RTA_DATA(rta[RTA_SRC-1]), &r->r_src, 4) == 0) && rtm->rtm_src_len == r->r_src_len && rtm->rtm_dst_len == r->r_dst_len && @@ -126,10 +129,8 @@ int inet_rtm_delrule(struct sk_buff *skb if (r == &local_rule) break; - write_lock_bh(&fib_rules_lock); - *rp = r->r_next; + hlist_del_rcu(&r->hlist); r->r_dead = 1; - write_unlock_bh(&fib_rules_lock); fib_rule_put(r); err = 0; break; @@ -150,21 +151,30 @@ static struct fib_table *fib_empty_table return NULL; } +static inline void fib_rule_put_rcu(struct rcu_head *head) +{ + struct fib_rule *r = container_of(head, struct fib_rule, rcu); + kfree(r); +} + void fib_rule_put(struct fib_rule *r) { if (atomic_dec_and_test(&r->r_clntref)) { if (r->r_dead) - kfree(r); + call_rcu(&r->rcu, fib_rule_put_rcu); else printk("Freeing alive rule %p\n", r); } } +/* writer func called from 
netlink -- rtnl_sem hold*/ + int inet_rtm_newrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg) { struct rtattr **rta = arg; struct rtmsg *rtm = NLMSG_DATA(nlh); - struct fib_rule *r, *new_r, **rp; + struct fib_rule *r, *new_r, *last = NULL; + struct hlist_node *node = NULL; unsigned char table_id; if (rtm->rtm_src_len > 32 || rtm->rtm_dst_len > 32 || @@ -188,6 +198,7 @@ int inet_rtm_newrule(struct sk_buff *skb if (!new_r) return -ENOMEM; memset(new_r, 0, sizeof(*new_r)); + if (rta[RTA_SRC-1]) memcpy(&new_r->r_src, RTA_DATA(rta[RTA_SRC-1]), 4); if (rta[RTA_DST-1]) @@ -220,28 +231,28 @@ int inet_rtm_newrule(struct sk_buff *skb if (rta[RTA_FLOW-1]) memcpy(&new_r->r_tclassid, RTA_DATA(rta[RTA_FLOW-1]), 4); #endif + r = container_of(fib_rules.first, struct fib_rule, hlist); - rp = &fib_rules; if (!new_r->r_preference) { - r = fib_rules; - if (r && (r = r->r_next) != NULL) { - rp = &fib_rules->r_next; + if (r && r->hlist.next != NULL) { + r = container_of(r->hlist.next, struct fib_rule, hlist); if (r->r_preference) new_r->r_preference = r->r_preference - 1; } } - - while ( (r = *rp) != NULL ) { + + hlist_for_each_entry(r, node, &fib_rules, hlist) { if (r->r_preference > new_r->r_preference) break; - rp = &r->r_next; + last = r; } - - new_r->r_next = r; atomic_inc(&new_r->r_clntref); - write_lock_bh(&fib_rules_lock); - *rp = new_r; - write_unlock_bh(&fib_rules_lock); + + if (last) + hlist_add_after_rcu(&last->hlist, &new_r->hlist); + else + hlist_add_before_rcu(&new_r->hlist, &r->hlist); + return 0; } @@ -254,30 +265,30 @@ u32 fib_rules_tclass(struct fib_result * } #endif +/* callers should hold rtnl semaphore */ static void fib_rules_detach(struct net_device *dev) { + struct hlist_node *node; struct fib_rule *r; - for (r=fib_rules; r; r=r->r_next) { - if (r->r_ifindex == dev->ifindex) { - write_lock_bh(&fib_rules_lock); + hlist_for_each_entry(r, node, &fib_rules, hlist) { + if (r->r_ifindex == dev->ifindex) r->r_ifindex = -1; - write_unlock_bh(&fib_rules_lock); 
- } + } } +/* callers should hold rtnl semaphore */ + static void fib_rules_attach(struct net_device *dev) { + struct hlist_node *node; struct fib_rule *r; - for (r=fib_rules; r; r=r->r_next) { - if (r->r_ifindex == -1 && strcmp(dev->name, r->r_ifname) == 0) { - write_lock_bh(&fib_rules_lock); + hlist_for_each_entry(r, node, &fib_rules, hlist) { + if (r->r_ifindex == -1 && strcmp(dev->name, r->r_ifname) == 0) r->r_ifindex = dev->ifindex; - write_unlock_bh(&fib_rules_lock); - } } } @@ -286,14 +297,17 @@ int fib_lookup(const struct flowi *flp, int err; struct fib_rule *r, *policy; struct fib_table *tb; + struct hlist_node *node; u32 daddr = flp->fl4_dst; u32 saddr = flp->fl4_src; FRprintk("Lookup: %u.%u.%u.%u <- %u.%u.%u.%u ", NIPQUAD(flp->fl4_dst), NIPQUAD(flp->fl4_src)); - read_lock(&fib_rules_lock); - for (r = fib_rules; r; r=r->r_next) { + + rcu_read_lock(); + + hlist_for_each_entry_rcu(r, node, &fib_rules, hlist) { if (((saddr^r->r_src) & r->r_srcmask) || ((daddr^r->r_dst) & r->r_dstmask) || (r->r_tos && r->r_tos != flp->fl4_tos) || @@ -309,14 +323,14 @@ FRprintk("tb %d r %d ", r->r_table, r->r policy = r; break; case RTN_UNREACHABLE: - read_unlock(&fib_rules_lock); + rcu_read_unlock(); return -ENETUNREACH; default: case RTN_BLACKHOLE: - read_unlock(&fib_rules_lock); + rcu_read_unlock(); return -EINVAL; case RTN_PROHIBIT: - read_unlock(&fib_rules_lock); + rcu_read_unlock(); return -EACCES; } @@ -327,16 +341,16 @@ FRprintk("tb %d r %d ", r->r_table, r->r res->r = policy; if (policy) atomic_inc(&policy->r_clntref); - read_unlock(&fib_rules_lock); + rcu_read_unlock(); return 0; } if (err < 0 && err != -EAGAIN) { - read_unlock(&fib_rules_lock); + rcu_read_unlock(); return err; } } FRprintk("FAILURE\n"); - read_unlock(&fib_rules_lock); + rcu_read_unlock(); return -ENETUNREACH; } @@ -414,20 +428,25 @@ rtattr_failure: return -1; } +/* callers should hold rtnl semaphore */ + int inet_dump_rules(struct sk_buff *skb, struct netlink_callback *cb) { - int idx; + int idx = 
0; int s_idx = cb->args[0]; struct fib_rule *r; + struct hlist_node *node; + + rcu_read_lock(); + hlist_for_each_entry(r, node, &fib_rules, hlist) { - read_lock(&fib_rules_lock); - for (r=fib_rules, idx=0; r; r = r->r_next, idx++) { if (idx < s_idx) continue; if (inet_fill_rule(skb, r, cb, NLM_F_MULTI) < 0) break; + idx++; } - read_unlock(&fib_rules_lock); + rcu_read_unlock(); cb->args[0] = idx; return skb->len; @@ -435,5 +454,9 @@ int inet_dump_rules(struct sk_buff *skb, void __init fib_rules_init(void) { + INIT_HLIST_HEAD(&fib_rules); + hlist_add_head(&local_rule.hlist, &fib_rules); + hlist_add_after(&local_rule.hlist, &main_rule.hlist); + hlist_add_after(&main_rule.hlist, &default_rule.hlist); register_netdevice_notifier(&fib_rules_notifier); } diff -puN net/ipv4/fib_trie.c~git-net net/ipv4/fib_trie.c --- devel/net/ipv4/fib_trie.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/fib_trie.c 2006-03-17 23:03:48.000000000 -0800 @@ -50,7 +50,7 @@ * Patrick McHardy */ -#define VERSION "0.404" +#define VERSION "0.406" #include #include @@ -84,7 +84,7 @@ #include "fib_lookup.h" #undef CONFIG_IP_FIB_TRIE_STATS -#define MAX_CHILDS 16384 +#define MAX_STAT_DEPTH 32 #define KEYLENGTH (8*sizeof(t_key)) #define MASK_PFX(k, l) (((l)==0)?0:(k >> (KEYLENGTH-l)) << (KEYLENGTH-l)) @@ -154,7 +154,7 @@ struct trie_stat { unsigned int tnodes; unsigned int leaves; unsigned int nullpointers; - unsigned int nodesizes[MAX_CHILDS]; + unsigned int nodesizes[MAX_STAT_DEPTH]; }; struct trie { @@ -2040,7 +2040,15 @@ rescan: static struct node *fib_trie_get_first(struct fib_trie_iter *iter, struct trie *t) { - struct node *n = rcu_dereference(t->trie); + struct node *n ; + + if(!t) + return NULL; + + n = rcu_dereference(t->trie); + + if(!iter) + return NULL; if (n && IS_TNODE(n)) { iter->tnode = (struct tnode *) n; @@ -2072,7 +2080,9 @@ static void trie_collect_stats(struct tr int i; s->tnodes++; - s->nodesizes[tn->bits]++; + if(tn->bits < MAX_STAT_DEPTH) + 
s->nodesizes[tn->bits]++; + for (i = 0; i < (1<bits); i++) if (!tn->child[i]) s->nullpointers++; @@ -2102,8 +2112,8 @@ static void trie_show_stats(struct seq_f seq_printf(seq, "\tInternal nodes: %d\n\t", stat->tnodes); bytes += sizeof(struct tnode) * stat->tnodes; - max = MAX_CHILDS-1; - while (max >= 0 && stat->nodesizes[max] == 0) + max = MAX_STAT_DEPTH; + while (max > 0 && stat->nodesizes[max-1] == 0) max--; pointers = 0; diff -puN net/ipv4/igmp.c~git-net net/ipv4/igmp.c --- devel/net/ipv4/igmp.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/igmp.c 2006-03-17 23:03:48.000000000 -0800 @@ -1382,7 +1382,7 @@ static struct in_device * ip_mc_find_dev dev = ip_dev_find(imr->imr_address.s_addr); if (!dev) return NULL; - __dev_put(dev); + dev_put(dev); } if (!dev && !ip_route_output_key(&rt, &fl)) { @@ -1730,7 +1730,7 @@ int ip_mc_join_group(struct sock *sk , s if (!MULTICAST(addr)) return -EINVAL; - rtnl_shlock(); + rtnl_lock(); in_dev = ip_mc_find_dev(imr); @@ -1763,7 +1763,7 @@ int ip_mc_join_group(struct sock *sk , s ip_mc_inc_group(in_dev, addr); err = 0; done: - rtnl_shunlock(); + rtnl_unlock(); return err; } @@ -1837,7 +1837,7 @@ int ip_mc_source(int add, int omode, str if (!MULTICAST(addr)) return -EINVAL; - rtnl_shlock(); + rtnl_lock(); imr.imr_multiaddr.s_addr = mreqs->imr_multiaddr; imr.imr_address.s_addr = mreqs->imr_interface; @@ -1947,7 +1947,7 @@ int ip_mc_source(int add, int omode, str ip_mc_add_src(in_dev, &mreqs->imr_multiaddr, omode, 1, &mreqs->imr_sourceaddr, 1); done: - rtnl_shunlock(); + rtnl_unlock(); if (leavegroup) return ip_mc_leave_group(sk, &imr); return err; @@ -1970,7 +1970,7 @@ int ip_mc_msfilter(struct sock *sk, stru msf->imsf_fmode != MCAST_EXCLUDE) return -EINVAL; - rtnl_shlock(); + rtnl_lock(); imr.imr_multiaddr.s_addr = msf->imsf_multiaddr; imr.imr_address.s_addr = msf->imsf_interface; @@ -2030,7 +2030,7 @@ int ip_mc_msfilter(struct sock *sk, stru pmc->sfmode = msf->imsf_fmode; err = 0; done: - rtnl_shunlock(); + 
rtnl_unlock(); if (leavegroup) err = ip_mc_leave_group(sk, &imr); return err; @@ -2050,7 +2050,7 @@ int ip_mc_msfget(struct sock *sk, struct if (!MULTICAST(addr)) return -EINVAL; - rtnl_shlock(); + rtnl_lock(); imr.imr_multiaddr.s_addr = msf->imsf_multiaddr; imr.imr_address.s_addr = msf->imsf_interface; @@ -2072,7 +2072,7 @@ int ip_mc_msfget(struct sock *sk, struct goto done; msf->imsf_fmode = pmc->sfmode; psl = pmc->sflist; - rtnl_shunlock(); + rtnl_unlock(); if (!psl) { len = 0; count = 0; @@ -2091,7 +2091,7 @@ int ip_mc_msfget(struct sock *sk, struct return -EFAULT; return 0; done: - rtnl_shunlock(); + rtnl_unlock(); return err; } @@ -2112,7 +2112,7 @@ int ip_mc_gsfget(struct sock *sk, struct if (!MULTICAST(addr)) return -EINVAL; - rtnl_shlock(); + rtnl_lock(); err = -EADDRNOTAVAIL; @@ -2125,7 +2125,7 @@ int ip_mc_gsfget(struct sock *sk, struct goto done; gsf->gf_fmode = pmc->sfmode; psl = pmc->sflist; - rtnl_shunlock(); + rtnl_unlock(); count = psl ? psl->sl_count : 0; copycount = count < gsf->gf_numsrc ? count : gsf->gf_numsrc; gsf->gf_numsrc = count; @@ -2146,7 +2146,7 @@ int ip_mc_gsfget(struct sock *sk, struct } return 0; done: - rtnl_shunlock(); + rtnl_unlock(); return err; } diff -puN net/ipv4/inet_connection_sock.c~git-net net/ipv4/inet_connection_sock.c --- devel/net/ipv4/inet_connection_sock.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/inet_connection_sock.c 2006-03-17 23:03:48.000000000 -0800 @@ -648,3 +648,52 @@ void inet_csk_addr2sockaddr(struct sock } EXPORT_SYMBOL_GPL(inet_csk_addr2sockaddr); + +int inet_csk_ctl_sock_create(struct socket **sock, unsigned short family, + unsigned short type, unsigned char protocol) +{ + int rc = sock_create_kern(family, type, protocol, sock); + + if (rc == 0) { + (*sock)->sk->sk_allocation = GFP_ATOMIC; + inet_sk((*sock)->sk)->uc_ttl = -1; + /* + * Unhash it so that IP input processing does not even see it, + * we do not wish this socket to see incoming packets. 
+ */ + (*sock)->sk->sk_prot->unhash((*sock)->sk); + } + return rc; +} + +EXPORT_SYMBOL_GPL(inet_csk_ctl_sock_create); + +#ifdef CONFIG_COMPAT +int inet_csk_compat_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + const struct inet_connection_sock *icsk = inet_csk(sk); + + if (icsk->icsk_af_ops->compat_getsockopt != NULL) + return icsk->icsk_af_ops->compat_getsockopt(sk, level, optname, + optval, optlen); + return icsk->icsk_af_ops->getsockopt(sk, level, optname, + optval, optlen); +} + +EXPORT_SYMBOL_GPL(inet_csk_compat_getsockopt); + +int inet_csk_compat_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + const struct inet_connection_sock *icsk = inet_csk(sk); + + if (icsk->icsk_af_ops->compat_setsockopt != NULL) + return icsk->icsk_af_ops->compat_setsockopt(sk, level, optname, + optval, optlen); + return icsk->icsk_af_ops->setsockopt(sk, level, optname, + optval, optlen); +} + +EXPORT_SYMBOL_GPL(inet_csk_compat_setsockopt); +#endif diff -puN net/ipv4/ipcomp.c~git-net net/ipv4/ipcomp.c --- devel/net/ipv4/ipcomp.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/ipcomp.c 2006-03-17 23:03:48.000000000 -0800 @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include @@ -36,7 +37,7 @@ struct ipcomp_tfms { int users; }; -static DECLARE_MUTEX(ipcomp_resource_sem); +static DEFINE_MUTEX(ipcomp_resource_mutex); static void **ipcomp_scratches; static int ipcomp_scratch_users; static LIST_HEAD(ipcomp_tfms_list); @@ -253,7 +254,7 @@ error: } /* - * Must be protected by xfrm_cfg_sem. State and tunnel user references are + * Must be protected by xfrm_cfg_mutex. State and tunnel user references are * always incremented on success. 
*/ static int ipcomp_tunnel_attach(struct xfrm_state *x) @@ -411,9 +412,9 @@ static void ipcomp_destroy(struct xfrm_s if (!ipcd) return; xfrm_state_delete_tunnel(x); - down(&ipcomp_resource_sem); + mutex_lock(&ipcomp_resource_mutex); ipcomp_free_data(ipcd); - up(&ipcomp_resource_sem); + mutex_unlock(&ipcomp_resource_mutex); kfree(ipcd); } @@ -440,14 +441,14 @@ static int ipcomp_init_state(struct xfrm if (x->props.mode) x->props.header_len += sizeof(struct iphdr); - down(&ipcomp_resource_sem); + mutex_lock(&ipcomp_resource_mutex); if (!ipcomp_alloc_scratches()) goto error; ipcd->tfms = ipcomp_alloc_tfms(x->calg->alg_name); if (!ipcd->tfms) goto error; - up(&ipcomp_resource_sem); + mutex_unlock(&ipcomp_resource_mutex); if (x->props.mode) { err = ipcomp_tunnel_attach(x); @@ -464,10 +465,10 @@ out: return err; error_tunnel: - down(&ipcomp_resource_sem); + mutex_lock(&ipcomp_resource_mutex); error: ipcomp_free_data(ipcd); - up(&ipcomp_resource_sem); + mutex_unlock(&ipcomp_resource_mutex); kfree(ipcd); goto out; } diff -puN net/ipv4/ipconfig.c~git-net net/ipv4/ipconfig.c --- devel/net/ipv4/ipconfig.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/ipconfig.c 2006-03-17 23:03:48.000000000 -0800 @@ -186,7 +186,7 @@ static int __init ic_open_devs(void) unsigned short oflags; last = &ic_first_dev; - rtnl_shlock(); + rtnl_lock(); /* bring loopback device up first */ if (dev_change_flags(&loopback_dev, loopback_dev.flags | IFF_UP) < 0) @@ -215,7 +215,7 @@ static int __init ic_open_devs(void) continue; } if (!(d = kmalloc(sizeof(struct ic_device), GFP_KERNEL))) { - rtnl_shunlock(); + rtnl_unlock(); return -1; } d->dev = dev; @@ -232,7 +232,7 @@ static int __init ic_open_devs(void) dev->name, able, d->xid)); } } - rtnl_shunlock(); + rtnl_unlock(); *last = NULL; @@ -251,7 +251,7 @@ static void __init ic_close_devs(void) struct ic_device *d, *next; struct net_device *dev; - rtnl_shlock(); + rtnl_lock(); next = ic_first_dev; while ((d = next)) { next = d->next; 
@@ -262,7 +262,7 @@ static void __init ic_close_devs(void) } kfree(d); } - rtnl_shunlock(); + rtnl_unlock(); } /* diff -puN net/ipv4/ipmr.c~git-net net/ipv4/ipmr.c --- devel/net/ipv4/ipmr.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/ipmr.c 2006-03-17 23:03:48.000000000 -0800 @@ -415,10 +415,10 @@ static int vif_add(struct vifctl *vifc, return -ENOBUFS; break; case 0: - dev=ip_dev_find(vifc->vifc_lcl_addr.s_addr); + dev = ip_dev_find(vifc->vifc_lcl_addr.s_addr); if (!dev) return -EADDRNOTAVAIL; - __dev_put(dev); + dev_put(dev); break; default: return -EINVAL; diff -puN net/ipv4/ip_sockglue.c~git-net net/ipv4/ip_sockglue.c --- devel/net/ipv4/ip_sockglue.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/ip_sockglue.c 2006-03-17 23:03:48.000000000 -0800 @@ -50,6 +50,7 @@ #define IP_CMSG_TOS 4 #define IP_CMSG_RECVOPTS 8 #define IP_CMSG_RETOPTS 16 +#define IP_CMSG_PASSSEC 32 /* * SOL_IP control messages. @@ -109,6 +110,19 @@ static void ip_cmsg_recv_retopts(struct put_cmsg(msg, SOL_IP, IP_RETOPTS, opt->optlen, opt->__data); } +static void ip_cmsg_recv_security(struct msghdr *msg, struct sk_buff *skb) +{ + char *secdata; + u32 seclen; + int err; + + err = security_socket_getpeersec_dgram(skb, &secdata, &seclen); + if (err) + return; + + put_cmsg(msg, SOL_IP, SCM_SECURITY, seclen, secdata); +} + void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb) { @@ -138,6 +152,11 @@ void ip_cmsg_recv(struct msghdr *msg, st if (flags & 1) ip_cmsg_recv_retopts(msg, skb); + if ((flags>>=1) == 0) + return; + + if (flags & 1) + ip_cmsg_recv_security(msg, skb); } int ip_cmsg_send(struct msghdr *msg, struct ipcm_cookie *ipc) @@ -380,20 +399,19 @@ out: * an IP socket. 
*/ -int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) +static int do_ip_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) { struct inet_sock *inet = inet_sk(sk); int val=0,err; - if (level != SOL_IP) - return -ENOPROTOOPT; - if (((1<= sizeof(int)) { @@ -478,6 +496,12 @@ int ip_setsockopt(struct sock *sk, int l else inet->cmsg_flags &= ~IP_CMSG_RETOPTS; break; + case IP_PASSSEC: + if (val) + inet->cmsg_flags |= IP_CMSG_PASSSEC; + else + inet->cmsg_flags &= ~IP_CMSG_PASSSEC; + break; case IP_TOS: /* This sets both TOS and Precedence */ if (sk->sk_type == SOCK_STREAM) { val &= ~3; @@ -849,12 +873,7 @@ mc_msf_out: break; default: -#ifdef CONFIG_NETFILTER - err = nf_setsockopt(sk, PF_INET, optname, optval, - optlen); -#else err = -ENOPROTOOPT; -#endif break; } release_sock(sk); @@ -865,12 +884,68 @@ e_inval: return -EINVAL; } +int ip_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) +{ + int err; + + if (level != SOL_IP) + return -ENOPROTOOPT; + + err = do_ip_setsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IP_HDRINCL && + optname != IP_IPSEC_POLICY && optname != IP_XFRM_POLICY +#ifdef CONFIG_IP_MROUTE + && (optname < MRT_BASE || optname > (MRT_BASE + 10)) +#endif + ) { + lock_sock(sk); + err = nf_setsockopt(sk, PF_INET, optname, optval, optlen); + release_sock(sk); + } +#endif + return err; +} + +#ifdef CONFIG_COMPAT +int compat_ip_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + int err; + + if (level != SOL_IP) + return -ENOPROTOOPT; + + err = do_ip_setsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IP_HDRINCL && + optname != IP_IPSEC_POLICY && 
optname != IP_XFRM_POLICY +#ifdef CONFIG_IP_MROUTE + && (optname < MRT_BASE || optname > (MRT_BASE + 10)) +#endif + ) { + lock_sock(sk); + err = compat_nf_setsockopt(sk, PF_INET, optname, + optval, optlen); + release_sock(sk); + } +#endif + return err; +} + +EXPORT_SYMBOL(compat_ip_setsockopt); +#endif + /* * Get the options. Note for future reference. The GET of IP options gets the * _received_ ones. The set sets the _sent_ ones. */ -int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) +static int do_ip_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) { struct inet_sock *inet = inet_sk(sk); int val; @@ -932,6 +1007,9 @@ int ip_getsockopt(struct sock *sk, int l case IP_RETOPTS: val = (inet->cmsg_flags & IP_CMSG_RETOPTS) != 0; break; + case IP_PASSSEC: + val = (inet->cmsg_flags & IP_CMSG_PASSSEC) != 0; + break; case IP_TOS: val = inet->tos; break; @@ -1051,17 +1129,8 @@ int ip_getsockopt(struct sock *sk, int l val = inet->freebind; break; default: -#ifdef CONFIG_NETFILTER - val = nf_getsockopt(sk, PF_INET, optname, optval, - &len); - release_sock(sk); - if (val >= 0) - val = put_user(len, optlen); - return val; -#else release_sock(sk); return -ENOPROTOOPT; -#endif } release_sock(sk); @@ -1082,6 +1151,67 @@ int ip_getsockopt(struct sock *sk, int l return 0; } +int ip_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) +{ + int err; + + err = do_ip_getsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS +#ifdef CONFIG_IP_MROUTE + && (optname < MRT_BASE || optname > MRT_BASE+10) +#endif + ) { + int len; + + if(get_user(len,optlen)) + return -EFAULT; + + lock_sock(sk); + err = nf_getsockopt(sk, PF_INET, optname, optval, + &len); + release_sock(sk); + if (err >= 0) + err = put_user(len, optlen); 
+ return err; + } +#endif + return err; +} + +#ifdef CONFIG_COMPAT +int compat_ip_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + int err = do_ip_getsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS +#ifdef CONFIG_IP_MROUTE + && (optname < MRT_BASE || optname > MRT_BASE+10) +#endif + ) { + int len; + + if (get_user(len, optlen)) + return -EFAULT; + + lock_sock(sk); + err = compat_nf_getsockopt(sk, PF_INET, optname, optval, &len); + release_sock(sk); + if (err >= 0) + err = put_user(len, optlen); + return err; + } +#endif + return err; +} + +EXPORT_SYMBOL(compat_ip_getsockopt); +#endif + EXPORT_SYMBOL(ip_cmsg_recv); EXPORT_SYMBOL(ip_getsockopt); diff -puN net/ipv4/ipvs/ip_vs_app.c~git-net net/ipv4/ipvs/ip_vs_app.c --- devel/net/ipv4/ipvs/ip_vs_app.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/ipvs/ip_vs_app.c 2006-03-17 23:03:48.000000000 -0800 @@ -31,6 +31,7 @@ #include #include #include +#include #include @@ -40,7 +41,7 @@ EXPORT_SYMBOL(register_ip_vs_app_inc); /* ipvs application list head */ static LIST_HEAD(ip_vs_app_list); -static DECLARE_MUTEX(__ip_vs_app_mutex); +static DEFINE_MUTEX(__ip_vs_app_mutex); /* @@ -173,11 +174,11 @@ register_ip_vs_app_inc(struct ip_vs_app { int result; - down(&__ip_vs_app_mutex); + mutex_lock(&__ip_vs_app_mutex); result = ip_vs_app_inc_new(app, proto, port); - up(&__ip_vs_app_mutex); + mutex_unlock(&__ip_vs_app_mutex); return result; } @@ -191,11 +192,11 @@ int register_ip_vs_app(struct ip_vs_app /* increase the module use count */ ip_vs_use_count_inc(); - down(&__ip_vs_app_mutex); + mutex_lock(&__ip_vs_app_mutex); list_add(&app->a_list, &ip_vs_app_list); - up(&__ip_vs_app_mutex); + mutex_unlock(&__ip_vs_app_mutex); return 0; } @@ -209,7 +210,7 @@ void unregister_ip_vs_app(struct ip_vs_a { struct ip_vs_app *inc, 
*nxt; - down(&__ip_vs_app_mutex); + mutex_lock(&__ip_vs_app_mutex); list_for_each_entry_safe(inc, nxt, &app->incs_list, a_list) { ip_vs_app_inc_release(inc); @@ -217,7 +218,7 @@ void unregister_ip_vs_app(struct ip_vs_a list_del(&app->a_list); - up(&__ip_vs_app_mutex); + mutex_unlock(&__ip_vs_app_mutex); /* decrease the module use count */ ip_vs_use_count_dec(); @@ -498,7 +499,7 @@ static struct ip_vs_app *ip_vs_app_idx(l static void *ip_vs_app_seq_start(struct seq_file *seq, loff_t *pos) { - down(&__ip_vs_app_mutex); + mutex_lock(&__ip_vs_app_mutex); return *pos ? ip_vs_app_idx(*pos - 1) : SEQ_START_TOKEN; } @@ -530,7 +531,7 @@ static void *ip_vs_app_seq_next(struct s static void ip_vs_app_seq_stop(struct seq_file *seq, void *v) { - up(&__ip_vs_app_mutex); + mutex_unlock(&__ip_vs_app_mutex); } static int ip_vs_app_seq_show(struct seq_file *seq, void *v) diff -puN net/ipv4/netfilter/arp_tables.c~git-net net/ipv4/netfilter/arp_tables.c --- devel/net/ipv4/netfilter/arp_tables.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/arp_tables.c 2006-03-17 23:03:48.000000000 -0800 @@ -22,7 +22,7 @@ #include #include -#include +#include #include #include @@ -208,6 +208,7 @@ static unsigned int arpt_error(struct sk const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -300,6 +301,7 @@ unsigned int arpt_do_table(struct sk_buf verdict = t->u.kernel.target->target(pskb, in, out, hook, + t->u.kernel.target, t->data, userdata); @@ -480,26 +482,31 @@ static inline int check_entry(struct arp } t->u.kernel.target = target; + ret = xt_check_target(target, NF_ARP, t->u.target_size - sizeof(*t), + name, e->comefrom, 0, 0); + if (ret) + goto err; + if (t->u.kernel.target == &arpt_standard_target) { if (!standard_check(t, size)) { ret = -EINVAL; goto out; } } else if (t->u.kernel.target->checkentry - && !t->u.kernel.target->checkentry(name, e, t->data, + && 
!t->u.kernel.target->checkentry(name, e, target, t->data, t->u.target_size - sizeof(*t), e->comefrom)) { - module_put(t->u.kernel.target->me); duprintf("arp_tables: check failed for `%s'.\n", t->u.kernel.target->name); ret = -EINVAL; - goto out; + goto err; } (*i)++; return 0; - +err: + module_put(t->u.kernel.target->me); out: return ret; } @@ -555,7 +562,7 @@ static inline int cleanup_entry(struct a t = arpt_get_target(e); if (t->u.kernel.target->destroy) - t->u.kernel.target->destroy(t->data, + t->u.kernel.target->destroy(t->u.kernel.target, t->data, t->u.target_size - sizeof(*t)); module_put(t->u.kernel.target->me); return 0; @@ -1138,11 +1145,13 @@ void arpt_unregister_table(struct arpt_t /* The built-in targets: standard (NULL) and error. */ static struct arpt_target arpt_standard_target = { .name = ARPT_STANDARD_TARGET, + .targetsize = sizeof(int), }; static struct arpt_target arpt_error_target = { .name = ARPT_ERROR_TARGET, .target = arpt_error, + .targetsize = ARPT_FUNCTION_MAXNAMELEN, }; static struct nf_sockopt_ops arpt_sockopts = { diff -puN net/ipv4/netfilter/arpt_mangle.c~git-net net/ipv4/netfilter/arpt_mangle.c --- devel/net/ipv4/netfilter/arpt_mangle.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/arpt_mangle.c 2006-03-17 23:03:48.000000000 -0800 @@ -8,9 +8,10 @@ MODULE_AUTHOR("Bart De Schuymer "); MODULE_DESCRIPTION("Netfilter NAT helper module for PPTP"); @@ -198,7 +200,7 @@ pptp_outbound_pkt(struct sk_buff **pskb, /* only OUT_CALL_REQUEST, IN_CALL_REPLY, CALL_CLEAR_REQUEST pass * down to here */ DEBUGP("altering call id from 0x%04x to 0x%04x\n", - ntohs(*(u_int16_t *)pptpReq + cid_off), ntohs(new_callid)); + ntohs(REQ_CID(pptpReq, cid_off)), ntohs(new_callid)); /* mangle packet */ if (ip_nat_mangle_tcp_packet(pskb, ct, ctinfo, @@ -342,7 +344,7 @@ pptp_inbound_pkt(struct sk_buff **pskb, /* mangle packet */ DEBUGP("altering peer call id from 0x%04x to 0x%04x\n", - ntohs(*(u_int16_t *)pptpReq + pcid_off), 
ntohs(new_pcid)); + ntohs(REQ_CID(pptpReq, pcid_off)), ntohs(new_pcid)); if (ip_nat_mangle_tcp_packet(pskb, ct, ctinfo, pcid_off + sizeof(struct pptp_pkt_hdr) + @@ -353,7 +355,7 @@ pptp_inbound_pkt(struct sk_buff **pskb, if (new_cid) { DEBUGP("altering call id from 0x%04x to 0x%04x\n", - ntohs(*(u_int16_t *)pptpReq + cid_off), ntohs(new_cid)); + ntohs(REQ_CID(pptpReq, cid_off)), ntohs(new_cid)); if (ip_nat_mangle_tcp_packet(pskb, ct, ctinfo, cid_off + sizeof(struct pptp_pkt_hdr) + sizeof(struct PptpControlHeader), diff -puN net/ipv4/netfilter/ip_nat_rule.c~git-net net/ipv4/netfilter/ip_nat_rule.c --- devel/net/ipv4/netfilter/ip_nat_rule.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ip_nat_rule.c 2006-03-17 23:03:48.000000000 -0800 @@ -103,6 +103,7 @@ static unsigned int ipt_snat_target(stru const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct ipt_target *target, const void *targinfo, void *userinfo) { @@ -145,6 +146,7 @@ static unsigned int ipt_dnat_target(stru const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct ipt_target *target, const void *targinfo, void *userinfo) { @@ -170,6 +172,7 @@ static unsigned int ipt_dnat_target(stru static int ipt_snat_checkentry(const char *tablename, const void *entry, + const struct ipt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) @@ -181,28 +184,12 @@ static int ipt_snat_checkentry(const cha printk("SNAT: multiple ranges no longer supported\n"); return 0; } - - if (targinfosize != IPT_ALIGN(sizeof(struct ip_nat_multi_range_compat))) { - DEBUGP("SNAT: Target size %u wrong for %u ranges\n", - targinfosize, mr->rangesize); - return 0; - } - - /* Only allow these for NAT. 
*/ - if (strcmp(tablename, "nat") != 0) { - DEBUGP("SNAT: wrong table %s\n", tablename); - return 0; - } - - if (hook_mask & ~(1 << NF_IP_POST_ROUTING)) { - DEBUGP("SNAT: hook mask 0x%x bad\n", hook_mask); - return 0; - } return 1; } static int ipt_dnat_checkentry(const char *tablename, const void *entry, + const struct ipt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) @@ -214,24 +201,6 @@ static int ipt_dnat_checkentry(const cha printk("DNAT: multiple ranges no longer supported\n"); return 0; } - - if (targinfosize != IPT_ALIGN(sizeof(struct ip_nat_multi_range_compat))) { - DEBUGP("DNAT: Target size %u wrong for %u ranges\n", - targinfosize, mr->rangesize); - return 0; - } - - /* Only allow these for NAT. */ - if (strcmp(tablename, "nat") != 0) { - DEBUGP("DNAT: wrong table %s\n", tablename); - return 0; - } - - if (hook_mask & ~((1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_OUT))) { - DEBUGP("DNAT: hook mask 0x%x bad\n", hook_mask); - return 0; - } - return 1; } @@ -299,12 +268,18 @@ int ip_nat_rule_find(struct sk_buff **ps static struct ipt_target ipt_snat_reg = { .name = "SNAT", .target = ipt_snat_target, + .targetsize = sizeof(struct ip_nat_multi_range_compat), + .table = "nat", + .hooks = 1 << NF_IP_POST_ROUTING, .checkentry = ipt_snat_checkentry, }; static struct ipt_target ipt_dnat_reg = { .name = "DNAT", .target = ipt_dnat_target, + .targetsize = sizeof(struct ip_nat_multi_range_compat), + .table = "nat", + .hooks = 1 << NF_IP_PRE_ROUTING, .checkentry = ipt_dnat_checkentry, }; diff -puN net/ipv4/netfilter/ip_nat_snmp_basic.c~git-net net/ipv4/netfilter/ip_nat_snmp_basic.c --- devel/net/ipv4/netfilter/ip_nat_snmp_basic.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ip_nat_snmp_basic.c 2006-03-17 23:03:48.000000000 -0800 @@ -250,6 +250,7 @@ static unsigned char asn1_header_decode( if (!asn1_id_decode(ctx, cls, con, tag)) return 0; + def = len = 0; if (!asn1_length_decode(ctx, &def, &len)) 
return 0; @@ -669,7 +670,7 @@ static unsigned char snmp_object_decode( unsigned char *eoc, *end, *p; unsigned long *lp, *id; unsigned long ul; - long l; + long l; *obj = NULL; id = NULL; @@ -699,11 +700,13 @@ static unsigned char snmp_object_decode( return 0; } + type = 0; if (!snmp_tag_cls2syntax(tag, cls, &type)) { kfree(id); return 0; } + l = 0; switch (type) { case SNMP_INTEGER: len = sizeof(long); diff -puN net/ipv4/netfilter/ip_queue.c~git-net net/ipv4/netfilter/ip_queue.c --- devel/net/ipv4/netfilter/ip_queue.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ip_queue.c 2006-03-17 23:03:48.000000000 -0800 @@ -35,6 +35,7 @@ #include #include #include +#include #include #include @@ -61,7 +62,7 @@ static unsigned int queue_dropped = 0; static unsigned int queue_user_dropped = 0; static struct sock *ipqnl; static LIST_HEAD(queue_list); -static DECLARE_MUTEX(ipqnl_sem); +static DEFINE_MUTEX(ipqnl_mutex); static void ipq_issue_verdict(struct ipq_queue_entry *entry, int verdict) @@ -539,7 +540,7 @@ ipq_rcv_sk(struct sock *sk, int len) struct sk_buff *skb; unsigned int qlen; - down(&ipqnl_sem); + mutex_lock(&ipqnl_mutex); for (qlen = skb_queue_len(&sk->sk_receive_queue); qlen; qlen--) { skb = skb_dequeue(&sk->sk_receive_queue); @@ -547,7 +548,7 @@ ipq_rcv_sk(struct sock *sk, int len) kfree_skb(skb); } - up(&ipqnl_sem); + mutex_unlock(&ipqnl_mutex); } static int @@ -708,8 +709,8 @@ cleanup_sysctl: cleanup_ipqnl: sock_release(ipqnl->sk_socket); - down(&ipqnl_sem); - up(&ipqnl_sem); + mutex_lock(&ipqnl_mutex); + mutex_unlock(&ipqnl_mutex); cleanup_netlink_notifier: netlink_unregister_notifier(&ipq_nl_notifier); diff -puN net/ipv4/netfilter/ip_tables.c~git-net net/ipv4/netfilter/ip_tables.c --- devel/net/ipv4/netfilter/ip_tables.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ip_tables.c 2006-03-17 23:03:48.000000000 -0800 @@ -25,7 +25,7 @@ #include #include #include -#include +#include #include #include 
#include @@ -179,6 +179,7 @@ ipt_error(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -197,8 +198,8 @@ int do_match(struct ipt_entry_match *m, int *hotdrop) { /* Stop iteration if it doesn't match */ - if (!m->u.kernel.match->match(skb, in, out, m->data, offset, - skb->nh.iph->ihl*4, hotdrop)) + if (!m->u.kernel.match->match(skb, in, out, m->u.kernel.match, m->data, + offset, skb->nh.iph->ihl*4, hotdrop)) return 1; else return 0; @@ -305,6 +306,7 @@ ipt_do_table(struct sk_buff **pskb, verdict = t->u.kernel.target->target(pskb, in, out, hook, + t->u.kernel.target, t->data, userdata); @@ -464,7 +466,7 @@ cleanup_match(struct ipt_entry_match *m, return 1; if (m->u.kernel.match->destroy) - m->u.kernel.match->destroy(m->data, + m->u.kernel.match->destroy(m->u.kernel.match, m->data, m->u.match_size - sizeof(*m)); module_put(m->u.kernel.match->me); return 0; @@ -477,21 +479,12 @@ standard_check(const struct ipt_entry_ta struct ipt_standard_target *targ = (void *)t; /* Check standard info. 
*/ - if (t->u.target_size - != IPT_ALIGN(sizeof(struct ipt_standard_target))) { - duprintf("standard_check: target size %u != %u\n", - t->u.target_size, - IPT_ALIGN(sizeof(struct ipt_standard_target))); - return 0; - } - if (targ->verdict >= 0 && targ->verdict > max_offset - sizeof(struct ipt_entry)) { duprintf("ipt_standard_check: bad verdict (%i)\n", targ->verdict); return 0; } - if (targ->verdict < -NF_MAX_VERDICT - 1) { duprintf("ipt_standard_check: bad negative verdict (%i)\n", targ->verdict); @@ -508,6 +501,7 @@ check_match(struct ipt_entry_match *m, unsigned int *i) { struct ipt_match *match; + int ret; match = try_then_request_module(xt_find_match(AF_INET, m->u.user.name, m->u.user.revision), @@ -518,18 +512,27 @@ check_match(struct ipt_entry_match *m, } m->u.kernel.match = match; + ret = xt_check_match(match, AF_INET, m->u.match_size - sizeof(*m), + name, hookmask, ip->proto, + ip->invflags & IPT_INV_PROTO); + if (ret) + goto err; + if (m->u.kernel.match->checkentry - && !m->u.kernel.match->checkentry(name, ip, m->data, + && !m->u.kernel.match->checkentry(name, ip, match, m->data, m->u.match_size - sizeof(*m), hookmask)) { - module_put(m->u.kernel.match->me); duprintf("ip_tables: check failed for `%s'.\n", m->u.kernel.match->name); - return -EINVAL; + ret = -EINVAL; + goto err; } (*i)++; return 0; +err: + module_put(m->u.kernel.match->me); + return ret; } static struct ipt_target ipt_standard_target; @@ -565,26 +568,32 @@ check_entry(struct ipt_entry *e, const c } t->u.kernel.target = target; + ret = xt_check_target(target, AF_INET, t->u.target_size - sizeof(*t), + name, e->comefrom, e->ip.proto, + e->ip.invflags & IPT_INV_PROTO); + if (ret) + goto err; + if (t->u.kernel.target == &ipt_standard_target) { if (!standard_check(t, size)) { ret = -EINVAL; goto cleanup_matches; } } else if (t->u.kernel.target->checkentry - && !t->u.kernel.target->checkentry(name, e, t->data, + && !t->u.kernel.target->checkentry(name, e, target, t->data, t->u.target_size - 
sizeof(*t), e->comefrom)) { - module_put(t->u.kernel.target->me); duprintf("ip_tables: check failed for `%s'.\n", t->u.kernel.target->name); ret = -EINVAL; - goto cleanup_matches; + goto err; } (*i)++; return 0; - + err: + module_put(t->u.kernel.target->me); cleanup_matches: IPT_MATCH_ITERATE(e, cleanup_match, &j); return ret; @@ -645,7 +654,7 @@ cleanup_entry(struct ipt_entry *e, unsig IPT_MATCH_ITERATE(e, cleanup_match, NULL); t = ipt_get_target(e); if (t->u.kernel.target->destroy) - t->u.kernel.target->destroy(t->data, + t->u.kernel.target->destroy(t->u.kernel.target, t->data, t->u.target_size - sizeof(*t)); module_put(t->u.kernel.target->me); return 0; @@ -1277,6 +1286,7 @@ static int icmp_match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -1310,28 +1320,27 @@ icmp_match(const struct sk_buff *skb, static int icmp_checkentry(const char *tablename, const void *info, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { - const struct ipt_ip *ip = info; const struct ipt_icmp *icmpinfo = matchinfo; - /* Must specify proto == ICMP, and no unknown invflags */ - return ip->proto == IPPROTO_ICMP - && !(ip->invflags & IPT_INV_PROTO) - && matchsize == IPT_ALIGN(sizeof(struct ipt_icmp)) - && !(icmpinfo->invflags & ~IPT_ICMP_INV); + /* Must specify no unknown invflags */ + return !(icmpinfo->invflags & ~IPT_ICMP_INV); } /* The built-in targets: standard (NULL) and error. 
*/ static struct ipt_target ipt_standard_target = { .name = IPT_STANDARD_TARGET, + .targetsize = sizeof(int), }; static struct ipt_target ipt_error_target = { .name = IPT_ERROR_TARGET, .target = ipt_error, + .targetsize = IPT_FUNCTION_MAXNAMELEN, }; static struct nf_sockopt_ops ipt_sockopts = { @@ -1346,8 +1355,10 @@ static struct nf_sockopt_ops ipt_sockopt static struct ipt_match icmp_matchstruct = { .name = "icmp", - .match = &icmp_match, - .checkentry = &icmp_checkentry, + .match = icmp_match, + .matchsize = sizeof(struct ipt_icmp), + .proto = IPPROTO_ICMP, + .checkentry = icmp_checkentry, }; static int __init init(void) diff -puN net/ipv4/netfilter/ipt_addrtype.c~git-net net/ipv4/netfilter/ipt_addrtype.c --- devel/net/ipv4/netfilter/ipt_addrtype.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_addrtype.c 2006-03-17 23:03:48.000000000 -0800 @@ -27,8 +27,9 @@ static inline int match_type(u_int32_t a return !!(mask & (1 << inet_addr_type(addr))); } -static int match(const struct sk_buff *skb, const struct net_device *in, - const struct net_device *out, const void *matchinfo, +static int match(const struct sk_buff *skb, + const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, int *hotdrop) { const struct ipt_addrtype_info *info = matchinfo; @@ -43,23 +44,10 @@ static int match(const struct sk_buff *s return ret; } -static int checkentry(const char *tablename, const void *ip, - void *matchinfo, unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != IPT_ALIGN(sizeof(struct ipt_addrtype_info))) { - printk(KERN_ERR "ipt_addrtype: invalid size (%u != %Zu)\n", - matchsize, IPT_ALIGN(sizeof(struct ipt_addrtype_info))); - return 0; - } - - return 1; -} - static struct ipt_match addrtype_match = { .name = "addrtype", .match = match, - .checkentry = checkentry, + .matchsize = sizeof(struct ipt_addrtype_info), .me = THIS_MODULE }; 
diff -puN net/ipv4/netfilter/ipt_ah.c~git-net net/ipv4/netfilter/ipt_ah.c
--- devel/net/ipv4/netfilter/ipt_ah.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_ah.c	2006-03-17 23:03:48.000000000 -0800
@@ -39,6 +39,7 @@ static int
 match(const struct sk_buff *skb,
       const struct net_device *in,
       const struct net_device *out,
+      const struct xt_match *match,
       const void *matchinfo,
       int offset,
       unsigned int protoff,
@@ -71,37 +72,27 @@ match(const struct sk_buff *skb,
 static int
 checkentry(const char *tablename,
 	   const void *ip_void,
+	   const struct xt_match *match,
 	   void *matchinfo,
 	   unsigned int matchinfosize,
 	   unsigned int hook_mask)
 {
 	const struct ipt_ah *ahinfo = matchinfo;
-	const struct ipt_ip *ip = ip_void;

-	/* Must specify proto == AH, and no unknown invflags */
-	if (ip->proto != IPPROTO_AH || (ip->invflags & IPT_INV_PROTO)) {
-		duprintf("ipt_ah: Protocol %u != %u\n", ip->proto,
-			 IPPROTO_AH);
-		return 0;
-	}
-	if (matchinfosize != IPT_ALIGN(sizeof(struct ipt_ah))) {
-		duprintf("ipt_ah: matchsize %u != %u\n",
-			 matchinfosize, IPT_ALIGN(sizeof(struct ipt_ah)));
-		return 0;
-	}
+	/* Must specify no unknown invflags */
 	if (ahinfo->invflags & ~IPT_AH_INV_MASK) {
-		duprintf("ipt_ah: unknown flags %X\n",
-			 ahinfo->invflags);
+		duprintf("ipt_ah: unknown flags %X\n", ahinfo->invflags);
 		return 0;
 	}
-
 	return 1;
 }

 static struct ipt_match ah_match = {
 	.name = "ah",
-	.match = &match,
-	.checkentry = &checkentry,
+	.match = match,
+	.matchsize = sizeof(struct ipt_ah),
+	.proto = IPPROTO_AH,
+	.checkentry = checkentry,
 	.me = THIS_MODULE,
 };

diff -puN net/ipv4/netfilter/ipt_CLUSTERIP.c~git-net net/ipv4/netfilter/ipt_CLUSTERIP.c
--- devel/net/ipv4/netfilter/ipt_CLUSTERIP.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_CLUSTERIP.c	2006-03-17 23:03:48.000000000 -0800
@@ -311,6 +311,7 @@ target(struct sk_buff **pskb,
        const struct net_device *in,
        const struct net_device *out,
        unsigned int hooknum,
+       const struct xt_target *target,
        const void *targinfo,
        void *userinfo)
 {
@@ -380,6 +381,7 @@ target(struct sk_buff **pskb,
 static int
 checkentry(const char *tablename,
 	   const void *e_void,
+	   const struct xt_target *target,
 	   void *targinfo,
 	   unsigned int targinfosize,
 	   unsigned int hook_mask)
@@ -389,13 +391,6 @@ checkentry(const char *tablename,

 	struct clusterip_config *config;

-	if (targinfosize != IPT_ALIGN(sizeof(struct ipt_clusterip_tgt_info))) {
-		printk(KERN_WARNING "CLUSTERIP: targinfosize %u != %Zu\n",
-		       targinfosize,
-		       IPT_ALIGN(sizeof(struct ipt_clusterip_tgt_info)));
-		return 0;
-	}
-
 	if (cipinfo->hash_mode != CLUSTERIP_HASHMODE_SIP &&
 	    cipinfo->hash_mode != CLUSTERIP_HASHMODE_SIP_SPT &&
 	    cipinfo->hash_mode != CLUSTERIP_HASHMODE_SIP_SPT_DPT) {
@@ -465,9 +460,10 @@ checkentry(const char *tablename,
 }

 /* drop reference count of cluster config when rule is deleted */
-static void destroy(void *matchinfo, unsigned int matchinfosize)
+static void destroy(const struct xt_target *target, void *targinfo,
+		    unsigned int targinfosize)
 {
-	struct ipt_clusterip_tgt_info *cipinfo = matchinfo;
+	struct ipt_clusterip_tgt_info *cipinfo = targinfo;

 	/* if no more entries are referencing the config, remove it
 	 * from the list and destroy the proc entry */
@@ -476,12 +472,13 @@ static void destroy(void *matchinfo, uns
 	clusterip_config_put(cipinfo->config);
 }

-static struct ipt_target clusterip_tgt = {
-	.name = "CLUSTERIP",
-	.target = &target,
-	.checkentry = &checkentry,
-	.destroy = &destroy,
-	.me = THIS_MODULE
+static struct ipt_target clusterip_tgt = {
+	.name = "CLUSTERIP",
+	.target = target,
+	.targetsize = sizeof(struct ipt_clusterip_tgt_info),
+	.checkentry = checkentry,
+	.destroy = destroy,
+	.me = THIS_MODULE
 };

diff -puN net/ipv4/netfilter/ipt_dscp.c~git-net net/ipv4/netfilter/ipt_dscp.c
--- devel/net/ipv4/netfilter/ipt_dscp.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_dscp.c	2006-03-17 23:03:48.000000000 -0800
@@ -19,8 +19,9 @@
MODULE_AUTHOR("Harald Welte tos&IPT_DSCP_MASK) == sh_dscp) ^ info->invert;
 }

-static int checkentry(const char *tablename, const void *ip,
-		      void *matchinfo, unsigned int matchsize,
-		      unsigned int hook_mask)
-{
-	if (matchsize != IPT_ALIGN(sizeof(struct ipt_dscp_info)))
-		return 0;
-
-	return 1;
-}
-
 static struct ipt_match dscp_match = {
 	.name = "dscp",
-	.match = &match,
-	.checkentry = &checkentry,
+	.match = match,
+	.matchsize = sizeof(struct ipt_dscp_info),
 	.me = THIS_MODULE,
 };

diff -puN net/ipv4/netfilter/ipt_DSCP.c~git-net net/ipv4/netfilter/ipt_DSCP.c
--- devel/net/ipv4/netfilter/ipt_DSCP.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_DSCP.c	2006-03-17 23:03:48.000000000 -0800
@@ -29,6 +29,7 @@ target(struct sk_buff **pskb,
        const struct net_device *in,
        const struct net_device *out,
        unsigned int hooknum,
+       const struct xt_target *target,
        const void *targinfo,
        void *userinfo)
 {
@@ -58,35 +59,25 @@ target(struct sk_buff **pskb,
 static int
 checkentry(const char *tablename,
 	   const void *e_void,
+	   const struct xt_target *target,
 	   void *targinfo,
 	   unsigned int targinfosize,
 	   unsigned int hook_mask)
 {
 	const u_int8_t dscp = ((struct ipt_DSCP_info *)targinfo)->dscp;

-	if (targinfosize != IPT_ALIGN(sizeof(struct ipt_DSCP_info))) {
-		printk(KERN_WARNING "DSCP: targinfosize %u != %Zu\n",
-		       targinfosize,
-		       IPT_ALIGN(sizeof(struct ipt_DSCP_info)));
-		return 0;
-	}
-
-	if (strcmp(tablename, "mangle") != 0) {
-		printk(KERN_WARNING "DSCP: can only be called from \"mangle\" table, not \"%s\"\n", tablename);
-		return 0;
-	}
-
 	if ((dscp > IPT_DSCP_MAX)) {
 		printk(KERN_WARNING "DSCP: dscp %x out of range\n", dscp);
 		return 0;
 	}
-
 	return 1;
 }

 static struct ipt_target ipt_dscp_reg = {
 	.name = "DSCP",
 	.target = target,
+	.targetsize = sizeof(struct ipt_DSCP_info),
+	.table = "mangle",
 	.checkentry = checkentry,
 	.me = THIS_MODULE,
 };
diff -puN net/ipv4/netfilter/ipt_ecn.c~git-net net/ipv4/netfilter/ipt_ecn.c
--- devel/net/ipv4/netfilter/ipt_ecn.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_ecn.c	2006-03-17 23:03:48.000000000 -0800
@@ -65,8 +65,9 @@ static inline int match_tcp(const struct
 	return 1;
 }

-static int match(const struct sk_buff *skb, const struct net_device *in,
-		 const struct net_device *out, const void *matchinfo,
+static int match(const struct sk_buff *skb,
+		 const struct net_device *in, const struct net_device *out,
+		 const struct xt_match *match, const void *matchinfo,
 		 int offset, unsigned int protoff, int *hotdrop)
 {
 	const struct ipt_ecn_info *info = matchinfo;
@@ -86,15 +87,13 @@ static int match(const struct sk_buff *s
 }

 static int checkentry(const char *tablename, const void *ip_void,
+		      const struct xt_match *match,
 		      void *matchinfo, unsigned int matchsize,
 		      unsigned int hook_mask)
 {
 	const struct ipt_ecn_info *info = matchinfo;
 	const struct ipt_ip *ip = ip_void;

-	if (matchsize != IPT_ALIGN(sizeof(struct ipt_ecn_info)))
-		return 0;
-
 	if (info->operation & IPT_ECN_OP_MATCH_MASK)
 		return 0;

@@ -113,8 +112,9 @@ static int checkentry(const char *tablen

 static struct ipt_match ecn_match = {
 	.name = "ecn",
-	.match = &match,
-	.checkentry = &checkentry,
+	.match = match,
+	.matchsize = sizeof(struct ipt_ecn_info),
+	.checkentry = checkentry,
 	.me = THIS_MODULE,
 };

diff -puN net/ipv4/netfilter/ipt_ECN.c~git-net net/ipv4/netfilter/ipt_ECN.c
--- devel/net/ipv4/netfilter/ipt_ECN.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_ECN.c	2006-03-17 23:03:48.000000000 -0800
@@ -94,6 +94,7 @@ target(struct sk_buff **pskb,
        const struct net_device *in,
        const struct net_device *out,
        unsigned int hooknum,
+       const struct xt_target *target,
        const void *targinfo,
        void *userinfo)
 {
@@ -114,6 +115,7 @@ target(struct sk_buff **pskb,
 static int
 checkentry(const char *tablename,
 	   const void *e_void,
+	   const struct xt_target *target,
 	   void *targinfo,
 	   unsigned int targinfosize,
 	   unsigned int hook_mask)
@@ -121,18 +123,6 @@ checkentry(const char *tablename,
 	const struct ipt_ECN_info *einfo = (struct ipt_ECN_info *)targinfo;
 	const struct ipt_entry *e = e_void;

-	if (targinfosize != IPT_ALIGN(sizeof(struct ipt_ECN_info))) {
-		printk(KERN_WARNING "ECN: targinfosize %u != %Zu\n",
-		       targinfosize,
-		       IPT_ALIGN(sizeof(struct ipt_ECN_info)));
-		return 0;
-	}
-
-	if (strcmp(tablename, "mangle") != 0) {
-		printk(KERN_WARNING "ECN: can only be called from \"mangle\" table, not \"%s\"\n", tablename);
-		return 0;
-	}
-
 	if (einfo->operation & IPT_ECN_OP_MASK) {
 		printk(KERN_WARNING "ECN: unsupported ECN operation %x\n",
 			einfo->operation);
@@ -143,20 +133,20 @@ checkentry(const char *tablename,
 			einfo->ip_ect);
 		return 0;
 	}
-
 	if ((einfo->operation & (IPT_ECN_OP_SET_ECE|IPT_ECN_OP_SET_CWR))
 	    && (e->ip.proto != IPPROTO_TCP || (e->ip.invflags & IPT_INV_PROTO))) {
 		printk(KERN_WARNING "ECN: cannot use TCP operations on a "
 		       "non-tcp rule\n");
 		return 0;
 	}
-
 	return 1;
 }

 static struct ipt_target ipt_ecn_reg = {
 	.name = "ECN",
 	.target = target,
+	.targetsize = sizeof(struct ipt_ECN_info),
+	.table = "mangle",
 	.checkentry = checkentry,
 	.me = THIS_MODULE,
 };
diff -puN net/ipv4/netfilter/ipt_esp.c~git-net net/ipv4/netfilter/ipt_esp.c
--- devel/net/ipv4/netfilter/ipt_esp.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_esp.c	2006-03-17 23:03:48.000000000 -0800
@@ -40,6 +40,7 @@ static int
 match(const struct sk_buff *skb,
       const struct net_device *in,
       const struct net_device *out,
+      const struct xt_match *match,
       const void *matchinfo,
       int offset,
       unsigned int protoff,
@@ -72,37 +73,27 @@ match(const struct sk_buff *skb,
 static int
 checkentry(const char *tablename,
 	   const void *ip_void,
+	   const struct xt_match *match,
 	   void *matchinfo,
 	   unsigned int matchinfosize,
 	   unsigned int hook_mask)
 {
 	const struct ipt_esp *espinfo = matchinfo;
-	const struct ipt_ip *ip = ip_void;

-	/* Must specify proto == ESP, and no unknown invflags */
-	if (ip->proto != IPPROTO_ESP || (ip->invflags & IPT_INV_PROTO)) {
-		duprintf("ipt_esp: Protocol %u != %u\n", ip->proto,
-			 IPPROTO_ESP);
-		return 0;
-	}
-	if (matchinfosize != IPT_ALIGN(sizeof(struct ipt_esp))) {
-		duprintf("ipt_esp: matchsize %u != %u\n",
-			 matchinfosize, IPT_ALIGN(sizeof(struct ipt_esp)));
-		return 0;
-	}
+	/* Must specify no unknown invflags */
 	if (espinfo->invflags & ~IPT_ESP_INV_MASK) {
-		duprintf("ipt_esp: unknown flags %X\n",
-			 espinfo->invflags);
+		duprintf("ipt_esp: unknown flags %X\n", espinfo->invflags);
 		return 0;
 	}
-
 	return 1;
 }

 static struct ipt_match esp_match = {
 	.name = "esp",
-	.match = &match,
-	.checkentry = &checkentry,
+	.match = match,
+	.matchsize = sizeof(struct ipt_esp),
+	.proto = IPPROTO_ESP,
+	.checkentry = checkentry,
 	.me = THIS_MODULE,
 };

diff -puN net/ipv4/netfilter/ipt_hashlimit.c~git-net net/ipv4/netfilter/ipt_hashlimit.c
--- devel/net/ipv4/netfilter/ipt_hashlimit.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_hashlimit.c	2006-03-17 23:03:48.000000000 -0800
@@ -427,6 +427,7 @@ static int
 hashlimit_match(const struct sk_buff *skb,
 		const struct net_device *in,
 		const struct net_device *out,
+		const struct xt_match *match,
 		const void *matchinfo,
 		int offset,
 		unsigned int protoff,
@@ -506,15 +507,13 @@ hashlimit_match(const struct sk_buff *sk
 static int
 hashlimit_checkentry(const char *tablename,
 		     const void *inf,
+		     const struct xt_match *match,
 		     void *matchinfo,
 		     unsigned int matchsize,
 		     unsigned int hook_mask)
 {
 	struct ipt_hashlimit_info *r = matchinfo;

-	if (matchsize != IPT_ALIGN(sizeof(struct ipt_hashlimit_info)))
-		return 0;
-
 	/* Check for overflow. */
 	if (r->cfg.burst == 0
 	    || user2credits(r->cfg.avg * r->cfg.burst) <
@@ -558,19 +557,21 @@ hashlimit_checkentry(const char *tablena
 }

 static void
-hashlimit_destroy(void *matchinfo, unsigned int matchsize)
+hashlimit_destroy(const struct xt_match *match, void *matchinfo,
+		  unsigned int matchsize)
 {
 	struct ipt_hashlimit_info *r = (struct ipt_hashlimit_info *) matchinfo;

 	htable_put(r->hinfo);
 }

-static struct ipt_match ipt_hashlimit = {
-	.name = "hashlimit",
-	.match = hashlimit_match,
-	.checkentry = hashlimit_checkentry,
-	.destroy = hashlimit_destroy,
-	.me = THIS_MODULE
+static struct ipt_match ipt_hashlimit = {
+	.name = "hashlimit",
+	.match = hashlimit_match,
+	.matchsize = sizeof(struct ipt_hashlimit_info),
+	.checkentry = hashlimit_checkentry,
+	.destroy = hashlimit_destroy,
+	.me = THIS_MODULE
 };

 /* PROC stuff */
diff -puN net/ipv4/netfilter/ipt_iprange.c~git-net net/ipv4/netfilter/ipt_iprange.c
--- devel/net/ipv4/netfilter/ipt_iprange.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_iprange.c	2006-03-17 23:03:48.000000000 -0800
@@ -27,6 +27,7 @@ static int
 match(const struct sk_buff *skb,
       const struct net_device *in,
       const struct net_device *out,
+      const struct xt_match *match,
       const void *matchinfo,
       int offset, unsigned int protoff, int *hotdrop)
 {
@@ -62,27 +63,12 @@ match(const struct sk_buff *skb,
 	return 1;
 }

-static int check(const char *tablename,
-		 const void *inf,
-		 void *matchinfo,
-		 unsigned int matchsize,
-		 unsigned int hook_mask)
-{
-	/* verify size */
-	if (matchsize != IPT_ALIGN(sizeof(struct ipt_iprange_info)))
-		return 0;
-
-	return 1;
-}
-
-static struct ipt_match iprange_match =
-{
-	.list = { NULL, NULL },
-	.name = "iprange",
-	.match = &match,
-	.checkentry = &check,
-	.destroy = NULL,
-	.me = THIS_MODULE
+static struct ipt_match iprange_match = {
+	.name = "iprange",
+	.match = match,
+	.matchsize = sizeof(struct ipt_iprange_info),
+	.destroy = NULL,
+	.me = THIS_MODULE
 };

 static int __init init(void)
diff -puN net/ipv4/netfilter/ipt_LOG.c~git-net net/ipv4/netfilter/ipt_LOG.c
--- devel/net/ipv4/netfilter/ipt_LOG.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_LOG.c	2006-03-17 23:03:48.000000000 -0800
@@ -415,6 +415,7 @@ ipt_log_target(struct sk_buff **pskb,
 	       const struct net_device *in,
 	       const struct net_device *out,
 	       unsigned int hooknum,
+	       const struct xt_target *target,
 	       const void *targinfo,
 	       void *userinfo)
 {
@@ -437,35 +438,29 @@ ipt_log_target(struct sk_buff **pskb,

 static int ipt_log_checkentry(const char *tablename,
 			      const void *e,
+			      const struct xt_target *target,
 			      void *targinfo,
 			      unsigned int targinfosize,
 			      unsigned int hook_mask)
 {
 	const struct ipt_log_info *loginfo = targinfo;

-	if (targinfosize != IPT_ALIGN(sizeof(struct ipt_log_info))) {
-		DEBUGP("LOG: targinfosize %u != %u\n",
-		       targinfosize, IPT_ALIGN(sizeof(struct ipt_log_info)));
-		return 0;
-	}
-
 	if (loginfo->level >= 8) {
 		DEBUGP("LOG: level %u >= 8\n", loginfo->level);
 		return 0;
 	}
-
 	if (loginfo->prefix[sizeof(loginfo->prefix)-1] != '\0') {
 		DEBUGP("LOG: prefix term %i\n",
 		       loginfo->prefix[sizeof(loginfo->prefix)-1]);
 		return 0;
 	}
-
 	return 1;
 }

 static struct ipt_target ipt_log_reg = {
 	.name = "LOG",
 	.target = ipt_log_target,
+	.targetsize = sizeof(struct ipt_log_info),
 	.checkentry = ipt_log_checkentry,
 	.me = THIS_MODULE,
 };
diff -puN net/ipv4/netfilter/ipt_MASQUERADE.c~git-net net/ipv4/netfilter/ipt_MASQUERADE.c
--- devel/net/ipv4/netfilter/ipt_MASQUERADE.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_MASQUERADE.c	2006-03-17 23:03:48.000000000 -0800
@@ -41,25 +41,13 @@ static DEFINE_RWLOCK(masq_lock);
 static int
 masquerade_check(const char *tablename,
 		 const void *e,
+		 const struct xt_target *target,
 		 void *targinfo,
 		 unsigned int targinfosize,
 		 unsigned int hook_mask)
 {
 	const struct ip_nat_multi_range_compat *mr = targinfo;

-	if (strcmp(tablename, "nat") != 0) {
-		DEBUGP("masquerade_check: bad table `%s'.\n", tablename);
-		return 0;
-	}
-	if (targinfosize != IPT_ALIGN(sizeof(*mr))) {
-		DEBUGP("masquerade_check: size %u != %u.\n",
-		       targinfosize, sizeof(*mr));
-		return 0;
-	}
-	if (hook_mask & ~(1 << NF_IP_POST_ROUTING)) {
-		DEBUGP("masquerade_check: bad hooks %x.\n", hook_mask);
-		return 0;
-	}
 	if (mr->range[0].flags & IP_NAT_RANGE_MAP_IPS) {
 		DEBUGP("masquerade_check: bad MAP_IPS.\n");
 		return 0;
@@ -76,6 +64,7 @@ masquerade_target(struct sk_buff **pskb,
 		  const struct net_device *in,
 		  const struct net_device *out,
 		  unsigned int hooknum,
+		  const struct xt_target *target,
 		  const void *targinfo,
 		  void *userinfo)
 {
@@ -179,6 +168,9 @@ static struct notifier_block masq_inet_n
 static struct ipt_target masquerade = {
 	.name = "MASQUERADE",
 	.target = masquerade_target,
+	.targetsize = sizeof(struct ip_nat_multi_range_compat),
+	.table = "nat",
+	.hooks = 1 << NF_IP_POST_ROUTING,
 	.checkentry = masquerade_check,
 	.me = THIS_MODULE,
 };
diff -puN net/ipv4/netfilter/ipt_multiport.c~git-net net/ipv4/netfilter/ipt_multiport.c
--- devel/net/ipv4/netfilter/ipt_multiport.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_multiport.c	2006-03-17 23:03:48.000000000 -0800
@@ -95,6 +95,7 @@ static int
 match(const struct sk_buff *skb,
       const struct net_device *in,
       const struct net_device *out,
+      const struct xt_match *match,
       const void *matchinfo,
       int offset,
       unsigned int protoff,
@@ -127,6 +128,7 @@ static int
 match_v1(const struct sk_buff *skb,
 	 const struct net_device *in,
 	 const struct net_device *out,
+	 const struct xt_match *match,
 	 const void *matchinfo,
 	 int offset,
 	 unsigned int protoff,
@@ -153,40 +155,19 @@ match_v1(const struct sk_buff *skb,
 	return ports_match_v1(multiinfo, ntohs(pptr[0]), ntohs(pptr[1]));
 }

-/* Called when user tries to insert an entry of this type. */
-static int
-checkentry(const char *tablename,
-	   const void *ip,
-	   void *matchinfo,
-	   unsigned int matchsize,
-	   unsigned int hook_mask)
-{
-	return (matchsize == IPT_ALIGN(sizeof(struct ipt_multiport)));
-}
-
-static int
-checkentry_v1(const char *tablename,
-	      const void *ip,
-	      void *matchinfo,
-	      unsigned int matchsize,
-	      unsigned int hook_mask)
-{
-	return (matchsize == IPT_ALIGN(sizeof(struct ipt_multiport_v1)));
-}
-
 static struct ipt_match multiport_match = {
 	.name = "multiport",
 	.revision = 0,
-	.match = &match,
-	.checkentry = &checkentry,
+	.match = match,
+	.matchsize = sizeof(struct ipt_multiport),
 	.me = THIS_MODULE,
 };

 static struct ipt_match multiport_match_v1 = {
 	.name = "multiport",
 	.revision = 1,
-	.match = &match_v1,
-	.checkentry = &checkentry_v1,
+	.match = match_v1,
+	.matchsize = sizeof(struct ipt_multiport_v1),
 	.me = THIS_MODULE,
 };

diff -puN net/ipv4/netfilter/ipt_NETMAP.c~git-net net/ipv4/netfilter/ipt_NETMAP.c
--- devel/net/ipv4/netfilter/ipt_NETMAP.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_NETMAP.c	2006-03-17 23:03:48.000000000 -0800
@@ -32,25 +32,13 @@ MODULE_DESCRIPTION("iptables 1:1 NAT map
 static int
 check(const char *tablename,
       const void *e,
+      const struct xt_target *target,
       void *targinfo,
       unsigned int targinfosize,
       unsigned int hook_mask)
 {
 	const struct ip_nat_multi_range_compat *mr = targinfo;

-	if (strcmp(tablename, "nat") != 0) {
-		DEBUGP(MODULENAME":check: bad table `%s'.\n", tablename);
-		return 0;
-	}
-	if (targinfosize != IPT_ALIGN(sizeof(*mr))) {
-		DEBUGP(MODULENAME":check: size %u.\n", targinfosize);
-		return 0;
-	}
-	if (hook_mask & ~((1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_POST_ROUTING) |
-			  (1 << NF_IP_LOCAL_OUT))) {
-		DEBUGP(MODULENAME":check: bad hooks %x.\n", hook_mask);
-		return 0;
-	}
 	if (!(mr->range[0].flags & IP_NAT_RANGE_MAP_IPS)) {
 		DEBUGP(MODULENAME":check: bad MAP_IPS.\n");
 		return 0;
@@ -67,6 +55,7 @@ target(struct sk_buff **pskb,
        const struct net_device *in,
        const struct net_device *out,
        unsigned int hooknum,
+       const struct xt_target *target,
        const void *targinfo,
        void *userinfo)
 {
@@ -101,6 +90,10 @@ target(struct sk_buff **pskb,
 static struct ipt_target target_module = {
 	.name = MODULENAME,
 	.target = target,
+	.targetsize = sizeof(struct ip_nat_multi_range_compat),
+	.table = "nat",
+	.hooks = (1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_POST_ROUTING) |
+		 (1 << NF_IP_LOCAL_OUT),
 	.checkentry = check,
 	.me = THIS_MODULE
 };
diff -puN net/ipv4/netfilter/ipt_owner.c~git-net net/ipv4/netfilter/ipt_owner.c
--- devel/net/ipv4/netfilter/ipt_owner.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_owner.c	2006-03-17 23:03:48.000000000 -0800
@@ -25,6 +25,7 @@ static int
 match(const struct sk_buff *skb,
       const struct net_device *in,
       const struct net_device *out,
+      const struct xt_match *match,
       const void *matchinfo,
       int offset,
       unsigned int protoff,
@@ -53,37 +54,27 @@ match(const struct sk_buff *skb,
 static int
 checkentry(const char *tablename,
 	   const void *ip,
+	   const struct xt_match *match,
 	   void *matchinfo,
 	   unsigned int matchsize,
 	   unsigned int hook_mask)
 {
 	const struct ipt_owner_info *info = matchinfo;

-	if (hook_mask
-	    & ~((1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_POST_ROUTING))) {
-		printk("ipt_owner: only valid for LOCAL_OUT or POST_ROUTING.\n");
-		return 0;
-	}
-
-	if (matchsize != IPT_ALIGN(sizeof(struct ipt_owner_info))) {
-		printk("Matchsize %u != %Zu\n", matchsize,
-		       IPT_ALIGN(sizeof(struct ipt_owner_info)));
-		return 0;
-	}
-
 	if (info->match & (IPT_OWNER_PID|IPT_OWNER_SID|IPT_OWNER_COMM)) {
 		printk("ipt_owner: pid, sid and command matching "
 		       "not supported anymore\n");
 		return 0;
 	}
-
 	return 1;
 }

 static struct ipt_match owner_match = {
 	.name = "owner",
-	.match = &match,
-	.checkentry = &checkentry,
+	.match = match,
+	.matchsize = sizeof(struct ipt_owner_info),
+	.hooks = (1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_POST_ROUTING),
+	.checkentry = checkentry,
 	.me = THIS_MODULE,
 };

diff -L net/ipv4/netfilter/ipt_policy.c -puN net/ipv4/netfilter/ipt_policy.c~git-net /dev/null
--- devel/net/ipv4/netfilter/ipt_policy.c
+++ /dev/null	2003-09-15 06:40:47.000000000 -0700
@@ -1,176 +0,0 @@
-/* IP tables module for matching IPsec policy
- *
- * Copyright (c) 2004,2005 Patrick McHardy,
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#include
-#include
-#include
-#include
-#include
-#include
-
-#include
-#include
-#include
-
-MODULE_AUTHOR("Patrick McHardy ");
-MODULE_DESCRIPTION("IPtables IPsec policy matching module");
-MODULE_LICENSE("GPL");
-
-
-static inline int
-match_xfrm_state(struct xfrm_state *x, const struct ipt_policy_elem *e)
-{
-#define MATCH_ADDR(x,y,z) (!e->match.x || \
-	((e->x.a4.s_addr == (e->y.a4.s_addr & (z))) \
-	^ e->invert.x))
-#define MATCH(x,y) (!e->match.x || ((e->x == (y)) ^ e->invert.x))
-
-	return MATCH_ADDR(saddr, smask, x->props.saddr.a4) &&
-	       MATCH_ADDR(daddr, dmask, x->id.daddr.a4) &&
-	       MATCH(proto, x->id.proto) &&
-	       MATCH(mode, x->props.mode) &&
-	       MATCH(spi, x->id.spi) &&
-	       MATCH(reqid, x->props.reqid);
-}
-
-static int
-match_policy_in(const struct sk_buff *skb, const struct ipt_policy_info *info)
-{
-	const struct ipt_policy_elem *e;
-	struct sec_path *sp = skb->sp;
-	int strict = info->flags & IPT_POLICY_MATCH_STRICT;
-	int i, pos;
-
-	if (sp == NULL)
-		return -1;
-	if (strict && info->len != sp->len)
-		return 0;
-
-	for (i = sp->len - 1; i >= 0; i--) {
-		pos = strict ? i - sp->len + 1 : 0;
-		if (pos >= info->len)
-			return 0;
-		e = &info->pol[pos];
-
-		if (match_xfrm_state(sp->x[i].xvec, e)) {
-			if (!strict)
-				return 1;
-		} else if (strict)
-			return 0;
-	}
-
-	return strict ? 1 : 0;
-}
-
-static int
-match_policy_out(const struct sk_buff *skb, const struct ipt_policy_info *info)
-{
-	const struct ipt_policy_elem *e;
-	struct dst_entry *dst = skb->dst;
-	int strict = info->flags & IPT_POLICY_MATCH_STRICT;
-	int i, pos;
-
-	if (dst->xfrm == NULL)
-		return -1;
-
-	for (i = 0; dst && dst->xfrm; dst = dst->child, i++) {
-		pos = strict ? i : 0;
-		if (pos >= info->len)
-			return 0;
-		e = &info->pol[pos];
-
-		if (match_xfrm_state(dst->xfrm, e)) {
-			if (!strict)
-				return 1;
-		} else if (strict)
-			return 0;
-	}
-
-	return strict ? i == info->len : 0;
-}
-
-static int match(const struct sk_buff *skb,
-		 const struct net_device *in,
-		 const struct net_device *out,
-		 const void *matchinfo,
-		 int offset,
-		 unsigned int protoff,
-		 int *hotdrop)
-{
-	const struct ipt_policy_info *info = matchinfo;
-	int ret;
-
-	if (info->flags & IPT_POLICY_MATCH_IN)
-		ret = match_policy_in(skb, info);
-	else
-		ret = match_policy_out(skb, info);
-
-	if (ret < 0)
-		ret = info->flags & IPT_POLICY_MATCH_NONE ? 1 : 0;
-	else if (info->flags & IPT_POLICY_MATCH_NONE)
-		ret = 0;
-
-	return ret;
-}
-
-static int checkentry(const char *tablename, const void *ip_void,
-		      void *matchinfo, unsigned int matchsize,
-		      unsigned int hook_mask)
-{
-	struct ipt_policy_info *info = matchinfo;
-
-	if (matchsize != IPT_ALIGN(sizeof(*info))) {
-		printk(KERN_ERR "ipt_policy: matchsize %u != %zu\n",
-		       matchsize, IPT_ALIGN(sizeof(*info)));
-		return 0;
-	}
-	if (!(info->flags & (IPT_POLICY_MATCH_IN|IPT_POLICY_MATCH_OUT))) {
-		printk(KERN_ERR "ipt_policy: neither incoming nor "
-		       "outgoing policy selected\n");
-		return 0;
-	}
-	if (hook_mask & (1 << NF_IP_PRE_ROUTING | 1 << NF_IP_LOCAL_IN)
-	    && info->flags & IPT_POLICY_MATCH_OUT) {
-		printk(KERN_ERR "ipt_policy: output policy not valid in "
-		       "PRE_ROUTING and INPUT\n");
-		return 0;
-	}
-	if (hook_mask & (1 << NF_IP_POST_ROUTING | 1 << NF_IP_LOCAL_OUT)
-	    && info->flags & IPT_POLICY_MATCH_IN) {
-		printk(KERN_ERR "ipt_policy: input policy not valid in "
-		       "POST_ROUTING and OUTPUT\n");
-		return 0;
-	}
-	if (info->len > IPT_POLICY_MAX_ELEM) {
-		printk(KERN_ERR "ipt_policy: too many policy elements\n");
-		return 0;
-	}
-
-	return 1;
-}
-
-static struct ipt_match policy_match = {
-	.name = "policy",
-	.match = match,
-	.checkentry = checkentry,
-	.me = THIS_MODULE,
-};
-
-static int __init init(void)
-{
-	return ipt_register_match(&policy_match);
-}
-
-static void __exit fini(void)
-{
-	ipt_unregister_match(&policy_match);
-}
-
-module_init(init);
-module_exit(fini);
diff -puN net/ipv4/netfilter/ipt_recent.c~git-net net/ipv4/netfilter/ipt_recent.c
--- devel/net/ipv4/netfilter/ipt_recent.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_recent.c	2006-03-17 23:03:48.000000000 -0800
@@ -102,6 +102,7 @@ static int
 match(const struct sk_buff *skb,
       const struct net_device *in,
       const struct net_device *out,
+      const struct xt_match *match,
       const void *matchinfo,
       int offset,
       unsigned int protoff,
@@ -318,7 +319,7 @@ static int
ip_recent_ctrl(struct file *f
 	skb->nh.iph->daddr = 0;
 	/* Clear ttl since we have no way of knowing it */
 	skb->nh.iph->ttl = 0;
-	match(skb,NULL,NULL,info,0,0,NULL);
+	match(skb,NULL,NULL,NULL,info,0,0,NULL);

 	kfree(skb->nh.iph);
 out_free_skb:
@@ -356,6 +357,7 @@ static int
 match(const struct sk_buff *skb,
       const struct net_device *in,
       const struct net_device *out,
+      const struct xt_match *match,
       const void *matchinfo,
       int offset,
       unsigned int protoff,
@@ -657,6 +659,7 @@ match(const struct sk_buff *skb,
 static int
 checkentry(const char *tablename,
 	   const void *ip,
+	   const struct xt_match *match,
 	   void *matchinfo,
 	   unsigned int matchsize,
 	   unsigned int hook_mask)
@@ -670,8 +673,6 @@ checkentry(const char *tablename,
 	if(debug) printk(KERN_INFO RECENT_NAME ": checkentry() entered.\n");
 #endif

-	if (matchsize != IPT_ALIGN(sizeof(struct ipt_recent_info))) return 0;
-
 	/* seconds and hit_count only valid for CHECK/UPDATE */
 	if(info->check_set & IPT_RECENT_SET) { flag++; if(info->seconds || info->hit_count) return 0; }
 	if(info->check_set & IPT_RECENT_REMOVE) { flag++; if(info->seconds || info->hit_count) return 0; }
@@ -871,7 +872,7 @@ checkentry(const char *tablename,
  * up its memory.
  */
 static void
-destroy(void *matchinfo, unsigned int matchsize)
+destroy(const struct xt_match *match, void *matchinfo, unsigned int matchsize)
 {
 	const struct ipt_recent_info *info = matchinfo;
 	struct recent_ip_tables *curr_table, *last_table;
@@ -951,12 +952,13 @@ destroy(void *matchinfo, unsigned int ma
 /* This is the structure we pass to ipt_register to register our
  * module with iptables.
  */
-static struct ipt_match recent_match = {
-	.name = "recent",
-	.match = &match,
-	.checkentry = &checkentry,
-	.destroy = &destroy,
-	.me = THIS_MODULE
+static struct ipt_match recent_match = {
+	.name = "recent",
+	.match = match,
+	.matchsize = sizeof(struct ipt_recent_info),
+	.checkentry = checkentry,
+	.destroy = destroy,
+	.me = THIS_MODULE
 };

 /* Kernel module initialization. */
diff -puN net/ipv4/netfilter/ipt_REDIRECT.c~git-net net/ipv4/netfilter/ipt_REDIRECT.c
--- devel/net/ipv4/netfilter/ipt_REDIRECT.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_REDIRECT.c	2006-03-17 23:03:48.000000000 -0800
@@ -34,24 +34,13 @@ MODULE_DESCRIPTION("iptables REDIRECT ta
 static int
 redirect_check(const char *tablename,
 	       const void *e,
+	       const struct xt_target *target,
 	       void *targinfo,
 	       unsigned int targinfosize,
 	       unsigned int hook_mask)
 {
 	const struct ip_nat_multi_range_compat *mr = targinfo;

-	if (strcmp(tablename, "nat") != 0) {
-		DEBUGP("redirect_check: bad table `%s'.\n", table);
-		return 0;
-	}
-	if (targinfosize != IPT_ALIGN(sizeof(*mr))) {
-		DEBUGP("redirect_check: size %u.\n", targinfosize);
-		return 0;
-	}
-	if (hook_mask & ~((1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_OUT))) {
-		DEBUGP("redirect_check: bad hooks %x.\n", hook_mask);
-		return 0;
-	}
 	if (mr->range[0].flags & IP_NAT_RANGE_MAP_IPS) {
 		DEBUGP("redirect_check: bad MAP_IPS.\n");
 		return 0;
@@ -68,6 +57,7 @@ redirect_target(struct sk_buff **pskb,
 		const struct net_device *in,
 		const struct net_device *out,
 		unsigned int hooknum,
+		const struct xt_target *target,
 		const void *targinfo,
 		void *userinfo)
 {
@@ -115,6 +105,9 @@ redirect_target(struct sk_buff **pskb,
 static struct ipt_target redirect_reg = {
 	.name = "REDIRECT",
 	.target = redirect_target,
+	.targetsize = sizeof(struct ip_nat_multi_range_compat),
+	.table = "nat",
+	.hooks = (1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_OUT),
 	.checkentry = redirect_check,
 	.me = THIS_MODULE,
 };
diff -puN net/ipv4/netfilter/ipt_REJECT.c~git-net net/ipv4/netfilter/ipt_REJECT.c
--- devel/net/ipv4/netfilter/ipt_REJECT.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_REJECT.c	2006-03-17 23:03:48.000000000 -0800
@@ -154,10 +154,6 @@ static void send_reset(struct sk_buff *o
 	/* This packet will not be the same as the other: clear nf fields */
 	nf_reset(nskb);
 	nskb->nfmark = 0;
-#ifdef CONFIG_BRIDGE_NETFILTER
-	nf_bridge_put(nskb->nf_bridge);
-	nskb->nf_bridge = NULL;
-#endif

 	tcph = (struct tcphdr *)((u_int32_t*)nskb->nh.iph + nskb->nh.iph->ihl);

@@ -236,6 +232,7 @@ static unsigned int reject(struct sk_buf
 			   const struct net_device *in,
 			   const struct net_device *out,
 			   unsigned int hooknum,
+			   const struct xt_target *target,
 			   const void *targinfo,
 			   void *userinfo)
 {
@@ -283,6 +280,7 @@ static unsigned int reject(struct sk_buf

 static int check(const char *tablename,
 		 const void *e_void,
+		 const struct xt_target *target,
 		 void *targinfo,
 		 unsigned int targinfosize,
 		 unsigned int hook_mask)
@@ -290,23 +288,6 @@ static int check(const char *tablename,
 	const struct ipt_reject_info *rejinfo = targinfo;
 	const struct ipt_entry *e = e_void;

-	if (targinfosize != IPT_ALIGN(sizeof(struct ipt_reject_info))) {
-		DEBUGP("REJECT: targinfosize %u != 0\n", targinfosize);
-		return 0;
-	}
-
-	/* Only allow these for packet filtering. */
-	if (strcmp(tablename, "filter") != 0) {
-		DEBUGP("REJECT: bad table `%s'.\n", tablename);
-		return 0;
-	}
-	if ((hook_mask & ~((1 << NF_IP_LOCAL_IN)
-			   | (1 << NF_IP_FORWARD)
-			   | (1 << NF_IP_LOCAL_OUT))) != 0) {
-		DEBUGP("REJECT: bad hook mask %X\n", hook_mask);
-		return 0;
-	}
-
 	if (rejinfo->with == IPT_ICMP_ECHOREPLY) {
 		printk("REJECT: ECHOREPLY no longer supported.\n");
 		return 0;
@@ -318,13 +299,16 @@ static int check(const char *tablename,
 			return 0;
 		}
 	}
-
 	return 1;
 }

 static struct ipt_target ipt_reject_reg = {
 	.name = "REJECT",
 	.target = reject,
+	.targetsize = sizeof(struct ipt_reject_info),
+	.table = "filter",
+	.hooks = (1 << NF_IP_LOCAL_IN) | (1 << NF_IP_FORWARD) |
+		 (1 << NF_IP_LOCAL_OUT),
 	.checkentry = check,
 	.me = THIS_MODULE,
 };
diff -puN net/ipv4/netfilter/ipt_SAME.c~git-net net/ipv4/netfilter/ipt_SAME.c
--- devel/net/ipv4/netfilter/ipt_SAME.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_SAME.c	2006-03-17 23:03:48.000000000 -0800
@@ -50,6 +50,7 @@ MODULE_DESCRIPTION("iptables special SNA
 static int
 same_check(const char *tablename,
 	   const void *e,
+	   const struct xt_target *target,
 	   void *targinfo,
 	   unsigned int targinfosize,
 	   unsigned int hook_mask)
@@ -59,18 +60,6 @@ same_check(const char *tablename,

 	mr->ipnum = 0;

-	if (strcmp(tablename, "nat") != 0) {
-		DEBUGP("same_check: bad table `%s'.\n", tablename);
-		return 0;
-	}
-	if (targinfosize != IPT_ALIGN(sizeof(*mr))) {
-		DEBUGP("same_check: size %u.\n", targinfosize);
-		return 0;
-	}
-	if (hook_mask & ~(1 << NF_IP_PRE_ROUTING | 1 << NF_IP_POST_ROUTING)) {
-		DEBUGP("same_check: bad hooks %x.\n", hook_mask);
-		return 0;
-	}
 	if (mr->rangesize < 1) {
 		DEBUGP("same_check: need at least one dest range.\n");
 		return 0;
@@ -127,7 +116,7 @@ same_check(const char *tablename,
 }

 static void
-same_destroy(void *targinfo,
+same_destroy(const struct xt_target *target, void *targinfo,
 	     unsigned int targinfosize)
 {
 	struct ipt_same_info *mr = targinfo;
@@ -143,6 +132,7 @@ same_target(struct sk_buff **pskb,
 	    const struct net_device *in,
 	    const struct net_device *out,
 	    unsigned int hooknum,
+	    const struct xt_target *target,
 	    const void *targinfo,
 	    void *userinfo)
 {
@@ -191,6 +181,9 @@ same_target(struct sk_buff **pskb,
 static struct ipt_target same_reg = {
 	.name = "SAME",
 	.target = same_target,
+	.targetsize = sizeof(struct ipt_same_info),
+	.table = "nat",
+	.hooks = (1 << NF_IP_PRE_ROUTING | 1 << NF_IP_POST_ROUTING),
 	.checkentry = same_check,
 	.destroy = same_destroy,
 	.me = THIS_MODULE,
diff -puN net/ipv4/netfilter/ipt_TCPMSS.c~git-net net/ipv4/netfilter/ipt_TCPMSS.c
--- devel/net/ipv4/netfilter/ipt_TCPMSS.c~git-net	2006-03-17 23:03:46.000000000 -0800
+++ devel-akpm/net/ipv4/netfilter/ipt_TCPMSS.c	2006-03-17 23:03:48.000000000 -0800
@@ -48,6 +48,7 @@ ipt_tcpmss_target(struct sk_buff **pskb,
 		  const struct net_device *in,
 		  const struct net_device *out,
 		  unsigned int hooknum,
+		  const struct xt_target *target,
 		  const void *targinfo,
 		  void *userinfo)
 {
@@ -211,6 +212,7 @@ static inline int find_syn_match(const s
 static int
ipt_tcpmss_checkentry(const char *tablename, const void *e_void, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) @@ -218,13 +220,6 @@ ipt_tcpmss_checkentry(const char *tablen const struct ipt_tcpmss_info *tcpmssinfo = targinfo; const struct ipt_entry *e = e_void; - if (targinfosize != IPT_ALIGN(sizeof(struct ipt_tcpmss_info))) { - DEBUGP("ipt_tcpmss_checkentry: targinfosize %u != %u\n", - targinfosize, IPT_ALIGN(sizeof(struct ipt_tcpmss_info))); - return 0; - } - - if((tcpmssinfo->mss == IPT_TCPMSS_CLAMP_PMTU) && ((hook_mask & ~((1 << NF_IP_FORWARD) | (1 << NF_IP_LOCAL_OUT) @@ -233,11 +228,8 @@ ipt_tcpmss_checkentry(const char *tablen return 0; } - if (e->ip.proto == IPPROTO_TCP - && !(e->ip.invflags & IPT_INV_PROTO) - && IPT_MATCH_ITERATE(e, find_syn_match)) + if (IPT_MATCH_ITERATE(e, find_syn_match)) return 1; - printk("TCPMSS: Only works on TCP SYN packets\n"); return 0; } @@ -245,6 +237,8 @@ ipt_tcpmss_checkentry(const char *tablen static struct ipt_target ipt_tcpmss_reg = { .name = "TCPMSS", .target = ipt_tcpmss_target, + .targetsize = sizeof(struct ipt_tcpmss_info), + .proto = IPPROTO_TCP, .checkentry = ipt_tcpmss_checkentry, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_tos.c~git-net net/ipv4/netfilter/ipt_tos.c --- devel/net/ipv4/netfilter/ipt_tos.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_tos.c 2006-03-17 23:03:48.000000000 -0800 @@ -21,6 +21,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -31,23 +32,10 @@ match(const struct sk_buff *skb, return (skb->nh.iph->tos == info->tos) ^ info->invert; } -static int -checkentry(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != IPT_ALIGN(sizeof(struct ipt_tos_info))) - return 0; - - 
return 1; -} - static struct ipt_match tos_match = { .name = "tos", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ipt_tos_info), .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_TOS.c~git-net net/ipv4/netfilter/ipt_TOS.c --- devel/net/ipv4/netfilter/ipt_TOS.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_TOS.c 2006-03-17 23:03:48.000000000 -0800 @@ -25,6 +25,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -53,24 +54,13 @@ target(struct sk_buff **pskb, static int checkentry(const char *tablename, const void *e_void, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { const u_int8_t tos = ((struct ipt_tos_target_info *)targinfo)->tos; - if (targinfosize != IPT_ALIGN(sizeof(struct ipt_tos_target_info))) { - printk(KERN_WARNING "TOS: targinfosize %u != %Zu\n", - targinfosize, - IPT_ALIGN(sizeof(struct ipt_tos_target_info))); - return 0; - } - - if (strcmp(tablename, "mangle") != 0) { - printk(KERN_WARNING "TOS: can only be called from \"mangle\" table, not \"%s\"\n", tablename); - return 0; - } - if (tos != IPTOS_LOWDELAY && tos != IPTOS_THROUGHPUT && tos != IPTOS_RELIABILITY @@ -79,13 +69,14 @@ checkentry(const char *tablename, printk(KERN_WARNING "TOS: bad tos value %#x\n", tos); return 0; } - return 1; } static struct ipt_target ipt_tos_reg = { .name = "TOS", .target = target, + .targetsize = sizeof(struct ipt_tos_target_info), + .table = "mangle", .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_ttl.c~git-net net/ipv4/netfilter/ipt_ttl.c --- devel/net/ipv4/netfilter/ipt_ttl.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_ttl.c 2006-03-17 23:03:48.000000000 -0800 @@ -19,8 +19,9 @@ MODULE_AUTHOR("Harald Welte mode > 
IPT_TTL_MAXMODE) { printk(KERN_WARNING "ipt_TTL: invalid or unknown Mode %u\n", info->mode); return 0; } - if ((info->mode != IPT_TTL_SET) && (info->ttl == 0)) return 0; - return 1; } static struct ipt_target ipt_TTL = { .name = "TTL", .target = ipt_ttl_target, + .targetsize = sizeof(struct ipt_TTL_info), + .table = "mangle", .checkentry = ipt_ttl_checkentry, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_ULOG.c~git-net net/ipv4/netfilter/ipt_ULOG.c --- devel/net/ipv4/netfilter/ipt_ULOG.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_ULOG.c 2006-03-17 23:03:48.000000000 -0800 @@ -303,6 +303,7 @@ static unsigned int ipt_ulog_target(stru const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { struct ipt_ulog_info *loginfo = (struct ipt_ulog_info *) targinfo; @@ -339,42 +340,37 @@ static void ipt_logfn(unsigned int pf, static int ipt_ulog_checkentry(const char *tablename, const void *e, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hookmask) { struct ipt_ulog_info *loginfo = (struct ipt_ulog_info *) targinfo; - if (targinfosize != IPT_ALIGN(sizeof(struct ipt_ulog_info))) { - DEBUGP("ipt_ULOG: targinfosize %u != 0\n", targinfosize); - return 0; - } - if (loginfo->prefix[sizeof(loginfo->prefix) - 1] != '\0') { DEBUGP("ipt_ULOG: prefix term %i\n", loginfo->prefix[sizeof(loginfo->prefix) - 1]); return 0; } - if (loginfo->qthreshold > ULOG_MAX_QLEN) { DEBUGP("ipt_ULOG: queue threshold %i > MAX_QLEN\n", loginfo->qthreshold); return 0; } - return 1; } static struct ipt_target ipt_ulog_reg = { .name = "ULOG", .target = ipt_ulog_target, + .targetsize = sizeof(struct ipt_ulog_info), .checkentry = ipt_ulog_checkentry, .me = THIS_MODULE, }; static struct nf_logger ipt_ulog_logger = { .name = "ipt_ULOG", - .logfn = &ipt_logfn, + .logfn = ipt_logfn, .me = THIS_MODULE, }; diff -puN 
net/ipv4/netfilter/Kconfig~git-net net/ipv4/netfilter/Kconfig --- devel/net/ipv4/netfilter/Kconfig~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/Kconfig 2006-03-17 23:03:48.000000000 -0800 @@ -303,16 +303,6 @@ config IP_NF_MATCH_HASHLIMIT destination IP' or `500pps from any given source IP' with a single IPtables rule. -config IP_NF_MATCH_POLICY - tristate "IPsec policy match support" - depends on IP_NF_IPTABLES && XFRM - help - Policy matching allows you to match packets based on the - IPsec policy that was used during decapsulation/will - be used during encapsulation. - - To compile it as a module, choose M here. If unsure, say N. - # `filter', generic and specific targets config IP_NF_FILTER tristate "Packet filtering" diff -puN net/ipv4/netfilter/Makefile~git-net net/ipv4/netfilter/Makefile --- devel/net/ipv4/netfilter/Makefile~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/Makefile 2006-03-17 23:03:48.000000000 -0800 @@ -57,7 +57,6 @@ obj-$(CONFIG_IP_NF_MATCH_DSCP) += ipt_ds obj-$(CONFIG_IP_NF_MATCH_AH_ESP) += ipt_ah.o ipt_esp.o obj-$(CONFIG_IP_NF_MATCH_TTL) += ipt_ttl.o obj-$(CONFIG_IP_NF_MATCH_ADDRTYPE) += ipt_addrtype.o -obj-$(CONFIG_IP_NF_MATCH_POLICY) += ipt_policy.o # targets obj-$(CONFIG_IP_NF_TARGET_REJECT) += ipt_REJECT.o diff -puN net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c~git-net net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c --- devel/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c 2006-03-17 23:03:48.000000000 -0800 @@ -141,19 +141,21 @@ static unsigned int ipv4_conntrack_help( { struct nf_conn *ct; enum ip_conntrack_info ctinfo; + struct nf_conn_help *help; /* This is where we call the helper: as the packet goes out. 
*/ ct = nf_ct_get(*pskb, &ctinfo); - if (ct && ct->helper) { - unsigned int ret; - ret = ct->helper->help(pskb, - (*pskb)->nh.raw - (*pskb)->data - + (*pskb)->nh.iph->ihl*4, - ct, ctinfo); - if (ret != NF_ACCEPT) - return ret; - } - return NF_ACCEPT; + if (!ct) + return NF_ACCEPT; + + help = nfct_help(ct); + if (!help || !help->helper) + return NF_ACCEPT; + + return help->helper->help(pskb, + (*pskb)->nh.raw - (*pskb)->data + + (*pskb)->nh.iph->ihl*4, + ct, ctinfo); } static unsigned int ipv4_conntrack_defrag(unsigned int hooknum, diff -puN net/ipv4/raw.c~git-net net/ipv4/raw.c --- devel/net/ipv4/raw.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/raw.c 2006-03-17 23:03:48.000000000 -0800 @@ -660,12 +660,9 @@ static int raw_geticmpfilter(struct sock out: return ret; } -static int raw_setsockopt(struct sock *sk, int level, int optname, +static int do_raw_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { - if (level != SOL_RAW) - return ip_setsockopt(sk, level, optname, optval, optlen); - if (optname == ICMP_FILTER) { if (inet_sk(sk)->num != IPPROTO_ICMP) return -EOPNOTSUPP; @@ -675,12 +672,27 @@ static int raw_setsockopt(struct sock *s return -ENOPROTOOPT; } -static int raw_getsockopt(struct sock *sk, int level, int optname, - char __user *optval, int __user *optlen) +static int raw_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) { if (level != SOL_RAW) - return ip_getsockopt(sk, level, optname, optval, optlen); + return ip_setsockopt(sk, level, optname, optval, optlen); + return do_raw_setsockopt(sk, level, optname, optval, optlen); +} +#ifdef CONFIG_COMPAT +static int compat_raw_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + if (level != SOL_RAW) + return compat_ip_setsockopt(sk, level, optname, optval, optlen); + return do_raw_setsockopt(sk, level, optname, optval, optlen); +} +#endif + +static int do_raw_getsockopt(struct 
sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ if (optname == ICMP_FILTER) { if (inet_sk(sk)->num != IPPROTO_ICMP) return -EOPNOTSUPP; @@ -690,6 +702,24 @@ static int raw_getsockopt(struct sock *s return -ENOPROTOOPT; } +static int raw_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_RAW) + return ip_getsockopt(sk, level, optname, optval, optlen); + return do_raw_getsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +static int compat_raw_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_RAW) + return compat_ip_getsockopt(sk, level, optname, optval, optlen); + return do_raw_getsockopt(sk, level, optname, optval, optlen); +} +#endif + static int raw_ioctl(struct sock *sk, int cmd, unsigned long arg) { switch (cmd) { @@ -719,22 +749,26 @@ static int raw_ioctl(struct sock *sk, in } struct proto raw_prot = { - .name = "RAW", - .owner = THIS_MODULE, - .close = raw_close, - .connect = ip4_datagram_connect, - .disconnect = udp_disconnect, - .ioctl = raw_ioctl, - .init = raw_init, - .setsockopt = raw_setsockopt, - .getsockopt = raw_getsockopt, - .sendmsg = raw_sendmsg, - .recvmsg = raw_recvmsg, - .bind = raw_bind, - .backlog_rcv = raw_rcv_skb, - .hash = raw_v4_hash, - .unhash = raw_v4_unhash, - .obj_size = sizeof(struct raw_sock), + .name = "RAW", + .owner = THIS_MODULE, + .close = raw_close, + .connect = ip4_datagram_connect, + .disconnect = udp_disconnect, + .ioctl = raw_ioctl, + .init = raw_init, + .setsockopt = raw_setsockopt, + .getsockopt = raw_getsockopt, + .sendmsg = raw_sendmsg, + .recvmsg = raw_recvmsg, + .bind = raw_bind, + .backlog_rcv = raw_rcv_skb, + .hash = raw_v4_hash, + .unhash = raw_v4_unhash, + .obj_size = sizeof(struct raw_sock), +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_raw_setsockopt, + .compat_getsockopt = compat_raw_getsockopt, +#endif }; #ifdef 
CONFIG_PROC_FS diff -puN net/ipv4/sysctl_net_ipv4.c~git-net net/ipv4/sysctl_net_ipv4.c --- devel/net/ipv4/sysctl_net_ipv4.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/sysctl_net_ipv4.c 2006-03-17 23:03:48.000000000 -0800 @@ -664,7 +664,30 @@ ctl_table ipv4_table[] = { .mode = 0644, .proc_handler = &proc_dointvec, }, - + { + .ctl_name = NET_TCP_MTU_PROBING, + .procname = "tcp_mtu_probing", + .data = &sysctl_tcp_mtu_probing, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_TCP_BASE_MSS, + .procname = "tcp_base_mss", + .data = &sysctl_tcp_base_mss, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS, + .procname = "tcp_workaround_signed_windows", + .data = &sysctl_tcp_workaround_signed_windows, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec + }, { .ctl_name = 0 } }; diff -puN net/ipv4/tcp.c~git-net net/ipv4/tcp.c --- devel/net/ipv4/tcp.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/tcp.c 2006-03-17 23:03:48.000000000 -0800 @@ -1687,18 +1687,14 @@ int tcp_disconnect(struct sock *sk, int /* * Socket option code for TCP. 
*/ -int tcp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, - int optlen) +static int do_tcp_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) { struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); int val; int err = 0; - if (level != SOL_TCP) - return icsk->icsk_af_ops->setsockopt(sk, level, optname, - optval, optlen); - /* This is a string value all the others are int's */ if (optname == TCP_CONGESTION) { char name[TCP_CA_NAME_MAX]; @@ -1871,6 +1867,30 @@ int tcp_setsockopt(struct sock *sk, int return err; } +int tcp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, + int optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) + return icsk->icsk_af_ops->setsockopt(sk, level, optname, + optval, optlen); + return do_tcp_setsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +int compat_tcp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + if (level != SOL_TCP) + return inet_csk_compat_setsockopt(sk, level, optname, + optval, optlen); + return do_tcp_setsockopt(sk, level, optname, optval, optlen); +} + +EXPORT_SYMBOL(compat_tcp_setsockopt); +#endif + /* Return information about state of tcp endpoint in API format. 
*/ void tcp_get_info(struct sock *sk, struct tcp_info *info) { @@ -1931,17 +1951,13 @@ void tcp_get_info(struct sock *sk, struc EXPORT_SYMBOL_GPL(tcp_get_info); -int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, - int __user *optlen) +static int do_tcp_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); int val, len; - if (level != SOL_TCP) - return icsk->icsk_af_ops->getsockopt(sk, level, optname, - optval, optlen); - if (get_user(len, optlen)) return -EFAULT; @@ -2025,6 +2041,29 @@ int tcp_getsockopt(struct sock *sk, int return 0; } +int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, + int __user *optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) + return icsk->icsk_af_ops->getsockopt(sk, level, optname, + optval, optlen); + return do_tcp_getsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +int compat_tcp_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_TCP) + return inet_csk_compat_getsockopt(sk, level, optname, + optval, optlen); + return do_tcp_getsockopt(sk, level, optname, optval, optlen); +} + +EXPORT_SYMBOL(compat_tcp_getsockopt); +#endif extern void __skb_cb_too_small_for_tcp(int, int); extern struct tcp_congestion_ops tcp_reno; diff -puN net/ipv4/tcp_htcp.c~git-net net/ipv4/tcp_htcp.c --- devel/net/ipv4/tcp_htcp.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/tcp_htcp.c 2006-03-17 23:03:48.000000000 -0800 @@ -27,12 +27,12 @@ struct htcp { u16 alpha; /* Fixed point arith, << 7 */ u8 beta; /* Fixed point arith, << 7 */ u8 modeswitch; /* Delay modeswitch until we had at least one congestion event */ - u8 ccount; /* Number of RTTs since last congestion event */ - u8 undo_ccount; - u16 packetcount; + u32 last_cong; /* Time 
since last congestion event end */ + u32 undo_last_cong; + u16 pkts_acked; + u32 packetcount; u32 minRTT; u32 maxRTT; - u32 snd_cwnd_cnt2; u32 undo_maxRTT; u32 undo_old_maxB; @@ -45,21 +45,30 @@ struct htcp { u32 lasttime; }; +static inline u32 htcp_cong_time(struct htcp *ca) +{ + return jiffies - ca->last_cong; +} + +static inline u32 htcp_ccount(struct htcp *ca) +{ + return htcp_cong_time(ca)/ca->minRTT; +} + static inline void htcp_reset(struct htcp *ca) { - ca->undo_ccount = ca->ccount; + ca->undo_last_cong = ca->last_cong; ca->undo_maxRTT = ca->maxRTT; ca->undo_old_maxB = ca->old_maxB; - ca->ccount = 0; - ca->snd_cwnd_cnt2 = 0; + ca->last_cong = jiffies; } static u32 htcp_cwnd_undo(struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); struct htcp *ca = inet_csk_ca(sk); - ca->ccount = ca->undo_ccount; + ca->last_cong = ca->undo_last_cong; ca->maxRTT = ca->undo_maxRTT; ca->old_maxB = ca->undo_old_maxB; return max(tp->snd_cwnd, (tp->snd_ssthresh<<7)/ca->beta); @@ -77,10 +86,10 @@ static inline void measure_rtt(struct so ca->minRTT = srtt; /* max RTT */ - if (icsk->icsk_ca_state == TCP_CA_Open && tp->snd_ssthresh < 0xFFFF && ca->ccount > 3) { + if (icsk->icsk_ca_state == TCP_CA_Open && tp->snd_ssthresh < 0xFFFF && htcp_ccount(ca) > 3) { if (ca->maxRTT < ca->minRTT) ca->maxRTT = ca->minRTT; - if (ca->maxRTT < srtt && srtt <= ca->maxRTT+HZ/50) + if (ca->maxRTT < srtt && srtt <= ca->maxRTT+msecs_to_jiffies(20)) ca->maxRTT = srtt; } } @@ -92,6 +101,12 @@ static void measure_achieved_throughput( struct htcp *ca = inet_csk_ca(sk); u32 now = tcp_time_stamp; + if (icsk->icsk_ca_state == TCP_CA_Open) + ca->pkts_acked = pkts_acked; + + if (!use_bandwidth_switch) + return; + /* achieved throughput calculations */ if (icsk->icsk_ca_state != TCP_CA_Open && icsk->icsk_ca_state != TCP_CA_Disorder) { @@ -106,7 +121,7 @@ static void measure_achieved_throughput( && now - ca->lasttime >= ca->minRTT && ca->minRTT > 0) { __u32 cur_Bi = ca->packetcount*HZ/(now - ca->lasttime); - 
if (ca->ccount <= 3) { + if (htcp_ccount(ca) <= 3) { /* just after backoff */ ca->minB = ca->maxB = ca->Bi = cur_Bi; } else { @@ -135,7 +150,7 @@ static inline void htcp_beta_update(stru } } - if (ca->modeswitch && minRTT > max(HZ/100, 1) && maxRTT) { + if (ca->modeswitch && minRTT > msecs_to_jiffies(10) && maxRTT) { ca->beta = (minRTT<<7)/maxRTT; if (ca->beta < BETA_MIN) ca->beta = BETA_MIN; @@ -151,7 +166,7 @@ static inline void htcp_alpha_update(str { u32 minRTT = ca->minRTT; u32 factor = 1; - u32 diff = ca->ccount * minRTT; /* time since last backoff */ + u32 diff = htcp_cong_time(ca); if (diff > HZ) { diff -= HZ; @@ -216,21 +231,18 @@ static void htcp_cong_avoid(struct sock measure_rtt(sk); - /* keep track of number of round-trip times since last backoff event */ - if (ca->snd_cwnd_cnt2++ > tp->snd_cwnd) { - ca->ccount++; - ca->snd_cwnd_cnt2 = 0; - htcp_alpha_update(ca); - } - /* In dangerous area, increase slowly. * In theory this is tp->snd_cwnd += alpha / tp->snd_cwnd */ - if ((tp->snd_cwnd_cnt++ * ca->alpha)>>7 >= tp->snd_cwnd) { + if ((tp->snd_cwnd_cnt * ca->alpha)>>7 >= tp->snd_cwnd) { if (tp->snd_cwnd < tp->snd_cwnd_clamp) tp->snd_cwnd++; tp->snd_cwnd_cnt = 0; - } + htcp_alpha_update(ca); + } else + tp->snd_cwnd_cnt += ca->pkts_acked; + + ca->pkts_acked = 1; } } @@ -249,11 +261,19 @@ static void htcp_init(struct sock *sk) memset(ca, 0, sizeof(struct htcp)); ca->alpha = ALPHA_BASE; ca->beta = BETA_MIN; + ca->pkts_acked = 1; + ca->last_cong = jiffies; } static void htcp_state(struct sock *sk, u8 new_state) { switch (new_state) { + case TCP_CA_Open: + { + struct htcp *ca = inet_csk_ca(sk); + ca->last_cong = jiffies; + } + break; case TCP_CA_CWR: case TCP_CA_Recovery: case TCP_CA_Loss: @@ -278,8 +298,6 @@ static int __init htcp_register(void) { BUG_ON(sizeof(struct htcp) > ICSK_CA_PRIV_SIZE); BUILD_BUG_ON(BETA_MIN >= BETA_MAX); - if (!use_bandwidth_switch) - htcp.pkts_acked = NULL; return tcp_register_congestion_control(&htcp); } diff -puN 
net/ipv4/tcp_input.c~git-net net/ipv4/tcp_input.c --- devel/net/ipv4/tcp_input.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/tcp_input.c 2006-03-17 23:03:48.000000000 -0800 @@ -1891,6 +1891,34 @@ static void tcp_try_to_open(struct sock } } +static void tcp_mtup_probe_failed(struct sock *sk) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + icsk->icsk_mtup.search_high = icsk->icsk_mtup.probe_size - 1; + icsk->icsk_mtup.probe_size = 0; +} + +static void tcp_mtup_probe_success(struct sock *sk, struct sk_buff *skb) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + + /* FIXME: breaks with very large cwnd */ + tp->prior_ssthresh = tcp_current_ssthresh(sk); + tp->snd_cwnd = tp->snd_cwnd * + tcp_mss_to_mtu(sk, tp->mss_cache) / + icsk->icsk_mtup.probe_size; + tp->snd_cwnd_cnt = 0; + tp->snd_cwnd_stamp = tcp_time_stamp; + tp->rcv_ssthresh = tcp_current_ssthresh(sk); + + icsk->icsk_mtup.search_low = icsk->icsk_mtup.probe_size; + icsk->icsk_mtup.probe_size = 0; + tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); +} + + /* Process an event, which can update packets-in-flight not trivially. 
* Main goal of this function is to calculate new estimate for left_out, * taking into account both packets sitting in receiver's buffer and @@ -2023,6 +2051,17 @@ tcp_fastretrans_alert(struct sock *sk, u return; } + /* MTU probe failure: don't reduce cwnd */ + if (icsk->icsk_ca_state < TCP_CA_CWR && + icsk->icsk_mtup.probe_size && + tp->snd_una == tp->mtu_probe.probe_seq_start) { + tcp_mtup_probe_failed(sk); + /* Restores the reduction we did in tcp_mtup_probe() */ + tp->snd_cwnd++; + tcp_simple_retransmit(sk); + return; + } + /* Otherwise enter Recovery state */ if (IsReno(tp)) @@ -2242,6 +2281,13 @@ static int tcp_clean_rtx_queue(struct so acked |= FLAG_SYN_ACKED; tp->retrans_stamp = 0; } + + /* MTU probing checks */ + if (icsk->icsk_mtup.probe_size) { + if (!after(tp->mtu_probe.probe_seq_end, TCP_SKB_CB(skb)->end_seq)) { + tcp_mtup_probe_success(sk, skb); + } + } if (sacked) { if (sacked & TCPCB_RETRANS) { @@ -4101,6 +4147,7 @@ static int tcp_rcv_synsent_state_process if (tp->rx_opt.sack_ok && sysctl_tcp_fack) tp->rx_opt.sack_ok |= 2; + tcp_mtup_init(sk); tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); tcp_initialize_rcv_mss(sk); @@ -4211,6 +4258,7 @@ discard: if (tp->ecn_flags&TCP_ECN_OK) sock_set_flag(sk, SOCK_NO_LARGESEND); + tcp_mtup_init(sk); tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); tcp_initialize_rcv_mss(sk); @@ -4399,6 +4447,7 @@ int tcp_rcv_state_process(struct sock *s */ tp->lsndtime = tcp_time_stamp; + tcp_mtup_init(sk); tcp_initialize_rcv_mss(sk); tcp_init_buffer_space(sk); tcp_fast_path_on(tp); diff -puN net/ipv4/tcp_ipv4.c~git-net net/ipv4/tcp_ipv4.c --- devel/net/ipv4/tcp_ipv4.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/tcp_ipv4.c 2006-03-17 23:03:48.000000000 -0800 @@ -900,6 +900,7 @@ struct sock *tcp_v4_syn_recv_sock(struct inet_csk(newsk)->icsk_ext_hdr_len = newinet->opt->optlen; newinet->id = newtp->write_seq ^ jiffies; + tcp_mtup_init(newsk); tcp_sync_mss(newsk, dst_mtu(dst)); newtp->advmss = dst_metric(dst, RTAX_ADVMSS); 
tcp_initialize_rcv_mss(newsk); @@ -1216,17 +1217,21 @@ int tcp_v4_tw_remember_stamp(struct inet } struct inet_connection_sock_af_ops ipv4_specific = { - .queue_xmit = ip_queue_xmit, - .send_check = tcp_v4_send_check, - .rebuild_header = inet_sk_rebuild_header, - .conn_request = tcp_v4_conn_request, - .syn_recv_sock = tcp_v4_syn_recv_sock, - .remember_stamp = tcp_v4_remember_stamp, - .net_header_len = sizeof(struct iphdr), - .setsockopt = ip_setsockopt, - .getsockopt = ip_getsockopt, - .addr2sockaddr = inet_csk_addr2sockaddr, - .sockaddr_len = sizeof(struct sockaddr_in), + .queue_xmit = ip_queue_xmit, + .send_check = tcp_v4_send_check, + .rebuild_header = inet_sk_rebuild_header, + .conn_request = tcp_v4_conn_request, + .syn_recv_sock = tcp_v4_syn_recv_sock, + .remember_stamp = tcp_v4_remember_stamp, + .net_header_len = sizeof(struct iphdr), + .setsockopt = ip_setsockopt, + .getsockopt = ip_getsockopt, + .addr2sockaddr = inet_csk_addr2sockaddr, + .sockaddr_len = sizeof(struct sockaddr_in), +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ip_setsockopt, + .compat_getsockopt = compat_ip_getsockopt, +#endif }; /* NOTE: A lot of things set to zero explicitly by call to @@ -1825,23 +1830,16 @@ struct proto tcp_prot = { .obj_size = sizeof(struct tcp_sock), .twsk_prot = &tcp_timewait_sock_ops, .rsk_prot = &tcp_request_sock_ops, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_tcp_setsockopt, + .compat_getsockopt = compat_tcp_getsockopt, +#endif }; - - void __init tcp_v4_init(struct net_proto_family *ops) { - int err = sock_create_kern(PF_INET, SOCK_RAW, IPPROTO_TCP, &tcp_socket); - if (err < 0) + if (inet_csk_ctl_sock_create(&tcp_socket, PF_INET, SOCK_RAW, IPPROTO_TCP) < 0) panic("Failed to create the TCP control socket.\n"); - tcp_socket->sk->sk_allocation = GFP_ATOMIC; - inet_sk(tcp_socket->sk)->uc_ttl = -1; - - /* Unhash it so that IP input processing does not even - * see it, we do not wish this socket to see incoming - * packets. 
- */ - tcp_socket->sk->sk_prot->unhash(tcp_socket->sk); } EXPORT_SYMBOL(ipv4_specific); diff -puN net/ipv4/tcp_output.c~git-net net/ipv4/tcp_output.c --- devel/net/ipv4/tcp_output.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/tcp_output.c 2006-03-17 23:03:48.000000000 -0800 @@ -45,12 +45,23 @@ /* People can turn this off for buggy TCP's found in printers etc. */ int sysctl_tcp_retrans_collapse = 1; +/* People can turn this on to work with those rare, broken TCPs that + * interpret the window field as a signed quantity. + */ +int sysctl_tcp_workaround_signed_windows = 0; + /* This limits the percentage of the congestion window which we * will allow a single TSO frame to consume. Building TSO frames * which are too large can cause TCP streams to be bursty. */ int sysctl_tcp_tso_win_divisor = 3; +int sysctl_tcp_mtu_probing = 0; +int sysctl_tcp_base_mss = 512; + +EXPORT_SYMBOL(sysctl_tcp_mtu_probing); +EXPORT_SYMBOL(sysctl_tcp_base_mss); + static void update_send_head(struct sock *sk, struct tcp_sock *tp, struct sk_buff *skb) { @@ -171,12 +182,18 @@ void tcp_select_initial_window(int __spa space = (space / mss) * mss; /* NOTE: offering an initial window larger than 32767 - * will break some buggy TCP stacks. We try to be nice. - * If we are not window scaling, then this truncates - * our initial window offering to 32k. There should also - * be a sysctl option to stop being nice. + * will break some buggy TCP stacks. If the admin tells us + * it is likely we could be speaking with such a buggy stack + * we will truncate our initial window offering to 32K-1 + * unless the remote has sent us a window scaling option, + * which we interpret as a sign the remote TCP is not + * misinterpreting the window field as a signed quantity. 
*/ - (*rcv_wnd) = min(space, MAX_TCP_WINDOW); + if (sysctl_tcp_workaround_signed_windows) + (*rcv_wnd) = min(space, MAX_TCP_WINDOW); + else + (*rcv_wnd) = space; + (*rcv_wscale) = 0; if (wscale_ok) { /* Set window scaling on max possible window @@ -235,7 +252,7 @@ static u16 tcp_select_window(struct sock /* Make sure we do not exceed the maximum possible * scaled window. */ - if (!tp->rx_opt.rcv_wscale) + if (!tp->rx_opt.rcv_wscale && sysctl_tcp_workaround_signed_windows) new_win = min(new_win, MAX_TCP_WINDOW); else new_win = min(new_win, (65535U << tp->rx_opt.rcv_wscale)); @@ -681,6 +698,62 @@ int tcp_trim_head(struct sock *sk, struc return 0; } +/* Not accounting for SACKs here. */ +int tcp_mtu_to_mss(struct sock *sk, int pmtu) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + int mss_now; + + /* Calculate base mss without TCP options: + It is MMS_S - sizeof(tcphdr) of rfc1122 + */ + mss_now = pmtu - icsk->icsk_af_ops->net_header_len - sizeof(struct tcphdr); + + /* Clamp it (mss_clamp does not include tcp options) */ + if (mss_now > tp->rx_opt.mss_clamp) + mss_now = tp->rx_opt.mss_clamp; + + /* Now subtract optional transport overhead */ + mss_now -= icsk->icsk_ext_hdr_len; + + /* Then reserve room for full set of TCP options and 8 bytes of data */ + if (mss_now < 48) + mss_now = 48; + + /* Now subtract TCP options size, not including SACKs */ + mss_now -= tp->tcp_header_len - sizeof(struct tcphdr); + + return mss_now; +} + +/* Inverse of above */ +int tcp_mss_to_mtu(struct sock *sk, int mss) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + int mtu; + + mtu = mss + + tp->tcp_header_len + + icsk->icsk_ext_hdr_len + + icsk->icsk_af_ops->net_header_len; + + return mtu; +} + +void tcp_mtup_init(struct sock *sk) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + + icsk->icsk_mtup.enabled = sysctl_tcp_mtu_probing > 1; + 
icsk->icsk_mtup.search_high = tp->rx_opt.mss_clamp + sizeof(struct tcphdr) + + icsk->icsk_af_ops->net_header_len; + icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, sysctl_tcp_base_mss); + icsk->icsk_mtup.probe_size = 0; +} + /* This function synchronize snd mss to current pmtu/exthdr set. tp->rx_opt.user_mss is mss set by user by TCP_MAXSEG. It does NOT counts @@ -708,25 +781,12 @@ unsigned int tcp_sync_mss(struct sock *s { struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); - /* Calculate base mss without TCP options: - It is MMS_S - sizeof(tcphdr) of rfc1122 - */ - int mss_now = (pmtu - icsk->icsk_af_ops->net_header_len - - sizeof(struct tcphdr)); - - /* Clamp it (mss_clamp does not include tcp options) */ - if (mss_now > tp->rx_opt.mss_clamp) - mss_now = tp->rx_opt.mss_clamp; + int mss_now; - /* Now subtract optional transport overhead */ - mss_now -= icsk->icsk_ext_hdr_len; + if (icsk->icsk_mtup.search_high > pmtu) + icsk->icsk_mtup.search_high = pmtu; - /* Then reserve room for full set of TCP options and 8 bytes of data */ - if (mss_now < 48) - mss_now = 48; - - /* Now subtract TCP options size, not including SACKs */ - mss_now -= tp->tcp_header_len - sizeof(struct tcphdr); + mss_now = tcp_mtu_to_mss(sk, pmtu); /* Bound mss with half of window */ if (tp->max_window && mss_now > (tp->max_window>>1)) @@ -734,6 +794,8 @@ unsigned int tcp_sync_mss(struct sock *s /* And store cached results */ icsk->icsk_pmtu_cookie = pmtu; + if (icsk->icsk_mtup.enabled) + mss_now = min(mss_now, tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low)); tp->mss_cache = mss_now; return mss_now; @@ -1063,6 +1125,140 @@ static int tcp_tso_should_defer(struct s return 1; } +/* Create a new MTU probe if we are ready. 
+ * Returns 0 if we should wait to probe (no cwnd available), + * 1 if a probe was sent, + * -1 otherwise */ +static int tcp_mtu_probe(struct sock *sk) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + struct sk_buff *skb, *nskb, *next; + int len; + int probe_size; + unsigned int pif; + int copy; + int mss_now; + + /* Not currently probing/verifying, + * not in recovery, + * have enough cwnd, and + * not SACKing (the variable headers throw things off) */ + if (!icsk->icsk_mtup.enabled || + icsk->icsk_mtup.probe_size || + inet_csk(sk)->icsk_ca_state != TCP_CA_Open || + tp->snd_cwnd < 11 || + tp->rx_opt.eff_sacks) + return -1; + + /* Very simple search strategy: just double the MSS. */ + mss_now = tcp_current_mss(sk, 0); + probe_size = 2*tp->mss_cache; + if (probe_size > tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_high)) { + /* TODO: set timer for probe_converge_event */ + return -1; + } + + /* Have enough data in the send queue to probe? */ + len = 0; + if ((skb = sk->sk_send_head) == NULL) + return -1; + while ((len += skb->len) < probe_size && !tcp_skb_is_last(sk, skb)) + skb = skb->next; + if (len < probe_size) + return -1; + + /* Receive window check. */ + if (after(TCP_SKB_CB(skb)->seq + probe_size, tp->snd_una + tp->snd_wnd)) { + if (tp->snd_wnd < probe_size) + return -1; + else + return 0; + } + + /* Do we need to wait to drain cwnd? */ + pif = tcp_packets_in_flight(tp); + if (pif + 2 > tp->snd_cwnd) { + /* With no packets in flight, don't stall. */ + if (pif == 0) + return -1; + else + return 0; + } + + /* We're allowed to probe. Build it now. 
*/ + if ((nskb = sk_stream_alloc_skb(sk, probe_size, GFP_ATOMIC)) == NULL) + return -1; + sk_charge_skb(sk, nskb); + + skb = sk->sk_send_head; + __skb_insert(nskb, skb->prev, skb, &sk->sk_write_queue); + sk->sk_send_head = nskb; + + TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(skb)->seq; + TCP_SKB_CB(nskb)->end_seq = TCP_SKB_CB(skb)->seq + probe_size; + TCP_SKB_CB(nskb)->flags = TCPCB_FLAG_ACK; + TCP_SKB_CB(nskb)->sacked = 0; + nskb->csum = 0; + if (skb->ip_summed == CHECKSUM_HW) + nskb->ip_summed = CHECKSUM_HW; + + len = 0; + while (len < probe_size) { + next = skb->next; + + copy = min_t(int, skb->len, probe_size - len); + if (nskb->ip_summed) + skb_copy_bits(skb, 0, skb_put(nskb, copy), copy); + else + nskb->csum = skb_copy_and_csum_bits(skb, 0, + skb_put(nskb, copy), copy, nskb->csum); + + if (skb->len <= copy) { + /* We've eaten all the data from this skb. + * Throw it away. */ + TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags; + __skb_unlink(skb, &sk->sk_write_queue); + sk_stream_free_skb(sk, skb); + } else { + TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags & + ~(TCPCB_FLAG_FIN|TCPCB_FLAG_PSH); + if (!skb_shinfo(skb)->nr_frags) { + skb_pull(skb, copy); + if (skb->ip_summed != CHECKSUM_HW) + skb->csum = csum_partial(skb->data, skb->len, 0); + } else { + __pskb_trim_head(skb, copy); + tcp_set_skb_tso_segs(sk, skb, mss_now); + } + TCP_SKB_CB(skb)->seq += copy; + } + + len += copy; + skb = next; + } + tcp_init_tso_segs(sk, nskb, nskb->len); + + /* We're ready to send. If this fails, the probe will + * be resegmented into mss-sized pieces by tcp_write_xmit(). */ + TCP_SKB_CB(nskb)->when = tcp_time_stamp; + if (!tcp_transmit_skb(sk, nskb, 1, GFP_ATOMIC)) { + /* Decrement cwnd here because we are sending + * effectively two packets. 
+ */ + tp->snd_cwnd--; + update_send_head(sk, tp, nskb); + + icsk->icsk_mtup.probe_size = tcp_mss_to_mtu(sk, nskb->len); + tp->mtu_probe.probe_seq_start = TCP_SKB_CB(nskb)->seq; + tp->mtu_probe.probe_seq_end = TCP_SKB_CB(nskb)->end_seq; + + return 1; + } + + return -1; +} + + /* This routine writes packets to the network. It advances the * send_head. This happens as incoming acks open up the remote * window for us. @@ -1076,6 +1272,7 @@ static int tcp_write_xmit(struct sock *s struct sk_buff *skb; unsigned int tso_segs, sent_pkts; int cwnd_quota; + int result; /* If we are closed, the bytes will have to remain here. * In time closedown will finish, we empty the write queue and all @@ -1085,12 +1282,20 @@ static int tcp_write_xmit(struct sock *s return 0; sent_pkts = 0; + + /* Do MTU probing. */ + if ((result = tcp_mtu_probe(sk)) == 0) { + return 0; + } else if (result > 0) { + sent_pkts = 1; + } + while ((skb = sk->sk_send_head)) { unsigned int limit; tso_segs = tcp_init_tso_segs(sk, skb, mss_now); BUG_ON(!tso_segs); - + cwnd_quota = tcp_cwnd_test(tp, skb); if (!cwnd_quota) break; @@ -1455,9 +1660,15 @@ void tcp_simple_retransmit(struct sock * int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb) { struct tcp_sock *tp = tcp_sk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); unsigned int cur_mss = tcp_current_mss(sk, 0); int err; - + + /* Inconclusive MTU probe */ + if (icsk->icsk_mtup.probe_size) { + icsk->icsk_mtup.probe_size = 0; + } + /* Do not sent more than we queued. 1/4 is reserved for possible * copying overhead: fragmentation, tunneling, mangling etc.
*/ @@ -1883,6 +2094,7 @@ static void tcp_connect_init(struct sock if (tp->rx_opt.user_mss) tp->rx_opt.mss_clamp = tp->rx_opt.user_mss; tp->max_window = 0; + tcp_mtup_init(sk); tcp_sync_mss(sk, dst_mtu(dst)); if (!tp->window_clamp) @@ -2180,3 +2392,4 @@ EXPORT_SYMBOL(tcp_make_synack); EXPORT_SYMBOL(tcp_simple_retransmit); EXPORT_SYMBOL(tcp_sync_mss); EXPORT_SYMBOL(sysctl_tcp_tso_win_divisor); +EXPORT_SYMBOL(tcp_mtup_init); diff -puN net/ipv4/tcp_timer.c~git-net net/ipv4/tcp_timer.c --- devel/net/ipv4/tcp_timer.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/tcp_timer.c 2006-03-17 23:03:48.000000000 -0800 @@ -119,8 +119,10 @@ static int tcp_orphan_retries(struct soc /* A write timeout has occurred. Process the after effects. */ static int tcp_write_timeout(struct sock *sk) { - const struct inet_connection_sock *icsk = inet_csk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + struct tcp_sock *tp = tcp_sk(sk); int retry_until; + int mss; if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) { if (icsk->icsk_retransmits) @@ -128,25 +130,19 @@ static int tcp_write_timeout(struct sock retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries; } else { if (icsk->icsk_retransmits >= sysctl_tcp_retries1) { - /* NOTE. draft-ietf-tcpimpl-pmtud-01.txt requires pmtu black - hole detection. :-( - - It is place to make it. It is not made. I do not want - to make it. It is disgusting. It does not work in any - case. Let me to cite the same draft, which requires for - us to implement this: - - "The one security concern raised by this memo is that ICMP black holes - are often caused by over-zealous security administrators who block - all ICMP messages. It is vitally important that those who design and - deploy security systems understand the impact of strict filtering on - upper-layer protocols. The safest web site in the world is worthless - if most TCP implementations cannot transfer data from it. 
It would - be far nicer to have all of the black holes fixed rather than fixing - all of the TCP implementations." - - Golden words :-). - */ + /* Black hole detection */ + if (sysctl_tcp_mtu_probing) { + if (!icsk->icsk_mtup.enabled) { + icsk->icsk_mtup.enabled = 1; + tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); + } else { + mss = min(sysctl_tcp_base_mss, + tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low)/2); + mss = max(mss, 68 - tp->tcp_header_len); + icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss); + tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); + } + } dst_negative_advice(&sk->sk_dst_cache); } diff -puN net/ipv4/udp.c~git-net net/ipv4/udp.c --- devel/net/ipv4/udp.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv4/udp.c 2006-03-17 23:03:48.000000000 -0800 @@ -1207,16 +1207,13 @@ static int udp_destroy_sock(struct sock /* * Socket option code for UDP */ -static int udp_setsockopt(struct sock *sk, int level, int optname, +static int do_udp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { struct udp_sock *up = udp_sk(sk); int val; int err = 0; - if (level != SOL_UDP) - return ip_setsockopt(sk, level, optname, optval, optlen); - if(optlen #include +#include #include #include #include @@ -26,19 +27,19 @@ static int ipip_xfrm_rcv(struct xfrm_sta } static struct xfrm_tunnel *ipip_handler; -static DECLARE_MUTEX(xfrm4_tunnel_sem); +static DEFINE_MUTEX(xfrm4_tunnel_mutex); int xfrm4_tunnel_register(struct xfrm_tunnel *handler) { int ret; - down(&xfrm4_tunnel_sem); + mutex_lock(&xfrm4_tunnel_mutex); ret = 0; if (ipip_handler != NULL) ret = -EINVAL; if (!ret) ipip_handler = handler; - up(&xfrm4_tunnel_sem); + mutex_unlock(&xfrm4_tunnel_mutex); return ret; } @@ -49,13 +50,13 @@ int xfrm4_tunnel_deregister(struct xfrm_ { int ret; - down(&xfrm4_tunnel_sem); + mutex_lock(&xfrm4_tunnel_mutex); ret = 0; if (ipip_handler != handler) ret = -EINVAL; if (!ret) ipip_handler = NULL; - up(&xfrm4_tunnel_sem); + 
mutex_unlock(&xfrm4_tunnel_mutex); synchronize_net(); diff -puN net/ipv6/addrconf.c~git-net net/ipv6/addrconf.c --- devel/net/ipv6/addrconf.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/addrconf.c 2006-03-17 23:03:48.000000000 -0800 @@ -78,8 +78,6 @@ #ifdef CONFIG_IPV6_PRIVACY #include -#include -#include #endif #include @@ -110,8 +108,6 @@ static int __ipv6_try_regen_rndid(struct static void ipv6_regen_rndid(unsigned long data); static int desync_factor = MAX_DESYNC_FACTOR * HZ; -static struct crypto_tfm *md5_tfm; -static DEFINE_SPINLOCK(md5_tfm_lock); #endif static int ipv6_count_addresses(struct inet6_dev *idev); @@ -169,6 +165,15 @@ struct ipv6_devconf ipv6_devconf = { .max_desync_factor = MAX_DESYNC_FACTOR, #endif .max_addresses = IPV6_MAX_ADDRESSES, + .accept_ra_defrtr = 1, + .accept_ra_pinfo = 1, +#ifdef CONFIG_IPV6_ROUTER_PREF + .accept_ra_rtr_pref = 1, + .rtr_probe_interval = 60 * HZ, +#ifdef CONFIG_IPV6_ROUTE_INFO + .accept_ra_rt_info_max_plen = 0, +#endif +#endif }; static struct ipv6_devconf ipv6_devconf_dflt = { @@ -190,6 +195,15 @@ static struct ipv6_devconf ipv6_devconf_ .max_desync_factor = MAX_DESYNC_FACTOR, #endif .max_addresses = IPV6_MAX_ADDRESSES, + .accept_ra_defrtr = 1, + .accept_ra_pinfo = 1, +#ifdef CONFIG_IPV6_ROUTER_PREF + .accept_ra_rtr_pref = 1, + .rtr_probe_interval = 60 * HZ, +#ifdef CONFIG_IPV6_ROUTE_INFO + .accept_ra_rt_info_max_plen = 0, +#endif +#endif }; /* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */ @@ -327,86 +341,83 @@ static struct inet6_dev * ipv6_add_dev(s if (dev->mtu < IPV6_MIN_MTU) return NULL; - ndev = kmalloc(sizeof(struct inet6_dev), GFP_KERNEL); + ndev = kzalloc(sizeof(struct inet6_dev), GFP_KERNEL); - if (ndev) { - memset(ndev, 0, sizeof(struct inet6_dev)); + if (ndev == NULL) + return NULL; - rwlock_init(&ndev->lock); - ndev->dev = dev; - memcpy(&ndev->cnf, &ipv6_devconf_dflt, sizeof(ndev->cnf)); - ndev->cnf.mtu6 = dev->mtu; - ndev->cnf.sysctl = NULL; - ndev->nd_parms = 
neigh_parms_alloc(dev, &nd_tbl); - if (ndev->nd_parms == NULL) { - kfree(ndev); - return NULL; - } - /* We refer to the device */ - dev_hold(dev); + rwlock_init(&ndev->lock); + ndev->dev = dev; + memcpy(&ndev->cnf, &ipv6_devconf_dflt, sizeof(ndev->cnf)); + ndev->cnf.mtu6 = dev->mtu; + ndev->cnf.sysctl = NULL; + ndev->nd_parms = neigh_parms_alloc(dev, &nd_tbl); + if (ndev->nd_parms == NULL) { + kfree(ndev); + return NULL; + } + /* We refer to the device */ + dev_hold(dev); - if (snmp6_alloc_dev(ndev) < 0) { - ADBG((KERN_WARNING - "%s(): cannot allocate memory for statistics; dev=%s.\n", - __FUNCTION__, dev->name)); - neigh_parms_release(&nd_tbl, ndev->nd_parms); - ndev->dead = 1; - in6_dev_finish_destroy(ndev); - return NULL; - } + if (snmp6_alloc_dev(ndev) < 0) { + ADBG((KERN_WARNING + "%s(): cannot allocate memory for statistics; dev=%s.\n", + __FUNCTION__, dev->name)); + neigh_parms_release(&nd_tbl, ndev->nd_parms); + ndev->dead = 1; + in6_dev_finish_destroy(ndev); + return NULL; + } - if (snmp6_register_dev(ndev) < 0) { - ADBG((KERN_WARNING - "%s(): cannot create /proc/net/dev_snmp6/%s\n", - __FUNCTION__, dev->name)); - neigh_parms_release(&nd_tbl, ndev->nd_parms); - ndev->dead = 1; - in6_dev_finish_destroy(ndev); - return NULL; - } + if (snmp6_register_dev(ndev) < 0) { + ADBG((KERN_WARNING + "%s(): cannot create /proc/net/dev_snmp6/%s\n", + __FUNCTION__, dev->name)); + neigh_parms_release(&nd_tbl, ndev->nd_parms); + ndev->dead = 1; + in6_dev_finish_destroy(ndev); + return NULL; + } - /* One reference from device. We must do this before - * we invoke __ipv6_regen_rndid(). - */ - in6_dev_hold(ndev); + /* One reference from device. We must do this before + * we invoke __ipv6_regen_rndid(). 
+ */ + in6_dev_hold(ndev); #ifdef CONFIG_IPV6_PRIVACY - get_random_bytes(ndev->rndid, sizeof(ndev->rndid)); - get_random_bytes(ndev->entropy, sizeof(ndev->entropy)); - init_timer(&ndev->regen_timer); - ndev->regen_timer.function = ipv6_regen_rndid; - ndev->regen_timer.data = (unsigned long) ndev; - if ((dev->flags&IFF_LOOPBACK) || - dev->type == ARPHRD_TUNNEL || - dev->type == ARPHRD_NONE || - dev->type == ARPHRD_SIT) { - printk(KERN_INFO - "%s: Disabled Privacy Extensions\n", - dev->name); - ndev->cnf.use_tempaddr = -1; - } else { - in6_dev_hold(ndev); - ipv6_regen_rndid((unsigned long) ndev); - } + init_timer(&ndev->regen_timer); + ndev->regen_timer.function = ipv6_regen_rndid; + ndev->regen_timer.data = (unsigned long) ndev; + if ((dev->flags&IFF_LOOPBACK) || + dev->type == ARPHRD_TUNNEL || + dev->type == ARPHRD_NONE || + dev->type == ARPHRD_SIT) { + printk(KERN_INFO + "%s: Disabled Privacy Extensions\n", + dev->name); + ndev->cnf.use_tempaddr = -1; + } else { + in6_dev_hold(ndev); + ipv6_regen_rndid((unsigned long) ndev); + } #endif - if (netif_carrier_ok(dev)) - ndev->if_flags |= IF_READY; + if (netif_carrier_ok(dev)) + ndev->if_flags |= IF_READY; - write_lock_bh(&addrconf_lock); - dev->ip6_ptr = ndev; - write_unlock_bh(&addrconf_lock); + write_lock_bh(&addrconf_lock); + dev->ip6_ptr = ndev; + write_unlock_bh(&addrconf_lock); - ipv6_mc_init_dev(ndev); - ndev->tstamp = jiffies; + ipv6_mc_init_dev(ndev); + ndev->tstamp = jiffies; #ifdef CONFIG_SYSCTL - neigh_sysctl_register(dev, ndev->nd_parms, NET_IPV6, - NET_IPV6_NEIGH, "ipv6", - &ndisc_ifinfo_sysctl_change, - NULL); - addrconf_sysctl_register(ndev, &ndev->cnf); + neigh_sysctl_register(dev, ndev->nd_parms, NET_IPV6, + NET_IPV6_NEIGH, "ipv6", + &ndisc_ifinfo_sysctl_change, + NULL); + addrconf_sysctl_register(ndev, &ndev->cnf); #endif - } return ndev; } @@ -524,7 +535,7 @@ ipv6_add_addr(struct inet6_dev *idev, co goto out; } - ifa = kmalloc(sizeof(struct inet6_ifaddr), GFP_ATOMIC); + ifa = kzalloc(sizeof(struct 
inet6_ifaddr), GFP_ATOMIC); if (ifa == NULL) { ADBG(("ipv6_add_addr: malloc failed\n")); @@ -538,7 +549,6 @@ ipv6_add_addr(struct inet6_dev *idev, co goto out; } - memset(ifa, 0, sizeof(struct inet6_ifaddr)); ipv6_addr_copy(&ifa->addr, addr); spin_lock_init(&ifa->lock); @@ -1305,52 +1315,67 @@ static void addrconf_leave_anycast(struc __ipv6_dev_ac_dec(ifp->idev, &addr); } +static int addrconf_ifid_eui48(u8 *eui, struct net_device *dev) +{ + if (dev->addr_len != ETH_ALEN) + return -1; + memcpy(eui, dev->dev_addr, 3); + memcpy(eui + 5, dev->dev_addr + 3, 3); + + /* + * The zSeries OSA network cards can be shared among various + * OS instances, but the OSA cards have only one MAC address. + * This leads to duplicate address conflicts in conjunction + * with IPv6 if more than one instance uses the same card. + * + * The driver for these cards can deliver a unique 16-bit + * identifier for each instance sharing the same card. It is + * placed instead of 0xFFFE in the interface identifier. The + * "u" bit of the interface identifier is not inverted in this + * case. Hence the resulting interface identifier has local + * scope according to RFC2373. 
+ */ + if (dev->dev_id) { + eui[3] = (dev->dev_id >> 8) & 0xFF; + eui[4] = dev->dev_id & 0xFF; + } else { + eui[3] = 0xFF; + eui[4] = 0xFE; + eui[0] ^= 2; + } + return 0; +} + +static int addrconf_ifid_arcnet(u8 *eui, struct net_device *dev) +{ + /* XXX: inherit EUI-64 from other interface -- yoshfuji */ + if (dev->addr_len != ARCNET_ALEN) + return -1; + memset(eui, 0, 7); + eui[7] = *(u8*)dev->dev_addr; + return 0; +} + +static int addrconf_ifid_infiniband(u8 *eui, struct net_device *dev) +{ + if (dev->addr_len != INFINIBAND_ALEN) + return -1; + memcpy(eui, dev->dev_addr + 12, 8); + eui[0] |= 2; + return 0; +} + static int ipv6_generate_eui64(u8 *eui, struct net_device *dev) { switch (dev->type) { case ARPHRD_ETHER: case ARPHRD_FDDI: case ARPHRD_IEEE802_TR: - if (dev->addr_len != ETH_ALEN) - return -1; - memcpy(eui, dev->dev_addr, 3); - memcpy(eui + 5, dev->dev_addr + 3, 3); - - /* - * The zSeries OSA network cards can be shared among various - * OS instances, but the OSA cards have only one MAC address. - * This leads to duplicate address conflicts in conjunction - * with IPv6 if more than one instance uses the same card. - * - * The driver for these cards can deliver a unique 16-bit - * identifier for each instance sharing the same card. It is - * placed instead of 0xFFFE in the interface identifier. The - * "u" bit of the interface identifier is not inverted in this - * case. Hence the resulting interface identifier has local - * scope according to RFC2373. 
- */ - if (dev->dev_id) { - eui[3] = (dev->dev_id >> 8) & 0xFF; - eui[4] = dev->dev_id & 0xFF; - } else { - eui[3] = 0xFF; - eui[4] = 0xFE; - eui[0] ^= 2; - } - return 0; + return addrconf_ifid_eui48(eui, dev); case ARPHRD_ARCNET: - /* XXX: inherit EUI-64 from other interface -- yoshfuji */ - if (dev->addr_len != ARCNET_ALEN) - return -1; - memset(eui, 0, 7); - eui[7] = *(u8*)dev->dev_addr; - return 0; + return addrconf_ifid_arcnet(eui, dev); case ARPHRD_INFINIBAND: - if (dev->addr_len != INFINIBAND_ALEN) - return -1; - memcpy(eui, dev->dev_addr + 12, 8); - eui[0] |= 2; - return 0; + return addrconf_ifid_infiniband(eui, dev); } return -1; } @@ -1376,34 +1401,9 @@ static int ipv6_inherit_eui64(u8 *eui, s /* (re)generation of randomized interface identifier (RFC 3041 3.2, 3.5) */ static int __ipv6_regen_rndid(struct inet6_dev *idev) { - struct net_device *dev; - struct scatterlist sg[2]; - - sg_set_buf(&sg[0], idev->entropy, 8); - sg_set_buf(&sg[1], idev->work_eui64, 8); - - dev = idev->dev; - - if (ipv6_generate_eui64(idev->work_eui64, dev)) { - printk(KERN_INFO - "__ipv6_regen_rndid(idev=%p): cannot get EUI64 identifier; use random bytes.\n", - idev); - get_random_bytes(idev->work_eui64, sizeof(idev->work_eui64)); - } regen: - spin_lock(&md5_tfm_lock); - if (unlikely(md5_tfm == NULL)) { - spin_unlock(&md5_tfm_lock); - return -1; - } - crypto_digest_init(md5_tfm); - crypto_digest_update(md5_tfm, sg, 2); - crypto_digest_final(md5_tfm, idev->work_digest); - spin_unlock(&md5_tfm_lock); - - memcpy(idev->rndid, &idev->work_digest[0], 8); + get_random_bytes(idev->rndid, sizeof(idev->rndid)); idev->rndid[0] &= ~0x02; - memcpy(idev->entropy, &idev->work_digest[8], 8); /* * : @@ -2143,7 +2143,6 @@ static void addrconf_ip6_tnl_config(stru return; } ip6_tnl_add_linklocal(idev); - addrconf_add_mroute(dev); } static int addrconf_notify(struct notifier_block *this, unsigned long event, @@ -2668,11 +2667,10 @@ static int if6_seq_open(struct inode *in { struct seq_file *seq; int rc 
= -ENOMEM; - struct if6_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + struct if6_iter_state *s = kzalloc(sizeof(*s), GFP_KERNEL); if (!s) goto out; - memset(s, 0, sizeof(*s)); rc = seq_open(file, &if6_seq_ops); if (rc) @@ -3133,6 +3131,15 @@ static void inline ipv6_store_devconf(st array[DEVCONF_MAX_DESYNC_FACTOR] = cnf->max_desync_factor; #endif array[DEVCONF_MAX_ADDRESSES] = cnf->max_addresses; + array[DEVCONF_ACCEPT_RA_DEFRTR] = cnf->accept_ra_defrtr; + array[DEVCONF_ACCEPT_RA_PINFO] = cnf->accept_ra_pinfo; +#ifdef CONFIG_IPV6_ROUTER_PREF + array[DEVCONF_ACCEPT_RA_RTR_PREF] = cnf->accept_ra_rtr_pref; + array[DEVCONF_RTR_PROBE_INTERVAL] = cnf->rtr_probe_interval; +#ifdef CONFIG_IPV6_ROUTE_INFO + array[DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN] = cnf->accept_ra_rt_info_max_plen; +#endif +#endif } static int inet6_fill_ifinfo(struct sk_buff *skb, struct inet6_dev *idev, @@ -3586,6 +3593,51 @@ static struct addrconf_sysctl_table .proc_handler = &proc_dointvec, }, { + .ctl_name = NET_IPV6_ACCEPT_RA_DEFRTR, + .procname = "accept_ra_defrtr", + .data = &ipv6_devconf.accept_ra_defrtr, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_ACCEPT_RA_PINFO, + .procname = "accept_ra_pinfo", + .data = &ipv6_devconf.accept_ra_pinfo, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, +#ifdef CONFIG_IPV6_ROUTER_PREF + { + .ctl_name = NET_IPV6_ACCEPT_RA_RTR_PREF, + .procname = "accept_ra_rtr_pref", + .data = &ipv6_devconf.accept_ra_rtr_pref, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_RTR_PROBE_INTERVAL, + .procname = "router_probe_interval", + .data = &ipv6_devconf.rtr_probe_interval, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_jiffies, + .strategy = &sysctl_jiffies, + }, +#ifdef CONFIG_IPV6_ROUTE_INFO + { + .ctl_name = NET_IPV6_ACCEPT_RA_RT_INFO_MAX_PLEN, + .procname = "accept_ra_rt_info_max_plen", + .data =
&ipv6_devconf.accept_ra_rt_info_max_plen, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, +#endif +#endif + { .ctl_name = 0, /* sentinel */ } }, @@ -3760,13 +3812,6 @@ int __init addrconf_init(void) register_netdevice_notifier(&ipv6_dev_notf); -#ifdef CONFIG_IPV6_PRIVACY - md5_tfm = crypto_alloc_tfm("md5", 0); - if (unlikely(md5_tfm == NULL)) - printk(KERN_WARNING - "failed to load transform for md5\n"); -#endif - addrconf_verify(0); rtnetlink_links[PF_INET6] = inet6_rtnetlink_table; #ifdef CONFIG_SYSCTL @@ -3829,11 +3874,6 @@ void __exit addrconf_cleanup(void) rtnl_unlock(); -#ifdef CONFIG_IPV6_PRIVACY - crypto_free_tfm(md5_tfm); - md5_tfm = NULL; -#endif - #ifdef CONFIG_PROC_FS proc_net_remove("if_inet6"); #endif diff -puN net/ipv6/af_inet6.c~git-net net/ipv6/af_inet6.c --- devel/net/ipv6/af_inet6.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/af_inet6.c 2006-03-17 23:03:48.000000000 -0800 @@ -456,45 +456,53 @@ int inet6_ioctl(struct socket *sock, uns } const struct proto_ops inet6_stream_ops = { - .family = PF_INET6, - .owner = THIS_MODULE, - .release = inet6_release, - .bind = inet6_bind, - .connect = inet_stream_connect, /* ok */ - .socketpair = sock_no_socketpair, /* a do nothing */ - .accept = inet_accept, /* ok */ - .getname = inet6_getname, - .poll = tcp_poll, /* ok */ - .ioctl = inet6_ioctl, /* must change */ - .listen = inet_listen, /* ok */ - .shutdown = inet_shutdown, /* ok */ - .setsockopt = sock_common_setsockopt, /* ok */ - .getsockopt = sock_common_getsockopt, /* ok */ - .sendmsg = inet_sendmsg, /* ok */ - .recvmsg = sock_common_recvmsg, /* ok */ - .mmap = sock_no_mmap, - .sendpage = tcp_sendpage + .family = PF_INET6, + .owner = THIS_MODULE, + .release = inet6_release, + .bind = inet6_bind, + .connect = inet_stream_connect, /* ok */ + .socketpair = sock_no_socketpair, /* a do nothing */ + .accept = inet_accept, /* ok */ + .getname = inet6_getname, + .poll = tcp_poll, /* ok */ + .ioctl = 
inet6_ioctl, /* must change */ + .listen = inet_listen, /* ok */ + .shutdown = inet_shutdown, /* ok */ + .setsockopt = sock_common_setsockopt, /* ok */ + .getsockopt = sock_common_getsockopt, /* ok */ + .sendmsg = inet_sendmsg, /* ok */ + .recvmsg = sock_common_recvmsg, /* ok */ + .mmap = sock_no_mmap, + .sendpage = tcp_sendpage, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif }; const struct proto_ops inet6_dgram_ops = { - .family = PF_INET6, - .owner = THIS_MODULE, - .release = inet6_release, - .bind = inet6_bind, - .connect = inet_dgram_connect, /* ok */ - .socketpair = sock_no_socketpair, /* a do nothing */ - .accept = sock_no_accept, /* a do nothing */ - .getname = inet6_getname, - .poll = udp_poll, /* ok */ - .ioctl = inet6_ioctl, /* must change */ - .listen = sock_no_listen, /* ok */ - .shutdown = inet_shutdown, /* ok */ - .setsockopt = sock_common_setsockopt, /* ok */ - .getsockopt = sock_common_getsockopt, /* ok */ - .sendmsg = inet_sendmsg, /* ok */ - .recvmsg = sock_common_recvmsg, /* ok */ - .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, + .family = PF_INET6, + .owner = THIS_MODULE, + .release = inet6_release, + .bind = inet6_bind, + .connect = inet_dgram_connect, /* ok */ + .socketpair = sock_no_socketpair, /* a do nothing */ + .accept = sock_no_accept, /* a do nothing */ + .getname = inet6_getname, + .poll = udp_poll, /* ok */ + .ioctl = inet6_ioctl, /* must change */ + .listen = sock_no_listen, /* ok */ + .shutdown = inet_shutdown, /* ok */ + .setsockopt = sock_common_setsockopt, /* ok */ + .getsockopt = sock_common_getsockopt, /* ok */ + .sendmsg = inet_sendmsg, /* ok */ + .recvmsg = sock_common_recvmsg, /* ok */ + .mmap = sock_no_mmap, + .sendpage = sock_no_sendpage, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif }; static struct net_proto_family inet6_family_ops 
= { @@ -505,24 +513,28 @@ static struct net_proto_family inet6_fam /* Same as inet6_dgram_ops, sans udp_poll. */ static const struct proto_ops inet6_sockraw_ops = { - .family = PF_INET6, - .owner = THIS_MODULE, - .release = inet6_release, - .bind = inet6_bind, - .connect = inet_dgram_connect, /* ok */ - .socketpair = sock_no_socketpair, /* a do nothing */ - .accept = sock_no_accept, /* a do nothing */ - .getname = inet6_getname, - .poll = datagram_poll, /* ok */ - .ioctl = inet6_ioctl, /* must change */ - .listen = sock_no_listen, /* ok */ - .shutdown = inet_shutdown, /* ok */ - .setsockopt = sock_common_setsockopt, /* ok */ - .getsockopt = sock_common_getsockopt, /* ok */ - .sendmsg = inet_sendmsg, /* ok */ - .recvmsg = sock_common_recvmsg, /* ok */ - .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, + .family = PF_INET6, + .owner = THIS_MODULE, + .release = inet6_release, + .bind = inet6_bind, + .connect = inet_dgram_connect, /* ok */ + .socketpair = sock_no_socketpair, /* a do nothing */ + .accept = sock_no_accept, /* a do nothing */ + .getname = inet6_getname, + .poll = datagram_poll, /* ok */ + .ioctl = inet6_ioctl, /* must change */ + .listen = sock_no_listen, /* ok */ + .shutdown = inet_shutdown, /* ok */ + .setsockopt = sock_common_setsockopt, /* ok */ + .getsockopt = sock_common_getsockopt, /* ok */ + .sendmsg = inet_sendmsg, /* ok */ + .recvmsg = sock_common_recvmsg, /* ok */ + .mmap = sock_no_mmap, + .sendpage = sock_no_sendpage, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif }; static struct inet_protosw rawv6_protosw = { diff -puN net/ipv6/ah6.c~git-net net/ipv6/ah6.c --- devel/net/ipv6/ah6.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/ah6.c 2006-03-17 23:03:48.000000000 -0800 @@ -213,6 +213,7 @@ static int ah6_output(struct xfrm_state ah->reserved = 0; ah->spi = x->id.spi; ah->seq_no = htonl(++x->replay.oseq); + 
xfrm_aevent_doreplay(x); ahp->icv(ahp, skb, ah->auth_data); err = 0; @@ -353,12 +354,10 @@ static int ah6_init_state(struct xfrm_st if (x->encap) goto error; - ahp = kmalloc(sizeof(*ahp), GFP_KERNEL); + ahp = kzalloc(sizeof(*ahp), GFP_KERNEL); if (ahp == NULL) return -ENOMEM; - memset(ahp, 0, sizeof(*ahp)); - ahp->key = x->aalg->alg_key; ahp->key_len = (x->aalg->alg_key_len+7)/8; ahp->tfm = crypto_alloc_tfm(x->aalg->alg_name, 0); diff -puN net/ipv6/anycast.c~git-net net/ipv6/anycast.c --- devel/net/ipv6/anycast.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/anycast.c 2006-03-17 23:03:48.000000000 -0800 @@ -308,7 +308,7 @@ int ipv6_dev_ac_inc(struct net_device *d * not found: create a new one. */ - aca = kmalloc(sizeof(struct ifacaddr6), GFP_ATOMIC); + aca = kzalloc(sizeof(struct ifacaddr6), GFP_ATOMIC); if (aca == NULL) { err = -ENOMEM; @@ -322,8 +322,6 @@ int ipv6_dev_ac_inc(struct net_device *d goto out; } - memset(aca, 0, sizeof(struct ifacaddr6)); - ipv6_addr_copy(&aca->aca_addr, addr); aca->aca_idev = idev; aca->aca_rt = rt; @@ -550,7 +548,7 @@ static int ac6_seq_open(struct inode *in { struct seq_file *seq; int rc = -ENOMEM; - struct ac6_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + struct ac6_iter_state *s = kzalloc(sizeof(*s), GFP_KERNEL); if (!s) goto out; @@ -561,7 +559,6 @@ static int ac6_seq_open(struct inode *in seq = file->private_data; seq->private = s; - memset(s, 0, sizeof(*s)); out: return rc; out_kfree: diff -puN net/ipv6/esp6.c~git-net net/ipv6/esp6.c --- devel/net/ipv6/esp6.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/esp6.c 2006-03-17 23:03:48.000000000 -0800 @@ -94,6 +94,7 @@ static int esp6_output(struct xfrm_state esph->spi = x->id.spi; esph->seq_no = htonl(++x->replay.oseq); + xfrm_aevent_doreplay(x); if (esp->conf.ivlen) crypto_cipher_set_iv(tfm, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); @@ -304,12 +305,10 @@ static int esp6_init_state(struct xfrm_s if (x->encap) goto error; - esp = 
kmalloc(sizeof(*esp), GFP_KERNEL); + esp = kzalloc(sizeof(*esp), GFP_KERNEL); if (esp == NULL) return -ENOMEM; - memset(esp, 0, sizeof(*esp)); - if (x->aalg) { struct xfrm_algo_desc *aalg_desc; diff -puN net/ipv6/ip6_fib.c~git-net net/ipv6/ip6_fib.c --- devel/net/ipv6/ip6_fib.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/ip6_fib.c 2006-03-17 23:03:48.000000000 -0800 @@ -1105,7 +1105,6 @@ static int fib6_age(struct rt6_info *rt, if (rt->rt6i_flags&RTF_EXPIRES && rt->rt6i_expires) { if (time_after(now, rt->rt6i_expires)) { RT6_TRACE("expiring %p\n", rt); - rt6_reset_dflt_pointer(rt); return -1; } gc_args.more++; diff -puN net/ipv6/ip6_flowlabel.c~git-net net/ipv6/ip6_flowlabel.c --- devel/net/ipv6/ip6_flowlabel.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/ip6_flowlabel.c 2006-03-17 23:03:48.000000000 -0800 @@ -287,10 +287,9 @@ fl_create(struct in6_flowlabel_req *freq int err; err = -ENOMEM; - fl = kmalloc(sizeof(*fl), GFP_KERNEL); + fl = kzalloc(sizeof(*fl), GFP_KERNEL); if (fl == NULL) goto done; - memset(fl, 0, sizeof(*fl)); olen = optlen - CMSG_ALIGN(sizeof(*freq)); if (olen > 0) { @@ -663,7 +662,7 @@ static int ip6fl_seq_open(struct inode * { struct seq_file *seq; int rc = -ENOMEM; - struct ip6fl_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + struct ip6fl_iter_state *s = kzalloc(sizeof(*s), GFP_KERNEL); if (!s) goto out; @@ -674,7 +673,6 @@ static int ip6fl_seq_open(struct inode * seq = file->private_data; seq->private = s; - memset(s, 0, sizeof(*s)); out: return rc; out_kfree: diff -puN net/ipv6/ip6_output.c~git-net net/ipv6/ip6_output.c --- devel/net/ipv6/ip6_output.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/ip6_output.c 2006-03-17 23:03:48.000000000 -0800 @@ -733,28 +733,29 @@ int ip6_dst_lookup(struct sock *sk, stru if (*dst) { struct rt6_info *rt = (struct rt6_info*)*dst; - /* Yes, checking route validity in not connected - case is not very simple. 
Take into account, - that we do not support routing by source, TOS, - and MSG_DONTROUTE --ANK (980726) - - 1. If route was host route, check that - cached destination is current. - If it is network route, we still may - check its validity using saved pointer - to the last used address: daddr_cache. - We do not want to save whole address now, - (because main consumer of this service - is tcp, which has not this problem), - so that the last trick works only on connected - sockets. - 2. oif also should be the same. - */ - + /* Yes, checking route validity in not connected + * case is not very simple. Take into account, + * that we do not support routing by source, TOS, + * and MSG_DONTROUTE --ANK (980726) + * + * 1. If route was host route, check that + * cached destination is current. + * If it is network route, we still may + * check its validity using saved pointer + * to the last used address: daddr_cache. + * We do not want to save whole address now, + * (because main consumer of this service + * is tcp, which has not this problem), + * so that the last trick works only on connected + * sockets. + * 2. oif also should be the same. 
+ */ if (((rt->rt6i_dst.plen != 128 || - !ipv6_addr_equal(&fl->fl6_dst, &rt->rt6i_dst.addr)) + !ipv6_addr_equal(&fl->fl6_dst, + &rt->rt6i_dst.addr)) && (np->daddr_cache == NULL || - !ipv6_addr_equal(&fl->fl6_dst, np->daddr_cache))) + !ipv6_addr_equal(&fl->fl6_dst, + np->daddr_cache))) || (fl->oif && fl->oif != (*dst)->dev->ifindex)) { dst_release(*dst); *dst = NULL; @@ -889,7 +890,7 @@ int ip6_append_data(struct sock *sk, int np->cork.hop_limit = hlimit; np->cork.tclass = tclass; mtu = dst_mtu(rt->u.dst.path); - if (np && np->frag_size < mtu) { + if (np->frag_size < mtu) { if (np->frag_size) mtu = np->frag_size; } diff -puN net/ipv6/ipcomp6.c~git-net net/ipv6/ipcomp6.c --- devel/net/ipv6/ipcomp6.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/ipcomp6.c 2006-03-17 23:03:48.000000000 -0800 @@ -50,6 +50,7 @@ #include #include #include +#include struct ipcomp6_tfms { struct list_head list; @@ -57,7 +58,7 @@ struct ipcomp6_tfms { int users; }; -static DECLARE_MUTEX(ipcomp6_resource_sem); +static DEFINE_MUTEX(ipcomp6_resource_mutex); static void **ipcomp6_scratches; static int ipcomp6_scratch_users; static LIST_HEAD(ipcomp6_tfms_list); @@ -286,8 +287,8 @@ static void ipcomp6_free_scratches(void) for_each_cpu(i) { void *scratch = *per_cpu_ptr(scratches, i); - if (scratch) - vfree(scratch); + + vfree(scratch); } free_percpu(scratches); @@ -405,9 +406,9 @@ static void ipcomp6_destroy(struct xfrm_ if (!ipcd) return; xfrm_state_delete_tunnel(x); - down(&ipcomp6_resource_sem); + mutex_lock(&ipcomp6_resource_mutex); ipcomp6_free_data(ipcd); - up(&ipcomp6_resource_sem); + mutex_unlock(&ipcomp6_resource_mutex); kfree(ipcd); xfrm6_tunnel_free_spi((xfrm_address_t *)&x->props.saddr); @@ -427,23 +428,22 @@ static int ipcomp6_init_state(struct xfr goto out; err = -ENOMEM; - ipcd = kmalloc(sizeof(*ipcd), GFP_KERNEL); + ipcd = kzalloc(sizeof(*ipcd), GFP_KERNEL); if (!ipcd) goto out; - memset(ipcd, 0, sizeof(*ipcd)); x->props.header_len = 0; if (x->props.mode) 
x->props.header_len += sizeof(struct ipv6hdr); - down(&ipcomp6_resource_sem); + mutex_lock(&ipcomp6_resource_mutex); if (!ipcomp6_alloc_scratches()) goto error; ipcd->tfms = ipcomp6_alloc_tfms(x->calg->alg_name); if (!ipcd->tfms) goto error; - up(&ipcomp6_resource_sem); + mutex_unlock(&ipcomp6_resource_mutex); if (x->props.mode) { err = ipcomp6_tunnel_attach(x); @@ -459,10 +459,10 @@ static int ipcomp6_init_state(struct xfr out: return err; error_tunnel: - down(&ipcomp6_resource_sem); + mutex_lock(&ipcomp6_resource_mutex); error: ipcomp6_free_data(ipcd); - up(&ipcomp6_resource_sem); + mutex_unlock(&ipcomp6_resource_mutex); kfree(ipcd); goto out; diff -puN net/ipv6/ipv6_sockglue.c~git-net net/ipv6/ipv6_sockglue.c --- devel/net/ipv6/ipv6_sockglue.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/ipv6_sockglue.c 2006-03-17 23:03:48.000000000 -0800 @@ -109,19 +109,13 @@ int ip6_ra_control(struct sock *sk, int return 0; } -int ipv6_setsockopt(struct sock *sk, int level, int optname, +static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { struct ipv6_pinfo *np = inet6_sk(sk); int val, valbool; int retv = -ENOPROTOOPT; - if (level == SOL_IP && sk->sk_type != SOCK_RAW) - return udp_prot.setsockopt(sk, level, optname, optval, optlen); - - if(level!=SOL_IPV6) - goto out; - if (optval == NULL) val=0; else if (get_user(val, (int __user *) optval)) @@ -613,17 +607,9 @@ done: retv = xfrm_user_policy(sk, optname, optval, optlen); break; -#ifdef CONFIG_NETFILTER - default: - retv = nf_setsockopt(sk, PF_INET6, optname, optval, - optlen); - break; -#endif - } release_sock(sk); -out: return retv; e_inval: @@ -631,6 +617,65 @@ e_inval: return -EINVAL; } +int ipv6_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + int err; + + if (level == SOL_IP && sk->sk_type != SOCK_RAW) + return udp_prot.setsockopt(sk, level, optname, optval, optlen); + + if (level != SOL_IPV6) + return 
-ENOPROTOOPT; + + err = do_ipv6_setsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IPV6_IPSEC_POLICY && + optname != IPV6_XFRM_POLICY) { + lock_sock(sk); + err = nf_setsockopt(sk, PF_INET6, optname, optval, + optlen); + release_sock(sk); + } +#endif + return err; +} + + +#ifdef CONFIG_COMPAT +int compat_ipv6_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + int err; + + if (level == SOL_IP && sk->sk_type != SOCK_RAW) { + if (udp_prot.compat_setsockopt != NULL) + return udp_prot.compat_setsockopt(sk, level, optname, + optval, optlen); + return udp_prot.setsockopt(sk, level, optname, optval, optlen); + } + + if (level != SOL_IPV6) + return -ENOPROTOOPT; + + err = do_ipv6_setsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IPV6_IPSEC_POLICY && + optname != IPV6_XFRM_POLICY) { + lock_sock(sk); + err = compat_nf_setsockopt(sk, PF_INET6, optname, + optval, optlen); + release_sock(sk); + } +#endif + return err; +} + +EXPORT_SYMBOL(compat_ipv6_setsockopt); +#endif + static int ipv6_getsockopt_sticky(struct sock *sk, struct ipv6_opt_hdr *hdr, char __user *optval, int len) { @@ -642,17 +687,13 @@ static int ipv6_getsockopt_sticky(struct return len; } -int ipv6_getsockopt(struct sock *sk, int level, int optname, +static int do_ipv6_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) { struct ipv6_pinfo *np = inet6_sk(sk); int len; int val; - if (level == SOL_IP && sk->sk_type != SOCK_RAW) - return udp_prot.getsockopt(sk, level, optname, optval, optlen); - if(level!=SOL_IPV6) - return -ENOPROTOOPT; if (get_user(len, optlen)) return -EFAULT; switch (optname) { @@ -842,17 +883,7 @@ int ipv6_getsockopt(struct sock *sk, int break; 
default: -#ifdef CONFIG_NETFILTER - lock_sock(sk); - val = nf_getsockopt(sk, PF_INET6, optname, optval, - &len); - release_sock(sk); - if (val >= 0) - val = put_user(len, optlen); - return val; -#else return -EINVAL; -#endif } len = min_t(unsigned int, sizeof(int), len); if(put_user(len, optlen)) @@ -862,6 +893,78 @@ int ipv6_getsockopt(struct sock *sk, int return 0; } +int ipv6_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + int err; + + if (level == SOL_IP && sk->sk_type != SOCK_RAW) + return udp_prot.getsockopt(sk, level, optname, optval, optlen); + + if(level != SOL_IPV6) + return -ENOPROTOOPT; + + err = do_ipv6_getsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible EINVALs except default case */ + if (err == -ENOPROTOOPT && optname != IPV6_ADDRFORM && + optname != MCAST_MSFILTER) { + int len; + + if (get_user(len, optlen)) + return -EFAULT; + + lock_sock(sk); + err = nf_getsockopt(sk, PF_INET6, optname, optval, + &len); + release_sock(sk); + if (err >= 0) + err = put_user(len, optlen); + } +#endif + return err; +} + +#ifdef CONFIG_COMPAT +int compat_ipv6_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + int err; + + if (level == SOL_IP && sk->sk_type != SOCK_RAW) { + if (udp_prot.compat_getsockopt != NULL) + return udp_prot.compat_getsockopt(sk, level, optname, + optval, optlen); + return udp_prot.getsockopt(sk, level, optname, optval, optlen); + } + + if (level != SOL_IPV6) + return -ENOPROTOOPT; + + err = do_ipv6_getsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible EINVALs except default case */ + if (err == -ENOPROTOOPT && optname != IPV6_ADDRFORM && + optname != MCAST_MSFILTER) { + int len; + + if (get_user(len, optlen)) + return -EFAULT; + + lock_sock(sk); + err = compat_nf_getsockopt(sk, PF_INET6, + optname, optval, &len); + release_sock(sk); + 
if (err >= 0) + err = put_user(len, optlen); + } +#endif + return err; +} + +EXPORT_SYMBOL(compat_ipv6_getsockopt); +#endif + void __init ipv6_packet_init(void) { dev_add_pack(&ipv6_packet_type); diff -puN net/ipv6/Kconfig~git-net net/ipv6/Kconfig --- devel/net/ipv6/Kconfig~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/Kconfig 2006-03-17 23:03:48.000000000 -0800 @@ -6,8 +6,6 @@ config IPV6 tristate "The IPv6 protocol" default m - select CRYPTO if IPV6_PRIVACY - select CRYPTO_MD5 if IPV6_PRIVACY ---help--- This is complemental support for the IP version 6. You will still be able to do traditional IPv4 networking as well. @@ -22,7 +20,7 @@ config IPV6 module will be called ipv6. config IPV6_PRIVACY - bool "IPv6: Privacy Extensions (RFC 3041) support" + bool "IPv6: Privacy Extensions support" depends on IPV6 ---help--- Privacy Extensions for Stateless Address Autoconfiguration in IPv6 @@ -30,6 +28,9 @@ config IPV6_PRIVACY pseudo-random global-scope unicast address(es) will assigned to your interface(s). + We use our standard pseudo random algorithm to generate randomized + interface identifier, instead of one described in RFC 3041. + By default, kernel do not generate temporary addresses. To use temporary addresses, do @@ -37,6 +38,25 @@ config IPV6_PRIVACY See for details. +config IPV6_ROUTER_PREF + bool "IPv6: Router Preference (RFC 4191) support" + depends on IPV6 + ---help--- + Router Preference is an optional extension to the Router + Advertisement message to improve the ability of hosts + to pick more appropriate router, especially when the hosts + is placed in a multi-homed network. + + If unsure, say N. + +config IPV6_ROUTE_INFO + bool "IPv6: Route Information (RFC 4191) support (EXPERIMENTAL)" + depends on IPV6_ROUTER_PREF && EXPERIMENTAL + ---help--- + This is experimental support of Route Information. + + If unsure, say N. 
+ config INET6_AH tristate "IPv6: AH transformation" depends on IPV6 diff -puN net/ipv6/mcast.c~git-net net/ipv6/mcast.c --- devel/net/ipv6/mcast.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/mcast.c 2006-03-17 23:03:48.000000000 -0800 @@ -767,10 +767,10 @@ static void mld_add_delrec(struct inet6_ * for deleted items allows change reports to use common code with * non-deleted or query-response MCA's. */ - pmc = kmalloc(sizeof(*pmc), GFP_ATOMIC); + pmc = kzalloc(sizeof(*pmc), GFP_ATOMIC); if (!pmc) return; - memset(pmc, 0, sizeof(*pmc)); + spin_lock_bh(&im->mca_lock); spin_lock_init(&pmc->mca_lock); pmc->idev = im->idev; @@ -893,7 +893,7 @@ int ipv6_dev_mc_inc(struct net_device *d * not found: create a new one. */ - mc = kmalloc(sizeof(struct ifmcaddr6), GFP_ATOMIC); + mc = kzalloc(sizeof(struct ifmcaddr6), GFP_ATOMIC); if (mc == NULL) { write_unlock_bh(&idev->lock); @@ -901,7 +901,6 @@ int ipv6_dev_mc_inc(struct net_device *d return -ENOMEM; } - memset(mc, 0, sizeof(struct ifmcaddr6)); init_timer(&mc->mca_timer); mc->mca_timer.function = igmp6_timer_handler; mc->mca_timer.data = (unsigned long) mc; @@ -1934,10 +1933,10 @@ static int ip6_mc_add1_src(struct ifmcad psf_prev = psf; } if (!psf) { - psf = kmalloc(sizeof(*psf), GFP_ATOMIC); + psf = kzalloc(sizeof(*psf), GFP_ATOMIC); if (!psf) return -ENOBUFS; - memset(psf, 0, sizeof(*psf)); + psf->sf_addr = *psfsrc; if (psf_prev) { psf_prev->sf_next = psf; @@ -2431,7 +2430,7 @@ static int igmp6_mc_seq_open(struct inod { struct seq_file *seq; int rc = -ENOMEM; - struct igmp6_mc_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + struct igmp6_mc_iter_state *s = kzalloc(sizeof(*s), GFP_KERNEL); if (!s) goto out; @@ -2442,7 +2441,6 @@ static int igmp6_mc_seq_open(struct inod seq = file->private_data; seq->private = s; - memset(s, 0, sizeof(*s)); out: return rc; out_kfree: @@ -2606,7 +2604,7 @@ static int igmp6_mcf_seq_open(struct ino { struct seq_file *seq; int rc = -ENOMEM; - struct 
igmp6_mcf_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + struct igmp6_mcf_iter_state *s = kzalloc(sizeof(*s), GFP_KERNEL); if (!s) goto out; @@ -2617,7 +2615,6 @@ static int igmp6_mcf_seq_open(struct ino seq = file->private_data; seq->private = s; - memset(s, 0, sizeof(*s)); out: return rc; out_kfree: diff -puN net/ipv6/ndisc.c~git-net net/ipv6/ndisc.c --- devel/net/ipv6/ndisc.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/ndisc.c 2006-03-17 23:03:48.000000000 -0800 @@ -156,7 +156,11 @@ struct neigh_table nd_tbl = { /* ND options */ struct ndisc_options { - struct nd_opt_hdr *nd_opt_array[__ND_OPT_MAX]; + struct nd_opt_hdr *nd_opt_array[__ND_OPT_ARRAY_MAX]; +#ifdef CONFIG_IPV6_ROUTE_INFO + struct nd_opt_hdr *nd_opts_ri; + struct nd_opt_hdr *nd_opts_ri_end; +#endif }; #define nd_opts_src_lladdr nd_opt_array[ND_OPT_SOURCE_LL_ADDR] @@ -255,6 +259,13 @@ static struct ndisc_options *ndisc_parse if (ndopts->nd_opt_array[nd_opt->nd_opt_type] == 0) ndopts->nd_opt_array[nd_opt->nd_opt_type] = nd_opt; break; +#ifdef CONFIG_IPV6_ROUTE_INFO + case ND_OPT_ROUTE_INFO: + ndopts->nd_opts_ri_end = nd_opt; + if (!ndopts->nd_opts_ri) + ndopts->nd_opts_ri = nd_opt; + break; +#endif default: /* * Unknown options must be silently ignored, @@ -1019,10 +1030,11 @@ static void ndisc_router_discovery(struc struct ra_msg *ra_msg = (struct ra_msg *) skb->h.raw; struct neighbour *neigh = NULL; struct inet6_dev *in6_dev; - struct rt6_info *rt; + struct rt6_info *rt = NULL; int lifetime; struct ndisc_options ndopts; int optlen; + unsigned int pref = 0; __u8 * opt = (__u8 *)(ra_msg + 1); @@ -1081,8 +1093,19 @@ static void ndisc_router_discovery(struc (ra_msg->icmph.icmp6_addrconf_other ? 
IF_RA_OTHERCONF : 0); + if (!in6_dev->cnf.accept_ra_defrtr) + goto skip_defrtr; + lifetime = ntohs(ra_msg->icmph.icmp6_rt_lifetime); +#ifdef CONFIG_IPV6_ROUTER_PREF + pref = ra_msg->icmph.icmp6_router_pref; + /* 10b is handled as if it were 00b (medium) */ + if (pref == ICMPV6_ROUTER_PREF_INVALID || + in6_dev->cnf.accept_ra_rtr_pref) + pref = ICMPV6_ROUTER_PREF_MEDIUM; +#endif + rt = rt6_get_dflt_router(&skb->nh.ipv6h->saddr, skb->dev); if (rt) @@ -1098,7 +1121,7 @@ static void ndisc_router_discovery(struc ND_PRINTK3(KERN_DEBUG "ICMPv6 RA: adding default router.\n"); - rt = rt6_add_dflt_router(&skb->nh.ipv6h->saddr, skb->dev); + rt = rt6_add_dflt_router(&skb->nh.ipv6h->saddr, skb->dev, pref); if (rt == NULL) { ND_PRINTK0(KERN_ERR "ICMPv6 RA: %s() failed to add default route.\n", @@ -1117,6 +1140,8 @@ static void ndisc_router_discovery(struc return; } neigh->flags |= NTF_ROUTER; + } else if (rt) { + rt->rt6i_flags |= (rt->rt6i_flags & ~RTF_PREF_MASK) | RTF_PREF(pref); } if (rt) @@ -1128,6 +1153,8 @@ static void ndisc_router_discovery(struc rt->u.dst.metrics[RTAX_HOPLIMIT-1] = ra_msg->icmph.icmp6_hop_limit; } +skip_defrtr: + /* * Update Reachable Time and Retrans Timer */ @@ -1186,7 +1213,21 @@ static void ndisc_router_discovery(struc NEIGH_UPDATE_F_ISROUTER); } - if (ndopts.nd_opts_pi) { +#ifdef CONFIG_IPV6_ROUTE_INFO + if (in6_dev->cnf.accept_ra_rtr_pref && ndopts.nd_opts_ri) { + struct nd_opt_hdr *p; + for (p = ndopts.nd_opts_ri; + p; + p = ndisc_next_option(p, ndopts.nd_opts_ri_end)) { + if (((struct route_info *)p)->prefix_len > in6_dev->cnf.accept_ra_rt_info_max_plen) + continue; + rt6_route_rcv(skb->dev, (u8*)p, (p->nd_opt_len) << 3, + &skb->nh.ipv6h->saddr); + } + } +#endif + + if (in6_dev->cnf.accept_ra_pinfo && ndopts.nd_opts_pi) { struct nd_opt_hdr *p; for (p = ndopts.nd_opts_pi; p; diff -puN net/ipv6/netfilter/ip6_queue.c~git-net net/ipv6/netfilter/ip6_queue.c --- devel/net/ipv6/netfilter/ip6_queue.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ 
devel-akpm/net/ipv6/netfilter/ip6_queue.c 2006-03-17 23:03:48.000000000 -0800 @@ -35,6 +35,7 @@ #include #include #include +#include #include #include #include @@ -65,7 +66,7 @@ static unsigned int queue_dropped = 0; static unsigned int queue_user_dropped = 0; static struct sock *ipqnl; static LIST_HEAD(queue_list); -static DECLARE_MUTEX(ipqnl_sem); +static DEFINE_MUTEX(ipqnl_mutex); static void ipq_issue_verdict(struct ipq_queue_entry *entry, int verdict) @@ -537,7 +538,7 @@ ipq_rcv_sk(struct sock *sk, int len) struct sk_buff *skb; unsigned int qlen; - down(&ipqnl_sem); + mutex_lock(&ipqnl_mutex); for (qlen = skb_queue_len(&sk->sk_receive_queue); qlen; qlen--) { skb = skb_dequeue(&sk->sk_receive_queue); @@ -545,7 +546,7 @@ ipq_rcv_sk(struct sock *sk, int len) kfree_skb(skb); } - up(&ipqnl_sem); + mutex_unlock(&ipqnl_mutex); } static int @@ -704,8 +705,8 @@ cleanup_sysctl: cleanup_ipqnl: sock_release(ipqnl->sk_socket); - down(&ipqnl_sem); - up(&ipqnl_sem); + mutex_lock(&ipqnl_mutex); + mutex_unlock(&ipqnl_mutex); cleanup_netlink_notifier: netlink_unregister_notifier(&ipq_nl_notifier); diff -puN net/ipv6/netfilter/ip6_tables.c~git-net net/ipv6/netfilter/ip6_tables.c --- devel/net/ipv6/netfilter/ip6_tables.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6_tables.c 2006-03-17 23:03:48.000000000 -0800 @@ -29,7 +29,7 @@ #include #include #include -#include +#include #include #include @@ -94,19 +94,6 @@ do { \ #define up(x) do { printk("UP:%u:" #x "\n", __LINE__); up(x); } while(0) #endif -int -ip6_masked_addrcmp(const struct in6_addr *addr1, const struct in6_addr *mask, - const struct in6_addr *addr2) -{ - int i; - for( i = 0; i < 16; i++){ - if((addr1->s6_addr[i] & mask->s6_addr[i]) != - (addr2->s6_addr[i] & mask->s6_addr[i])) - return 1; - } - return 0; -} - /* Check for an extension */ int ip6t_ext_hdr(u8 nexthdr) @@ -135,10 +122,10 @@ ip6_packet_match(const struct sk_buff *s #define FWINV(bool,invflg) ((bool) ^ !!(ip6info->invflags 
& invflg)) - if (FWINV(ip6_masked_addrcmp(&ipv6->saddr, &ip6info->smsk, - &ip6info->src), IP6T_INV_SRCIP) - || FWINV(ip6_masked_addrcmp(&ipv6->daddr, &ip6info->dmsk, - &ip6info->dst), IP6T_INV_DSTIP)) { + if (FWINV(ipv6_masked_addr_cmp(&ipv6->saddr, &ip6info->smsk, + &ip6info->src), IP6T_INV_SRCIP) + || FWINV(ipv6_masked_addr_cmp(&ipv6->daddr, &ip6info->dmsk, + &ip6info->dst), IP6T_INV_DSTIP)) { dprintf("Source or dest mismatch.\n"); /* dprintf("SRC: %u. Mask: %u. Target: %u.%s\n", ip->saddr, @@ -232,6 +219,7 @@ ip6t_error(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -251,7 +239,7 @@ int do_match(struct ip6t_entry_match *m, int *hotdrop) { /* Stop iteration if it doesn't match */ - if (!m->u.kernel.match->match(skb, in, out, m->data, + if (!m->u.kernel.match->match(skb, in, out, m->u.kernel.match, m->data, offset, protoff, hotdrop)) return 1; else @@ -373,6 +361,7 @@ ip6t_do_table(struct sk_buff **pskb, verdict = t->u.kernel.target->target(pskb, in, out, hook, + t->u.kernel.target, t->data, userdata); @@ -531,7 +520,7 @@ cleanup_match(struct ip6t_entry_match *m return 1; if (m->u.kernel.match->destroy) - m->u.kernel.match->destroy(m->data, + m->u.kernel.match->destroy(m->u.kernel.match, m->data, m->u.match_size - sizeof(*m)); module_put(m->u.kernel.match->me); return 0; @@ -544,21 +533,12 @@ standard_check(const struct ip6t_entry_t struct ip6t_standard_target *targ = (void *)t; /* Check standard info. 
*/ - if (t->u.target_size - != IP6T_ALIGN(sizeof(struct ip6t_standard_target))) { - duprintf("standard_check: target size %u != %u\n", - t->u.target_size, - IP6T_ALIGN(sizeof(struct ip6t_standard_target))); - return 0; - } - if (targ->verdict >= 0 && targ->verdict > max_offset - sizeof(struct ip6t_entry)) { duprintf("ip6t_standard_check: bad verdict (%i)\n", targ->verdict); return 0; } - if (targ->verdict < -NF_MAX_VERDICT - 1) { duprintf("ip6t_standard_check: bad negative verdict (%i)\n", targ->verdict); @@ -575,6 +555,7 @@ check_match(struct ip6t_entry_match *m, unsigned int *i) { struct ip6t_match *match; + int ret; match = try_then_request_module(xt_find_match(AF_INET6, m->u.user.name, m->u.user.revision), @@ -585,18 +566,27 @@ check_match(struct ip6t_entry_match *m, } m->u.kernel.match = match; + ret = xt_check_match(match, AF_INET6, m->u.match_size - sizeof(*m), + name, hookmask, ipv6->proto, + ipv6->invflags & IP6T_INV_PROTO); + if (ret) + goto err; + if (m->u.kernel.match->checkentry - && !m->u.kernel.match->checkentry(name, ipv6, m->data, + && !m->u.kernel.match->checkentry(name, ipv6, match, m->data, m->u.match_size - sizeof(*m), hookmask)) { - module_put(m->u.kernel.match->me); duprintf("ip_tables: check failed for `%s'.\n", m->u.kernel.match->name); - return -EINVAL; + ret = -EINVAL; + goto err; } (*i)++; return 0; +err: + module_put(m->u.kernel.match->me); + return ret; } static struct ip6t_target ip6t_standard_target; @@ -632,26 +622,32 @@ check_entry(struct ip6t_entry *e, const } t->u.kernel.target = target; + ret = xt_check_target(target, AF_INET6, t->u.target_size - sizeof(*t), + name, e->comefrom, e->ipv6.proto, + e->ipv6.invflags & IP6T_INV_PROTO); + if (ret) + goto err; + if (t->u.kernel.target == &ip6t_standard_target) { if (!standard_check(t, size)) { ret = -EINVAL; goto cleanup_matches; } } else if (t->u.kernel.target->checkentry - && !t->u.kernel.target->checkentry(name, e, t->data, + && !t->u.kernel.target->checkentry(name, e, target, 
t->data, t->u.target_size - sizeof(*t), e->comefrom)) { - module_put(t->u.kernel.target->me); duprintf("ip_tables: check failed for `%s'.\n", t->u.kernel.target->name); ret = -EINVAL; - goto cleanup_matches; + goto err; } (*i)++; return 0; - + err: + module_put(t->u.kernel.target->me); cleanup_matches: IP6T_MATCH_ITERATE(e, cleanup_match, &j); return ret; @@ -712,7 +708,7 @@ cleanup_entry(struct ip6t_entry *e, unsi IP6T_MATCH_ITERATE(e, cleanup_match, NULL); t = ip6t_get_target(e); if (t->u.kernel.target->destroy) - t->u.kernel.target->destroy(t->data, + t->u.kernel.target->destroy(t->u.kernel.target, t->data, t->u.target_size - sizeof(*t)); module_put(t->u.kernel.target->me); return 0; @@ -1333,6 +1329,7 @@ static int icmp6_match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -1365,28 +1362,27 @@ icmp6_match(const struct sk_buff *skb, static int icmp6_checkentry(const char *tablename, const void *entry, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { - const struct ip6t_ip6 *ipv6 = entry; const struct ip6t_icmp *icmpinfo = matchinfo; - /* Must specify proto == ICMP, and no unknown invflags */ - return ipv6->proto == IPPROTO_ICMPV6 - && !(ipv6->invflags & IP6T_INV_PROTO) - && matchsize == IP6T_ALIGN(sizeof(struct ip6t_icmp)) - && !(icmpinfo->invflags & ~IP6T_ICMP_INV); + /* Must specify no unknown invflags */ + return !(icmpinfo->invflags & ~IP6T_ICMP_INV); } /* The built-in targets: standard (NULL) and error. 
*/ static struct ip6t_target ip6t_standard_target = { .name = IP6T_STANDARD_TARGET, + .targetsize = sizeof(int), }; static struct ip6t_target ip6t_error_target = { .name = IP6T_ERROR_TARGET, .target = ip6t_error, + .targetsize = IP6T_FUNCTION_MAXNAMELEN, }; static struct nf_sockopt_ops ip6t_sockopts = { @@ -1402,7 +1398,9 @@ static struct nf_sockopt_ops ip6t_sockop static struct ip6t_match icmp6_matchstruct = { .name = "icmp6", .match = &icmp6_match, - .checkentry = &icmp6_checkentry, + .matchsize = sizeof(struct ip6t_icmp), + .checkentry = icmp6_checkentry, + .proto = IPPROTO_ICMPV6, }; static int __init init(void) @@ -1515,7 +1513,6 @@ EXPORT_SYMBOL(ip6t_unregister_table); EXPORT_SYMBOL(ip6t_do_table); EXPORT_SYMBOL(ip6t_ext_hdr); EXPORT_SYMBOL(ipv6_find_hdr); -EXPORT_SYMBOL(ip6_masked_addrcmp); module_init(init); module_exit(fini); diff -puN net/ipv6/netfilter/ip6t_ah.c~git-net net/ipv6/netfilter/ip6t_ah.c --- devel/net/ipv6/netfilter/ip6t_ah.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_ah.c 2006-03-17 23:03:48.000000000 -0800 @@ -44,6 +44,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -99,17 +100,13 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *entry, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask) { const struct ip6t_ah *ahinfo = matchinfo; - if (matchinfosize != IP6T_ALIGN(sizeof(struct ip6t_ah))) { - DEBUGP("ip6t_ah: matchsize %u != %u\n", - matchinfosize, IP6T_ALIGN(sizeof(struct ip6t_ah))); - return 0; - } if (ahinfo->invflags & ~IP6T_AH_INV_MASK) { DEBUGP("ip6t_ah: unknown flags %X\n", ahinfo->invflags); return 0; @@ -119,8 +116,9 @@ checkentry(const char *tablename, static struct ip6t_match ah_match = { .name = "ah", - .match = &match, - .checkentry = &checkentry, + 
.match = match, + .matchsize = sizeof(struct ip6t_ah), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_dst.c~git-net net/ipv6/netfilter/ip6t_dst.c --- devel/net/ipv6/netfilter/ip6t_dst.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_dst.c 2006-03-17 23:03:48.000000000 -0800 @@ -55,6 +55,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -179,22 +180,17 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *info, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask) { const struct ip6t_opts *optsinfo = matchinfo; - if (matchinfosize != IP6T_ALIGN(sizeof(struct ip6t_opts))) { - DEBUGP("ip6t_opts: matchsize %u != %u\n", - matchinfosize, IP6T_ALIGN(sizeof(struct ip6t_opts))); - return 0; - } if (optsinfo->invflags & ~IP6T_OPTS_INV_MASK) { DEBUGP("ip6t_opts: unknown flags %X\n", optsinfo->invflags); return 0; } - return 1; } @@ -204,8 +200,9 @@ static struct ip6t_match opts_match = { #else .name = "dst", #endif - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_opts), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_esp.c~git-net net/ipv6/netfilter/ip6t_esp.c --- devel/net/ipv6/netfilter/ip6t_esp.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_esp.c 2006-03-17 23:03:48.000000000 -0800 @@ -44,6 +44,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -77,17 +78,13 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, 
unsigned int matchinfosize, unsigned int hook_mask) { const struct ip6t_esp *espinfo = matchinfo; - if (matchinfosize != IP6T_ALIGN(sizeof(struct ip6t_esp))) { - DEBUGP("ip6t_esp: matchsize %u != %u\n", - matchinfosize, IP6T_ALIGN(sizeof(struct ip6t_esp))); - return 0; - } if (espinfo->invflags & ~IP6T_ESP_INV_MASK) { DEBUGP("ip6t_esp: unknown flags %X\n", espinfo->invflags); @@ -98,8 +95,9 @@ checkentry(const char *tablename, static struct ip6t_match esp_match = { .name = "esp", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_esp), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_eui64.c~git-net net/ipv6/netfilter/ip6t_eui64.c --- devel/net/ipv6/netfilter/ip6t_eui64.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_eui64.c 2006-03-17 23:03:48.000000000 -0800 @@ -22,6 +22,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -60,30 +61,12 @@ match(const struct sk_buff *skb, return 0; } -static int -ip6t_eui64_checkentry(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (hook_mask - & ~((1 << NF_IP6_PRE_ROUTING) | (1 << NF_IP6_LOCAL_IN) | - (1 << NF_IP6_FORWARD))) { - printk("ip6t_eui64: only valid for PRE_ROUTING, LOCAL_IN or FORWARD.\n"); - return 0; - } - - if (matchsize != IP6T_ALIGN(sizeof(int))) - return 0; - - return 1; -} - static struct ip6t_match eui64_match = { .name = "eui64", - .match = &match, - .checkentry = &ip6t_eui64_checkentry, + .match = match, + .matchsize = sizeof(int), + .hooks = (1 << NF_IP6_PRE_ROUTING) | (1 << NF_IP6_LOCAL_IN) | + (1 << NF_IP6_FORWARD), .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_frag.c~git-net net/ipv6/netfilter/ip6t_frag.c --- devel/net/ipv6/netfilter/ip6t_frag.c~git-net 
2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_frag.c 2006-03-17 23:03:48.000000000 -0800 @@ -43,6 +43,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -116,29 +117,25 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask) { const struct ip6t_frag *fraginfo = matchinfo; - if (matchinfosize != IP6T_ALIGN(sizeof(struct ip6t_frag))) { - DEBUGP("ip6t_frag: matchsize %u != %u\n", - matchinfosize, IP6T_ALIGN(sizeof(struct ip6t_frag))); - return 0; - } if (fraginfo->invflags & ~IP6T_FRAG_INV_MASK) { DEBUGP("ip6t_frag: unknown flags %X\n", fraginfo->invflags); return 0; } - return 1; } static struct ip6t_match frag_match = { .name = "frag", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_frag), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_hbh.c~git-net net/ipv6/netfilter/ip6t_hbh.c --- devel/net/ipv6/netfilter/ip6t_hbh.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_hbh.c 2006-03-17 23:03:48.000000000 -0800 @@ -55,6 +55,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -179,22 +180,17 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *entry, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask) { const struct ip6t_opts *optsinfo = matchinfo; - if (matchinfosize != IP6T_ALIGN(sizeof(struct ip6t_opts))) { - DEBUGP("ip6t_opts: matchsize %u != %u\n", - matchinfosize, IP6T_ALIGN(sizeof(struct 
ip6t_opts))); - return 0; - } if (optsinfo->invflags & ~IP6T_OPTS_INV_MASK) { DEBUGP("ip6t_opts: unknown flags %X\n", optsinfo->invflags); return 0; } - return 1; } @@ -204,8 +200,9 @@ static struct ip6t_match opts_match = { #else .name = "dst", #endif - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_opts), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_hl.c~git-net net/ipv6/netfilter/ip6t_hl.c --- devel/net/ipv6/netfilter/ip6t_hl.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_hl.c 2006-03-17 23:03:48.000000000 -0800 @@ -18,10 +18,10 @@ MODULE_AUTHOR("Maciej Soltysiak nh.ipv6h; @@ -48,20 +48,10 @@ static int match(const struct sk_buff *s return 0; } -static int checkentry(const char *tablename, const void *entry, - void *matchinfo, unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != IP6T_ALIGN(sizeof(struct ip6t_hl_info))) - return 0; - - return 1; -} - static struct ip6t_match hl_match = { .name = "hl", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_hl_info), .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_HL.c~git-net net/ipv6/netfilter/ip6t_HL.c --- devel/net/ipv6/netfilter/ip6t_HL.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_HL.c 2006-03-17 23:03:48.000000000 -0800 @@ -21,6 +21,7 @@ static unsigned int ip6t_hl_target(struc const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { struct ipv6hdr *ip6h; @@ -63,43 +64,31 @@ static unsigned int ip6t_hl_target(struc static int ip6t_hl_checkentry(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { struct ip6t_HL_info *info = targinfo; - if (targinfosize != IP6T_ALIGN(sizeof(struct 
ip6t_HL_info))) { - printk(KERN_WARNING "ip6t_HL: targinfosize %u != %Zu\n", - targinfosize, - IP6T_ALIGN(sizeof(struct ip6t_HL_info))); - return 0; - } - - if (strcmp(tablename, "mangle")) { - printk(KERN_WARNING "ip6t_HL: can only be called from " - "\"mangle\" table, not \"%s\"\n", tablename); - return 0; - } - if (info->mode > IP6T_HL_MAXMODE) { printk(KERN_WARNING "ip6t_HL: invalid or unknown Mode %u\n", info->mode); return 0; } - if ((info->mode != IP6T_HL_SET) && (info->hop_limit == 0)) { printk(KERN_WARNING "ip6t_HL: increment/decrement doesn't " "make sense with value 0\n"); return 0; } - return 1; } static struct ip6t_target ip6t_HL = { .name = "HL", .target = ip6t_hl_target, + .targetsize = sizeof(struct ip6t_HL_info), + .table = "mangle", .checkentry = ip6t_hl_checkentry, .me = THIS_MODULE }; diff -puN net/ipv6/netfilter/ip6t_ipv6header.c~git-net net/ipv6/netfilter/ip6t_ipv6header.c --- devel/net/ipv6/netfilter/ip6t_ipv6header.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_ipv6header.c 2006-03-17 23:03:48.000000000 -0800 @@ -29,6 +29,7 @@ static int ipv6header_match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -125,17 +126,13 @@ ipv6header_match(const struct sk_buff *s static int ipv6header_checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { const struct ip6t_ipv6header_info *info = matchinfo; - /* Check for obvious errors */ - /* This match is valid in all hooks! 
*/ - if (matchsize != IP6T_ALIGN(sizeof(struct ip6t_ipv6header_info))) - return 0; - /* invflags is 0 or 0xff in hard mode */ if ((!info->modeflag) && info->invflags != 0x00 && info->invflags != 0xFF) @@ -147,6 +144,7 @@ ipv6header_checkentry(const char *tablen static struct ip6t_match ip6t_ipv6header_match = { .name = "ipv6header", .match = &ipv6header_match, + .matchsize = sizeof(struct ip6t_ipv6header_info), .checkentry = &ipv6header_checkentry, .destroy = NULL, .me = THIS_MODULE, diff -puN net/ipv6/netfilter/ip6t_LOG.c~git-net net/ipv6/netfilter/ip6t_LOG.c --- devel/net/ipv6/netfilter/ip6t_LOG.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_LOG.c 2006-03-17 23:03:48.000000000 -0800 @@ -426,6 +426,7 @@ ip6t_log_target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -449,35 +450,29 @@ ip6t_log_target(struct sk_buff **pskb, static int ip6t_log_checkentry(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { const struct ip6t_log_info *loginfo = targinfo; - if (targinfosize != IP6T_ALIGN(sizeof(struct ip6t_log_info))) { - DEBUGP("LOG: targinfosize %u != %u\n", - targinfosize, IP6T_ALIGN(sizeof(struct ip6t_log_info))); - return 0; - } - if (loginfo->level >= 8) { DEBUGP("LOG: level %u >= 8\n", loginfo->level); return 0; } - if (loginfo->prefix[sizeof(loginfo->prefix)-1] != '\0') { DEBUGP("LOG: prefix term %i\n", loginfo->prefix[sizeof(loginfo->prefix)-1]); return 0; } - return 1; } static struct ip6t_target ip6t_log_reg = { .name = "LOG", .target = ip6t_log_target, + .targetsize = sizeof(struct ip6t_log_info), .checkentry = ip6t_log_checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_multiport.c~git-net net/ipv6/netfilter/ip6t_multiport.c --- devel/net/ipv6/netfilter/ip6t_multiport.c~git-net 
2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_multiport.c 2006-03-17 23:03:48.000000000 -0800 @@ -51,6 +51,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -85,6 +86,7 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *info, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) @@ -92,13 +94,9 @@ checkentry(const char *tablename, const struct ip6t_ip6 *ip = info; const struct ip6t_multiport *multiinfo = matchinfo; - if (matchsize != IP6T_ALIGN(sizeof(struct ip6t_multiport))) - return 0; - /* Must specify proto == TCP/UDP, no unknown flags or bad count */ return (ip->proto == IPPROTO_TCP || ip->proto == IPPROTO_UDP) && !(ip->invflags & IP6T_INV_PROTO) - && matchsize == IP6T_ALIGN(sizeof(struct ip6t_multiport)) && (multiinfo->flags == IP6T_MULTIPORT_SOURCE || multiinfo->flags == IP6T_MULTIPORT_DESTINATION || multiinfo->flags == IP6T_MULTIPORT_EITHER) @@ -107,8 +105,9 @@ checkentry(const char *tablename, static struct ip6t_match multiport_match = { .name = "multiport", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_multiport), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_owner.c~git-net net/ipv6/netfilter/ip6t_owner.c --- devel/net/ipv6/netfilter/ip6t_owner.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_owner.c 2006-03-17 23:03:48.000000000 -0800 @@ -26,6 +26,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -54,34 +55,27 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *ip, + const struct 
xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { const struct ip6t_owner_info *info = matchinfo; - if (hook_mask - & ~((1 << NF_IP6_LOCAL_OUT) | (1 << NF_IP6_POST_ROUTING))) { - printk("ip6t_owner: only valid for LOCAL_OUT or POST_ROUTING.\n"); - return 0; - } - - if (matchsize != IP6T_ALIGN(sizeof(struct ip6t_owner_info))) - return 0; - if (info->match & (IP6T_OWNER_PID | IP6T_OWNER_SID)) { printk("ipt_owner: pid and sid matching " "not supported anymore\n"); return 0; } - return 1; } static struct ip6t_match owner_match = { .name = "owner", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_owner_info), + .hooks = (1 << NF_IP6_LOCAL_OUT) | (1 << NF_IP6_POST_ROUTING), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -L net/ipv6/netfilter/ip6t_policy.c -puN net/ipv6/netfilter/ip6t_policy.c~git-net /dev/null --- devel/net/ipv6/netfilter/ip6t_policy.c +++ /dev/null 2003-09-15 06:40:47.000000000 -0700 @@ -1,176 +0,0 @@ -/* IP tables module for matching IPsec policy - * - * Copyright (c) 2004,2005 Patrick McHardy, - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. 
- */ - -#include -#include -#include -#include -#include -#include - -#include -#include -#include - -MODULE_AUTHOR("Patrick McHardy "); -MODULE_DESCRIPTION("IPtables IPsec policy matching module"); -MODULE_LICENSE("GPL"); - - -static inline int -match_xfrm_state(struct xfrm_state *x, const struct ip6t_policy_elem *e) -{ -#define MATCH_ADDR(x,y,z) (!e->match.x || \ - ((!ip6_masked_addrcmp(&e->x.a6, &e->y.a6, z)) \ - ^ e->invert.x)) -#define MATCH(x,y) (!e->match.x || ((e->x == (y)) ^ e->invert.x)) - - return MATCH_ADDR(saddr, smask, (struct in6_addr *)&x->props.saddr.a6) && - MATCH_ADDR(daddr, dmask, (struct in6_addr *)&x->id.daddr.a6) && - MATCH(proto, x->id.proto) && - MATCH(mode, x->props.mode) && - MATCH(spi, x->id.spi) && - MATCH(reqid, x->props.reqid); -} - -static int -match_policy_in(const struct sk_buff *skb, const struct ip6t_policy_info *info) -{ - const struct ip6t_policy_elem *e; - struct sec_path *sp = skb->sp; - int strict = info->flags & IP6T_POLICY_MATCH_STRICT; - int i, pos; - - if (sp == NULL) - return -1; - if (strict && info->len != sp->len) - return 0; - - for (i = sp->len - 1; i >= 0; i--) { - pos = strict ? i - sp->len + 1 : 0; - if (pos >= info->len) - return 0; - e = &info->pol[pos]; - - if (match_xfrm_state(sp->x[i].xvec, e)) { - if (!strict) - return 1; - } else if (strict) - return 0; - } - - return strict ? 1 : 0; -} - -static int -match_policy_out(const struct sk_buff *skb, const struct ip6t_policy_info *info) -{ - const struct ip6t_policy_elem *e; - struct dst_entry *dst = skb->dst; - int strict = info->flags & IP6T_POLICY_MATCH_STRICT; - int i, pos; - - if (dst->xfrm == NULL) - return -1; - - for (i = 0; dst && dst->xfrm; dst = dst->child, i++) { - pos = strict ? i : 0; - if (pos >= info->len) - return 0; - e = &info->pol[pos]; - - if (match_xfrm_state(dst->xfrm, e)) { - if (!strict) - return 1; - } else if (strict) - return 0; - } - - return strict ? 
i == info->len : 0; -} - -static int match(const struct sk_buff *skb, - const struct net_device *in, - const struct net_device *out, - const void *matchinfo, - int offset, - unsigned int protoff, - int *hotdrop) -{ - const struct ip6t_policy_info *info = matchinfo; - int ret; - - if (info->flags & IP6T_POLICY_MATCH_IN) - ret = match_policy_in(skb, info); - else - ret = match_policy_out(skb, info); - - if (ret < 0) - ret = info->flags & IP6T_POLICY_MATCH_NONE ? 1 : 0; - else if (info->flags & IP6T_POLICY_MATCH_NONE) - ret = 0; - - return ret; -} - -static int checkentry(const char *tablename, const void *ip_void, - void *matchinfo, unsigned int matchsize, - unsigned int hook_mask) -{ - struct ip6t_policy_info *info = matchinfo; - - if (matchsize != IP6T_ALIGN(sizeof(*info))) { - printk(KERN_ERR "ip6t_policy: matchsize %u != %zu\n", - matchsize, IP6T_ALIGN(sizeof(*info))); - return 0; - } - if (!(info->flags & (IP6T_POLICY_MATCH_IN|IP6T_POLICY_MATCH_OUT))) { - printk(KERN_ERR "ip6t_policy: neither incoming nor " - "outgoing policy selected\n"); - return 0; - } - if (hook_mask & (1 << NF_IP6_PRE_ROUTING | 1 << NF_IP6_LOCAL_IN) - && info->flags & IP6T_POLICY_MATCH_OUT) { - printk(KERN_ERR "ip6t_policy: output policy not valid in " - "PRE_ROUTING and INPUT\n"); - return 0; - } - if (hook_mask & (1 << NF_IP6_POST_ROUTING | 1 << NF_IP6_LOCAL_OUT) - && info->flags & IP6T_POLICY_MATCH_IN) { - printk(KERN_ERR "ip6t_policy: input policy not valid in " - "POST_ROUTING and OUTPUT\n"); - return 0; - } - if (info->len > IP6T_POLICY_MAX_ELEM) { - printk(KERN_ERR "ip6t_policy: too many policy elements\n"); - return 0; - } - - return 1; -} - -static struct ip6t_match policy_match = { - .name = "policy", - .match = match, - .checkentry = checkentry, - .me = THIS_MODULE, -}; - -static int __init init(void) -{ - return ip6t_register_match(&policy_match); -} - -static void __exit fini(void) -{ - ip6t_unregister_match(&policy_match); -} - -module_init(init); -module_exit(fini); diff -puN 
net/ipv6/netfilter/ip6t_REJECT.c~git-net net/ipv6/netfilter/ip6t_REJECT.c --- devel/net/ipv6/netfilter/ip6t_REJECT.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_REJECT.c 2006-03-17 23:03:48.000000000 -0800 @@ -179,6 +179,7 @@ static unsigned int reject6_target(struc const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -221,6 +222,7 @@ static unsigned int reject6_target(struc static int check(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) @@ -228,24 +230,6 @@ static int check(const char *tablename, const struct ip6t_reject_info *rejinfo = targinfo; const struct ip6t_entry *e = entry; - if (targinfosize != IP6T_ALIGN(sizeof(struct ip6t_reject_info))) { - DEBUGP("ip6t_REJECT: targinfosize %u != 0\n", targinfosize); - return 0; - } - - /* Only allow these for packet filtering. 
*/ - if (strcmp(tablename, "filter") != 0) { - DEBUGP("ip6t_REJECT: bad table `%s'.\n", tablename); - return 0; - } - - if ((hook_mask & ~((1 << NF_IP6_LOCAL_IN) - | (1 << NF_IP6_FORWARD) - | (1 << NF_IP6_LOCAL_OUT))) != 0) { - DEBUGP("ip6t_REJECT: bad hook mask %X\n", hook_mask); - return 0; - } - if (rejinfo->with == IP6T_ICMP6_ECHOREPLY) { printk("ip6t_REJECT: ECHOREPLY is not supported.\n"); return 0; @@ -257,13 +241,16 @@ static int check(const char *tablename, return 0; } } - return 1; } static struct ip6t_target ip6t_reject_reg = { .name = "REJECT", .target = reject6_target, + .targetsize = sizeof(struct ip6t_reject_info), + .table = "filter", + .hooks = (1 << NF_IP6_LOCAL_IN) | (1 << NF_IP6_FORWARD) | + (1 << NF_IP6_LOCAL_OUT), .checkentry = check, .me = THIS_MODULE }; diff -puN net/ipv6/netfilter/ip6t_rt.c~git-net net/ipv6/netfilter/ip6t_rt.c --- devel/net/ipv6/netfilter/ip6t_rt.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_rt.c 2006-03-17 23:03:48.000000000 -0800 @@ -45,6 +45,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -194,17 +195,13 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *entry, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask) { const struct ip6t_rt *rtinfo = matchinfo; - if (matchinfosize != IP6T_ALIGN(sizeof(struct ip6t_rt))) { - DEBUGP("ip6t_rt: matchsize %u != %u\n", - matchinfosize, IP6T_ALIGN(sizeof(struct ip6t_rt))); - return 0; - } if (rtinfo->invflags & ~IP6T_RT_INV_MASK) { DEBUGP("ip6t_rt: unknown flags %X\n", rtinfo->invflags); return 0; @@ -222,8 +219,9 @@ checkentry(const char *tablename, static struct ip6t_match rt_match = { .name = "rt", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_rt), + 
.checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/Kconfig~git-net net/ipv6/netfilter/Kconfig --- devel/net/ipv6/netfilter/Kconfig~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/Kconfig 2006-03-17 23:03:48.000000000 -0800 @@ -133,16 +133,6 @@ config IP6_NF_MATCH_EUI64 To compile it as a module, choose M here. If unsure, say N. -config IP6_NF_MATCH_POLICY - tristate "IPsec policy match support" - depends on IP6_NF_IPTABLES && XFRM - help - Policy matching allows you to match packets based on the - IPsec policy that was used during decapsulation/will - be used during encapsulation. - - To compile it as a module, choose M here. If unsure, say N. - # The targets config IP6_NF_FILTER tristate "Packet filtering" diff -puN net/ipv6/netfilter/Makefile~git-net net/ipv6/netfilter/Makefile --- devel/net/ipv6/netfilter/Makefile~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/Makefile 2006-03-17 23:03:48.000000000 -0800 @@ -9,7 +9,6 @@ obj-$(CONFIG_IP6_NF_MATCH_OPTS) += ip6t_ obj-$(CONFIG_IP6_NF_MATCH_IPV6HEADER) += ip6t_ipv6header.o obj-$(CONFIG_IP6_NF_MATCH_FRAG) += ip6t_frag.o obj-$(CONFIG_IP6_NF_MATCH_AHESP) += ip6t_esp.o ip6t_ah.o -obj-$(CONFIG_IP6_NF_MATCH_POLICY) += ip6t_policy.o obj-$(CONFIG_IP6_NF_MATCH_EUI64) += ip6t_eui64.o obj-$(CONFIG_IP6_NF_MATCH_MULTIPORT) += ip6t_multiport.o obj-$(CONFIG_IP6_NF_MATCH_OWNER) += ip6t_owner.o diff -puN net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c~git-net net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c --- devel/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c 2006-03-17 23:03:48.000000000 -0800 @@ -179,31 +179,36 @@ static unsigned int ipv6_confirm(unsigne int (*okfn)(struct sk_buff *)) { struct nf_conn *ct; + struct nf_conn_help *help; enum ip_conntrack_info ctinfo; + unsigned int ret, protoff; + unsigned int extoff = (u8*)((*pskb)->nh.ipv6h 
+ 1) + - (*pskb)->data; + unsigned char pnum = (*pskb)->nh.ipv6h->nexthdr; + /* This is where we call the helper: as the packet goes out. */ ct = nf_ct_get(*pskb, &ctinfo); - if (ct && ct->helper) { - unsigned int ret, protoff; - unsigned int extoff = (u8*)((*pskb)->nh.ipv6h + 1) - - (*pskb)->data; - unsigned char pnum = (*pskb)->nh.ipv6h->nexthdr; - - protoff = nf_ct_ipv6_skip_exthdr(*pskb, extoff, &pnum, - (*pskb)->len - extoff); - if (protoff < 0 || protoff > (*pskb)->len || - pnum == NEXTHDR_FRAGMENT) { - DEBUGP("proto header not found\n"); - return NF_ACCEPT; - } + if (!ct) + goto out; - ret = ct->helper->help(pskb, protoff, ct, ctinfo); - if (ret != NF_ACCEPT) - return ret; + help = nfct_help(ct); + if (!help || !help->helper) + goto out; + + protoff = nf_ct_ipv6_skip_exthdr(*pskb, extoff, &pnum, + (*pskb)->len - extoff); + if (protoff < 0 || protoff > (*pskb)->len || + pnum == NEXTHDR_FRAGMENT) { + DEBUGP("proto header not found\n"); + return NF_ACCEPT; } + ret = help->helper->help(pskb, protoff, ct, ctinfo); + if (ret != NF_ACCEPT) + return ret; +out: /* We've seen it coming out the other side: confirm it */ - return nf_conntrack_confirm(pskb); } diff -puN net/ipv6/netfilter/nf_conntrack_reasm.c~git-net net/ipv6/netfilter/nf_conntrack_reasm.c --- devel/net/ipv6/netfilter/nf_conntrack_reasm.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/nf_conntrack_reasm.c 2006-03-17 23:03:48.000000000 -0800 @@ -313,8 +313,8 @@ static struct nf_ct_frag6_queue *nf_ct_f #ifdef CONFIG_SMP hlist_for_each_entry(fq, n, &nf_ct_frag6_hash[hash], list) { if (fq->id == fq_in->id && - !ipv6_addr_cmp(&fq_in->saddr, &fq->saddr) && - !ipv6_addr_cmp(&fq_in->daddr, &fq->daddr)) { + ipv6_addr_equal(&fq_in->saddr, &fq->saddr) && + ipv6_addr_equal(&fq_in->daddr, &fq->daddr)) { atomic_inc(&fq->refcnt); write_unlock(&nf_ct_frag6_lock); fq_in->last_in |= COMPLETE; @@ -376,8 +376,8 @@ fq_find(u32 id, struct in6_addr *src, st read_lock(&nf_ct_frag6_lock); 
hlist_for_each_entry(fq, n, &nf_ct_frag6_hash[hash], list) { if (fq->id == id && - !ipv6_addr_cmp(src, &fq->saddr) && - !ipv6_addr_cmp(dst, &fq->daddr)) { + ipv6_addr_equal(src, &fq->saddr) && + ipv6_addr_equal(dst, &fq->daddr)) { atomic_inc(&fq->refcnt); read_unlock(&nf_ct_frag6_lock); return fq; diff -puN net/ipv6/raw.c~git-net net/ipv6/raw.c --- devel/net/ipv6/raw.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/raw.c 2006-03-17 23:03:48.000000000 -0800 @@ -859,29 +859,12 @@ static int rawv6_geticmpfilter(struct so } -static int rawv6_setsockopt(struct sock *sk, int level, int optname, +static int do_rawv6_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { struct raw6_sock *rp = raw6_sk(sk); int val; - switch(level) { - case SOL_RAW: - break; - - case SOL_ICMPV6: - if (inet_sk(sk)->num != IPPROTO_ICMPV6) - return -EOPNOTSUPP; - return rawv6_seticmpfilter(sk, level, optname, optval, - optlen); - case SOL_IPV6: - if (optname == IPV6_CHECKSUM) - break; - default: - return ipv6_setsockopt(sk, level, optname, optval, - optlen); - }; - if (get_user(val, (int __user *)optval)) return -EFAULT; @@ -906,12 +889,9 @@ static int rawv6_setsockopt(struct sock } } -static int rawv6_getsockopt(struct sock *sk, int level, int optname, - char __user *optval, int __user *optlen) +static int rawv6_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) { - struct raw6_sock *rp = raw6_sk(sk); - int val, len; - switch(level) { case SOL_RAW: break; @@ -919,15 +899,45 @@ static int rawv6_getsockopt(struct sock case SOL_ICMPV6: if (inet_sk(sk)->num != IPPROTO_ICMPV6) return -EOPNOTSUPP; - return rawv6_geticmpfilter(sk, level, optname, optval, + return rawv6_seticmpfilter(sk, level, optname, optval, optlen); case SOL_IPV6: if (optname == IPV6_CHECKSUM) break; default: - return ipv6_getsockopt(sk, level, optname, optval, + return ipv6_setsockopt(sk, level, optname, optval, optlen); }; + return 
do_rawv6_setsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +static int compat_rawv6_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + switch (level) { + case SOL_RAW: + break; + case SOL_ICMPV6: + if (inet_sk(sk)->num != IPPROTO_ICMPV6) + return -EOPNOTSUPP; + return rawv6_seticmpfilter(sk, level, optname, optval, optlen); + case SOL_IPV6: + if (optname == IPV6_CHECKSUM) + break; + default: + return compat_ipv6_setsockopt(sk, level, optname, + optval, optlen); + }; + return do_rawv6_setsockopt(sk, level, optname, optval, optlen); +} +#endif + +static int do_rawv6_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + struct raw6_sock *rp = raw6_sk(sk); + int val, len; if (get_user(len,optlen)) return -EFAULT; @@ -953,6 +963,50 @@ static int rawv6_getsockopt(struct sock return 0; } +static int rawv6_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + switch(level) { + case SOL_RAW: + break; + + case SOL_ICMPV6: + if (inet_sk(sk)->num != IPPROTO_ICMPV6) + return -EOPNOTSUPP; + return rawv6_geticmpfilter(sk, level, optname, optval, + optlen); + case SOL_IPV6: + if (optname == IPV6_CHECKSUM) + break; + default: + return ipv6_getsockopt(sk, level, optname, optval, + optlen); + }; + return do_rawv6_getsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +static int compat_rawv6_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + switch (level) { + case SOL_RAW: + break; + case SOL_ICMPV6: + if (inet_sk(sk)->num != IPPROTO_ICMPV6) + return -EOPNOTSUPP; + return rawv6_geticmpfilter(sk, level, optname, optval, optlen); + case SOL_IPV6: + if (optname == IPV6_CHECKSUM) + break; + default: + return compat_ipv6_getsockopt(sk, level, optname, + optval, optlen); + }; + return do_rawv6_getsockopt(sk, level, optname, optval, optlen); +} +#endif + static int 
rawv6_ioctl(struct sock *sk, int cmd, unsigned long arg) { switch(cmd) { @@ -998,23 +1052,27 @@ static int rawv6_init_sk(struct sock *sk } struct proto rawv6_prot = { - .name = "RAWv6", - .owner = THIS_MODULE, - .close = rawv6_close, - .connect = ip6_datagram_connect, - .disconnect = udp_disconnect, - .ioctl = rawv6_ioctl, - .init = rawv6_init_sk, - .destroy = inet6_destroy_sock, - .setsockopt = rawv6_setsockopt, - .getsockopt = rawv6_getsockopt, - .sendmsg = rawv6_sendmsg, - .recvmsg = rawv6_recvmsg, - .bind = rawv6_bind, - .backlog_rcv = rawv6_rcv_skb, - .hash = raw_v6_hash, - .unhash = raw_v6_unhash, - .obj_size = sizeof(struct raw6_sock), + .name = "RAWv6", + .owner = THIS_MODULE, + .close = rawv6_close, + .connect = ip6_datagram_connect, + .disconnect = udp_disconnect, + .ioctl = rawv6_ioctl, + .init = rawv6_init_sk, + .destroy = inet6_destroy_sock, + .setsockopt = rawv6_setsockopt, + .getsockopt = rawv6_getsockopt, + .sendmsg = rawv6_sendmsg, + .recvmsg = rawv6_recvmsg, + .bind = rawv6_bind, + .backlog_rcv = rawv6_rcv_skb, + .hash = raw_v6_hash, + .unhash = raw_v6_unhash, + .obj_size = sizeof(struct raw6_sock), +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_rawv6_setsockopt, + .compat_getsockopt = compat_rawv6_getsockopt, +#endif }; #ifdef CONFIG_PROC_FS @@ -1140,7 +1198,7 @@ static int raw6_seq_open(struct inode *i { struct seq_file *seq; int rc = -ENOMEM; - struct raw6_iter_state *s = kmalloc(sizeof(*s), GFP_KERNEL); + struct raw6_iter_state *s = kzalloc(sizeof(*s), GFP_KERNEL); if (!s) goto out; rc = seq_open(file, &raw6_seq_ops); @@ -1148,7 +1206,6 @@ static int raw6_seq_open(struct inode *i goto out_kfree; seq = file->private_data; seq->private = s; - memset(s, 0, sizeof(*s)); out: return rc; out_kfree: diff -puN net/ipv6/reassembly.c~git-net net/ipv6/reassembly.c --- devel/net/ipv6/reassembly.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/reassembly.c 2006-03-17 23:03:48.000000000 -0800 @@ -203,7 +203,7 @@ static inline void 
frag_free_queue(struc static inline struct frag_queue *frag_alloc_queue(void) { - struct frag_queue *fq = kmalloc(sizeof(struct frag_queue), GFP_ATOMIC); + struct frag_queue *fq = kzalloc(sizeof(struct frag_queue), GFP_ATOMIC); if(!fq) return NULL; @@ -288,6 +288,7 @@ static void ip6_evictor(void) static void ip6_frag_expire(unsigned long data) { struct frag_queue *fq = (struct frag_queue *) data; + struct net_device *dev; spin_lock(&fq->lock); @@ -299,22 +300,22 @@ static void ip6_frag_expire(unsigned lon IP6_INC_STATS_BH(IPSTATS_MIB_REASMTIMEOUT); IP6_INC_STATS_BH(IPSTATS_MIB_REASMFAILS); - /* Send error only if the first segment arrived. */ - if (fq->last_in&FIRST_IN && fq->fragments) { - struct net_device *dev = dev_get_by_index(fq->iif); - - /* - But use as source device on which LAST ARRIVED - segment was received. And do not use fq->dev - pointer directly, device might already disappeared. - */ - if (dev) { - fq->fragments->dev = dev; - icmpv6_send(fq->fragments, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0, - dev); - dev_put(dev); - } - } + /* Don't send error if the first segment did not arrive. */ + if (!(fq->last_in&FIRST_IN) || !fq->fragments) + goto out; + + dev = dev_get_by_index(fq->iif); + if (!dev) + goto out; + + /* + But use as source device on which LAST ARRIVED + segment was received. And do not use fq->dev + pointer directly, device might already disappeared. 
+ */ + fq->fragments->dev = dev; + icmpv6_send(fq->fragments, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0, dev); + dev_put(dev); out: spin_unlock(&fq->lock); fq_put(fq, NULL); @@ -368,8 +369,6 @@ ip6_frag_create(unsigned int hash, u32 i if ((fq = frag_alloc_queue()) == NULL) goto oom; - memset(fq, 0, sizeof(struct frag_queue)); - fq->id = id; ipv6_addr_copy(&fq->saddr, src); ipv6_addr_copy(&fq->daddr, dst); diff -puN net/ipv6/route.c~git-net net/ipv6/route.c --- devel/net/ipv6/route.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/route.c 2006-03-17 23:03:48.000000000 -0800 @@ -72,6 +72,10 @@ #define RT6_TRACE(x...) do { ; } while (0) #endif +#define CLONE_OFFLINK_ROUTE 0 + +#define RT6_SELECT_F_IFACE 0x1 +#define RT6_SELECT_F_REACHABLE 0x2 static int ip6_rt_max_size = 4096; static int ip6_rt_gc_min_interval = HZ / 2; @@ -94,6 +98,14 @@ static int ip6_pkt_discard_out(struct s static void ip6_link_failure(struct sk_buff *skb); static void ip6_rt_update_pmtu(struct dst_entry *dst, u32 mtu); +#ifdef CONFIG_IPV6_ROUTE_INFO +static struct rt6_info *rt6_add_route_info(struct in6_addr *prefix, int prefixlen, + struct in6_addr *gwaddr, int ifindex, + unsigned pref); +static struct rt6_info *rt6_get_route_info(struct in6_addr *prefix, int prefixlen, + struct in6_addr *gwaddr, int ifindex); +#endif + static struct dst_ops ip6_dst_ops = { .family = AF_INET6, .protocol = __constant_htons(ETH_P_IPV6), @@ -214,150 +226,211 @@ static __inline__ struct rt6_info *rt6_d return rt; } +#ifdef CONFIG_IPV6_ROUTER_PREF +static void rt6_probe(struct rt6_info *rt) +{ + struct neighbour *neigh = rt ? rt->rt6i_nexthop : NULL; + /* + * Okay, this does not seem to be appropriate + * for now, however, we need to check if it + * is really so; aka Router Reachability Probing. + * + * Router Reachability Probe MUST be rate-limited + * to no more than one per minute. 
+ */ + if (!neigh || (neigh->nud_state & NUD_VALID)) + return; + read_lock_bh(&neigh->lock); + if (!(neigh->nud_state & NUD_VALID) && + time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) { + struct in6_addr mcaddr; + struct in6_addr *target; + + neigh->updated = jiffies; + read_unlock_bh(&neigh->lock); + + target = (struct in6_addr *)&neigh->primary_key; + addrconf_addr_solict_mult(target, &mcaddr); + ndisc_send_ns(rt->rt6i_dev, NULL, target, &mcaddr, NULL); + } else + read_unlock_bh(&neigh->lock); +} +#else +static inline void rt6_probe(struct rt6_info *rt) +{ + return; +} +#endif + /* - * pointer to the last default router chosen. BH is disabled locally. + * Default Router Selection (RFC 2461 6.3.6) */ -static struct rt6_info *rt6_dflt_pointer; -static DEFINE_SPINLOCK(rt6_dflt_lock); +static int inline rt6_check_dev(struct rt6_info *rt, int oif) +{ + struct net_device *dev = rt->rt6i_dev; + if (!oif || dev->ifindex == oif) + return 2; + if ((dev->flags & IFF_LOOPBACK) && + rt->rt6i_idev && rt->rt6i_idev->dev->ifindex == oif) + return 1; + return 0; +} -void rt6_reset_dflt_pointer(struct rt6_info *rt) +static int inline rt6_check_neigh(struct rt6_info *rt) { - spin_lock_bh(&rt6_dflt_lock); - if (rt == NULL || rt == rt6_dflt_pointer) { - RT6_TRACE("reset default router: %p->NULL\n", rt6_dflt_pointer); - rt6_dflt_pointer = NULL; + struct neighbour *neigh = rt->rt6i_nexthop; + int m = 0; + if (neigh) { + read_lock_bh(&neigh->lock); + if (neigh->nud_state & NUD_VALID) + m = 1; + read_unlock_bh(&neigh->lock); } - spin_unlock_bh(&rt6_dflt_lock); + return m; } -/* Default Router Selection (RFC 2461 6.3.6) */ -static struct rt6_info *rt6_best_dflt(struct rt6_info *rt, int oif) +static int rt6_score_route(struct rt6_info *rt, int oif, + int strict) { - struct rt6_info *match = NULL; - struct rt6_info *sprt; - int mpri = 0; - - for (sprt = rt; sprt; sprt = sprt->u.next) { - struct neighbour *neigh; - int m = 0; - - if (!oif || - (sprt->rt6i_dev && 
- sprt->rt6i_dev->ifindex == oif)) - m += 8; + int m = rt6_check_dev(rt, oif); + if (!m && (strict & RT6_SELECT_F_IFACE)) + return -1; +#ifdef CONFIG_IPV6_ROUTER_PREF + m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(rt->rt6i_flags)) << 2; +#endif + if (rt6_check_neigh(rt)) + m |= 16; + else if (strict & RT6_SELECT_F_REACHABLE) + return -1; + return m; +} - if (rt6_check_expired(sprt)) - continue; +static struct rt6_info *rt6_select(struct rt6_info **head, int oif, + int strict) +{ + struct rt6_info *match = NULL, *last = NULL; + struct rt6_info *rt, *rt0 = *head; + u32 metric; + int mpri = -1; - if (sprt == rt6_dflt_pointer) - m += 4; + RT6_TRACE("%s(head=%p(*head=%p), oif=%d)\n", + __FUNCTION__, head, head ? *head : NULL, oif); - if ((neigh = sprt->rt6i_nexthop) != NULL) { - read_lock_bh(&neigh->lock); - switch (neigh->nud_state) { - case NUD_REACHABLE: - m += 3; - break; + for (rt = rt0, metric = rt0->rt6i_metric; + rt && rt->rt6i_metric == metric; + rt = rt->u.next) { + int m; - case NUD_STALE: - case NUD_DELAY: - case NUD_PROBE: - m += 2; - break; + if (rt6_check_expired(rt)) + continue; - case NUD_NOARP: - case NUD_PERMANENT: - m += 1; - break; + last = rt; - case NUD_INCOMPLETE: - default: - read_unlock_bh(&neigh->lock); - continue; - } - read_unlock_bh(&neigh->lock); - } else { + m = rt6_score_route(rt, oif, strict); + if (m < 0) continue; - } - if (m > mpri || m >= 12) { - match = sprt; + if (m > mpri) { + rt6_probe(match); + match = rt; mpri = m; - if (m >= 12) { - /* we choose the last default router if it - * is in (probably) reachable state. - * If route changed, we should do pmtu - * discovery. --yoshfuji - */ - break; - } + } else { + rt6_probe(rt); } } - spin_lock(&rt6_dflt_lock); - if (!match) { - /* - * No default routers are known to be reachable. 
- * SHOULD round robin - */ - if (rt6_dflt_pointer) { - for (sprt = rt6_dflt_pointer->u.next; - sprt; sprt = sprt->u.next) { - if (sprt->u.dst.obsolete <= 0 && - sprt->u.dst.error == 0 && - !rt6_check_expired(sprt)) { - match = sprt; - break; - } - } - for (sprt = rt; - !match && sprt; - sprt = sprt->u.next) { - if (sprt->u.dst.obsolete <= 0 && - sprt->u.dst.error == 0 && - !rt6_check_expired(sprt)) { - match = sprt; - break; - } - if (sprt == rt6_dflt_pointer) - break; - } - } + if (!match && + (strict & RT6_SELECT_F_REACHABLE) && + last && last != rt0) { + /* no entries matched; do round-robin */ + *head = rt0->u.next; + rt0->u.next = last->u.next; + last->u.next = rt0; } - if (match) { - if (rt6_dflt_pointer != match) - RT6_TRACE("changed default router: %p->%p\n", - rt6_dflt_pointer, match); - rt6_dflt_pointer = match; + RT6_TRACE("%s() => %p, score=%d\n", + __FUNCTION__, match, mpri); + + return (match ? match : &ip6_null_entry); +} + +#ifdef CONFIG_IPV6_ROUTE_INFO +int rt6_route_rcv(struct net_device *dev, u8 *opt, int len, + struct in6_addr *gwaddr) +{ + struct route_info *rinfo = (struct route_info *) opt; + struct in6_addr prefix_buf, *prefix; + unsigned int pref; + u32 lifetime; + struct rt6_info *rt; + + if (len < sizeof(struct route_info)) { + return -EINVAL; } - spin_unlock(&rt6_dflt_lock); - if (!match) { - /* - * Last Resort: if no default routers found, - * use addrconf default route. - * We don't record this route. - */ - for (sprt = ip6_routing_table.leaf; - sprt; sprt = sprt->u.next) { - if (!rt6_check_expired(sprt) && - (sprt->rt6i_flags & RTF_DEFAULT) && - (!oif || - (sprt->rt6i_dev && - sprt->rt6i_dev->ifindex == oif))) { - match = sprt; - break; - } + /* Sanity check for prefix_len and length */ + if (rinfo->length > 3) { + return -EINVAL; + } else if (rinfo->prefix_len > 128) { + return -EINVAL; + } else if (rinfo->prefix_len > 64) { + if (rinfo->length < 2) { + return -EINVAL; } - if (!match) { - /* no default route. give up. 
*/ - match = &ip6_null_entry; + } else if (rinfo->prefix_len > 0) { + if (rinfo->length < 1) { + return -EINVAL; } } - return match; + pref = rinfo->route_pref; + if (pref == ICMPV6_ROUTER_PREF_INVALID) + pref = ICMPV6_ROUTER_PREF_MEDIUM; + + lifetime = htonl(rinfo->lifetime); + if (lifetime == 0xffffffff) { + /* infinity */ + } else if (lifetime > 0x7fffffff/HZ) { + /* Avoid arithmetic overflow */ + lifetime = 0x7fffffff/HZ - 1; + } + + if (rinfo->length == 3) + prefix = (struct in6_addr *)rinfo->prefix; + else { + /* this function is safe */ + ipv6_addr_prefix(&prefix_buf, + (struct in6_addr *)rinfo->prefix, + rinfo->prefix_len); + prefix = &prefix_buf; + } + + rt = rt6_get_route_info(prefix, rinfo->prefix_len, gwaddr, dev->ifindex); + + if (rt && !lifetime) { + ip6_del_rt(rt, NULL, NULL, NULL); + rt = NULL; + } + + if (!rt && lifetime) + rt = rt6_add_route_info(prefix, rinfo->prefix_len, gwaddr, dev->ifindex, + pref); + else if (rt) + rt->rt6i_flags = RTF_ROUTEINFO | + (rt->rt6i_flags & ~RTF_PREF_MASK) | RTF_PREF(pref); + + if (rt) { + if (lifetime == 0xffffffff) { + rt->rt6i_flags &= ~RTF_EXPIRES; + } else { + rt->rt6i_expires = jiffies + HZ * lifetime; + rt->rt6i_flags |= RTF_EXPIRES; + } + dst_release(&rt->u.dst); + } + return 0; } +#endif struct rt6_info *rt6_lookup(struct in6_addr *daddr, struct in6_addr *saddr, int oif, int strict) @@ -397,14 +470,9 @@ int ip6_ins_rt(struct rt6_info *rt, stru return err; } -/* No rt6_lock! If COW failed, the function returns dead route entry - with dst->error set to errno value. 
- */ - -static struct rt6_info *rt6_cow(struct rt6_info *ort, struct in6_addr *daddr, - struct in6_addr *saddr, struct netlink_skb_parms *req) +static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort, struct in6_addr *daddr, + struct in6_addr *saddr) { - int err; struct rt6_info *rt; /* @@ -435,25 +503,30 @@ static struct rt6_info *rt6_cow(struct r rt->rt6i_nexthop = ndisc_get_neigh(rt->rt6i_dev, &rt->rt6i_gateway); - dst_hold(&rt->u.dst); - - err = ip6_ins_rt(rt, NULL, NULL, req); - if (err == 0) - return rt; + } - rt->u.dst.error = err; + return rt; +} - return rt; +static struct rt6_info *rt6_alloc_clone(struct rt6_info *ort, struct in6_addr *daddr) +{ + struct rt6_info *rt = ip6_rt_copy(ort); + if (rt) { + ipv6_addr_copy(&rt->rt6i_dst.addr, daddr); + rt->rt6i_dst.plen = 128; + rt->rt6i_flags |= RTF_CACHE; + if (rt->rt6i_flags & RTF_REJECT) + rt->u.dst.error = ort->u.dst.error; + rt->u.dst.flags |= DST_HOST; + rt->rt6i_nexthop = neigh_clone(ort->rt6i_nexthop); } - dst_hold(&ip6_null_entry.u.dst); - return &ip6_null_entry; + return rt; } #define BACKTRACK() \ -if (rt == &ip6_null_entry && strict) { \ +if (rt == &ip6_null_entry) { \ while ((fn = fn->parent) != NULL) { \ if (fn->fn_flags & RTN_ROOT) { \ - dst_hold(&rt->u.dst); \ goto out; \ } \ if (fn->fn_flags & RTN_RTINFO) \ @@ -465,115 +538,138 @@ if (rt == &ip6_null_entry && strict) { \ void ip6_route_input(struct sk_buff *skb) { struct fib6_node *fn; - struct rt6_info *rt; + struct rt6_info *rt, *nrt; int strict; int attempts = 3; + int err; + int reachable = RT6_SELECT_F_REACHABLE; - strict = ipv6_addr_type(&skb->nh.ipv6h->daddr) & (IPV6_ADDR_MULTICAST|IPV6_ADDR_LINKLOCAL); + strict = ipv6_addr_type(&skb->nh.ipv6h->daddr) & (IPV6_ADDR_MULTICAST|IPV6_ADDR_LINKLOCAL) ? 
RT6_SELECT_F_IFACE : 0; relookup: read_lock_bh(&rt6_lock); +restart_2: fn = fib6_lookup(&ip6_routing_table, &skb->nh.ipv6h->daddr, &skb->nh.ipv6h->saddr); restart: - rt = fn->leaf; - - if ((rt->rt6i_flags & RTF_CACHE)) { - rt = rt6_device_match(rt, skb->dev->ifindex, strict); - BACKTRACK(); - dst_hold(&rt->u.dst); - goto out; - } - - rt = rt6_device_match(rt, skb->dev->ifindex, strict); + rt = rt6_select(&fn->leaf, skb->dev->ifindex, strict | reachable); BACKTRACK(); + if (rt == &ip6_null_entry || + rt->rt6i_flags & RTF_CACHE) + goto out; - if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP)) { - struct rt6_info *nrt; - dst_hold(&rt->u.dst); - read_unlock_bh(&rt6_lock); + dst_hold(&rt->u.dst); + read_unlock_bh(&rt6_lock); - nrt = rt6_cow(rt, &skb->nh.ipv6h->daddr, - &skb->nh.ipv6h->saddr, - &NETLINK_CB(skb)); + if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP)) + nrt = rt6_alloc_cow(rt, &skb->nh.ipv6h->daddr, &skb->nh.ipv6h->saddr); + else { +#if CLONE_OFFLINK_ROUTE + nrt = rt6_alloc_clone(rt, &skb->nh.ipv6h->daddr); +#else + goto out2; +#endif + } - dst_release(&rt->u.dst); - rt = nrt; + dst_release(&rt->u.dst); + rt = nrt ? : &ip6_null_entry; - if (rt->u.dst.error != -EEXIST || --attempts <= 0) + dst_hold(&rt->u.dst); + if (nrt) { + err = ip6_ins_rt(nrt, NULL, NULL, &NETLINK_CB(skb)); + if (!err) goto out2; - - /* Race condition! In the gap, when rt6_lock was - released someone could insert this route. Relookup. - */ - dst_release(&rt->u.dst); - goto relookup; } - dst_hold(&rt->u.dst); + + if (--attempts <= 0) + goto out2; + + /* + * Race condition! In the gap, when rt6_lock was + * released someone could insert this route. Relookup. 
+ */ + dst_release(&rt->u.dst); + goto relookup; out: + if (reachable) { + reachable = 0; + goto restart_2; + } + dst_hold(&rt->u.dst); read_unlock_bh(&rt6_lock); out2: rt->u.dst.lastuse = jiffies; rt->u.dst.__use++; skb->dst = (struct dst_entry *) rt; + return; } struct dst_entry * ip6_route_output(struct sock *sk, struct flowi *fl) { struct fib6_node *fn; - struct rt6_info *rt; + struct rt6_info *rt, *nrt; int strict; int attempts = 3; + int err; + int reachable = RT6_SELECT_F_REACHABLE; - strict = ipv6_addr_type(&fl->fl6_dst) & (IPV6_ADDR_MULTICAST|IPV6_ADDR_LINKLOCAL); + strict = ipv6_addr_type(&fl->fl6_dst) & (IPV6_ADDR_MULTICAST|IPV6_ADDR_LINKLOCAL) ? RT6_SELECT_F_IFACE : 0; relookup: read_lock_bh(&rt6_lock); +restart_2: fn = fib6_lookup(&ip6_routing_table, &fl->fl6_dst, &fl->fl6_src); restart: - rt = fn->leaf; - - if ((rt->rt6i_flags & RTF_CACHE)) { - rt = rt6_device_match(rt, fl->oif, strict); - BACKTRACK(); - dst_hold(&rt->u.dst); + rt = rt6_select(&fn->leaf, fl->oif, strict | reachable); + BACKTRACK(); + if (rt == &ip6_null_entry || + rt->rt6i_flags & RTF_CACHE) goto out; - } - if (rt->rt6i_flags & RTF_DEFAULT) { - if (rt->rt6i_metric >= IP6_RT_PRIO_ADDRCONF) - rt = rt6_best_dflt(rt, fl->oif); - } else { - rt = rt6_device_match(rt, fl->oif, strict); - BACKTRACK(); - } - if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP)) { - struct rt6_info *nrt; - dst_hold(&rt->u.dst); - read_unlock_bh(&rt6_lock); + dst_hold(&rt->u.dst); + read_unlock_bh(&rt6_lock); - nrt = rt6_cow(rt, &fl->fl6_dst, &fl->fl6_src, NULL); + if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP)) + nrt = rt6_alloc_cow(rt, &fl->fl6_dst, &fl->fl6_src); + else { +#if CLONE_OFFLINK_ROUTE + nrt = rt6_alloc_clone(rt, &fl->fl6_dst); +#else + goto out2; +#endif + } - dst_release(&rt->u.dst); - rt = nrt; + dst_release(&rt->u.dst); + rt = nrt ? 
: &ip6_null_entry; - if (rt->u.dst.error != -EEXIST || --attempts <= 0) + dst_hold(&rt->u.dst); + if (nrt) { + err = ip6_ins_rt(nrt, NULL, NULL, NULL); + if (!err) goto out2; - - /* Race condition! In the gap, when rt6_lock was - released someone could insert this route. Relookup. - */ - dst_release(&rt->u.dst); - goto relookup; } - dst_hold(&rt->u.dst); + + if (--attempts <= 0) + goto out2; + + /* + * Race condition! In the gap, when rt6_lock was + * released someone could insert this route. Relookup. + */ + dst_release(&rt->u.dst); + goto relookup; out: + if (reachable) { + reachable = 0; + goto restart_2; + } + dst_hold(&rt->u.dst); read_unlock_bh(&rt6_lock); out2: rt->u.dst.lastuse = jiffies; @@ -999,8 +1095,6 @@ int ip6_del_rt(struct rt6_info *rt, stru write_lock_bh(&rt6_lock); - rt6_reset_dflt_pointer(NULL); - err = fib6_del(rt, nlh, _rtattr, req); dst_release(&rt->u.dst); @@ -1050,59 +1144,63 @@ static int ip6_route_del(struct in6_rtms void rt6_redirect(struct in6_addr *dest, struct in6_addr *saddr, struct neighbour *neigh, u8 *lladdr, int on_link) { - struct rt6_info *rt, *nrt; - - /* Locate old route to this destination. */ - rt = rt6_lookup(dest, NULL, neigh->dev->ifindex, 1); - - if (rt == NULL) - return; - - if (neigh->dev != rt->rt6i_dev) - goto out; + struct rt6_info *rt, *nrt = NULL; + int strict; + struct fib6_node *fn; /* - * Current route is on-link; redirect is always invalid. - * - * Seems, previous statement is not true. It could - * be node, which looks for us as on-link (f.e. proxy ndisc) - * But then router serving it might decide, that we should - * know truth 8)8) --ANK (980726). + * Get the "current" route for this destination and + * check if the redirect has come from approriate router. + * + * RFC 2461 specifies that redirects should only be + * accepted if they come from the nexthop to the target. + * Due to the way the routes are chosen, this notion + * is a bit fuzzy and one might need to check all possible + * routes. 
*/ - if (!(rt->rt6i_flags&RTF_GATEWAY)) - goto out; + strict = ipv6_addr_type(dest) & (IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL); - /* - * RFC 2461 specifies that redirects should only be - * accepted if they come from the nexthop to the target. - * Due to the way default routers are chosen, this notion - * is a bit fuzzy and one might need to check all default - * routers. - */ - if (!ipv6_addr_equal(saddr, &rt->rt6i_gateway)) { - if (rt->rt6i_flags & RTF_DEFAULT) { - struct rt6_info *rt1; - - read_lock(&rt6_lock); - for (rt1 = ip6_routing_table.leaf; rt1; rt1 = rt1->u.next) { - if (ipv6_addr_equal(saddr, &rt1->rt6i_gateway)) { - dst_hold(&rt1->u.dst); - dst_release(&rt->u.dst); - read_unlock(&rt6_lock); - rt = rt1; - goto source_ok; - } - } - read_unlock(&rt6_lock); + read_lock_bh(&rt6_lock); + fn = fib6_lookup(&ip6_routing_table, dest, NULL); +restart: + for (rt = fn->leaf; rt; rt = rt->u.next) { + /* + * Current route is on-link; redirect is always invalid. + * + * Seems, previous statement is not true. It could + * be node, which looks for us as on-link (f.e. proxy ndisc) + * But then router serving it might decide, that we should + * know truth 8)8) --ANK (980726). + */ + if (rt6_check_expired(rt)) + continue; + if (!(rt->rt6i_flags & RTF_GATEWAY)) + continue; + if (neigh->dev != rt->rt6i_dev) + continue; + if (!ipv6_addr_equal(saddr, &rt->rt6i_gateway)) + continue; + break; + } + if (rt) + dst_hold(&rt->u.dst); + else if (strict) { + while ((fn = fn->parent) != NULL) { + if (fn->fn_flags & RTN_ROOT) + break; + if (fn->fn_flags & RTN_RTINFO) + goto restart; } + } + read_unlock_bh(&rt6_lock); + + if (!rt) { if (net_ratelimit()) printk(KERN_DEBUG "rt6_redirect: source isn't a valid nexthop " "for redirect target\n"); - goto out; + return; } -source_ok: - /* * We have finally decided to accept it. */ @@ -1210,38 +1308,27 @@ void rt6_pmtu_discovery(struct in6_addr 1. It is connected route. Action: COW 2. It is gatewayed route or NONEXTHOP route. 
Action: clone it. */ - if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP)) { - nrt = rt6_cow(rt, daddr, saddr, NULL); - if (!nrt->u.dst.error) { - nrt->u.dst.metrics[RTAX_MTU-1] = pmtu; - if (allfrag) - nrt->u.dst.metrics[RTAX_FEATURES-1] |= RTAX_FEATURE_ALLFRAG; - /* According to RFC 1981, detecting PMTU increase shouldn't be - happened within 5 mins, the recommended timer is 10 mins. - Here this route expiration time is set to ip6_rt_mtu_expires - which is 10 mins. After 10 mins the decreased pmtu is expired - and detecting PMTU increase will be automatically happened. - */ - dst_set_expires(&nrt->u.dst, ip6_rt_mtu_expires); - nrt->rt6i_flags |= RTF_DYNAMIC|RTF_EXPIRES; - } - dst_release(&nrt->u.dst); - } else { - nrt = ip6_rt_copy(rt); - if (nrt == NULL) - goto out; - ipv6_addr_copy(&nrt->rt6i_dst.addr, daddr); - nrt->rt6i_dst.plen = 128; - nrt->u.dst.flags |= DST_HOST; - nrt->rt6i_nexthop = neigh_clone(rt->rt6i_nexthop); - dst_set_expires(&nrt->u.dst, ip6_rt_mtu_expires); - nrt->rt6i_flags |= RTF_DYNAMIC|RTF_CACHE|RTF_EXPIRES; + if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP)) + nrt = rt6_alloc_cow(rt, daddr, saddr); + else + nrt = rt6_alloc_clone(rt, daddr); + + if (nrt) { nrt->u.dst.metrics[RTAX_MTU-1] = pmtu; if (allfrag) nrt->u.dst.metrics[RTAX_FEATURES-1] |= RTAX_FEATURE_ALLFRAG; + + /* According to RFC 1981, detecting PMTU increase shouldn't be + * happened within 5 mins, the recommended timer is 10 mins. + * Here this route expiration time is set to ip6_rt_mtu_expires + * which is 10 mins. After 10 mins the decreased pmtu is expired + * and detecting PMTU increase will be automatically happened. 
+ */ + dst_set_expires(&nrt->u.dst, ip6_rt_mtu_expires); + nrt->rt6i_flags |= RTF_DYNAMIC|RTF_EXPIRES; + ip6_ins_rt(nrt, NULL, NULL, NULL); } - out: dst_release(&rt->u.dst); } @@ -1280,6 +1367,57 @@ static struct rt6_info * ip6_rt_copy(str return rt; } +#ifdef CONFIG_IPV6_ROUTE_INFO +static struct rt6_info *rt6_get_route_info(struct in6_addr *prefix, int prefixlen, + struct in6_addr *gwaddr, int ifindex) +{ + struct fib6_node *fn; + struct rt6_info *rt = NULL; + + write_lock_bh(&rt6_lock); + fn = fib6_locate(&ip6_routing_table, prefix ,prefixlen, NULL, 0); + if (!fn) + goto out; + + for (rt = fn->leaf; rt; rt = rt->u.next) { + if (rt->rt6i_dev->ifindex != ifindex) + continue; + if ((rt->rt6i_flags & (RTF_ROUTEINFO|RTF_GATEWAY)) != (RTF_ROUTEINFO|RTF_GATEWAY)) + continue; + if (!ipv6_addr_equal(&rt->rt6i_gateway, gwaddr)) + continue; + dst_hold(&rt->u.dst); + break; + } +out: + write_unlock_bh(&rt6_lock); + return rt; +} + +static struct rt6_info *rt6_add_route_info(struct in6_addr *prefix, int prefixlen, + struct in6_addr *gwaddr, int ifindex, + unsigned pref) +{ + struct in6_rtmsg rtmsg; + + memset(&rtmsg, 0, sizeof(rtmsg)); + rtmsg.rtmsg_type = RTMSG_NEWROUTE; + ipv6_addr_copy(&rtmsg.rtmsg_dst, prefix); + rtmsg.rtmsg_dst_len = prefixlen; + ipv6_addr_copy(&rtmsg.rtmsg_gateway, gwaddr); + rtmsg.rtmsg_metric = 1024; + rtmsg.rtmsg_flags = RTF_GATEWAY | RTF_ADDRCONF | RTF_ROUTEINFO | RTF_UP | RTF_PREF(pref); + /* We should treat it as a default route if prefix length is 0. 
*/ + if (!prefixlen) + rtmsg.rtmsg_flags |= RTF_DEFAULT; + rtmsg.rtmsg_ifindex = ifindex; + + ip6_route_add(&rtmsg, NULL, NULL, NULL); + + return rt6_get_route_info(prefix, prefixlen, gwaddr, ifindex); +} +#endif + struct rt6_info *rt6_get_dflt_router(struct in6_addr *addr, struct net_device *dev) { struct rt6_info *rt; @@ -1290,6 +1428,7 @@ struct rt6_info *rt6_get_dflt_router(str write_lock_bh(&rt6_lock); for (rt = fn->leaf; rt; rt=rt->u.next) { if (dev == rt->rt6i_dev && + ((rt->rt6i_flags & (RTF_ADDRCONF | RTF_DEFAULT)) == (RTF_ADDRCONF | RTF_DEFAULT)) && ipv6_addr_equal(&rt->rt6i_gateway, addr)) break; } @@ -1300,7 +1439,8 @@ struct rt6_info *rt6_get_dflt_router(str } struct rt6_info *rt6_add_dflt_router(struct in6_addr *gwaddr, - struct net_device *dev) + struct net_device *dev, + unsigned int pref) { struct in6_rtmsg rtmsg; @@ -1308,7 +1448,8 @@ struct rt6_info *rt6_add_dflt_router(str rtmsg.rtmsg_type = RTMSG_NEWROUTE; ipv6_addr_copy(&rtmsg.rtmsg_gateway, gwaddr); rtmsg.rtmsg_metric = 1024; - rtmsg.rtmsg_flags = RTF_GATEWAY | RTF_ADDRCONF | RTF_DEFAULT | RTF_UP | RTF_EXPIRES; + rtmsg.rtmsg_flags = RTF_GATEWAY | RTF_ADDRCONF | RTF_DEFAULT | RTF_UP | RTF_EXPIRES | + RTF_PREF(pref); rtmsg.rtmsg_ifindex = dev->ifindex; @@ -1326,8 +1467,6 @@ restart: if (rt->rt6i_flags & (RTF_DEFAULT | RTF_ADDRCONF)) { dst_hold(&rt->u.dst); - rt6_reset_dflt_pointer(NULL); - read_unlock_bh(&rt6_lock); ip6_del_rt(rt, NULL, NULL, NULL); @@ -1738,11 +1877,10 @@ int inet6_dump_fib(struct sk_buff *skb, /* * 2. allocate and initialize walker. 
*/ - w = kmalloc(sizeof(*w), GFP_ATOMIC); + w = kzalloc(sizeof(*w), GFP_ATOMIC); if (w == NULL) return -ENOMEM; RT6_TRACE("dump<%p", w); - memset(w, 0, sizeof(*w)); w->root = &ip6_routing_table; w->func = fib6_dump_node; w->args = &arg; diff -puN net/ipv6/tcp_ipv6.c~git-net net/ipv6/tcp_ipv6.c --- devel/net/ipv6/tcp_ipv6.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/tcp_ipv6.c 2006-03-17 23:03:48.000000000 -0800 @@ -987,6 +987,7 @@ static struct sock * tcp_v6_syn_recv_soc inet_csk(newsk)->icsk_ext_hdr_len = (newnp->opt->opt_nflen + newnp->opt->opt_flen); + tcp_mtup_init(newsk); tcp_sync_mss(newsk, dst_mtu(dst)); newtp->advmss = dst_metric(dst, RTAX_ADVMSS); tcp_initialize_rcv_mss(newsk); @@ -1297,18 +1298,21 @@ static int tcp_v6_remember_stamp(struct } static struct inet_connection_sock_af_ops ipv6_specific = { - .queue_xmit = inet6_csk_xmit, - .send_check = tcp_v6_send_check, - .rebuild_header = inet6_sk_rebuild_header, - .conn_request = tcp_v6_conn_request, - .syn_recv_sock = tcp_v6_syn_recv_sock, - .remember_stamp = tcp_v6_remember_stamp, - .net_header_len = sizeof(struct ipv6hdr), - - .setsockopt = ipv6_setsockopt, - .getsockopt = ipv6_getsockopt, - .addr2sockaddr = inet6_csk_addr2sockaddr, - .sockaddr_len = sizeof(struct sockaddr_in6) + .queue_xmit = inet6_csk_xmit, + .send_check = tcp_v6_send_check, + .rebuild_header = inet6_sk_rebuild_header, + .conn_request = tcp_v6_conn_request, + .syn_recv_sock = tcp_v6_syn_recv_sock, + .remember_stamp = tcp_v6_remember_stamp, + .net_header_len = sizeof(struct ipv6hdr), + .setsockopt = ipv6_setsockopt, + .getsockopt = ipv6_getsockopt, + .addr2sockaddr = inet6_csk_addr2sockaddr, + .sockaddr_len = sizeof(struct sockaddr_in6), +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ipv6_setsockopt, + .compat_getsockopt = compat_ipv6_getsockopt, +#endif }; /* @@ -1316,22 +1320,23 @@ static struct inet_connection_sock_af_op */ static struct inet_connection_sock_af_ops ipv6_mapped = { - .queue_xmit = 
ip_queue_xmit, - .send_check = tcp_v4_send_check, - .rebuild_header = inet_sk_rebuild_header, - .conn_request = tcp_v6_conn_request, - .syn_recv_sock = tcp_v6_syn_recv_sock, - .remember_stamp = tcp_v4_remember_stamp, - .net_header_len = sizeof(struct iphdr), - - .setsockopt = ipv6_setsockopt, - .getsockopt = ipv6_getsockopt, - .addr2sockaddr = inet6_csk_addr2sockaddr, - .sockaddr_len = sizeof(struct sockaddr_in6) + .queue_xmit = ip_queue_xmit, + .send_check = tcp_v4_send_check, + .rebuild_header = inet_sk_rebuild_header, + .conn_request = tcp_v6_conn_request, + .syn_recv_sock = tcp_v6_syn_recv_sock, + .remember_stamp = tcp_v4_remember_stamp, + .net_header_len = sizeof(struct iphdr), + .setsockopt = ipv6_setsockopt, + .getsockopt = ipv6_getsockopt, + .addr2sockaddr = inet6_csk_addr2sockaddr, + .sockaddr_len = sizeof(struct sockaddr_in6), +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ipv6_setsockopt, + .compat_getsockopt = compat_ipv6_getsockopt, +#endif }; - - /* NOTE: A lot of things set to zero explicitly by call to * sk_alloc() so need not be done here. 
*/ @@ -1583,6 +1588,10 @@ struct proto tcpv6_prot = { .obj_size = sizeof(struct tcp6_sock), .twsk_prot = &tcp6_timewait_sock_ops, .rsk_prot = &tcp6_request_sock_ops, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_tcp_setsockopt, + .compat_getsockopt = compat_tcp_getsockopt, +#endif }; static struct inet6_protocol tcpv6_protocol = { @@ -1604,21 +1613,12 @@ static struct inet_protosw tcpv6_protosw void __init tcpv6_init(void) { - int err; - /* register inet6 protocol */ if (inet6_add_protocol(&tcpv6_protocol, IPPROTO_TCP) < 0) printk(KERN_ERR "tcpv6_init: Could not register protocol\n"); inet6_register_protosw(&tcpv6_protosw); - err = sock_create_kern(PF_INET6, SOCK_RAW, IPPROTO_TCP, &tcp6_socket); - if (err < 0) + if (inet_csk_ctl_sock_create(&tcp6_socket, PF_INET6, SOCK_RAW, + IPPROTO_TCP) < 0) panic("Failed to create the TCPv6 control socket.\n"); - tcp6_socket->sk->sk_allocation = GFP_ATOMIC; - - /* Unhash it so that IP input processing does not even - * see it, we do not wish this socket to see incoming - * packets. 
- */ - tcp6_socket->sk->sk_prot->unhash(tcp6_socket->sk); } diff -puN net/ipv6/udp.c~git-net net/ipv6/udp.c --- devel/net/ipv6/udp.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/ipv6/udp.c 2006-03-17 23:03:48.000000000 -0800 @@ -880,16 +880,13 @@ static int udpv6_destroy_sock(struct soc /* * Socket option code for UDP */ -static int udpv6_setsockopt(struct sock *sk, int level, int optname, +static int do_udpv6_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { struct udp_sock *up = udp_sk(sk); int val; int err = 0; - if (level != SOL_UDP) - return ipv6_setsockopt(sk, level, optname, optval, optlen); - if(optlen #include #include +#include #ifdef CONFIG_IPV6_XFRM6_TUNNEL_DEBUG # define X6TDEBUG 3 @@ -357,19 +358,19 @@ static int xfrm6_tunnel_input(struct xfr } static struct xfrm6_tunnel *xfrm6_tunnel_handler; -static DECLARE_MUTEX(xfrm6_tunnel_sem); +static DEFINE_MUTEX(xfrm6_tunnel_mutex); int xfrm6_tunnel_register(struct xfrm6_tunnel *handler) { int ret; - down(&xfrm6_tunnel_sem); + mutex_lock(&xfrm6_tunnel_mutex); ret = 0; if (xfrm6_tunnel_handler != NULL) ret = -EINVAL; if (!ret) xfrm6_tunnel_handler = handler; - up(&xfrm6_tunnel_sem); + mutex_unlock(&xfrm6_tunnel_mutex); return ret; } @@ -380,13 +381,13 @@ int xfrm6_tunnel_deregister(struct xfrm6 { int ret; - down(&xfrm6_tunnel_sem); + mutex_lock(&xfrm6_tunnel_mutex); ret = 0; if (xfrm6_tunnel_handler != handler) ret = -EINVAL; if (!ret) xfrm6_tunnel_handler = NULL; - up(&xfrm6_tunnel_sem); + mutex_unlock(&xfrm6_tunnel_mutex); synchronize_net(); diff -puN net/key/af_key.c~git-net net/key/af_key.c --- devel/net/key/af_key.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/key/af_key.c 2006-03-17 23:03:48.000000000 -0800 @@ -2651,6 +2651,8 @@ static int pfkey_send_notify(struct xfrm return key_notify_sa(x, c); case XFRM_MSG_FLUSHSA: return key_notify_sa_flush(c); + case XFRM_MSG_NEWAE: /* not yet supported */ + break; default: printk("pfkey: 
Unknown SA event %d\n", c->event); break; @@ -3078,9 +3080,9 @@ static int pfkey_sendmsg(struct kiocb *k if (!hdr) goto out; - down(&xfrm_cfg_sem); + mutex_lock(&xfrm_cfg_mutex); err = pfkey_process(sk, skb, hdr); - up(&xfrm_cfg_sem); + mutex_unlock(&xfrm_cfg_mutex); out: if (err && hdr && pfkey_error(hdr, err, sk) == 0) diff -puN net/llc/af_llc.c~git-net net/llc/af_llc.c --- devel/net/llc/af_llc.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/llc/af_llc.c 2006-03-17 23:03:48.000000000 -0800 @@ -54,7 +54,7 @@ static int llc_ui_wait_for_busy_core(str * * Return the next unused link number for a given sap. */ -static __inline__ u16 llc_ui_next_link_no(int sap) +static inline u16 llc_ui_next_link_no(int sap) { return llc_ui_sap_link_no_max[sap]++; } @@ -65,7 +65,7 @@ static __inline__ u16 llc_ui_next_link_n * * Given an ARP header type return the corresponding ethernet protocol. */ -static __inline__ u16 llc_proto_type(u16 arphrd) +static inline u16 llc_proto_type(u16 arphrd) { return arphrd == ARPHRD_IEEE802_TR ? htons(ETH_P_TR_802_2) : htons(ETH_P_802_2); @@ -75,7 +75,7 @@ static __inline__ u16 llc_proto_type(u16 * llc_ui_addr_null - determines if a address structure is null * @addr: Address to test if null. */ -static __inline__ u8 llc_ui_addr_null(struct sockaddr_llc *addr) +static inline u8 llc_ui_addr_null(struct sockaddr_llc *addr) { return !memcmp(addr, &llc_ui_addrnull, sizeof(*addr)); } @@ -89,8 +89,7 @@ static __inline__ u8 llc_ui_addr_null(st * operation the user would like to perform and the type of socket. * Returns the correct llc header length. 
*/ -static __inline__ u8 llc_ui_header_len(struct sock *sk, - struct sockaddr_llc *addr) +static inline u8 llc_ui_header_len(struct sock *sk, struct sockaddr_llc *addr) { u8 rc = LLC_PDU_LEN_U; @@ -138,7 +137,7 @@ static void llc_ui_sk_init(struct socket } static struct proto llc_proto = { - .name = "DDP", + .name = "LLC", .owner = THIS_MODULE, .obj_size = sizeof(struct llc_sock), }; @@ -188,8 +187,10 @@ static int llc_ui_release(struct socket llc->laddr.lsap, llc->daddr.lsap); if (!llc_send_disc(sk)) llc_ui_wait_for_disc(sk, sk->sk_rcvtimeo); - if (!sock_flag(sk, SOCK_ZAPPED)) + if (!sock_flag(sk, SOCK_ZAPPED)) { + llc_sap_put(llc->sap); llc_sap_remove_socket(llc->sap, sk); + } release_sock(sk); if (llc->dev) dev_put(llc->dev); diff -puN net/llc/llc_c_ac.c~git-net net/llc/llc_c_ac.c --- devel/net/llc/llc_c_ac.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/llc/llc_c_ac.c 2006-03-17 23:03:48.000000000 -0800 @@ -27,7 +27,6 @@ #include #include -#include "llc_output.h" static int llc_conn_ac_inc_vs_by_1(struct sock *sk, struct sk_buff *skb); static void llc_process_tmr_ev(struct sock *sk, struct sk_buff *skb); diff -puN net/llc/llc_core.c~git-net net/llc/llc_core.c --- devel/net/llc/llc_core.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/llc/llc_core.c 2006-03-17 23:03:48.000000000 -0800 @@ -127,7 +127,6 @@ struct llc_sap *llc_sap_open(unsigned ch goto out; sap->laddr.lsap = lsap; sap->rcv_func = func; - llc_sap_hold(sap); llc_add_sap(sap); out: write_unlock_bh(&llc_sap_list_lock); diff -puN net/llc/llc_output.c~git-net net/llc/llc_output.c --- devel/net/llc/llc_output.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/llc/llc_output.c 2006-03-17 23:03:48.000000000 -0800 @@ -30,7 +30,8 @@ * Fills MAC header fields, depending on MAC type. Returns 0, If MAC type * is a valid type and initialization completes correctly 1, otherwise. 
*/ -int llc_mac_hdr_init(struct sk_buff *skb, unsigned char *sa, unsigned char *da) +int llc_mac_hdr_init(struct sk_buff *skb, + const unsigned char *sa, const unsigned char *da) { int rc = 0; diff -L net/llc/llc_output.h -puN net/llc/llc_output.h~git-net /dev/null --- devel/net/llc/llc_output.h +++ /dev/null 2003-09-15 06:40:47.000000000 -0700 @@ -1,20 +0,0 @@ -#ifndef LLC_OUTPUT_H -#define LLC_OUTPUT_H -/* - * Copyright (c) 1997 by Procom Technology, Inc. - * 2001-2003 by Arnaldo Carvalho de Melo - * - * This program can be redistributed or modified under the terms of the - * GNU General Public License version 2 as published by the Free Software - * Foundation. - * This program is distributed without any warranty or implied warranty - * of merchantability or fitness for a particular purpose. - * - * See the GNU General Public License version 2 for more details. - */ - -struct sk_buff; - -int llc_mac_hdr_init(struct sk_buff *skb, unsigned char *sa, unsigned char *da); - -#endif /* LLC_OUTPUT_H */ diff -puN net/llc/llc_s_ac.c~git-net net/llc/llc_s_ac.c --- devel/net/llc/llc_s_ac.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/llc/llc_s_ac.c 2006-03-17 23:03:48.000000000 -0800 @@ -24,7 +24,7 @@ #include #include #include -#include "llc_output.h" + /** * llc_sap_action_unit_data_ind - forward UI PDU to network layer diff -puN net/netfilter/Kconfig~git-net net/netfilter/Kconfig --- devel/net/netfilter/Kconfig~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/Kconfig 2006-03-17 23:03:48.000000000 -0800 @@ -279,6 +279,16 @@ config NETFILTER_XT_MATCH_MARK To compile it as a module, choose M here. If unsure, say N. +config NETFILTER_XT_MATCH_POLICY + tristate 'IPsec "policy" match support' + depends on NETFILTER_XTABLES && XFRM + help + Policy matching allows you to match packets based on the + IPsec policy that was used during decapsulation/will + be used during encapsulation. + + To compile it as a module, choose M here. 
If unsure, say N. + config NETFILTER_XT_MATCH_PHYSDEV tristate '"physdev" match support' depends on NETFILTER_XTABLES && BRIDGE_NETFILTER diff -puN net/netfilter/Makefile~git-net net/netfilter/Makefile --- devel/net/netfilter/Makefile~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/Makefile 2006-03-17 23:03:48.000000000 -0800 @@ -40,6 +40,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_LENGTH) obj-$(CONFIG_NETFILTER_XT_MATCH_LIMIT) += xt_limit.o obj-$(CONFIG_NETFILTER_XT_MATCH_MAC) += xt_mac.o obj-$(CONFIG_NETFILTER_XT_MATCH_MARK) += xt_mark.o +obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o obj-$(CONFIG_NETFILTER_XT_MATCH_SCTP) += xt_sctp.o diff -puN net/netfilter/nf_conntrack_core.c~git-net net/netfilter/nf_conntrack_core.c --- devel/net/netfilter/nf_conntrack_core.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/nf_conntrack_core.c 2006-03-17 23:03:48.000000000 -0800 @@ -3,7 +3,7 @@ extension. */ /* (C) 1999-2001 Paul `Rusty' Russell - * (C) 2002-2005 Netfilter Core Team + * (C) 2002-2006 Netfilter Core Team * (C) 2003,2004 USAGI/WIDE Project * * This program is free software; you can redistribute it and/or modify @@ -20,6 +20,9 @@ * - generalize L3 protocol denendent part. * 23 Mar 2004: Yasuyuki Kozakai @USAGI * - add support various size of conntrack structures. 
+ * 26 Jan 2006: Harald Welte + * - restructure nf_conn (introduce nf_conn_help) + * - redesign 'features' how they were originally intended * * Derived from net/ipv4/netfilter/ip_conntrack_core.c */ @@ -55,7 +58,7 @@ #include #include -#define NF_CONNTRACK_VERSION "0.4.1" +#define NF_CONNTRACK_VERSION "0.5.0" #if 0 #define DEBUGP printk @@ -182,7 +185,7 @@ static struct { DEFINE_RWLOCK(nf_ct_cache_lock); /* This avoids calling kmem_cache_create() with same name simultaneously */ -DECLARE_MUTEX(nf_ct_cache_mutex); +static DEFINE_MUTEX(nf_ct_cache_mutex); extern struct nf_conntrack_protocol nf_conntrack_generic_protocol; struct nf_conntrack_protocol * @@ -259,21 +262,8 @@ static inline u_int32_t hash_conntrack(c nf_conntrack_hash_rnd); } -/* Initialize "struct nf_conn" which has spaces for helper */ -static int -init_conntrack_for_helper(struct nf_conn *conntrack, u_int32_t features) -{ - - conntrack->help = (union nf_conntrack_help *) - (((unsigned long)conntrack->data - + (__alignof__(union nf_conntrack_help) - 1)) - & (~((unsigned long)(__alignof__(union nf_conntrack_help) -1)))); - return 0; -} - int nf_conntrack_register_cache(u_int32_t features, const char *name, - size_t size, - int (*init)(struct nf_conn *, u_int32_t)) + size_t size) { int ret = 0; char *cache_name; @@ -288,7 +278,7 @@ int nf_conntrack_register_cache(u_int32_ return -EINVAL; } - down(&nf_ct_cache_mutex); + mutex_lock(&nf_ct_cache_mutex); write_lock_bh(&nf_ct_cache_lock); /* e.g: multiple helpers are loaded */ @@ -296,8 +286,7 @@ int nf_conntrack_register_cache(u_int32_ DEBUGP("nf_conntrack_register_cache: already resisterd.\n"); if ((!strncmp(nf_ct_cache[features].name, name, NF_CT_FEATURES_NAMELEN)) - && nf_ct_cache[features].size == size - && nf_ct_cache[features].init_conntrack == init) { + && nf_ct_cache[features].size == size) { DEBUGP("nf_conntrack_register_cache: reusing.\n"); nf_ct_cache[features].use++; ret = 0; @@ -305,7 +294,7 @@ int nf_conntrack_register_cache(u_int32_ ret = 
-EBUSY; write_unlock_bh(&nf_ct_cache_lock); - up(&nf_ct_cache_mutex); + mutex_unlock(&nf_ct_cache_mutex); return ret; } write_unlock_bh(&nf_ct_cache_lock); @@ -340,7 +329,6 @@ int nf_conntrack_register_cache(u_int32_ write_lock_bh(&nf_ct_cache_lock); nf_ct_cache[features].use = 1; nf_ct_cache[features].size = size; - nf_ct_cache[features].init_conntrack = init; nf_ct_cache[features].cachep = cachep; nf_ct_cache[features].name = cache_name; write_unlock_bh(&nf_ct_cache_lock); @@ -350,7 +338,7 @@ int nf_conntrack_register_cache(u_int32_ out_free_name: kfree(cache_name); out_up_mutex: - up(&nf_ct_cache_mutex); + mutex_unlock(&nf_ct_cache_mutex); return ret; } @@ -365,19 +353,18 @@ void nf_conntrack_unregister_cache(u_int * slab cache. */ DEBUGP("nf_conntrack_unregister_cache: 0x%04x\n", features); - down(&nf_ct_cache_mutex); + mutex_lock(&nf_ct_cache_mutex); write_lock_bh(&nf_ct_cache_lock); if (--nf_ct_cache[features].use > 0) { write_unlock_bh(&nf_ct_cache_lock); - up(&nf_ct_cache_mutex); + mutex_unlock(&nf_ct_cache_mutex); return; } cachep = nf_ct_cache[features].cachep; name = nf_ct_cache[features].name; nf_ct_cache[features].cachep = NULL; nf_ct_cache[features].name = NULL; - nf_ct_cache[features].init_conntrack = NULL; nf_ct_cache[features].size = 0; write_unlock_bh(&nf_ct_cache_lock); @@ -386,7 +373,7 @@ void nf_conntrack_unregister_cache(u_int kmem_cache_destroy(cachep); kfree(name); - up(&nf_ct_cache_mutex); + mutex_unlock(&nf_ct_cache_mutex); } int @@ -432,11 +419,15 @@ nf_ct_invert_tuple(struct nf_conntrack_t /* nf_conntrack_expect helper functions */ void nf_ct_unlink_expect(struct nf_conntrack_expect *exp) { + struct nf_conn_help *master_help = nfct_help(exp->master); + + NF_CT_ASSERT(master_help); ASSERT_WRITE_LOCK(&nf_conntrack_lock); NF_CT_ASSERT(!timer_pending(&exp->timeout)); + list_del(&exp->list); NF_CT_STAT_INC(expect_delete); - exp->master->expecting--; + master_help->expecting--; nf_conntrack_expect_put(exp); } @@ -508,9 +499,10 @@ 
find_expectation(const struct nf_conntra void nf_ct_remove_expectations(struct nf_conn *ct) { struct nf_conntrack_expect *i, *tmp; + struct nf_conn_help *help = nfct_help(ct); /* Optimization: most connection never expect any others. */ - if (ct->expecting == 0) + if (!help || help->expecting == 0) return; list_for_each_entry_safe(i, tmp, &nf_conntrack_expect_list, list) { @@ -713,6 +705,7 @@ __nf_conntrack_confirm(struct sk_buff ** conntrack_tuple_cmp, struct nf_conntrack_tuple_hash *, &ct->tuplehash[IP_CT_DIR_REPLY].tuple, NULL)) { + struct nf_conn_help *help; /* Remove from unconfirmed list */ list_del(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list); @@ -726,7 +719,8 @@ __nf_conntrack_confirm(struct sk_buff ** set_bit(IPS_CONFIRMED_BIT, &ct->status); NF_CT_STAT_INC(insert); write_unlock_bh(&nf_conntrack_lock); - if (ct->helper) + help = nfct_help(ct); + if (help && help->helper) nf_conntrack_event_cache(IPCT_HELPER, *pskb); #ifdef CONFIG_NF_NAT_NEEDED if (test_bit(IPS_SRC_NAT_DONE_BIT, &ct->status) || @@ -842,8 +836,9 @@ __nf_conntrack_alloc(const struct nf_con { struct nf_conn *conntrack = NULL; u_int32_t features = 0; + struct nf_conntrack_helper *helper; - if (!nf_conntrack_hash_rnd_initted) { + if (unlikely(!nf_conntrack_hash_rnd_initted)) { get_random_bytes(&nf_conntrack_hash_rnd, 4); nf_conntrack_hash_rnd_initted = 1; } @@ -863,8 +858,11 @@ __nf_conntrack_alloc(const struct nf_con /* find features needed by this conntrack. 
*/ features = l3proto->get_features(orig); + + /* FIXME: protect helper list per RCU */ read_lock_bh(&nf_conntrack_lock); - if (__nf_ct_helper_find(repl) != NULL) + helper = __nf_ct_helper_find(repl); + if (helper) features |= NF_CT_F_HELP; read_unlock_bh(&nf_conntrack_lock); @@ -872,7 +870,7 @@ __nf_conntrack_alloc(const struct nf_con read_lock_bh(&nf_ct_cache_lock); - if (!nf_ct_cache[features].use) { + if (unlikely(!nf_ct_cache[features].use)) { DEBUGP("nf_conntrack_alloc: not supported features = 0x%x\n", features); goto out; @@ -886,12 +884,10 @@ __nf_conntrack_alloc(const struct nf_con memset(conntrack, 0, nf_ct_cache[features].size); conntrack->features = features; - if (nf_ct_cache[features].init_conntrack && - nf_ct_cache[features].init_conntrack(conntrack, features) < 0) { - DEBUGP("nf_conntrack_alloc: failed to init\n"); - kmem_cache_free(nf_ct_cache[features].cachep, conntrack); - conntrack = NULL; - goto out; + if (helper) { + struct nf_conn_help *help = nfct_help(conntrack); + NF_CT_ASSERT(help); + help->helper = helper; } atomic_set(&conntrack->ct_general.use, 1); @@ -972,11 +968,8 @@ init_conntrack(const struct nf_conntrack #endif nf_conntrack_get(&conntrack->master->ct_general); NF_CT_STAT_INC(expect_new); - } else { - conntrack->helper = __nf_ct_helper_find(&repl_tuple); - + } else NF_CT_STAT_INC(new); - } /* Overload tuple linked list to put us in unconfirmed list. 
*/ list_add(&conntrack->tuplehash[IP_CT_DIR_ORIGINAL].list, &unconfirmed); @@ -1206,14 +1199,16 @@ void nf_conntrack_expect_put(struct nf_c static void nf_conntrack_expect_insert(struct nf_conntrack_expect *exp) { + struct nf_conn_help *master_help = nfct_help(exp->master); + atomic_inc(&exp->use); - exp->master->expecting++; + master_help->expecting++; list_add(&exp->list, &nf_conntrack_expect_list); init_timer(&exp->timeout); exp->timeout.data = (unsigned long)exp; exp->timeout.function = expectation_timed_out; - exp->timeout.expires = jiffies + exp->master->helper->timeout * HZ; + exp->timeout.expires = jiffies + master_help->helper->timeout * HZ; add_timer(&exp->timeout); exp->id = ++nf_conntrack_expect_next_id; @@ -1239,10 +1234,12 @@ static void evict_oldest_expect(struct n static inline int refresh_timer(struct nf_conntrack_expect *i) { + struct nf_conn_help *master_help = nfct_help(i->master); + if (!del_timer(&i->timeout)) return 0; - i->timeout.expires = jiffies + i->master->helper->timeout*HZ; + i->timeout.expires = jiffies + master_help->helper->timeout*HZ; add_timer(&i->timeout); return 1; } @@ -1251,8 +1248,11 @@ int nf_conntrack_expect_related(struct n { struct nf_conntrack_expect *i; struct nf_conn *master = expect->master; + struct nf_conn_help *master_help = nfct_help(master); int ret; + NF_CT_ASSERT(master_help); + DEBUGP("nf_conntrack_expect_related %p\n", related_to); DEBUGP("tuple: "); NF_CT_DUMP_TUPLE(&expect->tuple); DEBUGP("mask: "); NF_CT_DUMP_TUPLE(&expect->mask); @@ -1271,8 +1271,8 @@ int nf_conntrack_expect_related(struct n } } /* Will be over limit? */ - if (master->helper->max_expected && - master->expecting >= master->helper->max_expected) + if (master_help->helper->max_expected && + master_help->expecting >= master_help->helper->max_expected) evict_oldest_expect(master); nf_conntrack_expect_insert(expect); @@ -1283,24 +1283,6 @@ out: return ret; } -/* Alter reply tuple (maybe alter helper). 
This is for NAT, and is - implicitly racy: see __nf_conntrack_confirm */ -void nf_conntrack_alter_reply(struct nf_conn *conntrack, - const struct nf_conntrack_tuple *newreply) -{ - write_lock_bh(&nf_conntrack_lock); - /* Should be unconfirmed, so not in hash table yet */ - NF_CT_ASSERT(!nf_ct_is_confirmed(conntrack)); - - DEBUGP("Altering reply tuple of %p to ", conntrack); - NF_CT_DUMP_TUPLE(newreply); - - conntrack->tuplehash[IP_CT_DIR_REPLY].tuple = *newreply; - if (!conntrack->master && conntrack->expecting == 0) - conntrack->helper = __nf_ct_helper_find(newreply); - write_unlock_bh(&nf_conntrack_lock); -} - int nf_conntrack_helper_register(struct nf_conntrack_helper *me) { int ret; @@ -1308,9 +1290,8 @@ int nf_conntrack_helper_register(struct ret = nf_conntrack_register_cache(NF_CT_F_HELP, "nf_conntrack:help", sizeof(struct nf_conn) - + sizeof(union nf_conntrack_help) - + __alignof__(union nf_conntrack_help), - init_conntrack_for_helper); + + sizeof(struct nf_conn_help) + + __alignof__(struct nf_conn_help)); if (ret < 0) { printk(KERN_ERR "nf_conntrack_helper_reigster: Unable to create slab cache for conntracks\n"); return ret; @@ -1338,9 +1319,12 @@ __nf_conntrack_helper_find_byname(const static inline int unhelp(struct nf_conntrack_tuple_hash *i, const struct nf_conntrack_helper *me) { - if (nf_ct_tuplehash_to_ctrack(i)->helper == me) { - nf_conntrack_event(IPCT_HELPER, nf_ct_tuplehash_to_ctrack(i)); - nf_ct_tuplehash_to_ctrack(i)->helper = NULL; + struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(i); + struct nf_conn_help *help = nfct_help(ct); + + if (help && help->helper == me) { + nf_conntrack_event(IPCT_HELPER, ct); + help->helper = NULL; } return 0; } @@ -1356,7 +1340,8 @@ void nf_conntrack_helper_unregister(stru /* Get rid of expectations */ list_for_each_entry_safe(exp, tmp, &nf_conntrack_expect_list, list) { - if (exp->master->helper == me && del_timer(&exp->timeout)) { + struct nf_conn_help *help = nfct_help(exp->master); + if (help->helper == me && 
del_timer(&exp->timeout)) { nf_ct_unlink_expect(exp); nf_conntrack_expect_put(exp); } @@ -1423,6 +1408,8 @@ void __nf_ct_refresh_acct(struct nf_conn #include #include +#include + /* Generic function for tcp/udp/sctp/dccp and alike. This needs to be * in ip_conntrack_core, since we don't want the protocols to autoload @@ -1697,7 +1684,7 @@ int __init nf_conntrack_init(void) } ret = nf_conntrack_register_cache(NF_CT_F_BASIC, "nf_conntrack:basic", - sizeof(struct nf_conn), NULL); + sizeof(struct nf_conn)); if (ret < 0) { printk(KERN_ERR "Unable to create nf_conn slab cache\n"); goto err_free_hash; diff -puN net/netfilter/nf_conntrack_ftp.c~git-net net/netfilter/nf_conntrack_ftp.c --- devel/net/netfilter/nf_conntrack_ftp.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/nf_conntrack_ftp.c 2006-03-17 23:03:48.000000000 -0800 @@ -440,7 +440,7 @@ static int help(struct sk_buff **pskb, u32 seq; int dir = CTINFO2DIR(ctinfo); unsigned int matchlen, matchoff; - struct ip_ct_ftp_master *ct_ftp_info = &ct->help->ct_ftp_info; + struct ip_ct_ftp_master *ct_ftp_info = &nfct_help(ct)->help.ct_ftp_info; struct nf_conntrack_expect *exp; struct nf_conntrack_man cmd = {}; diff -puN net/netfilter/nf_conntrack_netlink.c~git-net net/netfilter/nf_conntrack_netlink.c --- devel/net/netfilter/nf_conntrack_netlink.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/nf_conntrack_netlink.c 2006-03-17 23:03:48.000000000 -0800 @@ -2,7 +2,7 @@ * protocol helpers and general trouble making from userspace. 
* * (C) 2001 by Jay Schulist - * (C) 2002-2005 by Harald Welte + * (C) 2002-2006 by Harald Welte * (C) 2003 by Patrick Mchardy * (C) 2005 by Pablo Neira Ayuso * @@ -44,7 +44,7 @@ MODULE_LICENSE("GPL"); -static char __initdata version[] = "0.92"; +static char __initdata version[] = "0.93"; #if 0 #define DEBUGP printk @@ -165,15 +165,16 @@ static inline int ctnetlink_dump_helpinfo(struct sk_buff *skb, const struct nf_conn *ct) { struct nfattr *nest_helper; + const struct nf_conn_help *help = nfct_help(ct); - if (!ct->helper) + if (!help || !help->helper) return 0; nest_helper = NFA_NEST(skb, CTA_HELP); - NFA_PUT(skb, CTA_HELP_NAME, strlen(ct->helper->name), ct->helper->name); + NFA_PUT(skb, CTA_HELP_NAME, strlen(help->helper->name), help->helper->name); - if (ct->helper->to_nfattr) - ct->helper->to_nfattr(skb, ct); + if (help->helper->to_nfattr) + help->helper->to_nfattr(skb, ct); NFA_NEST_END(skb, nest_helper); @@ -337,9 +338,10 @@ static int ctnetlink_conntrack_event(str group = NFNLGRP_CONNTRACK_UPDATE; } else return NOTIFY_DONE; - - /* FIXME: Check if there are any listeners before, don't hurt performance */ - + + if (!nfnetlink_has_listeners(group)) + return NOTIFY_DONE; + skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); if (!skb) return NOTIFY_DONE; @@ -903,11 +905,17 @@ static inline int ctnetlink_change_helper(struct nf_conn *ct, struct nfattr *cda[]) { struct nf_conntrack_helper *helper; + struct nf_conn_help *help = nfct_help(ct); char *helpname; int err; DEBUGP("entered %s\n", __FUNCTION__); + if (!help) { + /* FIXME: we need to reallocate and rehash */ + return -EBUSY; + } + /* don't change helper of sibling connections */ if (ct->master) return -EINVAL; @@ -924,18 +932,18 @@ ctnetlink_change_helper(struct nf_conn * return -EINVAL; } - if (ct->helper) { + if (help->helper) { if (!helper) { /* we had a helper before ... 
*/ nf_ct_remove_expectations(ct); - ct->helper = NULL; + help->helper = NULL; } else { /* need to zero data of old helper */ - memset(&ct->help, 0, sizeof(ct->help)); + memset(&help->help, 0, sizeof(help->help)); } } - ct->helper = helper; + help->helper = helper; return 0; } @@ -1050,14 +1058,9 @@ ctnetlink_create_conntrack(struct nfattr ct->mark = ntohl(*(u_int32_t *)NFA_DATA(cda[CTA_MARK-1])); #endif - ct->helper = nf_ct_helper_find_get(rtuple); - add_timer(&ct->timeout); nf_conntrack_hash_insert(ct); - if (ct->helper) - nf_ct_helper_put(ct->helper); - DEBUGP("conntrack with id %u inserted\n", ct->id); return 0; @@ -1417,7 +1420,8 @@ ctnetlink_del_expect(struct sock *ctnl, } list_for_each_entry_safe(exp, tmp, &nf_conntrack_expect_list, list) { - if (exp->master->helper == h + struct nf_conn_help *m_help = nfct_help(exp->master); + if (m_help->helper == h && del_timer(&exp->timeout)) { nf_ct_unlink_expect(exp); nf_conntrack_expect_put(exp); @@ -1452,6 +1456,7 @@ ctnetlink_create_expect(struct nfattr *c struct nf_conntrack_tuple_hash *h = NULL; struct nf_conntrack_expect *exp; struct nf_conn *ct; + struct nf_conn_help *help; int err = 0; DEBUGP("entered %s\n", __FUNCTION__); @@ -1472,8 +1477,9 @@ ctnetlink_create_expect(struct nfattr *c if (!h) return -ENOENT; ct = nf_ct_tuplehash_to_ctrack(h); + help = nfct_help(ct); - if (!ct->helper) { + if (!help || !help->helper) { /* such conntrack hasn't got any helper, abort */ err = -EINVAL; goto out; diff -puN net/netfilter/nf_conntrack_standalone.c~git-net net/netfilter/nf_conntrack_standalone.c --- devel/net/netfilter/nf_conntrack_standalone.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/nf_conntrack_standalone.c 2006-03-17 23:03:48.000000000 -0800 @@ -839,7 +839,6 @@ EXPORT_SYMBOL(nf_conntrack_l3proto_unreg EXPORT_SYMBOL(nf_conntrack_protocol_register); EXPORT_SYMBOL(nf_conntrack_protocol_unregister); EXPORT_SYMBOL(nf_ct_invert_tuplepr); -EXPORT_SYMBOL(nf_conntrack_alter_reply); 
EXPORT_SYMBOL(nf_conntrack_destroyed); EXPORT_SYMBOL(need_conntrack); EXPORT_SYMBOL(nf_conntrack_helper_register); diff -puN net/netfilter/nfnetlink.c~git-net net/netfilter/nfnetlink.c --- devel/net/netfilter/nfnetlink.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/nfnetlink.c 2006-03-17 23:03:48.000000000 -0800 @@ -191,6 +191,12 @@ nfnetlink_check_attributes(struct nfnetl return 0; } +int nfnetlink_has_listeners(unsigned int group) +{ + return netlink_has_listeners(nfnl, group); +} +EXPORT_SYMBOL_GPL(nfnetlink_has_listeners); + int nfnetlink_send(struct sk_buff *skb, u32 pid, unsigned group, int echo) { gfp_t allocation = in_interrupt() ? GFP_ATOMIC : GFP_KERNEL; diff -puN net/netfilter/nfnetlink_log.c~git-net net/netfilter/nfnetlink_log.c --- devel/net/netfilter/nfnetlink_log.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/nfnetlink_log.c 2006-03-17 23:03:48.000000000 -0800 @@ -11,6 +11,10 @@ * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. 
* + * 2006-01-26 Harald Welte + * - Add optional local and global sequence number to detect lost + * events from userspace + * */ #include #include @@ -68,11 +72,14 @@ struct nfulnl_instance { unsigned int nlbufsiz; /* netlink buffer allocation size */ unsigned int qthreshold; /* threshold of the queue */ u_int32_t copy_range; + u_int32_t seq; /* instance-local sequential counter */ u_int16_t group_num; /* number of this queue */ + u_int16_t flags; u_int8_t copy_mode; }; static DEFINE_RWLOCK(instances_lock); +static atomic_t global_seq; #define INSTANCE_BUCKETS 16 static struct hlist_head instance_table[INSTANCE_BUCKETS]; @@ -310,6 +317,16 @@ nfulnl_set_qthresh(struct nfulnl_instanc return 0; } +static int +nfulnl_set_flags(struct nfulnl_instance *inst, u_int16_t flags) +{ + spin_lock_bh(&inst->lock); + inst->flags = ntohs(flags); + spin_unlock_bh(&inst->lock); + + return 0; +} + static struct sk_buff *nfulnl_alloc_skb(unsigned int inst_size, unsigned int pkt_size) { @@ -377,6 +394,8 @@ static void nfulnl_timer(unsigned long d spin_unlock_bh(&inst->lock); } +/* This is an inline function, we don't really care about a long + * list of arguments */ static inline int __build_packet_message(struct nfulnl_instance *inst, const struct sk_buff *skb, @@ -515,6 +534,17 @@ __build_packet_message(struct nfulnl_ins read_unlock_bh(&skb->sk->sk_callback_lock); } + /* local sequence number */ + if (inst->flags & NFULNL_CFG_F_SEQ) { + tmp_uint = htonl(inst->seq++); + NFA_PUT(inst->skb, NFULA_SEQ, sizeof(tmp_uint), &tmp_uint); + } + /* global sequence number */ + if (inst->flags & NFULNL_CFG_F_SEQ_GLOBAL) { + tmp_uint = atomic_inc_return(&global_seq); + NFA_PUT(inst->skb, NFULA_SEQ_GLOBAL, sizeof(tmp_uint), &tmp_uint); + } + if (data_len) { struct nfattr *nfa; int size = NFA_LENGTH(data_len); @@ -607,6 +637,11 @@ nfulnl_log_packet(unsigned int pf, spin_lock_bh(&inst->lock); + if (inst->flags & NFULNL_CFG_F_SEQ) + size += NFA_SPACE(sizeof(u_int32_t)); + if (inst->flags & 
NFULNL_CFG_F_SEQ_GLOBAL) + size += NFA_SPACE(sizeof(u_int32_t)); + qthreshold = inst->qthreshold; /* per-rule qthreshold overrides per-instance */ if (qthreshold > li->u.ulog.qthreshold) @@ -736,10 +771,14 @@ static const int nfula_min[NFULA_MAX] = [NFULA_TIMESTAMP-1] = sizeof(struct nfulnl_msg_packet_timestamp), [NFULA_IFINDEX_INDEV-1] = sizeof(u_int32_t), [NFULA_IFINDEX_OUTDEV-1]= sizeof(u_int32_t), + [NFULA_IFINDEX_PHYSINDEV-1] = sizeof(u_int32_t), + [NFULA_IFINDEX_PHYSOUTDEV-1] = sizeof(u_int32_t), [NFULA_HWADDR-1] = sizeof(struct nfulnl_msg_packet_hw), [NFULA_PAYLOAD-1] = 0, [NFULA_PREFIX-1] = 0, [NFULA_UID-1] = sizeof(u_int32_t), + [NFULA_SEQ-1] = sizeof(u_int32_t), + [NFULA_SEQ_GLOBAL-1] = sizeof(u_int32_t), }; static const int nfula_cfg_min[NFULA_CFG_MAX] = { @@ -748,6 +787,7 @@ static const int nfula_cfg_min[NFULA_CFG [NFULA_CFG_TIMEOUT-1] = sizeof(u_int32_t), [NFULA_CFG_QTHRESH-1] = sizeof(u_int32_t), [NFULA_CFG_NLBUFSIZ-1] = sizeof(u_int32_t), + [NFULA_CFG_FLAGS-1] = sizeof(u_int16_t), }; static int @@ -859,6 +899,12 @@ nfulnl_recv_config(struct sock *ctnl, st nfulnl_set_qthresh(inst, ntohl(qthresh)); } + if (nfula[NFULA_CFG_FLAGS-1]) { + u_int16_t flags = + *(u_int16_t *)NFA_DATA(nfula[NFULA_CFG_FLAGS-1]); + nfulnl_set_flags(inst, ntohl(flags)); + } + out_put: instance_put(inst); return ret; diff -puN net/netfilter/nf_sockopt.c~git-net net/netfilter/nf_sockopt.c --- devel/net/netfilter/nf_sockopt.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/nf_sockopt.c 2006-03-17 23:03:48.000000000 -0800 @@ -4,6 +4,7 @@ #include #include #include +#include #include #include "nf_internals.h" @@ -11,7 +12,7 @@ /* Sockopts only registered and called from user context, so net locking would be overkill. Also, [gs]etsockopt calls may sleep. */ -static DECLARE_MUTEX(nf_sockopt_mutex); +static DEFINE_MUTEX(nf_sockopt_mutex); static LIST_HEAD(nf_sockopts); /* Do exclusive ranges overlap? 
*/ @@ -26,7 +27,7 @@ int nf_register_sockopt(struct nf_sockop struct list_head *i; int ret = 0; - if (down_interruptible(&nf_sockopt_mutex) != 0) + if (mutex_lock_interruptible(&nf_sockopt_mutex) != 0) return -EINTR; list_for_each(i, &nf_sockopts) { @@ -48,7 +49,7 @@ int nf_register_sockopt(struct nf_sockop list_add(&reg->list, &nf_sockopts); out: - up(&nf_sockopt_mutex); + mutex_unlock(&nf_sockopt_mutex); return ret; } EXPORT_SYMBOL(nf_register_sockopt); @@ -57,18 +58,18 @@ void nf_unregister_sockopt(struct nf_soc { /* No point being interruptible: we're probably in cleanup_module() */ restart: - down(&nf_sockopt_mutex); + mutex_lock(&nf_sockopt_mutex); if (reg->use != 0) { /* To be woken by nf_sockopt call... */ /* FIXME: Stuart Young's name appears gratuitously. */ set_current_state(TASK_UNINTERRUPTIBLE); reg->cleanup_task = current; - up(&nf_sockopt_mutex); + mutex_unlock(&nf_sockopt_mutex); schedule(); goto restart; } list_del(&reg->list); - up(&nf_sockopt_mutex); + mutex_unlock(&nf_sockopt_mutex); } EXPORT_SYMBOL(nf_unregister_sockopt); @@ -80,7 +81,7 @@ static int nf_sockopt(struct sock *sk, i struct nf_sockopt_ops *ops; int ret; - if (down_interruptible(&nf_sockopt_mutex) != 0) + if (mutex_lock_interruptible(&nf_sockopt_mutex) != 0) return -EINTR; list_for_each(i, &nf_sockopts) { @@ -90,7 +91,7 @@ static int nf_sockopt(struct sock *sk, i if (val >= ops->get_optmin && val < ops->get_optmax) { ops->use++; - up(&nf_sockopt_mutex); + mutex_unlock(&nf_sockopt_mutex); ret = ops->get(sk, val, opt, len); goto out; } @@ -98,22 +99,22 @@ static int nf_sockopt(struct sock *sk, i if (val >= ops->set_optmin && val < ops->set_optmax) { ops->use++; - up(&nf_sockopt_mutex); + mutex_unlock(&nf_sockopt_mutex); ret = ops->set(sk, val, opt, *len); goto out; } } } } - up(&nf_sockopt_mutex); + mutex_unlock(&nf_sockopt_mutex); return -ENOPROTOOPT; out: - down(&nf_sockopt_mutex); + mutex_lock(&nf_sockopt_mutex); ops->use--; if (ops->cleanup_task) wake_up_process(ops->cleanup_task); - 
up(&nf_sockopt_mutex); + mutex_unlock(&nf_sockopt_mutex); return ret; } @@ -130,3 +131,72 @@ int nf_getsockopt(struct sock *sk, int p } EXPORT_SYMBOL(nf_getsockopt); +#ifdef CONFIG_COMPAT +static int compat_nf_sockopt(struct sock *sk, int pf, int val, + char __user *opt, int *len, int get) +{ + struct list_head *i; + struct nf_sockopt_ops *ops; + int ret; + + if (mutex_lock_interruptible(&nf_sockopt_mutex) != 0) + return -EINTR; + + list_for_each(i, &nf_sockopts) { + ops = (struct nf_sockopt_ops *)i; + if (ops->pf == pf) { + if (get) { + if (val >= ops->get_optmin + && val < ops->get_optmax) { + ops->use++; + mutex_unlock(&nf_sockopt_mutex); + if (ops->compat_get) + ret = ops->compat_get(sk, + val, opt, len); + else + ret = ops->get(sk, + val, opt, len); + goto out; + } + } else { + if (val >= ops->set_optmin + && val < ops->set_optmax) { + ops->use++; + mutex_unlock(&nf_sockopt_mutex); + if (ops->compat_set) + ret = ops->compat_set(sk, + val, opt, *len); + else + ret = ops->set(sk, + val, opt, *len); + goto out; + } + } + } + } + mutex_unlock(&nf_sockopt_mutex); + return -ENOPROTOOPT; + + out: + mutex_lock(&nf_sockopt_mutex); + ops->use--; + if (ops->cleanup_task) + wake_up_process(ops->cleanup_task); + mutex_unlock(&nf_sockopt_mutex); + return ret; +} + +int compat_nf_setsockopt(struct sock *sk, int pf, + int val, char __user *opt, int len) +{ + return compat_nf_sockopt(sk, pf, val, opt, &len, 0); +} +EXPORT_SYMBOL(compat_nf_setsockopt); + +int compat_nf_getsockopt(struct sock *sk, int pf, + int val, char __user *opt, int *len) +{ + return compat_nf_sockopt(sk, pf, val, opt, len, 1); +} +EXPORT_SYMBOL(compat_nf_getsockopt); +#endif diff -puN net/netfilter/x_tables.c~git-net net/netfilter/x_tables.c --- devel/net/netfilter/x_tables.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/x_tables.c 2006-03-17 23:03:48.000000000 -0800 @@ -52,6 +52,12 @@ enum { MATCH, }; +static const char *xt_prefix[NPROTO] = { + [AF_INET] = "ip", + [AF_INET6] = 
"ip6", + [NF_ARP] = "arp", +}; + /* Registration hooks for targets. */ int xt_register_target(int af, struct xt_target *target) @@ -158,18 +164,12 @@ struct xt_target *xt_find_target(int af, } EXPORT_SYMBOL(xt_find_target); -static const char *xt_prefix[NPROTO] = { - [AF_INET] = "ipt_%s", - [AF_INET6] = "ip6t_%s", - [NF_ARP] = "arpt_%s", -}; - struct xt_target *xt_request_find_target(int af, const char *name, u8 revision) { struct xt_target *target; target = try_then_request_module(xt_find_target(af, name, revision), - xt_prefix[af], name); + "%st_%s", xt_prefix[af], name); if (IS_ERR(target) || !target) return NULL; return target; @@ -237,6 +237,64 @@ int xt_find_revision(int af, const char } EXPORT_SYMBOL_GPL(xt_find_revision); +int xt_check_match(const struct xt_match *match, unsigned short family, + unsigned int size, const char *table, unsigned int hook_mask, + unsigned short proto, int inv_proto) +{ + if (XT_ALIGN(match->matchsize) != size) { + printk("%s_tables: %s match: invalid size %Zu != %u\n", + xt_prefix[family], match->name, + XT_ALIGN(match->matchsize), size); + return -EINVAL; + } + if (match->table && strcmp(match->table, table)) { + printk("%s_tables: %s match: only valid in %s table, not %s\n", + xt_prefix[family], match->name, match->table, table); + return -EINVAL; + } + if (match->hooks && (hook_mask & ~match->hooks) != 0) { + printk("%s_tables: %s match: bad hook_mask %u\n", + xt_prefix[family], match->name, hook_mask); + return -EINVAL; + } + if (match->proto && (match->proto != proto || inv_proto)) { + printk("%s_tables: %s match: only valid for protocol %u\n", + xt_prefix[family], match->name, match->proto); + return -EINVAL; + } + return 0; +} +EXPORT_SYMBOL_GPL(xt_check_match); + +int xt_check_target(const struct xt_target *target, unsigned short family, + unsigned int size, const char *table, unsigned int hook_mask, + unsigned short proto, int inv_proto) +{ + if (XT_ALIGN(target->targetsize) != size) { + printk("%s_tables: %s target: 
invalid size %Zu != %u\n", + xt_prefix[family], target->name, + XT_ALIGN(target->targetsize), size); + return -EINVAL; + } + if (target->table && strcmp(target->table, table)) { + printk("%s_tables: %s target: only valid in %s table, not %s\n", + xt_prefix[family], target->name, target->table, table); + return -EINVAL; + } + if (target->hooks && (hook_mask & ~target->hooks) != 0) { + printk("%s_tables: %s target: bad hook_mask %u\n", + xt_prefix[family], target->name, hook_mask); + return -EINVAL; + } + if (target->proto && (target->proto != proto || inv_proto)) { + printk("%s_tables: %s target: only valid for protocol %u\n", + xt_prefix[family], target->name, target->proto); + return -EINVAL; + } + return 0; +} +EXPORT_SYMBOL_GPL(xt_check_target); + struct xt_table_info *xt_alloc_table_info(unsigned int size) { struct xt_table_info *newinfo; diff -puN net/netfilter/xt_CLASSIFY.c~git-net net/netfilter/xt_CLASSIFY.c --- devel/net/netfilter/xt_CLASSIFY.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_CLASSIFY.c 2006-03-17 23:03:48.000000000 -0800 @@ -28,6 +28,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -39,47 +40,22 @@ target(struct sk_buff **pskb, return XT_CONTINUE; } -static int -checkentry(const char *tablename, - const void *e, - void *targinfo, - unsigned int targinfosize, - unsigned int hook_mask) -{ - if (targinfosize != XT_ALIGN(sizeof(struct xt_classify_target_info))){ - printk(KERN_ERR "CLASSIFY: invalid size (%u != %Zu).\n", - targinfosize, - XT_ALIGN(sizeof(struct xt_classify_target_info))); - return 0; - } - - if (hook_mask & ~((1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_FORWARD) | - (1 << NF_IP_POST_ROUTING))) { - printk(KERN_ERR "CLASSIFY: only valid in LOCAL_OUT, FORWARD " - "and POST_ROUTING.\n"); - return 0; - } - - if (strcmp(tablename, "mangle") != 0) { - printk(KERN_ERR 
"CLASSIFY: can only be called from " - "\"mangle\" table, not \"%s\".\n", - tablename); - return 0; - } - - return 1; -} - static struct xt_target classify_reg = { .name = "CLASSIFY", .target = target, - .checkentry = checkentry, + .targetsize = sizeof(struct xt_classify_target_info), + .table = "mangle", + .hooks = (1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_FORWARD) | + (1 << NF_IP_POST_ROUTING), .me = THIS_MODULE, }; static struct xt_target classify6_reg = { .name = "CLASSIFY", .target = target, - .checkentry = checkentry, + .targetsize = sizeof(struct xt_classify_target_info), + .table = "mangle", + .hooks = (1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_FORWARD) | + (1 << NF_IP_POST_ROUTING), .me = THIS_MODULE, }; diff -puN net/netfilter/xt_comment.c~git-net net/netfilter/xt_comment.c --- devel/net/netfilter/xt_comment.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_comment.c 2006-03-17 23:03:48.000000000 -0800 @@ -19,6 +19,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protooff, @@ -28,30 +29,17 @@ match(const struct sk_buff *skb, return 1; } -static int -checkentry(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - /* Check the size */ - if (matchsize != XT_ALIGN(sizeof(struct xt_comment_info))) - return 0; - return 1; -} - static struct xt_match comment_match = { .name = "comment", .match = match, - .checkentry = checkentry, + .matchsize = sizeof(struct xt_comment_info), .me = THIS_MODULE }; static struct xt_match comment6_match = { .name = "comment", .match = match, - .checkentry = checkentry, + .matchsize = sizeof(struct xt_comment_info), .me = THIS_MODULE }; diff -puN net/netfilter/xt_connbytes.c~git-net net/netfilter/xt_connbytes.c --- devel/net/netfilter/xt_connbytes.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ 
devel-akpm/net/netfilter/xt_connbytes.c 2006-03-17 23:03:48.000000000 -0800 @@ -44,6 +44,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -122,15 +123,13 @@ match(const struct sk_buff *skb, static int check(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { const struct xt_connbytes_info *sinfo = matchinfo; - if (matchsize != XT_ALIGN(sizeof(struct xt_connbytes_info))) - return 0; - if (sinfo->what != XT_CONNBYTES_PKTS && sinfo->what != XT_CONNBYTES_BYTES && sinfo->what != XT_CONNBYTES_AVGPKT) @@ -146,14 +145,16 @@ static int check(const char *tablename, static struct xt_match connbytes_match = { .name = "connbytes", - .match = &match, - .checkentry = &check, + .match = match, + .checkentry = check, + .matchsize = sizeof(struct xt_connbytes_info), .me = THIS_MODULE }; static struct xt_match connbytes6_match = { .name = "connbytes", - .match = &match, - .checkentry = &check, + .match = match, + .checkentry = check, + .matchsize = sizeof(struct xt_connbytes_info), .me = THIS_MODULE }; diff -puN net/netfilter/xt_connmark.c~git-net net/netfilter/xt_connmark.c --- devel/net/netfilter/xt_connmark.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_connmark.c 2006-03-17 23:03:48.000000000 -0800 @@ -35,6 +35,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -52,37 +53,36 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { - struct xt_connmark_info *cm = - (struct xt_connmark_info *)matchinfo; - if (matchsize != 
XT_ALIGN(sizeof(struct xt_connmark_info))) - return 0; + struct xt_connmark_info *cm = (struct xt_connmark_info *)matchinfo; if (cm->mark > 0xffffffff || cm->mask > 0xffffffff) { printk(KERN_WARNING "connmark: only support 32bit mark\n"); return 0; } - return 1; } static struct xt_match connmark_match = { - .name = "connmark", - .match = &match, - .checkentry = &checkentry, - .me = THIS_MODULE + .name = "connmark", + .match = match, + .matchsize = sizeof(struct xt_connmark_info), + .checkentry = checkentry, + .me = THIS_MODULE }; + static struct xt_match connmark6_match = { - .name = "connmark", - .match = &match, - .checkentry = &checkentry, - .me = THIS_MODULE + .name = "connmark", + .match = match, + .matchsize = sizeof(struct xt_connmark_info), + .checkentry = checkentry, + .me = THIS_MODULE }; - static int __init init(void) { int ret; diff -puN net/netfilter/xt_CONNMARK.c~git-net net/netfilter/xt_CONNMARK.c --- devel/net/netfilter/xt_CONNMARK.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_CONNMARK.c 2006-03-17 23:03:48.000000000 -0800 @@ -37,6 +37,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -74,17 +75,12 @@ target(struct sk_buff **pskb, static int checkentry(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { struct xt_connmark_target_info *matchinfo = targinfo; - if (targinfosize != XT_ALIGN(sizeof(struct xt_connmark_target_info))) { - printk(KERN_WARNING "CONNMARK: targinfosize %u != %Zu\n", - targinfosize, - XT_ALIGN(sizeof(struct xt_connmark_target_info))); - return 0; - } if (matchinfo->mode == XT_CONNMARK_RESTORE) { if (strcmp(tablename, "mangle") != 0) { @@ -102,16 +98,19 @@ checkentry(const char *tablename, } static struct xt_target connmark_reg = { - .name = "CONNMARK", - .target = 
&target, - .checkentry = &checkentry, - .me = THIS_MODULE + .name = "CONNMARK", + .target = target, + .targetsize = sizeof(struct xt_connmark_target_info), + .checkentry = checkentry, + .me = THIS_MODULE }; + static struct xt_target connmark6_reg = { - .name = "CONNMARK", - .target = &target, - .checkentry = &checkentry, - .me = THIS_MODULE + .name = "CONNMARK", + .target = target, + .targetsize = sizeof(struct xt_connmark_target_info), + .checkentry = checkentry, + .me = THIS_MODULE }; static int __init init(void) diff -puN net/netfilter/xt_conntrack.c~git-net net/netfilter/xt_conntrack.c --- devel/net/netfilter/xt_conntrack.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_conntrack.c 2006-03-17 23:03:48.000000000 -0800 @@ -32,6 +32,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -118,6 +119,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -201,22 +203,10 @@ match(const struct sk_buff *skb, #endif /* CONFIG_NF_IP_CONNTRACK */ -static int check(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != XT_ALIGN(sizeof(struct xt_conntrack_info))) - return 0; - - return 1; -} - static struct xt_match conntrack_match = { .name = "conntrack", - .match = &match, - .checkentry = &check, + .match = match, + .matchsize = sizeof(struct xt_conntrack_info), .me = THIS_MODULE, }; diff -puN net/netfilter/xt_dccp.c~git-net net/netfilter/xt_dccp.c --- devel/net/netfilter/xt_dccp.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_dccp.c 2006-03-17 23:03:48.000000000 -0800 @@ -95,6 +95,7 @@ static int match(const struct sk_buff *skb, const struct 
net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -129,56 +130,34 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *inf, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { - const struct ipt_ip *ip = inf; - const struct xt_dccp_info *info; + const struct xt_dccp_info *info = matchinfo; - info = (const struct xt_dccp_info *)matchinfo; - - return ip->proto == IPPROTO_DCCP - && !(ip->invflags & XT_INV_PROTO) - && matchsize == XT_ALIGN(sizeof(struct xt_dccp_info)) - && !(info->flags & ~XT_DCCP_VALID_FLAGS) - && !(info->invflags & ~XT_DCCP_VALID_FLAGS) - && !(info->invflags & ~info->flags); -} - -static int -checkentry6(const char *tablename, - const void *inf, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - const struct ip6t_ip6 *ip = inf; - const struct xt_dccp_info *info; - - info = (const struct xt_dccp_info *)matchinfo; - - return ip->proto == IPPROTO_DCCP - && !(ip->invflags & XT_INV_PROTO) - && matchsize == XT_ALIGN(sizeof(struct xt_dccp_info)) - && !(info->flags & ~XT_DCCP_VALID_FLAGS) + return !(info->flags & ~XT_DCCP_VALID_FLAGS) && !(info->invflags & ~XT_DCCP_VALID_FLAGS) && !(info->invflags & ~info->flags); } - static struct xt_match dccp_match = { .name = "dccp", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_dccp_info), + .proto = IPPROTO_DCCP, + .checkentry = checkentry, .me = THIS_MODULE, }; static struct xt_match dccp6_match = { .name = "dccp", - .match = &match, - .checkentry = &checkentry6, + .match = match, + .matchsize = sizeof(struct xt_dccp_info), + .proto = IPPROTO_DCCP, + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/netfilter/xt_helper.c~git-net net/netfilter/xt_helper.c --- devel/net/netfilter/xt_helper.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ 
devel-akpm/net/netfilter/xt_helper.c 2006-03-17 23:03:48.000000000 -0800 @@ -42,6 +42,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -89,6 +90,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -96,6 +98,7 @@ match(const struct sk_buff *skb, { const struct xt_helper_info *info = matchinfo; struct nf_conn *ct; + struct nf_conn_help *master_help; enum ip_conntrack_info ctinfo; int ret = info->invert; @@ -111,7 +114,8 @@ match(const struct sk_buff *skb, } read_lock_bh(&nf_conntrack_lock); - if (!ct->master->helper) { + master_help = nfct_help(ct->master); + if (!master_help || !master_help->helper) { DEBUGP("xt_helper: master ct %p has no helper\n", exp->expectant); goto out_unlock; @@ -123,8 +127,8 @@ match(const struct sk_buff *skb, if (info->name[0] == '\0') ret ^= 1; else - ret ^= !strncmp(ct->master->helper->name, info->name, - strlen(ct->master->helper->name)); + ret ^= !strncmp(master_help->helper->name, info->name, + strlen(master_help->helper->name)); out_unlock: read_unlock_bh(&nf_conntrack_lock); return ret; @@ -133,6 +137,7 @@ out_unlock: static int check(const char *tablename, const void *inf, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) @@ -140,24 +145,21 @@ static int check(const char *tablename, struct xt_helper_info *info = matchinfo; info->name[29] = '\0'; - - /* verify size */ - if (matchsize != XT_ALIGN(sizeof(struct xt_helper_info))) - return 0; - return 1; } static struct xt_match helper_match = { .name = "helper", - .match = &match, - .checkentry = &check, + .match = match, + .matchsize = sizeof(struct xt_helper_info), + .checkentry = check, .me = THIS_MODULE, }; static struct xt_match 
helper6_match = { .name = "helper", - .match = &match, - .checkentry = &check, + .match = match, + .matchsize = sizeof(struct xt_helper_info), + .checkentry = check, .me = THIS_MODULE, }; diff -puN net/netfilter/xt_length.c~git-net net/netfilter/xt_length.c --- devel/net/netfilter/xt_length.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_length.c 2006-03-17 23:03:48.000000000 -0800 @@ -24,6 +24,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -39,6 +40,7 @@ static int match6(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -50,29 +52,17 @@ match6(const struct sk_buff *skb, return (pktlen >= info->min && pktlen <= info->max) ^ info->invert; } -static int -checkentry(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != XT_ALIGN(sizeof(struct xt_length_info))) - return 0; - - return 1; -} - static struct xt_match length_match = { .name = "length", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_length_info), .me = THIS_MODULE, }; + static struct xt_match length6_match = { .name = "length", - .match = &match6, - .checkentry = &checkentry, + .match = match6, + .matchsize = sizeof(struct xt_length_info), .me = THIS_MODULE, }; diff -puN net/netfilter/xt_limit.c~git-net net/netfilter/xt_limit.c --- devel/net/netfilter/xt_limit.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_limit.c 2006-03-17 23:03:48.000000000 -0800 @@ -68,6 +68,7 @@ static int ipt_limit_match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, 
unsigned int protoff, @@ -107,15 +108,13 @@ user2credits(u_int32_t user) static int ipt_limit_checkentry(const char *tablename, const void *inf, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { struct xt_rateinfo *r = matchinfo; - if (matchsize != XT_ALIGN(sizeof(struct xt_rateinfo))) - return 0; - /* Check for overflow. */ if (r->burst == 0 || user2credits(r->avg * r->burst) < user2credits(r->avg)) { @@ -140,12 +139,14 @@ ipt_limit_checkentry(const char *tablena static struct xt_match ipt_limit_reg = { .name = "limit", .match = ipt_limit_match, + .matchsize = sizeof(struct xt_rateinfo), .checkentry = ipt_limit_checkentry, .me = THIS_MODULE, }; static struct xt_match limit6_reg = { .name = "limit", .match = ipt_limit_match, + .matchsize = sizeof(struct xt_rateinfo), .checkentry = ipt_limit_checkentry, .me = THIS_MODULE, }; diff -puN net/netfilter/xt_mac.c~git-net net/netfilter/xt_mac.c --- devel/net/netfilter/xt_mac.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_mac.c 2006-03-17 23:03:48.000000000 -0800 @@ -27,6 +27,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -42,37 +43,20 @@ match(const struct sk_buff *skb, ^ info->invert)); } -static int -ipt_mac_checkentry(const char *tablename, - const void *inf, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - /* FORWARD isn't always valid, but it's nice to be able to do --RR */ - if (hook_mask - & ~((1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_IN) - | (1 << NF_IP_FORWARD))) { - printk("xt_mac: only valid for PRE_ROUTING, LOCAL_IN or FORWARD.\n"); - return 0; - } - - if (matchsize != XT_ALIGN(sizeof(struct xt_mac_info))) - return 0; - - return 1; -} - static struct xt_match mac_match = { .name = "mac", - .match = &match, - .checkentry = &ipt_mac_checkentry, + .match = 
match, + .matchsize = sizeof(struct xt_mac_info), + .hooks = (1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_IN) | + (1 << NF_IP_FORWARD), .me = THIS_MODULE, }; static struct xt_match mac6_match = { .name = "mac", - .match = &match, - .checkentry = &ipt_mac_checkentry, + .match = match, + .matchsize = sizeof(struct xt_mac_info), + .hooks = (1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_IN) | + (1 << NF_IP_FORWARD), .me = THIS_MODULE, }; diff -puN net/netfilter/xt_mark.c~git-net net/netfilter/xt_mark.c --- devel/net/netfilter/xt_mark.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_mark.c 2006-03-17 23:03:48.000000000 -0800 @@ -23,6 +23,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -36,34 +37,33 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *entry, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { struct xt_mark_info *minfo = (struct xt_mark_info *) matchinfo; - if (matchsize != XT_ALIGN(sizeof(struct xt_mark_info))) - return 0; - if (minfo->mark > 0xffffffff || minfo->mask > 0xffffffff) { printk(KERN_WARNING "mark: only supports 32bit mark\n"); return 0; } - return 1; } static struct xt_match mark_match = { .name = "mark", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_mark_info), + .checkentry = checkentry, .me = THIS_MODULE, }; static struct xt_match mark6_match = { .name = "mark", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_mark_info), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/netfilter/xt_MARK.c~git-net net/netfilter/xt_MARK.c --- devel/net/netfilter/xt_MARK.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_MARK.c 2006-03-17 
23:03:48.000000000 -0800 @@ -26,6 +26,7 @@ target_v0(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -42,6 +43,7 @@ target_v1(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -72,53 +74,30 @@ target_v1(struct sk_buff **pskb, static int checkentry_v0(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { struct xt_mark_target_info *markinfo = targinfo; - if (targinfosize != XT_ALIGN(sizeof(struct xt_mark_target_info))) { - printk(KERN_WARNING "MARK: targinfosize %u != %Zu\n", - targinfosize, - XT_ALIGN(sizeof(struct xt_mark_target_info))); - return 0; - } - - if (strcmp(tablename, "mangle") != 0) { - printk(KERN_WARNING "MARK: can only be called from \"mangle\" table, not \"%s\"\n", tablename); - return 0; - } - if (markinfo->mark > 0xffffffff) { printk(KERN_WARNING "MARK: Only supports 32bit wide mark\n"); return 0; } - return 1; } static int checkentry_v1(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { struct xt_mark_target_info_v1 *markinfo = targinfo; - if (targinfosize != XT_ALIGN(sizeof(struct xt_mark_target_info_v1))){ - printk(KERN_WARNING "MARK: targinfosize %u != %Zu\n", - targinfosize, - XT_ALIGN(sizeof(struct xt_mark_target_info_v1))); - return 0; - } - - if (strcmp(tablename, "mangle") != 0) { - printk(KERN_WARNING "MARK: can only be called from \"mangle\" table, not \"%s\"\n", tablename); - return 0; - } - if (markinfo->mode != XT_MARK_SET && markinfo->mode != XT_MARK_AND && markinfo->mode != XT_MARK_OR) { @@ -126,18 +105,18 @@ checkentry_v1(const char *tablename, markinfo->mode); return 0; } - if 
(markinfo->mark > 0xffffffff) { printk(KERN_WARNING "MARK: Only supports 32bit wide mark\n"); return 0; } - return 1; } static struct xt_target ipt_mark_reg_v0 = { .name = "MARK", .target = target_v0, + .targetsize = sizeof(struct xt_mark_target_info), + .table = "mangle", .checkentry = checkentry_v0, .me = THIS_MODULE, .revision = 0, @@ -146,6 +125,8 @@ static struct xt_target ipt_mark_reg_v0 static struct xt_target ipt_mark_reg_v1 = { .name = "MARK", .target = target_v1, + .targetsize = sizeof(struct xt_mark_target_info_v1), + .table = "mangle", .checkentry = checkentry_v1, .me = THIS_MODULE, .revision = 1, @@ -154,6 +135,8 @@ static struct xt_target ipt_mark_reg_v1 static struct xt_target ip6t_mark_reg_v0 = { .name = "MARK", .target = target_v0, + .targetsize = sizeof(struct xt_mark_target_info), + .table = "mangle", .checkentry = checkentry_v0, .me = THIS_MODULE, .revision = 0, diff -puN net/netfilter/xt_NFQUEUE.c~git-net net/netfilter/xt_NFQUEUE.c --- devel/net/netfilter/xt_NFQUEUE.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_NFQUEUE.c 2006-03-17 23:03:48.000000000 -0800 @@ -28,6 +28,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -36,41 +37,24 @@ target(struct sk_buff **pskb, return NF_QUEUE_NR(tinfo->queuenum); } -static int -checkentry(const char *tablename, - const void *entry, - void *targinfo, - unsigned int targinfosize, - unsigned int hook_mask) -{ - if (targinfosize != XT_ALIGN(sizeof(struct xt_NFQ_info))) { - printk(KERN_WARNING "NFQUEUE: targinfosize %u != %Zu\n", - targinfosize, - XT_ALIGN(sizeof(struct xt_NFQ_info))); - return 0; - } - - return 1; -} - static struct xt_target ipt_NFQ_reg = { .name = "NFQUEUE", .target = target, - .checkentry = checkentry, + .targetsize = sizeof(struct xt_NFQ_info), .me = THIS_MODULE, }; static struct xt_target ip6t_NFQ_reg = { .name = 
"NFQUEUE", .target = target, - .checkentry = checkentry, + .targetsize = sizeof(struct xt_NFQ_info), .me = THIS_MODULE, }; static struct xt_target arpt_NFQ_reg = { .name = "NFQUEUE", .target = target, - .checkentry = checkentry, + .targetsize = sizeof(struct xt_NFQ_info), .me = THIS_MODULE, }; diff -puN net/netfilter/xt_NOTRACK.c~git-net net/netfilter/xt_NOTRACK.c --- devel/net/netfilter/xt_NOTRACK.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_NOTRACK.c 2006-03-17 23:03:48.000000000 -0800 @@ -15,6 +15,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -33,38 +34,20 @@ target(struct sk_buff **pskb, return XT_CONTINUE; } -static int -checkentry(const char *tablename, - const void *entry, - void *targinfo, - unsigned int targinfosize, - unsigned int hook_mask) -{ - if (targinfosize != 0) { - printk(KERN_WARNING "NOTRACK: targinfosize %u != 0\n", - targinfosize); - return 0; - } - - if (strcmp(tablename, "raw") != 0) { - printk(KERN_WARNING "NOTRACK: can only be called from \"raw\" table, not \"%s\"\n", tablename); - return 0; - } - - return 1; -} - -static struct xt_target notrack_reg = { - .name = "NOTRACK", - .target = target, - .checkentry = checkentry, - .me = THIS_MODULE, +static struct xt_target notrack_reg = { + .name = "NOTRACK", + .target = target, + .targetsize = 0, + .table = "raw", + .me = THIS_MODULE, }; -static struct xt_target notrack6_reg = { - .name = "NOTRACK", - .target = target, - .checkentry = checkentry, - .me = THIS_MODULE, + +static struct xt_target notrack6_reg = { + .name = "NOTRACK", + .target = target, + .targetsize = 0, + .table = "raw", + .me = THIS_MODULE, }; static int __init init(void) diff -puN net/netfilter/xt_physdev.c~git-net net/netfilter/xt_physdev.c --- devel/net/netfilter/xt_physdev.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ 
devel-akpm/net/netfilter/xt_physdev.c 2006-03-17 23:03:48.000000000 -0800 @@ -26,6 +26,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -102,14 +103,13 @@ match_outdev: static int checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { const struct xt_physdev_info *info = matchinfo; - if (matchsize != XT_ALIGN(sizeof(struct xt_physdev_info))) - return 0; if (!(info->bitmask & XT_PHYSDEV_OP_MASK) || info->bitmask & ~XT_PHYSDEV_OP_MASK) return 0; @@ -118,15 +118,17 @@ checkentry(const char *tablename, static struct xt_match physdev_match = { .name = "physdev", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_physdev_info), + .checkentry = checkentry, .me = THIS_MODULE, }; static struct xt_match physdev6_match = { .name = "physdev", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_physdev_info), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/netfilter/xt_pkttype.c~git-net net/netfilter/xt_pkttype.c --- devel/net/netfilter/xt_pkttype.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_pkttype.c 2006-03-17 23:03:48.000000000 -0800 @@ -22,6 +22,7 @@ MODULE_ALIAS("ip6t_pkttype"); static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -32,32 +33,20 @@ static int match(const struct sk_buff *s return (skb->pkt_type == info->pkttype) ^ info->invert; } -static int checkentry(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != XT_ALIGN(sizeof(struct xt_pkttype_info))) - return 0; - 
- return 1; -} - static struct xt_match pkttype_match = { .name = "pkttype", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_pkttype_info), .me = THIS_MODULE, }; + static struct xt_match pkttype6_match = { .name = "pkttype", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_pkttype_info), .me = THIS_MODULE, }; - static int __init init(void) { int ret; diff -puN /dev/null net/netfilter/xt_policy.c --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/net/netfilter/xt_policy.c 2006-03-17 23:03:48.000000000 -0800 @@ -0,0 +1,209 @@ +/* IP tables module for matching IPsec policy + * + * Copyright (c) 2004,2005 Patrick McHardy, + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include +#include +#include +#include +#include +#include + +#include +#include + +MODULE_AUTHOR("Patrick McHardy "); +MODULE_DESCRIPTION("Xtables IPsec policy matching module"); +MODULE_LICENSE("GPL"); + +static inline int +xt_addr_cmp(const union xt_policy_addr *a1, const union xt_policy_addr *m, + const union xt_policy_addr *a2, unsigned short family) +{ + switch (family) { + case AF_INET: + return (a1->a4.s_addr ^ a2->a4.s_addr) & m->a4.s_addr; + case AF_INET6: + return ipv6_masked_addr_cmp(&a1->a6, &m->a6, &a2->a6); + } + return 0; +} + +static inline int +match_xfrm_state(struct xfrm_state *x, const struct xt_policy_elem *e, + unsigned short family) +{ +#define MATCH_ADDR(x,y,z) (!e->match.x || \ + (xt_addr_cmp(&e->x, &e->y, z, family) \ + ^ e->invert.x)) +#define MATCH(x,y) (!e->match.x || ((e->x == (y)) ^ e->invert.x)) + + return MATCH_ADDR(saddr, smask, (union xt_policy_addr *)&x->props.saddr) && + MATCH_ADDR(daddr, dmask, (union xt_policy_addr *)&x->id.daddr.a4) && + MATCH(proto, x->id.proto) && + MATCH(mode, x->props.mode) && + 
MATCH(spi, x->id.spi) && + MATCH(reqid, x->props.reqid); +} + +static int +match_policy_in(const struct sk_buff *skb, const struct xt_policy_info *info, + unsigned short family) +{ + const struct xt_policy_elem *e; + struct sec_path *sp = skb->sp; + int strict = info->flags & XT_POLICY_MATCH_STRICT; + int i, pos; + + if (sp == NULL) + return -1; + if (strict && info->len != sp->len) + return 0; + + for (i = sp->len - 1; i >= 0; i--) { + pos = strict ? i - sp->len + 1 : 0; + if (pos >= info->len) + return 0; + e = &info->pol[pos]; + + if (match_xfrm_state(sp->x[i].xvec, e, family)) { + if (!strict) + return 1; + } else if (strict) + return 0; + } + + return strict ? 1 : 0; +} + +static int +match_policy_out(const struct sk_buff *skb, const struct xt_policy_info *info, + unsigned short family) +{ + const struct xt_policy_elem *e; + struct dst_entry *dst = skb->dst; + int strict = info->flags & XT_POLICY_MATCH_STRICT; + int i, pos; + + if (dst->xfrm == NULL) + return -1; + + for (i = 0; dst && dst->xfrm; dst = dst->child, i++) { + pos = strict ? i : 0; + if (pos >= info->len) + return 0; + e = &info->pol[pos]; + + if (match_xfrm_state(dst->xfrm, e, family)) { + if (!strict) + return 1; + } else if (strict) + return 0; + } + + return strict ? i == info->len : 0; +} + +static int match(const struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + const struct xt_match *match, + const void *matchinfo, + int offset, + unsigned int protoff, + int *hotdrop) +{ + const struct xt_policy_info *info = matchinfo; + int ret; + + if (info->flags & XT_POLICY_MATCH_IN) + ret = match_policy_in(skb, info, match->family); + else + ret = match_policy_out(skb, info, match->family); + + if (ret < 0) + ret = info->flags & XT_POLICY_MATCH_NONE ? 
1 : 0; + else if (info->flags & XT_POLICY_MATCH_NONE) + ret = 0; + + return ret; +} + +static int checkentry(const char *tablename, const void *ip_void, + const struct xt_match *match, + void *matchinfo, unsigned int matchsize, + unsigned int hook_mask) +{ + struct xt_policy_info *info = matchinfo; + + if (!(info->flags & (XT_POLICY_MATCH_IN|XT_POLICY_MATCH_OUT))) { + printk(KERN_ERR "xt_policy: neither incoming nor " + "outgoing policy selected\n"); + return 0; + } + /* hook values are equal for IPv4 and IPv6 */ + if (hook_mask & (1 << NF_IP_PRE_ROUTING | 1 << NF_IP_LOCAL_IN) + && info->flags & XT_POLICY_MATCH_OUT) { + printk(KERN_ERR "xt_policy: output policy not valid in " + "PRE_ROUTING and INPUT\n"); + return 0; + } + if (hook_mask & (1 << NF_IP_POST_ROUTING | 1 << NF_IP_LOCAL_OUT) + && info->flags & XT_POLICY_MATCH_IN) { + printk(KERN_ERR "xt_policy: input policy not valid in " + "POST_ROUTING and OUTPUT\n"); + return 0; + } + if (info->len > XT_POLICY_MAX_ELEM) { + printk(KERN_ERR "xt_policy: too many policy elements\n"); + return 0; + } + return 1; +} + +static struct xt_match policy_match = { + .name = "policy", + .family = AF_INET, + .match = match, + .matchsize = sizeof(struct xt_policy_info), + .checkentry = checkentry, + .me = THIS_MODULE, +}; + +static struct xt_match policy6_match = { + .name = "policy", + .family = AF_INET6, + .match = match, + .matchsize = sizeof(struct xt_policy_info), + .checkentry = checkentry, + .me = THIS_MODULE, +}; + +static int __init init(void) +{ + int ret; + + ret = xt_register_match(AF_INET, &policy_match); + if (ret) + return ret; + ret = xt_register_match(AF_INET6, &policy6_match); + if (ret) + xt_unregister_match(AF_INET, &policy_match); + return ret; +} + +static void __exit fini(void) +{ + xt_unregister_match(AF_INET6, &policy6_match); + xt_unregister_match(AF_INET, &policy_match); +} + +module_init(init); +module_exit(fini); +MODULE_ALIAS("ipt_policy"); +MODULE_ALIAS("ip6t_policy"); diff -puN 
net/netfilter/xt_realm.c~git-net net/netfilter/xt_realm.c --- devel/net/netfilter/xt_realm.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_realm.c 2006-03-17 23:03:48.000000000 -0800 @@ -27,6 +27,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -38,30 +39,12 @@ match(const struct sk_buff *skb, return (info->id == (dst->tclassid & info->mask)) ^ info->invert; } -static int check(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (hook_mask - & ~((1 << NF_IP_POST_ROUTING) | (1 << NF_IP_FORWARD) | - (1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_LOCAL_IN))) { - printk("xt_realm: only valid for POST_ROUTING, LOCAL_OUT, " - "LOCAL_IN or FORWARD.\n"); - return 0; - } - if (matchsize != XT_ALIGN(sizeof(struct xt_realm_info))) { - printk("xt_realm: invalid matchsize.\n"); - return 0; - } - return 1; -} - static struct xt_match realm_match = { .name = "realm", - .match = match, - .checkentry = check, + .match = match, + .matchsize = sizeof(struct xt_realm_info), + .hooks = (1 << NF_IP_POST_ROUTING) | (1 << NF_IP_FORWARD) | + (1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_LOCAL_IN), .me = THIS_MODULE }; diff -puN net/netfilter/xt_sctp.c~git-net net/netfilter/xt_sctp.c --- devel/net/netfilter/xt_sctp.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_sctp.c 2006-03-17 23:03:48.000000000 -0800 @@ -123,6 +123,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -162,19 +163,14 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *inf, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { - const 
struct xt_sctp_info *info; - const struct ipt_ip *ip = inf; - - info = (const struct xt_sctp_info *)matchinfo; + const struct xt_sctp_info *info = matchinfo; - return ip->proto == IPPROTO_SCTP - && !(ip->invflags & XT_INV_PROTO) - && matchsize == XT_ALIGN(sizeof(struct xt_sctp_info)) - && !(info->flags & ~XT_SCTP_VALID_FLAGS) + return !(info->flags & ~XT_SCTP_VALID_FLAGS) && !(info->invflags & ~XT_SCTP_VALID_FLAGS) && !(info->invflags & ~info->flags) && ((!(info->flags & XT_SCTP_CHUNK_TYPES)) || @@ -184,47 +180,23 @@ checkentry(const char *tablename, | SCTP_CHUNK_MATCH_ONLY))); } -static int -checkentry6(const char *tablename, - const void *inf, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - const struct xt_sctp_info *info; - const struct ip6t_ip6 *ip = inf; - - info = (const struct xt_sctp_info *)matchinfo; - - return ip->proto == IPPROTO_SCTP - && !(ip->invflags & XT_INV_PROTO) - && matchsize == XT_ALIGN(sizeof(struct xt_sctp_info)) - && !(info->flags & ~XT_SCTP_VALID_FLAGS) - && !(info->invflags & ~XT_SCTP_VALID_FLAGS) - && !(info->invflags & ~info->flags) - && ((!(info->flags & XT_SCTP_CHUNK_TYPES)) || - (info->chunk_match_type & - (SCTP_CHUNK_MATCH_ALL - | SCTP_CHUNK_MATCH_ANY - | SCTP_CHUNK_MATCH_ONLY))); -} - - -static struct xt_match sctp_match = -{ - .name = "sctp", - .match = &match, - .checkentry = &checkentry, - .me = THIS_MODULE -}; -static struct xt_match sctp6_match = -{ - .name = "sctp", - .match = &match, - .checkentry = &checkentry6, - .me = THIS_MODULE +static struct xt_match sctp_match = { + .name = "sctp", + .match = match, + .matchsize = sizeof(struct xt_sctp_info), + .proto = IPPROTO_SCTP, + .checkentry = checkentry, + .me = THIS_MODULE }; +static struct xt_match sctp6_match = { + .name = "sctp", + .match = match, + .matchsize = sizeof(struct xt_sctp_info), + .proto = IPPROTO_SCTP, + .checkentry = checkentry, + .me = THIS_MODULE +}; static int __init init(void) { diff -puN net/netfilter/xt_state.c~git-net 
net/netfilter/xt_state.c --- devel/net/netfilter/xt_state.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_state.c 2006-03-17 23:03:48.000000000 -0800 @@ -24,6 +24,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -43,29 +44,17 @@ match(const struct sk_buff *skb, return (sinfo->statemask & statebit); } -static int check(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != XT_ALIGN(sizeof(struct xt_state_info))) - return 0; - - return 1; -} - static struct xt_match state_match = { .name = "state", - .match = &match, - .checkentry = &check, + .match = match, + .matchsize = sizeof(struct xt_state_info), .me = THIS_MODULE, }; static struct xt_match state6_match = { .name = "state", - .match = &match, - .checkentry = &check, + .match = match, + .matchsize = sizeof(struct xt_state_info), .me = THIS_MODULE, }; diff -puN net/netfilter/xt_string.c~git-net net/netfilter/xt_string.c --- devel/net/netfilter/xt_string.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_string.c 2006-03-17 23:03:48.000000000 -0800 @@ -24,6 +24,7 @@ MODULE_ALIAS("ip6t_string"); static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -43,6 +44,7 @@ static int match(const struct sk_buff *s static int checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) @@ -50,9 +52,6 @@ static int checkentry(const char *tablen struct xt_string_info *conf = matchinfo; struct ts_config *ts_conf; - if (matchsize != XT_ALIGN(sizeof(struct xt_string_info))) - return 0; - /* Damn, can't handle this case properly 
with iptables... */ if (conf->from_offset > conf->to_offset) return 0; @@ -67,7 +66,8 @@ static int checkentry(const char *tablen return 1; } -static void destroy(void *matchinfo, unsigned int matchsize) +static void destroy(const struct xt_match *match, void *matchinfo, + unsigned int matchsize) { textsearch_destroy(STRING_TEXT_PRIV(matchinfo)->config); } @@ -75,6 +75,7 @@ static void destroy(void *matchinfo, uns static struct xt_match string_match = { .name = "string", .match = match, + .matchsize = sizeof(struct xt_string_info), .checkentry = checkentry, .destroy = destroy, .me = THIS_MODULE @@ -82,6 +83,7 @@ static struct xt_match string_match = { static struct xt_match string6_match = { .name = "string", .match = match, + .matchsize = sizeof(struct xt_string_info), .checkentry = checkentry, .destroy = destroy, .me = THIS_MODULE diff -puN net/netfilter/xt_tcpmss.c~git-net net/netfilter/xt_tcpmss.c --- devel/net/netfilter/xt_tcpmss.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_tcpmss.c 2006-03-17 23:03:48.000000000 -0800 @@ -81,6 +81,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -92,58 +93,19 @@ match(const struct sk_buff *skb, info->invert, hotdrop); } -static int -checkentry(const char *tablename, - const void *ipinfo, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - const struct ipt_ip *ip = ipinfo; - if (matchsize != XT_ALIGN(sizeof(struct xt_tcpmss_match_info))) - return 0; - - /* Must specify -p tcp */ - if (ip->proto != IPPROTO_TCP || (ip->invflags & IPT_INV_PROTO)) { - printk("tcpmss: Only works on TCP packets\n"); - return 0; - } - - return 1; -} - -static int -checkentry6(const char *tablename, - const void *ipinfo, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - const struct ip6t_ip6 *ip = ipinfo; - - if (matchsize 
!= XT_ALIGN(sizeof(struct xt_tcpmss_match_info))) - return 0; - - /* Must specify -p tcp */ - if (ip->proto != IPPROTO_TCP || (ip->invflags & XT_INV_PROTO)) { - printk("tcpmss: Only works on TCP packets\n"); - return 0; - } - - return 1; -} - static struct xt_match tcpmss_match = { .name = "tcpmss", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_tcpmss_match_info), + .proto = IPPROTO_TCP, .me = THIS_MODULE, }; static struct xt_match tcpmss6_match = { .name = "tcpmss", - .match = &match, - .checkentry = &checkentry6, + .match = match, + .matchsize = sizeof(struct xt_tcpmss_match_info), + .proto = IPPROTO_TCP, .me = THIS_MODULE, }; diff -puN net/netfilter/xt_tcpudp.c~git-net net/netfilter/xt_tcpudp.c --- devel/net/netfilter/xt_tcpudp.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netfilter/xt_tcpudp.c 2006-03-17 23:03:48.000000000 -0800 @@ -74,6 +74,7 @@ static int tcp_match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -138,43 +139,22 @@ tcp_match(const struct sk_buff *skb, static int tcp_checkentry(const char *tablename, const void *info, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { - const struct ipt_ip *ip = info; const struct xt_tcp *tcpinfo = matchinfo; - /* Must specify proto == TCP, and no unknown invflags */ - return ip->proto == IPPROTO_TCP - && !(ip->invflags & XT_INV_PROTO) - && matchsize == XT_ALIGN(sizeof(struct xt_tcp)) - && !(tcpinfo->invflags & ~XT_TCP_INV_MASK); + /* Must specify no unknown invflags */ + return !(tcpinfo->invflags & ~XT_TCP_INV_MASK); } -/* Called when user tries to insert an entry of this type. 
*/ -static int -tcp6_checkentry(const char *tablename, - const void *entry, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - const struct ip6t_ip6 *ipv6 = entry; - const struct xt_tcp *tcpinfo = matchinfo; - - /* Must specify proto == TCP, and no unknown invflags */ - return ipv6->proto == IPPROTO_TCP - && !(ipv6->invflags & XT_INV_PROTO) - && matchsize == XT_ALIGN(sizeof(struct xt_tcp)) - && !(tcpinfo->invflags & ~XT_TCP_INV_MASK); -} - - static int udp_match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -208,87 +188,49 @@ udp_match(const struct sk_buff *skb, static int udp_checkentry(const char *tablename, const void *info, + const struct xt_match *match, void *matchinfo, - unsigned int matchinfosize, - unsigned int hook_mask) -{ - const struct ipt_ip *ip = info; - const struct xt_udp *udpinfo = matchinfo; - - /* Must specify proto == UDP, and no unknown invflags */ - if (ip->proto != IPPROTO_UDP || (ip->invflags & XT_INV_PROTO)) { - duprintf("ipt_udp: Protocol %u != %u\n", ip->proto, - IPPROTO_UDP); - return 0; - } - if (matchinfosize != XT_ALIGN(sizeof(struct xt_udp))) { - duprintf("ipt_udp: matchsize %u != %u\n", - matchinfosize, XT_ALIGN(sizeof(struct xt_udp))); - return 0; - } - if (udpinfo->invflags & ~XT_UDP_INV_MASK) { - duprintf("ipt_udp: unknown flags %X\n", - udpinfo->invflags); - return 0; - } - - return 1; -} - -/* Called when user tries to insert an entry of this type. 
*/ -static int -udp6_checkentry(const char *tablename, - const void *entry, - void *matchinfo, - unsigned int matchinfosize, + unsigned int matchsize, unsigned int hook_mask) { - const struct ip6t_ip6 *ipv6 = entry; - const struct xt_udp *udpinfo = matchinfo; + const struct xt_tcp *udpinfo = matchinfo; - /* Must specify proto == UDP, and no unknown invflags */ - if (ipv6->proto != IPPROTO_UDP || (ipv6->invflags & XT_INV_PROTO)) { - duprintf("ip6t_udp: Protocol %u != %u\n", ipv6->proto, - IPPROTO_UDP); - return 0; - } - if (matchinfosize != XT_ALIGN(sizeof(struct xt_udp))) { - duprintf("ip6t_udp: matchsize %u != %u\n", - matchinfosize, XT_ALIGN(sizeof(struct xt_udp))); - return 0; - } - if (udpinfo->invflags & ~XT_UDP_INV_MASK) { - duprintf("ip6t_udp: unknown flags %X\n", - udpinfo->invflags); - return 0; - } - - return 1; + /* Must specify no unknown invflags */ + return !(udpinfo->invflags & ~XT_UDP_INV_MASK); } static struct xt_match tcp_matchstruct = { .name = "tcp", - .match = &tcp_match, - .checkentry = &tcp_checkentry, + .match = tcp_match, + .matchsize = sizeof(struct xt_tcp), + .proto = IPPROTO_TCP, + .checkentry = tcp_checkentry, .me = THIS_MODULE, }; + static struct xt_match tcp6_matchstruct = { .name = "tcp", - .match = &tcp_match, - .checkentry = &tcp6_checkentry, + .match = tcp_match, + .matchsize = sizeof(struct xt_tcp), + .proto = IPPROTO_TCP, + .checkentry = tcp_checkentry, .me = THIS_MODULE, }; static struct xt_match udp_matchstruct = { .name = "udp", - .match = &udp_match, - .checkentry = &udp_checkentry, + .match = udp_match, + .matchsize = sizeof(struct xt_udp), + .proto = IPPROTO_UDP, + .checkentry = udp_checkentry, .me = THIS_MODULE, }; static struct xt_match udp6_matchstruct = { .name = "udp", - .match = &udp_match, - .checkentry = &udp6_checkentry, + .match = udp_match, + .matchsize = sizeof(struct xt_udp), + .proto = IPPROTO_UDP, + .checkentry = udp_checkentry, .me = THIS_MODULE, }; diff -puN net/netlink/af_netlink.c~git-net 
net/netlink/af_netlink.c --- devel/net/netlink/af_netlink.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/netlink/af_netlink.c 2006-03-17 23:03:48.000000000 -0800 @@ -106,6 +106,7 @@ struct nl_pid_hash { struct netlink_table { struct nl_pid_hash hash; struct hlist_head mc_list; + unsigned long *listeners; unsigned int nl_nonroot; unsigned int groups; struct module *module; @@ -296,6 +297,24 @@ static inline int nl_pid_hash_dilute(str static const struct proto_ops netlink_ops; +static void +netlink_update_listeners(struct sock *sk) +{ + struct netlink_table *tbl = &nl_table[sk->sk_protocol]; + struct hlist_node *node; + unsigned long mask; + unsigned int i; + + for (i = 0; i < NLGRPSZ(tbl->groups)/sizeof(unsigned long); i++) { + mask = 0; + sk_for_each_bound(sk, node, &tbl->mc_list) + mask |= nlk_sk(sk)->groups[i]; + tbl->listeners[i] = mask; + } + /* this function is only called with the netlink table "grabbed", which + * makes sure updates are visible before bind or setsockopt return. 
*/ +} + static int netlink_insert(struct sock *sk, u32 pid) { struct nl_pid_hash *hash = &nl_table[sk->sk_protocol].hash; @@ -456,12 +475,14 @@ static int netlink_release(struct socket if (nlk->module) module_put(nlk->module); + netlink_table_grab(); if (nlk->flags & NETLINK_KERNEL_SOCKET) { - netlink_table_grab(); + kfree(nl_table[sk->sk_protocol].listeners); nl_table[sk->sk_protocol].module = NULL; nl_table[sk->sk_protocol].registered = 0; - netlink_table_ungrab(); - } + } else if (nlk->subscriptions) + netlink_update_listeners(sk); + netlink_table_ungrab(); kfree(nlk->groups); nlk->groups = NULL; @@ -589,6 +610,7 @@ static int netlink_bind(struct socket *s hweight32(nladdr->nl_groups) - hweight32(nlk->groups[0])); nlk->groups[0] = (nlk->groups[0] & ~0xffffffffUL) | nladdr->nl_groups; + netlink_update_listeners(sk); netlink_table_ungrab(); return 0; @@ -807,6 +829,17 @@ retry: return netlink_sendskb(sk, skb, ssk->sk_protocol); } +int netlink_has_listeners(struct sock *sk, unsigned int group) +{ + int res = 0; + + BUG_ON(!(nlk_sk(sk)->flags & NETLINK_KERNEL_SOCKET)); + if (group - 1 < nl_table[sk->sk_protocol].groups) + res = test_bit(group - 1, nl_table[sk->sk_protocol].listeners); + return res; +} +EXPORT_SYMBOL_GPL(netlink_has_listeners); + static __inline__ int netlink_broadcast_deliver(struct sock *sk, struct sk_buff *skb) { struct netlink_sock *nlk = nlk_sk(sk); @@ -1011,6 +1044,7 @@ static int netlink_setsockopt(struct soc else __clear_bit(val - 1, nlk->groups); netlink_update_subscriptions(sk, subscriptions); + netlink_update_listeners(sk); netlink_table_ungrab(); err = 0; break; @@ -1237,6 +1271,7 @@ netlink_kernel_create(int unit, unsigned struct socket *sock; struct sock *sk; struct netlink_sock *nlk; + unsigned long *listeners = NULL; if (!nl_table) return NULL; @@ -1250,6 +1285,13 @@ netlink_kernel_create(int unit, unsigned if (__netlink_create(sock, unit) < 0) goto out_sock_release; + if (groups < 32) + groups = 32; + + listeners = 
kzalloc(NLGRPSZ(groups), GFP_KERNEL); + if (!listeners) + goto out_sock_release; + sk = sock->sk; sk->sk_data_ready = netlink_data_ready; if (input) @@ -1262,7 +1304,8 @@ netlink_kernel_create(int unit, unsigned nlk->flags |= NETLINK_KERNEL_SOCKET; netlink_table_grab(); - nl_table[unit].groups = groups < 32 ? 32 : groups; + nl_table[unit].groups = groups; + nl_table[unit].listeners = listeners; nl_table[unit].module = module; nl_table[unit].registered = 1; netlink_table_ungrab(); @@ -1270,6 +1313,7 @@ netlink_kernel_create(int unit, unsigned return sk; out_sock_release: + kfree(listeners); sock_release(sock); return NULL; } diff -puN net/sched/act_ipt.c~git-net net/sched/act_ipt.c --- devel/net/sched/act_ipt.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sched/act_ipt.c 2006-03-17 23:03:48.000000000 -0800 @@ -70,7 +70,8 @@ ipt_init_target(struct ipt_entry_target t->u.kernel.target = target; if (t->u.kernel.target->checkentry - && !t->u.kernel.target->checkentry(table, NULL, t->data, + && !t->u.kernel.target->checkentry(table, NULL, + t->u.kernel.target, t->data, t->u.target_size - sizeof(*t), hook)) { DPRINTK("ipt_init_target: check failed for `%s'.\n", @@ -86,7 +87,7 @@ static void ipt_destroy_target(struct ipt_entry_target *t) { if (t->u.kernel.target->destroy) - t->u.kernel.target->destroy(t->data, + t->u.kernel.target->destroy(t->u.kernel.target, t->data, t->u.target_size - sizeof(*t)); module_put(t->u.kernel.target->me); } @@ -224,8 +225,9 @@ tcf_ipt(struct sk_buff *skb, struct tc_a /* iptables targets take a double skb pointer in case the skb * needs to be replaced. We don't own the skb, so this must not * happen. 
The pskb_expand_head above should make sure of this */ - ret = p->t->u.kernel.target->target(&skb, skb->dev, NULL, - p->hook, p->t->data, NULL); + ret = p->t->u.kernel.target->target(&skb, skb->dev, NULL, p->hook, + p->t->u.kernel.target, p->t->data, + NULL); switch (ret) { case NF_ACCEPT: result = TC_ACT_OK; diff -puN net/sched/Kconfig~git-net net/sched/Kconfig --- devel/net/sched/Kconfig~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sched/Kconfig 2006-03-17 23:03:48.000000000 -0800 @@ -434,7 +434,6 @@ config NET_EMATCH_TEXT config NET_CLS_ACT bool "Actions" - depends on EXPERIMENTAL select NET_ESTIMATOR ---help--- Say Y here if you want to use traffic control actions. Actions diff -puN net/sched/sch_atm.c~git-net net/sched/sch_atm.c --- devel/net/sched/sch_atm.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sched/sch_atm.c 2006-03-17 23:03:48.000000000 -0800 @@ -638,6 +638,7 @@ static int atm_tc_dump_class(struct Qdis sch,p,flow,skb,tcm); if (!find_flow(p,flow)) return -EINVAL; tcm->tcm_handle = flow->classid; + tcm->tcm_info = flow->q->handle; rta = (struct rtattr *) b; RTA_PUT(skb,TCA_OPTIONS,0,NULL); RTA_PUT(skb,TCA_ATM_HDR,flow->hdr_len,flow->hdr); diff -puN net/sched/sch_dsmark.c~git-net net/sched/sch_dsmark.c --- devel/net/sched/sch_dsmark.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sched/sch_dsmark.c 2006-03-17 23:03:48.000000000 -0800 @@ -438,6 +438,7 @@ static int dsmark_dump_class(struct Qdis return -EINVAL; tcm->tcm_handle = TC_H_MAKE(TC_H_MAJ(sch->handle), cl-1); + tcm->tcm_info = p->q->handle; opts = RTA_NEST(skb, TCA_OPTIONS); RTA_PUT_U8(skb,TCA_DSMARK_MASK, p->mask[cl-1]); diff -puN net/sched/sch_generic.c~git-net net/sched/sch_generic.c --- devel/net/sched/sch_generic.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sched/sch_generic.c 2006-03-17 23:03:48.000000000 -0800 @@ -234,7 +234,7 @@ static void dev_watchdog_down(struct net { spin_lock_bh(&dev->xmit_lock); if 
(del_timer(&dev->watchdog_timer)) - __dev_put(dev); + dev_put(dev); spin_unlock_bh(&dev->xmit_lock); } diff -puN net/sched/sch_netem.c~git-net net/sched/sch_netem.c --- devel/net/sched/sch_netem.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sched/sch_netem.c 2006-03-17 23:03:48.000000000 -0800 @@ -252,9 +252,9 @@ static int netem_requeue(struct sk_buff static unsigned int netem_drop(struct Qdisc* sch) { struct netem_sched_data *q = qdisc_priv(sch); - unsigned int len; + unsigned int len = 0; - if ((len = q->qdisc->ops->drop(q->qdisc)) != 0) { + if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) { sch->q.qlen--; sch->qstats.drops++; } diff -puN net/sched/sch_prio.c~git-net net/sched/sch_prio.c --- devel/net/sched/sch_prio.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sched/sch_prio.c 2006-03-17 23:03:48.000000000 -0800 @@ -165,7 +165,7 @@ static unsigned int prio_drop(struct Qdi for (prio = q->bands-1; prio >= 0; prio--) { qdisc = q->queues[prio]; - if ((len = qdisc->ops->drop(qdisc)) != 0) { + if (qdisc->ops->drop && (len = qdisc->ops->drop(qdisc)) != 0) { sch->q.qlen--; return len; } diff -puN net/sched/sch_red.c~git-net net/sched/sch_red.c --- devel/net/sched/sch_red.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sched/sch_red.c 2006-03-17 23:03:48.000000000 -0800 @@ -44,6 +44,7 @@ struct red_sched_data unsigned char flags; struct red_parms parms; struct red_stats stats; + struct Qdisc *qdisc; }; static inline int red_use_ecn(struct red_sched_data *q) @@ -59,8 +60,10 @@ static inline int red_use_harddrop(struc static int red_enqueue(struct sk_buff *skb, struct Qdisc* sch) { struct red_sched_data *q = qdisc_priv(sch); + struct Qdisc *child = q->qdisc; + int ret; - q->parms.qavg = red_calc_qavg(&q->parms, sch->qstats.backlog); + q->parms.qavg = red_calc_qavg(&q->parms, child->qstats.backlog); if (red_is_idling(&q->parms)) red_end_of_idle_period(&q->parms); @@ -91,11 +94,16 @@ static int 
red_enqueue(struct sk_buff *s break; } - if (sch->qstats.backlog + skb->len <= q->limit) - return qdisc_enqueue_tail(skb, sch); - - q->stats.pdrop++; - return qdisc_drop(skb, sch); + ret = child->enqueue(skb, child); + if (likely(ret == NET_XMIT_SUCCESS)) { + sch->bstats.bytes += skb->len; + sch->bstats.packets++; + sch->q.qlen++; + } else { + q->stats.pdrop++; + sch->qstats.drops++; + } + return ret; congestion_drop: qdisc_drop(skb, sch); @@ -105,21 +113,30 @@ congestion_drop: static int red_requeue(struct sk_buff *skb, struct Qdisc* sch) { struct red_sched_data *q = qdisc_priv(sch); + struct Qdisc *child = q->qdisc; + int ret; if (red_is_idling(&q->parms)) red_end_of_idle_period(&q->parms); - return qdisc_requeue(skb, sch); + ret = child->ops->requeue(skb, child); + if (likely(ret == NET_XMIT_SUCCESS)) { + sch->qstats.requeues++; + sch->q.qlen++; + } + return ret; } static struct sk_buff * red_dequeue(struct Qdisc* sch) { struct sk_buff *skb; struct red_sched_data *q = qdisc_priv(sch); + struct Qdisc *child = q->qdisc; - skb = qdisc_dequeue_head(sch); - - if (skb == NULL && !red_is_idling(&q->parms)) + skb = child->dequeue(child); + if (skb) + sch->q.qlen--; + else if (!red_is_idling(&q->parms)) red_start_of_idle_period(&q->parms); return skb; @@ -127,14 +144,14 @@ static struct sk_buff * red_dequeue(stru static unsigned int red_drop(struct Qdisc* sch) { - struct sk_buff *skb; struct red_sched_data *q = qdisc_priv(sch); + struct Qdisc *child = q->qdisc; + unsigned int len; - skb = qdisc_dequeue_tail(sch); - if (skb) { - unsigned int len = skb->len; + if (child->ops->drop && (len = child->ops->drop(child)) > 0) { q->stats.other++; - qdisc_drop(skb, sch); + sch->qstats.drops++; + sch->q.qlen--; return len; } @@ -148,15 +165,48 @@ static void red_reset(struct Qdisc* sch) { struct red_sched_data *q = qdisc_priv(sch); - qdisc_reset_queue(sch); + qdisc_reset(q->qdisc); + sch->q.qlen = 0; red_restart(&q->parms); } +static void red_destroy(struct Qdisc *sch) +{ + struct 
red_sched_data *q = qdisc_priv(sch); + qdisc_destroy(q->qdisc); +} + +static struct Qdisc *red_create_dflt(struct net_device *dev, u32 limit) +{ + struct Qdisc *q = qdisc_create_dflt(dev, &bfifo_qdisc_ops); + struct rtattr *rta; + int ret; + + if (q) { + rta = kmalloc(RTA_LENGTH(sizeof(struct tc_fifo_qopt)), + GFP_KERNEL); + if (rta) { + rta->rta_type = RTM_NEWQDISC; + rta->rta_len = RTA_LENGTH(sizeof(struct tc_fifo_qopt)); + ((struct tc_fifo_qopt *)RTA_DATA(rta))->limit = limit; + + ret = q->ops->change(q, rta); + kfree(rta); + + if (ret == 0) + return q; + } + qdisc_destroy(q); + } + return NULL; +} + static int red_change(struct Qdisc *sch, struct rtattr *opt) { struct red_sched_data *q = qdisc_priv(sch); struct rtattr *tb[TCA_RED_MAX]; struct tc_red_qopt *ctl; + struct Qdisc *child = NULL; if (opt == NULL || rtattr_parse_nested(tb, TCA_RED_MAX, opt)) return -EINVAL; @@ -169,9 +219,17 @@ static int red_change(struct Qdisc *sch, ctl = RTA_DATA(tb[TCA_RED_PARMS-1]); + if (ctl->limit > 0) { + child = red_create_dflt(sch->dev, ctl->limit); + if (child == NULL) + return -ENOMEM; + } + sch_tree_lock(sch); q->flags = ctl->flags; q->limit = ctl->limit; + if (child) + qdisc_destroy(xchg(&q->qdisc, child)); red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog, ctl->Plog, ctl->Scell_log, @@ -186,6 +244,9 @@ static int red_change(struct Qdisc *sch, static int red_init(struct Qdisc* sch, struct rtattr *opt) { + struct red_sched_data *q = qdisc_priv(sch); + + q->qdisc = &noop_qdisc; return red_change(sch, opt); } @@ -224,15 +285,101 @@ static int red_dump_stats(struct Qdisc * return gnet_stats_copy_app(d, &st, sizeof(st)); } +static int red_dump_class(struct Qdisc *sch, unsigned long cl, + struct sk_buff *skb, struct tcmsg *tcm) +{ + struct red_sched_data *q = qdisc_priv(sch); + + if (cl != 1) + return -ENOENT; + tcm->tcm_handle |= TC_H_MIN(1); + tcm->tcm_info = q->qdisc->handle; + return 0; +} + +static int red_graft(struct Qdisc *sch, unsigned long arg, struct 
Qdisc *new, + struct Qdisc **old) +{ + struct red_sched_data *q = qdisc_priv(sch); + + if (new == NULL) + new = &noop_qdisc; + + sch_tree_lock(sch); + *old = xchg(&q->qdisc, new); + qdisc_reset(*old); + sch->q.qlen = 0; + sch_tree_unlock(sch); + return 0; +} + +static struct Qdisc *red_leaf(struct Qdisc *sch, unsigned long arg) +{ + struct red_sched_data *q = qdisc_priv(sch); + return q->qdisc; +} + +static unsigned long red_get(struct Qdisc *sch, u32 classid) +{ + return 1; +} + +static void red_put(struct Qdisc *sch, unsigned long arg) +{ + return; +} + +static int red_change_class(struct Qdisc *sch, u32 classid, u32 parentid, + struct rtattr **tca, unsigned long *arg) +{ + return -ENOSYS; +} + +static int red_delete(struct Qdisc *sch, unsigned long cl) +{ + return -ENOSYS; +} + +static void red_walk(struct Qdisc *sch, struct qdisc_walker *walker) +{ + if (!walker->stop) { + if (walker->count >= walker->skip) + if (walker->fn(sch, 1, walker) < 0) { + walker->stop = 1; + return; + } + walker->count++; + } +} + +static struct tcf_proto **red_find_tcf(struct Qdisc *sch, unsigned long cl) +{ + return NULL; +} + +static struct Qdisc_class_ops red_class_ops = { + .graft = red_graft, + .leaf = red_leaf, + .get = red_get, + .put = red_put, + .change = red_change_class, + .delete = red_delete, + .walk = red_walk, + .tcf_chain = red_find_tcf, + .dump = red_dump_class, +}; + static struct Qdisc_ops red_qdisc_ops = { .id = "red", .priv_size = sizeof(struct red_sched_data), + .cl_ops = &red_class_ops, .enqueue = red_enqueue, .dequeue = red_dequeue, .requeue = red_requeue, .drop = red_drop, .init = red_init, .reset = red_reset, + .destroy = red_destroy, .change = red_change, .dump = red_dump, .dump_stats = red_dump_stats, diff -puN net/sched/sch_sfq.c~git-net net/sched/sch_sfq.c --- devel/net/sched/sch_sfq.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sched/sch_sfq.c 2006-03-17 23:03:48.000000000 -0800 @@ -232,6 +232,7 @@ static unsigned int sfq_drop(struct 
Qdis sfq_dec(q, x); sch->q.qlen--; sch->qstats.drops++; + sch->qstats.backlog -= len; return len; } @@ -248,6 +249,7 @@ static unsigned int sfq_drop(struct Qdis sch->q.qlen--; q->ht[q->hash[d]] = SFQ_DEPTH; sch->qstats.drops++; + sch->qstats.backlog -= len; return len; } @@ -266,6 +268,7 @@ sfq_enqueue(struct sk_buff *skb, struct q->ht[hash] = x = q->dep[SFQ_DEPTH].next; q->hash[x] = hash; } + sch->qstats.backlog += skb->len; __skb_queue_tail(&q->qs[x], skb); sfq_inc(q, x); if (q->qs[x].qlen == 1) { /* The flow is new */ @@ -301,6 +304,7 @@ sfq_requeue(struct sk_buff *skb, struct q->ht[hash] = x = q->dep[SFQ_DEPTH].next; q->hash[x] = hash; } + sch->qstats.backlog += skb->len; __skb_queue_head(&q->qs[x], skb); sfq_inc(q, x); if (q->qs[x].qlen == 1) { /* The flow is new */ @@ -344,6 +348,7 @@ sfq_dequeue(struct Qdisc* sch) skb = __skb_dequeue(&q->qs[a]); sfq_dec(q, a); sch->q.qlen--; + sch->qstats.backlog -= skb->len; /* Is the slot empty? */ if (q->qs[a].qlen == 0) { diff -puN net/sched/sch_tbf.c~git-net net/sched/sch_tbf.c --- devel/net/sched/sch_tbf.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sched/sch_tbf.c 2006-03-17 23:03:48.000000000 -0800 @@ -177,9 +177,9 @@ static int tbf_requeue(struct sk_buff *s static unsigned int tbf_drop(struct Qdisc* sch) { struct tbf_sched_data *q = qdisc_priv(sch); - unsigned int len; + unsigned int len = 0; - if ((len = q->qdisc->ops->drop(q->qdisc)) != 0) { + if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) { sch->q.qlen--; sch->qstats.drops++; } @@ -341,13 +341,14 @@ static int tbf_change(struct Qdisc* sch, if (max_size < 0) goto done; - if (q->qdisc == &noop_qdisc) { + if (qopt->limit > 0) { if ((child = tbf_create_dflt_qdisc(sch->dev, qopt->limit)) == NULL) goto done; } sch_tree_lock(sch); - if (child) q->qdisc = child; + if (child) + qdisc_destroy(xchg(&q->qdisc, child)); q->limit = qopt->limit; q->mtu = qopt->mtu; q->max_size = max_size; diff -puN net/sctp/ipv6.c~git-net 
net/sctp/ipv6.c --- devel/net/sctp/ipv6.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sctp/ipv6.c 2006-03-17 23:03:48.000000000 -0800 @@ -861,23 +861,27 @@ static int sctp_inet6_supported_addrs(co } static const struct proto_ops inet6_seqpacket_ops = { - .family = PF_INET6, - .owner = THIS_MODULE, - .release = inet6_release, - .bind = inet6_bind, - .connect = inet_dgram_connect, - .socketpair = sock_no_socketpair, - .accept = inet_accept, - .getname = inet6_getname, - .poll = sctp_poll, - .ioctl = inet6_ioctl, - .listen = sctp_inet_listen, - .shutdown = inet_shutdown, - .setsockopt = sock_common_setsockopt, - .getsockopt = sock_common_getsockopt, - .sendmsg = inet_sendmsg, - .recvmsg = sock_common_recvmsg, - .mmap = sock_no_mmap, + .family = PF_INET6, + .owner = THIS_MODULE, + .release = inet6_release, + .bind = inet6_bind, + .connect = inet_dgram_connect, + .socketpair = sock_no_socketpair, + .accept = inet_accept, + .getname = inet6_getname, + .poll = sctp_poll, + .ioctl = inet6_ioctl, + .listen = sctp_inet_listen, + .shutdown = inet_shutdown, + .setsockopt = sock_common_setsockopt, + .getsockopt = sock_common_getsockopt, + .sendmsg = inet_sendmsg, + .recvmsg = sock_common_recvmsg, + .mmap = sock_no_mmap, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif }; static struct inet_protosw sctpv6_seqpacket_protosw = { @@ -911,31 +915,35 @@ static struct inet6_protocol sctpv6_prot }; static struct sctp_af sctp_ipv6_specific = { - .sctp_xmit = sctp_v6_xmit, - .setsockopt = ipv6_setsockopt, - .getsockopt = ipv6_getsockopt, - .get_dst = sctp_v6_get_dst, - .get_saddr = sctp_v6_get_saddr, - .copy_addrlist = sctp_v6_copy_addrlist, - .from_skb = sctp_v6_from_skb, - .from_sk = sctp_v6_from_sk, - .to_sk_saddr = sctp_v6_to_sk_saddr, - .to_sk_daddr = sctp_v6_to_sk_daddr, - .from_addr_param = sctp_v6_from_addr_param, - .to_addr_param = sctp_v6_to_addr_param, - .dst_saddr = 
sctp_v6_dst_saddr, - .cmp_addr = sctp_v6_cmp_addr, - .scope = sctp_v6_scope, - .addr_valid = sctp_v6_addr_valid, - .inaddr_any = sctp_v6_inaddr_any, - .is_any = sctp_v6_is_any, - .available = sctp_v6_available, - .skb_iif = sctp_v6_skb_iif, - .is_ce = sctp_v6_is_ce, - .seq_dump_addr = sctp_v6_seq_dump_addr, - .net_header_len = sizeof(struct ipv6hdr), - .sockaddr_len = sizeof(struct sockaddr_in6), - .sa_family = AF_INET6, + .sa_family = AF_INET6, + .sctp_xmit = sctp_v6_xmit, + .setsockopt = ipv6_setsockopt, + .getsockopt = ipv6_getsockopt, + .get_dst = sctp_v6_get_dst, + .get_saddr = sctp_v6_get_saddr, + .copy_addrlist = sctp_v6_copy_addrlist, + .from_skb = sctp_v6_from_skb, + .from_sk = sctp_v6_from_sk, + .to_sk_saddr = sctp_v6_to_sk_saddr, + .to_sk_daddr = sctp_v6_to_sk_daddr, + .from_addr_param = sctp_v6_from_addr_param, + .to_addr_param = sctp_v6_to_addr_param, + .dst_saddr = sctp_v6_dst_saddr, + .cmp_addr = sctp_v6_cmp_addr, + .scope = sctp_v6_scope, + .addr_valid = sctp_v6_addr_valid, + .inaddr_any = sctp_v6_inaddr_any, + .is_any = sctp_v6_is_any, + .available = sctp_v6_available, + .skb_iif = sctp_v6_skb_iif, + .is_ce = sctp_v6_is_ce, + .seq_dump_addr = sctp_v6_seq_dump_addr, + .net_header_len = sizeof(struct ipv6hdr), + .sockaddr_len = sizeof(struct sockaddr_in6), +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ipv6_setsockopt, + .compat_getsockopt = compat_ipv6_getsockopt, +#endif }; static struct sctp_pf sctp_pf_inet6_specific = { diff -puN net/sctp/protocol.c~git-net net/sctp/protocol.c --- devel/net/sctp/protocol.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sctp/protocol.c 2006-03-17 23:03:48.000000000 -0800 @@ -831,24 +831,28 @@ static struct notifier_block sctp_inetad /* Socket operations. */ static const struct proto_ops inet_seqpacket_ops = { - .family = PF_INET, - .owner = THIS_MODULE, - .release = inet_release, /* Needs to be wrapped... 
*/ - .bind = inet_bind, - .connect = inet_dgram_connect, - .socketpair = sock_no_socketpair, - .accept = inet_accept, - .getname = inet_getname, /* Semantics are different. */ - .poll = sctp_poll, - .ioctl = inet_ioctl, - .listen = sctp_inet_listen, - .shutdown = inet_shutdown, /* Looks harmless. */ - .setsockopt = sock_common_setsockopt, /* IP_SOL IP_OPTION is a problem. */ - .getsockopt = sock_common_getsockopt, - .sendmsg = inet_sendmsg, - .recvmsg = sock_common_recvmsg, - .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, + .family = PF_INET, + .owner = THIS_MODULE, + .release = inet_release, /* Needs to be wrapped... */ + .bind = inet_bind, + .connect = inet_dgram_connect, + .socketpair = sock_no_socketpair, + .accept = inet_accept, + .getname = inet_getname, /* Semantics are different. */ + .poll = sctp_poll, + .ioctl = inet_ioctl, + .listen = sctp_inet_listen, + .shutdown = inet_shutdown, /* Looks harmless. */ + .setsockopt = sock_common_setsockopt, /* IP_SOL IP_OPTION is a problem */ + .getsockopt = sock_common_getsockopt, + .sendmsg = inet_sendmsg, + .recvmsg = sock_common_recvmsg, + .mmap = sock_no_mmap, + .sendpage = sock_no_sendpage, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif }; /* Registration with AF_INET family. */ @@ -880,31 +884,35 @@ static struct net_protocol sctp_protocol /* IPv4 address related functions. 
*/ static struct sctp_af sctp_ipv4_specific = { - .sctp_xmit = sctp_v4_xmit, - .setsockopt = ip_setsockopt, - .getsockopt = ip_getsockopt, - .get_dst = sctp_v4_get_dst, - .get_saddr = sctp_v4_get_saddr, - .copy_addrlist = sctp_v4_copy_addrlist, - .from_skb = sctp_v4_from_skb, - .from_sk = sctp_v4_from_sk, - .to_sk_saddr = sctp_v4_to_sk_saddr, - .to_sk_daddr = sctp_v4_to_sk_daddr, - .from_addr_param= sctp_v4_from_addr_param, - .to_addr_param = sctp_v4_to_addr_param, - .dst_saddr = sctp_v4_dst_saddr, - .cmp_addr = sctp_v4_cmp_addr, - .addr_valid = sctp_v4_addr_valid, - .inaddr_any = sctp_v4_inaddr_any, - .is_any = sctp_v4_is_any, - .available = sctp_v4_available, - .scope = sctp_v4_scope, - .skb_iif = sctp_v4_skb_iif, - .is_ce = sctp_v4_is_ce, - .seq_dump_addr = sctp_v4_seq_dump_addr, - .net_header_len = sizeof(struct iphdr), - .sockaddr_len = sizeof(struct sockaddr_in), - .sa_family = AF_INET, + .sa_family = AF_INET, + .sctp_xmit = sctp_v4_xmit, + .setsockopt = ip_setsockopt, + .getsockopt = ip_getsockopt, + .get_dst = sctp_v4_get_dst, + .get_saddr = sctp_v4_get_saddr, + .copy_addrlist = sctp_v4_copy_addrlist, + .from_skb = sctp_v4_from_skb, + .from_sk = sctp_v4_from_sk, + .to_sk_saddr = sctp_v4_to_sk_saddr, + .to_sk_daddr = sctp_v4_to_sk_daddr, + .from_addr_param = sctp_v4_from_addr_param, + .to_addr_param = sctp_v4_to_addr_param, + .dst_saddr = sctp_v4_dst_saddr, + .cmp_addr = sctp_v4_cmp_addr, + .addr_valid = sctp_v4_addr_valid, + .inaddr_any = sctp_v4_inaddr_any, + .is_any = sctp_v4_is_any, + .available = sctp_v4_available, + .scope = sctp_v4_scope, + .skb_iif = sctp_v4_skb_iif, + .is_ce = sctp_v4_is_ce, + .seq_dump_addr = sctp_v4_seq_dump_addr, + .net_header_len = sizeof(struct iphdr), + .sockaddr_len = sizeof(struct sockaddr_in), +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ip_setsockopt, + .compat_getsockopt = compat_ip_getsockopt, +#endif }; struct sctp_pf *sctp_get_pf_specific(sa_family_t family) { diff -puN net/socket.c~git-net net/socket.c --- 
devel/net/socket.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/socket.c 2006-03-17 23:03:48.000000000 -0800 @@ -68,6 +68,7 @@ #include #include #include +#include #include #include #include @@ -348,8 +349,8 @@ static struct dentry_operations sockfs_d /* * Obtains the first available file descriptor and sets it up for use. * - * This function creates file structure and maps it to fd space - * of current process. On success it returns file descriptor + * These functions create file structures and maps them to fd space + * of the current process. On success it returns file descriptor * and file struct implicitly stored in sock->file. * Note that another thread may close file descriptor before we return * from this function. We use the fact that now we do not refer @@ -362,53 +363,90 @@ static struct dentry_operations sockfs_d * but we take care of internal coherence yet. */ -int sock_map_fd(struct socket *sock) +static int sock_alloc_fd(struct file **filep) { int fd; - struct qstr this; - char name[32]; - - /* - * Find a file descriptor suitable for return to the user. 
- */ fd = get_unused_fd(); - if (fd >= 0) { + if (likely(fd >= 0)) { struct file *file = get_empty_filp(); - if (!file) { + *filep = file; + if (unlikely(!file)) { put_unused_fd(fd); - fd = -ENFILE; - goto out; + return -ENFILE; } + } else + *filep = NULL; + return fd; +} + +static int sock_attach_fd(struct socket *sock, struct file *file) +{ + struct qstr this; + char name[32]; + + this.len = sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino); + this.name = name; + this.hash = SOCK_INODE(sock)->i_ino; - this.len = sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino); - this.name = name; - this.hash = SOCK_INODE(sock)->i_ino; - - file->f_dentry = d_alloc(sock_mnt->mnt_sb->s_root, &this); - if (!file->f_dentry) { - put_filp(file); + file->f_dentry = d_alloc(sock_mnt->mnt_sb->s_root, &this); + if (unlikely(!file->f_dentry)) + return -ENOMEM; + + file->f_dentry->d_op = &sockfs_dentry_operations; + d_add(file->f_dentry, SOCK_INODE(sock)); + file->f_vfsmnt = mntget(sock_mnt); + file->f_mapping = file->f_dentry->d_inode->i_mapping; + + sock->file = file; + file->f_op = SOCK_INODE(sock)->i_fop = &socket_file_ops; + file->f_mode = FMODE_READ | FMODE_WRITE; + file->f_flags = O_RDWR; + file->f_pos = 0; + file->private_data = sock; + + return 0; +} + +int sock_map_fd(struct socket *sock) +{ + struct file *newfile; + int fd = sock_alloc_fd(&newfile); + + if (likely(fd >= 0)) { + int err = sock_attach_fd(sock, newfile); + + if (unlikely(err < 0)) { + put_filp(newfile); put_unused_fd(fd); - fd = -ENOMEM; - goto out; + return err; } - file->f_dentry->d_op = &sockfs_dentry_operations; - d_add(file->f_dentry, SOCK_INODE(sock)); - file->f_vfsmnt = mntget(sock_mnt); - file->f_mapping = file->f_dentry->d_inode->i_mapping; + fd_install(fd, newfile); + } + return fd; +} - sock->file = file; - file->f_op = SOCK_INODE(sock)->i_fop = &socket_file_ops; - file->f_mode = FMODE_READ | FMODE_WRITE; - file->f_flags = O_RDWR; - file->f_pos = 0; - file->private_data = sock; - fd_install(fd, file); +static 
struct socket *sock_from_file(struct file *file, int *err) +{ + struct inode *inode; + struct socket *sock; + + if (file->f_op == &socket_file_ops) + return file->private_data; /* set in sock_map_fd */ + + inode = file->f_dentry->d_inode; + if (!S_ISSOCK(inode->i_mode)) { + *err = -ENOTSOCK; + return NULL; } -out: - return fd; + sock = SOCKET_I(inode); + if (sock->file != file) { + printk(KERN_ERR "socki_lookup: socket file changed!\n"); + sock->file = file; + } + return sock; } /** @@ -427,31 +465,31 @@ out: struct socket *sockfd_lookup(int fd, int *err) { struct file *file; - struct inode *inode; struct socket *sock; - if (!(file = fget(fd))) - { + if (!(file = fget(fd))) { *err = -EBADF; return NULL; } - - if (file->f_op == &socket_file_ops) - return file->private_data; /* set in sock_map_fd */ - - inode = file->f_dentry->d_inode; - if (!S_ISSOCK(inode->i_mode)) { - *err = -ENOTSOCK; + sock = sock_from_file(file, err); + if (!sock) fput(file); - return NULL; - } + return sock; +} - sock = SOCKET_I(inode); - if (sock->file != file) { - printk(KERN_ERR "socki_lookup: socket file changed!\n"); - sock->file = file; +static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed) +{ + struct file *file; + struct socket *sock; + + file = fget_light(fd, fput_needed); + if (file) { + sock = sock_from_file(file, err); + if (sock) + return sock; + fput_light(file, *fput_needed); } - return sock; + return NULL; } /** @@ -789,36 +827,36 @@ static ssize_t sock_aio_write(struct kio * with module unload. 
*/ -static DECLARE_MUTEX(br_ioctl_mutex); +static DEFINE_MUTEX(br_ioctl_mutex); static int (*br_ioctl_hook)(unsigned int cmd, void __user *arg) = NULL; void brioctl_set(int (*hook)(unsigned int, void __user *)) { - down(&br_ioctl_mutex); + mutex_lock(&br_ioctl_mutex); br_ioctl_hook = hook; - up(&br_ioctl_mutex); + mutex_unlock(&br_ioctl_mutex); } EXPORT_SYMBOL(brioctl_set); -static DECLARE_MUTEX(vlan_ioctl_mutex); +static DEFINE_MUTEX(vlan_ioctl_mutex); static int (*vlan_ioctl_hook)(void __user *arg); void vlan_ioctl_set(int (*hook)(void __user *)) { - down(&vlan_ioctl_mutex); + mutex_lock(&vlan_ioctl_mutex); vlan_ioctl_hook = hook; - up(&vlan_ioctl_mutex); + mutex_unlock(&vlan_ioctl_mutex); } EXPORT_SYMBOL(vlan_ioctl_set); -static DECLARE_MUTEX(dlci_ioctl_mutex); +static DEFINE_MUTEX(dlci_ioctl_mutex); static int (*dlci_ioctl_hook)(unsigned int, void __user *); void dlci_ioctl_set(int (*hook)(unsigned int, void __user *)) { - down(&dlci_ioctl_mutex); + mutex_lock(&dlci_ioctl_mutex); dlci_ioctl_hook = hook; - up(&dlci_ioctl_mutex); + mutex_unlock(&dlci_ioctl_mutex); } EXPORT_SYMBOL(dlci_ioctl_set); @@ -862,10 +900,10 @@ static long sock_ioctl(struct file *file if (!br_ioctl_hook) request_module("bridge"); - down(&br_ioctl_mutex); + mutex_lock(&br_ioctl_mutex); if (br_ioctl_hook) err = br_ioctl_hook(cmd, argp); - up(&br_ioctl_mutex); + mutex_unlock(&br_ioctl_mutex); break; case SIOCGIFVLAN: case SIOCSIFVLAN: @@ -873,10 +911,10 @@ static long sock_ioctl(struct file *file if (!vlan_ioctl_hook) request_module("8021q"); - down(&vlan_ioctl_mutex); + mutex_lock(&vlan_ioctl_mutex); if (vlan_ioctl_hook) err = vlan_ioctl_hook(argp); - up(&vlan_ioctl_mutex); + mutex_unlock(&vlan_ioctl_mutex); break; case SIOCGIFDIVERT: case SIOCSIFDIVERT: @@ -890,9 +928,9 @@ static long sock_ioctl(struct file *file request_module("dlci"); if (dlci_ioctl_hook) { - down(&dlci_ioctl_mutex); + mutex_lock(&dlci_ioctl_mutex); err = dlci_ioctl_hook(cmd, argp); - up(&dlci_ioctl_mutex); + 
mutex_unlock(&dlci_ioctl_mutex); } break; default: @@ -1286,19 +1324,17 @@ asmlinkage long sys_bind(int fd, struct { struct socket *sock; char address[MAX_SOCK_ADDR]; - int err; + int err, fput_needed; - if((sock = sockfd_lookup(fd,&err))!=NULL) + if((sock = sockfd_lookup_light(fd, &err, &fput_needed))!=NULL) { if((err=move_addr_to_kernel(umyaddr,addrlen,address))>=0) { err = security_socket_bind(sock, (struct sockaddr *)address, addrlen); - if (err) { - sockfd_put(sock); - return err; - } - err = sock->ops->bind(sock, (struct sockaddr *)address, addrlen); + if (!err) + err = sock->ops->bind(sock, + (struct sockaddr *)address, addrlen); } - sockfd_put(sock); + fput_light(sock->file, fput_needed); } return err; } @@ -1315,20 +1351,17 @@ int sysctl_somaxconn = SOMAXCONN; asmlinkage long sys_listen(int fd, int backlog) { struct socket *sock; - int err; + int err, fput_needed; - if ((sock = sockfd_lookup(fd, &err)) != NULL) { + if ((sock = sockfd_lookup_light(fd, &err, &fput_needed)) != NULL) { if ((unsigned) backlog > sysctl_somaxconn) backlog = sysctl_somaxconn; err = security_socket_listen(sock, backlog); - if (err) { - sockfd_put(sock); - return err; - } + if (!err) + err = sock->ops->listen(sock, backlog); - err=sock->ops->listen(sock, backlog); - sockfd_put(sock); + fput_light(sock->file, fput_needed); } return err; } @@ -1349,10 +1382,11 @@ asmlinkage long sys_listen(int fd, int b asmlinkage long sys_accept(int fd, struct sockaddr __user *upeer_sockaddr, int __user *upeer_addrlen) { struct socket *sock, *newsock; - int err, len; + struct file *newfile; + int err, len, newfd, fput_needed; char address[MAX_SOCK_ADDR]; - sock = sockfd_lookup(fd, &err); + sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) goto out; @@ -1369,35 +1403,48 @@ asmlinkage long sys_accept(int fd, struc */ __module_get(newsock->ops->owner); + newfd = sock_alloc_fd(&newfile); + if (unlikely(newfd < 0)) { + err = newfd; + goto out_release; + } + + err = sock_attach_fd(newsock, 
newfile); + if (err < 0) + goto out_fd; + err = security_socket_accept(sock, newsock); if (err) - goto out_release; + goto out_fd; err = sock->ops->accept(sock, newsock, sock->file->f_flags); if (err < 0) - goto out_release; + goto out_fd; if (upeer_sockaddr) { if(newsock->ops->getname(newsock, (struct sockaddr *)address, &len, 2)<0) { err = -ECONNABORTED; - goto out_release; + goto out_fd; } err = move_addr_to_user(address, len, upeer_sockaddr, upeer_addrlen); if (err < 0) - goto out_release; + goto out_fd; } /* File flags are not inherited via accept() unlike another OSes. */ - if ((err = sock_map_fd(newsock)) < 0) - goto out_release; + fd_install(newfd, newfile); + err = newfd; security_socket_post_accept(sock, newsock); out_put: - sockfd_put(sock); + fput_light(sock->file, fput_needed); out: return err; +out_fd: + put_filp(newfile); + put_unused_fd(newfd); out_release: sock_release(newsock); goto out_put; @@ -1420,9 +1467,9 @@ asmlinkage long sys_connect(int fd, stru { struct socket *sock; char address[MAX_SOCK_ADDR]; - int err; + int err, fput_needed; - sock = sockfd_lookup(fd, &err); + sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) goto out; err = move_addr_to_kernel(uservaddr, addrlen, address); @@ -1436,7 +1483,7 @@ asmlinkage long sys_connect(int fd, stru err = sock->ops->connect(sock, (struct sockaddr *) address, addrlen, sock->file->f_flags); out_put: - sockfd_put(sock); + fput_light(sock->file, fput_needed); out: return err; } @@ -1450,9 +1497,9 @@ asmlinkage long sys_getsockname(int fd, { struct socket *sock; char address[MAX_SOCK_ADDR]; - int len, err; + int len, err, fput_needed; - sock = sockfd_lookup(fd, &err); + sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) goto out; @@ -1466,7 +1513,7 @@ asmlinkage long sys_getsockname(int fd, err = move_addr_to_user(address, len, usockaddr, usockaddr_len); out_put: - sockfd_put(sock); + fput_light(sock->file, fput_needed); out: return err; } @@ -1480,20 +1527,19 @@ asmlinkage 
long sys_getpeername(int fd, { struct socket *sock; char address[MAX_SOCK_ADDR]; - int len, err; + int len, err, fput_needed; - if ((sock = sockfd_lookup(fd, &err))!=NULL) - { + if ((sock = sockfd_lookup_light(fd, &err, &fput_needed)) != NULL) { err = security_socket_getpeername(sock); if (err) { - sockfd_put(sock); + fput_light(sock->file, fput_needed); return err; } err = sock->ops->getname(sock, (struct sockaddr *)address, &len, 1); if (!err) err=move_addr_to_user(address,len, usockaddr, usockaddr_len); - sockfd_put(sock); + fput_light(sock->file, fput_needed); } return err; } @@ -1512,10 +1558,16 @@ asmlinkage long sys_sendto(int fd, void int err; struct msghdr msg; struct iovec iov; - - sock = sockfd_lookup(fd, &err); + int fput_needed; + struct file *sock_file; + + sock_file = fget_light(fd, &fput_needed); + if (!sock_file) + return -EBADF; + + sock = sock_from_file(sock_file, &err); if (!sock) - goto out; + goto out_put; iov.iov_base=buff; iov.iov_len=len; msg.msg_name=NULL; @@ -1524,8 +1576,7 @@ asmlinkage long sys_sendto(int fd, void msg.msg_control=NULL; msg.msg_controllen=0; msg.msg_namelen=0; - if(addr) - { + if (addr) { err = move_addr_to_kernel(addr, addr_len, address); if (err < 0) goto out_put; @@ -1538,8 +1589,7 @@ asmlinkage long sys_sendto(int fd, void err = sock_sendmsg(sock, &msg, len); out_put: - sockfd_put(sock); -out: + fput_light(sock_file, fput_needed); return err; } @@ -1566,8 +1616,14 @@ asmlinkage long sys_recvfrom(int fd, voi struct msghdr msg; char address[MAX_SOCK_ADDR]; int err,err2; + struct file *sock_file; + int fput_needed; - sock = sockfd_lookup(fd, &err); + sock_file = fget_light(fd, &fput_needed); + if (!sock_file) + return -EBADF; + + sock = sock_from_file(sock_file, &err); if (!sock) goto out; @@ -1589,8 +1645,8 @@ asmlinkage long sys_recvfrom(int fd, voi if(err2<0) err=err2; } - sockfd_put(sock); out: + fput_light(sock_file, fput_needed); return err; } @@ -1610,25 +1666,24 @@ asmlinkage long sys_recv(int fd, void __ 
asmlinkage long sys_setsockopt(int fd, int level, int optname, char __user *optval, int optlen) { - int err; + int err, fput_needed; struct socket *sock; if (optlen < 0) return -EINVAL; - if ((sock = sockfd_lookup(fd, &err))!=NULL) + if ((sock = sockfd_lookup_light(fd, &err, &fput_needed)) != NULL) { err = security_socket_setsockopt(sock,level,optname); - if (err) { - sockfd_put(sock); - return err; - } + if (err) + goto out_put; if (level == SOL_SOCKET) err=sock_setsockopt(sock,level,optname,optval,optlen); else err=sock->ops->setsockopt(sock, level, optname, optval, optlen); - sockfd_put(sock); +out_put: + fput_light(sock->file, fput_needed); } return err; } @@ -1640,23 +1695,20 @@ asmlinkage long sys_setsockopt(int fd, i asmlinkage long sys_getsockopt(int fd, int level, int optname, char __user *optval, int __user *optlen) { - int err; + int err, fput_needed; struct socket *sock; - if ((sock = sockfd_lookup(fd, &err))!=NULL) - { - err = security_socket_getsockopt(sock, level, - optname); - if (err) { - sockfd_put(sock); - return err; - } + if ((sock = sockfd_lookup_light(fd, &err, &fput_needed)) != NULL) { + err = security_socket_getsockopt(sock, level, optname); + if (err) + goto out_put; if (level == SOL_SOCKET) err=sock_getsockopt(sock,level,optname,optval,optlen); else err=sock->ops->getsockopt(sock, level, optname, optval, optlen); - sockfd_put(sock); +out_put: + fput_light(sock->file, fput_needed); } return err; } @@ -1668,19 +1720,15 @@ asmlinkage long sys_getsockopt(int fd, i asmlinkage long sys_shutdown(int fd, int how) { - int err; + int err, fput_needed; struct socket *sock; - if ((sock = sockfd_lookup(fd, &err))!=NULL) + if ((sock = sockfd_lookup_light(fd, &err, &fput_needed))!=NULL) { err = security_socket_shutdown(sock, how); - if (err) { - sockfd_put(sock); - return err; - } - - err=sock->ops->shutdown(sock, how); - sockfd_put(sock); + if (!err) + err = sock->ops->shutdown(sock, how); + fput_light(sock->file, fput_needed); } return err; } @@ 
-1709,6 +1757,7 @@ asmlinkage long sys_sendmsg(int fd, stru unsigned char *ctl_buf = ctl; struct msghdr msg_sys; int err, ctl_len, iov_size, total_len; + int fput_needed; err = -EFAULT; if (MSG_CMSG_COMPAT & flags) { @@ -1717,7 +1766,7 @@ asmlinkage long sys_sendmsg(int fd, stru } else if (copy_from_user(&msg_sys, msg, sizeof(struct msghdr))) return -EFAULT; - sock = sockfd_lookup(fd, &err); + sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) goto out; @@ -1785,7 +1834,7 @@ out_freeiov: if (iov != iovstack) sock_kfree_s(sock->sk, iov, iov_size); out_put: - sockfd_put(sock); + fput_light(sock->file, fput_needed); out: return err; } @@ -1803,6 +1852,7 @@ asmlinkage long sys_recvmsg(int fd, stru struct msghdr msg_sys; unsigned long cmsg_ptr; int err, iov_size, total_len, len; + int fput_needed; /* kernel mode address */ char addr[MAX_SOCK_ADDR]; @@ -1818,7 +1868,7 @@ asmlinkage long sys_recvmsg(int fd, stru if (copy_from_user(&msg_sys,msg,sizeof(struct msghdr))) return -EFAULT; - sock = sockfd_lookup(fd, &err); + sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) goto out; @@ -1885,7 +1935,7 @@ out_freeiov: if (iov != iovstack) sock_kfree_s(sock->sk, iov, iov_size); out_put: - sockfd_put(sock); + fput_light(sock->file, fput_needed); out: return err; } diff -puN net/sunrpc/cache.c~git-net net/sunrpc/cache.c --- devel/net/sunrpc/cache.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sunrpc/cache.c 2006-03-17 23:03:48.000000000 -0800 @@ -26,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -532,7 +533,7 @@ void cache_clean_deferred(void *owner) */ static DEFINE_SPINLOCK(queue_lock); -static DECLARE_MUTEX(queue_io_sem); +static DEFINE_MUTEX(queue_io_mutex); struct cache_queue { struct list_head list; @@ -561,7 +562,7 @@ cache_read(struct file *filp, char __use if (count == 0) return 0; - down(&queue_io_sem); /* protect against multiple concurrent + mutex_lock(&queue_io_mutex); /* protect against multiple 
concurrent * readers on this file */ again: spin_lock(&queue_lock); @@ -574,7 +575,7 @@ cache_read(struct file *filp, char __use } if (rp->q.list.next == &cd->queue) { spin_unlock(&queue_lock); - up(&queue_io_sem); + mutex_unlock(&queue_io_mutex); BUG_ON(rp->offset); return 0; } @@ -621,11 +622,11 @@ cache_read(struct file *filp, char __use } if (err == -EAGAIN) goto again; - up(&queue_io_sem); + mutex_unlock(&queue_io_mutex); return err ? err : count; } -static char write_buf[8192]; /* protected by queue_io_sem */ +static char write_buf[8192]; /* protected by queue_io_mutex */ static ssize_t cache_write(struct file *filp, const char __user *buf, size_t count, @@ -639,10 +640,10 @@ cache_write(struct file *filp, const cha if (count >= sizeof(write_buf)) return -EINVAL; - down(&queue_io_sem); + mutex_lock(&queue_io_mutex); if (copy_from_user(write_buf, buf, count)) { - up(&queue_io_sem); + mutex_unlock(&queue_io_mutex); return -EFAULT; } write_buf[count] = '\0'; @@ -651,7 +652,7 @@ cache_write(struct file *filp, const cha else err = -EINVAL; - up(&queue_io_sem); + mutex_unlock(&queue_io_mutex); return err ? 
err : count; } diff -puN net/sunrpc/sched.c~git-net net/sunrpc/sched.c --- devel/net/sunrpc/sched.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sunrpc/sched.c 2006-03-17 23:03:48.000000000 -0800 @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -62,7 +63,7 @@ static LIST_HEAD(all_tasks); /* * rpciod-related stuff */ -static DECLARE_MUTEX(rpciod_sema); +static DEFINE_MUTEX(rpciod_mutex); static unsigned int rpciod_users; static struct workqueue_struct *rpciod_workqueue; @@ -1047,7 +1048,7 @@ rpciod_up(void) struct workqueue_struct *wq; int error = 0; - down(&rpciod_sema); + mutex_lock(&rpciod_mutex); dprintk("rpciod_up: users %d\n", rpciod_users); rpciod_users++; if (rpciod_workqueue) @@ -1070,14 +1071,14 @@ rpciod_up(void) rpciod_workqueue = wq; error = 0; out: - up(&rpciod_sema); + mutex_unlock(&rpciod_mutex); return error; } void rpciod_down(void) { - down(&rpciod_sema); + mutex_lock(&rpciod_mutex); dprintk("rpciod_down sema %d\n", rpciod_users); if (rpciod_users) { if (--rpciod_users) @@ -1094,7 +1095,7 @@ rpciod_down(void) destroy_workqueue(rpciod_workqueue); rpciod_workqueue = NULL; out: - up(&rpciod_sema); + mutex_unlock(&rpciod_mutex); } #ifdef RPC_DEBUG diff -puN net/sunrpc/svcsock.c~git-net net/sunrpc/svcsock.c --- devel/net/sunrpc/svcsock.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/sunrpc/svcsock.c 2006-03-17 23:03:48.000000000 -0800 @@ -1296,13 +1296,13 @@ svc_send(struct svc_rqst *rqstp) xb->page_len + xb->tail[0].iov_len; - /* Grab svsk->sk_sem to serialize outgoing data. */ - down(&svsk->sk_sem); + /* Grab svsk->sk_mutex to serialize outgoing data. 
*/ + mutex_lock(&svsk->sk_mutex); if (test_bit(SK_DEAD, &svsk->sk_flags)) len = -ENOTCONN; else len = svsk->sk_sendto(rqstp); - up(&svsk->sk_sem); + mutex_unlock(&svsk->sk_mutex); svc_sock_release(rqstp); if (len == -ECONNREFUSED || len == -ENOTCONN || len == -EAGAIN) @@ -1351,7 +1351,7 @@ svc_setup_socket(struct svc_serv *serv, svsk->sk_lastrecv = get_seconds(); INIT_LIST_HEAD(&svsk->sk_deferred); INIT_LIST_HEAD(&svsk->sk_ready); - sema_init(&svsk->sk_sem, 1); + mutex_init(&svsk->sk_mutex); /* Initialize the socket */ if (sock->type == SOCK_DGRAM) diff -puN net/tipc/bcast.c~git-net net/tipc/bcast.c --- devel/net/tipc/bcast.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/bcast.c 2006-03-17 23:03:48.000000000 -0800 @@ -107,22 +107,22 @@ static spinlock_t bc_lock = SPIN_LOCK_UN char tipc_bclink_name[] = "multicast-link"; -static inline u32 buf_seqno(struct sk_buff *buf) +static u32 buf_seqno(struct sk_buff *buf) { return msg_seqno(buf_msg(buf)); } -static inline u32 bcbuf_acks(struct sk_buff *buf) +static u32 bcbuf_acks(struct sk_buff *buf) { return (u32)(unsigned long)TIPC_SKB_CB(buf)->handle; } -static inline void bcbuf_set_acks(struct sk_buff *buf, u32 acks) +static void bcbuf_set_acks(struct sk_buff *buf, u32 acks) { TIPC_SKB_CB(buf)->handle = (void *)(unsigned long)acks; } -static inline void bcbuf_decr_acks(struct sk_buff *buf) +static void bcbuf_decr_acks(struct sk_buff *buf) { bcbuf_set_acks(buf, bcbuf_acks(buf) - 1); } @@ -134,7 +134,7 @@ static inline void bcbuf_decr_acks(struc * Called with 'node' locked, bc_lock unlocked */ -static inline void bclink_set_gap(struct node *n_ptr) +static void bclink_set_gap(struct node *n_ptr) { struct sk_buff *buf = n_ptr->bclink.deferred_head; @@ -154,7 +154,7 @@ static inline void bclink_set_gap(struct * distribute NACKs, but tries to use the same spacing (divide by 16). 
*/ -static inline int bclink_ack_allowed(u32 n) +static int bclink_ack_allowed(u32 n) { return((n % TIPC_MIN_LINK_WIN) == tipc_own_tag); } @@ -271,7 +271,7 @@ static void bclink_send_nack(struct node msg_set_bcgap_to(msg, n_ptr->bclink.gap_to); msg_set_bcast_tag(msg, tipc_own_tag); - if (tipc_bearer_send(&bcbearer->bearer, buf, 0)) { + if (tipc_bearer_send(&bcbearer->bearer, buf, NULL)) { bcl->stats.sent_nacks++; buf_discard(buf); } else { @@ -314,7 +314,7 @@ void tipc_bclink_check_gap(struct node * * Only tipc_net_lock set. */ -void tipc_bclink_peek_nack(u32 dest, u32 sender_tag, u32 gap_after, u32 gap_to) +static void tipc_bclink_peek_nack(u32 dest, u32 sender_tag, u32 gap_after, u32 gap_to) { struct node *n_ptr = tipc_node_find(dest); u32 my_after, my_to; @@ -425,9 +425,9 @@ void tipc_bclink_recv_pkt(struct sk_buff msg_bcgap_to(msg)); } else { tipc_bclink_peek_nack(msg_destnode(msg), - msg_bcast_tag(msg), - msg_bcgap_after(msg), - msg_bcgap_to(msg)); + msg_bcast_tag(msg), + msg_bcgap_after(msg), + msg_bcgap_to(msg)); } buf_discard(buf); return; @@ -525,16 +525,18 @@ u32 tipc_bclink_acks_missing(struct node * Returns 0 if packet sent successfully, non-zero if not */ -int tipc_bcbearer_send(struct sk_buff *buf, - struct tipc_bearer *unused1, - struct tipc_media_addr *unused2) +static int tipc_bcbearer_send(struct sk_buff *buf, + struct tipc_bearer *unused1, + struct tipc_media_addr *unused2) { static int send_count = 0; - struct node_map remains; - struct node_map remains_new; + struct node_map *remains; + struct node_map *remains_new; + struct node_map *remains_tmp; int bp_index; int swap_time; + int err; /* Prepare buffer for broadcasting (if first time trying to send it) */ @@ -555,7 +557,9 @@ int tipc_bcbearer_send(struct sk_buff *b /* Send buffer over bearers until all targets reached */ - remains = tipc_cltr_bcast_nodes; + remains = kmalloc(sizeof(struct node_map), GFP_ATOMIC); + remains_new = kmalloc(sizeof(struct node_map), GFP_ATOMIC); + *remains = 
tipc_cltr_bcast_nodes; for (bp_index = 0; bp_index < MAX_BEARERS; bp_index++) { struct bearer *p = bcbearer->bpairs[bp_index].primary; @@ -564,8 +568,8 @@ int tipc_bcbearer_send(struct sk_buff *b if (!p) break; /* no more bearers to try */ - tipc_nmap_diff(&remains, &p->nodes, &remains_new); - if (remains_new.count == remains.count) + tipc_nmap_diff(remains, &p->nodes, remains_new); + if (remains_new->count == remains->count) continue; /* bearer pair doesn't add anything */ if (!p->publ.blocked && @@ -583,17 +587,27 @@ swap: bcbearer->bpairs[bp_index].primary = s; bcbearer->bpairs[bp_index].secondary = p; update: - if (remains_new.count == 0) - return TIPC_OK; + if (remains_new->count == 0) { + err = TIPC_OK; + goto out; + } + /* swap map */ + remains_tmp = remains; remains = remains_new; + remains_new = remains_tmp; } /* Unable to reach all targets */ bcbearer->bearer.publ.blocked = 1; bcl->stats.bearer_congs++; - return ~TIPC_OK; + err = ~TIPC_OK; + + out: + kfree(remains_new); + kfree(remains); + return err; } /** diff -puN net/tipc/bearer.c~git-net net/tipc/bearer.c --- devel/net/tipc/bearer.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/bearer.c 2006-03-17 23:03:48.000000000 -0800 @@ -45,10 +45,10 @@ #define MAX_ADDR_STR 32 -static struct media *media_list = 0; +static struct media *media_list = NULL; static u32 media_count = 0; -struct bearer *tipc_bearers = 0; +struct bearer *tipc_bearers = NULL; /** * media_name_valid - validate media name @@ -79,7 +79,7 @@ static struct media *media_find(const ch if (!strcmp(m_ptr->name, name)) return m_ptr; } - return 0; + return NULL; } /** @@ -287,7 +287,7 @@ static struct bearer *bearer_find(const if (b_ptr->active && (!strcmp(b_ptr->publ.name, name))) return b_ptr; } - return 0; + return NULL; } /** @@ -307,7 +307,7 @@ struct bearer *tipc_bearer_find_interfac if (!strcmp(b_if_name, if_name)) return b_ptr; } - return 0; + return NULL; } /** @@ -569,7 +569,7 @@ failed: int tipc_block_bearer(const 
char *name) { - struct bearer *b_ptr = 0; + struct bearer *b_ptr = NULL; struct link *l_ptr; struct link *temp_l_ptr; @@ -666,8 +666,8 @@ int tipc_bearer_init(void) } else { kfree(tipc_bearers); kfree(media_list); - tipc_bearers = 0; - media_list = 0; + tipc_bearers = NULL; + media_list = NULL; res = -ENOMEM; } write_unlock_bh(&tipc_net_lock); @@ -691,8 +691,8 @@ void tipc_bearer_stop(void) } kfree(tipc_bearers); kfree(media_list); - tipc_bearers = 0; - media_list = 0; + tipc_bearers = NULL; + media_list = NULL; media_count = 0; } diff -puN net/tipc/cluster.c~git-net net/tipc/cluster.c --- devel/net/tipc/cluster.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/cluster.c 2006-03-17 23:03:48.000000000 -0800 @@ -44,11 +44,11 @@ #include "msg.h" #include "bearer.h" -void tipc_cltr_multicast(struct cluster *c_ptr, struct sk_buff *buf, - u32 lower, u32 upper); -struct sk_buff *tipc_cltr_prepare_routing_msg(u32 data_size, u32 dest); +static void tipc_cltr_multicast(struct cluster *c_ptr, struct sk_buff *buf, + u32 lower, u32 upper); +static struct sk_buff *tipc_cltr_prepare_routing_msg(u32 data_size, u32 dest); -struct node **tipc_local_nodes = 0; +struct node **tipc_local_nodes = NULL; struct node_map tipc_cltr_bcast_nodes = {0,{0,}}; u32 tipc_highest_allowed_slave = 0; @@ -61,7 +61,7 @@ struct cluster *tipc_cltr_create(u32 add c_ptr = (struct cluster *)kmalloc(sizeof(*c_ptr), GFP_ATOMIC); if (c_ptr == NULL) - return 0; + return NULL; memset(c_ptr, 0, sizeof(*c_ptr)); c_ptr->addr = tipc_addr(tipc_zone(addr), tipc_cluster(addr), 0); @@ -73,7 +73,7 @@ struct cluster *tipc_cltr_create(u32 add c_ptr->nodes = (struct node **)kmalloc(alloc, GFP_ATOMIC); if (c_ptr->nodes == NULL) { kfree(c_ptr); - return 0; + return NULL; } memset(c_ptr->nodes, 0, alloc); if (in_own_cluster(addr)) @@ -91,7 +91,7 @@ struct cluster *tipc_cltr_create(u32 add } else { kfree(c_ptr); - c_ptr = 0; + c_ptr = NULL; } return c_ptr; @@ -204,7 +204,7 @@ struct node 
*tipc_cltr_select_node(struc assert(!in_own_cluster(c_ptr->addr)); if (!c_ptr->highest_node) - return 0; + return NULL; /* Start entry must be random */ while (mask > c_ptr->highest_node) { @@ -222,14 +222,14 @@ struct node *tipc_cltr_select_node(struc if (tipc_node_has_active_links(c_ptr->nodes[n_num])) return c_ptr->nodes[n_num]; } - return 0; + return NULL; } /* * Routing table management: See description in node.c */ -struct sk_buff *tipc_cltr_prepare_routing_msg(u32 data_size, u32 dest) +static struct sk_buff *tipc_cltr_prepare_routing_msg(u32 data_size, u32 dest) { u32 size = INT_H_SIZE + data_size; struct sk_buff *buf = buf_acquire(size); @@ -495,7 +495,7 @@ void tipc_cltr_remove_as_router(struct c * tipc_cltr_multicast - multicast message to local nodes */ -void tipc_cltr_multicast(struct cluster *c_ptr, struct sk_buff *buf, +static void tipc_cltr_multicast(struct cluster *c_ptr, struct sk_buff *buf, u32 lower, u32 upper) { struct sk_buff *buf_copy; diff -puN net/tipc/cluster.h~git-net net/tipc/cluster.h --- devel/net/tipc/cluster.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/cluster.h 2006-03-17 23:03:48.000000000 -0800 @@ -86,7 +86,7 @@ static inline struct cluster *tipc_cltr_ if (z_ptr) return z_ptr->clusters[1]; - return 0; + return NULL; } #endif diff -puN net/tipc/config.c~git-net net/tipc/config.c --- devel/net/tipc/config.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/config.c 2006-03-17 23:03:48.000000000 -0800 @@ -683,11 +683,11 @@ int tipc_cfg_init(void) memset(&mng, 0, sizeof(mng)); INIT_LIST_HEAD(&mng.link_subscribers); - res = tipc_attach(&mng.user_ref, 0, 0); + res = tipc_attach(&mng.user_ref, NULL, NULL); if (res) goto failed; - res = tipc_createport(mng.user_ref, 0, TIPC_CRITICAL_IMPORTANCE, + res = tipc_createport(mng.user_ref, NULL, TIPC_CRITICAL_IMPORTANCE, NULL, NULL, NULL, NULL, cfg_named_msg_event, NULL, NULL, &mng.port_ref); diff -puN net/tipc/dbg.c~git-net net/tipc/dbg.c --- 
devel/net/tipc/dbg.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/dbg.c 2006-03-17 23:03:48.000000000 -0800 @@ -81,7 +81,7 @@ void tipc_printbuf_init(struct print_buf pb->crs = pb->buf = raw; pb->size = sz; - pb->next = 0; + pb->next = NULL; pb->buf[0] = 0; pb->buf[sz-1] = ~0; } @@ -216,7 +216,7 @@ void tipc_printf(struct print_buf *pb, c } } pb_next = pb->next; - pb->next = 0; + pb->next = NULL; pb = pb_next; } spin_unlock_bh(&print_lock); diff -puN net/tipc/discover.c~git-net net/tipc/discover.c --- devel/net/tipc/discover.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/discover.c 2006-03-17 23:03:48.000000000 -0800 @@ -110,10 +110,10 @@ void tipc_disc_link_event(u32 addr, char * @b_ptr: ptr to bearer issuing message */ -struct sk_buff *tipc_disc_init_msg(u32 type, - u32 req_links, - u32 dest_domain, - struct bearer *b_ptr) +static struct sk_buff *tipc_disc_init_msg(u32 type, + u32 req_links, + u32 dest_domain, + struct bearer *b_ptr) { struct sk_buff *buf = buf_acquire(DSC_H_SIZE); struct tipc_msg *msg; diff -puN net/tipc/eth_media.c~git-net net/tipc/eth_media.c --- devel/net/tipc/eth_media.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/eth_media.c 2006-03-17 23:03:48.000000000 -0800 @@ -169,7 +169,7 @@ static int enable_bearer(struct tipc_bea static void disable_bearer(struct tipc_bearer *tb_ptr) { - ((struct eth_bearer *)tb_ptr->usr_handle)->bearer = 0; + ((struct eth_bearer *)tb_ptr->usr_handle)->bearer = NULL; } /** @@ -285,7 +285,7 @@ void tipc_eth_media_stop(void) for (i = 0; i < MAX_ETH_BEARERS ; i++) { if (eth_bearers[i].bearer) { eth_bearers[i].bearer->blocked = 1; - eth_bearers[i].bearer = 0; + eth_bearers[i].bearer = NULL; } if (eth_bearers[i].dev) { dev_remove_pack(&eth_bearers[i].tipc_packet_type); diff -puN net/tipc/link.c~git-net net/tipc/link.c --- devel/net/tipc/link.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/link.c 2006-03-17 23:03:48.000000000 -0800 @@ 
-157,13 +157,13 @@ static void link_print(struct link *l_pt } \ } while (0) -static inline void dbg_print_link(struct link *l_ptr, const char *str) +static void dbg_print_link(struct link *l_ptr, const char *str) { if (DBG_OUTPUT) link_print(l_ptr, DBG_OUTPUT, str); } -static inline void dbg_print_buf_chain(struct sk_buff *root_buf) +static void dbg_print_buf_chain(struct sk_buff *root_buf) { if (DBG_OUTPUT) { struct sk_buff *buf = root_buf; @@ -176,50 +176,50 @@ static inline void dbg_print_buf_chain(s } /* - * Simple inlined link routines + * Simple link routines */ -static inline unsigned int align(unsigned int i) +static unsigned int align(unsigned int i) { return (i + 3) & ~3u; } -static inline int link_working_working(struct link *l_ptr) +static int link_working_working(struct link *l_ptr) { return (l_ptr->state == WORKING_WORKING); } -static inline int link_working_unknown(struct link *l_ptr) +static int link_working_unknown(struct link *l_ptr) { return (l_ptr->state == WORKING_UNKNOWN); } -static inline int link_reset_unknown(struct link *l_ptr) +static int link_reset_unknown(struct link *l_ptr) { return (l_ptr->state == RESET_UNKNOWN); } -static inline int link_reset_reset(struct link *l_ptr) +static int link_reset_reset(struct link *l_ptr) { return (l_ptr->state == RESET_RESET); } -static inline int link_blocked(struct link *l_ptr) +static int link_blocked(struct link *l_ptr) { return (l_ptr->exp_msg_count || l_ptr->blocked); } -static inline int link_congested(struct link *l_ptr) +static int link_congested(struct link *l_ptr) { return (l_ptr->out_queue_size >= l_ptr->queue_limit[0]); } -static inline u32 link_max_pkt(struct link *l_ptr) +static u32 link_max_pkt(struct link *l_ptr) { return l_ptr->max_pkt; } -static inline void link_init_max_pkt(struct link *l_ptr) +static void link_init_max_pkt(struct link *l_ptr) { u32 max_pkt; @@ -236,20 +236,20 @@ static inline void link_init_max_pkt(str l_ptr->max_pkt_probes = 0; } -static inline u32 
link_next_sent(struct link *l_ptr) +static u32 link_next_sent(struct link *l_ptr) { if (l_ptr->next_out) return msg_seqno(buf_msg(l_ptr->next_out)); return mod(l_ptr->next_out_no); } -static inline u32 link_last_sent(struct link *l_ptr) +static u32 link_last_sent(struct link *l_ptr) { return mod(link_next_sent(l_ptr) - 1); } /* - * Simple non-inlined link routines (i.e. referenced outside this file) + * Simple non-static link routines (i.e. referenced outside this file) */ int tipc_link_is_up(struct link *l_ptr) @@ -396,7 +396,7 @@ static void link_timeout(struct link *l_ tipc_node_unlock(l_ptr->owner); } -static inline void link_set_timer(struct link *l_ptr, u32 time) +static void link_set_timer(struct link *l_ptr, u32 time) { k_start_timer(&l_ptr->timer, time); } @@ -573,7 +573,7 @@ void tipc_link_wakeup_ports(struct link if (win <= 0) break; list_del_init(&p_ptr->wait_list); - p_ptr->congested_link = 0; + p_ptr->congested_link = NULL; assert(p_ptr->wakeup); spin_lock_bh(p_ptr->publ.lock); p_ptr->publ.congested = 0; @@ -1004,9 +1004,9 @@ static int link_bundle_buf(struct link * return 1; } -static inline void link_add_to_outqueue(struct link *l_ptr, - struct sk_buff *buf, - struct tipc_msg *msg) +static void link_add_to_outqueue(struct link *l_ptr, + struct sk_buff *buf, + struct tipc_msg *msg) { u32 ack = mod(l_ptr->next_in_no - 1); u32 seqno = mod(l_ptr->next_out_no++); @@ -1156,8 +1156,8 @@ int tipc_link_send(struct sk_buff *buf, * Link is locked. Returns user data length. 
*/ -static inline int link_send_buf_fast(struct link *l_ptr, struct sk_buff *buf, - u32 *used_max_pkt) +static int link_send_buf_fast(struct link *l_ptr, struct sk_buff *buf, + u32 *used_max_pkt) { struct tipc_msg *msg = buf_msg(buf); int res = msg_data_sz(msg); @@ -1355,7 +1355,7 @@ again: fragm_crs = 0; fragm_rest = 0; sect_rest = 0; - sect_crs = 0; + sect_crs = NULL; curr_sect = -1; /* Prepare reusable fragment header: */ @@ -1549,7 +1549,7 @@ u32 tipc_link_push_packet(struct link *l msg_dbg(buf_msg(buf), ">DEF-PROT>"); l_ptr->unacked_window = 0; buf_discard(buf); - l_ptr->proto_msg_queue = 0; + l_ptr->proto_msg_queue = NULL; return TIPC_OK; } else { msg_dbg(buf_msg(buf), "|>DEF-PROT>"); @@ -1860,7 +1860,7 @@ u32 tipc_link_defer_pkt(struct sk_buff * struct sk_buff **tail, struct sk_buff *buf) { - struct sk_buff *prev = 0; + struct sk_buff *prev = NULL; struct sk_buff *crs = *head; u32 seq_no = msg_seqno(buf_msg(buf)); @@ -1953,7 +1953,7 @@ static void link_handle_out_of_seq_msg(s void tipc_link_send_proto_msg(struct link *l_ptr, u32 msg_typ, int probe_msg, u32 gap, u32 tolerance, u32 priority, u32 ack_mtu) { - struct sk_buff *buf = 0; + struct sk_buff *buf = NULL; struct tipc_msg *msg = l_ptr->pmsg; u32 msg_size = sizeof(l_ptr->proto_msg); @@ -2426,7 +2426,7 @@ static int link_recv_changeover_msg(stru } } exit: - *buf = 0; + *buf = NULL; buf_discard(tunnel_buf); return 0; } @@ -2539,42 +2539,37 @@ exit: * pending message. This makes dynamic memory allocation unecessary. 
*/ -static inline u32 get_long_msg_seqno(struct sk_buff *buf) -{ - return msg_seqno(buf_msg(buf)); -} - -static inline void set_long_msg_seqno(struct sk_buff *buf, u32 seqno) +static void set_long_msg_seqno(struct sk_buff *buf, u32 seqno) { msg_set_seqno(buf_msg(buf), seqno); } -static inline u32 get_fragm_size(struct sk_buff *buf) +static u32 get_fragm_size(struct sk_buff *buf) { return msg_ack(buf_msg(buf)); } -static inline void set_fragm_size(struct sk_buff *buf, u32 sz) +static void set_fragm_size(struct sk_buff *buf, u32 sz) { msg_set_ack(buf_msg(buf), sz); } -static inline u32 get_expected_frags(struct sk_buff *buf) +static u32 get_expected_frags(struct sk_buff *buf) { return msg_bcast_ack(buf_msg(buf)); } -static inline void set_expected_frags(struct sk_buff *buf, u32 exp) +static void set_expected_frags(struct sk_buff *buf, u32 exp) { msg_set_bcast_ack(buf_msg(buf), exp); } -static inline u32 get_timer_cnt(struct sk_buff *buf) +static u32 get_timer_cnt(struct sk_buff *buf) { return msg_reroute_cnt(buf_msg(buf)); } -static inline void incr_timer_cnt(struct sk_buff *buf) +static void incr_timer_cnt(struct sk_buff *buf) { msg_incr_reroute_cnt(buf_msg(buf)); } @@ -2586,13 +2581,13 @@ static inline void incr_timer_cnt(struct int tipc_link_recv_fragment(struct sk_buff **pending, struct sk_buff **fb, struct tipc_msg **m) { - struct sk_buff *prev = 0; + struct sk_buff *prev = NULL; struct sk_buff *fbuf = *fb; struct tipc_msg *fragm = buf_msg(fbuf); struct sk_buff *pbuf = *pending; u32 long_msg_seq_no = msg_long_msgno(fragm); - *fb = 0; + *fb = NULL; msg_dbg(fragm,"FRGdefragm_buf; if (!buf) @@ -2750,19 +2745,19 @@ static struct link *link_find_link(const struct link *l_ptr; if (!link_name_validate(name, &link_name_parts)) - return 0; + return NULL; b_ptr = tipc_bearer_find_interface(link_name_parts.if_local); if (!b_ptr) - return 0; + return NULL; *node = tipc_node_find(link_name_parts.addr_peer); if (!*node) - return 0; + return NULL; l_ptr = 
(*node)->links[b_ptr->identity]; if (!l_ptr || strcmp(l_ptr->name, name)) - return 0; + return NULL; return l_ptr; } diff -puN net/tipc/name_distr.c~git-net net/tipc/name_distr.c --- devel/net/tipc/name_distr.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/name_distr.c 2006-03-17 23:03:48.000000000 -0800 @@ -168,8 +168,8 @@ void tipc_named_withdraw(struct publicat void tipc_named_node_up(unsigned long node) { struct publication *publ; - struct distr_item *item = 0; - struct sk_buff *buf = 0; + struct distr_item *item = NULL; + struct sk_buff *buf = NULL; u32 left = 0; u32 rest; u32 max_item_buf; @@ -200,7 +200,7 @@ void tipc_named_node_up(unsigned long no "<%u.%u.%u>\n", tipc_zone(node), tipc_cluster(node), tipc_node(node)); tipc_link_send(buf, node, node); - buf = 0; + buf = NULL; } } exit: diff -puN net/tipc/name_table.c~git-net net/tipc/name_table.c --- devel/net/tipc/name_table.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/name_table.c 2006-03-17 23:03:48.000000000 -0800 @@ -46,7 +46,7 @@ #include "cluster.h" #include "bcast.h" -int tipc_nametbl_size = 1024; /* must be a power of 2 */ +static int tipc_nametbl_size = 1024; /* must be a power of 2 */ /** * struct sub_seq - container for all published instances of a name sequence @@ -104,7 +104,7 @@ static atomic_t rsv_publ_ok = ATOMIC_INI rwlock_t tipc_nametbl_lock = RW_LOCK_UNLOCKED; -static inline int hash(int x) +static int hash(int x) { return(x & (tipc_nametbl_size - 1)); } @@ -121,7 +121,7 @@ static struct publication *publ_create(u (struct publication *)kmalloc(sizeof(*publ), GFP_ATOMIC); if (publ == NULL) { warn("Memory squeeze; failed to create publication\n"); - return 0; + return NULL; } memset(publ, 0, sizeof(*publ)); @@ -142,7 +142,7 @@ static struct publication *publ_create(u * tipc_subseq_alloc - allocate a specified number of sub-sequence structures */ -struct sub_seq *tipc_subseq_alloc(u32 cnt) +static struct sub_seq *tipc_subseq_alloc(u32 cnt) { u32 
sz = cnt * sizeof(struct sub_seq); struct sub_seq *sseq = (struct sub_seq *)kmalloc(sz, GFP_ATOMIC); @@ -158,7 +158,7 @@ struct sub_seq *tipc_subseq_alloc(u32 cn * Allocates a single sub-sequence structure and sets it to all 0's. */ -struct name_seq *tipc_nameseq_create(u32 type, struct hlist_head *seq_head) +static struct name_seq *tipc_nameseq_create(u32 type, struct hlist_head *seq_head) { struct name_seq *nseq = (struct name_seq *)kmalloc(sizeof(*nseq), GFP_ATOMIC); @@ -168,7 +168,7 @@ struct name_seq *tipc_nameseq_create(u32 warn("Memory squeeze; failed to create name sequence\n"); kfree(nseq); kfree(sseq); - return 0; + return NULL; } memset(nseq, 0, sizeof(*nseq)); @@ -190,8 +190,8 @@ struct name_seq *tipc_nameseq_create(u32 * Very time-critical, so binary searches through sub-sequence array. */ -static inline struct sub_seq *nameseq_find_subseq(struct name_seq *nseq, - u32 instance) +static struct sub_seq *nameseq_find_subseq(struct name_seq *nseq, + u32 instance) { struct sub_seq *sseqs = nseq->sseqs; int low = 0; @@ -207,7 +207,7 @@ static inline struct sub_seq *nameseq_fi else return &sseqs[mid]; } - return 0; + return NULL; } /** @@ -243,9 +243,9 @@ static u32 nameseq_locate_subseq(struct * tipc_nameseq_insert_publ - */ -struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq, - u32 type, u32 lower, u32 upper, - u32 scope, u32 node, u32 port, u32 key) +static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq, + u32 type, u32 lower, u32 upper, + u32 scope, u32 node, u32 port, u32 key) { struct subscription *s; struct subscription *st; @@ -263,7 +263,7 @@ struct publication *tipc_nameseq_insert_ if ((sseq->lower != lower) || (sseq->upper != upper)) { warn("Overlapping publ <%u,%u,%u>\n", type, lower, upper); - return 0; + return NULL; } } else { u32 inspos; @@ -278,7 +278,7 @@ struct publication *tipc_nameseq_insert_ if ((inspos < nseq->first_free) && (upper >= nseq->sseqs[inspos].lower)) { warn("Overlapping publ <%u,%u,%u>\n", 
type, lower, upper); - return 0; + return NULL; } /* Ensure there is space for new sub-sequence */ @@ -294,7 +294,7 @@ struct publication *tipc_nameseq_insert_ nseq->alloc *= 2; } else { warn("Memory squeeze; failed to create sub-sequence\n"); - return 0; + return NULL; } } dbg("Have %u sseqs for type %u\n", nseq->alloc, type); @@ -319,7 +319,7 @@ struct publication *tipc_nameseq_insert_ publ = publ_create(type, lower, upper, scope, node, port, key); if (!publ) - return 0; + return NULL; dbg("inserting publ %x, node=%x publ->node=%x, subscr->node=%x\n", publ, node, publ->node, publ->subscr.node); @@ -369,8 +369,8 @@ struct publication *tipc_nameseq_insert_ * tipc_nameseq_remove_publ - */ -struct publication *tipc_nameseq_remove_publ(struct name_seq *nseq, u32 inst, - u32 node, u32 ref, u32 key) +static struct publication *tipc_nameseq_remove_publ(struct name_seq *nseq, u32 inst, + u32 node, u32 ref, u32 key) { struct publication *publ; struct publication *prev; @@ -394,7 +394,7 @@ struct publication *tipc_nameseq_remove_ i, &nseq->sseqs[i], nseq->sseqs[i].lower, nseq->sseqs[i].upper); } - return 0; + return NULL; } dbg("nameseq_remove: seq: %x, sseq %x, <%u,%u> key %u\n", nseq, sseq, nseq->type, inst, key); @@ -413,7 +413,7 @@ struct publication *tipc_nameseq_remove_ prev->zone_list_next = publ->zone_list_next; sseq->zone_list = publ->zone_list_next; } else { - sseq->zone_list = 0; + sseq->zone_list = NULL; } if (in_own_cluster(node)) { @@ -431,7 +431,7 @@ struct publication *tipc_nameseq_remove_ prev->cluster_list_next = publ->cluster_list_next; sseq->cluster_list = publ->cluster_list_next; } else { - sseq->cluster_list = 0; + sseq->cluster_list = NULL; } } @@ -450,7 +450,7 @@ struct publication *tipc_nameseq_remove_ prev->node_list_next = publ->node_list_next; sseq->node_list = publ->node_list_next; } else { - sseq->node_list = 0; + sseq->node_list = NULL; } } assert(!publ->node || (publ->node == node)); @@ -535,7 +535,7 @@ static struct name_seq 
*nametbl_find_seq } } - return 0; + return NULL; }; struct publication *tipc_nametbl_insert_publ(u32 type, u32 lower, u32 upper, @@ -547,7 +547,7 @@ struct publication *tipc_nametbl_insert_ if (lower > upper) { warn("Failed to publish illegal <%u,%u,%u>\n", type, lower, upper); - return 0; + return NULL; } dbg("Publishing <%u,%u,%u> from %x\n", type, lower, upper, node); @@ -556,7 +556,7 @@ struct publication *tipc_nametbl_insert_ dbg("tipc_nametbl_insert_publ: created %x\n", seq); } if (!seq) - return 0; + return NULL; assert(seq->type == type); return tipc_nameseq_insert_publ(seq, type, lower, upper, @@ -570,7 +570,7 @@ struct publication *tipc_nametbl_remove_ struct name_seq *seq = nametbl_find_seq(type); if (!seq) - return 0; + return NULL; dbg("Withdrawing <%u,%u> from %x\n", type, lower, node); publ = tipc_nameseq_remove_publ(seq, lower, node, ref, key); @@ -594,7 +594,7 @@ struct publication *tipc_nametbl_remove_ u32 tipc_nametbl_translate(u32 type, u32 instance, u32 *destnode) { struct sub_seq *sseq; - struct publication *publ = 0; + struct publication *publ = NULL; struct name_seq *seq; u32 ref; @@ -740,12 +740,12 @@ struct publication *tipc_nametbl_publish if (table.local_publ_count >= tipc_max_publications) { warn("Failed publish: max %u local publication\n", tipc_max_publications); - return 0; + return NULL; } if ((type < TIPC_RESERVED_TYPES) && !atomic_read(&rsv_publ_ok)) { warn("Failed to publish reserved name <%u,%u,%u>\n", type, lower, upper); - return 0; + return NULL; } write_lock_bh(&tipc_nametbl_lock); @@ -983,6 +983,7 @@ static void nametbl_list(struct print_bu } } +#if 0 void tipc_nametbl_print(struct print_buf *buf, const char *str) { tipc_printf(buf, str); @@ -990,6 +991,7 @@ void tipc_nametbl_print(struct print_buf nametbl_list(buf, 0, 0, 0, 0); read_unlock_bh(&tipc_nametbl_lock); } +#endif #define MAX_NAME_TBL_QUERY 32768 @@ -1023,10 +1025,12 @@ struct sk_buff *tipc_nametbl_get(const v return buf; } +#if 0 void tipc_nametbl_dump(void) { 
nametbl_list(TIPC_CONS, 0, 0, 0, 0); } +#endif int tipc_nametbl_init(void) { diff -puN net/tipc/net.c~git-net net/tipc/net.c --- devel/net/tipc/net.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/net.c 2006-03-17 23:03:48.000000000 -0800 @@ -116,7 +116,7 @@ */ rwlock_t tipc_net_lock = RW_LOCK_UNLOCKED; -struct network tipc_net = { 0 }; +struct network tipc_net = { NULL }; struct node *tipc_net_select_remote_node(u32 addr, u32 ref) { @@ -128,13 +128,14 @@ u32 tipc_net_select_router(u32 addr, u32 return tipc_zone_select_router(tipc_net.zones[tipc_zone(addr)], addr, ref); } - +#if 0 u32 tipc_net_next_node(u32 a) { if (tipc_net.zones[tipc_zone(a)]) return tipc_zone_next_node(a); return 0; } +#endif void tipc_net_remove_as_router(u32 router) { @@ -181,7 +182,7 @@ static void net_stop(void) tipc_zone_delete(tipc_net.zones[z_num]); } kfree(tipc_net.zones); - tipc_net.zones = 0; + tipc_net.zones = NULL; } static void net_route_named_msg(struct sk_buff *buf) diff -puN net/tipc/node.c~git-net net/tipc/node.c --- devel/net/tipc/node.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/node.c 2006-03-17 23:03:48.000000000 -0800 @@ -155,7 +155,7 @@ static void node_select_active_links(str u32 i; u32 highest_prio = 0; - active[0] = active[1] = 0; + active[0] = active[1] = NULL; for (i = 0; i < MAX_BEARERS; i++) { struct link *l_ptr = n_ptr->links[i]; @@ -214,7 +214,7 @@ int tipc_node_has_redundant_links(struct (n_ptr->active_links[0] != n_ptr->active_links[1])); } -int tipc_node_has_active_routes(struct node *n_ptr) +static int tipc_node_has_active_routes(struct node *n_ptr) { return (n_ptr && (n_ptr->last_router >= 0)); } @@ -240,7 +240,7 @@ struct node *tipc_node_attach_link(struc err("Attempt to create third link to %s\n", addr_string_fill(addr_string, n_ptr->addr)); - return 0; + return NULL; } if (!n_ptr->links[bearer_id]) { @@ -253,12 +253,12 @@ struct node *tipc_node_attach_link(struc l_ptr->b_ptr->publ.name, 
addr_string_fill(addr_string, l_ptr->addr)); } - return 0; + return NULL; } void tipc_node_detach_link(struct node *n_ptr, struct link *l_ptr) { - n_ptr->links[l_ptr->b_ptr->identity] = 0; + n_ptr->links[l_ptr->b_ptr->identity] = NULL; tipc_net.zones[tipc_zone(l_ptr->addr)]->links--; n_ptr->link_cnt--; } @@ -424,7 +424,7 @@ static void node_lost_contact(struct nod /* Notify subscribers */ list_for_each_entry_safe(ns, tns, &n_ptr->nsub, nodesub_list) { - ns->node = 0; + ns->node = NULL; list_del_init(&ns->nodesub_list); tipc_k_signal((Handler)ns->handle_node_down, (unsigned long)ns->usr_handle); @@ -443,7 +443,7 @@ struct node *tipc_node_select_next_hop(u u32 router_addr; if (!tipc_addr_domain_valid(addr)) - return 0; + return NULL; /* Look for direct link to destination processsor */ n_ptr = tipc_node_find(addr); @@ -452,7 +452,7 @@ struct node *tipc_node_select_next_hop(u /* Cluster local system nodes *must* have direct links */ if (!is_slave(addr) && in_own_cluster(addr)) - return 0; + return NULL; /* Look for cluster local router with direct link to node */ router_addr = tipc_node_select_router(n_ptr, selector); @@ -462,7 +462,7 @@ struct node *tipc_node_select_next_hop(u /* Slave nodes can only be accessed within own cluster via a known router with direct link -- if no router was found,give up */ if (is_slave(addr)) - return 0; + return NULL; /* Inter zone/cluster -- find any direct link to remote cluster */ addr = tipc_addr(tipc_zone(addr), tipc_cluster(addr), 0); @@ -475,7 +475,7 @@ struct node *tipc_node_select_next_hop(u if (router_addr) return tipc_node_select(router_addr, selector); - return 0; + return NULL; } /** diff -puN net/tipc/node.h~git-net net/tipc/node.h --- devel/net/tipc/node.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/node.h 2006-03-17 23:03:48.000000000 -0800 @@ -121,7 +121,7 @@ static inline struct node *tipc_node_fin if (c_ptr) return c_ptr->nodes[tipc_node(addr)]; } - return 0; + return NULL; } static inline 
struct node *tipc_node_select(u32 addr, u32 selector) diff -puN net/tipc/node_subscr.c~git-net net/tipc/node_subscr.c --- devel/net/tipc/node_subscr.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/node_subscr.c 2006-03-17 23:03:48.000000000 -0800 @@ -47,7 +47,7 @@ void tipc_nodesub_subscribe(struct node_subscr *node_sub, u32 addr, void *usr_handle, net_ev_handler handle_down) { - node_sub->node = 0; + node_sub->node = NULL; if (addr == tipc_own_addr) return; if (!tipc_addr_node_valid(addr)) { diff -puN net/tipc/port.c~git-net net/tipc/port.c --- devel/net/tipc/port.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/port.c 2006-03-17 23:03:48.000000000 -0800 @@ -54,8 +54,8 @@ #define MAX_REJECT_SIZE 1024 -static struct sk_buff *msg_queue_head = 0; -static struct sk_buff *msg_queue_tail = 0; +static struct sk_buff *msg_queue_head = NULL; +static struct sk_buff *msg_queue_tail = NULL; spinlock_t tipc_port_list_lock = SPIN_LOCK_UNLOCKED; static spinlock_t queue_lock = SPIN_LOCK_UNLOCKED; @@ -67,27 +67,22 @@ static struct sk_buff* port_build_peer_a static void port_timeout(unsigned long ref); -static inline u32 port_peernode(struct port *p_ptr) +static u32 port_peernode(struct port *p_ptr) { return msg_destnode(&p_ptr->publ.phdr); } -static inline u32 port_peerport(struct port *p_ptr) +static u32 port_peerport(struct port *p_ptr) { return msg_destport(&p_ptr->publ.phdr); } -static inline u32 port_out_seqno(struct port *p_ptr) +static u32 port_out_seqno(struct port *p_ptr) { return msg_transp_seqno(&p_ptr->publ.phdr); } -static inline void port_set_out_seqno(struct port *p_ptr, u32 seqno) -{ - msg_set_transp_seqno(&p_ptr->publ.phdr,seqno); -} - -static inline void port_incr_out_seqno(struct port *p_ptr) +static void port_incr_out_seqno(struct port *p_ptr) { struct tipc_msg *m = &p_ptr->publ.phdr; @@ -258,11 +253,11 @@ u32 tipc_createport_raw(void *usr_handle p_ptr->publ.usr_handle = usr_handle; INIT_LIST_HEAD(&p_ptr->wait_list); 
INIT_LIST_HEAD(&p_ptr->subscription.nodesub_list); - p_ptr->congested_link = 0; + p_ptr->congested_link = NULL; p_ptr->max_pkt = MAX_PKT_DEFAULT; p_ptr->dispatcher = dispatcher; p_ptr->wakeup = wakeup; - p_ptr->user_port = 0; + p_ptr->user_port = NULL; k_init_timer(&p_ptr->timer, (Handler)port_timeout, ref); spin_lock_bh(&tipc_port_list_lock); INIT_LIST_HEAD(&p_ptr->publications); @@ -276,9 +271,9 @@ u32 tipc_createport_raw(void *usr_handle int tipc_deleteport(u32 ref) { struct port *p_ptr; - struct sk_buff *buf = 0; + struct sk_buff *buf = NULL; - tipc_withdraw(ref, 0, 0); + tipc_withdraw(ref, 0, NULL); p_ptr = tipc_port_lock(ref); if (!p_ptr) return -EINVAL; @@ -329,13 +324,13 @@ void *tipc_get_handle(const u32 ref) p_ptr = tipc_port_lock(ref); if (!p_ptr) - return 0; + return NULL; handle = p_ptr->publ.usr_handle; tipc_port_unlock(p_ptr); return handle; } -static inline int port_unreliable(struct port *p_ptr) +static int port_unreliable(struct port *p_ptr) { return msg_src_droppable(&p_ptr->publ.phdr); } @@ -364,7 +359,7 @@ int tipc_set_portunreliable(u32 ref, uns return TIPC_OK; } -static inline int port_unreturnable(struct port *p_ptr) +static int port_unreturnable(struct port *p_ptr) { return msg_dest_droppable(&p_ptr->publ.phdr); } @@ -475,7 +470,7 @@ int tipc_reject_msg(struct sk_buff *buf, /* send self-abort message when rejecting on a connected port */ if (msg_connected(msg)) { - struct sk_buff *abuf = 0; + struct sk_buff *abuf = NULL; struct port *p_ptr = tipc_port_lock(msg_destport(msg)); if (p_ptr) { @@ -510,7 +505,7 @@ int tipc_port_reject_sections(struct por static void port_timeout(unsigned long ref) { struct port *p_ptr = tipc_port_lock(ref); - struct sk_buff *buf = 0; + struct sk_buff *buf = NULL; if (!p_ptr || !p_ptr->publ.connected) return; @@ -540,7 +535,7 @@ static void port_timeout(unsigned long r static void port_handle_node_down(unsigned long ref) { struct port *p_ptr = tipc_port_lock(ref); - struct sk_buff* buf = 0; + struct sk_buff* buf = 
NULL; if (!p_ptr) return; @@ -555,7 +550,7 @@ static struct sk_buff *port_build_self_a u32 imp = msg_importance(&p_ptr->publ.phdr); if (!p_ptr->publ.connected) - return 0; + return NULL; if (imp < TIPC_CRITICAL_IMPORTANCE) imp++; return port_build_proto_msg(p_ptr->publ.ref, @@ -575,7 +570,7 @@ static struct sk_buff *port_build_peer_a u32 imp = msg_importance(&p_ptr->publ.phdr); if (!p_ptr->publ.connected) - return 0; + return NULL; if (imp < TIPC_CRITICAL_IMPORTANCE) imp++; return port_build_proto_msg(port_peerport(p_ptr), @@ -594,8 +589,8 @@ void tipc_port_recv_proto_msg(struct sk_ struct tipc_msg *msg = buf_msg(buf); struct port *p_ptr = tipc_port_lock(msg_destport(msg)); u32 err = TIPC_OK; - struct sk_buff *r_buf = 0; - struct sk_buff *abort_buf = 0; + struct sk_buff *r_buf = NULL; + struct sk_buff *abort_buf = NULL; msg_dbg(msg, "PORT= 0; i--) { - table[i].object = 0; + table[i].object = NULL; table[i].lock = SPIN_LOCK_UNLOCKED; table[i].data.next_plus_upper = (start & ~index_mask) + i - 1; } @@ -108,7 +108,7 @@ void tipc_ref_table_stop(void) return; vfree(tipc_ref_table.entries); - tipc_ref_table.entries = 0; + tipc_ref_table.entries = NULL; } /** @@ -173,7 +173,7 @@ void tipc_ref_discard(u32 ref) assert(entry->data.reference == ref); /* mark entry as unused */ - entry->object = 0; + entry->object = NULL; if (tipc_ref_table.first_free == 0) tipc_ref_table.first_free = index; else diff -puN net/tipc/ref.h~git-net net/tipc/ref.h --- devel/net/tipc/ref.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/ref.h 2006-03-17 23:03:48.000000000 -0800 @@ -92,7 +92,7 @@ static inline void *tipc_ref_lock(u32 re return r->object; spin_unlock_bh(&r->lock); } - return 0; + return NULL; } /** @@ -125,7 +125,7 @@ static inline void *tipc_ref_deref(u32 r if (likely(r->data.reference == ref)) return r->object; } - return 0; + return NULL; } #endif diff -puN net/tipc/socket.c~git-net net/tipc/socket.c --- devel/net/tipc/socket.c~git-net 2006-03-17 
23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/socket.c 2006-03-17 23:03:48.000000000 -0800 @@ -88,7 +88,7 @@ static atomic_t tipc_queue_size = ATOMIC * with non-socket interfaces. * See net.c for description of locking policy. */ -static inline void sock_lock(struct tipc_sock* tsock) +static void sock_lock(struct tipc_sock* tsock) { spin_lock_bh(tsock->p->lock); } @@ -96,7 +96,7 @@ static inline void sock_lock(struct tipc /* * sock_unlock(): Unlock a port/socket pair */ -static inline void sock_unlock(struct tipc_sock* tsock) +static void sock_unlock(struct tipc_sock* tsock) { spin_unlock_bh(tsock->p->lock); } @@ -119,7 +119,7 @@ static inline void sock_unlock(struct ti * Returns pollmask value */ -static inline u32 pollmask(struct socket *sock) +static u32 pollmask(struct socket *sock) { u32 mask; @@ -144,7 +144,7 @@ static inline u32 pollmask(struct socket * @tsock: TIPC socket */ -static inline void advance_queue(struct tipc_sock *tsock) +static void advance_queue(struct tipc_sock *tsock) { sock_lock(tsock); buf_discard(skb_dequeue(&tsock->sk.sk_receive_queue)); @@ -178,7 +178,7 @@ static int tipc_create(struct socket *so if (unlikely(protocol != 0)) return -EPROTONOSUPPORT; - ref = tipc_createport_raw(0, &dispatch, &wakeupdispatch, TIPC_LOW_IMPORTANCE); + ref = tipc_createport_raw(NULL, &dispatch, &wakeupdispatch, TIPC_LOW_IMPORTANCE); if (unlikely(!ref)) return -ENOMEM; @@ -265,7 +265,7 @@ static int release(struct socket *sock) sock_lock(tsock); buf = skb_dequeue(&sk->sk_receive_queue); if (!buf) - tsock->p->usr_handle = 0; + tsock->p->usr_handle = NULL; sock_unlock(tsock); if (!buf) break; @@ -319,7 +319,7 @@ static int bind(struct socket *sock, str return -ERESTARTSYS; if (unlikely(!uaddr_len)) { - res = tipc_withdraw(tsock->p->ref, 0, 0); + res = tipc_withdraw(tsock->p->ref, 0, NULL); goto exit; } @@ -412,7 +412,7 @@ static unsigned int poll(struct file *fi * Returns 0 if permission is granted, otherwise errno */ -static inline int 
dest_name_check(struct sockaddr_tipc *dest, struct msghdr *m) +static int dest_name_check(struct sockaddr_tipc *dest, struct msghdr *m) { struct tipc_cfg_msg_hdr hdr; @@ -695,7 +695,7 @@ static int auto_connect(struct socket *s * Note: Address is not captured if not requested by receiver. */ -static inline void set_orig_addr(struct msghdr *m, struct tipc_msg *msg) +static void set_orig_addr(struct msghdr *m, struct tipc_msg *msg) { struct sockaddr_tipc *addr = (struct sockaddr_tipc *)m->msg_name; @@ -721,7 +721,7 @@ static inline void set_orig_addr(struct * Returns 0 if successful, otherwise errno */ -static inline int anc_data_recv(struct msghdr *m, struct tipc_msg *msg, +static int anc_data_recv(struct msghdr *m, struct tipc_msg *msg, struct tipc_port *tport) { u32 anc_data[3]; @@ -1226,7 +1226,7 @@ static int connect(struct socket *sock, { struct tipc_sock *tsock = tipc_sk(sock->sk); struct sockaddr_tipc *dst = (struct sockaddr_tipc *)dest; - struct msghdr m = {0,}; + struct msghdr m = {NULL,}; struct sk_buff *buf; struct tipc_msg *msg; int res; @@ -1251,7 +1251,7 @@ static int connect(struct socket *sock, /* Send a 'SYN-' to destination */ m.msg_name = dest; - if ((res = send_msg(0, sock, &m, 0)) < 0) { + if ((res = send_msg(NULL, sock, &m, 0)) < 0) { sock->state = SS_DISCONNECTING; return res; } @@ -1367,9 +1367,9 @@ static int accept(struct socket *sock, s msg_dbg(msg,"ref, importance, - 0, - 0, + NULL, + NULL, subscr_conn_shutdown_event, - 0, - 0, + NULL, + NULL, subscr_conn_msg_event, - 0, + NULL, &subscriber->port_ref); if (subscriber->port_ref == 0) { warn("Memory squeeze; failed to create subscription port\n"); @@ -461,22 +461,22 @@ int tipc_subscr_start(void) INIT_LIST_HEAD(&topsrv.subscriber_list); spin_lock_bh(&topsrv.lock); - res = tipc_attach(&topsrv.user_ref, 0, 0); + res = tipc_attach(&topsrv.user_ref, NULL, NULL); if (res) { spin_unlock_bh(&topsrv.lock); return res; } res = tipc_createport(topsrv.user_ref, - 0, + NULL, TIPC_CRITICAL_IMPORTANCE, - 
0, - 0, - 0, - 0, + NULL, + NULL, + NULL, + NULL, subscr_named_msg_event, - 0, - 0, + NULL, + NULL, &topsrv.setup_port); if (res) goto failed; diff -puN net/tipc/user_reg.c~git-net net/tipc/user_reg.c --- devel/net/tipc/user_reg.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/user_reg.c 2006-03-17 23:03:48.000000000 -0800 @@ -65,7 +65,7 @@ struct tipc_user { #define MAX_USERID 64 #define USER_LIST_SIZE ((MAX_USERID + 1) * sizeof(struct tipc_user)) -static struct tipc_user *users = 0; +static struct tipc_user *users = NULL; static u32 next_free_user = MAX_USERID + 1; static spinlock_t reg_lock = SPIN_LOCK_UNLOCKED; @@ -149,7 +149,7 @@ void tipc_reg_stop(void) reg_callback(&users[id]); } kfree(users); - users = 0; + users = NULL; } /** diff -puN net/tipc/zone.c~git-net net/tipc/zone.c --- devel/net/tipc/zone.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/tipc/zone.c 2006-03-17 23:03:48.000000000 -0800 @@ -44,11 +44,11 @@ struct _zone *tipc_zone_create(u32 addr) { - struct _zone *z_ptr = 0; + struct _zone *z_ptr = NULL; u32 z_num; if (!tipc_addr_domain_valid(addr)) - return 0; + return NULL; z_ptr = (struct _zone *)kmalloc(sizeof(*z_ptr), GFP_ATOMIC); if (z_ptr != NULL) { @@ -114,10 +114,10 @@ struct node *tipc_zone_select_remote_nod u32 c_num; if (!z_ptr) - return 0; + return NULL; c_ptr = z_ptr->clusters[tipc_cluster(addr)]; if (!c_ptr) - return 0; + return NULL; n_ptr = tipc_cltr_select_node(c_ptr, ref); if (n_ptr) return n_ptr; @@ -126,12 +126,12 @@ struct node *tipc_zone_select_remote_nod for (c_num = 1; c_num <= tipc_max_clusters; c_num++) { c_ptr = z_ptr->clusters[c_num]; if (!c_ptr) - return 0; + return NULL; n_ptr = tipc_cltr_select_node(c_ptr, ref); if (n_ptr) return n_ptr; } - return 0; + return NULL; } u32 tipc_zone_select_router(struct _zone *z_ptr, u32 addr, u32 ref) diff -puN net/unix/af_unix.c~git-net net/unix/af_unix.c --- devel/net/unix/af_unix.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ 
devel-akpm/net/unix/af_unix.c 2006-03-17 23:03:48.000000000 -0800 @@ -566,7 +566,7 @@ static struct sock * unix_create1(struct u->mnt = NULL; spin_lock_init(&u->lock); atomic_set(&u->inflight, sock ? 0 : -1); - init_MUTEX(&u->readsem); /* single task reading lock */ + mutex_init(&u->readlock); /* single task reading lock */ init_waitqueue_head(&u->peer_wait); unix_insert_socket(unix_sockets_unbound, sk); out: @@ -623,7 +623,7 @@ static int unix_autobind(struct socket * struct unix_address * addr; int err; - down(&u->readsem); + mutex_lock(&u->readlock); err = 0; if (u->addr) @@ -661,7 +661,7 @@ retry: spin_unlock(&unix_table_lock); err = 0; -out: up(&u->readsem); +out: mutex_unlock(&u->readlock); return err; } @@ -744,7 +744,7 @@ static int unix_bind(struct socket *sock goto out; addr_len = err; - down(&u->readsem); + mutex_lock(&u->readlock); err = -EINVAL; if (u->addr) @@ -816,7 +816,7 @@ static int unix_bind(struct socket *sock out_unlock: spin_unlock(&unix_table_lock); out_up: - up(&u->readsem); + mutex_unlock(&u->readlock); out: return err; @@ -1290,6 +1290,7 @@ static int unix_dgram_sendmsg(struct kio memcpy(UNIXCREDS(skb), &siocb->scm->creds, sizeof(struct ucred)); if (siocb->scm->fp) unix_attach_fds(siocb->scm, skb); + memcpy(UNIXSID(skb), &siocb->scm->sid, sizeof(u32)); skb->h.raw = skb->data; err = memcpy_fromiovec(skb_put(skb,len), msg->msg_iov, len); @@ -1427,15 +1428,15 @@ static int unix_stream_sendmsg(struct ki while(sent < len) { /* - * Optimisation for the fact that under 0.01% of X messages typically - * need breaking up. + * Optimisation for the fact that under 0.01% of X + * messages typically need breaking up. 
*/ - size=len-sent; + size = len-sent; /* Keep two messages in the pipe so it schedules better */ - if (size > sk->sk_sndbuf / 2 - 64) - size = sk->sk_sndbuf / 2 - 64; + if (size > ((sk->sk_sndbuf >> 1) - 64)) + size = (sk->sk_sndbuf >> 1) - 64; if (size > SKB_MAX_ALLOC) size = SKB_MAX_ALLOC; @@ -1545,7 +1546,7 @@ static int unix_dgram_recvmsg(struct kio msg->msg_namelen = 0; - down(&u->readsem); + mutex_lock(&u->readlock); skb = skb_recv_datagram(sk, flags, noblock, &err); if (!skb) @@ -1570,6 +1571,7 @@ static int unix_dgram_recvmsg(struct kio memset(&tmp_scm, 0, sizeof(tmp_scm)); } siocb->scm->creds = *UNIXCREDS(skb); + siocb->scm->sid = *UNIXSID(skb); if (!(flags & MSG_PEEK)) { @@ -1600,7 +1602,7 @@ static int unix_dgram_recvmsg(struct kio out_free: skb_free_datagram(sk,skb); out_unlock: - up(&u->readsem); + mutex_unlock(&u->readlock); out: return err; } @@ -1676,7 +1678,7 @@ static int unix_stream_recvmsg(struct ki memset(&tmp_scm, 0, sizeof(tmp_scm)); } - down(&u->readsem); + mutex_lock(&u->readlock); do { @@ -1700,7 +1702,7 @@ static int unix_stream_recvmsg(struct ki err = -EAGAIN; if (!timeo) break; - up(&u->readsem); + mutex_unlock(&u->readlock); timeo = unix_stream_data_wait(sk, timeo); @@ -1708,7 +1710,7 @@ static int unix_stream_recvmsg(struct ki err = sock_intr_errno(timeo); goto out; } - down(&u->readsem); + mutex_lock(&u->readlock); continue; } @@ -1774,7 +1776,7 @@ static int unix_stream_recvmsg(struct ki } } while (size); - up(&u->readsem); + mutex_unlock(&u->readlock); scm_recv(sock, msg, siocb->scm, flags); out: return copied ? 
: err; diff -puN net/unix/garbage.c~git-net net/unix/garbage.c --- devel/net/unix/garbage.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/unix/garbage.c 2006-03-17 23:03:48.000000000 -0800 @@ -76,6 +76,7 @@ #include #include #include +#include #include #include @@ -169,7 +170,7 @@ static void maybe_unmark_and_push(struct void unix_gc(void) { - static DECLARE_MUTEX(unix_gc_sem); + static DEFINE_MUTEX(unix_gc_sem); int i; struct sock *s; struct sk_buff_head hitlist; @@ -179,7 +180,7 @@ void unix_gc(void) * Avoid a recursive GC. */ - if (down_trylock(&unix_gc_sem)) + if (!mutex_trylock(&unix_gc_sem)) return; spin_lock(&unix_table_lock); @@ -308,5 +309,5 @@ void unix_gc(void) */ __skb_queue_purge(&hitlist); - up(&unix_gc_sem); + mutex_unlock(&unix_gc_sem); } diff -puN net/xfrm/xfrm_policy.c~git-net net/xfrm/xfrm_policy.c --- devel/net/xfrm/xfrm_policy.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/xfrm/xfrm_policy.c 2006-03-17 23:03:48.000000000 -0800 @@ -26,8 +26,8 @@ #include #include -DECLARE_MUTEX(xfrm_cfg_sem); -EXPORT_SYMBOL(xfrm_cfg_sem); +DEFINE_MUTEX(xfrm_cfg_mutex); +EXPORT_SYMBOL(xfrm_cfg_mutex); static DEFINE_RWLOCK(xfrm_policy_lock); @@ -203,7 +203,7 @@ static void xfrm_policy_timer(unsigned l } if (warn) - km_policy_expired(xp, dir, 0); + km_policy_expired(xp, dir, 0, 0); if (next != LONG_MAX && !mod_timer(&xp->timer, jiffies + make_jiffies(next))) xfrm_pol_hold(xp); @@ -216,7 +216,7 @@ out: expired: read_unlock(&xp->lock); if (!xfrm_policy_delete(xp, dir)) - km_policy_expired(xp, dir, 1); + km_policy_expired(xp, dir, 1, 0); xfrm_pol_put(xp); } @@ -621,6 +621,7 @@ int xfrm_policy_delete(struct xfrm_polic } return -ENOENT; } +EXPORT_SYMBOL(xfrm_policy_delete); int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol) { diff -puN net/xfrm/xfrm_state.c~git-net net/xfrm/xfrm_state.c --- devel/net/xfrm/xfrm_state.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/xfrm/xfrm_state.c 2006-03-17 
23:03:48.000000000 -0800 @@ -20,6 +20,15 @@ #include #include +struct sock *xfrm_nl; +EXPORT_SYMBOL(xfrm_nl); + +u32 sysctl_xfrm_aevent_etime = XFRM_AE_ETIME; +EXPORT_SYMBOL(sysctl_xfrm_aevent_etime); + +u32 sysctl_xfrm_aevent_rseqth = XFRM_AE_SEQT_SIZE; +EXPORT_SYMBOL(sysctl_xfrm_aevent_rseqth); + /* Each xfrm_state may be linked to two tables: 1. Hash table by (spi,daddr,ah/esp) to find SA by SPI. (input,ctl) @@ -50,18 +59,20 @@ static DEFINE_SPINLOCK(xfrm_state_gc_loc static int xfrm_state_gc_flush_bundles; -static int __xfrm_state_delete(struct xfrm_state *x); +int __xfrm_state_delete(struct xfrm_state *x); static struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned short family); static void xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo); -static int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol); -static void km_state_expired(struct xfrm_state *x, int hard); +int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol); +void km_state_expired(struct xfrm_state *x, int hard, u32 pid); static void xfrm_state_gc_destroy(struct xfrm_state *x) { if (del_timer(&x->timer)) BUG(); + if (del_timer(&x->rtimer)) + BUG(); kfree(x->aalg); kfree(x->ealg); kfree(x->calg); @@ -153,7 +164,7 @@ static void xfrm_timer_handler(unsigned x->km.dying = warn; if (warn) - km_state_expired(x, 0); + km_state_expired(x, 0, 0); resched: if (next != LONG_MAX && !mod_timer(&x->timer, jiffies + make_jiffies(next))) @@ -168,13 +179,15 @@ expired: goto resched; } if (!__xfrm_state_delete(x) && x->id.spi) - km_state_expired(x, 1); + km_state_expired(x, 1, 0); out: spin_unlock(&x->lock); xfrm_state_put(x); } +static void xfrm_replay_timer_handler(unsigned long data); + struct xfrm_state *xfrm_state_alloc(void) { struct xfrm_state *x; @@ -190,11 +203,16 @@ struct xfrm_state *xfrm_state_alloc(void init_timer(&x->timer); x->timer.function = xfrm_timer_handler; x->timer.data = (unsigned long)x; + init_timer(&x->rtimer); + 
x->rtimer.function = xfrm_replay_timer_handler; + x->rtimer.data = (unsigned long)x; x->curlft.add_time = (unsigned long)xtime.tv_sec; x->lft.soft_byte_limit = XFRM_INF; x->lft.soft_packet_limit = XFRM_INF; x->lft.hard_byte_limit = XFRM_INF; x->lft.hard_packet_limit = XFRM_INF; + x->replay_maxage = 0; + x->replay_maxdiff = 0; spin_lock_init(&x->lock); } return x; @@ -212,7 +230,7 @@ void __xfrm_state_destroy(struct xfrm_st } EXPORT_SYMBOL(__xfrm_state_destroy); -static int __xfrm_state_delete(struct xfrm_state *x) +int __xfrm_state_delete(struct xfrm_state *x) { int err = -ESRCH; @@ -228,6 +246,8 @@ static int __xfrm_state_delete(struct xf spin_unlock(&xfrm_state_lock); if (del_timer(&x->timer)) __xfrm_state_put(x); + if (del_timer(&x->rtimer)) + __xfrm_state_put(x); /* The number two in this test is the reference * mentioned in the comment below plus the reference @@ -249,6 +269,7 @@ static int __xfrm_state_delete(struct xf return err; } +EXPORT_SYMBOL(__xfrm_state_delete); int xfrm_state_delete(struct xfrm_state *x) { @@ -426,6 +447,10 @@ static void __xfrm_state_insert(struct x if (!mod_timer(&x->timer, jiffies + HZ)) xfrm_state_hold(x); + if (x->replay_maxage && + !mod_timer(&x->rtimer, jiffies + x->replay_maxage)) + xfrm_state_hold(x); + wake_up(&km_waitq); } @@ -580,7 +605,7 @@ int xfrm_state_check_expire(struct xfrm_ (x->curlft.bytes >= x->lft.soft_byte_limit || x->curlft.packets >= x->lft.soft_packet_limit)) { x->km.dying = 1; - km_state_expired(x, 0); + km_state_expired(x, 0, 0); } return 0; } @@ -762,6 +787,61 @@ out: } EXPORT_SYMBOL(xfrm_state_walk); + +void xfrm_replay_notify(struct xfrm_state *x, int event) +{ + struct km_event c; + /* we send notify messages in case + * 1. we updated on of the sequence numbers, and the seqno difference + * is at least x->replay_maxdiff, in this case we also update the + * timeout of our timer function + * 2. 
if x->replay_maxage has elapsed since last update, + * and there were changes + * + * The state structure must be locked! + */ + + switch (event) { + case XFRM_REPLAY_UPDATE: + if (x->replay_maxdiff && + (x->replay.seq - x->preplay.seq < x->replay_maxdiff) && + (x->replay.oseq - x->preplay.oseq < x->replay_maxdiff)) + return; + + break; + + case XFRM_REPLAY_TIMEOUT: + if ((x->replay.seq == x->preplay.seq) && + (x->replay.bitmap == x->preplay.bitmap) && + (x->replay.oseq == x->preplay.oseq)) + return; + + break; + } + + memcpy(&x->preplay, &x->replay, sizeof(struct xfrm_replay_state)); + c.event = XFRM_MSG_NEWAE; + c.data.aevent = event; + km_state_notify(x, &c); + + if (x->replay_maxage && + !mod_timer(&x->rtimer, jiffies + x->replay_maxage)) + xfrm_state_hold(x); +} +EXPORT_SYMBOL(xfrm_replay_notify); + +static void xfrm_replay_timer_handler(unsigned long data) +{ + struct xfrm_state *x = (struct xfrm_state*)data; + + spin_lock(&x->lock); + + if (xfrm_aevent_is_on() && x->km.state == XFRM_STATE_VALID) + xfrm_replay_notify(x, XFRM_REPLAY_TIMEOUT); + + spin_unlock(&x->lock); +} + int xfrm_replay_check(struct xfrm_state *x, u32 seq) { u32 diff; @@ -805,6 +885,9 @@ void xfrm_replay_advance(struct xfrm_sta diff = x->replay.seq - seq; x->replay.bitmap |= (1U << diff); } + + if (xfrm_aevent_is_on()) + xfrm_replay_notify(x, XFRM_REPLAY_UPDATE); } EXPORT_SYMBOL(xfrm_replay_advance); @@ -835,11 +918,12 @@ void km_state_notify(struct xfrm_state * EXPORT_SYMBOL(km_policy_notify); EXPORT_SYMBOL(km_state_notify); -static void km_state_expired(struct xfrm_state *x, int hard) +void km_state_expired(struct xfrm_state *x, int hard, u32 pid) { struct km_event c; c.data.hard = hard; + c.pid = pid; c.event = XFRM_MSG_EXPIRE; km_state_notify(x, &c); @@ -847,11 +931,12 @@ static void km_state_expired(struct xfrm wake_up(&km_waitq); } +EXPORT_SYMBOL(km_state_expired); /* * We send to all registered managers regardless of failure * We are happy with one success */ -static int 
km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol) +int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol) { int err = -EINVAL, acqret; struct xfrm_mgr *km; @@ -865,6 +950,7 @@ static int km_query(struct xfrm_state *x read_unlock(&xfrm_km_lock); return err; } +EXPORT_SYMBOL(km_query); int km_new_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr, u16 sport) { @@ -883,17 +969,19 @@ int km_new_mapping(struct xfrm_state *x, } EXPORT_SYMBOL(km_new_mapping); -void km_policy_expired(struct xfrm_policy *pol, int dir, int hard) +void km_policy_expired(struct xfrm_policy *pol, int dir, int hard, u32 pid) { struct km_event c; c.data.hard = hard; + c.pid = pid; c.event = XFRM_MSG_POLEXPIRE; km_policy_notify(pol, dir, &c); if (hard) wake_up(&km_waitq); } +EXPORT_SYMBOL(km_policy_expired); int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen) { diff -puN net/xfrm/xfrm_user.c~git-net net/xfrm/xfrm_user.c --- devel/net/xfrm/xfrm_user.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/net/xfrm/xfrm_user.c 2006-03-17 23:03:48.000000000 -0800 @@ -28,8 +28,6 @@ #include #include -static struct sock *xfrm_nl; - static int verify_one_alg(struct rtattr **xfrma, enum xfrm_attr_type_t type) { struct rtattr *rt = xfrma[type - 1]; @@ -103,9 +101,6 @@ static inline int verify_sec_ctx_len(str uctx = RTA_DATA(rt); - if (uctx->ctx_len > PAGE_SIZE) - return -EINVAL; - len += sizeof(struct xfrm_user_sec_ctx); len += uctx->ctx_len; @@ -276,6 +271,56 @@ static void copy_from_user_state(struct x->props.flags = p->flags; } +/* + * someday when pfkey also has support, we could have the code + * somehow made shareable and move it to xfrm_state.c - JHS + * +*/ +static int xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma) +{ + int err = - EINVAL; + struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL-1]; + struct rtattr *lt = xfrma[XFRMA_LTIME_VAL-1]; + struct rtattr *et = xfrma[XFRMA_ETIMER_THRESH-1]; + 
struct rtattr *rt = xfrma[XFRMA_REPLAY_THRESH-1]; + + if (rp) { + struct xfrm_replay_state *replay; + if (RTA_PAYLOAD(rp) < sizeof(*replay)) + goto error; + replay = RTA_DATA(rp); + memcpy(&x->replay, replay, sizeof(*replay)); + memcpy(&x->preplay, replay, sizeof(*replay)); + } + + if (lt) { + struct xfrm_lifetime_cur *ltime; + if (RTA_PAYLOAD(lt) < sizeof(*ltime)) + goto error; + ltime = RTA_DATA(lt); + x->curlft.bytes = ltime->bytes; + x->curlft.packets = ltime->packets; + x->curlft.add_time = ltime->add_time; + x->curlft.use_time = ltime->use_time; + } + + if (et) { + if (RTA_PAYLOAD(et) < sizeof(u32)) + goto error; + x->replay_maxage = *(u32*)RTA_DATA(et); + } + + if (rt) { + if (RTA_PAYLOAD(rt) < sizeof(u32)) + goto error; + x->replay_maxdiff = *(u32*)RTA_DATA(rt); + } + + return 0; +error: + return err; +} + static struct xfrm_state *xfrm_state_construct(struct xfrm_usersa_info *p, struct rtattr **xfrma, int *errp) @@ -311,6 +356,18 @@ static struct xfrm_state *xfrm_state_con goto error; x->km.seq = p->seq; + x->replay_maxdiff = sysctl_xfrm_aevent_rseqth; + /* sysctl_xfrm_aevent_etime is in 100ms units */ + x->replay_maxage = (sysctl_xfrm_aevent_etime*HZ)/XFRM_AE_ETH_M; + x->preplay.bitmap = 0; + x->preplay.seq = x->replay.seq+x->replay_maxdiff; + x->preplay.oseq = x->replay.oseq +x->replay_maxdiff; + + /* override default values from above */ + + err = xfrm_update_ae_params(x, (struct rtattr **)xfrma); + if (err < 0) + goto error; return x; @@ -1025,9 +1082,142 @@ static int xfrm_flush_sa(struct sk_buff return 0; } -static int xfrm_flush_policy(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) + +static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, struct km_event *c) { + struct xfrm_aevent_id *id; + struct nlmsghdr *nlh; + struct xfrm_lifetime_cur ltime; + unsigned char *b = skb->tail; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, XFRM_MSG_NEWAE, sizeof(*id)); + id = NLMSG_DATA(nlh); + nlh->nlmsg_flags = 0; + + id->sa_id.daddr = x->id.daddr; 
+ id->sa_id.spi = x->id.spi; + id->sa_id.family = x->props.family; + id->sa_id.proto = x->id.proto; + id->flags = c->data.aevent; + + RTA_PUT(skb, XFRMA_REPLAY_VAL, sizeof(x->replay), &x->replay); + + ltime.bytes = x->curlft.bytes; + ltime.packets = x->curlft.packets; + ltime.add_time = x->curlft.add_time; + ltime.use_time = x->curlft.use_time; + + RTA_PUT(skb, XFRMA_LTIME_VAL, sizeof(struct xfrm_lifetime_cur), &ltime); + + if (id->flags&XFRM_AE_RTHR) { + RTA_PUT(skb,XFRMA_REPLAY_THRESH,sizeof(u32),&x->replay_maxdiff); + } + + if (id->flags&XFRM_AE_ETHR) { + u32 etimer = x->replay_maxage*10/HZ; + RTA_PUT(skb,XFRMA_ETIMER_THRESH,sizeof(u32),&etimer); + } + + nlh->nlmsg_len = skb->tail - b; + return skb->len; + +rtattr_failure: +nlmsg_failure: + skb_trim(skb, b - skb->data); + return -1; +} + +static int xfrm_get_ae(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) +{ + struct xfrm_state *x; + struct sk_buff *r_skb; + int err; struct km_event c; + struct xfrm_aevent_id *p = NLMSG_DATA(nlh); + int len = NLMSG_LENGTH(sizeof(struct xfrm_aevent_id)); + struct xfrm_usersa_id *id = &p->sa_id; + + len += RTA_SPACE(sizeof(struct xfrm_replay_state)); + len += RTA_SPACE(sizeof(struct xfrm_lifetime_cur)); + + if (p->flags&XFRM_AE_RTHR) + len+=RTA_SPACE(sizeof(u32)); + + if (p->flags&XFRM_AE_ETHR) + len+=RTA_SPACE(sizeof(u32)); + + r_skb = alloc_skb(len, GFP_ATOMIC); + if (r_skb == NULL) + return -ENOMEM; + + x = xfrm_state_lookup(&id->daddr, id->spi, id->proto, id->family); + if (x == NULL) { + kfree(r_skb); + return -ESRCH; + } + + /* + * XXX: is this lock really needed - none of the other + * gets lock (the concern is things getting updated + * while we are still reading) - jhs + */ + spin_lock_bh(&x->lock); + c.data.aevent = p->flags; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + + if (build_aevent(r_skb, x, &c) < 0) + BUG(); + err = netlink_unicast(xfrm_nl, r_skb, + NETLINK_CB(skb).pid, MSG_DONTWAIT); + spin_unlock_bh(&x->lock); + xfrm_state_put(x); + return
err; +} + +static int xfrm_new_ae(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) +{ + struct xfrm_state *x; + struct km_event c; + int err = - EINVAL; + struct xfrm_aevent_id *p = NLMSG_DATA(nlh); + struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL-1]; + struct rtattr *lt = xfrma[XFRMA_LTIME_VAL-1]; + + if (!lt && !rp) + return err; + + /* pedantic mode - thou shalt sayeth replaceth */ + if (!(nlh->nlmsg_flags&NLM_F_REPLACE)) + return err; + + x = xfrm_state_lookup(&p->sa_id.daddr, p->sa_id.spi, p->sa_id.proto, p->sa_id.family); + if (x == NULL) + return -ESRCH; + + if (x->km.state != XFRM_STATE_VALID) + goto out; + + spin_lock_bh(&x->lock); + err = xfrm_update_ae_params(x,(struct rtattr **)xfrma); + spin_unlock_bh(&x->lock); + if (err < 0) + goto out; + + c.event = nlh->nlmsg_type; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + c.data.aevent = XFRM_AE_CU; + km_state_notify(x, &c); + err = 0; +out: + xfrm_state_put(x); + return err; +} + +static int xfrm_flush_policy(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) +{ +struct km_event c; xfrm_policy_flush(); c.event = nlh->nlmsg_type; @@ -1037,6 +1227,139 @@ static int xfrm_flush_policy(struct sk_b return 0; } +static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) +{ + struct xfrm_policy *xp; + struct xfrm_user_polexpire *up = NLMSG_DATA(nlh); + struct xfrm_userpolicy_info *p = &up->pol; + int err = -ENOENT; + + if (p->index) + xp = xfrm_policy_byid(p->dir, p->index, 0); + else { + struct rtattr **rtattrs = (struct rtattr **)xfrma; + struct rtattr *rt = rtattrs[XFRMA_SEC_CTX-1]; + struct xfrm_policy tmp; + + err = verify_sec_ctx_len(rtattrs); + if (err) + return err; + + memset(&tmp, 0, sizeof(struct xfrm_policy)); + if (rt) { + struct xfrm_user_sec_ctx *uctx = RTA_DATA(rt); + + if ((err = security_xfrm_policy_alloc(&tmp, uctx))) + return err; + } + xp = xfrm_policy_bysel_ctx(p->dir, &p->sel, tmp.security, 0); + security_xfrm_policy_free(&tmp); + } + + if (xp == 
NULL) + return err; + read_lock(&xp->lock); + if (xp->dead) { + read_unlock(&xp->lock); + goto out; + } + + read_unlock(&xp->lock); + err = 0; + if (up->hard) { + xfrm_policy_delete(xp, p->dir); + } else { + // reset the timers here? + printk("Dont know what to do with soft policy expire\n"); + } + km_policy_expired(xp, p->dir, up->hard, current->pid); + +out: + xfrm_pol_put(xp); + return err; +} + +static int xfrm_add_sa_expire(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) +{ + struct xfrm_state *x; + int err; + struct xfrm_user_expire *ue = NLMSG_DATA(nlh); + struct xfrm_usersa_info *p = &ue->state; + + x = xfrm_state_lookup(&p->id.daddr, p->id.spi, p->id.proto, p->family); + err = -ENOENT; + + if (x == NULL) + return err; + + err = -EINVAL; + + spin_lock_bh(&x->lock); + if (x->km.state != XFRM_STATE_VALID) + goto out; + km_state_expired(x, ue->hard, current->pid); + + if (ue->hard) + __xfrm_state_delete(x); +out: + spin_unlock_bh(&x->lock); + xfrm_state_put(x); + return err; +} + +static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) +{ + struct xfrm_policy *xp; + struct xfrm_user_tmpl *ut; + int i; + struct rtattr *rt = xfrma[XFRMA_TMPL-1]; + + struct xfrm_user_acquire *ua = NLMSG_DATA(nlh); + struct xfrm_state *x = xfrm_state_alloc(); + int err = -ENOMEM; + + if (!x) + return err; + + err = verify_newpolicy_info(&ua->policy); + if (err) { + printk("BAD policy passed\n"); + kfree(x); + return err; + } + + /* build an XP */ + xp = xfrm_policy_construct(&ua->policy, (struct rtattr **) xfrma, &err); if (!xp) { + kfree(x); + return err; + } + + memcpy(&x->id, &ua->id, sizeof(ua->id)); + memcpy(&x->props.saddr, &ua->saddr, sizeof(ua->saddr)); + memcpy(&x->sel, &ua->sel, sizeof(ua->sel)); + + ut = RTA_DATA(rt); + /* extract the templates and for each call km_key */ + for (i = 0; i < xp->xfrm_nr; i++, ut++) { + struct xfrm_tmpl *t = &xp->xfrm_vec[i]; + memcpy(&x->id, &t->id, sizeof(x->id)); + x->props.mode = t->mode; + 
x->props.reqid = t->reqid; + x->props.family = ut->family; + t->aalgos = ua->aalgos; + t->ealgos = ua->ealgos; + t->calgos = ua->calgos; + err = km_query(x, t, xp); + + } + + kfree(x); + kfree(xp); + + return 0; +} + + #define XMSGSIZE(type) NLMSG_LENGTH(sizeof(struct type)) static const int xfrm_msg_min[XFRM_NR_MSGTYPES] = { @@ -1054,6 +1377,8 @@ static const int xfrm_msg_min[XFRM_NR_MS [XFRM_MSG_POLEXPIRE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_polexpire), [XFRM_MSG_FLUSHSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_flush), [XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = NLMSG_LENGTH(0), + [XFRM_MSG_NEWAE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id), + [XFRM_MSG_GETAE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id), }; #undef XMSGSIZE @@ -1071,10 +1396,15 @@ static struct xfrm_link { [XFRM_MSG_GETPOLICY - XFRM_MSG_BASE] = { .doit = xfrm_get_policy, .dump = xfrm_dump_policy }, [XFRM_MSG_ALLOCSPI - XFRM_MSG_BASE] = { .doit = xfrm_alloc_userspi }, + [XFRM_MSG_ACQUIRE - XFRM_MSG_BASE] = { .doit = xfrm_add_acquire }, + [XFRM_MSG_EXPIRE - XFRM_MSG_BASE] = { .doit = xfrm_add_sa_expire }, [XFRM_MSG_UPDPOLICY - XFRM_MSG_BASE] = { .doit = xfrm_add_policy }, [XFRM_MSG_UPDSA - XFRM_MSG_BASE] = { .doit = xfrm_add_sa }, + [XFRM_MSG_POLEXPIRE - XFRM_MSG_BASE] = { .doit = xfrm_add_pol_expire}, [XFRM_MSG_FLUSHSA - XFRM_MSG_BASE] = { .doit = xfrm_flush_sa }, [XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = { .doit = xfrm_flush_policy }, + [XFRM_MSG_NEWAE - XFRM_MSG_BASE] = { .doit = xfrm_new_ae }, + [XFRM_MSG_GETAE - XFRM_MSG_BASE] = { .doit = xfrm_get_ae }, }; static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, int *errp) @@ -1156,26 +1486,26 @@ static void xfrm_netlink_rcv(struct sock unsigned int qlen = 0; do { - down(&xfrm_cfg_sem); + mutex_lock(&xfrm_cfg_mutex); netlink_run_queue(sk, &qlen, &xfrm_user_rcv_msg); - up(&xfrm_cfg_sem); + mutex_unlock(&xfrm_cfg_mutex); } while (qlen); } -static int build_expire(struct sk_buff *skb, struct xfrm_state *x, int hard) +static int 
build_expire(struct sk_buff *skb, struct xfrm_state *x, struct km_event *c) { struct xfrm_user_expire *ue; struct nlmsghdr *nlh; unsigned char *b = skb->tail; - nlh = NLMSG_PUT(skb, 0, 0, XFRM_MSG_EXPIRE, + nlh = NLMSG_PUT(skb, c->pid, 0, XFRM_MSG_EXPIRE, sizeof(*ue)); ue = NLMSG_DATA(nlh); nlh->nlmsg_flags = 0; copy_to_user_state(x, &ue->state); - ue->hard = (hard != 0) ? 1 : 0; + ue->hard = (c->data.hard != 0) ? 1 : 0; nlh->nlmsg_len = skb->tail - b; return skb->len; @@ -1194,13 +1524,31 @@ static int xfrm_exp_state_notify(struct if (skb == NULL) return -ENOMEM; - if (build_expire(skb, x, c->data.hard) < 0) + if (build_expire(skb, x, c) < 0) BUG(); NETLINK_CB(skb).dst_group = XFRMNLGRP_EXPIRE; return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC); } +static int xfrm_aevent_state_notify(struct xfrm_state *x, struct km_event *c) +{ + struct sk_buff *skb; + int len = NLMSG_LENGTH(sizeof(struct xfrm_aevent_id)); + + len += RTA_SPACE(sizeof(struct xfrm_replay_state)); + len += RTA_SPACE(sizeof(struct xfrm_lifetime_cur)); + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + + if (build_aevent(skb, x, c) < 0) + BUG(); + + NETLINK_CB(skb).dst_group = XFRMNLGRP_AEVENTS; + return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_AEVENTS, GFP_ATOMIC); +} + static int xfrm_notify_sa_flush(struct km_event *c) { struct xfrm_usersa_flush *p; @@ -1313,6 +1661,8 @@ static int xfrm_send_state_notify(struct switch (c->event) { case XFRM_MSG_EXPIRE: return xfrm_exp_state_notify(x, c); + case XFRM_MSG_NEWAE: + return xfrm_aevent_state_notify(x, c); case XFRM_MSG_DELSA: case XFRM_MSG_UPDSA: case XFRM_MSG_NEWSA: @@ -1443,13 +1793,14 @@ static struct xfrm_policy *xfrm_compile_ } static int build_polexpire(struct sk_buff *skb, struct xfrm_policy *xp, - int dir, int hard) + int dir, struct km_event *c) { struct xfrm_user_polexpire *upe; struct nlmsghdr *nlh; + int hard = c->data.hard; unsigned char *b = skb->tail; - nlh = NLMSG_PUT(skb, 0, 0, 
XFRM_MSG_POLEXPIRE, sizeof(*upe)); + nlh = NLMSG_PUT(skb, c->pid, 0, XFRM_MSG_POLEXPIRE, sizeof(*upe)); upe = NLMSG_DATA(nlh); nlh->nlmsg_flags = 0; @@ -1480,7 +1831,7 @@ static int xfrm_exp_policy_notify(struct if (skb == NULL) return -ENOMEM; - if (build_polexpire(skb, xp, dir, c->data.hard) < 0) + if (build_polexpire(skb, xp, dir, c) < 0) BUG(); NETLINK_CB(skb).dst_group = XFRMNLGRP_EXPIRE; @@ -1596,12 +1947,15 @@ static struct xfrm_mgr netlink_mgr = { static int __init xfrm_user_init(void) { + struct sock *nlsk; + printk(KERN_INFO "Initializing IPsec netlink socket\n"); - xfrm_nl = netlink_kernel_create(NETLINK_XFRM, XFRMNLGRP_MAX, - xfrm_netlink_rcv, THIS_MODULE); - if (xfrm_nl == NULL) + nlsk = netlink_kernel_create(NETLINK_XFRM, XFRMNLGRP_MAX, + xfrm_netlink_rcv, THIS_MODULE); + if (nlsk == NULL) return -ENOMEM; + rcu_assign_pointer(xfrm_nl, nlsk); xfrm_register_km(&netlink_mgr); @@ -1610,11 +1964,16 @@ static int __init xfrm_user_init(void) static void __exit xfrm_user_exit(void) { + struct sock *nlsk = xfrm_nl; + xfrm_unregister_km(&netlink_mgr); - sock_release(xfrm_nl->sk_socket); + rcu_assign_pointer(xfrm_nl, NULL); + synchronize_rcu(); + sock_release(nlsk->sk_socket); } module_init(xfrm_user_init); module_exit(xfrm_user_exit); MODULE_LICENSE("GPL"); MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_XFRM); + diff -puN security/dummy.c~git-net security/dummy.c --- devel/security/dummy.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/security/dummy.c 2006-03-17 23:03:48.000000000 -0800 @@ -773,8 +773,14 @@ static int dummy_socket_sock_rcv_skb (st return 0; } -static int dummy_socket_getpeersec(struct socket *sock, char __user *optval, - int __user *optlen, unsigned len) +static int dummy_socket_getpeersec_stream(struct socket *sock, char __user *optval, + int __user *optlen, unsigned len) +{ + return -ENOPROTOOPT; +} + +static int dummy_socket_getpeersec_dgram(struct sk_buff *skb, char **secdata, + u32 *seclen) { return -ENOPROTOOPT; } @@ -1014,7 
+1020,8 @@ void security_fixup_ops (struct security set_to_dummy_if_null(ops, socket_getsockopt); set_to_dummy_if_null(ops, socket_shutdown); set_to_dummy_if_null(ops, socket_sock_rcv_skb); - set_to_dummy_if_null(ops, socket_getpeersec); + set_to_dummy_if_null(ops, socket_getpeersec_stream); + set_to_dummy_if_null(ops, socket_getpeersec_dgram); set_to_dummy_if_null(ops, sk_alloc_security); set_to_dummy_if_null(ops, sk_free_security); set_to_dummy_if_null(ops, sk_getsid); diff -puN security/selinux/hooks.c~git-net security/selinux/hooks.c --- devel/security/selinux/hooks.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/security/selinux/hooks.c 2006-03-17 23:03:48.000000000 -0800 @@ -3316,24 +3316,38 @@ out: return err; } -static int selinux_socket_getpeersec(struct socket *sock, char __user *optval, - int __user *optlen, unsigned len) +static int selinux_socket_getpeersec_stream(struct socket *sock, char __user *optval, + int __user *optlen, unsigned len) { int err = 0; char *scontext; u32 scontext_len; struct sk_security_struct *ssec; struct inode_security_struct *isec; + u32 peer_sid = 0; isec = SOCK_INODE(sock)->i_security; - if (isec->sclass != SECCLASS_UNIX_STREAM_SOCKET) { + + /* if UNIX_STREAM check peer_sid, if TCP check dst for labelled sa */ + if (isec->sclass == SECCLASS_UNIX_STREAM_SOCKET) { + ssec = sock->sk->sk_security; + peer_sid = ssec->peer_sid; + } + else if (isec->sclass == SECCLASS_TCP_SOCKET) { + peer_sid = selinux_socket_getpeer_stream(sock->sk); + + if (peer_sid == SECSID_NULL) { + err = -ENOPROTOOPT; + goto out; + } + } + else { err = -ENOPROTOOPT; goto out; } - ssec = sock->sk->sk_security; - - err = security_sid_to_context(ssec->peer_sid, &scontext, &scontext_len); + err = security_sid_to_context(peer_sid, &scontext, &scontext_len); + if (err) goto out; @@ -3354,6 +3368,23 @@ out: return err; } +static int selinux_socket_getpeersec_dgram(struct sk_buff *skb, char **secdata, u32 *seclen) +{ + int err = 0; + u32 peer_sid = 
selinux_socket_getpeer_dgram(skb); + + if (peer_sid == SECSID_NULL) + return -EINVAL; + + err = security_sid_to_context(peer_sid, secdata, seclen); + if (err) + return err; + + return 0; +} + + + static int selinux_sk_alloc_security(struct sock *sk, int family, gfp_t priority) { return sk_alloc_security(sk, family, priority); @@ -4338,7 +4369,8 @@ static struct security_operations selinu .socket_setsockopt = selinux_socket_setsockopt, .socket_shutdown = selinux_socket_shutdown, .socket_sock_rcv_skb = selinux_socket_sock_rcv_skb, - .socket_getpeersec = selinux_socket_getpeersec, + .socket_getpeersec_stream = selinux_socket_getpeersec_stream, + .socket_getpeersec_dgram = selinux_socket_getpeersec_dgram, .sk_alloc_security = selinux_sk_alloc_security, .sk_free_security = selinux_sk_free_security, .sk_getsid = selinux_sk_getsid_security, diff -puN security/selinux/include/xfrm.h~git-net security/selinux/include/xfrm.h --- devel/security/selinux/include/xfrm.h~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/security/selinux/include/xfrm.h 2006-03-17 23:03:48.000000000 -0800 @@ -39,6 +39,8 @@ static inline u32 selinux_no_sk_sid(stru #ifdef CONFIG_SECURITY_NETWORK_XFRM int selinux_xfrm_sock_rcv_skb(u32 sid, struct sk_buff *skb); int selinux_xfrm_postroute_last(u32 isec_sid, struct sk_buff *skb); +u32 selinux_socket_getpeer_stream(struct sock *sk); +u32 selinux_socket_getpeer_dgram(struct sk_buff *skb); #else static inline int selinux_xfrm_sock_rcv_skb(u32 isec_sid, struct sk_buff *skb) { @@ -49,6 +51,16 @@ static inline int selinux_xfrm_postroute { return NF_ACCEPT; } + +static inline int selinux_socket_getpeer_stream(struct sock *sk) +{ + return SECSID_NULL; +} + +static inline int selinux_socket_getpeer_dgram(struct sk_buff *skb) +{ + return SECSID_NULL; +} #endif #endif /* _SELINUX_XFRM_H_ */ diff -puN security/selinux/nlmsgtab.c~git-net security/selinux/nlmsgtab.c --- devel/security/selinux/nlmsgtab.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ 
devel-akpm/security/selinux/nlmsgtab.c 2006-03-17 23:03:48.000000000 -0800 @@ -88,8 +88,15 @@ static struct nlmsg_perm nlmsg_xfrm_perm { XFRM_MSG_DELPOLICY, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, { XFRM_MSG_GETPOLICY, NETLINK_XFRM_SOCKET__NLMSG_READ }, { XFRM_MSG_ALLOCSPI, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_ACQUIRE, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_EXPIRE, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, { XFRM_MSG_UPDPOLICY, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, { XFRM_MSG_UPDSA, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_POLEXPIRE, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_FLUSHSA, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_FLUSHPOLICY, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_NEWAE, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_GETAE, NETLINK_XFRM_SOCKET__NLMSG_READ }, }; static struct nlmsg_perm nlmsg_audit_perms[] = diff -puN security/selinux/ss/services.c~git-net security/selinux/ss/services.c --- devel/security/selinux/ss/services.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/security/selinux/ss/services.c 2006-03-17 23:03:48.000000000 -0800 @@ -27,6 +27,7 @@ #include #include #include + #include #include "flask.h" #include "avc.h" diff -puN security/selinux/xfrm.c~git-net security/selinux/xfrm.c --- devel/security/selinux/xfrm.c~git-net 2006-03-17 23:03:46.000000000 -0800 +++ devel-akpm/security/selinux/xfrm.c 2006-03-17 23:03:48.000000000 -0800 @@ -225,6 +225,74 @@ void selinux_xfrm_state_free(struct xfrm } /* + * SELinux internal function to retrieve the context of a connected + * (sk->sk_state == TCP_ESTABLISHED) TCP socket based on its security + * association used to connect to the remote socket. + * + * Retrieve via getsockopt SO_PEERSEC. 
+ */ +u32 selinux_socket_getpeer_stream(struct sock *sk) +{ + struct dst_entry *dst, *dst_test; + u32 peer_sid = SECSID_NULL; + + if (sk->sk_state != TCP_ESTABLISHED) + goto out; + + dst = sk_dst_get(sk); + if (!dst) + goto out; + + for (dst_test = dst; dst_test != 0; + dst_test = dst_test->child) { + struct xfrm_state *x = dst_test->xfrm; + + if (x && selinux_authorizable_xfrm(x)) { + struct xfrm_sec_ctx *ctx = x->security; + peer_sid = ctx->ctx_sid; + break; + } + } + dst_release(dst); + +out: + return peer_sid; +} + +/* + * SELinux internal function to retrieve the context of a UDP packet + * based on its security association used to connect to the remote socket. + * + * Retrieve via setsockopt IP_PASSSEC and recvmsg with control message + * type SCM_SECURITY. + */ +u32 selinux_socket_getpeer_dgram(struct sk_buff *skb) +{ + struct sec_path *sp; + + if (skb == NULL) + return SECSID_NULL; + + if (skb->sk->sk_protocol != IPPROTO_UDP) + return SECSID_NULL; + + sp = skb->sp; + if (sp) { + int i; + + for (i = sp->len-1; i >= 0; i--) { + struct xfrm_state *x = sp->x[i].xvec; + if (selinux_authorizable_xfrm(x)) { + struct xfrm_sec_ctx *ctx = x->security; + return ctx->ctx_sid; + } + } + } + + return SECSID_NULL; +} + +/* * LSM hook that controls access to unlabelled packets. If * a xfrm_state is authorizable (defined by macro) then it was * already authorized by the IPSec process. If not, then _