GIT 9ff1bf473c03f96998b5e395b02e7720132a8064 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.17.git commit d2c5bf98ebbf4ad6ab670ec0e65494b04e75ab90 Author: Arnaldo Carvalho de Melo Date: Sat Feb 25 19:24:19 2006 -0300 [ICSK]: Introduce inet_csk_ctl_sock_create Consolidating open coded sequences in tcp and dccp, v4 and v6. Signed-off-by: Arnaldo Carvalho de Melo commit 3e8d786e99c8e5abd88177763e7121341bb05d05 Author: Arnaldo Carvalho de Melo Date: Sat Feb 25 18:12:20 2006 -0300 [DCCP] ipv6: Add missing ipv6 control socket I guess I forgot to add it, nah, now it just works: 18:04:33.274066 IP6 ::1.1476 > ::1.5001: request (service=0) 18:04:33.334482 IP6 ::1.5001 > ::1.1476: reset (code=bad_service_code) Ditched IP_DCCP_UNLOAD_HACK, as now we would have to do it for both IPv6 and IPv4, so I'll come up with another way for freeing the control sockets in upcoming changesets. Signed-off-by: Arnaldo Carvalho de Melo commit c0658f19b2570dfe73979c22dd9030f89b6c2dfd Author: Arnaldo Carvalho de Melo Date: Sat Feb 25 16:59:29 2006 -0300 [DCCP]: Uninline some functions Signed-off-by: Arnaldo Carvalho de Melo commit 16106bb9b047c6a60190f11a674d804bdf1f8320 Author: Adrian Bunk Date: Sat Feb 25 11:55:02 2006 -0300 [DCCP] ipv4: make struct dccp_v4_prot static There's no reason for struct dccp_v4_prot being global. Signed-off-by: Adrian Bunk commit cd9063b7f5b1995ca318c2e03bdbf18f014bde36 Author: David S. Miller Date: Fri Feb 24 15:13:06 2006 -0800 [IPV6]: Fix some code/comment formatting in ip6_dst_output(). Signed-off-by: David S. Miller commit dbc5373e99d519625e9383f42bcea28a5226b17e Author: Robert Olsson Date: Fri Feb 24 14:04:38 2006 -0800 [IPV4]: fib_trie stats fix fib_triestats has been buggy and caused oopses some platforms as openwrt. The patch below should cure those problems. Signed-off-by: Robert Olsson Signed-off-by: David S. Miller commit 9c8f64402ddf3e6f07400c5e143ab48e4320380f Author: Robert Olsson Date: Fri Feb 24 14:03:42 2006 -0800 [IPV4]: fib_trie initialzation fix In some kernel configs /proc functions seems to be accessed before the trie is initialized. The patch below checks for this. Signed-off-by: Robert Olsson Signed-off-by: David S. Miller commit 25969a5bc700990b74770b24ff340c2c865028ce Author: Michael Chan Date: Fri Feb 24 14:02:15 2006 -0800 [TG3]: Fix tg3_get_ringparam() Fix-up tg3_get_ringparam() to return the correct parameters. Set the jumbo rx ring parameter only if it is supported by the chip and currently in use. Add missing value for tx_max_pending, noticed by Rick Jones. Update version to 3.51. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 1214fa62af2df83a10c49cfc93e664bac1f6e39e Author: Michael Chan Date: Fri Feb 24 14:01:48 2006 -0800 [TG3]: Add some missing netif_running() checks Add missing netif_running() checks in tg3's dev->set_multicast_list() and dev->set_mac_address(). If not netif_running(), these 2 calls can simply return 0 after storing the new settings if required. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 52ec758591463802143e24b2338057532d9c5ec7 Author: John Heffner Date: Fri Feb 24 13:58:53 2006 -0800 [TCP] mtu probing: move tcp-specific data out of inet_connection_sock This moves some TCP-specific MTU probing state out of inet_connection_sock back to tcp_sock. Signed-off-by: John Heffner Signed-off-by: David S. Miller commit a52e3f584da47b2a464423cfc5f2424f485cadb1 Author: Benjamin LaHaise Date: Fri Feb 24 13:53:15 2006 -0800 [AF_UNIX]: scm: better initialization Instead of doing a memset then initialization of the fields of the scm structure, just initialize all the members explicitly. Prevent reloading of current on x86 and x86-64 by storing the value in a local variable for subsequent dereferences. This is worth a ~7KB/s increase in af_unix bandwidth. Note that we avoid the issues surrounding potentially uninitialized members of the ucred structure by constructing a struct ucred instead of assigning the members individually, which forces the compiler to zero any padding. Signed-off-by: Benjamin LaHaise Signed-off-by: David S. Miller commit 8e682a5fda48a47ceab9172b41ed29e244d10519 Author: Benjamin LaHaise Date: Fri Feb 24 13:52:42 2006 -0800 [AF_UNIX]: use shift instead of integer division The patch below replaces a divide by 2 with a shift -- sk_sndbuf is an integer, so gcc emits an idiv, which takes 10x longer than a shift by 1. This improves af_unix bandwidth by ~6-10K/s. Also, tidy up the comment to fit in 80 columns while we're at it. Signed-off-by: Benjamin LaHaise Signed-off-by: David S. Miller commit dbccc021e8cb9ba89449b03dfa1b9e000a98b1d3 Author: Jörn Engel Date: Fri Feb 24 13:32:25 2006 -0800 [NET]: Uninline kfree_skb and allow NULL argument o Uninline kfree_skb, which saves some 15k of object code on my notebook. o Allow kfree_skb to be called with a NULL argument. Subsequent patches can remove conditional from drivers and further reduce source and object size. Signed-off-by: Jörn Engel Signed-off-by: David S. Miller commit 42b30504aadc0e1e45eb915573e1d2c07d133b00 Author: Arnaldo Carvalho de Melo Date: Fri Feb 24 13:26:05 2006 -0300 [LLC]: Fix sap refcounting Thanks to Leslie Harlley Watter for reporting the problem an testing this patch. Signed-off-by: Arnaldo Carvalho de Melo commit bac4eba541726216405d373edae1b285de230e0d Author: Arnaldo Carvalho de Melo Date: Fri Feb 24 10:44:06 2006 -0300 [LLC]: Replace __inline__ with inline Signed-off-by: Arnaldo Carvalho de Melo commit 939178799c5c65687d2595a64c388d82dd1479c3 Author: Arnaldo Carvalho de Melo Date: Fri Feb 24 10:27:18 2006 -0300 [LLC]: Fix struct proto .name Cut'n'paste error from ddp_proto. Signed-off-by: Arnaldo Carvalho de Melo commit 02be8ec8fe4ee08e98e43fcff6aadc29a90537cb Author: Arthur Kepner Date: Thu Feb 23 17:14:41 2006 -0800 [NET] pktgen: Fix races between control/worker threads. There's a race in pktgen which can lead to a double free of a pktgen_dev's skb. If a worker thread is in the midst of doing fill_packet(), and the controlling thread gets a "stop" message, the already freed skb can be freed once again in pktgen_stop_device(). This patch gives all responsibility for cleaning up a pktgen_dev's skb to the associated worker thread. Signed-off-by: Arthur Kepner Acked-by: Robert Olsson Signed-off-by: David S. Miller commit a6e49a0e4435714fabc8012084eb5e5c7895f8d3 Author: Jamal Hadi Salim Date: Thu Feb 23 16:26:25 2006 -0800 [XFRM]: Rearrange struct xfrm_aevent_id for better compatibility. struct xfrm_aevent_id needs to be 32-bit + 64-bit align friendly. Based upon suggestions from Yoshifuji. Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit 63d1742ba27197d1678bd5cc05cfa8a1fde3c6d4 Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 17:33:26 2006 -0300 [DCCP]: Move the IPv4 specific bits from proto.c to ipv4.c With this patch in place we can break down the complexity by better compartmentalizing the code that is common to ipv6 and ipv4. Now we have these modules: Module Size Used by dccp_diag 1344 0 inet_diag 9448 1 dccp_diag dccp_ccid3 15856 0 dccp_tfrc_lib 12320 1 dccp_ccid3 dccp_ccid2 5764 0 dccp_ipv4 16996 2 dccp 48208 4 dccp_diag,dccp_ccid3,dccp_ccid2,dccp_ipv4 dccp_ipv6 still requires dccp_ipv4 due to dccp_ipv6_mapped, that is the next target to work on the "hey, ipv4 is legacy, I only want ipv6 dude!" direction. Signed-off-by: Arnaldo Carvalho de Melo commit 2250710f2979bf1e47bba2f6e3fcadfd9a686071 Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 15:51:53 2006 -0300 [DCCP]: Rename init_dccp_v4_mibs to dccp_mib_init And introduce dccp_mib_exit grouping previously open coded sequence. Signed-off-by: Arnaldo Carvalho de Melo commit dd931ecebe9c97b8a75ac0b720c0da5f5c452408 Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 15:31:22 2006 -0300 [DCCP]: Move dccp_hashinfo from ipv4.c to the core As it is used by both ipv4 and ipv6. Signed-off-by: Arnaldo Carvalho de Melo commit e9ec1a839d3fe9f2ec5e864a094cf04b6c511c03 Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 15:19:57 2006 -0300 [DCCP]: Dont use dccp_v4_checksum in dccp_make_response dccp_make_response is shared by ipv4/6 and the ipv6 code was recalculating the checksum, not good, so move the dccp_v4_checksum call to dccp_v4_send_response. Signed-off-by: Arnaldo Carvalho de Melo commit 5f50d3a8059644b039b19a43200800783658e130 Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 14:12:33 2006 -0300 [DCCP]: Move dccp_[un]hash from ipv4.c to the core As this is used by both ipv4 and ipv6 and is not ipv4 specific. Signed-off-by: Arnaldo Carvalho de Melo commit 8f62e07749b62f28242dc9fb7db5c0531d33401e Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 13:19:18 2006 -0300 [DCCP]: Move dccp_v4_{init,destroy}_sock to the core Removing one more ipv6 uses ipv4 stuff case in dccp land. Signed-off-by: Arnaldo Carvalho de Melo commit f6d02dcee746dd116bcdf3902d40eeab21e3b517 Author: Arnaldo Carvalho de Melo Date: Thu Feb 23 13:09:11 2006 -0300 [DCCP]: Generalize dccp_v4_send_reset Renaming it to dccp_send_reset and moving it from the ipv4 specific code to the core dccp code. This fixes some bugs in IPV6 where timers would send v4 resets, etc. Signed-off-by: Arnaldo Carvalho de Melo commit 961cf69419702b3f5132e8d76a948e4e8d5bad8e Author: Arnaldo Carvalho de Melo Date: Wed Feb 22 17:13:10 2006 -0300 [DCCP] feat: Introduce sysctls for the default features [root@qemu ~]# for a in /proc/sys/net/dccp/default/* ; do echo $a ; cat $a ; done /proc/sys/net/dccp/default/ack_ratio 2 /proc/sys/net/dccp/default/rx_ccid 3 /proc/sys/net/dccp/default/send_ackvec 1 /proc/sys/net/dccp/default/send_ndp 1 /proc/sys/net/dccp/default/seq_window 100 /proc/sys/net/dccp/default/tx_ccid 3 [root@qemu ~]# So if wanting to test ccid3 as the tx CCID one can just do: [root@qemu ~]# echo 3 > /proc/sys/net/dccp/default/tx_ccid [root@qemu ~]# echo 2 > /proc/sys/net/dccp/default/rx_ccid [root@qemu ~]# cat /proc/sys/net/dccp/default/[tr]x_ccid 2 3 [root@qemu ~]# Of course we also need the setsockopt for each app to tell its preferences, but for testing or defining something other than CCID2 as the default for apps that don't explicitely set their preference the sysctl interface is handy. Signed-off-by: Arnaldo Carvalho de Melo commit 73bfe8f9ec4f0700c8f8d5020c254f5ee16b8b9a Author: Arnaldo Carvalho de Melo Date: Wed Feb 22 16:58:15 2006 -0300 [DCCP]: Call dccp_feat_init more early in dccp_v4_init_sock So that dccp_feat_clean doesn't get confused with uninitialized list_heads. Noticed when testing with no ccid kernel modules. Signed-off-by: Arnaldo Carvalho de Melo commit f67f3ad307a67770ce6488919a6a7305fcc046ed Author: Arnaldo Carvalho de Melo Date: Wed Feb 22 16:54:28 2006 -0300 [DCCP]: Kconfig tidy up Make CCID2 and CCID3 default to what was selected for DCCP and use the standard short description for the CCIDs (TCP-Like & TCP-Friendly). Signed-off-by: Arnaldo Carvalho de Melo commit 5af31380b3ba53b4e315f99add31022a35925233 Author: Arnaldo Carvalho de Melo Date: Wed Feb 22 16:51:15 2006 -0300 [DCCP]: Make CCID2 be the default As per the draft. This fixes the build when netfilter dccp components are built and dccp isn't. Thanks to Reuben Farrelly for reporting this. The following changesets will introduce /proc/sys/net/dccp/defaults/ to give more flexibility to DCCP developers and testers while apps doesn't use setsockopt to specify the desired CCID, etc. Signed-off-by: Arnaldo Carvalho de Melo commit 345ac9577343813fb94d3b88ceff4f50b539ec27 Author: Al Viro Date: Mon Feb 20 16:19:58 2006 -0300 [DCCP]: sparse endianness annotations This also fixes the layout of dccp_hdr short sequence numbers, problem was not fatal now as we only support long (48 bits) sequence numbers. Signed-off-by: Andrea Bittau Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Al Viro commit a7df1f7ff2828bb625afc6723b50ced1b5e4ad57 Author: Patrick McHardy Date: Mon Feb 20 20:22:03 2006 -0800 [NETFILTER]: Fix skb->nf_bridge lifetime issues The bridge netfilter code simulates the NF_IP_PRE_ROUTING hook and skips the real hook by registering with high priority and returning NF_STOP if skb->nf_bridge is present and the BRNF_NF_BRIDGE_PREROUTING flag is not set. The flag is only set during the simulated hook. Because skb->nf_bridge is only freed when the packet is destroyed, the packet will not only skip the first invocation of NF_IP_PRE_ROUTING, but in the case of tunnel devices on top of the bridge also all further ones. Forwarded packets from a bridge encapsulated by a tunnel device and sent as locally outgoing packet will also still have the incorrect bridge information from the input path attached. We already have nf_reset calls on all RX/TX paths of tunnel devices, so simply reset the nf_bridge field there too. As an added bonus, the bridge information for locally delivered packets is now also freed when the packet is queued to a socket. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit a5e093c55afcfd9f668651b55d43461ba7b1b851 Author: Andrea Bittau Date: Mon Feb 20 09:27:51 2006 -0300 [DCCP] feat: Actually change the CCID upon negotiation Change the CCID upon successful feature negotiation. Commiter note: patch mostly rewritten to use the new ccid API. Signed-off-by: Andrea Bittau Signed-off-by: Arnaldo Carvalho de Melo commit 5e623d6fbc5176f32d5027f51b121a806bfe6487 Author: Arnaldo Carvalho de Melo Date: Mon Feb 20 09:25:22 2006 -0300 [DCCP] CCID: Improve CCID infrastructure 1. No need for ->ccid_init nor ->ccid_exit, this is what module_{init,exit} does and anynways neither ccid2 nor ccid3 were using it. 2. Rename struct ccid to struct ccid_operations and introduce struct ccid with a pointer to ccid_operations and rigth after it the rx or tx private state. 3. Remove the pointer to the state of the half connections from struct dccp_sock, now its derived thru ccid_priv() from the ccid pointer. Now we also can implement the setsockopt for changing the CCID easily as no ccid init routines can affect struct dccp_sock in any way that prevents other CCIDs from working if a CCID switch operation is asked by apps. Signed-off-by: Arnaldo Carvalho de Melo commit 31ad519de75284ed2be4d16b69a707285a7ec2a5 Author: Patrick McHardy Date: Sun Feb 19 21:05:35 2006 -0800 [PKT_SCHED]: Convert sch_red to a classful qdisc Convert sch_red to a classful qdisc. All qdiscs that maintain accurate backlog counters are eligible as child qdiscs. When a queue limit larger than zero is given, a bfifo qdisc is used for backwards compatibility. Current versions of tc enforce a limit larger than zero, other users can avoid creating the default qdisc by using zero. Signed-off-by: Patrick McHardy Acked-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit 1c35c2457f3133fc832413e5df50d77a55907708 Author: David S. Miller Date: Sun Feb 19 10:12:37 2006 -0800 [XFRM]: Add some missing exports. To fix the case of modular xfrm_user. Signed-off-by: David S. Miller commit 203710adc09bcbc9b0e46e311364e6021b68f155 Author: David S. Miller Date: Sun Feb 19 10:09:00 2006 -0800 [XFRM]: Move xfrm_nl to xfrm_state.c from xfrm_user.c xfrm_user could be modular, and since generic code uses this symbol now... Signed-off-by: David S. Miller commit 67649c6cd4ce80a04d672152f43eef64860b8e50 Author: David S. Miller Date: Sun Feb 19 01:13:47 2006 -0800 [XFRM]: Make sure xfrm_replay_timer_handler() is declared early enough. Signed-off-by: David S. Miller commit 3d8caf7e9d1b5fa8542e1c43f2b67130e361c34c Author: Jamal Hadi Salim Date: Sun Feb 19 00:55:24 2006 -0800 [IPSEC]: Sync series - update selinux Add new netlink messages to selinux framework Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit 5d8c9079e5eb7d507c73121e1ae5050167bf181b Author: Jamal Hadi Salim Date: Sun Feb 19 00:54:23 2006 -0800 [IPSEC]: Sync series - policy expires This is similar to the SA expire insertion patch - only it inserts expires for SP. Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit cde7fdafab9f98f0f2c79b54ded775a14bd129f8 Author: Jamal Hadi Salim Date: Sun Feb 19 00:53:43 2006 -0800 [IPSEC]: Sync series - SA expires This patch allows a user to insert SA expires. This is useful to do on an HA backup for the case of byte counts but may not be very useful for the case of time based expiry. Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit 9fe4df3c68d627def84058bb1555e00fc8484c33 Author: Jamal Hadi Salim Date: Sun Feb 19 00:53:06 2006 -0800 [IPSEC]: Sync series - acquire insert This introduces a feature similar to the one described in RFC 2367: " ... the application needing an SA sends a PF_KEY SADB_ACQUIRE message down to the Key Engine, which then either returns an error or sends a similar SADB_ACQUIRE message up to one or more key management applications capable of creating such SAs. ... ... The third is where an application-layer consumer of security associations (e.g. an OSPFv2 or RIPv2 daemon) needs a security association. Send an SADB_ACQUIRE message from a user process to the kernel. The kernel returns an SADB_ACQUIRE message to registered sockets. The user-level consumer waits for an SADB_UPDATE or SADB_ADD message for its particular type, and then can use that association by using SADB_GET messages. " An app such as OSPF could then use ipsec KM to get keys Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit 394a2d518c1937f98b739c8bc67fe89db2e8c102 Author: Jamal Hadi Salim Date: Sun Feb 19 00:52:27 2006 -0800 [IPSEC]: Sync series - user Add xfrm as the user of the core changes Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit 506c722e70a7a309389d4676ad7f641c61c962ac Author: Jamal Hadi Salim Date: Sun Feb 19 00:51:34 2006 -0800 [IPSEC]: Sync series - fast path Fast path sequence updates that will generate ipsec async events Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit 1ac02b5a2028c5b63ed11d90fac1ccfd98f28115 Author: Jamal Hadi Salim Date: Sun Feb 19 00:50:28 2006 -0800 [IPSEC]: Sync series - core changes This patch provides the core functionality needed for sync events for ipsec. Derived work of Krisztian KOVACS Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit 1eae9521c702c12125a7dce47ba5a77bcdc1e460 Author: Patrick McHardy Date: Sun Feb 19 00:39:44 2006 -0800 [PKT_SCHED]: Keep backlog counter in sch_sfq Keep backlog counter in SFQ qdisc to make it usable as child qdisc with RED. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 04a391f4cdd7003b94a9e89f5508a0c3fe617f7b Author: Patrick McHardy Date: Sun Feb 19 00:39:15 2006 -0800 [PKT_SCHED]: Restore TBF change semantic When TBF was converted to a classful qdisc, the semantic of the limit parameter was broken. On initilization an inner bfifo qdisc is created for backwards compatibility, when changing parameters however the new limit is ignored and the current child qdisc remains in place. Always replace the child qdisc by the default bfifo when limit is above zero, otherwise don't touch the inner qdisc. Current tc version enforce a limit above zero, other users can avoid creating the inner qdisc by using zero. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit d50a9b20229435a3113193f3f3d73f45c370a407 Author: Patrick McHardy Date: Sun Feb 19 00:38:50 2006 -0800 [PKT_SCHED]: Dump child qdisc handle in sch_{atm,dsmark} A qdisc should set tcm_info to the child qdisc handle in its class dump function. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 353b57554ae0889962b456c1c47634e995ad70a6 Author: Patrick McHardy Date: Sun Feb 19 00:38:24 2006 -0800 [PKT_SCHED]: Qdisc drop operation is optional The drop operation is optional and qdiscs must check if childs support it. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit b8587eb63983b80165aed60dfde66ccb3ffe0074 Author: Christophe Lucas Date: Sun Feb 19 00:36:52 2006 -0800 [IRDA]: pci_register_driver conversion This patch converts 2 IrDA drivers pci_module_init() calls to pci_register_driver(). Signed-off-by: Christophe Lucas Signed-off-by: Domen Puncer Signed-off-by: Samuel Ortiz Signed-off-by: David S. Miller commit 1d837ebf3b33cdbaab0484dc574f6e480ba8026f Author: David chosrova Date: Sun Feb 19 00:36:16 2006 -0800 [IRDA]: sti/cli removal from EP7211 IrDA driver This patch replaces the deprecated sti/cli routines with the corresponding spin_lock ones. Signed-off-by: David chosrova Signed-off-by: Samuel Ortiz Signed-off-by: David S. Miller commit 7fd05bafa584f19dd1d1a7101b1afd17f2585ea0 Author: Jean Tourrilhes Date: Sun Feb 19 00:35:35 2006 -0800 [IRDA]: nsc-ircc: support for yet another Thinkpad IrDA chipset This patch simply adds support for a variation of the nsc-ircc PC8739x chipset, found in some IBM Thinkpad laptops. Signed-off-by: Jean Tourrilhes Signed-off-by: Samuel Ortiz Signed-off-by: David S. Miller commit 8d786aff9166b7d36ccadaf415573419ca4a4bd8 Author: Dmitry Torokhov Date: Sun Feb 19 00:35:02 2006 -0800 [IRDA]: nsc-ircc: PM update This patch brings the nsc-ircc code to a more up to date power management scheme, following the current device model. Signed-off-by: Dmitry Torokhov Signed-off-by: Rudolf Marek Signed-off-by: Samuel Ortiz Signed-off-by: David S. Miller commit 7de87069f935fb3c98b29700a1b4a747d020cb45 Author: Jean Tourrilhes Date: Sun Feb 19 00:34:12 2006 -0800 [IRDA]: nsc-ircc: ISAPnP support This enables PnP support for the nsc-ircc chipset. Since we can't fetch the chipset cfg_base from the PnP layer, we just use the PnP information as one more hint when probing the chip. Signed-off-by: Jean Tourrilhes Signed-off-by: Samuel Ortiz Signed-off-by: David S. Miller commit e759dabc0710304deb57ce7b50288e0d3dfcfab2 Author: Patrick McHardy Date: Sun Feb 19 00:32:45 2006 -0800 [NETLINK]: Add netlink_has_listeners for avoiding unneccessary event message generation Keep a bitmask of multicast groups with subscribed listeners to let netlink users check for listeners before generating multicast messages. Queries don't perform any locking, which may result in false positives, it is guaranteed however that any new subscriptions are visible before bind() or setsockopt() return. Signed-off-by: Patrick McHardy ACKed-by: Jamal Hadi Salim Signed-off-by: David S. Miller commit 0a3377ffad696297265a09e603a9b1c512895564 Author: Patrick McHardy Date: Sun Feb 19 00:31:19 2006 -0800 [NETFILTER]: ctnetlink: avoid unneccessary event message generation Avoid unneccessary event message generation by checking for netlink listeners before building a message. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 0e4e3691ce7cdb8efdfa6692c43b4191f4470a0d Author: Patrick McHardy Date: Sun Feb 19 00:30:58 2006 -0800 [NETFILTER]: x_tables: replace IPv4/IPv6 policy match by address family independant version Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit b4a7aaf39126486759c8a4ddb743a16d56c2d23d Author: Patrick McHardy Date: Sun Feb 19 00:30:38 2006 -0800 [NETFILTER]: Move ip6_masked_addrcmp to include/net/ipv6.h Replace netfilter's ip6_masked_addrcmp by a more efficient version in include/net/ipv6.h to make it usable without module dependencies. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit e5763697bdcf569fda7d5a35a0e4b11178ef11e8 Author: Patrick McHardy Date: Sun Feb 19 00:30:16 2006 -0800 [NETFILTER]: x_tables: add xt_{match,target} arguments to match/target functions Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 28b107c9aba97c8a023862bf2da7255fe976d608 Author: Patrick McHardy Date: Sun Feb 19 00:29:54 2006 -0800 [NETFILTER]: x_tables: pass registered match/target data to match/target functions This allows to make decisions based on the revision (and address family with a follow-up patch) at runtime. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 62e2fae6781628c96d17bf7f5b97d6d202d27b64 Author: Patrick McHardy Date: Sun Feb 19 00:29:16 2006 -0800 [NETFILTER]: Convert x_tables matches/targets to centralized error checking Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 332127d19bd8de2fada1ce8a14ab3d9936430a88 Author: Patrick McHardy Date: Sun Feb 19 00:28:57 2006 -0800 [NETFILTER]: Convert ip6_tables matches/targets to centralized error checking Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 195390866170632d75bba51c8b3253db98a6220b Author: Patrick McHardy Date: Sun Feb 19 00:28:39 2006 -0800 [NETFILTER]: Convert arp_tables targets to centralized error checking Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 23305f3ade23a735ec4a258c62abc00ab29a8b93 Author: Patrick McHardy Date: Sun Feb 19 00:28:17 2006 -0800 [NETFILTER]: Convert ip_tables matches/targets to centralized error checking Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 822ae5809111eca0d7eb51309db9ad630daeb9cd Author: Patrick McHardy Date: Sun Feb 19 00:26:53 2006 -0800 [NETFILTER]: Change {ip,ip6,arp}_tables to use centralized error checking Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 852dd3eddda2c37b3843ddd0086b94c1a7a95b5c Author: Patrick McHardy Date: Sun Feb 19 00:26:26 2006 -0800 [NETFILTER]: xt_tables: add centralized error checking Introduce new functions for common match/target checks (private data size, valid hooks, valid tables and valid protocols) to get more consistent error reporting and to avoid each module duplicating them. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit b7e0cd9722f8a74725a1467cf7163da047fabbda Author: Yasuyuki Kozakai Date: Sun Feb 19 00:25:52 2006 -0800 [NETFILTER]: nf_conntrack: use ipv6_addr_equal in nf_ct_reasm Signed-off-by: Yasuyuki Kozakai Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit ff2fb630d4ac26ed18bb12cd7e412c69730eae15 Author: Holger Eitzenberger Date: Sun Feb 19 00:25:18 2006 -0800 [NETFILTER]: Fix CID offset bug in PPTP NAT helper debug message The recent (kernel 2.6.15.1) fix for PPTP NAT helper introduced a bug - which only appears if DEBUGP is enabled though. The calculation of the CID offset into a PPTP request struct is not correct, so that at least not the correct CID is displayed if DEBUGP is enabled. This patch corrects CID offset calculation and introduces a #define for that. Signed-off-by: Holger Eitzenberger Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit e4c057af2d0a09700e82331a560ec45a51d6af2e Author: Andrea Bittau Date: Wed Feb 15 10:28:17 2006 -0200 [DCCP] CCID2: Drop sock reference count on timer expiration and reset. There was a hybrid use of standard timers and sk_timers. This caused the reference count of the sock to be incorrect when resetting the RTO timer. The sock reference count should now be correct, enabling its destruction, and allowing the DCCP module to be unloaded. Signed-off-by: Andrea Bittau Signed-off-by: Arnaldo Carvalho de Melo commit 798caaf5a31f17e30a99e142abf8d1bfee953d39 Author: Arnaldo Carvalho de Melo Date: Wed Feb 15 09:41:42 2006 -0200 [DCCP]: Set the default CCID according to kernel config selection Now CCID2 is the default, as stated in the RFC drafts, but we allow a config where just CCID3 is built, where CCID3 becomes the default. Signed-off-by: Ian McDonald Signed-off-by: Arnaldo Carvalho de Melo commit 41e4b49a78f8e91692cdd0fa8f673e7698a7be75 Author: Harald Welte Date: Mon Feb 13 02:35:31 2006 -0800 [NETFILTER] nf_conntrack: clean up to reduce size of 'struct nf_conn' This patch moves all helper related data fields of 'struct nf_conn' into a separate structure 'struct nf_conn_help'. This new structure is only present in conntrack entries for which we actually have a helper loaded. Also, this patch cleans up the nf_conntrack 'features' mechanism to resemble what the original idea was: Just glue the feature-specific data structures at the end of 'struct nf_conn', and explicitly re-calculate the pointer to it when needed rather than keeping pointers around. Saves 20 bytes per conntrack on my x86_64 box. A non-helped conntrack is 276 bytes. We still need to save another 20 bytes in order to fit into to target of 256bytes. Signed-off-by: Harald Welte Signed-off-by: David S. Miller commit 2e8c03dca56c8686d06ec822cfb0c6133f14bfe8 Author: Michael Chan Date: Fri Feb 10 14:48:13 2006 -0800 [BNX2]: include Include so that it compiles properly on all archs. Update version to 1.4.38. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 21807154451f76b93a3e627419acd9a63cc1cc9a Author: John Heffner Date: Sun Feb 5 22:14:53 2006 -0800 [TCP]: MTU probing Implementation of packetization layer path mtu discovery for TCP, based on the internet-draft currently found at . Signed-off-by: John Heffner Signed-off-by: David S. Miller commit a396e61063a109e8adf224854aa4d104b9f7dbcd Author: Michael Chan Date: Sun Feb 5 18:10:56 2006 -0800 [BNX2]: Update version Update version to 1.4.37. Add missing flush_scheduled_work() in bnx2_suspend as noted by Jeff Garzik. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 95aec3a11de9abf5553e951dadc1ef30b9e4b856 Author: Michael Chan Date: Sun Feb 5 18:10:29 2006 -0800 [BNX2]: Support larger rx ring sizes (part 2) Support bigger rx ring sizes (up to 1020) in the rx fast path. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 3dc647b13a9320ee90a2d609b359f0d466899dac Author: Michael Chan Date: Sun Feb 5 18:10:01 2006 -0800 [BNX2]: Support larger rx ring sizes (part 1) Increase maximum receive ring size from 255 to 1020 by supporting up to 4 linked pages of receive descriptors. To accomodate the higher memory usage, each physical descriptor page is allocated separately and the software ring that keeps track of the SKBs and the DMA addresses is allocated using vmalloc. Some of the receive-related fields in the bp structure are re- organized a bit for better locality of reference. The max. was reduced to 1020 from 4080 after discussion with David Miller. This patch contains ring init code changes only. This next patch contains rx data path code changes. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit bbc2f2428b489da8677ee9a342840234113b725e Author: Michael Chan Date: Sun Feb 5 18:09:29 2006 -0800 [BNX2]: Fix bug when rx ring is full Fix the rx code path that does not handle the full rx ring correctly. When the rx ring is set to the max. size (i.e. 255), the consumer and producer indices will be the same when completing an rx packet. Fix the rx code to handle this condition properly. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit dce40ae88174c73741aee35ae4505aa21908b111 Author: Michael Chan Date: Sun Feb 5 18:08:55 2006 -0800 [BNX2]: Add ethtool -d support Add ETHTOOL_GREGS support. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 59e4437baaddec83762861b8ee0eaee73ef3b5cc Author: Michael Chan Date: Sun Feb 5 18:08:16 2006 -0800 [BNX2]: Reduce register test size Eliminate some of the registers in ethtool register test to reduce driver size. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 6ee936d1986cd52b8aad2334c43f52ea43516628 Author: Michael Chan Date: Sun Feb 5 16:30:20 2006 -0800 [TG3]: Update version and reldate Update version to 3.50. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 9bbd18a42128c863cc0fcae0a26d13a23525a181 Author: Michael Chan Date: Sun Feb 5 16:27:37 2006 -0800 [TG3]: Support shutdown WoL. Support WoL during shutdown by calling tg3_set_power_state(tp, PCI_D3hot) during tg3_close(). Change the power state parameter to pci_power_t type and use constants defined in pci.h. Certain ethtool operations cannot be performed after tg3_close() because the device will go to low power state. Add return -EAGAIN in such cases where appropriate. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 97b6aba3dbad9b8d651ca56d9f3e181807fe51b2 Author: Michael Chan Date: Sun Feb 5 16:27:02 2006 -0800 [TG3]: Enable TSO by default Enable TSO by default on newer chips that support TSO in hardware. Leave TSO off by default on older chips that do firmware TSO because performance is slightly lower. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit 2b9ddace7fdfcfa8aba14a28ef208b155207cc26 Author: Michael Chan Date: Sun Feb 5 16:26:37 2006 -0800 [TG3]: Add support for 5714S and 5715S Add support for 5714S and 5715S. Signed-off-by: Michael Chan Signed-off-by: David S. Miller commit e53816125327a194aa6a4150bb8ed93dbc9acfd5 Author: Adrian Bunk Date: Sat Feb 4 15:51:24 2006 -0800 [IPV4] fib_rules.c: make struct fib_rules static again struct fib_rules became global for no good reason. Signed-off-by: Adrian Bunk Signed-off-by: David S. Miller commit df6a839dffd3dadac13ca2cab1f3c4183a113564 Author: Jesper Juhl Date: Sat Feb 4 15:50:27 2006 -0800 [IPCOMP6]: don't check vfree() argument for NULL. vfree does it's own NULL checking, so checking a pointer before handing it to vfree is pointless. Signed-off-by: Jesper Juhl Signed-off-by: David S. Miller commit 0668d0c5d129b8e4aa0895ca1696dcdd655266d5 Author: Andrea Bittau Date: Sat Feb 4 19:10:17 2006 -0200 [DCCP]: Initial feature negotiation implementation Still needs more work, but boots and doesn't crashes, even does some negotiation! 18:38:52.174934 127.0.0.1.43458 > 127.0.0.1.5001: request 18:38:52.218526 127.0.0.1.5001 > 127.0.0.1.43458: response 18:38:52.185398 127.0.0.1.43458 > 127.0.0.1.5001: :-) Signed-off-by: Andrea Bittau Signed-off-by: Arnaldo Carvalho de Melo commit 8450395bae4c1842289f267bf71a0c99b8bfacb8 Author: Andrea Bittau Date: Fri Feb 3 15:56:41 2006 -0200 [DCCP] CCID2: Initial CCID2 (TCP-Like) implementation Original work by Andrea Bittau, Arnaldo Melo cleaned up and fixed several issues on the merge process. For now CCID2 was turned the default for all SOCK_DCCP connections, but this will be remedied soon with the merge of the feature negotiation code. Signed-off-by: Andrea Bittau Signed-off-by: Arnaldo Carvalho de Melo commit ac705501dc137efb26347a19d68aeded84aff555 Author: Arnaldo Carvalho de Melo Date: Fri Feb 3 15:49:14 2006 -0200 [DCCP] CCID3: Set the no_feedback_timer fields near init_timer Signed-off-by: Arnaldo Carvalho de Melo commit 62fa971c4921ee30af849bbaa10b979f8fc923fd Author: Arnaldo Carvalho de Melo Date: Fri Feb 3 15:30:45 2006 -0200 [DCCP]: Don't alloc ack vector for the control sock Signed-off-by: Arnaldo Carvalho de Melo commit 9aa9a4d3f5ef607a914921953826e47b723ea106 Author: Arnaldo Carvalho de Melo Date: Fri Feb 3 15:28:41 2006 -0200 [DCCP] ackvec: Delete all the ack vector records in dccp_ackvec_free Signed-off-by: Arnaldo Carvalho de Melo commit ab03808d9ef08f3dc517b0ebebd8fa314c2d2725 Author: Arnaldo Carvalho de Melo Date: Fri Feb 3 15:23:55 2006 -0200 [DCCP] CCID: Allow ccid_{init,exit} to be NULL Testing if the ccid being instantiated has these methods in ccid_init(). Signed-off-by: Arnaldo Carvalho de Melo commit 762b6fc32ff7992cd66ce5ca3f6c876975e941f5 Author: Arnaldo Carvalho de Melo Date: Fri Feb 3 11:04:36 2006 -0200 [DCCP] ackvec: Introduce ack vector records Based on a patch by Andrea Bittau. Signed-off-by: Andrea Bittau Signed-off-by: Arnaldo Carvalho de Melo commit cbcbedbbadd0090d1b4f1d55dd6896cda28a0449 Author: Arnaldo Carvalho de Melo Date: Fri Feb 3 11:00:59 2006 -0200 [LIST]: Introduce list_for_each_entry_from For iterating over list of given type continuing from existing point. Signed-off-by: Arnaldo Carvalho de Melo commit 9e6820f94e9777ec8e6be72664d843a4dc20f93d Author: Robert Olsson Date: Thu Feb 2 17:31:35 2006 -0800 [IPV4]: Use RCU locking in fib_rules. Signed-off-by: Robert Olsson Signed-off-by: David S. Miller commit eaec3ad79802f1b3fce92eea68327e683fff7a00 Author: Arnaldo Carvalho de Melo Date: Thu Feb 2 18:15:14 2006 -0200 [LIST]: Introduce list_for_each_entry_safe_from For iterate over list of given type from existing point safe against removal of list entry. Signed-off-by: Arnaldo Carvalho de Melo commit 5182a76568ca16adbe315aeeb24efe0e7ade325c Author: Arnaldo Carvalho de Melo Date: Thu Feb 2 13:20:15 2006 -0200 [DCCP] ackvec: Introduce dccp_ackvec_slab Signed-off-by: Arnaldo Carvalho de Melo commit 488b7a09d1a53d20fb32be5b15acb1ec45833dab Author: Arnaldo Carvalho de Melo Date: Thu Feb 2 11:52:30 2006 -0200 [DCCP]: Fix error handling in dccp_init Signed-off-by: Arnaldo Carvalho de Melo commit c2c6a911df6c7c6feb241dd8f51759d06ad3ee03 Author: Arnaldo Carvalho de Melo Date: Thu Feb 2 10:05:41 2006 -0200 [DCCP] ackvec: Ditch dccpav_buf_len Simplifying the code a bit as we're always using DCCP_MAX_ACKVEC_LEN. Signed-off-by: Arnaldo Carvalho de Melo commit 9e8c91b3e9a9e681baf3eb66ea43e3437c3967de Author: David S. Miller Date: Wed Feb 1 00:39:07 2006 -0800 [NET] socket: Use put_filp not fput. Another conversion error in the sock_map_fd() splitup for sys_accept(). Based upon a report by Andrew Morton. Signed-off-by: David S. Miller commit 966e454e7186a84966b47415c24fa709e2c3087a Author: Harald Welte Date: Mon Jan 30 15:50:09 2006 -0800 [NETFILTER] nfnetlink_log: add sequence numbers for log events By using a sequence number for every logged netfilter event, we can determine from userspace whether logging information was lots somewhere downstream. The user has a choice of either having per-instance local sequence counters, or using a global sequence counter, or both. Signed-off-by: Harald Welte Signed-off-by: David S. Miller commit a360cae554589462dff55c2e3bde972d66429507 Author: Harald Welte Date: Mon Jan 30 15:49:48 2006 -0800 [NETFILTER] NAT sequence adjustment: Save eight bytes per conntrack This patch reduces the size of 'struct ip_conntrack' on systems with NAT by eight bytes. The sequence number delta values can be int16_t, since we only support one sequence number modification per window anyway, and one such modification is not going to exceed 32kB ;) Signed-off-by: Harald Welte Signed-off-by: David S. Miller commit 7273b53f99e5865641b792dcd53814df22c3f5a8 Author: David S. Miller Date: Sun Jan 29 21:08:25 2006 -0800 [NET] sys_accept: Pass correct socket to sock_attach_fd(). Also, make sure "filep" is assigned to on every path through sock_alloc_fd(). Thanks to Andrew Morton for the OOPS trace. Signed-off-by: David S. Miller commit 8c9a5435e9bb00980576a82d24b15447a4d2afae Author: David S. Miller Date: Thu Jan 26 20:45:15 2006 -0800 [NET]: Do not lose accepted socket when -ENFILE/-EMFILE. Try to allocate the struct file and an unused file descriptor before we try to pull a newly accepted socket out of the protocol layer. Based upon a patch by Prassana Meda. Signed-off-by: David S. Miller commit 715bf1bb2bea5c0174a063e5971037c44f027e1a Author: Patrick McHardy Date: Tue Jan 24 12:43:55 2006 -0800 [NET]: Reduce size of struct sk_buff on 64 bit architectures Move skb->nf_mark next to skb->tc_index to remove a 4 byte hole between skb->nfmark and skb->nfct and another one between skb->users and skb->head when CONFIG_NETFILTER, CONFIG_NET_SCHED and CONFIG_NET_CLS_ACT are enabled. For all other combinations the size stays the same. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit cc98b10c8e3184640a3c6f36f1c090260143f58f Author: Stefan Rompf Date: Fri Jan 20 02:56:51 2006 -0800 [VLAN]: translate IF_OPER_DORMANT to netif_dormant_on() this patch adds support to the VLAN driver to translate IF_OPER_DORMANT of the underlying device to netif_dormant_on(). Beside clean state forwarding, this allows running independant userspace supplicants on both the real device and the stacked VLAN. It depends on my RFC2863 patch. Signed-off-by: Stefan Rompf Signed-off-by: David S. Miller commit a7834a7bcad1ca50a17da9131c629bd5caae5a06 Author: Stefan Rompf Date: Fri Jan 20 02:55:48 2006 -0800 [NET] core: add RFC2863 operstate this patch adds a dormant flag to network devices, RFC2863 operstate derived from these flags and possibility for userspace interaction. It allows drivers to signal that a device is unusable for user traffic without disabling queueing (and therefore the possibility for protocol establishment traffic to flow) and a userspace supplicant (WPA, 802.1X) to mark a device unusable without changes to the driver. It is the result of our long discussion. However I must admit that it represents what Jamal and I agreed on with compromises towards Krzysztof, but Thomas and Krzysztof still disagree with some parts. Anyway I think it should be applied. Signed-off-by: Stefan Rompf Signed-off-by: David S. Miller commit e3a7f7805ce5a70dfa0b0879a67d7c27fdf9f879 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:57 2006 +0900 [IPV6]: ROUTE: Ensure to accept redirects from nexthop for the target. It is possible to get redirects from nexthop of "more-specific" routes. Signed-off-by: YOSHIFUJI Hideaki commit 1900e1052ef6d1d8f208496f0a98b41917115a69 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:55 2006 +0900 [IPV6]: ROUTE: Add accept_ra_rt_info_max_plen sysctl. Signed-off-by: YOSHIFUJI Hideaki commit ad535fbda72f9248127982be1d6ef9283414e5d3 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:54 2006 +0900 [IPV6]: ROUTE: Flag RTF_DEFAULT for Route Infomation for ::/0. Signed-off-by: YOSHIFUJI Hideaki commit 4185dadc33cfc5904a0ce69543a2cf244c36e56a Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:53 2006 +0900 [IPV6]: ROUTE: Add experimental support for Route Information Option in RA (RFC4191). Signed-off-by: YOSHIFUJI Hideaki commit ba1a02a4d8213ea0030f95aae387c58e9f075522 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:51 2006 +0900 [IPV6]: ROUTE: Add router_probe_interval sysctl. Signed-off-by: YOSHIFUJI Hideaki commit 6e1648ff33a5d5188b42f0fd823568f18e7602b6 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:50 2006 +0900 [IPV6]: ROUTE: Add accept_ra_rtr_pref sysctl. Signed-off-by: YOSHIFUJI Hideaki commit 29bf2884af7dd684e211316c0827a09496270c41 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:49 2006 +0900 [IPV6]: ROUTE: Add Router Reachability Probing (RFC4191). Signed-off-by: YOSHIFUJI Hideaki commit 7fe4e5f61aed7d08805177c613132896cd4e6f17 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:47 2006 +0900 [IPV6]: ROUTE: Add support for Router Preference (RFC4191). Signed-off-by: YOSHIFUJI Hideaki commit 7fdd994bf6095ed041d301e2ee288099393ddc4d Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:46 2006 +0900 [IPV6]: ROUTE: Handle finding the next best route in reachability in BACKTRACK(). Signed-off-by: YOSHIFUJI Hideaki commit f71588d54f82c99bdf71e9025ac0527996cc37e6 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:45 2006 +0900 [IPV6]: ROUTE: Try finding the next best route. Signed-off-by: YOSHIFUJI Hideaki commit a86c825ff53e0106d6849ebd2534d29f407e7e65 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:44 2006 +0900 [IPV6]: ROUTE: Clean up rt6_select() code path in ip6_route_{intput,output}(). Signed-off-by: YOSHIFUJI Hideaki commit 03bd9812dcc3a3a2c40d4a627c7844dc4f4596f9 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:43 2006 +0900 [IPV6]: ROUTE: Try selecting better route for non-default routes as well. Signed-off-by: YOSHIFUJI Hideaki commit fdbaa4156b716de4201839c7735374c77c78fe59 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:41 2006 +0900 [IPV6]: ROUTE: More strict check for default routers in rt6_get_dflt_router(). Check RTF_ADDRCONF|RTF_DEFAULT in rt6_get_dflt_router(). Signed-off-by: YOSHIFUJI Hideaki commit 196575e58a3c2a3f1315d51c1c11c8cf0d6e5760 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:40 2006 +0900 [IPV6]: ROUTE: Eliminate lock for default route pointer. And prepare for more advanced router selection. Signed-off-by: YOSHIFUJI Hideaki commit 67435e5f21f3db468217606d5e54763586425765 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:39 2006 +0900 [IPV6]: ROUTE: Clean-up cow'ing in ip6_route_{intput,output}(). Signed-off-by: YOSHIFUJI Hideaki commit fee6fdec43ccf3c32c16c3d262b29557a3de446c Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:38 2006 +0900 [IPV6]: ROUTE: Convert rt6_cow() to rt6_alloc_cow(). Signed-off-by: YOSHIFUJI Hideaki commit ba6415307251b1643fe6c5273b0b0ce19403e249 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:36 2006 +0900 [IPV6]: ROUTE: Clean up reference counting / unlocking for returning object. Signed-off-by: YOSHIFUJI Hideaki commit 8309114f59da366d2b7a3b08254b42fb554d7a25 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:35 2006 +0900 [IPV6]: ROUTE: Unify two code paths for pmtu disc. Signed-off-by: YOSHIFUJI Hideaki commit 8ea33955235240cdcda3bee97aaecd7f06be682d Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:34 2006 +0900 [IPV6]: ROUTE: Add rt6_alloc_clone() for cloning route allocation. Signed-off-by: YOSHIFUJI Hideaki commit f425e112ccb70b0c53b6741febe8e2021702251e Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:33 2006 +0900 [IPV6]: ROUTE: Copy u.dst.error for RTF_REJECT routes when cloning. Signed-off-by: YOSHIFUJI Hideaki commit 1f60e06f85ec3a390a2cc0a7957a813ffd814485 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:32 2006 +0900 [IPV6]: ROUTE: Set appropriate information before inserting a route. Signed-off-by: YOSHIFUJI Hideaki commit 51966ce8d4971c1a4553bdff124f7523041220ec Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:30 2006 +0900 [IPV6]: ROUTE: Split up rt6_cow() for future changes. Signed-off-by: YOSHIFUJI Hideaki commit 2ae8ea05ce4dadc54a74dbdffbcd455ece5cb4ab Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:29 2006 +0900 [IPV6]: ADDRCONF: Add accept_ra_pinfo sysctl. This controls whether we accept Prefix Information in RAs. Signed-off-by: YOSHIFUJI Hideaki commit 551a29692d6c3be9dd683d75aa8c03650eb22417 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:28 2006 +0900 [IPV6]: ROUTE: Add accept_ra_defrtr sysctl. This controls whether we accept default router information in RAs. Signed-off-by: YOSHIFUJI Hideaki commit 642416caf1743af39cdec768df1550faceebee22 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:27 2006 +0900 [IPV6]: ADDRCONF: Split up ipv6_generate_eui64() by device type. Signed-off-by: YOSHIFUJI Hideaki commit 60822f2ab48f457c8147c35e3a3767b01bcfba56 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:25 2006 +0900 [IPV6]: ADDRCONF: Use our standard algorithm for randomized ifid. RFC 3041 describes an algorithm to generate random interface identifier. In RFC 3041bis, it is allowed to use different algorithm than one described in RFC 3041. So, let's use our standard pseudo random algorithm to simplify our implementation. Signed-off-by: YOSHIFUJI Hideaki commit eb9dca601bbe879421f9dbc3cdf0ccf66927d2d0 Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:24 2006 +0900 [NET]: NEIGHBOUR: Ensure to record time to neigh->updated when neighbour's state changed. Signed-off-by: YOSHIFUJI Hideaki commit 97c9b5b7b5fda4196f9a8053fccff0be5dc23ffe Author: YOSHIFUJI Hideaki Date: Tue Jan 10 01:49:23 2006 +0900 [IPV6]: TUNNEL6: Don't try to add multicast route twice. Since addrconf_add_dev() has already called addrconf_add_mroute() to added route for multicast prefix, there's no point to call it again in addrconf_ip6_tnl_config(). Signed-off-by: YOSHIFUJI Hideaki --- Signed-off-by: Andrew Morton --- dev/null | 352 ------ Documentation/networking/ip-sysctl.txt | 37 drivers/net/bnx2.c | 475 +++----- drivers/net/bnx2.h | 37 drivers/net/irda/donauboe.c | 2 drivers/net/irda/ep7211_ir.c | 11 drivers/net/irda/nsc-ircc.c | 328 ++++-- drivers/net/irda/nsc-ircc.h | 2 drivers/net/irda/vlsi_ir.c | 2 drivers/net/tg3.c | 119 +- include/linux/dccp.h | 95 + include/linux/icmpv6.h | 11 include/linux/if.h | 26 include/linux/ipv6.h | 14 include/linux/ipv6_route.h | 10 include/linux/list.h | 24 include/linux/netdevice.h | 35 include/linux/netfilter/nfnetlink.h | 1 include/linux/netfilter/nfnetlink_log.h | 6 include/linux/netfilter/x_tables.h | 37 include/linux/netfilter/xt_policy.h | 58 + include/linux/netfilter_ipv4/ip_nat.h | 2 include/linux/netfilter_ipv4/ipt_policy.h | 67 - include/linux/netfilter_ipv6/ip6t_policy.h | 67 - include/linux/netlink.h | 1 include/linux/pci_ids.h | 2 include/linux/rtnetlink.h | 2 include/linux/skbuff.h | 43 include/linux/sysctl.h | 25 include/linux/tcp.h | 6 include/linux/xfrm.h | 30 include/net/if_inet6.h | 3 include/net/inet_connection_sock.h | 15 include/net/ip6_route.h | 24 include/net/ipv6.h | 12 include/net/ndisc.h | 2 include/net/netfilter/nf_conntrack.h | 56 - include/net/scm.h | 12 include/net/tcp.h | 9 include/net/xfrm.h | 51 net/8021q/vlan.c | 43 net/core/dev.c | 14 net/core/link_watch.c | 40 net/core/neighbour.c | 8 net/core/net-sysfs.c | 41 net/core/pktgen.c | 135 ++ net/core/rtnetlink.c | 50 net/core/skbuff.c | 19 net/core/sysctl_net_core.c | 23 net/dccp/Kconfig | 13 net/dccp/Makefile | 9 net/dccp/ackvec.c | 296 +++-- net/dccp/ackvec.h | 53 net/dccp/ccid.c | 189 ++- net/dccp/ccid.h | 117 -- net/dccp/ccids/Kconfig | 43 net/dccp/ccids/Makefile | 4 net/dccp/ccids/ccid2.c | 803 +++++++++++++++ net/dccp/ccids/ccid2.h | 85 + net/dccp/ccids/ccid3.c | 75 - net/dccp/ccids/ccid3.h | 5 net/dccp/dccp.h | 105 - net/dccp/feat.c | 595 +++++++++++ net/dccp/feat.h | 28 net/dccp/input.c | 20 net/dccp/ipv4.c | 281 ++--- net/dccp/ipv6.c | 31 net/dccp/minisocks.c | 32 net/dccp/options.c | 188 +++ net/dccp/output.c | 41 net/dccp/proto.c | 379 ++++--- net/dccp/sysctl.c | 124 ++ net/dccp/timer.c | 14 net/ipv4/ah4.c | 1 net/ipv4/esp4.c | 1 net/ipv4/fib_rules.c | 115 +- net/ipv4/fib_trie.c | 24 net/ipv4/inet_connection_sock.c | 19 net/ipv4/netfilter/Kconfig | 10 net/ipv4/netfilter/Makefile | 1 net/ipv4/netfilter/arp_tables.c | 19 net/ipv4/netfilter/arpt_mangle.c | 23 net/ipv4/netfilter/ip_conntrack_netlink.c | 7 net/ipv4/netfilter/ip_nat_helper_pptp.c | 8 net/ipv4/netfilter/ip_nat_rule.c | 45 net/ipv4/netfilter/ip_tables.c | 67 - net/ipv4/netfilter/ipt_CLUSTERIP.c | 27 net/ipv4/netfilter/ipt_DSCP.c | 17 net/ipv4/netfilter/ipt_ECN.c | 18 net/ipv4/netfilter/ipt_LOG.c | 11 net/ipv4/netfilter/ipt_MASQUERADE.c | 18 net/ipv4/netfilter/ipt_NETMAP.c | 19 net/ipv4/netfilter/ipt_REDIRECT.c | 17 net/ipv4/netfilter/ipt_REJECT.c | 28 net/ipv4/netfilter/ipt_SAME.c | 19 net/ipv4/netfilter/ipt_TCPMSS.c | 16 net/ipv4/netfilter/ipt_TOS.c | 17 net/ipv4/netfilter/ipt_TTL.c | 25 net/ipv4/netfilter/ipt_ULOG.c | 12 net/ipv4/netfilter/ipt_addrtype.c | 20 net/ipv4/netfilter/ipt_ah.c | 25 net/ipv4/netfilter/ipt_dscp.c | 19 net/ipv4/netfilter/ipt_ecn.c | 14 net/ipv4/netfilter/ipt_esp.c | 25 net/ipv4/netfilter/ipt_hashlimit.c | 21 net/ipv4/netfilter/ipt_iprange.c | 28 net/ipv4/netfilter/ipt_multiport.c | 31 net/ipv4/netfilter/ipt_owner.c | 21 net/ipv4/netfilter/ipt_recent.c | 22 net/ipv4/netfilter/ipt_tos.c | 18 net/ipv4/netfilter/ipt_ttl.c | 19 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 22 net/ipv4/sysctl_net_ipv4.c | 16 net/ipv4/tcp_input.c | 49 net/ipv4/tcp_ipv4.c | 14 net/ipv4/tcp_output.c | 240 ++++ net/ipv4/tcp_timer.c | 36 net/ipv6/Kconfig | 26 net/ipv6/addrconf.c | 209 ++- net/ipv6/ah6.c | 1 net/ipv6/esp6.c | 1 net/ipv6/ip6_fib.c | 1 net/ipv6/ip6_output.c | 41 net/ipv6/ipcomp6.c | 4 net/ipv6/ndisc.c | 49 net/ipv6/netfilter/Kconfig | 10 net/ipv6/netfilter/Makefile | 1 net/ipv6/netfilter/ip6_tables.c | 85 - net/ipv6/netfilter/ip6t_HL.c | 19 net/ipv6/netfilter/ip6t_LOG.c | 11 net/ipv6/netfilter/ip6t_REJECT.c | 25 net/ipv6/netfilter/ip6t_ah.c | 12 net/ipv6/netfilter/ip6t_dst.c | 13 net/ipv6/netfilter/ip6t_esp.c | 12 net/ipv6/netfilter/ip6t_eui64.c | 27 net/ipv6/netfilter/ip6t_frag.c | 13 net/ipv6/netfilter/ip6t_hbh.c | 13 net/ipv6/netfilter/ip6t_hl.c | 22 net/ipv6/netfilter/ip6t_ipv6header.c | 8 net/ipv6/netfilter/ip6t_multiport.c | 11 net/ipv6/netfilter/ip6t_owner.c | 18 net/ipv6/netfilter/ip6t_rt.c | 12 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c | 39 net/ipv6/netfilter/nf_conntrack_reasm.c | 8 net/ipv6/route.c | 677 +++++++----- net/ipv6/tcp_ipv6.c | 14 net/key/af_key.c | 2 net/llc/af_llc.c | 15 net/llc/llc_core.c | 1 net/netfilter/Kconfig | 10 net/netfilter/Makefile | 1 net/netfilter/nf_conntrack_core.c | 119 -- net/netfilter/nf_conntrack_ftp.c | 2 net/netfilter/nf_conntrack_netlink.c | 46 net/netfilter/nf_conntrack_standalone.c | 1 net/netfilter/nfnetlink.c | 6 net/netfilter/nfnetlink_log.c | 46 net/netfilter/x_tables.c | 72 + net/netfilter/xt_CLASSIFY.c | 42 net/netfilter/xt_CONNMARK.c | 27 net/netfilter/xt_MARK.c | 37 net/netfilter/xt_NFQUEUE.c | 24 net/netfilter/xt_NOTRACK.c | 45 net/netfilter/xt_comment.c | 18 net/netfilter/xt_connbytes.c | 15 net/netfilter/xt_connmark.c | 28 net/netfilter/xt_conntrack.c | 18 net/netfilter/xt_dccp.c | 45 net/netfilter/xt_helper.c | 26 net/netfilter/xt_length.c | 24 net/netfilter/xt_limit.c | 7 net/netfilter/xt_mac.c | 34 net/netfilter/xt_mark.c | 16 net/netfilter/xt_physdev.c | 14 net/netfilter/xt_pkttype.c | 23 net/netfilter/xt_policy.c | 209 +++ net/netfilter/xt_realm.c | 27 net/netfilter/xt_sctp.c | 66 - net/netfilter/xt_state.c | 21 net/netfilter/xt_string.c | 10 net/netfilter/xt_tcpmss.c | 52 net/netfilter/xt_tcpudp.c | 112 -- net/netlink/af_netlink.c | 52 net/sched/act_ipt.c | 10 net/sched/sch_atm.c | 1 net/sched/sch_dsmark.c | 1 net/sched/sch_netem.c | 4 net/sched/sch_prio.c | 2 net/sched/sch_red.c | 179 +++ net/sched/sch_sfq.c | 5 net/sched/sch_tbf.c | 9 net/socket.c | 115 +- net/unix/af_unix.c | 10 net/xfrm/xfrm_policy.c | 5 net/xfrm/xfrm_state.c | 106 + net/xfrm/xfrm_user.c | 375 ++++++- security/selinux/nlmsgtab.c | 7 197 files changed, 7108 insertions(+), 3654 deletions(-) diff -puN Documentation/networking/ip-sysctl.txt~git-net Documentation/networking/ip-sysctl.txt --- devel/Documentation/networking/ip-sysctl.txt~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/Documentation/networking/ip-sysctl.txt 2006-02-27 22:37:47.000000000 -0800 @@ -717,6 +717,33 @@ accept_ra - BOOLEAN Functional default: enabled if local forwarding is disabled. disabled if local forwarding is enabled. +accept_ra_defrtr - BOOLEAN + Learn default router in Router Advertisement. + + Functional default: enabled if accept_ra is enabled. + disabled if accept_ra is disabled. + +accept_ra_pinfo - BOOLEAN + Learn Prefix Inforamtion in Router Advertisement. + + Functional default: enabled if accept_ra is enabled. + disabled if accept_ra is disabled. + +accept_ra_rt_info_max_plen - INTEGER + Maximum prefix length of Route Information in RA. + + Route Information w/ prefix larger than or equal to this + variable shall be ignored. + + Functional default: 0 if accept_ra_rtr_pref is enabled. + -1 if accept_ra_rtr_pref is disabled. + +accept_ra_rtr_pref - BOOLEAN + Accept Router Preference in RA. + + Functional default: enabled if accept_ra is enabled. + disabled if accept_ra is disabled. + accept_redirects - BOOLEAN Accept Redirects. @@ -727,8 +754,8 @@ autoconf - BOOLEAN Autoconfigure addresses using Prefix Information in Router Advertisements. - Functional default: enabled if accept_ra is enabled. - disabled if accept_ra is disabled. + Functional default: enabled if accept_ra_pinfo is enabled. + disabled if accept_ra_pinfo is disabled. dad_transmits - INTEGER The amount of Duplicate Address Detection probes to send. @@ -771,6 +798,12 @@ mtu - INTEGER Default Maximum Transfer Unit Default: 1280 (IPv6 required minimum) +router_probe_interval - INTEGER + Minimum interval (in seconds) between Router Probing described + in RFC4191. + + Default: 60 + router_solicitation_delay - INTEGER Number of seconds to wait after interface is brought up before sending Router Solicitations. diff -puN drivers/net/bnx2.c~git-net drivers/net/bnx2.c --- devel/drivers/net/bnx2.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/drivers/net/bnx2.c 2006-02-27 22:37:47.000000000 -0800 @@ -14,8 +14,8 @@ #define DRV_MODULE_NAME "bnx2" #define PFX DRV_MODULE_NAME ": " -#define DRV_MODULE_VERSION "1.4.31" -#define DRV_MODULE_RELDATE "January 19, 2006" +#define DRV_MODULE_VERSION "1.4.38" +#define DRV_MODULE_RELDATE "February 10, 2006" #define RUN_AT(x) (jiffies + (x)) @@ -360,6 +360,8 @@ bnx2_netif_start(struct bnx2 *bp) static void bnx2_free_mem(struct bnx2 *bp) { + int i; + if (bp->stats_blk) { pci_free_consistent(bp->pdev, sizeof(struct statistics_block), bp->stats_blk, bp->stats_blk_mapping); @@ -378,19 +380,23 @@ bnx2_free_mem(struct bnx2 *bp) } kfree(bp->tx_buf_ring); bp->tx_buf_ring = NULL; - if (bp->rx_desc_ring) { - pci_free_consistent(bp->pdev, - sizeof(struct rx_bd) * RX_DESC_CNT, - bp->rx_desc_ring, bp->rx_desc_mapping); - bp->rx_desc_ring = NULL; + for (i = 0; i < bp->rx_max_ring; i++) { + if (bp->rx_desc_ring[i]) + pci_free_consistent(bp->pdev, + sizeof(struct rx_bd) * RX_DESC_CNT, + bp->rx_desc_ring[i], + bp->rx_desc_mapping[i]); + bp->rx_desc_ring[i] = NULL; } - kfree(bp->rx_buf_ring); + vfree(bp->rx_buf_ring); bp->rx_buf_ring = NULL; } static int bnx2_alloc_mem(struct bnx2 *bp) { + int i; + bp->tx_buf_ring = kmalloc(sizeof(struct sw_bd) * TX_DESC_CNT, GFP_KERNEL); if (bp->tx_buf_ring == NULL) @@ -404,18 +410,23 @@ bnx2_alloc_mem(struct bnx2 *bp) if (bp->tx_desc_ring == NULL) goto alloc_mem_err; - bp->rx_buf_ring = kmalloc(sizeof(struct sw_bd) * RX_DESC_CNT, - GFP_KERNEL); + bp->rx_buf_ring = vmalloc(sizeof(struct sw_bd) * RX_DESC_CNT * + bp->rx_max_ring); if (bp->rx_buf_ring == NULL) goto alloc_mem_err; - memset(bp->rx_buf_ring, 0, sizeof(struct sw_bd) * RX_DESC_CNT); - bp->rx_desc_ring = pci_alloc_consistent(bp->pdev, - sizeof(struct rx_bd) * - RX_DESC_CNT, - &bp->rx_desc_mapping); - if (bp->rx_desc_ring == NULL) - goto alloc_mem_err; + memset(bp->rx_buf_ring, 0, sizeof(struct sw_bd) * RX_DESC_CNT * + bp->rx_max_ring); + + for (i = 0; i < bp->rx_max_ring; i++) { + bp->rx_desc_ring[i] = + pci_alloc_consistent(bp->pdev, + sizeof(struct rx_bd) * RX_DESC_CNT, + &bp->rx_desc_mapping[i]); + if (bp->rx_desc_ring[i] == NULL) + goto alloc_mem_err; + + } bp->status_blk = pci_alloc_consistent(bp->pdev, sizeof(struct status_block), @@ -1520,7 +1531,7 @@ bnx2_alloc_rx_skb(struct bnx2 *bp, u16 i struct sk_buff *skb; struct sw_bd *rx_buf = &bp->rx_buf_ring[index]; dma_addr_t mapping; - struct rx_bd *rxbd = &bp->rx_desc_ring[index]; + struct rx_bd *rxbd = &bp->rx_desc_ring[RX_RING(index)][RX_IDX(index)]; unsigned long align; skb = dev_alloc_skb(bp->rx_buf_size); @@ -1656,23 +1667,30 @@ static inline void bnx2_reuse_rx_skb(struct bnx2 *bp, struct sk_buff *skb, u16 cons, u16 prod) { - struct sw_bd *cons_rx_buf = &bp->rx_buf_ring[cons]; - struct sw_bd *prod_rx_buf = &bp->rx_buf_ring[prod]; - struct rx_bd *cons_bd = &bp->rx_desc_ring[cons]; - struct rx_bd *prod_bd = &bp->rx_desc_ring[prod]; + struct sw_bd *cons_rx_buf, *prod_rx_buf; + struct rx_bd *cons_bd, *prod_bd; + + cons_rx_buf = &bp->rx_buf_ring[cons]; + prod_rx_buf = &bp->rx_buf_ring[prod]; pci_dma_sync_single_for_device(bp->pdev, pci_unmap_addr(cons_rx_buf, mapping), bp->rx_offset + RX_COPY_THRESH, PCI_DMA_FROMDEVICE); - prod_rx_buf->skb = cons_rx_buf->skb; - pci_unmap_addr_set(prod_rx_buf, mapping, - pci_unmap_addr(cons_rx_buf, mapping)); + bp->rx_prod_bseq += bp->rx_buf_use_size; - memcpy(prod_bd, cons_bd, 8); + prod_rx_buf->skb = skb; - bp->rx_prod_bseq += bp->rx_buf_use_size; + if (cons == prod) + return; + + pci_unmap_addr_set(prod_rx_buf, mapping, + pci_unmap_addr(cons_rx_buf, mapping)); + cons_bd = &bp->rx_desc_ring[RX_RING(cons)][RX_IDX(cons)]; + prod_bd = &bp->rx_desc_ring[RX_RING(prod)][RX_IDX(prod)]; + prod_bd->rx_bd_haddr_hi = cons_bd->rx_bd_haddr_hi; + prod_bd->rx_bd_haddr_lo = cons_bd->rx_bd_haddr_lo; } static int @@ -1699,14 +1717,19 @@ bnx2_rx_int(struct bnx2 *bp, int budget) u32 status; struct sw_bd *rx_buf; struct sk_buff *skb; + dma_addr_t dma_addr; sw_ring_cons = RX_RING_IDX(sw_cons); sw_ring_prod = RX_RING_IDX(sw_prod); rx_buf = &bp->rx_buf_ring[sw_ring_cons]; skb = rx_buf->skb; - pci_dma_sync_single_for_cpu(bp->pdev, - pci_unmap_addr(rx_buf, mapping), + + rx_buf->skb = NULL; + + dma_addr = pci_unmap_addr(rx_buf, mapping); + + pci_dma_sync_single_for_cpu(bp->pdev, dma_addr, bp->rx_offset + RX_COPY_THRESH, PCI_DMA_FROMDEVICE); rx_hdr = (struct l2_fhdr *) skb->data; @@ -1747,8 +1770,7 @@ bnx2_rx_int(struct bnx2 *bp, int budget) skb = new_skb; } else if (bnx2_alloc_rx_skb(bp, sw_ring_prod) == 0) { - pci_unmap_single(bp->pdev, - pci_unmap_addr(rx_buf, mapping), + pci_unmap_single(bp->pdev, dma_addr, bp->rx_buf_use_size, PCI_DMA_FROMDEVICE); skb_reserve(skb, bp->rx_offset); @@ -1794,8 +1816,6 @@ reuse_rx: rx_pkt++; next_rx: - rx_buf->skb = NULL; - sw_cons = NEXT_RX_BD(sw_cons); sw_prod = NEXT_RX_BD(sw_prod); @@ -3340,27 +3360,35 @@ bnx2_init_rx_ring(struct bnx2 *bp) bp->hw_rx_cons = 0; bp->rx_prod_bseq = 0; - rxbd = &bp->rx_desc_ring[0]; - for (i = 0; i < MAX_RX_DESC_CNT; i++, rxbd++) { - rxbd->rx_bd_len = bp->rx_buf_use_size; - rxbd->rx_bd_flags = RX_BD_FLAGS_START | RX_BD_FLAGS_END; - } + for (i = 0; i < bp->rx_max_ring; i++) { + int j; - rxbd->rx_bd_haddr_hi = (u64) bp->rx_desc_mapping >> 32; - rxbd->rx_bd_haddr_lo = (u64) bp->rx_desc_mapping & 0xffffffff; + rxbd = &bp->rx_desc_ring[i][0]; + for (j = 0; j < MAX_RX_DESC_CNT; j++, rxbd++) { + rxbd->rx_bd_len = bp->rx_buf_use_size; + rxbd->rx_bd_flags = RX_BD_FLAGS_START | RX_BD_FLAGS_END; + } + if (i == (bp->rx_max_ring - 1)) + j = 0; + else + j = i + 1; + rxbd->rx_bd_haddr_hi = (u64) bp->rx_desc_mapping[j] >> 32; + rxbd->rx_bd_haddr_lo = (u64) bp->rx_desc_mapping[j] & + 0xffffffff; + } val = BNX2_L2CTX_CTX_TYPE_CTX_BD_CHN_TYPE_VALUE; val |= BNX2_L2CTX_CTX_TYPE_SIZE_L2; val |= 0x02 << 8; CTX_WR(bp, GET_CID_ADDR(RX_CID), BNX2_L2CTX_CTX_TYPE, val); - val = (u64) bp->rx_desc_mapping >> 32; + val = (u64) bp->rx_desc_mapping[0] >> 32; CTX_WR(bp, GET_CID_ADDR(RX_CID), BNX2_L2CTX_NX_BDHADDR_HI, val); - val = (u64) bp->rx_desc_mapping & 0xffffffff; + val = (u64) bp->rx_desc_mapping[0] & 0xffffffff; CTX_WR(bp, GET_CID_ADDR(RX_CID), BNX2_L2CTX_NX_BDHADDR_LO, val); - for ( ;ring_prod < bp->rx_ring_size; ) { + for (i = 0; i < bp->rx_ring_size; i++) { if (bnx2_alloc_rx_skb(bp, ring_prod) < 0) { break; } @@ -3375,6 +3403,29 @@ bnx2_init_rx_ring(struct bnx2 *bp) } static void +bnx2_set_rx_ring_size(struct bnx2 *bp, u32 size) +{ + u32 num_rings, max; + + bp->rx_ring_size = size; + num_rings = 1; + while (size > MAX_RX_DESC_CNT) { + size -= MAX_RX_DESC_CNT; + num_rings++; + } + /* round to next power of 2 */ + max = MAX_RX_RINGS; + while ((max & num_rings) == 0) + max >>= 1; + + if (num_rings != max) + max <<= 1; + + bp->rx_max_ring = max; + bp->rx_max_ring_idx = (bp->rx_max_ring * RX_DESC_CNT) - 1; +} + +static void bnx2_free_tx_skbs(struct bnx2 *bp) { int i; @@ -3419,7 +3470,7 @@ bnx2_free_rx_skbs(struct bnx2 *bp) if (bp->rx_buf_ring == NULL) return; - for (i = 0; i < RX_DESC_CNT; i++) { + for (i = 0; i < bp->rx_max_ring_idx; i++) { struct sw_bd *rx_buf = &bp->rx_buf_ring[i]; struct sk_buff *skb = rx_buf->skb; @@ -3506,74 +3557,9 @@ bnx2_test_registers(struct bnx2 *bp) { 0x0c00, 0, 0x00000000, 0x00000001 }, { 0x0c04, 0, 0x00000000, 0x03ff0001 }, { 0x0c08, 0, 0x0f0ff073, 0x00000000 }, - { 0x0c0c, 0, 0x00ffffff, 0x00000000 }, - { 0x0c30, 0, 0x00000000, 0xffffffff }, - { 0x0c34, 0, 0x00000000, 0xffffffff }, - { 0x0c38, 0, 0x00000000, 0xffffffff }, - { 0x0c3c, 0, 0x00000000, 0xffffffff }, - { 0x0c40, 0, 0x00000000, 0xffffffff }, - { 0x0c44, 0, 0x00000000, 0xffffffff }, - { 0x0c48, 0, 0x00000000, 0x0007ffff }, - { 0x0c4c, 0, 0x00000000, 0xffffffff }, - { 0x0c50, 0, 0x00000000, 0xffffffff }, - { 0x0c54, 0, 0x00000000, 0xffffffff }, - { 0x0c58, 0, 0x00000000, 0xffffffff }, - { 0x0c5c, 0, 0x00000000, 0xffffffff }, - { 0x0c60, 0, 0x00000000, 0xffffffff }, - { 0x0c64, 0, 0x00000000, 0xffffffff }, - { 0x0c68, 0, 0x00000000, 0xffffffff }, - { 0x0c6c, 0, 0x00000000, 0xffffffff }, - { 0x0c70, 0, 0x00000000, 0xffffffff }, - { 0x0c74, 0, 0x00000000, 0xffffffff }, - { 0x0c78, 0, 0x00000000, 0xffffffff }, - { 0x0c7c, 0, 0x00000000, 0xffffffff }, - { 0x0c80, 0, 0x00000000, 0xffffffff }, - { 0x0c84, 0, 0x00000000, 0xffffffff }, - { 0x0c88, 0, 0x00000000, 0xffffffff }, - { 0x0c8c, 0, 0x00000000, 0xffffffff }, - { 0x0c90, 0, 0x00000000, 0xffffffff }, - { 0x0c94, 0, 0x00000000, 0xffffffff }, - { 0x0c98, 0, 0x00000000, 0xffffffff }, - { 0x0c9c, 0, 0x00000000, 0xffffffff }, - { 0x0ca0, 0, 0x00000000, 0xffffffff }, - { 0x0ca4, 0, 0x00000000, 0xffffffff }, - { 0x0ca8, 0, 0x00000000, 0x0007ffff }, - { 0x0cac, 0, 0x00000000, 0xffffffff }, - { 0x0cb0, 0, 0x00000000, 0xffffffff }, - { 0x0cb4, 0, 0x00000000, 0xffffffff }, - { 0x0cb8, 0, 0x00000000, 0xffffffff }, - { 0x0cbc, 0, 0x00000000, 0xffffffff }, - { 0x0cc0, 0, 0x00000000, 0xffffffff }, - { 0x0cc4, 0, 0x00000000, 0xffffffff }, - { 0x0cc8, 0, 0x00000000, 0xffffffff }, - { 0x0ccc, 0, 0x00000000, 0xffffffff }, - { 0x0cd0, 0, 0x00000000, 0xffffffff }, - { 0x0cd4, 0, 0x00000000, 0xffffffff }, - { 0x0cd8, 0, 0x00000000, 0xffffffff }, - { 0x0cdc, 0, 0x00000000, 0xffffffff }, - { 0x0ce0, 0, 0x00000000, 0xffffffff }, - { 0x0ce4, 0, 0x00000000, 0xffffffff }, - { 0x0ce8, 0, 0x00000000, 0xffffffff }, - { 0x0cec, 0, 0x00000000, 0xffffffff }, - { 0x0cf0, 0, 0x00000000, 0xffffffff }, - { 0x0cf4, 0, 0x00000000, 0xffffffff }, - { 0x0cf8, 0, 0x00000000, 0xffffffff }, - { 0x0cfc, 0, 0x00000000, 0xffffffff }, - { 0x0d00, 0, 0x00000000, 0xffffffff }, - { 0x0d04, 0, 0x00000000, 0xffffffff }, { 0x1000, 0, 0x00000000, 0x00000001 }, { 0x1004, 0, 0x00000000, 0x000f0001 }, - { 0x1044, 0, 0x00000000, 0xffc003ff }, - { 0x1080, 0, 0x00000000, 0x0001ffff }, - { 0x1084, 0, 0x00000000, 0xffffffff }, - { 0x1088, 0, 0x00000000, 0xffffffff }, - { 0x108c, 0, 0x00000000, 0xffffffff }, - { 0x1090, 0, 0x00000000, 0xffffffff }, - { 0x1094, 0, 0x00000000, 0xffffffff }, - { 0x1098, 0, 0x00000000, 0xffffffff }, - { 0x109c, 0, 0x00000000, 0xffffffff }, - { 0x10a0, 0, 0x00000000, 0xffffffff }, { 0x1408, 0, 0x01c00800, 0x00000000 }, { 0x149c, 0, 0x8000ffff, 0x00000000 }, @@ -3585,111 +3571,9 @@ bnx2_test_registers(struct bnx2 *bp) { 0x14c4, 0, 0x00003fff, 0x00000000 }, { 0x14cc, 0, 0x00000000, 0x00000001 }, { 0x14d0, 0, 0xffffffff, 0x00000000 }, - { 0x1500, 0, 0x00000000, 0xffffffff }, - { 0x1504, 0, 0x00000000, 0xffffffff }, - { 0x1508, 0, 0x00000000, 0xffffffff }, - { 0x150c, 0, 0x00000000, 0xffffffff }, - { 0x1510, 0, 0x00000000, 0xffffffff }, - { 0x1514, 0, 0x00000000, 0xffffffff }, - { 0x1518, 0, 0x00000000, 0xffffffff }, - { 0x151c, 0, 0x00000000, 0xffffffff }, - { 0x1520, 0, 0x00000000, 0xffffffff }, - { 0x1524, 0, 0x00000000, 0xffffffff }, - { 0x1528, 0, 0x00000000, 0xffffffff }, - { 0x152c, 0, 0x00000000, 0xffffffff }, - { 0x1530, 0, 0x00000000, 0xffffffff }, - { 0x1534, 0, 0x00000000, 0xffffffff }, - { 0x1538, 0, 0x00000000, 0xffffffff }, - { 0x153c, 0, 0x00000000, 0xffffffff }, - { 0x1540, 0, 0x00000000, 0xffffffff }, - { 0x1544, 0, 0x00000000, 0xffffffff }, - { 0x1548, 0, 0x00000000, 0xffffffff }, - { 0x154c, 0, 0x00000000, 0xffffffff }, - { 0x1550, 0, 0x00000000, 0xffffffff }, - { 0x1554, 0, 0x00000000, 0xffffffff }, - { 0x1558, 0, 0x00000000, 0xffffffff }, - { 0x1600, 0, 0x00000000, 0xffffffff }, - { 0x1604, 0, 0x00000000, 0xffffffff }, - { 0x1608, 0, 0x00000000, 0xffffffff }, - { 0x160c, 0, 0x00000000, 0xffffffff }, - { 0x1610, 0, 0x00000000, 0xffffffff }, - { 0x1614, 0, 0x00000000, 0xffffffff }, - { 0x1618, 0, 0x00000000, 0xffffffff }, - { 0x161c, 0, 0x00000000, 0xffffffff }, - { 0x1620, 0, 0x00000000, 0xffffffff }, - { 0x1624, 0, 0x00000000, 0xffffffff }, - { 0x1628, 0, 0x00000000, 0xffffffff }, - { 0x162c, 0, 0x00000000, 0xffffffff }, - { 0x1630, 0, 0x00000000, 0xffffffff }, - { 0x1634, 0, 0x00000000, 0xffffffff }, - { 0x1638, 0, 0x00000000, 0xffffffff }, - { 0x163c, 0, 0x00000000, 0xffffffff }, - { 0x1640, 0, 0x00000000, 0xffffffff }, - { 0x1644, 0, 0x00000000, 0xffffffff }, - { 0x1648, 0, 0x00000000, 0xffffffff }, - { 0x164c, 0, 0x00000000, 0xffffffff }, - { 0x1650, 0, 0x00000000, 0xffffffff }, - { 0x1654, 0, 0x00000000, 0xffffffff }, { 0x1800, 0, 0x00000000, 0x00000001 }, { 0x1804, 0, 0x00000000, 0x00000003 }, - { 0x1840, 0, 0x00000000, 0xffffffff }, - { 0x1844, 0, 0x00000000, 0xffffffff }, - { 0x1848, 0, 0x00000000, 0xffffffff }, - { 0x184c, 0, 0x00000000, 0xffffffff }, - { 0x1850, 0, 0x00000000, 0xffffffff }, - { 0x1900, 0, 0x7ffbffff, 0x00000000 }, - { 0x1904, 0, 0xffffffff, 0x00000000 }, - { 0x190c, 0, 0xffffffff, 0x00000000 }, - { 0x1914, 0, 0xffffffff, 0x00000000 }, - { 0x191c, 0, 0xffffffff, 0x00000000 }, - { 0x1924, 0, 0xffffffff, 0x00000000 }, - { 0x192c, 0, 0xffffffff, 0x00000000 }, - { 0x1934, 0, 0xffffffff, 0x00000000 }, - { 0x193c, 0, 0xffffffff, 0x00000000 }, - { 0x1944, 0, 0xffffffff, 0x00000000 }, - { 0x194c, 0, 0xffffffff, 0x00000000 }, - { 0x1954, 0, 0xffffffff, 0x00000000 }, - { 0x195c, 0, 0xffffffff, 0x00000000 }, - { 0x1964, 0, 0xffffffff, 0x00000000 }, - { 0x196c, 0, 0xffffffff, 0x00000000 }, - { 0x1974, 0, 0xffffffff, 0x00000000 }, - { 0x197c, 0, 0xffffffff, 0x00000000 }, - { 0x1980, 0, 0x0700ffff, 0x00000000 }, - - { 0x1c00, 0, 0x00000000, 0x00000001 }, - { 0x1c04, 0, 0x00000000, 0x00000003 }, - { 0x1c08, 0, 0x0000000f, 0x00000000 }, - { 0x1c40, 0, 0x00000000, 0xffffffff }, - { 0x1c44, 0, 0x00000000, 0xffffffff }, - { 0x1c48, 0, 0x00000000, 0xffffffff }, - { 0x1c4c, 0, 0x00000000, 0xffffffff }, - { 0x1c50, 0, 0x00000000, 0xffffffff }, - { 0x1d00, 0, 0x7ffbffff, 0x00000000 }, - { 0x1d04, 0, 0xffffffff, 0x00000000 }, - { 0x1d0c, 0, 0xffffffff, 0x00000000 }, - { 0x1d14, 0, 0xffffffff, 0x00000000 }, - { 0x1d1c, 0, 0xffffffff, 0x00000000 }, - { 0x1d24, 0, 0xffffffff, 0x00000000 }, - { 0x1d2c, 0, 0xffffffff, 0x00000000 }, - { 0x1d34, 0, 0xffffffff, 0x00000000 }, - { 0x1d3c, 0, 0xffffffff, 0x00000000 }, - { 0x1d44, 0, 0xffffffff, 0x00000000 }, - { 0x1d4c, 0, 0xffffffff, 0x00000000 }, - { 0x1d54, 0, 0xffffffff, 0x00000000 }, - { 0x1d5c, 0, 0xffffffff, 0x00000000 }, - { 0x1d64, 0, 0xffffffff, 0x00000000 }, - { 0x1d6c, 0, 0xffffffff, 0x00000000 }, - { 0x1d74, 0, 0xffffffff, 0x00000000 }, - { 0x1d7c, 0, 0xffffffff, 0x00000000 }, - { 0x1d80, 0, 0x0700ffff, 0x00000000 }, - - { 0x2004, 0, 0x00000000, 0x0337000f }, - { 0x2008, 0, 0xffffffff, 0x00000000 }, - { 0x200c, 0, 0xffffffff, 0x00000000 }, - { 0x2010, 0, 0xffffffff, 0x00000000 }, - { 0x2014, 0, 0x801fff80, 0x00000000 }, - { 0x2018, 0, 0x000003ff, 0x00000000 }, { 0x2800, 0, 0x00000000, 0x00000001 }, { 0x2804, 0, 0x00000000, 0x00003f01 }, @@ -3707,16 +3591,6 @@ bnx2_test_registers(struct bnx2 *bp) { 0x2c00, 0, 0x00000000, 0x00000011 }, { 0x2c04, 0, 0x00000000, 0x00030007 }, - { 0x3000, 0, 0x00000000, 0x00000001 }, - { 0x3004, 0, 0x00000000, 0x007007ff }, - { 0x3008, 0, 0x00000003, 0x00000000 }, - { 0x300c, 0, 0xffffffff, 0x00000000 }, - { 0x3010, 0, 0xffffffff, 0x00000000 }, - { 0x3014, 0, 0xffffffff, 0x00000000 }, - { 0x3034, 0, 0xffffffff, 0x00000000 }, - { 0x3038, 0, 0xffffffff, 0x00000000 }, - { 0x3050, 0, 0x00000001, 0x00000000 }, - { 0x3c00, 0, 0x00000000, 0x00000001 }, { 0x3c04, 0, 0x00000000, 0x00070000 }, { 0x3c08, 0, 0x00007f71, 0x07f00000 }, @@ -3726,88 +3600,11 @@ bnx2_test_registers(struct bnx2 *bp) { 0x3c18, 0, 0x00000000, 0xffffffff }, { 0x3c1c, 0, 0xfffff000, 0x00000000 }, { 0x3c20, 0, 0xffffff00, 0x00000000 }, - { 0x3c24, 0, 0xffffffff, 0x00000000 }, - { 0x3c28, 0, 0xffffffff, 0x00000000 }, - { 0x3c2c, 0, 0xffffffff, 0x00000000 }, - { 0x3c30, 0, 0xffffffff, 0x00000000 }, - { 0x3c34, 0, 0xffffffff, 0x00000000 }, - { 0x3c38, 0, 0xffffffff, 0x00000000 }, - { 0x3c3c, 0, 0xffffffff, 0x00000000 }, - { 0x3c40, 0, 0xffffffff, 0x00000000 }, - { 0x3c44, 0, 0xffffffff, 0x00000000 }, - { 0x3c48, 0, 0xffffffff, 0x00000000 }, - { 0x3c4c, 0, 0xffffffff, 0x00000000 }, - { 0x3c50, 0, 0xffffffff, 0x00000000 }, - { 0x3c54, 0, 0xffffffff, 0x00000000 }, - { 0x3c58, 0, 0xffffffff, 0x00000000 }, - { 0x3c5c, 0, 0xffffffff, 0x00000000 }, - { 0x3c60, 0, 0xffffffff, 0x00000000 }, - { 0x3c64, 0, 0xffffffff, 0x00000000 }, - { 0x3c68, 0, 0xffffffff, 0x00000000 }, - { 0x3c6c, 0, 0xffffffff, 0x00000000 }, - { 0x3c70, 0, 0xffffffff, 0x00000000 }, - { 0x3c74, 0, 0x0000003f, 0x00000000 }, - { 0x3c78, 0, 0x00000000, 0x00000000 }, - { 0x3c7c, 0, 0x00000000, 0x00000000 }, - { 0x3c80, 0, 0x3fffffff, 0x00000000 }, - { 0x3c84, 0, 0x0000003f, 0x00000000 }, - { 0x3c88, 0, 0x00000000, 0xffffffff }, - { 0x3c8c, 0, 0x00000000, 0xffffffff }, - - { 0x4000, 0, 0x00000000, 0x00000001 }, - { 0x4004, 0, 0x00000000, 0x00030000 }, - { 0x4008, 0, 0x00000ff0, 0x00000000 }, - { 0x400c, 0, 0xffffffff, 0x00000000 }, - { 0x4088, 0, 0x00000000, 0x00070303 }, - - { 0x4400, 0, 0x00000000, 0x00000001 }, - { 0x4404, 0, 0x00000000, 0x00003f01 }, - { 0x4408, 0, 0x7fff00ff, 0x00000000 }, - { 0x440c, 0, 0xffffffff, 0x00000000 }, - { 0x4410, 0, 0xffff, 0x0000 }, - { 0x4414, 0, 0xffff, 0x0000 }, - { 0x4418, 0, 0xffff, 0x0000 }, - { 0x441c, 0, 0xffff, 0x0000 }, - { 0x4428, 0, 0xffffffff, 0x00000000 }, - { 0x442c, 0, 0xffffffff, 0x00000000 }, - { 0x4430, 0, 0xffffffff, 0x00000000 }, - { 0x4434, 0, 0xffffffff, 0x00000000 }, - { 0x4438, 0, 0xffffffff, 0x00000000 }, - { 0x443c, 0, 0xffffffff, 0x00000000 }, - { 0x4440, 0, 0xffffffff, 0x00000000 }, - { 0x4444, 0, 0xffffffff, 0x00000000 }, - - { 0x4c00, 0, 0x00000000, 0x00000001 }, - { 0x4c04, 0, 0x00000000, 0x0000003f }, - { 0x4c08, 0, 0xffffffff, 0x00000000 }, - { 0x4c0c, 0, 0x0007fc00, 0x00000000 }, - { 0x4c10, 0, 0x80003fe0, 0x00000000 }, - { 0x4c14, 0, 0xffffffff, 0x00000000 }, - { 0x4c44, 0, 0x00000000, 0x9fff9fff }, - { 0x4c48, 0, 0x00000000, 0xb3009fff }, - { 0x4c4c, 0, 0x00000000, 0x77f33b30 }, - { 0x4c50, 0, 0x00000000, 0xffffffff }, { 0x5004, 0, 0x00000000, 0x0000007f }, { 0x5008, 0, 0x0f0007ff, 0x00000000 }, { 0x500c, 0, 0xf800f800, 0x07ff07ff }, - { 0x5400, 0, 0x00000008, 0x00000001 }, - { 0x5404, 0, 0x00000000, 0x0000003f }, - { 0x5408, 0, 0x0000001f, 0x00000000 }, - { 0x540c, 0, 0xffffffff, 0x00000000 }, - { 0x5410, 0, 0xffffffff, 0x00000000 }, - { 0x5414, 0, 0x0000ffff, 0x00000000 }, - { 0x5418, 0, 0x0000ffff, 0x00000000 }, - { 0x541c, 0, 0x0000ffff, 0x00000000 }, - { 0x5420, 0, 0x0000ffff, 0x00000000 }, - { 0x5428, 0, 0x000000ff, 0x00000000 }, - { 0x542c, 0, 0xff00ffff, 0x00000000 }, - { 0x5430, 0, 0x001fff80, 0x00000000 }, - { 0x5438, 0, 0xffffffff, 0x00000000 }, - { 0x543c, 0, 0xffffffff, 0x00000000 }, - { 0x5440, 0, 0xf800f800, 0x07ff07ff }, - { 0x5c00, 0, 0x00000000, 0x00000001 }, { 0x5c04, 0, 0x00000000, 0x0003000f }, { 0x5c08, 0, 0x00000003, 0x00000000 }, @@ -4794,6 +4591,64 @@ bnx2_get_drvinfo(struct net_device *dev, info->fw_version[5] = 0; } +#define BNX2_REGDUMP_LEN (32 * 1024) + +static int +bnx2_get_regs_len(struct net_device *dev) +{ + return BNX2_REGDUMP_LEN; +} + +static void +bnx2_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *_p) +{ + u32 *p = _p, i, offset; + u8 *orig_p = _p; + struct bnx2 *bp = netdev_priv(dev); + u32 reg_boundaries[] = { 0x0000, 0x0098, 0x0400, 0x045c, + 0x0800, 0x0880, 0x0c00, 0x0c10, + 0x0c30, 0x0d08, 0x1000, 0x101c, + 0x1040, 0x1048, 0x1080, 0x10a4, + 0x1400, 0x1490, 0x1498, 0x14f0, + 0x1500, 0x155c, 0x1580, 0x15dc, + 0x1600, 0x1658, 0x1680, 0x16d8, + 0x1800, 0x1820, 0x1840, 0x1854, + 0x1880, 0x1894, 0x1900, 0x1984, + 0x1c00, 0x1c0c, 0x1c40, 0x1c54, + 0x1c80, 0x1c94, 0x1d00, 0x1d84, + 0x2000, 0x2030, 0x23c0, 0x2400, + 0x2800, 0x2820, 0x2830, 0x2850, + 0x2b40, 0x2c10, 0x2fc0, 0x3058, + 0x3c00, 0x3c94, 0x4000, 0x4010, + 0x4080, 0x4090, 0x43c0, 0x4458, + 0x4c00, 0x4c18, 0x4c40, 0x4c54, + 0x4fc0, 0x5010, 0x53c0, 0x5444, + 0x5c00, 0x5c18, 0x5c80, 0x5c90, + 0x5fc0, 0x6000, 0x6400, 0x6428, + 0x6800, 0x6848, 0x684c, 0x6860, + 0x6888, 0x6910, 0x8000 }; + + regs->version = 0; + + memset(p, 0, BNX2_REGDUMP_LEN); + + if (!netif_running(bp->dev)) + return; + + i = 0; + offset = reg_boundaries[0]; + p += offset; + while (offset < BNX2_REGDUMP_LEN) { + *p++ = REG_RD(bp, offset); + offset += 4; + if (offset == reg_boundaries[i + 1]) { + offset = reg_boundaries[i + 2]; + p = (u32 *) (orig_p + offset); + i += 2; + } + } +} + static void bnx2_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) { @@ -4979,7 +4834,7 @@ bnx2_get_ringparam(struct net_device *de { struct bnx2 *bp = netdev_priv(dev); - ering->rx_max_pending = MAX_RX_DESC_CNT; + ering->rx_max_pending = MAX_TOTAL_RX_DESC_CNT; ering->rx_mini_max_pending = 0; ering->rx_jumbo_max_pending = 0; @@ -4996,17 +4851,28 @@ bnx2_set_ringparam(struct net_device *de { struct bnx2 *bp = netdev_priv(dev); - if ((ering->rx_pending > MAX_RX_DESC_CNT) || + if ((ering->rx_pending > MAX_TOTAL_RX_DESC_CNT) || (ering->tx_pending > MAX_TX_DESC_CNT) || (ering->tx_pending <= MAX_SKB_FRAGS)) { return -EINVAL; } - bp->rx_ring_size = ering->rx_pending; + if (netif_running(bp->dev)) { + bnx2_netif_stop(bp); + bnx2_reset_chip(bp, BNX2_DRV_MSG_CODE_RESET); + bnx2_free_skbs(bp); + bnx2_free_mem(bp); + } + + bnx2_set_rx_ring_size(bp, ering->rx_pending); bp->tx_ring_size = ering->tx_pending; if (netif_running(bp->dev)) { - bnx2_netif_stop(bp); + int rc; + + rc = bnx2_alloc_mem(bp); + if (rc) + return rc; bnx2_init_nic(bp); bnx2_netif_start(bp); } @@ -5360,6 +5226,8 @@ static struct ethtool_ops bnx2_ethtool_o .get_settings = bnx2_get_settings, .set_settings = bnx2_set_settings, .get_drvinfo = bnx2_get_drvinfo, + .get_regs_len = bnx2_get_regs_len, + .get_regs = bnx2_get_regs, .get_wol = bnx2_get_wol, .set_wol = bnx2_set_wol, .nway_reset = bnx2_nway_reset, @@ -5678,7 +5546,7 @@ bnx2_init_board(struct pci_dev *pdev, st bp->mac_addr[5] = (u8) reg; bp->tx_ring_size = MAX_TX_DESC_CNT; - bp->rx_ring_size = 100; + bnx2_set_rx_ring_size(bp, 100); bp->rx_csum = 1; @@ -5897,6 +5765,7 @@ bnx2_suspend(struct pci_dev *pdev, pm_me if (!netif_running(dev)) return 0; + flush_scheduled_work(); bnx2_netif_stop(bp); netif_device_detach(dev); del_timer_sync(&bp->timer); diff -puN drivers/net/bnx2.h~git-net drivers/net/bnx2.h --- devel/drivers/net/bnx2.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/drivers/net/bnx2.h 2006-02-27 22:37:47.000000000 -0800 @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -3792,8 +3793,10 @@ struct l2_fhdr { #define TX_DESC_CNT (BCM_PAGE_SIZE / sizeof(struct tx_bd)) #define MAX_TX_DESC_CNT (TX_DESC_CNT - 1) +#define MAX_RX_RINGS 4 #define RX_DESC_CNT (BCM_PAGE_SIZE / sizeof(struct rx_bd)) #define MAX_RX_DESC_CNT (RX_DESC_CNT - 1) +#define MAX_TOTAL_RX_DESC_CNT (MAX_RX_DESC_CNT * MAX_RX_RINGS) #define NEXT_TX_BD(x) (((x) & (MAX_TX_DESC_CNT - 1)) == \ (MAX_TX_DESC_CNT - 1)) ? \ @@ -3805,8 +3808,10 @@ struct l2_fhdr { (MAX_RX_DESC_CNT - 1)) ? \ (x) + 2 : (x) + 1 -#define RX_RING_IDX(x) ((x) & MAX_RX_DESC_CNT) +#define RX_RING_IDX(x) ((x) & bp->rx_max_ring_idx) +#define RX_RING(x) (((x) & ~MAX_RX_DESC_CNT) >> 8) +#define RX_IDX(x) ((x) & MAX_RX_DESC_CNT) /* Context size. */ #define CTX_SHIFT 7 @@ -3903,6 +3908,15 @@ struct bnx2 { struct status_block *status_blk; u32 last_status_idx; + u32 flags; +#define PCIX_FLAG 1 +#define PCI_32BIT_FLAG 2 +#define ONE_TDMA_FLAG 4 /* no longer used */ +#define NO_WOL_FLAG 8 +#define USING_DAC_FLAG 0x10 +#define USING_MSI_FLAG 0x20 +#define ASF_ENABLE_FLAG 0x40 + struct tx_bd *tx_desc_ring; struct sw_bd *tx_buf_ring; u32 tx_prod_bseq; @@ -3920,19 +3934,22 @@ struct bnx2 { u32 rx_offset; u32 rx_buf_use_size; /* useable size */ u32 rx_buf_size; /* with alignment */ - struct rx_bd *rx_desc_ring; - struct sw_bd *rx_buf_ring; + u32 rx_max_ring_idx; + u32 rx_prod_bseq; u16 rx_prod; u16 rx_cons; u32 rx_csum; + struct sw_bd *rx_buf_ring; + struct rx_bd *rx_desc_ring[MAX_RX_RINGS]; + /* Only used to synchronize netif_stop_queue/wake_queue when tx */ /* ring is full */ spinlock_t tx_lock; - /* End of fileds used in the performance code paths. */ + /* End of fields used in the performance code paths. */ char *name; @@ -3945,15 +3962,6 @@ struct bnx2 { /* Used to synchronize phy accesses. */ spinlock_t phy_lock; - u32 flags; -#define PCIX_FLAG 1 -#define PCI_32BIT_FLAG 2 -#define ONE_TDMA_FLAG 4 /* no longer used */ -#define NO_WOL_FLAG 8 -#define USING_DAC_FLAG 0x10 -#define USING_MSI_FLAG 0x20 -#define ASF_ENABLE_FLAG 0x40 - u32 phy_flags; #define PHY_SERDES_FLAG 1 #define PHY_CRC_FIX_FLAG 2 @@ -4004,8 +4012,9 @@ struct bnx2 { dma_addr_t tx_desc_mapping; + int rx_max_ring; int rx_ring_size; - dma_addr_t rx_desc_mapping; + dma_addr_t rx_desc_mapping[MAX_RX_RINGS]; u16 tx_quick_cons_trip; u16 tx_quick_cons_trip_int; diff -puN drivers/net/irda/donauboe.c~git-net drivers/net/irda/donauboe.c --- devel/drivers/net/irda/donauboe.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/donauboe.c 2006-02-27 22:37:47.000000000 -0800 @@ -1778,7 +1778,7 @@ static struct pci_driver donauboe_pci_dr static int __init donauboe_init (void) { - return pci_module_init(&donauboe_pci_driver); + return pci_register_driver(&donauboe_pci_driver); } static void __exit diff -puN drivers/net/irda/ep7211_ir.c~git-net drivers/net/irda/ep7211_ir.c --- devel/drivers/net/irda/ep7211_ir.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/ep7211_ir.c 2006-02-27 22:37:47.000000000 -0800 @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -23,6 +24,8 @@ static void ep7211_ir_close(dongle_t *se static int ep7211_ir_change_speed(struct irda_task *task); static int ep7211_ir_reset(struct irda_task *task); +static DEFINE_SPINLOCK(ep7211_lock); + static struct dongle_reg dongle = { .type = IRDA_EP7211_IR, .open = ep7211_ir_open, @@ -36,7 +39,7 @@ static void ep7211_ir_open(dongle_t *sel { unsigned int syscon1, flags; - save_flags(flags); cli(); + spin_lock_irqsave(&ep7211_lock, flags); /* Turn on the SIR encoder. */ syscon1 = clps_readl(SYSCON1); @@ -46,14 +49,14 @@ static void ep7211_ir_open(dongle_t *sel /* XXX: We should disable modem status interrupts on the first UART (interrupt #14). */ - restore_flags(flags); + spin_unlock_irqrestore(&ep7211_lock, flags); } static void ep7211_ir_close(dongle_t *self) { unsigned int syscon1, flags; - save_flags(flags); cli(); + spin_lock_irqsave(&ep7211_lock, flags); /* Turn off the SIR encoder. */ syscon1 = clps_readl(SYSCON1); @@ -63,7 +66,7 @@ static void ep7211_ir_close(dongle_t *se /* XXX: If we've disabled the modem status interrupts, we should reset them back to their original state. */ - restore_flags(flags); + spin_unlock_irqrestore(&ep7211_lock, flags); } /* diff -puN drivers/net/irda/nsc-ircc.c~git-net drivers/net/irda/nsc-ircc.c --- devel/drivers/net/irda/nsc-ircc.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/nsc-ircc.c 2006-02-27 22:37:47.000000000 -0800 @@ -12,6 +12,7 @@ * Copyright (c) 1998-2000 Dag Brattli * Copyright (c) 1998 Lichen Wang, * Copyright (c) 1998 Actisys Corp., www.actisys.com + * Copyright (c) 2000-2004 Jean Tourrilhes * All Rights Reserved * * This program is free software; you can redistribute it and/or @@ -53,14 +54,13 @@ #include #include #include +#include +#include #include #include #include -#include -#include - #include #include #include @@ -72,14 +72,27 @@ static char *driver_name = "nsc-ircc"; +/* Power Management */ +#define NSC_IRCC_DRIVER_NAME "nsc-ircc" +static int nsc_ircc_suspend(struct platform_device *dev, pm_message_t state); +static int nsc_ircc_resume(struct platform_device *dev); + +static struct platform_driver nsc_ircc_driver = { + .suspend = nsc_ircc_suspend, + .resume = nsc_ircc_resume, + .driver = { + .name = NSC_IRCC_DRIVER_NAME, + }, +}; + /* Module parameters */ static int qos_mtt_bits = 0x07; /* 1 ms or more */ static int dongle_id; /* Use BIOS settions by default, but user may supply module parameters */ -static unsigned int io[] = { ~0, ~0, ~0, ~0 }; -static unsigned int irq[] = { 0, 0, 0, 0, 0 }; -static unsigned int dma[] = { 0, 0, 0, 0, 0 }; +static unsigned int io[] = { ~0, ~0, ~0, ~0, ~0 }; +static unsigned int irq[] = { 0, 0, 0, 0, 0 }; +static unsigned int dma[] = { 0, 0, 0, 0, 0 }; static int nsc_ircc_probe_108(nsc_chip_t *chip, chipio_t *info); static int nsc_ircc_probe_338(nsc_chip_t *chip, chipio_t *info); @@ -87,6 +100,7 @@ static int nsc_ircc_probe_39x(nsc_chip_t static int nsc_ircc_init_108(nsc_chip_t *chip, chipio_t *info); static int nsc_ircc_init_338(nsc_chip_t *chip, chipio_t *info); static int nsc_ircc_init_39x(nsc_chip_t *chip, chipio_t *info); +static int nsc_ircc_pnp_probe(struct pnp_dev *dev, const struct pnp_device_id *id); /* These are the known NSC chips */ static nsc_chip_t chips[] = { @@ -101,11 +115,12 @@ static nsc_chip_t chips[] = { /* Contributed by Jan Frey - IBM A30/A31 */ { "PC8739x", { 0x2e, 0x4e, 0x0 }, 0x20, 0xea, 0xff, nsc_ircc_probe_39x, nsc_ircc_init_39x }, + { "IBM", { 0x2e, 0x4e, 0x0 }, 0x20, 0xf4, 0xff, + nsc_ircc_probe_39x, nsc_ircc_init_39x }, { NULL } }; -/* Max 4 instances for now */ -static struct nsc_ircc_cb *dev_self[] = { NULL, NULL, NULL, NULL }; +static struct nsc_ircc_cb *dev_self[] = { NULL, NULL, NULL, NULL, NULL }; static char *dongle_types[] = { "Differential serial interface", @@ -126,8 +141,24 @@ static char *dongle_types[] = { "No dongle connected", }; +/* PNP probing */ +static chipio_t pnp_info; +static const struct pnp_device_id nsc_ircc_pnp_table[] = { + { .id = "NSC6001", .driver_data = 0 }, + { .id = "IBM0071", .driver_data = 0 }, + { } +}; + +MODULE_DEVICE_TABLE(pnp, nsc_ircc_pnp_table); + +static struct pnp_driver nsc_ircc_pnp_driver = { + .name = "nsc-ircc", + .id_table = nsc_ircc_pnp_table, + .probe = nsc_ircc_pnp_probe, +}; + /* Some prototypes */ -static int nsc_ircc_open(int i, chipio_t *info); +static int nsc_ircc_open(chipio_t *info); static int nsc_ircc_close(struct nsc_ircc_cb *self); static int nsc_ircc_setup(chipio_t *info); static void nsc_ircc_pio_receive(struct nsc_ircc_cb *self); @@ -146,7 +177,10 @@ static int nsc_ircc_net_open(struct net static int nsc_ircc_net_close(struct net_device *dev); static int nsc_ircc_net_ioctl(struct net_device *dev, struct ifreq *rq, int cmd); static struct net_device_stats *nsc_ircc_net_get_stats(struct net_device *dev); -static int nsc_ircc_pmproc(struct pm_dev *dev, pm_request_t rqst, void *data); + +/* Globals */ +static int pnp_registered; +static int pnp_succeeded; /* * Function nsc_ircc_init () @@ -158,28 +192,36 @@ static int __init nsc_ircc_init(void) { chipio_t info; nsc_chip_t *chip; - int ret = -ENODEV; + int ret; int cfg_base; int cfg, id; int reg; int i = 0; + ret = platform_driver_register(&nsc_ircc_driver); + if (ret) { + IRDA_ERROR("%s, Can't register driver!\n", driver_name); + return ret; + } + + /* Register with PnP subsystem to detect disable ports */ + ret = pnp_register_driver(&nsc_ircc_pnp_driver); + + if (ret >= 0) + pnp_registered = 1; + + ret = -ENODEV; + /* Probe for all the NSC chipsets we know about */ - for (chip=chips; chip->name ; chip++) { + for (chip = chips; chip->name ; chip++) { IRDA_DEBUG(2, "%s(), Probing for %s ...\n", __FUNCTION__, chip->name); /* Try all config registers for this chip */ - for (cfg=0; cfg<3; cfg++) { + for (cfg = 0; cfg < ARRAY_SIZE(chip->cfg); cfg++) { cfg_base = chip->cfg[cfg]; if (!cfg_base) continue; - - memset(&info, 0, sizeof(chipio_t)); - info.cfg_base = cfg_base; - info.fir_base = io[i]; - info.dma = dma[i]; - info.irq = irq[i]; /* Read index register */ reg = inb(cfg_base); @@ -194,26 +236,67 @@ static int __init nsc_ircc_init(void) if ((id & chip->cid_mask) == chip->cid_value) { IRDA_DEBUG(2, "%s() Found %s chip, revision=%d\n", __FUNCTION__, chip->name, id & ~chip->cid_mask); - /* - * If the user supplies the base address, then - * we init the chip, if not we probe the values - * set by the BIOS - */ - if (io[i] < 0x2000) { - chip->init(chip, &info); - } else - chip->probe(chip, &info); - - if (nsc_ircc_open(i, &info) == 0) - ret = 0; + + /* + * If we found a correct PnP setting, + * we first try it. + */ + if (pnp_succeeded) { + memset(&info, 0, sizeof(chipio_t)); + info.cfg_base = cfg_base; + info.fir_base = pnp_info.fir_base; + info.dma = pnp_info.dma; + info.irq = pnp_info.irq; + + if (info.fir_base < 0x2000) { + IRDA_MESSAGE("%s, chip->init\n", driver_name); + chip->init(chip, &info); + } else + chip->probe(chip, &info); + + if (nsc_ircc_open(&info) >= 0) + ret = 0; + } + + /* + * Opening based on PnP values failed. + * Let's fallback to user values, or probe + * the chip. + */ + if (ret) { + IRDA_DEBUG(2, "%s, PnP init failed\n", driver_name); + memset(&info, 0, sizeof(chipio_t)); + info.cfg_base = cfg_base; + info.fir_base = io[i]; + info.dma = dma[i]; + info.irq = irq[i]; + + /* + * If the user supplies the base address, then + * we init the chip, if not we probe the values + * set by the BIOS + */ + if (io[i] < 0x2000) { + chip->init(chip, &info); + } else + chip->probe(chip, &info); + + if (nsc_ircc_open(&info) >= 0) + ret = 0; + } i++; } else { IRDA_DEBUG(2, "%s(), Wrong chip id=0x%02x\n", __FUNCTION__, id); } } - } + if (ret) { + platform_driver_unregister(&nsc_ircc_driver); + pnp_unregister_driver(&nsc_ircc_pnp_driver); + pnp_registered = 0; + } + return ret; } @@ -227,12 +310,17 @@ static void __exit nsc_ircc_cleanup(void { int i; - pm_unregister_all(nsc_ircc_pmproc); - - for (i=0; i < 4; i++) { + for (i = 0; i < ARRAY_SIZE(dev_self); i++) { if (dev_self[i]) nsc_ircc_close(dev_self[i]); } + + platform_driver_unregister(&nsc_ircc_driver); + + if (pnp_registered) + pnp_unregister_driver(&nsc_ircc_pnp_driver); + + pnp_registered = 0; } /* @@ -241,16 +329,26 @@ static void __exit nsc_ircc_cleanup(void * Open driver instance * */ -static int __init nsc_ircc_open(int i, chipio_t *info) +static int __init nsc_ircc_open(chipio_t *info) { struct net_device *dev; struct nsc_ircc_cb *self; - struct pm_dev *pmdev; void *ret; - int err; + int err, chip_index; IRDA_DEBUG(2, "%s()\n", __FUNCTION__); + + for (chip_index = 0; chip_index < ARRAY_SIZE(dev_self); chip_index++) { + if (!dev_self[chip_index]) + break; + } + + if (chip_index == ARRAY_SIZE(dev_self)) { + IRDA_ERROR("%s(), maximum number of supported chips reached!\n", __FUNCTION__); + return -ENOMEM; + } + IRDA_MESSAGE("%s, Found chip at base=0x%03x\n", driver_name, info->cfg_base); @@ -271,8 +369,8 @@ static int __init nsc_ircc_open(int i, c spin_lock_init(&self->lock); /* Need to store self somewhere */ - dev_self[i] = self; - self->index = i; + dev_self[chip_index] = self; + self->index = chip_index; /* Initialize IO */ self->io.cfg_base = info->cfg_base; @@ -351,7 +449,7 @@ static int __init nsc_ircc_open(int i, c /* Check if user has supplied a valid dongle id or not */ if ((dongle_id <= 0) || - (dongle_id >= (sizeof(dongle_types) / sizeof(dongle_types[0]))) ) { + (dongle_id >= ARRAY_SIZE(dongle_types))) { dongle_id = nsc_ircc_read_dongle_id(self->io.fir_base); IRDA_MESSAGE("%s, Found dongle: %s\n", driver_name, @@ -364,11 +462,18 @@ static int __init nsc_ircc_open(int i, c self->io.dongle_id = dongle_id; nsc_ircc_init_dongle_interface(self->io.fir_base, dongle_id); - pmdev = pm_register(PM_SYS_DEV, PM_SYS_IRDA, nsc_ircc_pmproc); - if (pmdev) - pmdev->data = self; + self->pldev = platform_device_register_simple(NSC_IRCC_DRIVER_NAME, + self->index, NULL, 0); + if (IS_ERR(self->pldev)) { + err = PTR_ERR(self->pldev); + goto out5; + } + platform_set_drvdata(self->pldev, self); - return 0; + return chip_index; + + out5: + unregister_netdev(dev); out4: dma_free_coherent(NULL, self->tx_buff.truesize, self->tx_buff.head, self->tx_buff_dma); @@ -379,7 +484,7 @@ static int __init nsc_ircc_open(int i, c release_region(self->io.fir_base, self->io.fir_ext); out1: free_netdev(dev); - dev_self[i] = NULL; + dev_self[chip_index] = NULL; return err; } @@ -399,6 +504,8 @@ static int __exit nsc_ircc_close(struct iobase = self->io.fir_base; + platform_device_unregister(self->pldev); + /* Remove netdevice */ unregister_netdev(self->netdev); @@ -806,6 +913,43 @@ static int nsc_ircc_probe_39x(nsc_chip_t return 0; } +/* PNP probing */ +static int nsc_ircc_pnp_probe(struct pnp_dev *dev, const struct pnp_device_id *id) +{ + memset(&pnp_info, 0, sizeof(chipio_t)); + pnp_info.irq = -1; + pnp_info.dma = -1; + pnp_succeeded = 1; + + /* There don't seem to be any way to get the cfg_base. + * On my box, cfg_base is in the PnP descriptor of the + * motherboard. Oh well... Jean II */ + + if (pnp_port_valid(dev, 0) && + !(pnp_port_flags(dev, 0) & IORESOURCE_DISABLED)) + pnp_info.fir_base = pnp_port_start(dev, 0); + + if (pnp_irq_valid(dev, 0) && + !(pnp_irq_flags(dev, 0) & IORESOURCE_DISABLED)) + pnp_info.irq = pnp_irq(dev, 0); + + if (pnp_dma_valid(dev, 0) && + !(pnp_dma_flags(dev, 0) & IORESOURCE_DISABLED)) + pnp_info.dma = pnp_dma(dev, 0); + + IRDA_DEBUG(0, "%s() : From PnP, found firbase 0x%03X ; irq %d ; dma %d.\n", + __FUNCTION__, pnp_info.fir_base, pnp_info.irq, pnp_info.dma); + + if((pnp_info.fir_base == 0) || + (pnp_info.irq == -1) || (pnp_info.dma == -1)) { + /* Returning an error will disable the device. Yuck ! */ + //return -EINVAL; + pnp_succeeded = 0; + } + + return 0; +} + /* * Function nsc_ircc_setup (info) * @@ -2161,45 +2305,83 @@ static struct net_device_stats *nsc_ircc return &self->stats; } -static void nsc_ircc_suspend(struct nsc_ircc_cb *self) +static int nsc_ircc_suspend(struct platform_device *dev, pm_message_t state) { - IRDA_MESSAGE("%s, Suspending\n", driver_name); - + struct nsc_ircc_cb *self = platform_get_drvdata(dev); + int bank; + unsigned long flags; + int iobase = self->io.fir_base; + if (self->io.suspended) - return; - - nsc_ircc_net_close(self->netdev); + return 0; + IRDA_DEBUG(1, "%s, Suspending\n", driver_name); + + rtnl_lock(); + if (netif_running(self->netdev)) { + netif_device_detach(self->netdev); + spin_lock_irqsave(&self->lock, flags); + /* Save current bank */ + bank = inb(iobase+BSR); + + /* Disable interrupts */ + switch_bank(iobase, BANK0); + outb(0, iobase+IER); + + /* Restore bank register */ + outb(bank, iobase+BSR); + + spin_unlock_irqrestore(&self->lock, flags); + free_irq(self->io.irq, self->netdev); + disable_dma(self->io.dma); + } self->io.suspended = 1; + rtnl_unlock(); + + return 0; } - -static void nsc_ircc_wakeup(struct nsc_ircc_cb *self) + +static int nsc_ircc_resume(struct platform_device *dev) { + struct nsc_ircc_cb *self = platform_get_drvdata(dev); + unsigned long flags; + if (!self->io.suspended) - return; + return 0; + IRDA_DEBUG(1, "%s, Waking up\n", driver_name); + + rtnl_lock(); nsc_ircc_setup(&self->io); - nsc_ircc_net_open(self->netdev); - - IRDA_MESSAGE("%s, Waking up\n", driver_name); + nsc_ircc_init_dongle_interface(self->io.fir_base, self->io.dongle_id); + if (netif_running(self->netdev)) { + if (request_irq(self->io.irq, nsc_ircc_interrupt, 0, + self->netdev->name, self->netdev)) { + IRDA_WARNING("%s, unable to allocate irq=%d\n", + driver_name, self->io.irq); + + /* + * Don't fail resume process, just kill this + * network interface + */ + unregister_netdevice(self->netdev); + } else { + spin_lock_irqsave(&self->lock, flags); + nsc_ircc_change_speed(self, self->io.speed); + spin_unlock_irqrestore(&self->lock, flags); + netif_device_attach(self->netdev); + } + + } else { + spin_lock_irqsave(&self->lock, flags); + nsc_ircc_change_speed(self, 9600); + spin_unlock_irqrestore(&self->lock, flags); + } self->io.suspended = 0; -} + rtnl_unlock(); -static int nsc_ircc_pmproc(struct pm_dev *dev, pm_request_t rqst, void *data) -{ - struct nsc_ircc_cb *self = (struct nsc_ircc_cb*) dev->data; - if (self) { - switch (rqst) { - case PM_SUSPEND: - nsc_ircc_suspend(self); - break; - case PM_RESUME: - nsc_ircc_wakeup(self); - break; - } - } - return 0; + return 0; } MODULE_AUTHOR("Dag Brattli "); diff -puN drivers/net/irda/nsc-ircc.h~git-net drivers/net/irda/nsc-ircc.h --- devel/drivers/net/irda/nsc-ircc.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/nsc-ircc.h 2006-02-27 22:37:47.000000000 -0800 @@ -269,7 +269,7 @@ struct nsc_ircc_cb { __u32 new_speed; int index; /* Instance index */ - struct pm_dev *dev; + struct platform_device *pldev; }; static inline void switch_bank(int iobase, int bank) diff -puN drivers/net/irda/vlsi_ir.c~git-net drivers/net/irda/vlsi_ir.c --- devel/drivers/net/irda/vlsi_ir.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/drivers/net/irda/vlsi_ir.c 2006-02-27 22:37:47.000000000 -0800 @@ -1887,7 +1887,7 @@ static int __init vlsi_mod_init(void) vlsi_proc_root->owner = THIS_MODULE; } - ret = pci_module_init(&vlsi_irda_driver); + ret = pci_register_driver(&vlsi_irda_driver); if (ret && vlsi_proc_root) remove_proc_entry(PROC_DIR, NULL); diff -puN drivers/net/tg3.c~git-net drivers/net/tg3.c --- devel/drivers/net/tg3.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/drivers/net/tg3.c 2006-02-27 22:37:47.000000000 -0800 @@ -69,8 +69,8 @@ #define DRV_MODULE_NAME "tg3" #define PFX DRV_MODULE_NAME ": " -#define DRV_MODULE_VERSION "3.49" -#define DRV_MODULE_RELDATE "Feb 2, 2006" +#define DRV_MODULE_VERSION "3.51" +#define DRV_MODULE_RELDATE "Feb 21, 2006" #define TG3_DEF_MAC_MODE 0 #define TG3_DEF_RX_MODE 0 @@ -223,8 +223,12 @@ static struct pci_device_id tg3_pci_tbl[ PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5714, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, + { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5714S, + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5715, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, + { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5715S, + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5780, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_TIGON3_5780S, @@ -1038,9 +1042,11 @@ static void tg3_frob_aux_power(struct tg struct net_device *dev_peer; dev_peer = pci_get_drvdata(tp->pdev_peer); + /* remove_one() may have been run on the peer. */ if (!dev_peer) - BUG(); - tp_peer = netdev_priv(dev_peer); + tp_peer = tp; + else + tp_peer = netdev_priv(dev_peer); } if ((tp->tg3_flags & TG3_FLAG_WOL_ENABLE) != 0 || @@ -1131,7 +1137,7 @@ static int tg3_halt_cpu(struct tg3 *, u3 static int tg3_nvram_lock(struct tg3 *); static void tg3_nvram_unlock(struct tg3 *); -static int tg3_set_power_state(struct tg3 *tp, int state) +static int tg3_set_power_state(struct tg3 *tp, pci_power_t state) { u32 misc_host_ctrl; u16 power_control, power_caps; @@ -1150,7 +1156,7 @@ static int tg3_set_power_state(struct tg power_control |= PCI_PM_CTRL_PME_STATUS; power_control &= ~(PCI_PM_CTRL_STATE_MASK); switch (state) { - case 0: + case PCI_D0: power_control |= 0; pci_write_config_word(tp->pdev, pm + PCI_PM_CTRL, @@ -1163,15 +1169,15 @@ static int tg3_set_power_state(struct tg return 0; - case 1: + case PCI_D1: power_control |= 1; break; - case 2: + case PCI_D2: power_control |= 2; break; - case 3: + case PCI_D3hot: power_control |= 3; break; @@ -2680,6 +2686,12 @@ static int tg3_setup_fiber_mii_phy(struc err |= tg3_readphy(tp, MII_BMSR, &bmsr); err |= tg3_readphy(tp, MII_BMSR, &bmsr); + if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5714) { + if (tr32(MAC_TX_STATUS) & TX_STATUS_LINK_UP) + bmsr |= BMSR_LSTATUS; + else + bmsr &= ~BMSR_LSTATUS; + } err |= tg3_readphy(tp, MII_BMCR, &bmcr); @@ -2748,6 +2760,13 @@ static int tg3_setup_fiber_mii_phy(struc bmcr = new_bmcr; err |= tg3_readphy(tp, MII_BMSR, &bmsr); err |= tg3_readphy(tp, MII_BMSR, &bmsr); + if (GET_ASIC_REV(tp->pci_chip_rev_id) == + ASIC_REV_5714) { + if (tr32(MAC_TX_STATUS) & TX_STATUS_LINK_UP) + bmsr |= BMSR_LSTATUS; + else + bmsr &= ~BMSR_LSTATUS; + } tp->tg3_flags2 &= ~TG3_FLG2_PARALLEL_DETECT; } } @@ -5501,6 +5520,9 @@ static int tg3_set_mac_addr(struct net_d memcpy(dev->dev_addr, addr->sa_data, dev->addr_len); + if (!netif_running(dev)) + return 0; + spin_lock_bh(&tp->lock); __tg3_set_mac_addr(tp); spin_unlock_bh(&tp->lock); @@ -5568,6 +5590,9 @@ static int tg3_reset_hw(struct tg3 *tp) tg3_abort_hw(tp, 1); } + if (tp->tg3_flags2 & TG3_FLG2_MII_SERDES) + tg3_phy_reset(tp); + err = tg3_chip_reset(tp); if (err) return err; @@ -6080,6 +6105,17 @@ static int tg3_reset_hw(struct tg3 *tp) tp->tg3_flags2 |= TG3_FLG2_HW_AUTONEG; } + if ((tp->tg3_flags2 & TG3_FLG2_MII_SERDES) && + (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5714)) { + u32 tmp; + + tmp = tr32(SERDES_RX_CTRL); + tw32(SERDES_RX_CTRL, tmp | SERDES_RX_SIG_DETECT); + tp->grc_local_ctrl &= ~GRC_LCLCTRL_USE_EXT_SIG_DETECT; + tp->grc_local_ctrl |= GRC_LCLCTRL_USE_SIG_DETECT; + tw32(GRC_LOCAL_CTRL, tp->grc_local_ctrl); + } + err = tg3_setup_phy(tp, 1); if (err) return err; @@ -6158,7 +6194,7 @@ static int tg3_init_hw(struct tg3 *tp) int err; /* Force the chip into D0. */ - err = tg3_set_power_state(tp, 0); + err = tg3_set_power_state(tp, PCI_D0); if (err) goto out; @@ -6447,6 +6483,10 @@ static int tg3_open(struct net_device *d tg3_full_lock(tp, 0); + err = tg3_set_power_state(tp, PCI_D0); + if (err) + return err; + tg3_disable_ints(tp); tp->tg3_flags &= ~TG3_FLAG_INIT_COMPLETE; @@ -6461,7 +6501,9 @@ static int tg3_open(struct net_device *d if ((tp->tg3_flags2 & TG3_FLG2_5750_PLUS) && (GET_CHIP_REV(tp->pci_chip_rev_id) != CHIPREV_5750_AX) && - (GET_CHIP_REV(tp->pci_chip_rev_id) != CHIPREV_5750_BX)) { + (GET_CHIP_REV(tp->pci_chip_rev_id) != CHIPREV_5750_BX) && + !((GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5714) && + (tp->pdev_peer == tp->pdev))) { /* All MSI supporting chips should support tagged * status. Assert that this is the case. */ @@ -6824,7 +6866,6 @@ static int tg3_close(struct net_device * tp->tg3_flags &= ~(TG3_FLAG_INIT_COMPLETE | TG3_FLAG_GOT_SERDES_FLOWCTL); - netif_carrier_off(tp->dev); tg3_full_unlock(tp); @@ -6841,6 +6882,10 @@ static int tg3_close(struct net_device * tg3_free_consistent(tp); + tg3_set_power_state(tp, PCI_D3hot); + + netif_carrier_off(tp->dev); + return 0; } @@ -7135,6 +7180,9 @@ static void tg3_set_rx_mode(struct net_d { struct tg3 *tp = netdev_priv(dev); + if (!netif_running(dev)) + return; + tg3_full_lock(tp, 0); __tg3_set_rx_mode(dev); tg3_full_unlock(tp); @@ -7159,6 +7207,9 @@ static void tg3_get_regs(struct net_devi memset(p, 0, TG3_REGDUMP_LEN); + if (tp->link_config.phy_is_low_power) + return; + tg3_full_lock(tp, 0); #define __GET_REG32(reg) (*(p)++ = tr32(reg)) @@ -7233,6 +7284,9 @@ static int tg3_get_eeprom(struct net_dev u8 *pd; u32 i, offset, len, val, b_offset, b_count; + if (tp->link_config.phy_is_low_power) + return -EAGAIN; + offset = eeprom->offset; len = eeprom->len; eeprom->len = 0; @@ -7294,6 +7348,9 @@ static int tg3_set_eeprom(struct net_dev u32 offset, len, b_offset, odd_len, start, end; u8 *buf; + if (tp->link_config.phy_is_low_power) + return -EAGAIN; + if (eeprom->magic != TG3_EEPROM_MAGIC) return -EINVAL; @@ -7521,11 +7578,20 @@ static void tg3_get_ringparam(struct net ering->rx_max_pending = TG3_RX_RING_SIZE - 1; ering->rx_mini_max_pending = 0; - ering->rx_jumbo_max_pending = TG3_RX_JUMBO_RING_SIZE - 1; + if (tp->tg3_flags & TG3_FLAG_JUMBO_RING_ENABLE) + ering->rx_jumbo_max_pending = TG3_RX_JUMBO_RING_SIZE - 1; + else + ering->rx_jumbo_max_pending = 0; + + ering->tx_max_pending = TG3_TX_RING_SIZE - 1; ering->rx_pending = tp->rx_pending; ering->rx_mini_pending = 0; - ering->rx_jumbo_pending = tp->rx_jumbo_pending; + if (tp->tg3_flags & TG3_FLAG_JUMBO_RING_ENABLE) + ering->rx_jumbo_pending = tp->rx_jumbo_pending; + else + ering->rx_jumbo_pending = 0; + ering->tx_pending = tp->tx_pending; } @@ -8214,6 +8280,9 @@ static void tg3_self_test(struct net_dev { struct tg3 *tp = netdev_priv(dev); + if (tp->link_config.phy_is_low_power) + tg3_set_power_state(tp, PCI_D0); + memset(data, 0, sizeof(u64) * TG3_NUM_TEST); if (tg3_test_nvram(tp) != 0) { @@ -8271,6 +8340,9 @@ static void tg3_self_test(struct net_dev tg3_full_unlock(tp); } + if (tp->link_config.phy_is_low_power) + tg3_set_power_state(tp, PCI_D3hot); + } static int tg3_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) @@ -8290,6 +8362,9 @@ static int tg3_ioctl(struct net_device * if (tp->tg3_flags2 & TG3_FLG2_PHY_SERDES) break; /* We have no PHY */ + if (tp->link_config.phy_is_low_power) + return -EAGAIN; + spin_lock_bh(&tp->lock); err = tg3_readphy(tp, data->reg_num & 0x1f, &mii_regval); spin_unlock_bh(&tp->lock); @@ -8306,6 +8381,9 @@ static int tg3_ioctl(struct net_device * if (!capable(CAP_NET_ADMIN)) return -EPERM; + if (tp->link_config.phy_is_low_power) + return -EAGAIN; + spin_lock_bh(&tp->lock); err = tg3_writephy(tp, data->reg_num & 0x1f, data->val_in); spin_unlock_bh(&tp->lock); @@ -9729,7 +9807,7 @@ static int __devinit tg3_get_invariants( tp->grc_local_ctrl |= GRC_LCLCTRL_GPIO_OE3; /* Force the chip into D0. */ - err = tg3_set_power_state(tp, 0); + err = tg3_set_power_state(tp, PCI_D0); if (err) { printk(KERN_ERR PFX "(%s) transition to D0 failed\n", pci_name(tp->pdev)); @@ -10781,11 +10859,12 @@ static int __devinit tg3_init_one(struct tp->tg3_flags2 |= TG3_FLG2_TSO_CAPABLE; } - /* TSO is off by default, user can enable using ethtool. */ -#if 0 - if (tp->tg3_flags2 & TG3_FLG2_TSO_CAPABLE) + /* TSO is on by default on chips that support hardware TSO. + * Firmware TSO on older chips gives lower performance, so it + * is off by default, but can be enabled using ethtool. + */ + if (tp->tg3_flags2 & TG3_FLG2_HW_TSO) dev->features |= NETIF_F_TSO; -#endif #endif @@ -10978,7 +11057,7 @@ static int tg3_resume(struct pci_dev *pd pci_restore_state(tp->pdev); - err = tg3_set_power_state(tp, 0); + err = tg3_set_power_state(tp, PCI_D0); if (err) return err; diff -puN include/linux/dccp.h~git-net include/linux/dccp.h --- devel/include/linux/dccp.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/dccp.h 2006-02-27 22:37:47.000000000 -0800 @@ -18,7 +18,7 @@ * @dccph_seq - sequence number high or low order 24 bits, depends on dccph_x */ struct dccp_hdr { - __u16 dccph_sport, + __be16 dccph_sport, dccph_dport; __u8 dccph_doff; #if defined(__LITTLE_ENDIAN_BITFIELD) @@ -32,18 +32,18 @@ struct dccp_hdr { #endif __u16 dccph_checksum; #if defined(__LITTLE_ENDIAN_BITFIELD) - __u32 dccph_x:1, + __u8 dccph_x:1, dccph_type:4, - dccph_reserved:3, - dccph_seq:24; + dccph_reserved:3; #elif defined(__BIG_ENDIAN_BITFIELD) - __u32 dccph_reserved:3, + __u8 dccph_reserved:3, dccph_type:4, - dccph_x:1, - dccph_seq:24; + dccph_x:1; #else #error "Adjust your defines" #endif + __u8 dccph_seq2; + __be16 dccph_seq; }; /** @@ -52,7 +52,7 @@ struct dccp_hdr { * @dccph_seq_low - low 24 bits of a 48 bit seq packet */ struct dccp_hdr_ext { - __u32 dccph_seq_low; + __be32 dccph_seq_low; }; /** @@ -62,7 +62,7 @@ struct dccp_hdr_ext { * @dccph_req_options - list of options (must be a multiple of 32 bits */ struct dccp_hdr_request { - __u32 dccph_req_service; + __be32 dccph_req_service; }; /** * struct dccp_hdr_ack_bits - acknowledgment bits common to most packets @@ -71,9 +71,9 @@ struct dccp_hdr_request { * @dccph_resp_ack_nr_low - 48 bit ack number low order bits, contains GSR */ struct dccp_hdr_ack_bits { - __u32 dccph_reserved1:8, - dccph_ack_nr_high:24; - __u32 dccph_ack_nr_low; + __be16 dccph_reserved1; + __be16 dccph_ack_nr_high; + __be32 dccph_ack_nr_low; }; /** * struct dccp_hdr_response - Conection initiation response header @@ -85,7 +85,7 @@ struct dccp_hdr_ack_bits { */ struct dccp_hdr_response { struct dccp_hdr_ack_bits dccph_resp_ack; - __u32 dccph_resp_service; + __be32 dccph_resp_service; }; /** @@ -154,6 +154,10 @@ enum { DCCPO_MANDATORY = 1, DCCPO_MIN_RESERVED = 3, DCCPO_MAX_RESERVED = 31, + DCCPO_CHANGE_L = 32, + DCCPO_CONFIRM_L = 33, + DCCPO_CHANGE_R = 34, + DCCPO_CONFIRM_R = 35, DCCPO_NDP_COUNT = 37, DCCPO_ACK_VECTOR_0 = 38, DCCPO_ACK_VECTOR_1 = 39, @@ -168,7 +172,9 @@ enum { /* DCCP features */ enum { DCCPF_RESERVED = 0, + DCCPF_CCID = 1, DCCPF_SEQUENCE_WINDOW = 3, + DCCPF_ACK_RATIO = 5, DCCPF_SEND_ACK_VECTOR = 6, DCCPF_SEND_NDP_COUNT = 7, /* 10-127 reserved */ @@ -176,9 +182,18 @@ enum { DCCPF_MAX_CCID_SPECIFIC = 255, }; +/* this structure is argument to DCCP_SOCKOPT_CHANGE_X */ +struct dccp_so_feat { + __u8 dccpsf_feat; + __u8 *dccpsf_val; + __u8 dccpsf_len; +}; + /* DCCP socket options */ #define DCCP_SOCKOPT_PACKET_SIZE 1 #define DCCP_SOCKOPT_SERVICE 2 +#define DCCP_SOCKOPT_CHANGE_L 3 +#define DCCP_SOCKOPT_CHANGE_R 4 #define DCCP_SOCKOPT_CCID_RX_INFO 128 #define DCCP_SOCKOPT_CCID_TX_INFO 192 @@ -254,16 +269,12 @@ static inline unsigned int dccp_basic_hd static inline __u64 dccp_hdr_seq(const struct sk_buff *skb) { const struct dccp_hdr *dh = dccp_hdr(skb); -#if defined(__LITTLE_ENDIAN_BITFIELD) - __u64 seq_nr = ntohl(dh->dccph_seq << 8); -#elif defined(__BIG_ENDIAN_BITFIELD) - __u64 seq_nr = ntohl(dh->dccph_seq); -#else -#error "Adjust your defines" -#endif + __u64 seq_nr = ntohs(dh->dccph_seq); if (dh->dccph_x != 0) seq_nr = (seq_nr << 32) + ntohl(dccp_hdrx(skb)->dccph_seq_low); + else + seq_nr += (u32)dh->dccph_seq2 << 16; return seq_nr; } @@ -281,13 +292,7 @@ static inline struct dccp_hdr_ack_bits * static inline u64 dccp_hdr_ack_seq(const struct sk_buff *skb) { const struct dccp_hdr_ack_bits *dhack = dccp_hdr_ack_bits(skb); -#if defined(__LITTLE_ENDIAN_BITFIELD) - return (((u64)ntohl(dhack->dccph_ack_nr_high << 8)) << 32) + ntohl(dhack->dccph_ack_nr_low); -#elif defined(__BIG_ENDIAN_BITFIELD) - return (((u64)ntohl(dhack->dccph_ack_nr_high)) << 32) + ntohl(dhack->dccph_ack_nr_low); -#else -#error "Adjust your defines" -#endif + return ((u64)ntohs(dhack->dccph_ack_nr_high) << 32) + ntohl(dhack->dccph_ack_nr_low); } static inline struct dccp_hdr_response *dccp_hdr_response(struct sk_buff *skb) @@ -314,9 +319,9 @@ static inline unsigned int dccp_hdr_len( /* initial values for each feature */ #define DCCPF_INITIAL_SEQUENCE_WINDOW 100 -/* FIXME: for now we're using CCID 3 (TFRC) */ -#define DCCPF_INITIAL_CCID 3 -#define DCCPF_INITIAL_SEND_ACK_VECTOR 0 +#define DCCPF_INITIAL_ACK_RATIO 2 +#define DCCPF_INITIAL_CCID 2 +#define DCCPF_INITIAL_SEND_ACK_VECTOR 1 /* FIXME: for now we're default to 1 but it should really be 0 */ #define DCCPF_INITIAL_SEND_NDP_COUNT 1 @@ -335,6 +340,24 @@ struct dccp_options { __u8 dccpo_tx_ccid; __u8 dccpo_send_ack_vector; __u8 dccpo_send_ndp_count; + __u8 dccpo_ack_ratio; + struct list_head dccpo_pending; + struct list_head dccpo_conf; +}; + +struct dccp_opt_conf { + __u8 *dccpoc_val; + __u8 dccpoc_len; +}; + +struct dccp_opt_pend { + struct list_head dccpop_node; + __u8 dccpop_type; + __u8 dccpop_feat; + __u8 *dccpop_val; + __u8 dccpop_len; + int dccpop_conf; + struct dccp_opt_conf *dccpop_sc; }; extern void __dccp_options_init(struct dccp_options *dccpo); @@ -345,7 +368,7 @@ struct dccp_request_sock { struct inet_request_sock dreq_inet_rsk; __u64 dreq_iss; __u64 dreq_isr; - __u32 dreq_service; + __be32 dreq_service; }; static inline struct dccp_request_sock *dccp_rsk(const struct request_sock *req) @@ -373,13 +396,13 @@ enum dccp_role { struct dccp_service_list { __u32 dccpsl_nr; - __u32 dccpsl_list[0]; + __be32 dccpsl_list[0]; }; #define DCCP_SERVICE_INVALID_VALUE htonl((__u32)-1) static inline int dccp_list_has_service(const struct dccp_service_list *sl, - const u32 service) + const __be32 service) { if (likely(sl != NULL)) { u32 i = sl->dccpsl_nr; @@ -425,17 +448,17 @@ struct dccp_sock { __u64 dccps_gss; __u64 dccps_gsr; __u64 dccps_gar; - __u32 dccps_service; + __be32 dccps_service; struct dccp_service_list *dccps_service_list; struct timeval dccps_timestamp_time; __u32 dccps_timestamp_echo; __u32 dccps_packet_size; + __u16 dccps_l_ack_ratio; + __u16 dccps_r_ack_ratio; unsigned long dccps_ndp_count; __u32 dccps_mss_cache; struct dccp_options dccps_options; struct dccp_ackvec *dccps_hc_rx_ackvec; - void *dccps_hc_rx_ccid_private; - void *dccps_hc_tx_ccid_private; struct ccid *dccps_hc_rx_ccid; struct ccid *dccps_hc_tx_ccid; struct dccp_options_received dccps_options_received; diff -puN include/linux/icmpv6.h~git-net include/linux/icmpv6.h --- devel/include/linux/icmpv6.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/icmpv6.h 2006-02-27 22:37:47.000000000 -0800 @@ -40,14 +40,16 @@ struct icmp6hdr { struct icmpv6_nd_ra { __u8 hop_limit; #if defined(__LITTLE_ENDIAN_BITFIELD) - __u8 reserved:6, + __u8 reserved:4, + router_pref:2, other:1, managed:1; #elif defined(__BIG_ENDIAN_BITFIELD) __u8 managed:1, other:1, - reserved:6; + router_pref:2, + reserved:4; #else #error "Please fix " #endif @@ -70,8 +72,13 @@ struct icmp6hdr { #define icmp6_addrconf_managed icmp6_dataun.u_nd_ra.managed #define icmp6_addrconf_other icmp6_dataun.u_nd_ra.other #define icmp6_rt_lifetime icmp6_dataun.u_nd_ra.rt_lifetime +#define icmp6_router_pref icmp6_dataun.u_nd_ra.router_pref }; +#define ICMPV6_ROUTER_PREF_LOW 0x3 +#define ICMPV6_ROUTER_PREF_MEDIUM 0x0 +#define ICMPV6_ROUTER_PREF_HIGH 0x1 +#define ICMPV6_ROUTER_PREF_INVALID 0x2 #define ICMPV6_DEST_UNREACH 1 #define ICMPV6_PKT_TOOBIG 2 diff -puN include/linux/if.h~git-net include/linux/if.h --- devel/include/linux/if.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/if.h 2006-02-27 22:37:47.000000000 -0800 @@ -33,7 +33,7 @@ #define IFF_LOOPBACK 0x8 /* is a loopback net */ #define IFF_POINTOPOINT 0x10 /* interface is has p-p link */ #define IFF_NOTRAILERS 0x20 /* avoid use of trailers */ -#define IFF_RUNNING 0x40 /* interface running and carrier ok */ +#define IFF_RUNNING 0x40 /* interface RFC2863 OPER_UP */ #define IFF_NOARP 0x80 /* no ARP protocol */ #define IFF_PROMISC 0x100 /* receive all packets */ #define IFF_ALLMULTI 0x200 /* receive all multicast packets*/ @@ -43,12 +43,16 @@ #define IFF_MULTICAST 0x1000 /* Supports multicast */ -#define IFF_VOLATILE (IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|IFF_MASTER|IFF_SLAVE|IFF_RUNNING) - #define IFF_PORTSEL 0x2000 /* can set media type */ #define IFF_AUTOMEDIA 0x4000 /* auto media select active */ #define IFF_DYNAMIC 0x8000 /* dialup device with changing addresses*/ +#define IFF_LOWER_UP 0x10000 /* driver signals L1 up */ +#define IFF_DORMANT 0x20000 /* driver signals dormant */ + +#define IFF_VOLATILE (IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|\ + IFF_MASTER|IFF_SLAVE|IFF_RUNNING|IFF_LOWER_UP|IFF_DORMANT) + /* Private (from user) interface flags (netdevice->priv_flags). */ #define IFF_802_1Q_VLAN 0x1 /* 802.1Q VLAN device. */ #define IFF_EBRIDGE 0x2 /* Ethernet bridging device. */ @@ -80,6 +84,22 @@ #define IF_PROTO_FR_ETH_PVC 0x200B #define IF_PROTO_RAW 0x200C /* RAW Socket */ +/* RFC 2863 operational status */ +enum { + IF_OPER_UNKNOWN, + IF_OPER_NOTPRESENT, + IF_OPER_DOWN, + IF_OPER_LOWERLAYERDOWN, + IF_OPER_TESTING, + IF_OPER_DORMANT, + IF_OPER_UP, +}; + +/* link modes */ +enum { + IF_LINK_MODE_DEFAULT, + IF_LINK_MODE_DORMANT, /* limit upward transition to dormant */ +}; /* * Device mapping structure. I'd just gone off and designed a diff -puN include/linux/ipv6.h~git-net include/linux/ipv6.h --- devel/include/linux/ipv6.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/ipv6.h 2006-02-27 22:37:47.000000000 -0800 @@ -145,6 +145,15 @@ struct ipv6_devconf { __s32 max_desync_factor; #endif __s32 max_addresses; + __s32 accept_ra_defrtr; + __s32 accept_ra_pinfo; +#ifdef CONFIG_IPV6_ROUTER_PREF + __s32 accept_ra_rtr_pref; + __s32 rtr_probe_interval; +#ifdef CONFIG_IPV6_ROUTE_INFO + __s32 accept_ra_rt_info_max_plen; +#endif +#endif void *sysctl; }; @@ -167,6 +176,11 @@ enum { DEVCONF_MAX_DESYNC_FACTOR, DEVCONF_MAX_ADDRESSES, DEVCONF_FORCE_MLD_VERSION, + DEVCONF_ACCEPT_RA_DEFRTR, + DEVCONF_ACCEPT_RA_PINFO, + DEVCONF_ACCEPT_RA_RTR_PREF, + DEVCONF_RTR_PROBE_INTERVAL, + DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN, DEVCONF_MAX }; diff -puN include/linux/ipv6_route.h~git-net include/linux/ipv6_route.h --- devel/include/linux/ipv6_route.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/ipv6_route.h 2006-02-27 22:37:47.000000000 -0800 @@ -23,12 +23,22 @@ #define RTF_NONEXTHOP 0x00200000 /* route with no nexthop */ #define RTF_EXPIRES 0x00400000 +#define RTF_ROUTEINFO 0x00800000 /* route information - RA */ + #define RTF_CACHE 0x01000000 /* cache entry */ #define RTF_FLOW 0x02000000 /* flow significant route */ #define RTF_POLICY 0x04000000 /* policy route */ +#define RTF_PREF(pref) ((pref) << 27) +#define RTF_PREF_MASK 0x18000000 + #define RTF_LOCAL 0x80000000 +#ifdef __KERNEL__ +#define IPV6_EXTRACT_PREF(flag) (((flag) & RTF_PREF_MASK) >> 27) +#define IPV6_DECODE_PREF(pref) ((pref) ^ 2) /* 1:low,2:med,3:high */ +#endif + struct in6_rtmsg { struct in6_addr rtmsg_dst; struct in6_addr rtmsg_src; diff -puN include/linux/list.h~git-net include/linux/list.h --- devel/include/linux/list.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/list.h 2006-02-27 22:37:47.000000000 -0800 @@ -411,6 +411,17 @@ static inline void list_splice_init(stru pos = list_entry(pos->member.next, typeof(*pos), member)) /** + * list_for_each_entry_from - iterate over list of given type + * continuing from existing point + * @pos: the type * to use as a loop counter. + * @head: the head for your list. + * @member: the name of the list_struct within the struct. + */ +#define list_for_each_entry_from(pos, head, member) \ + for (; prefetch(pos->member.next), &pos->member != (head); \ + pos = list_entry(pos->member.next, typeof(*pos), member)) + +/** * list_for_each_entry_safe - iterate over list of given type safe against removal of list entry * @pos: the type * to use as a loop counter. * @n: another type * to use as temporary storage @@ -438,6 +449,19 @@ static inline void list_splice_init(stru pos = n, n = list_entry(n->member.next, typeof(*n), member)) /** + * list_for_each_entry_safe_from - iterate over list of given type + * from existing point safe against removal of list entry + * @pos: the type * to use as a loop counter. + * @n: another type * to use as temporary storage + * @head: the head for your list. + * @member: the name of the list_struct within the struct. + */ +#define list_for_each_entry_safe_from(pos, n, head, member) \ + for (n = list_entry(pos->member.next, typeof(*pos), member); \ + &pos->member != (head); \ + pos = n, n = list_entry(n->member.next, typeof(*n), member)) + +/** * list_for_each_entry_safe_reverse - iterate backwards over list of given type safe against * removal of list entry * @pos: the type * to use as a loop counter. diff -puN include/linux/netdevice.h~git-net include/linux/netdevice.h --- devel/include/linux/netdevice.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/netdevice.h 2006-02-27 22:37:47.000000000 -0800 @@ -230,7 +230,8 @@ enum netdev_state_t __LINK_STATE_SCHED, __LINK_STATE_NOCARRIER, __LINK_STATE_RX_SCHED, - __LINK_STATE_LINKWATCH_PENDING + __LINK_STATE_LINKWATCH_PENDING, + __LINK_STATE_DORMANT, }; @@ -335,11 +336,14 @@ struct net_device */ - unsigned short flags; /* interface flags (a la BSD) */ + unsigned int flags; /* interface flags (a la BSD) */ unsigned short gflags; unsigned short priv_flags; /* Like 'flags' but invisible to userspace. */ unsigned short padded; /* How much padding added by alloc_netdev() */ + unsigned char operstate; /* RFC2863 operstate */ + unsigned char link_mode; /* mapping policy to operstate */ + unsigned mtu; /* interface MTU value */ unsigned short type; /* interface hardware type */ unsigned short hard_header_len; /* hardware hdr length */ @@ -714,6 +718,10 @@ static inline void dev_put(struct net_de /* Carrier loss detection, dial on demand. The functions netif_carrier_on * and _off may be called from IRQ context, but it is caller * who is responsible for serialization of these calls. + * + * The name carrier is inappropriate, these functions should really be + * called netif_lowerlayer_*() because they represent the state of any + * kind of lower layer not just hardware media. */ extern void linkwatch_fire_event(struct net_device *dev); @@ -729,6 +737,29 @@ extern void netif_carrier_on(struct net_ extern void netif_carrier_off(struct net_device *dev); +static inline void netif_dormant_on(struct net_device *dev) +{ + if (!test_and_set_bit(__LINK_STATE_DORMANT, &dev->state)) + linkwatch_fire_event(dev); +} + +static inline void netif_dormant_off(struct net_device *dev) +{ + if (test_and_clear_bit(__LINK_STATE_DORMANT, &dev->state)) + linkwatch_fire_event(dev); +} + +static inline int netif_dormant(const struct net_device *dev) +{ + return test_bit(__LINK_STATE_DORMANT, &dev->state); +} + + +static inline int netif_oper_up(const struct net_device *dev) { + return (dev->operstate == IF_OPER_UP || + dev->operstate == IF_OPER_UNKNOWN /* backward compat */); +} + /* Hot-plugging. */ static inline int netif_device_present(struct net_device *dev) { diff -puN include/linux/netfilter_ipv4/ip_nat.h~git-net include/linux/netfilter_ipv4/ip_nat.h --- devel/include/linux/netfilter_ipv4/ip_nat.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter_ipv4/ip_nat.h 2006-02-27 22:37:47.000000000 -0800 @@ -23,7 +23,7 @@ struct ip_nat_seq { * modification (if any) */ u_int32_t correction_pos; /* sequence number offset before and after last modification */ - int32_t offset_before, offset_after; + int16_t offset_before, offset_after; }; /* Single range specification. */ diff -puN include/linux/netfilter_ipv4/ipt_policy.h~git-net include/linux/netfilter_ipv4/ipt_policy.h --- devel/include/linux/netfilter_ipv4/ipt_policy.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter_ipv4/ipt_policy.h 2006-02-27 22:37:47.000000000 -0800 @@ -1,58 +1,21 @@ #ifndef _IPT_POLICY_H #define _IPT_POLICY_H -#define IPT_POLICY_MAX_ELEM 4 +#define IPT_POLICY_MAX_ELEM XT_POLICY_MAX_ELEM -enum ipt_policy_flags -{ - IPT_POLICY_MATCH_IN = 0x1, - IPT_POLICY_MATCH_OUT = 0x2, - IPT_POLICY_MATCH_NONE = 0x4, - IPT_POLICY_MATCH_STRICT = 0x8, -}; - -enum ipt_policy_modes -{ - IPT_POLICY_MODE_TRANSPORT, - IPT_POLICY_MODE_TUNNEL -}; - -struct ipt_policy_spec -{ - u_int8_t saddr:1, - daddr:1, - proto:1, - mode:1, - spi:1, - reqid:1; -}; - -union ipt_policy_addr -{ - struct in_addr a4; - struct in6_addr a6; -}; - -struct ipt_policy_elem -{ - union ipt_policy_addr saddr; - union ipt_policy_addr smask; - union ipt_policy_addr daddr; - union ipt_policy_addr dmask; - u_int32_t spi; - u_int32_t reqid; - u_int8_t proto; - u_int8_t mode; - - struct ipt_policy_spec match; - struct ipt_policy_spec invert; -}; - -struct ipt_policy_info -{ - struct ipt_policy_elem pol[IPT_POLICY_MAX_ELEM]; - u_int16_t flags; - u_int16_t len; -}; +/* ipt_policy_flags */ +#define IPT_POLICY_MATCH_IN XT_POLICY_MATCH_IN +#define IPT_POLICY_MATCH_OUT XT_POLICY_MATCH_OUT +#define IPT_POLICY_MATCH_NONE XT_POLICY_MATCH_NONE +#define IPT_POLICY_MATCH_STRICT XT_POLICY_MATCH_STRICT + +/* ipt_policy_modes */ +#define IPT_POLICY_MODE_TRANSPORT XT_POLICY_MODE_TRANSPORT +#define IPT_POLICY_MODE_TUNNEL XT_POLICY_MODE_TUNNEL + +#define ipt_policy_spec xt_policy_spec +#define ipt_policy_addr xt_policy_addr +#define ipt_policy_elem xt_policy_elem +#define ipt_policy_info xt_policy_info #endif /* _IPT_POLICY_H */ diff -puN include/linux/netfilter_ipv6/ip6t_policy.h~git-net include/linux/netfilter_ipv6/ip6t_policy.h --- devel/include/linux/netfilter_ipv6/ip6t_policy.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter_ipv6/ip6t_policy.h 2006-02-27 22:37:47.000000000 -0800 @@ -1,58 +1,21 @@ #ifndef _IP6T_POLICY_H #define _IP6T_POLICY_H -#define IP6T_POLICY_MAX_ELEM 4 +#define IP6T_POLICY_MAX_ELEM XT_POLICY_MAX_ELEM -enum ip6t_policy_flags -{ - IP6T_POLICY_MATCH_IN = 0x1, - IP6T_POLICY_MATCH_OUT = 0x2, - IP6T_POLICY_MATCH_NONE = 0x4, - IP6T_POLICY_MATCH_STRICT = 0x8, -}; - -enum ip6t_policy_modes -{ - IP6T_POLICY_MODE_TRANSPORT, - IP6T_POLICY_MODE_TUNNEL -}; - -struct ip6t_policy_spec -{ - u_int8_t saddr:1, - daddr:1, - proto:1, - mode:1, - spi:1, - reqid:1; -}; - -union ip6t_policy_addr -{ - struct in_addr a4; - struct in6_addr a6; -}; - -struct ip6t_policy_elem -{ - union ip6t_policy_addr saddr; - union ip6t_policy_addr smask; - union ip6t_policy_addr daddr; - union ip6t_policy_addr dmask; - u_int32_t spi; - u_int32_t reqid; - u_int8_t proto; - u_int8_t mode; - - struct ip6t_policy_spec match; - struct ip6t_policy_spec invert; -}; - -struct ip6t_policy_info -{ - struct ip6t_policy_elem pol[IP6T_POLICY_MAX_ELEM]; - u_int16_t flags; - u_int16_t len; -}; +/* ip6t_policy_flags */ +#define IP6T_POLICY_MATCH_IN XT_POLICY_MATCH_IN +#define IP6T_POLICY_MATCH_OUT XT_POLICY_MATCH_OUT +#define IP6T_POLICY_MATCH_NONE XT_POLICY_MATCH_NONE +#define IP6T_POLICY_MATCH_STRICT XT_POLICY_MATCH_STRICT + +/* ip6t_policy_modes */ +#define IP6T_POLICY_MODE_TRANSPORT XT_POLICY_MODE_TRANSPORT +#define IP6T_POLICY_MODE_TUNNEL XT_POLICY_MODE_TUNNEL + +#define ip6t_policy_spec xt_policy_spec +#define ip6t_policy_addr xt_policy_addr +#define ip6t_policy_elem xt_policy_elem +#define ip6t_policy_info xt_policy_info #endif /* _IP6T_POLICY_H */ diff -puN include/linux/netfilter/nfnetlink.h~git-net include/linux/netfilter/nfnetlink.h --- devel/include/linux/netfilter/nfnetlink.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter/nfnetlink.h 2006-02-27 22:37:47.000000000 -0800 @@ -164,6 +164,7 @@ extern void nfattr_parse(struct nfattr * __res; \ }) +extern int nfnetlink_has_listeners(unsigned int group); extern int nfnetlink_send(struct sk_buff *skb, u32 pid, unsigned group, int echo); extern int nfnetlink_unicast(struct sk_buff *skb, u_int32_t pid, int flags); diff -puN include/linux/netfilter/nfnetlink_log.h~git-net include/linux/netfilter/nfnetlink_log.h --- devel/include/linux/netfilter/nfnetlink_log.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter/nfnetlink_log.h 2006-02-27 22:37:47.000000000 -0800 @@ -47,6 +47,8 @@ enum nfulnl_attr_type { NFULA_PAYLOAD, /* opaque data payload */ NFULA_PREFIX, /* string prefix */ NFULA_UID, /* user id of socket */ + NFULA_SEQ, /* instance-local sequence number */ + NFULA_SEQ_GLOBAL, /* global sequence number */ __NFULA_MAX }; @@ -77,6 +79,7 @@ enum nfulnl_attr_config { NFULA_CFG_NLBUFSIZ, /* u_int32_t buffer size */ NFULA_CFG_TIMEOUT, /* u_int32_t in 1/100 s */ NFULA_CFG_QTHRESH, /* u_int32_t */ + NFULA_CFG_FLAGS, /* u_int16_t */ __NFULA_CFG_MAX }; #define NFULA_CFG_MAX (__NFULA_CFG_MAX -1) @@ -85,4 +88,7 @@ enum nfulnl_attr_config { #define NFULNL_COPY_META 0x01 #define NFULNL_COPY_PACKET 0x02 +#define NFULNL_CFG_F_SEQ 0x0001 +#define NFULNL_CFG_F_SEQ_GLOBAL 0x0002 + #endif /* _NFNETLINK_LOG_H */ diff -puN include/linux/netfilter/x_tables.h~git-net include/linux/netfilter/x_tables.h --- devel/include/linux/netfilter/x_tables.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/netfilter/x_tables.h 2006-02-27 22:37:47.000000000 -0800 @@ -92,8 +92,6 @@ struct xt_match const char name[XT_FUNCTION_MAXNAMELEN-1]; - u_int8_t revision; - /* Return true or false: return FALSE and set *hotdrop = 1 to force immediate packet drop. */ /* Arguments changed since 2.6.9, as this must now handle @@ -102,6 +100,7 @@ struct xt_match int (*match)(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -111,15 +110,25 @@ struct xt_match /* Should return true or false. */ int (*checkentry)(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask); /* Called when entry of this type deleted. */ - void (*destroy)(void *matchinfo, unsigned int matchinfosize); + void (*destroy)(const struct xt_match *match, void *matchinfo, + unsigned int matchinfosize); /* Set this to THIS_MODULE if you are a module, otherwise NULL */ struct module *me; + + char *table; + unsigned int matchsize; + unsigned int hooks; + unsigned short proto; + + unsigned short family; + u_int8_t revision; }; /* Registration hooks for targets. */ @@ -129,8 +138,6 @@ struct xt_target const char name[XT_FUNCTION_MAXNAMELEN-1]; - u_int8_t revision; - /* Returns verdict. Argument order changed since 2.6.9, as this must now handle non-linear skbs, using skb_copy_bits and skb_ip_make_writable. */ @@ -138,6 +145,7 @@ struct xt_target const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userdata); @@ -147,15 +155,25 @@ struct xt_target /* Should return true or false. */ int (*checkentry)(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask); /* Called when entry of this type deleted. */ - void (*destroy)(void *targinfo, unsigned int targinfosize); + void (*destroy)(const struct xt_target *target, void *targinfo, + unsigned int targinfosize); /* Set this to THIS_MODULE if you are a module, otherwise NULL */ struct module *me; + + char *table; + unsigned int targetsize; + unsigned int hooks; + unsigned short proto; + + unsigned short family; + u_int8_t revision; }; /* Furniture shopping... */ @@ -207,6 +225,13 @@ extern void xt_unregister_target(int af, extern int xt_register_match(int af, struct xt_match *target); extern void xt_unregister_match(int af, struct xt_match *target); +extern int xt_check_match(const struct xt_match *match, unsigned short family, + unsigned int size, const char *table, unsigned int hook, + unsigned short proto, int inv_proto); +extern int xt_check_target(const struct xt_target *target, unsigned short family, + unsigned int size, const char *table, unsigned int hook, + unsigned short proto, int inv_proto); + extern int xt_register_table(struct xt_table *table, struct xt_table_info *bootstrap, struct xt_table_info *newinfo); diff -puN /dev/null include/linux/netfilter/xt_policy.h --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/include/linux/netfilter/xt_policy.h 2006-02-27 22:37:47.000000000 -0800 @@ -0,0 +1,58 @@ +#ifndef _XT_POLICY_H +#define _XT_POLICY_H + +#define XT_POLICY_MAX_ELEM 4 + +enum xt_policy_flags +{ + XT_POLICY_MATCH_IN = 0x1, + XT_POLICY_MATCH_OUT = 0x2, + XT_POLICY_MATCH_NONE = 0x4, + XT_POLICY_MATCH_STRICT = 0x8, +}; + +enum xt_policy_modes +{ + XT_POLICY_MODE_TRANSPORT, + XT_POLICY_MODE_TUNNEL +}; + +struct xt_policy_spec +{ + u_int8_t saddr:1, + daddr:1, + proto:1, + mode:1, + spi:1, + reqid:1; +}; + +union xt_policy_addr +{ + struct in_addr a4; + struct in6_addr a6; +}; + +struct xt_policy_elem +{ + union xt_policy_addr saddr; + union xt_policy_addr smask; + union xt_policy_addr daddr; + union xt_policy_addr dmask; + u_int32_t spi; + u_int32_t reqid; + u_int8_t proto; + u_int8_t mode; + + struct xt_policy_spec match; + struct xt_policy_spec invert; +}; + +struct xt_policy_info +{ + struct xt_policy_elem pol[XT_POLICY_MAX_ELEM]; + u_int16_t flags; + u_int16_t len; +}; + +#endif /* _XT_POLICY_H */ diff -puN include/linux/netlink.h~git-net include/linux/netlink.h --- devel/include/linux/netlink.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/netlink.h 2006-02-27 22:37:47.000000000 -0800 @@ -151,6 +151,7 @@ struct netlink_skb_parms extern struct sock *netlink_kernel_create(int unit, unsigned int groups, void (*input)(struct sock *sk, int len), struct module *module); extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err); +extern int netlink_has_listeners(struct sock *sk, unsigned int group); extern int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 pid, int nonblock); extern int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, __u32 pid, __u32 group, gfp_t allocation); diff -puN include/linux/pci_ids.h~git-net include/linux/pci_ids.h --- devel/include/linux/pci_ids.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/pci_ids.h 2006-02-27 22:37:47.000000000 -0800 @@ -1854,12 +1854,14 @@ #define PCI_DEVICE_ID_TIGON3_5705M 0x165d #define PCI_DEVICE_ID_TIGON3_5705M_2 0x165e #define PCI_DEVICE_ID_TIGON3_5714 0x1668 +#define PCI_DEVICE_ID_TIGON3_5714S 0x1669 #define PCI_DEVICE_ID_TIGON3_5780 0x166a #define PCI_DEVICE_ID_TIGON3_5780S 0x166b #define PCI_DEVICE_ID_TIGON3_5705F 0x166e #define PCI_DEVICE_ID_TIGON3_5750 0x1676 #define PCI_DEVICE_ID_TIGON3_5751 0x1677 #define PCI_DEVICE_ID_TIGON3_5715 0x1678 +#define PCI_DEVICE_ID_TIGON3_5715S 0x1679 #define PCI_DEVICE_ID_TIGON3_5750M 0x167c #define PCI_DEVICE_ID_TIGON3_5751M 0x167d #define PCI_DEVICE_ID_TIGON3_5751F 0x167e diff -puN include/linux/rtnetlink.h~git-net include/linux/rtnetlink.h --- devel/include/linux/rtnetlink.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/rtnetlink.h 2006-02-27 22:37:47.000000000 -0800 @@ -733,6 +733,8 @@ enum #define IFLA_MAP IFLA_MAP IFLA_WEIGHT, #define IFLA_WEIGHT IFLA_WEIGHT + IFLA_OPERSTATE, + IFLA_LINKMODE, __IFLA_MAX }; diff -puN include/linux/skbuff.h~git-net include/linux/skbuff.h --- devel/include/linux/skbuff.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/skbuff.h 2006-02-27 22:37:47.000000000 -0800 @@ -270,7 +270,6 @@ struct sk_buff { void (*destructor)(struct sk_buff *skb); #ifdef CONFIG_NETFILTER - __u32 nfmark; struct nf_conntrack *nfct; #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) struct sk_buff *nfct_reasm; @@ -278,6 +277,7 @@ struct sk_buff { #ifdef CONFIG_BRIDGE_NETFILTER struct nf_bridge_info *nf_bridge; #endif + __u32 nfmark; #endif /* CONFIG_NETFILTER */ #ifdef CONFIG_NET_SCHED __u16 tc_index; /* traffic control index */ @@ -304,6 +304,7 @@ struct sk_buff { #include +extern void kfree_skb(struct sk_buff *skb); extern void __kfree_skb(struct sk_buff *skb); extern struct sk_buff *__alloc_skb(unsigned int size, gfp_t priority, int fclone); @@ -404,22 +405,6 @@ static inline struct sk_buff *skb_get(st */ /** - * kfree_skb - free an sk_buff - * @skb: buffer to free - * - * Drop a reference to the buffer and free it if the usage count has - * hit zero. - */ -static inline void kfree_skb(struct sk_buff *skb) -{ - if (likely(atomic_read(&skb->users) == 1)) - smp_rmb(); - else if (likely(!atomic_dec_and_test(&skb->users))) - return; - __kfree_skb(skb); -} - -/** * skb_cloned - is the buffer a clone * @skb: buffer to check * @@ -1351,16 +1336,6 @@ static inline void nf_conntrack_put_reas kfree_skb(skb); } #endif -static inline void nf_reset(struct sk_buff *skb) -{ - nf_conntrack_put(skb->nfct); - skb->nfct = NULL; -#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) - nf_conntrack_put_reasm(skb->nfct_reasm); - skb->nfct_reasm = NULL; -#endif -} - #ifdef CONFIG_BRIDGE_NETFILTER static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge) { @@ -1373,6 +1348,20 @@ static inline void nf_bridge_get(struct atomic_inc(&nf_bridge->use); } #endif /* CONFIG_BRIDGE_NETFILTER */ +static inline void nf_reset(struct sk_buff *skb) +{ + nf_conntrack_put(skb->nfct); + skb->nfct = NULL; +#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) + nf_conntrack_put_reasm(skb->nfct_reasm); + skb->nfct_reasm = NULL; +#endif +#ifdef CONFIG_BRIDGE_NETFILTER + nf_bridge_put(skb->nf_bridge); + skb->nf_bridge = NULL; +#endif +} + #else /* CONFIG_NETFILTER */ static inline void nf_reset(struct sk_buff *skb) {} #endif /* CONFIG_NETFILTER */ diff -puN include/linux/sysctl.h~git-net include/linux/sysctl.h --- devel/include/linux/sysctl.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/sysctl.h 2006-02-27 22:37:47.000000000 -0800 @@ -210,6 +210,7 @@ enum NET_SCTP=17, NET_LLC=18, NET_NETFILTER=19, + NET_DCCP=20, }; /* /proc/sys/kernel/random */ @@ -260,6 +261,8 @@ enum NET_CORE_DEV_WEIGHT=17, NET_CORE_SOMAXCONN=18, NET_CORE_BUDGET=19, + NET_CORE_AEVENT_ETIME=20, + NET_CORE_AEVENT_RSEQTH=21, }; /* /proc/sys/net/ethernet */ @@ -396,6 +399,8 @@ enum NET_TCP_CONG_CONTROL=110, NET_TCP_ABC=111, NET_IPV4_IPFRAG_MAX_DIST=112, + NET_TCP_MTU_PROBING=113, + NET_TCP_BASE_MSS=114, }; enum { @@ -530,6 +535,11 @@ enum { NET_IPV6_MAX_DESYNC_FACTOR=15, NET_IPV6_MAX_ADDRESSES=16, NET_IPV6_FORCE_MLD_VERSION=17, + NET_IPV6_ACCEPT_RA_DEFRTR=18, + NET_IPV6_ACCEPT_RA_PINFO=19, + NET_IPV6_ACCEPT_RA_RTR_PREF=20, + NET_IPV6_RTR_PROBE_INTERVAL=21, + NET_IPV6_ACCEPT_RA_RT_INFO_MAX_PLEN=22, __NET_IPV6_MAX }; @@ -561,6 +571,21 @@ enum { __NET_NEIGH_MAX }; +/* /proc/sys/net/dccp */ +enum { + NET_DCCP_DEFAULT=1, +}; + +/* /proc/sys/net/dccp/default */ +enum { + NET_DCCP_DEFAULT_SEQ_WINDOW = 1, + NET_DCCP_DEFAULT_RX_CCID = 2, + NET_DCCP_DEFAULT_TX_CCID = 3, + NET_DCCP_DEFAULT_ACK_RATIO = 4, + NET_DCCP_DEFAULT_SEND_ACKVEC = 5, + NET_DCCP_DEFAULT_SEND_NDP = 6, +}; + /* /proc/sys/net/ipx */ enum { NET_IPX_PPROP_BROADCASTING=1, diff -puN include/linux/tcp.h~git-net include/linux/tcp.h --- devel/include/linux/tcp.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/tcp.h 2006-02-27 22:37:47.000000000 -0800 @@ -343,6 +343,12 @@ struct tcp_sock { __u32 seq; __u32 time; } rcvq_space; + +/* TCP-specific MTU probe information. */ + struct { + __u32 probe_seq_start; + __u32 probe_seq_end; + } mtu_probe; }; static inline struct tcp_sock *tcp_sk(const struct sock *sk) diff -puN include/linux/xfrm.h~git-net include/linux/xfrm.h --- devel/include/linux/xfrm.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/linux/xfrm.h 2006-02-27 22:37:47.000000000 -0800 @@ -156,6 +156,10 @@ enum { XFRM_MSG_FLUSHPOLICY, #define XFRM_MSG_FLUSHPOLICY XFRM_MSG_FLUSHPOLICY + XFRM_MSG_NEWAE, +#define XFRM_MSG_NEWAE XFRM_MSG_NEWAE + XFRM_MSG_GETAE, +#define XFRM_MSG_GETAE XFRM_MSG_GETAE __XFRM_MSG_MAX }; #define XFRM_MSG_MAX (__XFRM_MSG_MAX - 1) @@ -194,6 +198,21 @@ struct xfrm_encap_tmpl { xfrm_address_t encap_oa; }; +/* AEVENT flags */ +enum xfrm_ae_ftype_t { + XFRM_AE_UNSPEC, + XFRM_AE_RTHR=1, /* replay threshold*/ + XFRM_AE_RVAL=2, /* replay value */ + XFRM_AE_LVAL=4, /* lifetime value */ + XFRM_AE_ETHR=8, /* expiry timer threshold */ + XFRM_AE_CR=16, /* Event cause is replay update */ + XFRM_AE_CE=32, /* Event cause is timer expiry */ + XFRM_AE_CU=64, /* Event cause is policy update */ + __XFRM_AE_MAX + +#define XFRM_AE_MAX (__XFRM_AE_MAX - 1) +}; + /* Netlink message attributes. */ enum xfrm_attr_type_t { XFRMA_UNSPEC, @@ -205,6 +224,10 @@ enum xfrm_attr_type_t { XFRMA_SA, XFRMA_POLICY, XFRMA_SEC_CTX, /* struct xfrm_sec_ctx */ + XFRMA_LTIME_VAL, + XFRMA_REPLAY_VAL, + XFRMA_REPLAY_THRESH, + XFRMA_ETIMER_THRESH, __XFRMA_MAX #define XFRMA_MAX (__XFRMA_MAX - 1) @@ -235,6 +258,11 @@ struct xfrm_usersa_id { __u8 proto; }; +struct xfrm_aevent_id { + struct xfrm_usersa_id sa_id; + __u32 flags; +}; + struct xfrm_userspi_info { struct xfrm_usersa_info info; __u32 min; @@ -306,6 +334,8 @@ enum xfrm_nlgroups { #define XFRMNLGRP_SA XFRMNLGRP_SA XFRMNLGRP_POLICY, #define XFRMNLGRP_POLICY XFRMNLGRP_POLICY + XFRMNLGRP_AEVENTS, +#define XFRMNLGRP_AEVENTS XFRMNLGRP_AEVENTS __XFRMNLGRP_MAX }; #define XFRMNLGRP_MAX (__XFRMNLGRP_MAX - 1) diff -puN include/net/if_inet6.h~git-net include/net/if_inet6.h --- devel/include/net/if_inet6.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/net/if_inet6.h 2006-02-27 22:37:47.000000000 -0800 @@ -180,11 +180,8 @@ struct inet6_dev #ifdef CONFIG_IPV6_PRIVACY u8 rndid[8]; - u8 entropy[8]; struct timer_list regen_timer; struct inet6_ifaddr *tempaddr_list; - __u8 work_eui64[8]; - __u8 work_digest[16]; #endif struct neigh_parms *nd_parms; diff -puN include/net/inet_connection_sock.h~git-net include/net/inet_connection_sock.h --- devel/include/net/inet_connection_sock.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/net/inet_connection_sock.h 2006-02-27 22:37:47.000000000 -0800 @@ -72,6 +72,7 @@ struct inet_connection_sock_af_ops { * @icsk_probes_out: unanswered 0 window probes * @icsk_ext_hdr_len: Network protocol overhead (IP/IPv6 options) * @icsk_ack: Delayed ACK control data + * @icsk_mtup; MTU probing control data */ struct inet_connection_sock { /* inet_sock has to be the first member! */ @@ -104,6 +105,16 @@ struct inet_connection_sock { __u16 last_seg_size; /* Size of last incoming segment */ __u16 rcv_mss; /* MSS used for delayed ACK decisions */ } icsk_ack; + struct { + int enabled; + + /* Range of MTUs to search */ + int search_high; + int search_low; + + /* Information on the current probe. */ + int probe_size; + } icsk_mtup; u32 icsk_ca_priv[16]; #define ICSK_CA_PRIV_SIZE (16 * sizeof(u32)) }; @@ -310,4 +321,8 @@ extern void inet_csk_listen_stop(struct extern void inet_csk_addr2sockaddr(struct sock *sk, struct sockaddr *uaddr); +extern int inet_csk_ctl_sock_create(struct socket **sock, + unsigned short family, + unsigned short type, + unsigned char protocol); #endif /* _INET_CONNECTION_SOCK_H */ diff -puN include/net/ip6_route.h~git-net include/net/ip6_route.h --- devel/include/net/ip6_route.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/net/ip6_route.h 2006-02-27 22:37:47.000000000 -0800 @@ -7,6 +7,23 @@ #define IP6_RT_PRIO_KERN 512 #define IP6_RT_FLOW_MASK 0x00ff +struct route_info { + __u8 type; + __u8 length; + __u8 prefix_len; +#if defined(__BIG_ENDIAN_BITFIELD) + __u8 reserved_h:3, + route_pref:2, + reserved_l:3; +#elif defined(__LITTLE_ENDIAN_BITFIELD) + __u8 reserved_l:3, + route_pref:2, + reserved_h:3; +#endif + __u32 lifetime; + __u8 prefix[0]; /* 0,8 or 16 */ +}; + #ifdef __KERNEL__ #include @@ -87,11 +104,14 @@ extern struct rt6_info *addrconf_dst_all extern struct rt6_info * rt6_get_dflt_router(struct in6_addr *addr, struct net_device *dev); extern struct rt6_info * rt6_add_dflt_router(struct in6_addr *gwaddr, - struct net_device *dev); + struct net_device *dev, + unsigned int pref); extern void rt6_purge_dflt_routers(void); -extern void rt6_reset_dflt_pointer(struct rt6_info *rt); +extern int rt6_route_rcv(struct net_device *dev, + u8 *opt, int len, + struct in6_addr *gwaddr); extern void rt6_redirect(struct in6_addr *dest, struct in6_addr *saddr, diff -puN include/net/ipv6.h~git-net include/net/ipv6.h --- devel/include/net/ipv6.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/net/ipv6.h 2006-02-27 22:37:47.000000000 -0800 @@ -282,6 +282,18 @@ static inline int ipv6_addr_cmp(const st return memcmp((const void *) a1, (const void *) a2, sizeof(struct in6_addr)); } +static inline int +ipv6_masked_addr_cmp(const struct in6_addr *a1, const struct in6_addr *m, + const struct in6_addr *a2) +{ + unsigned int i; + + for (i = 0; i < 4; i++) + if ((a1->s6_addr32[i] ^ a2->s6_addr32[i]) & m->s6_addr32[i]) + return 1; + return 0; +} + static inline void ipv6_addr_copy(struct in6_addr *a1, const struct in6_addr *a2) { memcpy((void *) a1, (const void *) a2, sizeof(struct in6_addr)); diff -puN include/net/ndisc.h~git-net include/net/ndisc.h --- devel/include/net/ndisc.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/net/ndisc.h 2006-02-27 22:37:47.000000000 -0800 @@ -22,6 +22,8 @@ enum { ND_OPT_PREFIX_INFO = 3, /* RFC2461 */ ND_OPT_REDIRECT_HDR = 4, /* RFC2461 */ ND_OPT_MTU = 5, /* RFC2461 */ + __ND_OPT_ARRAY_MAX, + ND_OPT_ROUTE_INFO = 24, /* RFC4191 */ __ND_OPT_MAX }; diff -puN include/net/netfilter/nf_conntrack.h~git-net include/net/netfilter/nf_conntrack.h --- devel/include/net/netfilter/nf_conntrack.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/net/netfilter/nf_conntrack.h 2006-02-27 22:37:47.000000000 -0800 @@ -67,6 +67,18 @@ do { \ struct nf_conntrack_helper; +/* nf_conn feature for connections that have a helper */ +struct nf_conn_help { + /* Helper. if any */ + struct nf_conntrack_helper *helper; + + union nf_conntrack_help help; + + /* Current number of expected connections */ + unsigned int expecting; +}; + + #include struct nf_conn { @@ -81,6 +93,9 @@ struct nf_conn /* Have we seen traffic both ways yet? (bitset) */ unsigned long status; + /* If we were expected by an expectation, this will be it */ + struct nf_conn *master; + /* Timer function; drops refcnt when it goes off. */ struct timer_list timeout; @@ -88,38 +103,22 @@ struct nf_conn /* Accounting Information (same cache line as other written members) */ struct ip_conntrack_counter counters[IP_CT_DIR_MAX]; #endif - /* If we were expected by an expectation, this will be it */ - struct nf_conn *master; - - /* Current number of expected connections */ - unsigned int expecting; /* Unique ID that identifies this conntrack*/ unsigned int id; - /* Helper. if any */ - struct nf_conntrack_helper *helper; - /* features - nat, helper, ... used by allocating system */ u_int32_t features; - /* Storage reserved for other modules: */ - - union nf_conntrack_proto proto; - #if defined(CONFIG_NF_CONNTRACK_MARK) u_int32_t mark; #endif - /* These members are dynamically allocated. */ - - union nf_conntrack_help *help; + /* Storage reserved for other modules: */ + union nf_conntrack_proto proto; - /* Layer 3 dependent members. (ex: NAT) */ - union { - struct nf_conntrack_ipv4 *ipv4; - } l3proto; - void *data[0]; + /* features dynamically at the end: helper, nat (both optional) */ + char data[0]; }; struct nf_conntrack_expect @@ -373,10 +372,23 @@ nf_conntrack_expect_event(enum ip_conntr #define NF_CT_F_NUM 4 extern int -nf_conntrack_register_cache(u_int32_t features, const char *name, size_t size, - int (*init_conntrack)(struct nf_conn *, u_int32_t)); +nf_conntrack_register_cache(u_int32_t features, const char *name, size_t size); extern void nf_conntrack_unregister_cache(u_int32_t features); +/* valid combinations: + * basic: nf_conn, nf_conn .. nf_conn_help + * nat: nf_conn .. nf_conn_nat, nf_conn .. nf_conn_nat, nf_conn help + */ +static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct) +{ + unsigned int offset = sizeof(struct nf_conn); + + if (!(ct->features & NF_CT_F_HELP)) + return NULL; + + return (struct nf_conn_help *) ((void *)ct + offset); +} + #endif /* __KERNEL__ */ #endif /* _NF_CONNTRACK_H */ diff -puN include/net/scm.h~git-net include/net/scm.h --- devel/include/net/scm.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/net/scm.h 2006-02-27 22:37:47.000000000 -0800 @@ -37,10 +37,14 @@ static __inline__ void scm_destroy(struc static __inline__ int scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm) { - memset(scm, 0, sizeof(*scm)); - scm->creds.uid = current->uid; - scm->creds.gid = current->gid; - scm->creds.pid = current->tgid; + struct task_struct *p = current; + scm->creds = (struct ucred) { + .uid = p->uid, + .gid = p->gid, + .pid = p->tgid + }; + scm->fp = NULL; + scm->seq = 0; if (msg->msg_controllen <= 0) return 0; return __scm_send(sock, msg, scm); diff -puN include/net/tcp.h~git-net include/net/tcp.h --- devel/include/net/tcp.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/net/tcp.h 2006-02-27 22:37:47.000000000 -0800 @@ -60,6 +60,9 @@ extern void tcp_time_wait(struct sock *s /* Minimal RCV_MSS. */ #define TCP_MIN_RCVMSS 536U +/* The least MTU to use for probing */ +#define TCP_BASE_MSS 512 + /* After receiving this amount of duplicate ACKs fast retransmit starts. */ #define TCP_FASTRETRANS_THRESH 3 @@ -219,6 +222,8 @@ extern int sysctl_tcp_nometrics_save; extern int sysctl_tcp_moderate_rcvbuf; extern int sysctl_tcp_tso_win_divisor; extern int sysctl_tcp_abc; +extern int sysctl_tcp_mtu_probing; +extern int sysctl_tcp_base_mss; extern atomic_t tcp_memory_allocated; extern atomic_t tcp_sockets_allocated; @@ -447,6 +452,10 @@ extern int tcp_read_sock(struct sock *sk extern void tcp_initialize_rcv_mss(struct sock *sk); +extern int tcp_mtu_to_mss(struct sock *sk, int pmtu); +extern int tcp_mss_to_mtu(struct sock *sk, int mss); +extern void tcp_mtup_init(struct sock *sk); + static inline void __tcp_fast_path_on(struct tcp_sock *tp, u32 snd_wnd) { tp->pred_flags = htonl((tp->tcp_header_len << 26) | diff -puN include/net/xfrm.h~git-net include/net/xfrm.h --- devel/include/net/xfrm.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/include/net/xfrm.h 2006-02-27 22:37:47.000000000 -0800 @@ -20,6 +20,10 @@ #define XFRM_ALIGN8(len) (((len) + 7) & ~7) +extern struct sock *xfrm_nl; +extern u32 sysctl_xfrm_aevent_etime; +extern u32 sysctl_xfrm_aevent_rseqth; + extern struct semaphore xfrm_cfg_sem; /* Organization of SPD aka "XFRM rules" @@ -135,6 +139,16 @@ struct xfrm_state /* State for replay detection */ struct xfrm_replay_state replay; + /* Replay detection state at the time we sent the last notification */ + struct xfrm_replay_state preplay; + + /* Replay detection notification settings */ + u32 replay_maxage; + u32 replay_maxdiff; + + /* Replay detection notification timer */ + struct timer_list rtimer; + /* Statistics */ struct xfrm_stats stats; @@ -169,6 +183,7 @@ struct km_event u32 hard; u32 proto; u32 byid; + u32 aevent; } data; u32 seq; @@ -199,10 +214,13 @@ extern int xfrm_policy_register_afinfo(s extern int xfrm_policy_unregister_afinfo(struct xfrm_policy_afinfo *afinfo); extern void km_policy_notify(struct xfrm_policy *xp, int dir, struct km_event *c); extern void km_state_notify(struct xfrm_state *x, struct km_event *c); - #define XFRM_ACQ_EXPIRES 30 struct xfrm_tmpl; +extern int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol); +extern void km_state_expired(struct xfrm_state *x, int hard, u32 pid); +extern int __xfrm_state_delete(struct xfrm_state *x); + struct xfrm_state_afinfo { unsigned short family; rwlock_t lock; @@ -305,7 +323,21 @@ struct xfrm_policy struct xfrm_tmpl xfrm_vec[XFRM_MAX_DEPTH]; }; -#define XFRM_KM_TIMEOUT 30 +#define XFRM_KM_TIMEOUT 30 +/* which seqno */ +#define XFRM_REPLAY_SEQ 1 +#define XFRM_REPLAY_OSEQ 2 +#define XFRM_REPLAY_SEQ_MASK 3 +/* what happened */ +#define XFRM_REPLAY_UPDATE XFRM_AE_CR +#define XFRM_REPLAY_TIMEOUT XFRM_AE_CE + +/* default aevent timeout in units of 100ms */ +#define XFRM_AE_ETIME 10 +/* Async Event timer multiplier */ +#define XFRM_AE_ETH_M 10 +/* default seq threshold size */ +#define XFRM_AE_SEQT_SIZE 2 struct xfrm_mgr { @@ -865,6 +897,7 @@ extern int xfrm_state_delete(struct xfrm extern void xfrm_state_flush(u8 proto); extern int xfrm_replay_check(struct xfrm_state *x, u32 seq); extern void xfrm_replay_advance(struct xfrm_state *x, u32 seq); +extern void xfrm_replay_notify(struct xfrm_state *x, int event); extern int xfrm_state_check(struct xfrm_state *x, struct sk_buff *skb); extern int xfrm_state_mtu(struct xfrm_state *x, int mtu); extern int xfrm_init_state(struct xfrm_state *x); @@ -924,7 +957,7 @@ extern void xfrm_init_pmtu(struct dst_en extern wait_queue_head_t km_waitq; extern int km_new_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr, u16 sport); -extern void km_policy_expired(struct xfrm_policy *pol, int dir, int hard); +extern void km_policy_expired(struct xfrm_policy *pol, int dir, int hard, u32 pid); extern void xfrm_input_init(void); extern int xfrm_parse_spi(struct sk_buff *skb, u8 nexthdr, u32 *spi, u32 *seq); @@ -965,4 +998,16 @@ static inline int xfrm_policy_id2dir(u32 return index & 7; } +static inline int xfrm_aevent_is_on(void) +{ + return netlink_has_listeners(xfrm_nl,XFRMNLGRP_AEVENTS); +} + +static inline void xfrm_aevent_doreplay(struct xfrm_state *x) +{ + if (xfrm_aevent_is_on()) + xfrm_replay_notify(x, XFRM_REPLAY_UPDATE); +} + + #endif /* _NET_XFRM_H */ diff -puN net/8021q/vlan.c~git-net net/8021q/vlan.c --- devel/net/8021q/vlan.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/8021q/vlan.c 2006-02-27 22:37:47.000000000 -0800 @@ -69,7 +69,7 @@ static struct packet_type vlan_packet_ty /* Bits of netdev state that are propagated from real device to virtual */ #define VLAN_LINK_STATE_MASK \ - ((1<<__LINK_STATE_PRESENT)|(1<<__LINK_STATE_NOCARRIER)) + ((1<<__LINK_STATE_PRESENT)|(1<<__LINK_STATE_NOCARRIER)|(1<<__LINK_STATE_DORMANT)) /* End of global variables definitions. */ @@ -344,6 +344,26 @@ static void vlan_setup(struct net_device new_dev->do_ioctl = vlan_dev_ioctl; } +static void vlan_transfer_operstate(const struct net_device *dev, struct net_device *vlandev) +{ + /* Have to respect userspace enforced dormant state + * of real device, also must allow supplicant running + * on VLAN device + */ + if (dev->operstate == IF_OPER_DORMANT) + netif_dormant_on(vlandev); + else + netif_dormant_off(vlandev); + + if (netif_carrier_ok(dev)) { + if (!netif_carrier_ok(vlandev)) + netif_carrier_on(vlandev); + } else { + if (netif_carrier_ok(vlandev)) + netif_carrier_off(vlandev); + } +} + /* Attach a VLAN device to a mac address (ie Ethernet Card). * Returns the device that was created, or NULL if there was * an error of some kind. @@ -450,7 +470,7 @@ static struct net_device *register_vlan_ new_dev->flags = real_dev->flags; new_dev->flags &= ~IFF_UP; - new_dev->state = real_dev->state & VLAN_LINK_STATE_MASK; + new_dev->state = real_dev->state & ~(1<<__LINK_STATE_START); /* need 4 bytes for extra VLAN header info, * hope the underlying device can handle it. @@ -498,6 +518,10 @@ static struct net_device *register_vlan_ if (register_netdevice(new_dev)) goto out_free_newdev; + new_dev->iflink = real_dev->ifindex; + vlan_transfer_operstate(real_dev, new_dev); + linkwatch_fire_event(new_dev); /* _MUST_ call rfc2863_policy() */ + /* So, got the sucker initialized, now lets place * it into our local structure. */ @@ -573,25 +597,12 @@ static int vlan_device_event(struct noti switch (event) { case NETDEV_CHANGE: /* Propagate real device state to vlan devices */ - flgs = dev->state & VLAN_LINK_STATE_MASK; for (i = 0; i < VLAN_GROUP_ARRAY_LEN; i++) { vlandev = grp->vlan_devices[i]; if (!vlandev) continue; - if (netif_carrier_ok(dev)) { - if (!netif_carrier_ok(vlandev)) - netif_carrier_on(vlandev); - } else { - if (netif_carrier_ok(vlandev)) - netif_carrier_off(vlandev); - } - - if ((vlandev->state & VLAN_LINK_STATE_MASK) != flgs) { - vlandev->state = (vlandev->state &~ VLAN_LINK_STATE_MASK) - | flgs; - netdev_state_change(vlandev); - } + vlan_transfer_operstate(dev, vlandev); } break; diff -puN net/core/dev.c~git-net net/core/dev.c --- devel/net/core/dev.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/core/dev.c 2006-02-27 22:37:47.000000000 -0800 @@ -2150,12 +2150,20 @@ unsigned dev_get_flags(const struct net_ flags = (dev->flags & ~(IFF_PROMISC | IFF_ALLMULTI | - IFF_RUNNING)) | + IFF_RUNNING | + IFF_LOWER_UP | + IFF_DORMANT)) | (dev->gflags & (IFF_PROMISC | IFF_ALLMULTI)); - if (netif_running(dev) && netif_carrier_ok(dev)) - flags |= IFF_RUNNING; + if (netif_running(dev)) { + if (netif_oper_up(dev)) + flags |= IFF_RUNNING; + if (netif_carrier_ok(dev)) + flags |= IFF_LOWER_UP; + if (netif_dormant(dev)) + flags |= IFF_DORMANT; + } return flags; } diff -puN net/core/link_watch.c~git-net net/core/link_watch.c --- devel/net/core/link_watch.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/core/link_watch.c 2006-02-27 22:37:47.000000000 -0800 @@ -49,6 +49,45 @@ struct lw_event { /* Avoid kmalloc() for most systems */ static struct lw_event singleevent; +static unsigned char default_operstate(const struct net_device *dev) +{ + if (!netif_carrier_ok(dev)) + return (dev->ifindex != dev->iflink ? + IF_OPER_LOWERLAYERDOWN : IF_OPER_DOWN); + + if (netif_dormant(dev)) + return IF_OPER_DORMANT; + + return IF_OPER_UP; +} + + +static void rfc2863_policy(struct net_device *dev) +{ + unsigned char operstate = default_operstate(dev); + + if (operstate == dev->operstate) + return; + + write_lock_bh(&dev_base_lock); + + switch(dev->link_mode) { + case IF_LINK_MODE_DORMANT: + if (operstate == IF_OPER_UP) + operstate = IF_OPER_DORMANT; + break; + + case IF_LINK_MODE_DEFAULT: + default: + break; + }; + + dev->operstate = operstate; + + write_unlock_bh(&dev_base_lock); +} + + /* Must be called with the rtnl semaphore held */ void linkwatch_run_queue(void) { @@ -74,6 +113,7 @@ void linkwatch_run_queue(void) */ clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state); + rfc2863_policy(dev); if (dev->flags & IFF_UP) { if (netif_carrier_ok(dev)) { WARN_ON(dev->qdisc_sleeping == &noop_qdisc); diff -puN net/core/neighbour.c~git-net net/core/neighbour.c --- devel/net/core/neighbour.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/core/neighbour.c 2006-02-27 22:37:47.000000000 -0800 @@ -750,11 +750,13 @@ static void neigh_timer_handler(unsigned neigh->used + neigh->parms->delay_probe_time)) { NEIGH_PRINTK2("neigh %p is delayed.\n", neigh); neigh->nud_state = NUD_DELAY; + neigh->updated = jiffies; neigh_suspect(neigh); next = now + neigh->parms->delay_probe_time; } else { NEIGH_PRINTK2("neigh %p is suspected.\n", neigh); neigh->nud_state = NUD_STALE; + neigh->updated = jiffies; neigh_suspect(neigh); } } else if (state & NUD_DELAY) { @@ -762,11 +764,13 @@ static void neigh_timer_handler(unsigned neigh->confirmed + neigh->parms->delay_probe_time)) { NEIGH_PRINTK2("neigh %p is now reachable.\n", neigh); neigh->nud_state = NUD_REACHABLE; + neigh->updated = jiffies; neigh_connect(neigh); next = neigh->confirmed + neigh->parms->reachable_time; } else { NEIGH_PRINTK2("neigh %p is probed.\n", neigh); neigh->nud_state = NUD_PROBE; + neigh->updated = jiffies; atomic_set(&neigh->probes, 0); next = now + neigh->parms->retrans_time; } @@ -780,6 +784,7 @@ static void neigh_timer_handler(unsigned struct sk_buff *skb; neigh->nud_state = NUD_FAILED; + neigh->updated = jiffies; notify = 1; NEIGH_CACHE_STAT_INC(neigh->tbl, res_failed); NEIGH_PRINTK2("neigh %p is failed.\n", neigh); @@ -843,10 +848,12 @@ int __neigh_event_send(struct neighbour if (neigh->parms->mcast_probes + neigh->parms->app_probes) { atomic_set(&neigh->probes, neigh->parms->ucast_probes); neigh->nud_state = NUD_INCOMPLETE; + neigh->updated = jiffies; neigh_hold(neigh); neigh_add_timer(neigh, now + 1); } else { neigh->nud_state = NUD_FAILED; + neigh->updated = jiffies; write_unlock_bh(&neigh->lock); if (skb) @@ -857,6 +864,7 @@ int __neigh_event_send(struct neighbour NEIGH_PRINTK2("neigh %p is delayed.\n", neigh); neigh_hold(neigh); neigh->nud_state = NUD_DELAY; + neigh->updated = jiffies; neigh_add_timer(neigh, jiffies + neigh->parms->delay_probe_time); } diff -puN net/core/net-sysfs.c~git-net net/core/net-sysfs.c --- devel/net/core/net-sysfs.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/core/net-sysfs.c 2006-02-27 22:37:47.000000000 -0800 @@ -91,6 +91,7 @@ NETDEVICE_SHOW(iflink, fmt_dec); NETDEVICE_SHOW(ifindex, fmt_dec); NETDEVICE_SHOW(features, fmt_long_hex); NETDEVICE_SHOW(type, fmt_dec); +NETDEVICE_SHOW(link_mode, fmt_dec); /* use same locking rules as GIFHWADDR ioctl's */ static ssize_t format_addr(char *buf, const unsigned char *addr, int len) @@ -133,6 +134,43 @@ static ssize_t show_carrier(struct class return -EINVAL; } +static ssize_t show_dormant(struct class_device *dev, char *buf) +{ + struct net_device *netdev = to_net_dev(dev); + + if (netif_running(netdev)) + return sprintf(buf, fmt_dec, !!netif_dormant(netdev)); + + return -EINVAL; +} + +static const char *operstates[] = { + "unknown", + "notpresent", /* currently unused */ + "down", + "lowerlayerdown", + "testing", /* currently unused */ + "dormant", + "up" +}; + +static ssize_t show_operstate(struct class_device *dev, char *buf) +{ + const struct net_device *netdev = to_net_dev(dev); + unsigned char operstate; + + read_lock(&dev_base_lock); + operstate = netdev->operstate; + if (!netif_running(netdev)) + operstate = IF_OPER_DOWN; + read_unlock(&dev_base_lock); + + if (operstate >= sizeof(operstates)) + return -EINVAL; /* should not happen */ + + return sprintf(buf, "%s\n", operstates[operstate]); +} + /* read-write attributes */ NETDEVICE_SHOW(mtu, fmt_dec); @@ -190,9 +228,12 @@ static struct class_device_attribute net __ATTR(ifindex, S_IRUGO, show_ifindex, NULL), __ATTR(features, S_IRUGO, show_features, NULL), __ATTR(type, S_IRUGO, show_type, NULL), + __ATTR(link_mode, S_IRUGO, show_link_mode, NULL), __ATTR(address, S_IRUGO, show_address, NULL), __ATTR(broadcast, S_IRUGO, show_broadcast, NULL), __ATTR(carrier, S_IRUGO, show_carrier, NULL), + __ATTR(dormant, S_IRUGO, show_dormant, NULL), + __ATTR(operstate, S_IRUGO, show_operstate, NULL), __ATTR(mtu, S_IRUGO | S_IWUSR, show_mtu, store_mtu), __ATTR(flags, S_IRUGO | S_IWUSR, show_flags, store_flags), __ATTR(tx_queue_len, S_IRUGO | S_IWUSR, show_tx_queue_len, diff -puN net/core/pktgen.c~git-net net/core/pktgen.c --- devel/net/core/pktgen.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/core/pktgen.c 2006-02-27 22:37:47.000000000 -0800 @@ -153,7 +153,7 @@ #include -#define VERSION "pktgen v2.63: Packet Generator for packet performance testing.\n" +#define VERSION "pktgen v2.64: Packet Generator for packet performance testing.\n" /* #define PG_DEBUG(a) a */ #define PG_DEBUG(a) @@ -176,7 +176,8 @@ #define T_TERMINATE (1<<0) #define T_STOP (1<<1) /* Stop run */ #define T_RUN (1<<2) /* Start run */ -#define T_REMDEV (1<<3) /* Remove all devs */ +#define T_REMDEVALL (1<<3) /* Remove all devs */ +#define T_REMDEV (1<<4) /* Remove one dev */ /* Locks */ #define thread_lock() down(&pktgen_sem) @@ -218,6 +219,8 @@ struct pktgen_dev { * we will do a random selection from within the range. */ __u32 flags; + int removal_mark; /* non-zero => the device is marked for + * removal by worker thread */ int min_pkt_size; /* = ETH_ZLEN; */ int max_pkt_size; /* = ETH_ZLEN; */ @@ -481,7 +484,7 @@ static void pktgen_stop_all_threads_ifs( static int pktgen_stop_device(struct pktgen_dev *pkt_dev); static void pktgen_stop(struct pktgen_thread* t); static void pktgen_clear_counters(struct pktgen_dev *pkt_dev); -static struct pktgen_dev *pktgen_NN_threads(const char* dev_name, int remove); +static int pktgen_mark_device(const char* ifname); static unsigned int scan_ip6(const char *s,char ip[16]); static unsigned int fmt_ip6(char *s,const char ip[16]); @@ -1406,7 +1409,7 @@ static ssize_t pktgen_thread_write(struc if (!strcmp(name, "rem_device_all")) { thread_lock(); - t->control |= T_REMDEV; + t->control |= T_REMDEVALL; thread_unlock(); schedule_timeout_interruptible(msecs_to_jiffies(125)); /* Propagate thread->control */ ret = count; @@ -1457,7 +1460,8 @@ static struct pktgen_dev *__pktgen_NN_th if (pkt_dev) { if(remove) { if_lock(t); - pktgen_remove_device(t, pkt_dev); + pkt_dev->removal_mark = 1; + t->control |= T_REMDEV; if_unlock(t); } break; @@ -1467,13 +1471,44 @@ static struct pktgen_dev *__pktgen_NN_th return pkt_dev; } -static struct pktgen_dev *pktgen_NN_threads(const char* ifname, int remove) +/* + * mark a device for removal + */ +static int pktgen_mark_device(const char* ifname) { struct pktgen_dev *pkt_dev = NULL; + const int max_tries = 10, msec_per_try = 125; + int i = 0; + int ret = 0; + thread_lock(); - pkt_dev = __pktgen_NN_threads(ifname, remove); - thread_unlock(); - return pkt_dev; + PG_DEBUG(printk("pktgen: pktgen_mark_device marking %s for removal\n", + ifname)); + + while(1) { + + pkt_dev = __pktgen_NN_threads(ifname, REMOVE); + if (pkt_dev == NULL) break; /* success */ + + thread_unlock(); + PG_DEBUG(printk("pktgen: pktgen_mark_device waiting for %s " + "to disappear....\n", ifname)); + schedule_timeout_interruptible(msecs_to_jiffies(msec_per_try)); + thread_lock(); + + if (++i >= max_tries) { + printk("pktgen_mark_device: timed out after waiting " + "%d msec for device %s to be removed\n", + msec_per_try*i, ifname); + ret = 1; + break; + } + + } + + thread_unlock(); + + return ret; } static int pktgen_device_event(struct notifier_block *unused, unsigned long event, void *ptr) @@ -1493,7 +1528,7 @@ static int pktgen_device_event(struct no break; case NETDEV_UNREGISTER: - pktgen_NN_threads(dev->name, REMOVE); + pktgen_mark_device(dev->name); break; }; @@ -2303,11 +2338,11 @@ static void pktgen_stop_all_threads_ifs( { struct pktgen_thread *t = pktgen_threads; - PG_DEBUG(printk("pktgen: entering pktgen_stop_all_threads.\n")); + PG_DEBUG(printk("pktgen: entering pktgen_stop_all_threads_ifs.\n")); thread_lock(); while(t) { - pktgen_stop(t); + t->control |= T_STOP; t = t->next; } thread_unlock(); @@ -2431,7 +2466,9 @@ static void show_results(struct pktgen_d static int pktgen_stop_device(struct pktgen_dev *pkt_dev) { - + int nr_frags = pkt_dev->skb ? + skb_shinfo(pkt_dev->skb)->nr_frags: -1; + if (!pkt_dev->running) { printk("pktgen: interface: %s is already stopped\n", pkt_dev->ifname); return -EINVAL; @@ -2440,13 +2477,8 @@ static int pktgen_stop_device(struct pkt pkt_dev->stopped_at = getCurUs(); pkt_dev->running = 0; - show_results(pkt_dev, skb_shinfo(pkt_dev->skb)->nr_frags); - - if (pkt_dev->skb) - kfree_skb(pkt_dev->skb); + show_results(pkt_dev, nr_frags); - pkt_dev->skb = NULL; - return 0; } @@ -2469,26 +2501,66 @@ static struct pktgen_dev *next_to_run(st static void pktgen_stop(struct pktgen_thread *t) { struct pktgen_dev *next = NULL; - PG_DEBUG(printk("pktgen: entering pktgen_stop.\n")); + PG_DEBUG(printk("pktgen: entering pktgen_stop\n")); if_lock(t); - for(next=t->if_list; next; next=next->next) + for(next=t->if_list; next; next=next->next) { pktgen_stop_device(next); + if (next->skb) + kfree_skb(next->skb); + + next->skb = NULL; + } if_unlock(t); } +/* + * one of our devices needs to be removed - find it + * and remove it + */ +static void pktgen_rem_one_if(struct pktgen_thread *t) +{ + struct pktgen_dev *cur, *next = NULL; + + PG_DEBUG(printk("pktgen: entering pktgen_rem_one_if\n")); + + if_lock(t); + + for(cur=t->if_list; cur; cur=next) { + next = cur->next; + + if (!cur->removal_mark) continue; + + if (cur->skb) + kfree_skb(cur->skb); + cur->skb = NULL; + + pktgen_remove_device(t, cur); + + break; + } + + if_unlock(t); +} + static void pktgen_rem_all_ifs(struct pktgen_thread *t) { struct pktgen_dev *cur, *next = NULL; - - /* Remove all devices, free mem */ + /* Remove all devices, free mem */ + + PG_DEBUG(printk("pktgen: entering pktgen_rem_all_ifs\n")); if_lock(t); for(cur=t->if_list; cur; cur=next) { next = cur->next; + + if (cur->skb) + kfree_skb(cur->skb); + cur->skb = NULL; + pktgen_remove_device(t, cur); } @@ -2550,6 +2622,9 @@ static __inline__ void pktgen_xmit(struc if (!netif_running(odev)) { pktgen_stop_device(pkt_dev); + if (pkt_dev->skb) + kfree_skb(pkt_dev->skb); + pkt_dev->skb = NULL; goto out; } if (need_resched()) @@ -2581,7 +2656,7 @@ static __inline__ void pktgen_xmit(struc pkt_dev->clone_count = 0; /* reset counter */ } } - + spin_lock_bh(&odev->xmit_lock); if (!netif_queue_stopped(odev)) { @@ -2644,6 +2719,9 @@ retry_now: /* Done with this */ pktgen_stop_device(pkt_dev); + if (pkt_dev->skb) + kfree_skb(pkt_dev->skb); + pkt_dev->skb = NULL; } out:; } @@ -2685,6 +2763,7 @@ static void pktgen_thread_worker(struct t->control &= ~(T_TERMINATE); t->control &= ~(T_RUN); t->control &= ~(T_STOP); + t->control &= ~(T_REMDEVALL); t->control &= ~(T_REMDEV); t->pid = current->pid; @@ -2748,8 +2827,13 @@ static void pktgen_thread_worker(struct t->control &= ~(T_RUN); } - if(t->control & T_REMDEV) { + if(t->control & T_REMDEVALL) { pktgen_rem_all_ifs(t); + t->control &= ~(T_REMDEVALL); + } + + if(t->control & T_REMDEV) { + pktgen_rem_one_if(t); t->control &= ~(T_REMDEV); } @@ -2833,6 +2917,7 @@ static int pktgen_add_device(struct pktg } memset(pkt_dev->flows, 0, MAX_CFLOWS*sizeof(struct flow_state)); + pkt_dev->removal_mark = 0; pkt_dev->min_pkt_size = ETH_ZLEN; pkt_dev->max_pkt_size = ETH_ZLEN; pkt_dev->nfrags = 0; diff -puN net/core/rtnetlink.c~git-net net/core/rtnetlink.c --- devel/net/core/rtnetlink.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/core/rtnetlink.c 2006-02-27 22:37:47.000000000 -0800 @@ -179,6 +179,33 @@ rtattr_failure: } +static void set_operstate(struct net_device *dev, unsigned char transition) +{ + unsigned char operstate = dev->operstate; + + switch(transition) { + case IF_OPER_UP: + if ((operstate == IF_OPER_DORMANT || + operstate == IF_OPER_UNKNOWN) && + !netif_dormant(dev)) + operstate = IF_OPER_UP; + break; + + case IF_OPER_DORMANT: + if (operstate == IF_OPER_UP || + operstate == IF_OPER_UNKNOWN) + operstate = IF_OPER_DORMANT; + break; + }; + + if (dev->operstate != operstate) { + write_lock_bh(&dev_base_lock); + dev->operstate = operstate; + write_unlock_bh(&dev_base_lock); + netdev_state_change(dev); + } +} + static int rtnetlink_fill_ifinfo(struct sk_buff *skb, struct net_device *dev, int type, u32 pid, u32 seq, u32 change, unsigned int flags) @@ -209,6 +236,13 @@ static int rtnetlink_fill_ifinfo(struct } if (1) { + u8 operstate = netif_running(dev)?dev->operstate:IF_OPER_DOWN; + u8 link_mode = dev->link_mode; + RTA_PUT(skb, IFLA_OPERSTATE, sizeof(operstate), &operstate); + RTA_PUT(skb, IFLA_LINKMODE, sizeof(link_mode), &link_mode); + } + + if (1) { struct rtnl_link_ifmap map = { .mem_start = dev->mem_start, .mem_end = dev->mem_end, @@ -399,6 +433,22 @@ static int do_setlink(struct sk_buff *sk dev->weight = *((u32 *) RTA_DATA(ida[IFLA_WEIGHT - 1])); } + if (ida[IFLA_OPERSTATE - 1]) { + if (ida[IFLA_OPERSTATE - 1]->rta_len != RTA_LENGTH(sizeof(u8))) + goto out; + + set_operstate(dev, *((u8 *) RTA_DATA(ida[IFLA_OPERSTATE - 1]))); + } + + if (ida[IFLA_LINKMODE - 1]) { + if (ida[IFLA_LINKMODE - 1]->rta_len != RTA_LENGTH(sizeof(u8))) + goto out; + + write_lock_bh(&dev_base_lock); + dev->link_mode = *((u8 *) RTA_DATA(ida[IFLA_LINKMODE - 1])); + write_unlock_bh(&dev_base_lock); + } + if (ifm->ifi_index >= 0 && ida[IFLA_IFNAME - 1]) { char ifname[IFNAMSIZ]; diff -puN net/core/skbuff.c~git-net net/core/skbuff.c --- devel/net/core/skbuff.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/core/skbuff.c 2006-02-27 22:37:47.000000000 -0800 @@ -356,6 +356,24 @@ void __kfree_skb(struct sk_buff *skb) } /** + * kfree_skb - free an sk_buff + * @skb: buffer to free + * + * Drop a reference to the buffer and free it if the usage count has + * hit zero. + */ +void kfree_skb(struct sk_buff *skb) +{ + if (unlikely(!skb)) + return; + if (likely(atomic_read(&skb->users) == 1)) + smp_rmb(); + else if (likely(!atomic_dec_and_test(&skb->users))) + return; + __kfree_skb(skb); +} + +/** * skb_clone - duplicate an sk_buff * @skb: buffer to clone * @gfp_mask: allocation priority @@ -1799,6 +1817,7 @@ void __init skb_init(void) EXPORT_SYMBOL(___pskb_trim); EXPORT_SYMBOL(__kfree_skb); +EXPORT_SYMBOL(kfree_skb); EXPORT_SYMBOL(__pskb_pull_tail); EXPORT_SYMBOL(__alloc_skb); EXPORT_SYMBOL(pskb_copy); diff -puN net/core/sysctl_net_core.c~git-net net/core/sysctl_net_core.c --- devel/net/core/sysctl_net_core.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/core/sysctl_net_core.c 2006-02-27 22:37:47.000000000 -0800 @@ -26,6 +26,11 @@ extern int sysctl_core_destroy_delay; extern char sysctl_divert_version[]; #endif /* CONFIG_NET_DIVERT */ +#ifdef CONFIG_XFRM +extern u32 sysctl_xfrm_aevent_etime; +extern u32 sysctl_xfrm_aevent_rseqth; +#endif + ctl_table core_table[] = { #ifdef CONFIG_NET { @@ -111,6 +116,24 @@ ctl_table core_table[] = { .proc_handler = &proc_dostring }, #endif /* CONFIG_NET_DIVERT */ +#ifdef CONFIG_XFRM + { + .ctl_name = NET_CORE_AEVENT_ETIME, + .procname = "xfrm_aevent_etime", + .data = &sysctl_xfrm_aevent_etime, + .maxlen = sizeof(u32), + .mode = 0644, + .proc_handler = &proc_dointvec + }, + { + .ctl_name = NET_CORE_AEVENT_RSEQTH, + .procname = "xfrm_aevent_rseqth", + .data = &sysctl_xfrm_aevent_rseqth, + .maxlen = sizeof(u32), + .mode = 0644, + .proc_handler = &proc_dointvec + }, +#endif /* CONFIG_XFRM */ #endif /* CONFIG_NET */ { .ctl_name = NET_CORE_SOMAXCONN, diff -puN net/dccp/ackvec.c~git-net net/dccp/ackvec.c --- devel/net/dccp/ackvec.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/ackvec.c 2006-02-27 22:37:47.000000000 -0800 @@ -13,18 +13,77 @@ #include "dccp.h" #include +#include +#include +#include #include +#include #include +static kmem_cache_t *dccp_ackvec_slab; +static kmem_cache_t *dccp_ackvec_record_slab; + +static struct dccp_ackvec_record *dccp_ackvec_record_new(void) +{ + struct dccp_ackvec_record *avr = + kmem_cache_alloc(dccp_ackvec_record_slab, GFP_ATOMIC); + + if (avr != NULL) + INIT_LIST_HEAD(&avr->dccpavr_node); + + return avr; +} + +static void dccp_ackvec_record_delete(struct dccp_ackvec_record *avr) +{ + if (unlikely(avr == NULL)) + return; + /* Check if deleting a linked record */ + WARN_ON(!list_empty(&avr->dccpavr_node)); + kmem_cache_free(dccp_ackvec_record_slab, avr); +} + +static void dccp_ackvec_insert_avr(struct dccp_ackvec *av, + struct dccp_ackvec_record *avr) +{ + /* + * AVRs are sorted by seqno. Since we are sending them in order, we + * just add the AVR at the head of the list. + * -sorbo. + */ + if (!list_empty(&av->dccpav_records)) { + const struct dccp_ackvec_record *head = + list_entry(av->dccpav_records.next, + struct dccp_ackvec_record, + dccpavr_node); + BUG_ON(before48(avr->dccpavr_ack_seqno, + head->dccpavr_ack_seqno)); + } + + list_add(&avr->dccpavr_node, &av->dccpav_records); +} + int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb) { struct dccp_sock *dp = dccp_sk(sk); +#ifdef CONFIG_IP_DCCP_DEBUG + const char *debug_prefix = dp->dccps_role == DCCP_ROLE_CLIENT ? + "CLIENT tx: " : "server tx: "; +#endif struct dccp_ackvec *av = dp->dccps_hc_rx_ackvec; int len = av->dccpav_vec_len + 2; struct timeval now; u32 elapsed_time; unsigned char *to, *from; + struct dccp_ackvec_record *avr; + + if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) + return -1; + + avr = dccp_ackvec_record_new(); + if (avr == NULL) + return -1; dccp_timestamp(sk, &now); elapsed_time = timeval_delta(&now, &av->dccpav_time) / 10; @@ -32,19 +91,6 @@ int dccp_insert_option_ackvec(struct soc if (elapsed_time != 0) dccp_insert_option_elapsed_time(sk, skb, elapsed_time); - if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) - return -1; - - /* - * XXX: now we have just one ack vector sent record, so - * we have to wait for it to be cleared. - * - * Of course this is not acceptable, but this is just for - * basic testing now. - */ - if (av->dccpav_ack_seqno != DCCP_MAX_SEQNO + 1) - return -1; - DCCP_SKB_CB(skb)->dccpd_opt_len += len; to = skb_push(skb, len); @@ -55,8 +101,8 @@ int dccp_insert_option_ackvec(struct soc from = av->dccpav_buf + av->dccpav_buf_head; /* Check if buf_head wraps */ - if ((int)av->dccpav_buf_head + len > av->dccpav_vec_len) { - const u32 tailsize = av->dccpav_vec_len - av->dccpav_buf_head; + if ((int)av->dccpav_buf_head + len > DCCP_MAX_ACKVEC_LEN) { + const u32 tailsize = DCCP_MAX_ACKVEC_LEN - av->dccpav_buf_head; memcpy(to, from, tailsize); to += tailsize; @@ -73,45 +119,37 @@ int dccp_insert_option_ackvec(struct soc * sequence number it used for the ack packet; ack_ptr will equal * buf_head; ack_ackno will equal buf_ackno; and ack_nonce will * equal buf_nonce. - * - * This implemention uses just one ack record for now. */ - av->dccpav_ack_seqno = DCCP_SKB_CB(skb)->dccpd_seq; - av->dccpav_ack_ptr = av->dccpav_buf_head; - av->dccpav_ack_ackno = av->dccpav_buf_ackno; - av->dccpav_ack_nonce = av->dccpav_buf_nonce; - av->dccpav_sent_len = av->dccpav_vec_len; + avr->dccpavr_ack_seqno = DCCP_SKB_CB(skb)->dccpd_seq; + avr->dccpavr_ack_ptr = av->dccpav_buf_head; + avr->dccpavr_ack_ackno = av->dccpav_buf_ackno; + avr->dccpavr_ack_nonce = av->dccpav_buf_nonce; + avr->dccpavr_sent_len = av->dccpav_vec_len; + + dccp_ackvec_insert_avr(av, avr); dccp_pr_debug("%sACK Vector 0, len=%d, ack_seqno=%llu, " "ack_ackno=%llu\n", - debug_prefix, av->dccpav_sent_len, - (unsigned long long)av->dccpav_ack_seqno, - (unsigned long long)av->dccpav_ack_ackno); - return -1; + debug_prefix, avr->dccpavr_sent_len, + (unsigned long long)avr->dccpavr_ack_seqno, + (unsigned long long)avr->dccpavr_ack_ackno); + return 0; } -struct dccp_ackvec *dccp_ackvec_alloc(const unsigned int len, - const gfp_t priority) +struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority) { - struct dccp_ackvec *av; - - BUG_ON(len == 0); + struct dccp_ackvec *av = kmem_cache_alloc(dccp_ackvec_slab, priority); - if (len > DCCP_MAX_ACKVEC_LEN) - return NULL; - - av = kmalloc(sizeof(*av) + len, priority); if (av != NULL) { - av->dccpav_buf_len = len; av->dccpav_buf_head = - av->dccpav_buf_tail = av->dccpav_buf_len - 1; - av->dccpav_buf_ackno = - av->dccpav_ack_ackno = av->dccpav_ack_seqno = ~0LLU; + av->dccpav_buf_tail = DCCP_MAX_ACKVEC_LEN - 1; + av->dccpav_buf_ackno = DCCP_MAX_SEQNO + 1; av->dccpav_buf_nonce = av->dccpav_buf_nonce = 0; av->dccpav_ack_ptr = 0; av->dccpav_time.tv_sec = 0; av->dccpav_time.tv_usec = 0; av->dccpav_sent_len = av->dccpav_vec_len = 0; + INIT_LIST_HEAD(&av->dccpav_records); } return av; @@ -119,7 +157,20 @@ struct dccp_ackvec *dccp_ackvec_alloc(co void dccp_ackvec_free(struct dccp_ackvec *av) { - kfree(av); + if (unlikely(av == NULL)) + return; + + if (!list_empty(&av->dccpav_records)) { + struct dccp_ackvec_record *avr, *next; + + list_for_each_entry_safe(avr, next, &av->dccpav_records, + dccpavr_node) { + list_del_init(&avr->dccpavr_node); + dccp_ackvec_record_delete(avr); + } + } + + kmem_cache_free(dccp_ackvec_slab, av); } static inline u8 dccp_ackvec_state(const struct dccp_ackvec *av, @@ -146,7 +197,7 @@ static inline int dccp_ackvec_set_buf_he unsigned int gap; long new_head; - if (av->dccpav_vec_len + packets > av->dccpav_buf_len) + if (av->dccpav_vec_len + packets > DCCP_MAX_ACKVEC_LEN) return -ENOBUFS; gap = packets - 1; @@ -158,7 +209,7 @@ static inline int dccp_ackvec_set_buf_he gap + new_head + 1); gap = -new_head; } - new_head += av->dccpav_buf_len; + new_head += DCCP_MAX_ACKVEC_LEN; } av->dccpav_buf_head = new_head; @@ -251,7 +302,7 @@ int dccp_ackvec_add(struct dccp_ackvec * goto out_duplicate; delta -= len + 1; - if (++index == av->dccpav_buf_len) + if (++index == DCCP_MAX_ACKVEC_LEN) index = 0; } } @@ -297,44 +348,50 @@ void dccp_ackvec_print(const struct dccp } #endif -static void dccp_ackvec_throw_away_ack_record(struct dccp_ackvec *av) +static void dccp_ackvec_throw_record(struct dccp_ackvec *av, + struct dccp_ackvec_record *avr) { - /* - * As we're keeping track of the ack vector size (dccpav_vec_len) and - * the sent ack vector size (dccpav_sent_len) we don't need - * dccpav_buf_tail at all, but keep this code here as in the future - * we'll implement a vector of ack records, as suggested in - * draft-ietf-dccp-spec-11.txt Appendix A. -acme - */ -#if 0 - u32 new_buf_tail = av->dccpav_ack_ptr + 1; - if (new_buf_tail >= av->dccpav_vec_len) - new_buf_tail -= av->dccpav_vec_len; - av->dccpav_buf_tail = new_buf_tail; -#endif - av->dccpav_vec_len -= av->dccpav_sent_len; + struct dccp_ackvec_record *next; + + av->dccpav_buf_tail = avr->dccpavr_ack_ptr - 1; + if (av->dccpav_buf_tail == 0) + av->dccpav_buf_tail = DCCP_MAX_ACKVEC_LEN - 1; + + av->dccpav_vec_len -= avr->dccpavr_sent_len; + + /* free records */ + list_for_each_entry_safe_from(avr, next, &av->dccpav_records, + dccpavr_node) { + list_del_init(&avr->dccpavr_node); + dccp_ackvec_record_delete(avr); + } } void dccp_ackvec_check_rcv_ackno(struct dccp_ackvec *av, struct sock *sk, const u64 ackno) { - /* Check if we actually sent an ACK vector */ - if (av->dccpav_ack_seqno == DCCP_MAX_SEQNO + 1) - return; + struct dccp_ackvec_record *avr; - if (ackno == av->dccpav_ack_seqno) { + /* + * If we traverse backwards, it should be faster when we have large + * windows. We will be receiving ACKs for stuff we sent a while back + * -sorbo. + */ + list_for_each_entry_reverse(avr, &av->dccpav_records, dccpavr_node) { + if (ackno == avr->dccpavr_ack_seqno) { #ifdef CONFIG_IP_DCCP_DEBUG - struct dccp_sock *dp = dccp_sk(sk); - const char *debug_prefix = dp->dccps_role == DCCP_ROLE_CLIENT ? - "CLIENT rx ack: " : "server rx ack: "; + struct dccp_sock *dp = dccp_sk(sk); + const char *debug_prefix = dp->dccps_role == DCCP_ROLE_CLIENT ? + "CLIENT rx ack: " : "server rx ack: "; #endif - dccp_pr_debug("%sACK packet 0, len=%d, ack_seqno=%llu, " - "ack_ackno=%llu, ACKED!\n", - debug_prefix, 1, - (unsigned long long)av->dccpav_ack_seqno, - (unsigned long long)av->dccpav_ack_ackno); - dccp_ackvec_throw_away_ack_record(av); - av->dccpav_ack_seqno = DCCP_MAX_SEQNO + 1; + dccp_pr_debug("%sACK packet 0, len=%d, ack_seqno=%llu, " + "ack_ackno=%llu, ACKED!\n", + debug_prefix, 1, + (unsigned long long)avr->dccpavr_ack_seqno, + (unsigned long long)avr->dccpavr_ack_ackno); + dccp_ackvec_throw_record(av, avr); + break; + } } } @@ -344,28 +401,20 @@ static void dccp_ackvec_check_rcv_ackvec const unsigned char *vector) { unsigned char i; + struct dccp_ackvec_record *avr; /* Check if we actually sent an ACK vector */ - if (av->dccpav_ack_seqno == DCCP_MAX_SEQNO + 1) + if (list_empty(&av->dccpav_records)) return; - /* - * We're in the receiver half connection, so if the received an ACK - * vector ackno (e.g. 50) before dccpav_ack_seqno (e.g. 52), we're - * not interested. - * - * Extra explanation with example: - * - * if we received an ACK vector with ackno 50, it can only be acking - * 50, 49, 48, etc, not 52 (the seqno for the ACK vector we sent). - */ - /* dccp_pr_debug("is %llu < %llu? ", ackno, av->dccpav_ack_seqno); */ - if (before48(ackno, av->dccpav_ack_seqno)) { - /* dccp_pr_debug_cat("yes\n"); */ - return; - } - /* dccp_pr_debug_cat("no\n"); */ i = len; + /* + * XXX + * I think it might be more efficient to work backwards. See comment on + * rcv_ackno. -sorbo. + */ + avr = list_entry(av->dccpav_records.next, struct dccp_ackvec_record, + dccpavr_node); while (i--) { const u8 rl = *vector & DCCP_ACKVEC_LEN_MASK; u64 ackno_end_rl; @@ -373,14 +422,20 @@ static void dccp_ackvec_check_rcv_ackvec dccp_set_seqno(&ackno_end_rl, ackno - rl); /* - * dccp_pr_debug("is %llu <= %llu <= %llu? ", ackno_end_rl, - * av->dccpav_ack_seqno, ackno); + * If our AVR sequence number is greater than the ack, go + * forward in the AVR list until it is not so. */ - if (between48(av->dccpav_ack_seqno, ackno_end_rl, ackno)) { + list_for_each_entry_from(avr, &av->dccpav_records, + dccpavr_node) { + if (!after48(avr->dccpavr_ack_seqno, ackno)) + goto found; + } + /* End of the dccpav_records list, not found, exit */ + break; +found: + if (between48(avr->dccpavr_ack_seqno, ackno_end_rl, ackno)) { const u8 state = (*vector & DCCP_ACKVEC_STATE_MASK) >> 6; - /* dccp_pr_debug_cat("yes\n"); */ - if (state != DCCP_ACKVEC_STATE_NOT_RECEIVED) { #ifdef CONFIG_IP_DCCP_DEBUG struct dccp_sock *dp = dccp_sk(sk); @@ -393,19 +448,16 @@ static void dccp_ackvec_check_rcv_ackvec "ACKED!\n", debug_prefix, len, (unsigned long long) - av->dccpav_ack_seqno, + avr->dccpavr_ack_seqno, (unsigned long long) - av->dccpav_ack_ackno); - dccp_ackvec_throw_away_ack_record(av); + avr->dccpavr_ack_ackno); + dccp_ackvec_throw_record(av, avr); } /* - * If dccpav_ack_seqno was not received, no problem - * we'll send another ACK vector. + * If it wasn't received, continue scanning... we might + * find another one. */ - av->dccpav_ack_seqno = DCCP_MAX_SEQNO + 1; - break; } - /* dccp_pr_debug_cat("no\n"); */ dccp_set_seqno(&ackno, ackno_end_rl - 1); ++vector; @@ -424,3 +476,43 @@ int dccp_ackvec_parse(struct sock *sk, c len, value); return 0; } + +static char dccp_ackvec_slab_msg[] __initdata = + KERN_CRIT "DCCP: Unable to create ack vectors slab caches\n"; + +int __init dccp_ackvec_init(void) +{ + dccp_ackvec_slab = kmem_cache_create("dccp_ackvec", + sizeof(struct dccp_ackvec), 0, + SLAB_HWCACHE_ALIGN, NULL, NULL); + if (dccp_ackvec_slab == NULL) + goto out_err; + + dccp_ackvec_record_slab = + kmem_cache_create("dccp_ackvec_record", + sizeof(struct dccp_ackvec_record), + 0, SLAB_HWCACHE_ALIGN, NULL, NULL); + if (dccp_ackvec_record_slab == NULL) + goto out_destroy_slab; + + return 0; + +out_destroy_slab: + kmem_cache_destroy(dccp_ackvec_slab); + dccp_ackvec_slab = NULL; +out_err: + printk(dccp_ackvec_slab_msg); + return -ENOBUFS; +} + +void dccp_ackvec_exit(void) +{ + if (dccp_ackvec_slab != NULL) { + kmem_cache_destroy(dccp_ackvec_slab); + dccp_ackvec_slab = NULL; + } + if (dccp_ackvec_record_slab != NULL) { + kmem_cache_destroy(dccp_ackvec_record_slab); + dccp_ackvec_record_slab = NULL; + } +} diff -puN net/dccp/ackvec.h~git-net net/dccp/ackvec.h --- devel/net/dccp/ackvec.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/ackvec.h 2006-02-27 22:37:47.000000000 -0800 @@ -13,6 +13,7 @@ #include #include +#include #include #include @@ -42,39 +43,57 @@ * Ack Vectors it has recently sent. For each packet sent carrying an * Ack Vector, it remembers four variables: * - * @dccpav_ack_seqno - the Sequence Number used for the packet - * (HC-Receiver seqno) * @dccpav_ack_ptr - the value of buf_head at the time of acknowledgement. - * @dccpav_ack_ackno - the Acknowledgement Number used for the packet - * (HC-Sender seqno) + * @dccpav_records - list of dccp_ackvec_record * @dccpav_ack_nonce - the one-bit sum of the ECN Nonces for all State 0. * - * @dccpav_buf_len - circular buffer length * @dccpav_time - the time in usecs * @dccpav_buf - circular buffer of acknowledgeable packets */ struct dccp_ackvec { u64 dccpav_buf_ackno; - u64 dccpav_ack_seqno; - u64 dccpav_ack_ackno; + struct list_head dccpav_records; struct timeval dccpav_time; u8 dccpav_buf_head; u8 dccpav_buf_tail; u8 dccpav_ack_ptr; u8 dccpav_sent_len; u8 dccpav_vec_len; - u8 dccpav_buf_len; u8 dccpav_buf_nonce; u8 dccpav_ack_nonce; - u8 dccpav_buf[0]; + u8 dccpav_buf[DCCP_MAX_ACKVEC_LEN]; +}; + +/** struct dccp_ackvec_record - ack vector record + * + * ACK vector record as defined in Appendix A of spec. + * + * The list is sorted by dccpavr_ack_seqno + * + * @dccpavr_node - node in dccpav_records + * @dccpavr_ack_seqno - sequence number of the packet this record was sent on + * @dccpavr_ack_ackno - sequence number being acknowledged + * @dccpavr_ack_ptr - pointer into dccpav_buf where this record starts + * @dccpavr_ack_nonce - dccpav_ack_nonce at the time this record was sent + * @dccpavr_sent_len - lenght of the record in dccpav_buf + */ +struct dccp_ackvec_record { + struct list_head dccpavr_node; + u64 dccpavr_ack_seqno; + u64 dccpavr_ack_ackno; + u8 dccpavr_ack_ptr; + u8 dccpavr_ack_nonce; + u8 dccpavr_sent_len; }; struct sock; struct sk_buff; #ifdef CONFIG_IP_DCCP_ACKVEC -extern struct dccp_ackvec *dccp_ackvec_alloc(unsigned int len, - const gfp_t priority); +extern int dccp_ackvec_init(void); +extern void dccp_ackvec_exit(void); + +extern struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority); extern void dccp_ackvec_free(struct dccp_ackvec *av); extern int dccp_ackvec_add(struct dccp_ackvec *av, const struct sock *sk, @@ -92,8 +111,16 @@ static inline int dccp_ackvec_pending(co return av->dccpav_sent_len != av->dccpav_vec_len; } #else /* CONFIG_IP_DCCP_ACKVEC */ -static inline struct dccp_ackvec *dccp_ackvec_alloc(unsigned int len, - const gfp_t priority) +static inline int dccp_ackvec_init(void) +{ + return 0; +} + +static inline void dccp_ackvec_exit(void) +{ +} + +static inline struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority) { return NULL; } diff -puN net/dccp/ccid.c~git-net net/dccp/ccid.c --- devel/net/dccp/ccid.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/ccid.c 2006-02-27 22:37:47.000000000 -0800 @@ -13,7 +13,7 @@ #include "ccid.h" -static struct ccid *ccids[CCID_MAX]; +static struct ccid_operations *ccids[CCID_MAX]; #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT) static atomic_t ccids_lockct = ATOMIC_INIT(0); static DEFINE_SPINLOCK(ccids_lock); @@ -55,85 +55,202 @@ static inline void ccids_read_unlock(voi #define ccids_read_unlock() do { } while(0) #endif -int ccid_register(struct ccid *ccid) +static kmem_cache_t *ccid_kmem_cache_create(int obj_size, const char *fmt,...) { - int err; + kmem_cache_t *slab; + char slab_name_fmt[32], *slab_name; + va_list args; + + va_start(args, fmt); + vsnprintf(slab_name_fmt, sizeof(slab_name_fmt), fmt, args); + va_end(args); + + slab_name = kstrdup(slab_name_fmt, GFP_KERNEL); + if (slab_name == NULL) + return NULL; + slab = kmem_cache_create(slab_name, sizeof(struct ccid) + obj_size, 0, + SLAB_HWCACHE_ALIGN, NULL, NULL); + if (slab == NULL) + kfree(slab_name); + return slab; +} + +static void ccid_kmem_cache_destroy(kmem_cache_t *slab) +{ + if (slab != NULL) { + const char *name = kmem_cache_name(slab); + + kmem_cache_destroy(slab); + kfree(name); + } +} + +int ccid_register(struct ccid_operations *ccid_ops) +{ + int err = -ENOBUFS; + + ccid_ops->ccid_hc_rx_slab = + ccid_kmem_cache_create(ccid_ops->ccid_hc_rx_obj_size, + "%s_hc_rx_sock", + ccid_ops->ccid_name); + if (ccid_ops->ccid_hc_rx_slab == NULL) + goto out; - if (ccid->ccid_init == NULL) - return -1; + ccid_ops->ccid_hc_tx_slab = + ccid_kmem_cache_create(ccid_ops->ccid_hc_tx_obj_size, + "%s_hc_tx_sock", + ccid_ops->ccid_name); + if (ccid_ops->ccid_hc_tx_slab == NULL) + goto out_free_rx_slab; ccids_write_lock(); err = -EEXIST; - if (ccids[ccid->ccid_id] == NULL) { - ccids[ccid->ccid_id] = ccid; + if (ccids[ccid_ops->ccid_id] == NULL) { + ccids[ccid_ops->ccid_id] = ccid_ops; err = 0; } ccids_write_unlock(); - if (err == 0) - pr_info("CCID: Registered CCID %d (%s)\n", - ccid->ccid_id, ccid->ccid_name); + if (err != 0) + goto out_free_tx_slab; + + pr_info("CCID: Registered CCID %d (%s)\n", + ccid_ops->ccid_id, ccid_ops->ccid_name); +out: return err; +out_free_tx_slab: + ccid_kmem_cache_destroy(ccid_ops->ccid_hc_tx_slab); + ccid_ops->ccid_hc_tx_slab = NULL; + goto out; +out_free_rx_slab: + ccid_kmem_cache_destroy(ccid_ops->ccid_hc_rx_slab); + ccid_ops->ccid_hc_rx_slab = NULL; + goto out; } EXPORT_SYMBOL_GPL(ccid_register); -int ccid_unregister(struct ccid *ccid) +int ccid_unregister(struct ccid_operations *ccid_ops) { ccids_write_lock(); - ccids[ccid->ccid_id] = NULL; + ccids[ccid_ops->ccid_id] = NULL; ccids_write_unlock(); + + ccid_kmem_cache_destroy(ccid_ops->ccid_hc_tx_slab); + ccid_ops->ccid_hc_tx_slab = NULL; + ccid_kmem_cache_destroy(ccid_ops->ccid_hc_rx_slab); + ccid_ops->ccid_hc_rx_slab = NULL; + pr_info("CCID: Unregistered CCID %d (%s)\n", - ccid->ccid_id, ccid->ccid_name); + ccid_ops->ccid_id, ccid_ops->ccid_name); return 0; } EXPORT_SYMBOL_GPL(ccid_unregister); -struct ccid *ccid_init(unsigned char id, struct sock *sk) +struct ccid *ccid_new(unsigned char id, struct sock *sk, int rx, gfp_t gfp) { - struct ccid *ccid; + struct ccid_operations *ccid_ops; + struct ccid *ccid = NULL; + ccids_read_lock(); #ifdef CONFIG_KMOD - if (ccids[id] == NULL) + if (ccids[id] == NULL) { + /* We only try to load if in process context */ + ccids_read_unlock(); + if (gfp & GFP_ATOMIC) + goto out; request_module("net-dccp-ccid-%d", id); + ccids_read_lock(); + } #endif - ccids_read_lock(); + ccid_ops = ccids[id]; + if (ccid_ops == NULL) + goto out_unlock; - ccid = ccids[id]; - if (ccid == NULL) - goto out; + if (!try_module_get(ccid_ops->ccid_owner)) + goto out_unlock; - if (!try_module_get(ccid->ccid_owner)) - goto out_err; + ccids_read_unlock(); - if (ccid->ccid_init(sk) != 0) + ccid = kmem_cache_alloc(rx ? ccid_ops->ccid_hc_rx_slab : + ccid_ops->ccid_hc_tx_slab, gfp); + if (ccid == NULL) goto out_module_put; + ccid->ccid_ops = ccid_ops; + if (rx) { + memset(ccid + 1, 0, ccid_ops->ccid_hc_rx_obj_size); + if (ccid->ccid_ops->ccid_hc_rx_init != NULL && + ccid->ccid_ops->ccid_hc_rx_init(ccid, sk) != 0) + goto out_free_ccid; + } else { + memset(ccid + 1, 0, ccid_ops->ccid_hc_tx_obj_size); + if (ccid->ccid_ops->ccid_hc_tx_init != NULL && + ccid->ccid_ops->ccid_hc_tx_init(ccid, sk) != 0) + goto out_free_ccid; + } out: - ccids_read_unlock(); return ccid; -out_module_put: - module_put(ccid->ccid_owner); -out_err: +out_unlock: + ccids_read_unlock(); + goto out; +out_free_ccid: + kmem_cache_free(rx ? ccid_ops->ccid_hc_rx_slab : + ccid_ops->ccid_hc_tx_slab, ccid); ccid = NULL; +out_module_put: + module_put(ccid_ops->ccid_owner); goto out; } -EXPORT_SYMBOL_GPL(ccid_init); +EXPORT_SYMBOL_GPL(ccid_new); + +struct ccid *ccid_hc_rx_new(unsigned char id, struct sock *sk, gfp_t gfp) +{ + return ccid_new(id, sk, 1, gfp); +} + +EXPORT_SYMBOL_GPL(ccid_hc_rx_new); + +struct ccid *ccid_hc_tx_new(unsigned char id,struct sock *sk, gfp_t gfp) +{ + return ccid_new(id, sk, 0, gfp); +} + +EXPORT_SYMBOL_GPL(ccid_hc_tx_new); -void ccid_exit(struct ccid *ccid, struct sock *sk) +static void ccid_delete(struct ccid *ccid, struct sock *sk, int rx) { + struct ccid_operations *ccid_ops; + if (ccid == NULL) return; + ccid_ops = ccid->ccid_ops; + if (rx) { + if (ccid_ops->ccid_hc_rx_exit != NULL) + ccid_ops->ccid_hc_rx_exit(sk); + kmem_cache_free(ccid_ops->ccid_hc_rx_slab, ccid); + } else { + if (ccid_ops->ccid_hc_tx_exit != NULL) + ccid_ops->ccid_hc_tx_exit(sk); + kmem_cache_free(ccid_ops->ccid_hc_tx_slab, ccid); + } ccids_read_lock(); + if (ccids[ccid_ops->ccid_id] != NULL) + module_put(ccid_ops->ccid_owner); + ccids_read_unlock(); +} - if (ccids[ccid->ccid_id] != NULL) { - if (ccid->ccid_exit != NULL) - ccid->ccid_exit(sk); - module_put(ccid->ccid_owner); - } +void ccid_hc_rx_delete(struct ccid *ccid, struct sock *sk) +{ + ccid_delete(ccid, sk, 1); +} - ccids_read_unlock(); +EXPORT_SYMBOL_GPL(ccid_hc_rx_delete); + +void ccid_hc_tx_delete(struct ccid *ccid, struct sock *sk) +{ + ccid_delete(ccid, sk, 0); } -EXPORT_SYMBOL_GPL(ccid_exit); +EXPORT_SYMBOL_GPL(ccid_hc_tx_delete); diff -puN net/dccp/ccid.h~git-net net/dccp/ccid.h --- devel/net/dccp/ccid.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/ccid.h 2006-02-27 22:37:47.000000000 -0800 @@ -23,14 +23,16 @@ struct tcp_info; -struct ccid { +struct ccid_operations { unsigned char ccid_id; const char *ccid_name; struct module *ccid_owner; - int (*ccid_init)(struct sock *sk); - void (*ccid_exit)(struct sock *sk); - int (*ccid_hc_rx_init)(struct sock *sk); - int (*ccid_hc_tx_init)(struct sock *sk); + kmem_cache_t *ccid_hc_rx_slab; + __u32 ccid_hc_rx_obj_size; + kmem_cache_t *ccid_hc_tx_slab; + __u32 ccid_hc_tx_obj_size; + int (*ccid_hc_rx_init)(struct ccid *ccid, struct sock *sk); + int (*ccid_hc_tx_init)(struct ccid *ccid, struct sock *sk); void (*ccid_hc_rx_exit)(struct sock *sk); void (*ccid_hc_tx_exit)(struct sock *sk); void (*ccid_hc_rx_packet_recv)(struct sock *sk, @@ -67,75 +69,58 @@ struct ccid { int __user *optlen); }; -extern int ccid_register(struct ccid *ccid); -extern int ccid_unregister(struct ccid *ccid); +extern int ccid_register(struct ccid_operations *ccid_ops); +extern int ccid_unregister(struct ccid_operations *ccid_ops); -extern struct ccid *ccid_init(unsigned char id, struct sock *sk); -extern void ccid_exit(struct ccid *ccid, struct sock *sk); +struct ccid { + struct ccid_operations *ccid_ops; + char ccid_priv[0]; +}; -static inline void __ccid_get(struct ccid *ccid) +static inline void *ccid_priv(const struct ccid *ccid) { - __module_get(ccid->ccid_owner); + return (void *)ccid->ccid_priv; } +extern struct ccid *ccid_new(unsigned char id, struct sock *sk, int rx, + gfp_t gfp); + +extern struct ccid *ccid_hc_rx_new(unsigned char id, struct sock *sk, + gfp_t gfp); +extern struct ccid *ccid_hc_tx_new(unsigned char id, struct sock *sk, + gfp_t gfp); + +extern void ccid_hc_rx_delete(struct ccid *ccid, struct sock *sk); +extern void ccid_hc_tx_delete(struct ccid *ccid, struct sock *sk); + static inline int ccid_hc_tx_send_packet(struct ccid *ccid, struct sock *sk, struct sk_buff *skb, int len) { int rc = 0; - if (ccid->ccid_hc_tx_send_packet != NULL) - rc = ccid->ccid_hc_tx_send_packet(sk, skb, len); + if (ccid->ccid_ops->ccid_hc_tx_send_packet != NULL) + rc = ccid->ccid_ops->ccid_hc_tx_send_packet(sk, skb, len); return rc; } static inline void ccid_hc_tx_packet_sent(struct ccid *ccid, struct sock *sk, int more, int len) { - if (ccid->ccid_hc_tx_packet_sent != NULL) - ccid->ccid_hc_tx_packet_sent(sk, more, len); + if (ccid->ccid_ops->ccid_hc_tx_packet_sent != NULL) + ccid->ccid_ops->ccid_hc_tx_packet_sent(sk, more, len); } - -static inline int ccid_hc_rx_init(struct ccid *ccid, struct sock *sk) -{ - int rc = 0; - if (ccid->ccid_hc_rx_init != NULL) - rc = ccid->ccid_hc_rx_init(sk); - return rc; -} - -static inline int ccid_hc_tx_init(struct ccid *ccid, struct sock *sk) -{ - int rc = 0; - if (ccid->ccid_hc_tx_init != NULL) - rc = ccid->ccid_hc_tx_init(sk); - return rc; -} - -static inline void ccid_hc_rx_exit(struct ccid *ccid, struct sock *sk) -{ - if (ccid != NULL && ccid->ccid_hc_rx_exit != NULL && - dccp_sk(sk)->dccps_hc_rx_ccid_private != NULL) - ccid->ccid_hc_rx_exit(sk); -} - -static inline void ccid_hc_tx_exit(struct ccid *ccid, struct sock *sk) -{ - if (ccid != NULL && ccid->ccid_hc_tx_exit != NULL && - dccp_sk(sk)->dccps_hc_tx_ccid_private != NULL) - ccid->ccid_hc_tx_exit(sk); -} - + static inline void ccid_hc_rx_packet_recv(struct ccid *ccid, struct sock *sk, struct sk_buff *skb) { - if (ccid->ccid_hc_rx_packet_recv != NULL) - ccid->ccid_hc_rx_packet_recv(sk, skb); + if (ccid->ccid_ops->ccid_hc_rx_packet_recv != NULL) + ccid->ccid_ops->ccid_hc_rx_packet_recv(sk, skb); } static inline void ccid_hc_tx_packet_recv(struct ccid *ccid, struct sock *sk, struct sk_buff *skb) { - if (ccid->ccid_hc_tx_packet_recv != NULL) - ccid->ccid_hc_tx_packet_recv(sk, skb); + if (ccid->ccid_ops->ccid_hc_tx_packet_recv != NULL) + ccid->ccid_ops->ccid_hc_tx_packet_recv(sk, skb); } static inline int ccid_hc_tx_parse_options(struct ccid *ccid, struct sock *sk, @@ -144,8 +129,8 @@ static inline int ccid_hc_tx_parse_optio unsigned char* value) { int rc = 0; - if (ccid->ccid_hc_tx_parse_options != NULL) - rc = ccid->ccid_hc_tx_parse_options(sk, option, len, idx, + if (ccid->ccid_ops->ccid_hc_tx_parse_options != NULL) + rc = ccid->ccid_ops->ccid_hc_tx_parse_options(sk, option, len, idx, value); return rc; } @@ -156,37 +141,37 @@ static inline int ccid_hc_rx_parse_optio unsigned char* value) { int rc = 0; - if (ccid->ccid_hc_rx_parse_options != NULL) - rc = ccid->ccid_hc_rx_parse_options(sk, option, len, idx, value); + if (ccid->ccid_ops->ccid_hc_rx_parse_options != NULL) + rc = ccid->ccid_ops->ccid_hc_rx_parse_options(sk, option, len, idx, value); return rc; } static inline void ccid_hc_tx_insert_options(struct ccid *ccid, struct sock *sk, struct sk_buff *skb) { - if (ccid->ccid_hc_tx_insert_options != NULL) - ccid->ccid_hc_tx_insert_options(sk, skb); + if (ccid->ccid_ops->ccid_hc_tx_insert_options != NULL) + ccid->ccid_ops->ccid_hc_tx_insert_options(sk, skb); } static inline void ccid_hc_rx_insert_options(struct ccid *ccid, struct sock *sk, struct sk_buff *skb) { - if (ccid->ccid_hc_rx_insert_options != NULL) - ccid->ccid_hc_rx_insert_options(sk, skb); + if (ccid->ccid_ops->ccid_hc_rx_insert_options != NULL) + ccid->ccid_ops->ccid_hc_rx_insert_options(sk, skb); } static inline void ccid_hc_rx_get_info(struct ccid *ccid, struct sock *sk, struct tcp_info *info) { - if (ccid->ccid_hc_rx_get_info != NULL) - ccid->ccid_hc_rx_get_info(sk, info); + if (ccid->ccid_ops->ccid_hc_rx_get_info != NULL) + ccid->ccid_ops->ccid_hc_rx_get_info(sk, info); } static inline void ccid_hc_tx_get_info(struct ccid *ccid, struct sock *sk, struct tcp_info *info) { - if (ccid->ccid_hc_tx_get_info != NULL) - ccid->ccid_hc_tx_get_info(sk, info); + if (ccid->ccid_ops->ccid_hc_tx_get_info != NULL) + ccid->ccid_ops->ccid_hc_tx_get_info(sk, info); } static inline int ccid_hc_rx_getsockopt(struct ccid *ccid, struct sock *sk, @@ -194,8 +179,8 @@ static inline int ccid_hc_rx_getsockopt( u32 __user *optval, int __user *optlen) { int rc = -ENOPROTOOPT; - if (ccid->ccid_hc_rx_getsockopt != NULL) - rc = ccid->ccid_hc_rx_getsockopt(sk, optname, len, + if (ccid->ccid_ops->ccid_hc_rx_getsockopt != NULL) + rc = ccid->ccid_ops->ccid_hc_rx_getsockopt(sk, optname, len, optval, optlen); return rc; } @@ -205,8 +190,8 @@ static inline int ccid_hc_tx_getsockopt( u32 __user *optval, int __user *optlen) { int rc = -ENOPROTOOPT; - if (ccid->ccid_hc_tx_getsockopt != NULL) - rc = ccid->ccid_hc_tx_getsockopt(sk, optname, len, + if (ccid->ccid_ops->ccid_hc_tx_getsockopt != NULL) + rc = ccid->ccid_ops->ccid_hc_tx_getsockopt(sk, optname, len, optval, optlen); return rc; } diff -puN /dev/null net/dccp/ccids/ccid2.c --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/net/dccp/ccids/ccid2.c 2006-02-27 22:37:47.000000000 -0800 @@ -0,0 +1,803 @@ +/* + * net/dccp/ccids/ccid2.c + * + * Copyright (c) 2005, 2006 Andrea Bittau + * + * Changes to meet Linux coding standards, and DCCP infrastructure fixes. + * + * Copyright (c) 2006 Arnaldo Carvalho de Melo + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +/* + * This implementation should follow: draft-ietf-dccp-ccid2-10.txt + * + * BUGS: + * - sequence number wrapping + * - jiffies wrapping + */ + +#include +#include "../ccid.h" +#include "../dccp.h" +#include "ccid2.h" + +static int ccid2_debug; + +#if 0 +#define CCID2_DEBUG +#endif + +#ifdef CCID2_DEBUG +#define ccid2_pr_debug(format, a...) \ + do { if (ccid2_debug) \ + printk(KERN_DEBUG "%s: " format, __FUNCTION__, ##a); \ + } while (0) +#else +#define ccid2_pr_debug(format, a...) +#endif + +static const int ccid2_seq_len = 128; + +#ifdef CCID2_DEBUG +static void ccid2_hc_tx_check_sanity(const struct ccid2_hc_tx_sock *hctx) +{ + int len = 0; + struct ccid2_seq *seqp; + int pipe = 0; + + seqp = hctx->ccid2hctx_seqh; + + /* there is data in the chain */ + if (seqp != hctx->ccid2hctx_seqt) { + seqp = seqp->ccid2s_prev; + len++; + if (!seqp->ccid2s_acked) + pipe++; + + while (seqp != hctx->ccid2hctx_seqt) { + struct ccid2_seq *prev; + + prev = seqp->ccid2s_prev; + len++; + if (!prev->ccid2s_acked) + pipe++; + + /* packets are sent sequentially */ + BUG_ON(seqp->ccid2s_seq <= prev->ccid2s_seq); + BUG_ON(seqp->ccid2s_sent < prev->ccid2s_sent); + BUG_ON(len > ccid2_seq_len); + + seqp = prev; + } + } + + BUG_ON(pipe != hctx->ccid2hctx_pipe); + ccid2_pr_debug("len of chain=%d\n", len); + + do { + seqp = seqp->ccid2s_prev; + len++; + BUG_ON(len > ccid2_seq_len); + } while(seqp != hctx->ccid2hctx_seqh); + + BUG_ON(len != ccid2_seq_len); + ccid2_pr_debug("total len=%d\n", len); +} +#else +#define ccid2_hc_tx_check_sanity(hctx) do {} while (0) +#endif + +static int ccid2_hc_tx_send_packet(struct sock *sk, + struct sk_buff *skb, int len) +{ + struct ccid2_hc_tx_sock *hctx; + + switch (DCCP_SKB_CB(skb)->dccpd_type) { + case 0: /* XXX data packets from userland come through like this */ + case DCCP_PKT_DATA: + case DCCP_PKT_DATAACK: + break; + /* No congestion control on other packets */ + default: + return 0; + } + + hctx = ccid2_hc_tx_sk(sk); + + ccid2_pr_debug("pipe=%d cwnd=%d\n", hctx->ccid2hctx_pipe, + hctx->ccid2hctx_cwnd); + + if (hctx->ccid2hctx_pipe < hctx->ccid2hctx_cwnd) { + /* OK we can send... make sure previous packet was sent off */ + if (!hctx->ccid2hctx_sendwait) { + hctx->ccid2hctx_sendwait = 1; + return 0; + } + } + + return 100; /* XXX */ +} + +static void ccid2_change_l_ack_ratio(struct sock *sk, int val) +{ + struct dccp_sock *dp = dccp_sk(sk); + /* + * XXX I don't really agree with val != 2. If cwnd is 1, ack ratio + * should be 1... it shouldn't be allowed to become 2. + * -sorbo. + */ + if (val != 2) { + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + int max = hctx->ccid2hctx_cwnd / 2; + + /* round up */ + if (hctx->ccid2hctx_cwnd & 1) + max++; + + if (val > max) + val = max; + } + + ccid2_pr_debug("changing local ack ratio to %d\n", val); + WARN_ON(val <= 0); + dp->dccps_l_ack_ratio = val; +} + +static void ccid2_change_cwnd(struct sock *sk, int val) +{ + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + + if (val == 0) + val = 1; + + /* XXX do we need to change ack ratio? */ + ccid2_pr_debug("change cwnd to %d\n", val); + + BUG_ON(val < 1); + hctx->ccid2hctx_cwnd = val; +} + +static void ccid2_start_rto_timer(struct sock *sk); + +static void ccid2_hc_tx_rto_expire(unsigned long data) +{ + struct sock *sk = (struct sock *)data; + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + long s; + + /* XXX I don't think i'm locking correctly + * -sorbo. + */ + bh_lock_sock(sk); + if (sock_owned_by_user(sk)) { + sk_reset_timer(sk, &hctx->ccid2hctx_rtotimer, + jiffies + HZ / 5); + goto out; + } + + ccid2_pr_debug("RTO_EXPIRE\n"); + + ccid2_hc_tx_check_sanity(hctx); + + /* back-off timer */ + hctx->ccid2hctx_rto <<= 1; + + s = hctx->ccid2hctx_rto / HZ; + if (s > 60) + hctx->ccid2hctx_rto = 60 * HZ; + + ccid2_start_rto_timer(sk); + + /* adjust pipe, cwnd etc */ + hctx->ccid2hctx_pipe = 0; + hctx->ccid2hctx_ssthresh = hctx->ccid2hctx_cwnd >> 1; + if (hctx->ccid2hctx_ssthresh < 2) + hctx->ccid2hctx_ssthresh = 2; + ccid2_change_cwnd(sk, 1); + + /* clear state about stuff we sent */ + hctx->ccid2hctx_seqt = hctx->ccid2hctx_seqh; + hctx->ccid2hctx_ssacks = 0; + hctx->ccid2hctx_acks = 0; + hctx->ccid2hctx_sent = 0; + + /* clear ack ratio state. */ + hctx->ccid2hctx_arsent = 0; + hctx->ccid2hctx_ackloss = 0; + hctx->ccid2hctx_rpseq = 0; + hctx->ccid2hctx_rpdupack = -1; + ccid2_change_l_ack_ratio(sk, 1); + ccid2_hc_tx_check_sanity(hctx); +out: + bh_unlock_sock(sk); + sock_put(sk); +} + +static void ccid2_start_rto_timer(struct sock *sk) +{ + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + + ccid2_pr_debug("setting RTO timeout=%ld\n", hctx->ccid2hctx_rto); + + BUG_ON(timer_pending(&hctx->ccid2hctx_rtotimer)); + sk_reset_timer(sk, &hctx->ccid2hctx_rtotimer, + jiffies + hctx->ccid2hctx_rto); +} + +static void ccid2_hc_tx_packet_sent(struct sock *sk, int more, int len) +{ + struct dccp_sock *dp = dccp_sk(sk); + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + u64 seq; + + ccid2_hc_tx_check_sanity(hctx); + + BUG_ON(!hctx->ccid2hctx_sendwait); + hctx->ccid2hctx_sendwait = 0; + hctx->ccid2hctx_pipe++; + BUG_ON(hctx->ccid2hctx_pipe < 0); + + /* There is an issue. What if another packet is sent between + * packet_send() and packet_sent(). Then the sequence number would be + * wrong. + * -sorbo. + */ + seq = dp->dccps_gss; + + hctx->ccid2hctx_seqh->ccid2s_seq = seq; + hctx->ccid2hctx_seqh->ccid2s_acked = 0; + hctx->ccid2hctx_seqh->ccid2s_sent = jiffies; + hctx->ccid2hctx_seqh = hctx->ccid2hctx_seqh->ccid2s_next; + + ccid2_pr_debug("cwnd=%d pipe=%d\n", hctx->ccid2hctx_cwnd, + hctx->ccid2hctx_pipe); + + if (hctx->ccid2hctx_seqh == hctx->ccid2hctx_seqt) { + /* XXX allocate more space */ + WARN_ON(1); + } + + hctx->ccid2hctx_sent++; + + /* Ack Ratio. Need to maintain a concept of how many windows we sent */ + hctx->ccid2hctx_arsent++; + /* We had an ack loss in this window... */ + if (hctx->ccid2hctx_ackloss) { + if (hctx->ccid2hctx_arsent >= hctx->ccid2hctx_cwnd) { + hctx->ccid2hctx_arsent = 0; + hctx->ccid2hctx_ackloss = 0; + } + } + /* No acks lost up to now... */ + else { + /* decrease ack ratio if enough packets were sent */ + if (dp->dccps_l_ack_ratio > 1) { + /* XXX don't calculate denominator each time */ + int denom; + + denom = dp->dccps_l_ack_ratio * dp->dccps_l_ack_ratio - + dp->dccps_l_ack_ratio; + denom = hctx->ccid2hctx_cwnd * hctx->ccid2hctx_cwnd / denom; + + if (hctx->ccid2hctx_arsent >= denom) { + ccid2_change_l_ack_ratio(sk, dp->dccps_l_ack_ratio - 1); + hctx->ccid2hctx_arsent = 0; + } + } + /* we can't increase ack ratio further [1] */ + else { + hctx->ccid2hctx_arsent = 0; /* or maybe set it to cwnd*/ + } + } + + /* setup RTO timer */ + if (!timer_pending(&hctx->ccid2hctx_rtotimer)) { + ccid2_start_rto_timer(sk); + } +#ifdef CCID2_DEBUG + ccid2_pr_debug("pipe=%d\n", hctx->ccid2hctx_pipe); + ccid2_pr_debug("Sent: seq=%llu\n", seq); + do { + struct ccid2_seq *seqp = hctx->ccid2hctx_seqt; + + while (seqp != hctx->ccid2hctx_seqh) { + ccid2_pr_debug("out seq=%llu acked=%d time=%lu\n", + seqp->ccid2s_seq, seqp->ccid2s_acked, + seqp->ccid2s_sent); + seqp = seqp->ccid2s_next; + } + } while(0); + ccid2_pr_debug("=========\n"); + ccid2_hc_tx_check_sanity(hctx); +#endif +} + +/* XXX Lame code duplication! + * returns -1 if none was found. + * else returns the next offset to use in the function call. + */ +static int ccid2_ackvector(struct sock *sk, struct sk_buff *skb, int offset, + unsigned char **vec, unsigned char *veclen) +{ + const struct dccp_hdr *dh = dccp_hdr(skb); + unsigned char *options = (unsigned char *)dh + dccp_hdr_len(skb); + unsigned char *opt_ptr; + const unsigned char *opt_end = (unsigned char *)dh + + (dh->dccph_doff * 4); + unsigned char opt, len; + unsigned char *value; + + BUG_ON(offset < 0); + options += offset; + opt_ptr = options; + if (opt_ptr >= opt_end) + return -1; + + while (opt_ptr != opt_end) { + opt = *opt_ptr++; + len = 0; + value = NULL; + + /* Check if this isn't a single byte option */ + if (opt > DCCPO_MAX_RESERVED) { + if (opt_ptr == opt_end) + goto out_invalid_option; + + len = *opt_ptr++; + if (len < 3) + goto out_invalid_option; + /* + * Remove the type and len fields, leaving + * just the value size + */ + len -= 2; + value = opt_ptr; + opt_ptr += len; + + if (opt_ptr > opt_end) + goto out_invalid_option; + } + + switch (opt) { + case DCCPO_ACK_VECTOR_0: + case DCCPO_ACK_VECTOR_1: + *vec = value; + *veclen = len; + return offset + (opt_ptr - options); + break; + } + } + + return -1; + +out_invalid_option: + BUG_ON(1); /* should never happen... options were previously parsed ! */ + return -1; +} + +static void ccid2_hc_tx_kill_rto_timer(struct sock *sk) +{ + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + + sk_stop_timer(sk, &hctx->ccid2hctx_rtotimer); + ccid2_pr_debug("deleted RTO timer\n"); +} + +static inline void ccid2_new_ack(struct sock *sk, + struct ccid2_seq *seqp, + unsigned int *maxincr) +{ + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + + /* slow start */ + if (hctx->ccid2hctx_cwnd < hctx->ccid2hctx_ssthresh) { + hctx->ccid2hctx_acks = 0; + + /* We can increase cwnd at most maxincr [ack_ratio/2] */ + if (*maxincr) { + /* increase every 2 acks */ + hctx->ccid2hctx_ssacks++; + if (hctx->ccid2hctx_ssacks == 2) { + ccid2_change_cwnd(sk, hctx->ccid2hctx_cwnd + 1); + hctx->ccid2hctx_ssacks = 0; + *maxincr = *maxincr - 1; + } + } + /* increased cwnd enough for this single ack */ + else { + hctx->ccid2hctx_ssacks = 0; + } + } + else { + hctx->ccid2hctx_ssacks = 0; + hctx->ccid2hctx_acks++; + + if (hctx->ccid2hctx_acks >= hctx->ccid2hctx_cwnd) { + ccid2_change_cwnd(sk, hctx->ccid2hctx_cwnd + 1); + hctx->ccid2hctx_acks = 0; + } + } + + /* update RTO */ + if (hctx->ccid2hctx_srtt == -1 || + (jiffies - hctx->ccid2hctx_lastrtt) >= hctx->ccid2hctx_srtt) { + unsigned long r = jiffies - seqp->ccid2s_sent; + int s; + + /* first measurement */ + if (hctx->ccid2hctx_srtt == -1) { + ccid2_pr_debug("R: %lu Time=%lu seq=%llu\n", + r, jiffies, seqp->ccid2s_seq); + hctx->ccid2hctx_srtt = r; + hctx->ccid2hctx_rttvar = r >> 1; + } + else { + /* RTTVAR */ + long tmp = hctx->ccid2hctx_srtt - r; + if (tmp < 0) + tmp *= -1; + + tmp >>= 2; + hctx->ccid2hctx_rttvar *= 3; + hctx->ccid2hctx_rttvar >>= 2; + hctx->ccid2hctx_rttvar += tmp; + + /* SRTT */ + hctx->ccid2hctx_srtt *= 7; + hctx->ccid2hctx_srtt >>= 3; + tmp = r >> 3; + hctx->ccid2hctx_srtt += tmp; + } + s = hctx->ccid2hctx_rttvar << 2; + /* clock granularity is 1 when based on jiffies */ + if (!s) + s = 1; + hctx->ccid2hctx_rto = hctx->ccid2hctx_srtt + s; + + /* must be at least a second */ + s = hctx->ccid2hctx_rto / HZ; + /* DCCP doesn't require this [but I like it cuz my code sux] */ +#if 1 + if (s < 1) + hctx->ccid2hctx_rto = HZ; +#endif + /* max 60 seconds */ + if (s > 60) + hctx->ccid2hctx_rto = HZ * 60; + + hctx->ccid2hctx_lastrtt = jiffies; + + ccid2_pr_debug("srtt: %ld rttvar: %ld rto: %ld (HZ=%d) R=%lu\n", + hctx->ccid2hctx_srtt, hctx->ccid2hctx_rttvar, + hctx->ccid2hctx_rto, HZ, r); + hctx->ccid2hctx_sent = 0; + } + + /* we got a new ack, so re-start RTO timer */ + ccid2_hc_tx_kill_rto_timer(sk); + ccid2_start_rto_timer(sk); +} + +static void ccid2_hc_tx_dec_pipe(struct sock *sk) +{ + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + + hctx->ccid2hctx_pipe--; + BUG_ON(hctx->ccid2hctx_pipe < 0); + + if (hctx->ccid2hctx_pipe == 0) + ccid2_hc_tx_kill_rto_timer(sk); +} + +static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb) +{ + struct dccp_sock *dp = dccp_sk(sk); + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + u64 ackno, seqno; + struct ccid2_seq *seqp; + unsigned char *vector; + unsigned char veclen; + int offset = 0; + int done = 0; + int loss = 0; + unsigned int maxincr = 0; + + ccid2_hc_tx_check_sanity(hctx); + /* check reverse path congestion */ + seqno = DCCP_SKB_CB(skb)->dccpd_seq; + + /* XXX this whole "algorithm" is broken. Need to fix it to keep track + * of the seqnos of the dupacks so that rpseq and rpdupack are correct + * -sorbo. + */ + /* need to bootstrap */ + if (hctx->ccid2hctx_rpdupack == -1) { + hctx->ccid2hctx_rpdupack = 0; + hctx->ccid2hctx_rpseq = seqno; + } + else { + /* check if packet is consecutive */ + if ((hctx->ccid2hctx_rpseq + 1) == seqno) { + hctx->ccid2hctx_rpseq++; + } + /* it's a later packet */ + else if (after48(seqno, hctx->ccid2hctx_rpseq)) { + hctx->ccid2hctx_rpdupack++; + + /* check if we got enough dupacks */ + if (hctx->ccid2hctx_rpdupack >= + hctx->ccid2hctx_numdupack) { + + hctx->ccid2hctx_rpdupack = -1; /* XXX lame */ + hctx->ccid2hctx_rpseq = 0; + + ccid2_change_l_ack_ratio(sk, dp->dccps_l_ack_ratio << 1); + } + } + } + + /* check forward path congestion */ + /* still didn't send out new data packets */ + if (hctx->ccid2hctx_seqh == hctx->ccid2hctx_seqt) + return; + + switch (DCCP_SKB_CB(skb)->dccpd_type) { + case DCCP_PKT_ACK: + case DCCP_PKT_DATAACK: + break; + + default: + return; + } + + ackno = DCCP_SKB_CB(skb)->dccpd_ack_seq; + seqp = hctx->ccid2hctx_seqh->ccid2s_prev; + + /* If in slow-start, cwnd can increase at most Ack Ratio / 2 packets for + * this single ack. I round up. + * -sorbo. + */ + maxincr = dp->dccps_l_ack_ratio >> 1; + maxincr++; + + /* go through all ack vectors */ + while ((offset = ccid2_ackvector(sk, skb, offset, + &vector, &veclen)) != -1) { + /* go through this ack vector */ + while (veclen--) { + const u8 rl = *vector & DCCP_ACKVEC_LEN_MASK; + u64 ackno_end_rl; + + dccp_set_seqno(&ackno_end_rl, ackno - rl); + ccid2_pr_debug("ackvec start:%llu end:%llu\n", ackno, + ackno_end_rl); + /* if the seqno we are analyzing is larger than the + * current ackno, then move towards the tail of our + * seqnos. + */ + while (after48(seqp->ccid2s_seq, ackno)) { + if (seqp == hctx->ccid2hctx_seqt) { + done = 1; + break; + } + seqp = seqp->ccid2s_prev; + } + if (done) + break; + + /* check all seqnos in the range of the vector + * run length + */ + while (between48(seqp->ccid2s_seq,ackno_end_rl,ackno)) { + const u8 state = (*vector & + DCCP_ACKVEC_STATE_MASK) >> 6; + + /* new packet received or marked */ + if (state != DCCP_ACKVEC_STATE_NOT_RECEIVED && + !seqp->ccid2s_acked) { + if (state == + DCCP_ACKVEC_STATE_ECN_MARKED) { + loss = 1; + } + else { + ccid2_new_ack(sk, seqp, + &maxincr); + } + + seqp->ccid2s_acked = 1; + ccid2_pr_debug("Got ack for %llu\n", + seqp->ccid2s_seq); + ccid2_hc_tx_dec_pipe(sk); + } + if (seqp == hctx->ccid2hctx_seqt) { + done = 1; + break; + } + seqp = seqp->ccid2s_next; + } + if (done) + break; + + + dccp_set_seqno(&ackno, ackno_end_rl - 1); + vector++; + } + if (done) + break; + } + + /* The state about what is acked should be correct now + * Check for NUMDUPACK + */ + seqp = hctx->ccid2hctx_seqh->ccid2s_prev; + done = 0; + while (1) { + if (seqp->ccid2s_acked) { + done++; + if (done == hctx->ccid2hctx_numdupack) { + break; + } + } + if (seqp == hctx->ccid2hctx_seqt) { + break; + } + seqp = seqp->ccid2s_prev; + } + + /* If there are at least 3 acknowledgements, anything unacknowledged + * below the last sequence number is considered lost + */ + if (done == hctx->ccid2hctx_numdupack) { + struct ccid2_seq *last_acked = seqp; + + /* check for lost packets */ + while (1) { + if (!seqp->ccid2s_acked) { + loss = 1; + ccid2_hc_tx_dec_pipe(sk); + } + if (seqp == hctx->ccid2hctx_seqt) + break; + seqp = seqp->ccid2s_prev; + } + + hctx->ccid2hctx_seqt = last_acked; + } + + /* trim acked packets in tail */ + while (hctx->ccid2hctx_seqt != hctx->ccid2hctx_seqh) { + if (!hctx->ccid2hctx_seqt->ccid2s_acked) + break; + + hctx->ccid2hctx_seqt = hctx->ccid2hctx_seqt->ccid2s_next; + } + + if (loss) { + /* XXX do bit shifts guarantee a 0 as the new bit? */ + ccid2_change_cwnd(sk, hctx->ccid2hctx_cwnd >> 1); + hctx->ccid2hctx_ssthresh = hctx->ccid2hctx_cwnd; + if (hctx->ccid2hctx_ssthresh < 2) + hctx->ccid2hctx_ssthresh = 2; + } + + ccid2_hc_tx_check_sanity(hctx); +} + +static int ccid2_hc_tx_init(struct ccid *ccid, struct sock *sk) +{ + struct ccid2_hc_tx_sock *hctx = ccid_priv(ccid); + int seqcount = ccid2_seq_len; + int i; + + /* XXX init variables with proper values */ + hctx->ccid2hctx_cwnd = 1; + hctx->ccid2hctx_ssthresh = 10; + hctx->ccid2hctx_numdupack = 3; + + /* XXX init ~ to window size... */ + hctx->ccid2hctx_seqbuf = kmalloc(sizeof(*hctx->ccid2hctx_seqbuf) * + seqcount, gfp_any()); + if (hctx->ccid2hctx_seqbuf == NULL) + return -ENOMEM; + + for (i = 0; i < (seqcount - 1); i++) { + hctx->ccid2hctx_seqbuf[i].ccid2s_next = + &hctx->ccid2hctx_seqbuf[i + 1]; + hctx->ccid2hctx_seqbuf[i + 1].ccid2s_prev = + &hctx->ccid2hctx_seqbuf[i]; + } + hctx->ccid2hctx_seqbuf[seqcount - 1].ccid2s_next = + hctx->ccid2hctx_seqbuf; + hctx->ccid2hctx_seqbuf->ccid2s_prev = + &hctx->ccid2hctx_seqbuf[seqcount - 1]; + + hctx->ccid2hctx_seqh = hctx->ccid2hctx_seqbuf; + hctx->ccid2hctx_seqt = hctx->ccid2hctx_seqh; + hctx->ccid2hctx_sent = 0; + hctx->ccid2hctx_rto = 3 * HZ; + hctx->ccid2hctx_srtt = -1; + hctx->ccid2hctx_rttvar = -1; + hctx->ccid2hctx_lastrtt = 0; + hctx->ccid2hctx_rpdupack = -1; + + hctx->ccid2hctx_rtotimer.function = &ccid2_hc_tx_rto_expire; + hctx->ccid2hctx_rtotimer.data = (unsigned long)sk; + init_timer(&hctx->ccid2hctx_rtotimer); + + ccid2_hc_tx_check_sanity(hctx); + return 0; +} + +static void ccid2_hc_tx_exit(struct sock *sk) +{ + struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk); + + ccid2_hc_tx_kill_rto_timer(sk); + kfree(hctx->ccid2hctx_seqbuf); + hctx->ccid2hctx_seqbuf = NULL; +} + +static void ccid2_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) +{ + const struct dccp_sock *dp = dccp_sk(sk); + struct ccid2_hc_rx_sock *hcrx = ccid2_hc_rx_sk(sk); + + switch (DCCP_SKB_CB(skb)->dccpd_type) { + case DCCP_PKT_DATA: + case DCCP_PKT_DATAACK: + hcrx->ccid2hcrx_data++; + if (hcrx->ccid2hcrx_data >= dp->dccps_r_ack_ratio) { + dccp_send_ack(sk); + hcrx->ccid2hcrx_data = 0; + } + break; + } +} + +static struct ccid_operations ccid2 = { + .ccid_id = 2, + .ccid_name = "ccid2", + .ccid_owner = THIS_MODULE, + .ccid_hc_tx_obj_size = sizeof(struct ccid2_hc_tx_sock), + .ccid_hc_tx_init = ccid2_hc_tx_init, + .ccid_hc_tx_exit = ccid2_hc_tx_exit, + .ccid_hc_tx_send_packet = ccid2_hc_tx_send_packet, + .ccid_hc_tx_packet_sent = ccid2_hc_tx_packet_sent, + .ccid_hc_tx_packet_recv = ccid2_hc_tx_packet_recv, + .ccid_hc_rx_obj_size = sizeof(struct ccid2_hc_rx_sock), + .ccid_hc_rx_packet_recv = ccid2_hc_rx_packet_recv, +}; + +module_param(ccid2_debug, int, 0444); +MODULE_PARM_DESC(ccid2_debug, "Enable debug messages"); + +static __init int ccid2_module_init(void) +{ + return ccid_register(&ccid2); +} +module_init(ccid2_module_init); + +static __exit void ccid2_module_exit(void) +{ + ccid_unregister(&ccid2); +} +module_exit(ccid2_module_exit); + +MODULE_AUTHOR("Andrea Bittau "); +MODULE_DESCRIPTION("DCCP TCP CCID2 CCID"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("net-dccp-ccid-2"); diff -puN /dev/null net/dccp/ccids/ccid2.h --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/net/dccp/ccids/ccid2.h 2006-02-27 22:37:47.000000000 -0800 @@ -0,0 +1,85 @@ +/* + * net/dccp/ccids/ccid2.h + * + * Copyright (c) 2005 Andrea Bittau + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ +#ifndef _DCCP_CCID2_H_ +#define _DCCP_CCID2_H_ + +#include +#include +#include +#include "../ccid.h" + +struct sock; + +struct ccid2_seq { + u64 ccid2s_seq; + unsigned long ccid2s_sent; + int ccid2s_acked; + struct ccid2_seq *ccid2s_prev; + struct ccid2_seq *ccid2s_next; +}; + +/** struct ccid2_hc_tx_sock - CCID2 TX half connection + * + * @ccid2hctx_ssacks - ACKs recv in slow start + * @ccid2hctx_acks - ACKS recv in AI phase + * @ccid2hctx_sent - packets sent in this window + * @ccid2hctx_lastrtt -time RTT was last measured + * @ccid2hctx_arsent - packets sent [ack ratio] + * @ccid2hctx_ackloss - ack was lost in this win + * @ccid2hctx_rpseq - last consecutive seqno + * @ccid2hctx_rpdupack - dupacks since rpseq +*/ +struct ccid2_hc_tx_sock { + int ccid2hctx_cwnd; + int ccid2hctx_ssacks; + int ccid2hctx_acks; + int ccid2hctx_ssthresh; + int ccid2hctx_pipe; + int ccid2hctx_numdupack; + struct ccid2_seq *ccid2hctx_seqbuf; + struct ccid2_seq *ccid2hctx_seqh; + struct ccid2_seq *ccid2hctx_seqt; + long ccid2hctx_rto; + long ccid2hctx_srtt; + long ccid2hctx_rttvar; + int ccid2hctx_sent; + unsigned long ccid2hctx_lastrtt; + struct timer_list ccid2hctx_rtotimer; + unsigned long ccid2hctx_arsent; + int ccid2hctx_ackloss; + u64 ccid2hctx_rpseq; + int ccid2hctx_rpdupack; + int ccid2hctx_sendwait; +}; + +struct ccid2_hc_rx_sock { + int ccid2hcrx_data; +}; + +static inline struct ccid2_hc_tx_sock *ccid2_hc_tx_sk(const struct sock *sk) +{ + return ccid_priv(dccp_sk(sk)->dccps_hc_tx_ccid); +} + +static inline struct ccid2_hc_rx_sock *ccid2_hc_rx_sk(const struct sock *sk) +{ + return ccid_priv(dccp_sk(sk)->dccps_hc_rx_ccid); +} +#endif /* _DCCP_CCID2_H_ */ diff -puN net/dccp/ccids/ccid3.c~git-net net/dccp/ccids/ccid3.c --- devel/net/dccp/ccids/ccid3.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/ccids/ccid3.c 2006-02-27 22:37:47.000000000 -0800 @@ -46,7 +46,7 @@ * Reason for maths here is to avoid 32 bit overflow when a is big. * With this we get close to the limit. */ -static inline u32 usecs_div(const u32 a, const u32 b) +static u32 usecs_div(const u32 a, const u32 b) { const u32 div = a < (UINT_MAX / (USEC_PER_SEC / 10)) ? 10 : a < (UINT_MAX / (USEC_PER_SEC / 50)) ? 50 : @@ -76,15 +76,6 @@ static struct dccp_tx_hist *ccid3_tx_his static struct dccp_rx_hist *ccid3_rx_hist; static struct dccp_li_hist *ccid3_li_hist; -static int ccid3_init(struct sock *sk) -{ - return 0; -} - -static void ccid3_exit(struct sock *sk) -{ -} - /* TFRC sender states */ enum ccid3_hc_tx_states { TFRC_SSTATE_NO_SENT = 1, @@ -107,8 +98,8 @@ static const char *ccid3_tx_state_name(e } #endif -static inline void ccid3_hc_tx_set_state(struct sock *sk, - enum ccid3_hc_tx_states state) +static void ccid3_hc_tx_set_state(struct sock *sk, + enum ccid3_hc_tx_states state) { struct ccid3_hc_tx_sock *hctx = ccid3_hc_tx_sk(sk); enum ccid3_hc_tx_states oldstate = hctx->ccid3hctx_state; @@ -316,8 +307,6 @@ static int ccid3_hc_tx_send_packet(struc switch (hctx->ccid3hctx_state) { case TFRC_SSTATE_NO_SENT: - hctx->ccid3hctx_no_feedback_timer.function = ccid3_hc_tx_no_feedback_timer; - hctx->ccid3hctx_no_feedback_timer.data = (unsigned long)sk; sk_reset_timer(sk, &hctx->ccid3hctx_no_feedback_timer, jiffies + usecs_to_jiffies(TFRC_INITIAL_TIMEOUT)); hctx->ccid3hctx_last_win_count = 0; @@ -626,7 +615,7 @@ static int ccid3_hc_tx_parse_options(str __FUNCTION__, dccp_role(sk), sk); rc = -EINVAL; } else { - opt_recv->ccid3or_loss_event_rate = ntohl(*(u32 *)value); + opt_recv->ccid3or_loss_event_rate = ntohl(*(__be32 *)value); ccid3_pr_debug("%s, sk=%p, LOSS_EVENT_RATE=%u\n", dccp_role(sk), sk, opt_recv->ccid3or_loss_event_rate); @@ -647,7 +636,7 @@ static int ccid3_hc_tx_parse_options(str __FUNCTION__, dccp_role(sk), sk); rc = -EINVAL; } else { - opt_recv->ccid3or_receive_rate = ntohl(*(u32 *)value); + opt_recv->ccid3or_receive_rate = ntohl(*(__be32 *)value); ccid3_pr_debug("%s, sk=%p, RECEIVE_RATE=%u\n", dccp_role(sk), sk, opt_recv->ccid3or_receive_rate); @@ -658,17 +647,10 @@ static int ccid3_hc_tx_parse_options(str return rc; } -static int ccid3_hc_tx_init(struct sock *sk) +static int ccid3_hc_tx_init(struct ccid *ccid, struct sock *sk) { struct dccp_sock *dp = dccp_sk(sk); - struct ccid3_hc_tx_sock *hctx; - - dp->dccps_hc_tx_ccid_private = kmalloc(sizeof(*hctx), gfp_any()); - if (dp->dccps_hc_tx_ccid_private == NULL) - return -ENOMEM; - - hctx = ccid3_hc_tx_sk(sk); - memset(hctx, 0, sizeof(*hctx)); + struct ccid3_hc_tx_sock *hctx = ccid_priv(ccid); if (dp->dccps_packet_size >= TFRC_MIN_PACKET_SIZE && dp->dccps_packet_size <= TFRC_MAX_PACKET_SIZE) @@ -681,6 +663,9 @@ static int ccid3_hc_tx_init(struct sock hctx->ccid3hctx_t_rto = USEC_PER_SEC; hctx->ccid3hctx_state = TFRC_SSTATE_NO_SENT; INIT_LIST_HEAD(&hctx->ccid3hctx_hist); + + hctx->ccid3hctx_no_feedback_timer.function = ccid3_hc_tx_no_feedback_timer; + hctx->ccid3hctx_no_feedback_timer.data = (unsigned long)sk; init_timer(&hctx->ccid3hctx_no_feedback_timer); return 0; @@ -688,7 +673,6 @@ static int ccid3_hc_tx_init(struct sock static void ccid3_hc_tx_exit(struct sock *sk) { - struct dccp_sock *dp = dccp_sk(sk); struct ccid3_hc_tx_sock *hctx = ccid3_hc_tx_sk(sk); BUG_ON(hctx == NULL); @@ -698,9 +682,6 @@ static void ccid3_hc_tx_exit(struct sock /* Empty packet history */ dccp_tx_hist_purge(ccid3_tx_hist, &hctx->ccid3hctx_hist); - - kfree(dp->dccps_hc_tx_ccid_private); - dp->dccps_hc_tx_ccid_private = NULL; } /* @@ -727,8 +708,8 @@ static const char *ccid3_rx_state_name(e } #endif -static inline void ccid3_hc_rx_set_state(struct sock *sk, - enum ccid3_hc_rx_states state) +static void ccid3_hc_rx_set_state(struct sock *sk, + enum ccid3_hc_rx_states state) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); enum ccid3_hc_rx_states oldstate = hcrx->ccid3hcrx_state; @@ -796,7 +777,7 @@ static void ccid3_hc_rx_send_feedback(st static void ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb) { const struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); - u32 x_recv, pinv; + __be32 x_recv, pinv; BUG_ON(hcrx == NULL); @@ -1043,20 +1024,13 @@ static void ccid3_hc_rx_packet_recv(stru } } -static int ccid3_hc_rx_init(struct sock *sk) +static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) { struct dccp_sock *dp = dccp_sk(sk); - struct ccid3_hc_rx_sock *hcrx; + struct ccid3_hc_rx_sock *hcrx = ccid_priv(ccid); ccid3_pr_debug("%s, sk=%p\n", dccp_role(sk), sk); - dp->dccps_hc_rx_ccid_private = kmalloc(sizeof(*hcrx), gfp_any()); - if (dp->dccps_hc_rx_ccid_private == NULL) - return -ENOMEM; - - hcrx = ccid3_hc_rx_sk(sk); - memset(hcrx, 0, sizeof(*hcrx)); - if (dp->dccps_packet_size >= TFRC_MIN_PACKET_SIZE && dp->dccps_packet_size <= TFRC_MAX_PACKET_SIZE) hcrx->ccid3hcrx_s = dp->dccps_packet_size; @@ -1075,7 +1049,6 @@ static int ccid3_hc_rx_init(struct sock static void ccid3_hc_rx_exit(struct sock *sk) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); - struct dccp_sock *dp = dccp_sk(sk); BUG_ON(hcrx == NULL); @@ -1086,9 +1059,6 @@ static void ccid3_hc_rx_exit(struct sock /* Empty loss interval history */ dccp_li_hist_purge(ccid3_li_hist, &hcrx->ccid3hcrx_li_hist); - - kfree(dp->dccps_hc_rx_ccid_private); - dp->dccps_hc_rx_ccid_private = NULL; } static void ccid3_hc_rx_get_info(struct sock *sk, struct tcp_info *info) @@ -1174,12 +1144,11 @@ static int ccid3_hc_tx_getsockopt(struct return 0; } -static struct ccid ccid3 = { +static struct ccid_operations ccid3 = { .ccid_id = 3, .ccid_name = "ccid3", .ccid_owner = THIS_MODULE, - .ccid_init = ccid3_init, - .ccid_exit = ccid3_exit, + .ccid_hc_tx_obj_size = sizeof(struct ccid3_hc_tx_sock), .ccid_hc_tx_init = ccid3_hc_tx_init, .ccid_hc_tx_exit = ccid3_hc_tx_exit, .ccid_hc_tx_send_packet = ccid3_hc_tx_send_packet, @@ -1187,6 +1156,7 @@ static struct ccid ccid3 = { .ccid_hc_tx_packet_recv = ccid3_hc_tx_packet_recv, .ccid_hc_tx_insert_options = ccid3_hc_tx_insert_options, .ccid_hc_tx_parse_options = ccid3_hc_tx_parse_options, + .ccid_hc_rx_obj_size = sizeof(struct ccid3_hc_rx_sock), .ccid_hc_rx_init = ccid3_hc_rx_init, .ccid_hc_rx_exit = ccid3_hc_rx_exit, .ccid_hc_rx_insert_options = ccid3_hc_rx_insert_options, @@ -1237,15 +1207,6 @@ module_init(ccid3_module_init); static __exit void ccid3_module_exit(void) { -#ifdef CONFIG_IP_DCCP_UNLOAD_HACK - /* - * Hack to use while developing, so that we get rid of the control - * sock, that is what keeps a refcount on dccp.ko -acme - */ - extern void dccp_ctl_sock_exit(void); - - dccp_ctl_sock_exit(); -#endif ccid_unregister(&ccid3); if (ccid3_tx_hist != NULL) { diff -puN net/dccp/ccids/ccid3.h~git-net net/dccp/ccids/ccid3.h --- devel/net/dccp/ccids/ccid3.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/ccids/ccid3.h 2006-02-27 22:37:47.000000000 -0800 @@ -41,6 +41,7 @@ #include #include #include +#include "../ccid.h" #define TFRC_MIN_PACKET_SIZE 16 #define TFRC_STD_PACKET_SIZE 256 @@ -135,12 +136,12 @@ struct ccid3_hc_rx_sock { static inline struct ccid3_hc_tx_sock *ccid3_hc_tx_sk(const struct sock *sk) { - return dccp_sk(sk)->dccps_hc_tx_ccid_private; + return ccid_priv(dccp_sk(sk)->dccps_hc_tx_ccid); } static inline struct ccid3_hc_rx_sock *ccid3_hc_rx_sk(const struct sock *sk) { - return dccp_sk(sk)->dccps_hc_rx_ccid_private; + return ccid_priv(dccp_sk(sk)->dccps_hc_rx_ccid); } #endif /* _DCCP_CCID3_H_ */ diff -puN net/dccp/ccids/Kconfig~git-net net/dccp/ccids/Kconfig --- devel/net/dccp/ccids/Kconfig~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/ccids/Kconfig 2006-02-27 22:37:47.000000000 -0800 @@ -1,9 +1,39 @@ menu "DCCP CCIDs Configuration (EXPERIMENTAL)" depends on IP_DCCP && EXPERIMENTAL +config IP_DCCP_CCID2 + tristate "CCID2 (TCP-Like) (EXPERIMENTAL)" + depends on IP_DCCP + def_tristate IP_DCCP + select IP_DCCP_ACKVEC + ---help--- + CCID 2, TCP-like Congestion Control, denotes Additive Increase, + Multiplicative Decrease (AIMD) congestion control with behavior + modelled directly on TCP, including congestion window, slow start, + timeouts, and so forth [RFC 2581]. CCID 2 achieves maximum + bandwidth over the long term, consistent with the use of end-to-end + congestion control, but halves its congestion window in response to + each congestion event. This leads to the abrupt rate changes + typical of TCP. Applications should use CCID 2 if they prefer + maximum bandwidth utilization to steadiness of rate. This is often + the case for applications that are not playing their data directly + to the user. For example, a hypothetical application that + transferred files over DCCP, using application-level retransmissions + for lost packets, would prefer CCID 2 to CCID 3. On-line games may + also prefer CCID 2. + + CCID 2 is further described in: + http://www.icir.org/kohler/dccp/draft-ietf-dccp-ccid2-10.txt + + This text was extracted from: + http://www.icir.org/kohler/dccp/draft-ietf-dccp-spec-13.txt + + If in doubt, say M. + config IP_DCCP_CCID3 - tristate "CCID3 (TFRC) (EXPERIMENTAL)" + tristate "CCID3 (TCP-Friendly) (EXPERIMENTAL)" depends on IP_DCCP + def_tristate IP_DCCP ---help--- CCID 3 denotes TCP-Friendly Rate Control (TFRC), an equation-based rate-controlled congestion control mechanism. TFRC is designed to @@ -15,10 +45,15 @@ config IP_DCCP_CCID3 suitable than CCID 2 for applications such streaming media where a relatively smooth sending rate is of importance. - CCID 3 is further described in [CCID 3 PROFILE]. The TFRC - congestion control algorithms were initially described in RFC 3448. + CCID 3 is further described in: + + http://www.icir.org/kohler/dccp/draft-ietf-dccp-ccid3-11.txt. + + The TFRC congestion control algorithms were initially described in + RFC 3448. - This text was extracted from draft-ietf-dccp-spec-11.txt. + This text was extracted from: + http://www.icir.org/kohler/dccp/draft-ietf-dccp-spec-13.txt If in doubt, say M. diff -puN net/dccp/ccids/Makefile~git-net net/dccp/ccids/Makefile --- devel/net/dccp/ccids/Makefile~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/ccids/Makefile 2006-02-27 22:37:47.000000000 -0800 @@ -2,4 +2,8 @@ obj-$(CONFIG_IP_DCCP_CCID3) += dccp_ccid dccp_ccid3-y := ccid3.o +obj-$(CONFIG_IP_DCCP_CCID2) += dccp_ccid2.o + +dccp_ccid2-y := ccid2.o + obj-y += lib/ diff -puN net/dccp/dccp.h~git-net net/dccp/dccp.h --- devel/net/dccp/dccp.h~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/dccp.h 2006-02-27 22:37:47.000000000 -0800 @@ -59,8 +59,6 @@ extern void dccp_time_wait(struct sock * #define DCCP_RTO_MAX ((unsigned)(120 * HZ)) /* FIXME: using TCP value */ -extern struct proto dccp_prot; - /* is seq1 < seq2 ? */ static inline int before48(const u64 seq1, const u64 seq2) { @@ -140,53 +138,8 @@ extern unsigned int dccp_sync_mss(struct extern const char *dccp_packet_name(const int type); extern const char *dccp_state_name(const int state); -static inline void dccp_set_state(struct sock *sk, const int state) -{ - const int oldstate = sk->sk_state; - - dccp_pr_debug("%s(%p) %-10.10s -> %s\n", - dccp_role(sk), sk, - dccp_state_name(oldstate), dccp_state_name(state)); - WARN_ON(state == oldstate); - - switch (state) { - case DCCP_OPEN: - if (oldstate != DCCP_OPEN) - DCCP_INC_STATS(DCCP_MIB_CURRESTAB); - break; - - case DCCP_CLOSED: - if (oldstate == DCCP_CLOSING || oldstate == DCCP_OPEN) - DCCP_INC_STATS(DCCP_MIB_ESTABRESETS); - - sk->sk_prot->unhash(sk); - if (inet_csk(sk)->icsk_bind_hash != NULL && - !(sk->sk_userlocks & SOCK_BINDPORT_LOCK)) - inet_put_port(&dccp_hashinfo, sk); - /* fall through */ - default: - if (oldstate == DCCP_OPEN) - DCCP_DEC_STATS(DCCP_MIB_CURRESTAB); - } - - /* Change state AFTER socket is unhashed to avoid closed - * socket sitting in hash tables. - */ - sk->sk_state = state; -} - -static inline void dccp_done(struct sock *sk) -{ - dccp_set_state(sk, DCCP_CLOSED); - dccp_clear_xmit_timers(sk); - - sk->sk_shutdown = SHUTDOWN_MASK; - - if (!sock_flag(sk, SOCK_DEAD)) - sk->sk_state_change(sk); - else - inet_csk_destroy_sock(sk); -} +extern void dccp_set_state(struct sock *sk, const int state); +extern void dccp_done(struct sock *sk); static inline void dccp_openreq_init(struct request_sock *req, struct dccp_sock *dp, @@ -209,10 +162,6 @@ extern struct sock *dccp_create_openreq_ extern int dccp_v4_do_rcv(struct sock *sk, struct sk_buff *skb); -extern void dccp_v4_err(struct sk_buff *skb, u32); - -extern int dccp_v4_rcv(struct sk_buff *skb); - extern struct sock *dccp_v4_request_recv_sock(struct sock *sk, struct sk_buff *skb, struct request_sock *req, @@ -228,19 +177,17 @@ extern int dccp_rcv_state_process(struct extern int dccp_rcv_established(struct sock *sk, struct sk_buff *skb, const struct dccp_hdr *dh, const unsigned len); -extern int dccp_v4_init_sock(struct sock *sk); -extern int dccp_v4_destroy_sock(struct sock *sk); +extern int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized); +extern int dccp_destroy_sock(struct sock *sk); extern void dccp_close(struct sock *sk, long timeout); extern struct sk_buff *dccp_make_response(struct sock *sk, struct dst_entry *dst, struct request_sock *req); -extern struct sk_buff *dccp_make_reset(struct sock *sk, - struct dst_entry *dst, - enum dccp_reset_codes code); extern int dccp_connect(struct sock *sk); extern int dccp_disconnect(struct sock *sk, int flags); +extern void dccp_hash(struct sock *sk); extern void dccp_unhash(struct sock *sk); extern int dccp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); @@ -262,15 +209,14 @@ extern int dccp_v4_connect(struct soc int addr_len); extern int dccp_v4_checksum(const struct sk_buff *skb, - const u32 saddr, const u32 daddr); + const __be32 saddr, const __be32 daddr); -extern int dccp_v4_send_reset(struct sock *sk, - enum dccp_reset_codes code); +extern int dccp_send_reset(struct sock *sk, enum dccp_reset_codes code); extern void dccp_send_close(struct sock *sk, const int active); extern int dccp_invalid_packet(struct sk_buff *skb); static inline int dccp_bad_service_code(const struct sock *sk, - const __u32 service) + const __be32 service) { const struct dccp_sock *dp = dccp_sk(sk); @@ -334,27 +280,16 @@ static inline void dccp_hdr_set_seq(stru { struct dccp_hdr_ext *dhx = (struct dccp_hdr_ext *)((void *)dh + sizeof(*dh)); - -#if defined(__LITTLE_ENDIAN_BITFIELD) - dh->dccph_seq = htonl((gss >> 32)) >> 8; -#elif defined(__BIG_ENDIAN_BITFIELD) - dh->dccph_seq = htonl((gss >> 32)); -#else -#error "Adjust your defines" -#endif + dh->dccph_seq2 = 0; + dh->dccph_seq = htons((gss >> 32) & 0xfffff); dhx->dccph_seq_low = htonl(gss & 0xffffffff); } static inline void dccp_hdr_set_ack(struct dccp_hdr_ack_bits *dhack, const u64 gsr) { -#if defined(__LITTLE_ENDIAN_BITFIELD) - dhack->dccph_ack_nr_high = htonl((gsr >> 32)) >> 8; -#elif defined(__BIG_ENDIAN_BITFIELD) - dhack->dccph_ack_nr_high = htonl((gsr >> 32)); -#else -#error "Adjust your defines" -#endif + dhack->dccph_reserved1 = 0; + dhack->dccph_ack_nr_high = htons(gsr >> 32); dhack->dccph_ack_nr_low = htonl(gsr & 0xffffffff); } @@ -402,8 +337,6 @@ extern void dccp_insert_option(struct so unsigned char option, const void *value, unsigned char len); -extern struct socket *dccp_ctl_socket; - extern void dccp_timestamp(const struct sock *sk, struct timeval *tv); static inline suseconds_t timeval_usecs(const struct timeval *tv) @@ -444,4 +377,18 @@ static inline void timeval_sub_usecs(str } } +#ifdef CONFIG_SYSCTL +extern int dccp_sysctl_init(void); +extern void dccp_sysctl_exit(void); +#else +static inline int dccp_sysctl_init(void) +{ + return 0; +} + +static inline void dccp_sysctl_exit(void) +{ +} +#endif + #endif /* _DCCP_H */ diff -puN /dev/null net/dccp/feat.c --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/net/dccp/feat.c 2006-02-27 22:37:47.000000000 -0800 @@ -0,0 +1,595 @@ +/* + * net/dccp/feat.c + * + * An implementation of the DCCP protocol + * Andrea Bittau + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include + +#include "dccp.h" +#include "ccid.h" +#include "feat.h" + +#define DCCP_FEAT_SP_NOAGREE (-123) + +int dccp_feat_change(struct sock *sk, u8 type, u8 feature, u8 *val, u8 len, + gfp_t gfp) +{ + struct dccp_sock *dp = dccp_sk(sk); + struct dccp_opt_pend *opt; + + dccp_pr_debug("feat change type=%d feat=%d\n", type, feature); + + /* XXX sanity check feat change request */ + + /* check if that feature is already being negotiated */ + list_for_each_entry(opt, &dp->dccps_options.dccpo_pending, + dccpop_node) { + /* ok we found a negotiation for this option already */ + if (opt->dccpop_feat == feature && opt->dccpop_type == type) { + dccp_pr_debug("Replacing old\n"); + /* replace */ + BUG_ON(opt->dccpop_val == NULL); + kfree(opt->dccpop_val); + opt->dccpop_val = val; + opt->dccpop_len = len; + opt->dccpop_conf = 0; + return 0; + } + } + + /* negotiation for a new feature */ + opt = kmalloc(sizeof(*opt), gfp); + if (opt == NULL) + return -ENOMEM; + + opt->dccpop_type = type; + opt->dccpop_feat = feature; + opt->dccpop_len = len; + opt->dccpop_val = val; + opt->dccpop_conf = 0; + opt->dccpop_sc = NULL; + + BUG_ON(opt->dccpop_val == NULL); + + list_add_tail(&opt->dccpop_node, &dp->dccps_options.dccpo_pending); + return 0; +} + +EXPORT_SYMBOL_GPL(dccp_feat_change); + +static int dccp_feat_update_ccid(struct sock *sk, u8 type, u8 new_ccid_nr) +{ + struct dccp_sock *dp = dccp_sk(sk); + /* figure out if we are changing our CCID or the peer's */ + const int rx = type == DCCPO_CHANGE_R; + const u8 ccid_nr = rx ? dp->dccps_options.dccpo_rx_ccid : + dp->dccps_options.dccpo_tx_ccid; + struct ccid *new_ccid; + + /* Check if nothing is being changed. */ + if (ccid_nr == new_ccid_nr) + return 0; + + new_ccid = ccid_new(new_ccid_nr, sk, rx, GFP_ATOMIC); + if (new_ccid == NULL) + return -ENOMEM; + + if (rx) { + ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk); + dp->dccps_hc_rx_ccid = new_ccid; + dp->dccps_options.dccpo_rx_ccid = new_ccid_nr; + } else { + ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk); + dp->dccps_hc_tx_ccid = new_ccid; + dp->dccps_options.dccpo_tx_ccid = new_ccid_nr; + } + + return 0; +} + +/* XXX taking only u8 vals */ +static int dccp_feat_update(struct sock *sk, u8 type, u8 feat, u8 val) +{ + dccp_pr_debug("changing [%d] feat %d to %d\n", type, feat, val); + + switch (feat) { + case DCCPF_CCID: + return dccp_feat_update_ccid(sk, type, val); + default: + dccp_pr_debug("IMPLEMENT changing [%d] feat %d to %d\n", + type, feat, val); + break; + } + return 0; +} + +static int dccp_feat_reconcile(struct sock *sk, struct dccp_opt_pend *opt, + u8 *rpref, u8 rlen) +{ + struct dccp_sock *dp = dccp_sk(sk); + u8 *spref, slen, *res = NULL; + int i, j, rc, agree = 1; + + BUG_ON(rpref == NULL); + + /* check if we are the black sheep */ + if (dp->dccps_role == DCCP_ROLE_CLIENT) { + spref = rpref; + slen = rlen; + rpref = opt->dccpop_val; + rlen = opt->dccpop_len; + } else { + spref = opt->dccpop_val; + slen = opt->dccpop_len; + } + /* + * Now we have server preference list in spref and client preference in + * rpref + */ + BUG_ON(spref == NULL); + BUG_ON(rpref == NULL); + + /* FIXME sanity check vals */ + + /* Are values in any order? XXX Lame "algorithm" here */ + /* XXX assume values are 1 byte */ + for (i = 0; i < slen; i++) { + for (j = 0; j < rlen; j++) { + if (spref[i] == rpref[j]) { + res = &spref[i]; + break; + } + } + if (res) + break; + } + + /* we didn't agree on anything */ + if (res == NULL) { + /* confirm previous value */ + switch (opt->dccpop_feat) { + case DCCPF_CCID: + /* XXX did i get this right? =P */ + if (opt->dccpop_type == DCCPO_CHANGE_L) + res = &dp->dccps_options.dccpo_tx_ccid; + else + res = &dp->dccps_options.dccpo_rx_ccid; + break; + + default: + WARN_ON(1); /* XXX implement res */ + return -EFAULT; + } + + dccp_pr_debug("Don't agree... reconfirming %d\n", *res); + agree = 0; /* this is used for mandatory options... */ + } + + /* need to put result and our preference list */ + /* XXX assume 1 byte vals */ + rlen = 1 + opt->dccpop_len; + rpref = kmalloc(rlen, GFP_ATOMIC); + if (rpref == NULL) + return -ENOMEM; + + *rpref = *res; + memcpy(&rpref[1], opt->dccpop_val, opt->dccpop_len); + + /* put it in the "confirm queue" */ + if (opt->dccpop_sc == NULL) { + opt->dccpop_sc = kmalloc(sizeof(*opt->dccpop_sc), GFP_ATOMIC); + if (opt->dccpop_sc == NULL) { + kfree(rpref); + return -ENOMEM; + } + } else { + /* recycle the confirm slot */ + BUG_ON(opt->dccpop_sc->dccpoc_val == NULL); + kfree(opt->dccpop_sc->dccpoc_val); + dccp_pr_debug("recycling confirm slot\n"); + } + memset(opt->dccpop_sc, 0, sizeof(*opt->dccpop_sc)); + + opt->dccpop_sc->dccpoc_val = rpref; + opt->dccpop_sc->dccpoc_len = rlen; + + /* update the option on our side [we are about to send the confirm] */ + rc = dccp_feat_update(sk, opt->dccpop_type, opt->dccpop_feat, *res); + if (rc) { + kfree(opt->dccpop_sc->dccpoc_val); + kfree(opt->dccpop_sc); + opt->dccpop_sc = 0; + return rc; + } + + dccp_pr_debug("Will confirm %d\n", *rpref); + + /* say we want to change to X but we just got a confirm X, suppress our + * change + */ + if (!opt->dccpop_conf) { + if (*opt->dccpop_val == *res) + opt->dccpop_conf = 1; + dccp_pr_debug("won't ask for change of same feature\n"); + } + + return agree ? 0 : DCCP_FEAT_SP_NOAGREE; /* used for mandatory opts */ +} + +static int dccp_feat_sp(struct sock *sk, u8 type, u8 feature, u8 *val, u8 len) +{ + struct dccp_sock *dp = dccp_sk(sk); + struct dccp_opt_pend *opt; + int rc = 1; + u8 t; + + /* + * We received a CHANGE. We gotta match it against our own preference + * list. If we got a CHANGE_R it means it's a change for us, so we need + * to compare our CHANGE_L list. + */ + if (type == DCCPO_CHANGE_L) + t = DCCPO_CHANGE_R; + else + t = DCCPO_CHANGE_L; + + /* find our preference list for this feature */ + list_for_each_entry(opt, &dp->dccps_options.dccpo_pending, + dccpop_node) { + if (opt->dccpop_type != t || opt->dccpop_feat != feature) + continue; + + /* find the winner from the two preference lists */ + rc = dccp_feat_reconcile(sk, opt, val, len); + break; + } + + /* We didn't deal with the change. This can happen if we have no + * preference list for the feature. In fact, it just shouldn't + * happen---if we understand a feature, we should have a preference list + * with at least the default value. + */ + BUG_ON(rc == 1); + + return rc; +} + +static int dccp_feat_nn(struct sock *sk, u8 type, u8 feature, u8 *val, u8 len) +{ + struct dccp_opt_pend *opt; + struct dccp_sock *dp = dccp_sk(sk); + u8 *copy; + int rc; + + /* NN features must be change L */ + if (type == DCCPO_CHANGE_R) { + dccp_pr_debug("received CHANGE_R %d for NN feat %d\n", + type, feature); + return -EFAULT; + } + + /* XXX sanity check opt val */ + + /* copy option so we can confirm it */ + opt = kzalloc(sizeof(*opt), GFP_ATOMIC); + if (opt == NULL) + return -ENOMEM; + + copy = kmalloc(len, GFP_ATOMIC); + if (copy == NULL) { + kfree(opt); + return -ENOMEM; + } + memcpy(copy, val, len); + + opt->dccpop_type = DCCPO_CONFIRM_R; /* NN can only confirm R */ + opt->dccpop_feat = feature; + opt->dccpop_val = copy; + opt->dccpop_len = len; + + /* change feature */ + rc = dccp_feat_update(sk, type, feature, *val); + if (rc) { + kfree(opt->dccpop_val); + kfree(opt); + return rc; + } + + dccp_pr_debug("Confirming NN feature %d (val=%d)\n", feature, *copy); + list_add_tail(&opt->dccpop_node, &dp->dccps_options.dccpo_conf); + + return 0; +} + +static void dccp_feat_empty_confirm(struct sock *sk, u8 type, u8 feature) +{ + struct dccp_sock *dp = dccp_sk(sk); + /* XXX check if other confirms for that are queued and recycle slot */ + struct dccp_opt_pend *opt = kzalloc(sizeof(*opt), GFP_ATOMIC); + + if (opt == NULL) { + /* XXX what do we do? Ignoring should be fine. It's a change + * after all =P + */ + return; + } + + opt->dccpop_type = type == DCCPO_CHANGE_L ? DCCPO_CONFIRM_R : + DCCPO_CONFIRM_L; + opt->dccpop_feat = feature; + opt->dccpop_val = 0; + opt->dccpop_len = 0; + + /* change feature */ + dccp_pr_debug("Empty confirm feature %d type %d\n", feature, type); + list_add_tail(&opt->dccpop_node, &dp->dccps_options.dccpo_conf); +} + +static void dccp_feat_flush_confirm(struct sock *sk) +{ + struct dccp_sock *dp = dccp_sk(sk); + /* Check if there is anything to confirm in the first place */ + int yes = !list_empty(&dp->dccps_options.dccpo_conf); + + if (!yes) { + struct dccp_opt_pend *opt; + + list_for_each_entry(opt, &dp->dccps_options.dccpo_pending, + dccpop_node) { + if (opt->dccpop_conf) { + yes = 1; + break; + } + } + } + + if (!yes) + return; + + /* OK there is something to confirm... */ + /* XXX check if packet is in flight? Send delayed ack?? */ + if (sk->sk_state == DCCP_OPEN) + dccp_send_ack(sk); +} + +int dccp_feat_change_recv(struct sock *sk, u8 type, u8 feature, u8 *val, u8 len) +{ + int rc; + + dccp_pr_debug("got feat change type=%d feat=%d\n", type, feature); + + /* figure out if it's SP or NN feature */ + switch (feature) { + /* deal with SP features */ + case DCCPF_CCID: + rc = dccp_feat_sp(sk, type, feature, val, len); + break; + + /* deal with NN features */ + case DCCPF_ACK_RATIO: + rc = dccp_feat_nn(sk, type, feature, val, len); + break; + + /* XXX implement other features */ + default: + rc = -EFAULT; + break; + } + + /* check if there were problems changing features */ + if (rc) { + /* If we don't agree on SP, we sent a confirm for old value. + * However we propagate rc to caller in case option was + * mandatory + */ + if (rc != DCCP_FEAT_SP_NOAGREE) + dccp_feat_empty_confirm(sk, type, feature); + } + + /* generate the confirm [if required] */ + dccp_feat_flush_confirm(sk); + + return rc; +} + +EXPORT_SYMBOL_GPL(dccp_feat_change_recv); + +int dccp_feat_confirm_recv(struct sock *sk, u8 type, u8 feature, + u8 *val, u8 len) +{ + u8 t; + struct dccp_opt_pend *opt; + struct dccp_sock *dp = dccp_sk(sk); + int rc = 1; + int all_confirmed = 1; + + dccp_pr_debug("got feat confirm type=%d feat=%d\n", type, feature); + + /* XXX sanity check type & feat */ + + /* locate our change request */ + t = type == DCCPO_CONFIRM_L ? DCCPO_CHANGE_R : DCCPO_CHANGE_L; + + list_for_each_entry(opt, &dp->dccps_options.dccpo_pending, + dccpop_node) { + if (!opt->dccpop_conf && opt->dccpop_type == t && + opt->dccpop_feat == feature) { + /* we found it */ + /* XXX do sanity check */ + + opt->dccpop_conf = 1; + + /* We got a confirmation---change the option */ + dccp_feat_update(sk, opt->dccpop_type, + opt->dccpop_feat, *val); + + dccp_pr_debug("feat %d type %d confirmed %d\n", + feature, type, *val); + rc = 0; + break; + } + + if (!opt->dccpop_conf) + all_confirmed = 0; + } + + /* fix re-transmit timer */ + /* XXX gotta make sure that no option negotiation occurs during + * connection shutdown. Consider that the CLOSEREQ is sent and timer is + * on. if all options are confirmed it might kill timer which should + * remain alive until close is received. + */ + if (all_confirmed) { + dccp_pr_debug("clear feat negotiation timer %p\n", sk); + inet_csk_clear_xmit_timer(sk, ICSK_TIME_RETRANS); + } + + if (rc) + dccp_pr_debug("feat %d type %d never requested\n", + feature, type); + return 0; +} + +EXPORT_SYMBOL_GPL(dccp_feat_confirm_recv); + +void dccp_feat_clean(struct sock *sk) +{ + struct dccp_sock *dp = dccp_sk(sk); + struct dccp_opt_pend *opt, *next; + + list_for_each_entry_safe(opt, next, &dp->dccps_options.dccpo_pending, + dccpop_node) { + BUG_ON(opt->dccpop_val == NULL); + kfree(opt->dccpop_val); + + if (opt->dccpop_sc != NULL) { + BUG_ON(opt->dccpop_sc->dccpoc_val == NULL); + kfree(opt->dccpop_sc->dccpoc_val); + kfree(opt->dccpop_sc); + } + + kfree(opt); + } + INIT_LIST_HEAD(&dp->dccps_options.dccpo_pending); + + list_for_each_entry_safe(opt, next, &dp->dccps_options.dccpo_conf, + dccpop_node) { + BUG_ON(opt == NULL); + if (opt->dccpop_val != NULL) + kfree(opt->dccpop_val); + kfree(opt); + } + INIT_LIST_HEAD(&dp->dccps_options.dccpo_conf); +} + +EXPORT_SYMBOL_GPL(dccp_feat_clean); + +/* this is to be called only when a listening sock creates its child. It is + * assumed by the function---the confirm is not duplicated, but rather it is + * "passed on". + */ +int dccp_feat_clone(struct sock *oldsk, struct sock *newsk) +{ + struct dccp_sock *olddp = dccp_sk(oldsk); + struct dccp_sock *newdp = dccp_sk(newsk); + struct dccp_opt_pend *opt; + int rc = 0; + + INIT_LIST_HEAD(&newdp->dccps_options.dccpo_pending); + INIT_LIST_HEAD(&newdp->dccps_options.dccpo_conf); + + list_for_each_entry(opt, &olddp->dccps_options.dccpo_pending, + dccpop_node) { + struct dccp_opt_pend *newopt; + /* copy the value of the option */ + u8 *val = kmalloc(opt->dccpop_len, GFP_ATOMIC); + + if (val == NULL) + goto out_clean; + memcpy(val, opt->dccpop_val, opt->dccpop_len); + + newopt = kmalloc(sizeof(*newopt), GFP_ATOMIC); + if (newopt == NULL) { + kfree(val); + goto out_clean; + } + + /* insert the option */ + memcpy(newopt, opt, sizeof(*newopt)); + newopt->dccpop_val = val; + list_add_tail(&newopt->dccpop_node, + &newdp->dccps_options.dccpo_pending); + + /* XXX what happens with backlogs and multiple connections at + * once... + */ + /* the master socket no longer needs to worry about confirms */ + opt->dccpop_sc = 0; /* it's not a memleak---new socket has it */ + + /* reset state for a new socket */ + opt->dccpop_conf = 0; + } + + /* XXX not doing anything about the conf queue */ + +out: + return rc; + +out_clean: + dccp_feat_clean(newsk); + rc = -ENOMEM; + goto out; +} + +EXPORT_SYMBOL_GPL(dccp_feat_clone); + +static int __dccp_feat_init(struct sock *sk, u8 type, u8 feat, u8 *val, u8 len) +{ + int rc = -ENOMEM; + u8 *copy = kmalloc(len, GFP_KERNEL); + + if (copy != NULL) { + memcpy(copy, val, len); + rc = dccp_feat_change(sk, type, feat, copy, len, GFP_KERNEL); + if (rc) + kfree(copy); + } + return rc; +} + +int dccp_feat_init(struct sock *sk) +{ + struct dccp_sock *dp = dccp_sk(sk); + int rc; + + INIT_LIST_HEAD(&dp->dccps_options.dccpo_pending); + INIT_LIST_HEAD(&dp->dccps_options.dccpo_conf); + + /* CCID L */ + rc = __dccp_feat_init(sk, DCCPO_CHANGE_L, DCCPF_CCID, + &dp->dccps_options.dccpo_tx_ccid, 1); + if (rc) + goto out; + + /* CCID R */ + rc = __dccp_feat_init(sk, DCCPO_CHANGE_R, DCCPF_CCID, + &dp->dccps_options.dccpo_rx_ccid, 1); + if (rc) + goto out; + + /* Ack ratio */ + rc = __dccp_feat_init(sk, DCCPO_CHANGE_L, DCCPF_ACK_RATIO, + &dp->dccps_options.dccpo_ack_ratio, 1); +out: + return rc; +} + +EXPORT_SYMBOL_GPL(dccp_feat_init); diff -puN /dev/null net/dccp/feat.h --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/net/dccp/feat.h 2006-02-27 22:37:47.000000000 -0800 @@ -0,0 +1,28 @@ +#ifndef _DCCP_FEAT_H +#define _DCCP_FEAT_H +/* + * net/dccp/feat.h + * + * An implementation of the DCCP protocol + * Copyright (c) 2005 Andrea Bittau + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include + +struct sock; + +extern int dccp_feat_change(struct sock *sk, u8 type, u8 feature, + u8 *val, u8 len, gfp_t gfp); +extern int dccp_feat_change_recv(struct sock *sk, u8 type, u8 feature, + u8 *val, u8 len); +extern int dccp_feat_confirm_recv(struct sock *sk, u8 type, u8 feature, + u8 *val, u8 len); +extern void dccp_feat_clean(struct sock *sk); +extern int dccp_feat_clone(struct sock *oldsk, struct sock *newsk); +extern int dccp_feat_init(struct sock *sk); + +#endif /* _DCCP_FEAT_H */ diff -puN net/dccp/input.c~git-net net/dccp/input.c --- devel/net/dccp/input.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/input.c 2006-02-27 22:37:47.000000000 -0800 @@ -32,7 +32,7 @@ static void dccp_fin(struct sock *sk, st static void dccp_rcv_close(struct sock *sk, struct sk_buff *skb) { - dccp_v4_send_reset(sk, DCCP_RESET_CODE_CLOSED); + dccp_send_reset(sk, DCCP_RESET_CODE_CLOSED); dccp_fin(sk, skb); dccp_set_state(sk, DCCP_CLOSED); sk_wake_async(sk, 1, POLL_HUP); @@ -56,7 +56,7 @@ static void dccp_rcv_closereq(struct soc dccp_send_close(sk, 0); } -static inline void dccp_event_ack_recv(struct sock *sk, struct sk_buff *skb) +static void dccp_event_ack_recv(struct sock *sk, struct sk_buff *skb) { struct dccp_sock *dp = dccp_sk(sk); @@ -151,9 +151,8 @@ static int dccp_check_seqno(struct sock return 0; } -static inline int __dccp_rcv_established(struct sock *sk, struct sk_buff *skb, - const struct dccp_hdr *dh, - const unsigned len) +static int __dccp_rcv_established(struct sock *sk, struct sk_buff *skb, + const struct dccp_hdr *dh, const unsigned len) { struct dccp_sock *dp = dccp_sk(sk); @@ -300,6 +299,9 @@ static int dccp_rcv_request_sent_state_p goto out_invalid_packet; } + if (dccp_parse_options(sk, skb)) + goto out_invalid_packet; + if (dp->dccps_options.dccpo_send_ack_vector && dccp_ackvec_add(dp->dccps_hc_rx_ackvec, sk, DCCP_SKB_CB(skb)->dccpd_seq, @@ -321,14 +323,6 @@ static int dccp_rcv_request_sent_state_p dccp_set_seqno(&dp->dccps_swl, max48(dp->dccps_swl, dp->dccps_isr)); - if (ccid_hc_rx_init(dp->dccps_hc_rx_ccid, sk) != 0 || - ccid_hc_tx_init(dp->dccps_hc_tx_ccid, sk) != 0) { - ccid_hc_rx_exit(dp->dccps_hc_rx_ccid, sk); - ccid_hc_tx_exit(dp->dccps_hc_tx_ccid, sk); - /* FIXME: send appropriate RESET code */ - goto out_invalid_packet; - } - dccp_sync_mss(sk, icsk->icsk_pmtu_cookie); /* diff -puN net/dccp/ipv4.c~git-net net/dccp/ipv4.c --- devel/net/dccp/ipv4.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/ipv4.c 2006-02-27 22:37:47.000000000 -0800 @@ -18,8 +18,10 @@ #include #include +#include #include #include +#include #include #include #include @@ -28,14 +30,14 @@ #include "ackvec.h" #include "ccid.h" #include "dccp.h" +#include "feat.h" -struct inet_hashinfo __cacheline_aligned dccp_hashinfo = { - .lhash_lock = RW_LOCK_UNLOCKED, - .lhash_users = ATOMIC_INIT(0), - .lhash_wait = __WAIT_QUEUE_HEAD_INITIALIZER(dccp_hashinfo.lhash_wait), -}; - -EXPORT_SYMBOL_GPL(dccp_hashinfo); +/* + * This is the global socket data structure used for responding to + * the Out-of-the-blue (OOTB) packets. A control sock will be created + * for this socket at the initialization time. + */ +static struct socket *dccp_v4_ctl_socket; static int dccp_v4_get_port(struct sock *sk, const unsigned short snum) { @@ -43,18 +45,6 @@ static int dccp_v4_get_port(struct sock inet_csk_bind_conflict); } -static void dccp_v4_hash(struct sock *sk) -{ - inet_hash(&dccp_hashinfo, sk); -} - -void dccp_unhash(struct sock *sk) -{ - inet_unhash(&dccp_hashinfo, sk); -} - -EXPORT_SYMBOL_GPL(dccp_unhash); - int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) { struct inet_sock *inet = inet_sk(sk); @@ -243,11 +233,11 @@ static void dccp_v4_ctl_send_ack(struct dccp_hdr_set_ack(dccp_hdr_ack_bits(skb), DCCP_SKB_CB(rxskb)->dccpd_seq); - bh_lock_sock(dccp_ctl_socket->sk); - err = ip_build_and_send_pkt(skb, dccp_ctl_socket->sk, + bh_lock_sock(dccp_v4_ctl_socket->sk); + err = ip_build_and_send_pkt(skb, dccp_v4_ctl_socket->sk, rxskb->nh.iph->daddr, rxskb->nh.iph->saddr, NULL); - bh_unlock_sock(dccp_ctl_socket->sk); + bh_unlock_sock(dccp_v4_ctl_socket->sk); if (err == NET_XMIT_CN || err == 0) { DCCP_INC_STATS_BH(DCCP_MIB_OUTSEGS); @@ -275,7 +265,10 @@ static int dccp_v4_send_response(struct skb = dccp_make_response(sk, dst, req); if (skb != NULL) { const struct inet_request_sock *ireq = inet_rsk(req); + struct dccp_hdr *dh = dccp_hdr(skb); + dh->dccph_checksum = dccp_v4_checksum(skb, ireq->loc_addr, + ireq->rmt_addr); memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); err = ip_build_and_send_pkt(skb, sk, ireq->loc_addr, ireq->rmt_addr, @@ -301,7 +294,7 @@ out: * check at all. A more general error queue to queue errors for later handling * is probably better. */ -void dccp_v4_err(struct sk_buff *skb, u32 info) +static void dccp_v4_err(struct sk_buff *skb, u32 info) { const struct iphdr *iph = (struct iphdr *)skb->data; const struct dccp_hdr *dh = (struct dccp_hdr *)(skb->data + @@ -456,32 +449,6 @@ void dccp_v4_send_check(struct sock *sk, EXPORT_SYMBOL_GPL(dccp_v4_send_check); -int dccp_v4_send_reset(struct sock *sk, enum dccp_reset_codes code) -{ - struct sk_buff *skb; - /* - * FIXME: what if rebuild_header fails? - * Should we be doing a rebuild_header here? - */ - int err = inet_sk_rebuild_header(sk); - - if (err != 0) - return err; - - skb = dccp_make_reset(sk, sk->sk_dst_cache, code); - if (skb != NULL) { - const struct inet_sock *inet = inet_sk(sk); - - memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); - err = ip_build_and_send_pkt(skb, sk, - inet->saddr, inet->daddr, NULL); - if (err == NET_XMIT_CN) - err = 0; - } - - return err; -} - static inline u64 dccp_v4_init_sequence(const struct sock *sk, const struct sk_buff *skb) { @@ -497,9 +464,9 @@ int dccp_v4_conn_request(struct sock *sk struct dccp_sock dp; struct request_sock *req; struct dccp_request_sock *dreq; - const __u32 saddr = skb->nh.iph->saddr; - const __u32 daddr = skb->nh.iph->daddr; - const __u32 service = dccp_hdr_request(skb)->dccph_req_service; + const __be32 saddr = skb->nh.iph->saddr; + const __be32 daddr = skb->nh.iph->daddr; + const __be32 service = dccp_hdr_request(skb)->dccph_req_service; struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb); __u8 reset_code = DCCP_RESET_CODE_TOO_BUSY; @@ -535,7 +502,8 @@ int dccp_v4_conn_request(struct sock *sk if (req == NULL) goto drop; - /* FIXME: process options */ + if (dccp_parse_options(sk, skb)) + goto drop; dccp_openreq_init(req, &dp, skb); @@ -660,8 +628,8 @@ static struct sock *dccp_v4_hnd_req(stru return sk; } -int dccp_v4_checksum(const struct sk_buff *skb, const u32 saddr, - const u32 daddr) +int dccp_v4_checksum(const struct sk_buff *skb, const __be32 saddr, + const __be32 daddr) { const struct dccp_hdr* dh = dccp_hdr(skb); int checksum_len; @@ -680,8 +648,10 @@ int dccp_v4_checksum(const struct sk_buf IPPROTO_DCCP, tmp); } +EXPORT_SYMBOL_GPL(dccp_v4_checksum); + static int dccp_v4_verify_checksum(struct sk_buff *skb, - const u32 saddr, const u32 daddr) + const __be32 saddr, const __be32 daddr) { struct dccp_hdr *dh = dccp_hdr(skb); int checksum_len; @@ -741,7 +711,7 @@ static void dccp_v4_ctl_send_reset(struc if (((struct rtable *)rxskb->dst)->rt_type != RTN_LOCAL) return; - dst = dccp_v4_route_skb(dccp_ctl_socket->sk, rxskb); + dst = dccp_v4_route_skb(dccp_v4_ctl_socket->sk, rxskb); if (dst == NULL) return; @@ -778,11 +748,11 @@ static void dccp_v4_ctl_send_reset(struc dh->dccph_checksum = dccp_v4_checksum(skb, rxskb->nh.iph->saddr, rxskb->nh.iph->daddr); - bh_lock_sock(dccp_ctl_socket->sk); - err = ip_build_and_send_pkt(skb, dccp_ctl_socket->sk, + bh_lock_sock(dccp_v4_ctl_socket->sk); + err = ip_build_and_send_pkt(skb, dccp_v4_ctl_socket->sk, rxskb->nh.iph->daddr, rxskb->nh.iph->saddr, NULL); - bh_unlock_sock(dccp_ctl_socket->sk); + bh_unlock_sock(dccp_v4_ctl_socket->sk); if (err == NET_XMIT_CN || err == 0) { DCCP_INC_STATS_BH(DCCP_MIB_OUTSEGS); @@ -912,7 +882,7 @@ int dccp_invalid_packet(struct sk_buff * EXPORT_SYMBOL_GPL(dccp_invalid_packet); /* this is called when real data arrives */ -int dccp_v4_rcv(struct sk_buff *skb) +static int dccp_v4_rcv(struct sk_buff *skb) { const struct dccp_hdr *dh; struct sock *sk; @@ -1019,7 +989,7 @@ do_time_wait: goto no_dccp_socket; } -struct inet_connection_sock_af_ops dccp_ipv4_af_ops = { +static struct inet_connection_sock_af_ops dccp_ipv4_af_ops = { .queue_xmit = ip_queue_xmit, .send_check = dccp_v4_send_check, .rebuild_header = inet_sk_rebuild_header, @@ -1032,98 +1002,20 @@ struct inet_connection_sock_af_ops dccp_ .sockaddr_len = sizeof(struct sockaddr_in), }; -int dccp_v4_init_sock(struct sock *sk) +static int dccp_v4_init_sock(struct sock *sk) { - struct dccp_sock *dp = dccp_sk(sk); - struct inet_connection_sock *icsk = inet_csk(sk); - static int dccp_ctl_socket_init = 1; - - dccp_options_init(&dp->dccps_options); - do_gettimeofday(&dp->dccps_epoch); + static __u8 dccp_v4_ctl_sock_initialized; + int err = dccp_init_sock(sk, dccp_v4_ctl_sock_initialized); - if (dp->dccps_options.dccpo_send_ack_vector) { - dp->dccps_hc_rx_ackvec = dccp_ackvec_alloc(DCCP_MAX_ACKVEC_LEN, - GFP_KERNEL); - if (dp->dccps_hc_rx_ackvec == NULL) - return -ENOMEM; + if (err == 0) { + if (unlikely(!dccp_v4_ctl_sock_initialized)) + dccp_v4_ctl_sock_initialized = 1; + inet_csk(sk)->icsk_af_ops = &dccp_ipv4_af_ops; } - /* - * FIXME: We're hardcoding the CCID, and doing this at this point makes - * the listening (master) sock get CCID control blocks, which is not - * necessary, but for now, to not mess with the test userspace apps, - * lets leave it here, later the real solution is to do this in a - * setsockopt(CCIDs-I-want/accept). -acme - */ - if (likely(!dccp_ctl_socket_init)) { - dp->dccps_hc_rx_ccid = ccid_init(dp->dccps_options.dccpo_rx_ccid, - sk); - dp->dccps_hc_tx_ccid = ccid_init(dp->dccps_options.dccpo_tx_ccid, - sk); - if (dp->dccps_hc_rx_ccid == NULL || - dp->dccps_hc_tx_ccid == NULL) { - ccid_exit(dp->dccps_hc_rx_ccid, sk); - ccid_exit(dp->dccps_hc_tx_ccid, sk); - if (dp->dccps_options.dccpo_send_ack_vector) { - dccp_ackvec_free(dp->dccps_hc_rx_ackvec); - dp->dccps_hc_rx_ackvec = NULL; - } - dp->dccps_hc_rx_ccid = dp->dccps_hc_tx_ccid = NULL; - return -ENOMEM; - } - } else - dccp_ctl_socket_init = 0; - - dccp_init_xmit_timers(sk); - icsk->icsk_rto = DCCP_TIMEOUT_INIT; - sk->sk_state = DCCP_CLOSED; - sk->sk_write_space = dccp_write_space; - icsk->icsk_af_ops = &dccp_ipv4_af_ops; - icsk->icsk_sync_mss = dccp_sync_mss; - dp->dccps_mss_cache = 536; - dp->dccps_role = DCCP_ROLE_UNDEFINED; - dp->dccps_service = DCCP_SERVICE_INVALID_VALUE; - - return 0; -} - -EXPORT_SYMBOL_GPL(dccp_v4_init_sock); - -int dccp_v4_destroy_sock(struct sock *sk) -{ - struct dccp_sock *dp = dccp_sk(sk); - - /* - * DCCP doesn't use sk_write_queue, just sk_send_head - * for retransmissions - */ - if (sk->sk_send_head != NULL) { - kfree_skb(sk->sk_send_head); - sk->sk_send_head = NULL; - } - - /* Clean up a referenced DCCP bind bucket. */ - if (inet_csk(sk)->icsk_bind_hash != NULL) - inet_put_port(&dccp_hashinfo, sk); - - kfree(dp->dccps_service_list); - dp->dccps_service_list = NULL; - - ccid_hc_rx_exit(dp->dccps_hc_rx_ccid, sk); - ccid_hc_tx_exit(dp->dccps_hc_tx_ccid, sk); - if (dp->dccps_options.dccpo_send_ack_vector) { - dccp_ackvec_free(dp->dccps_hc_rx_ackvec); - dp->dccps_hc_rx_ackvec = NULL; - } - ccid_exit(dp->dccps_hc_rx_ccid, sk); - ccid_exit(dp->dccps_hc_tx_ccid, sk); - dp->dccps_hc_rx_ccid = dp->dccps_hc_tx_ccid = NULL; - - return 0; + return err; } -EXPORT_SYMBOL_GPL(dccp_v4_destroy_sock); - static void dccp_v4_reqsk_destructor(struct request_sock *req) { kfree(inet_rsk(req)->opt); @@ -1142,7 +1034,7 @@ static struct timewait_sock_ops dccp_tim .twsk_obj_size = sizeof(struct inet_timewait_sock), }; -struct proto dccp_prot = { +static struct proto dccp_v4_prot = { .name = "DCCP", .owner = THIS_MODULE, .close = dccp_close, @@ -1155,12 +1047,12 @@ struct proto dccp_prot = { .sendmsg = dccp_sendmsg, .recvmsg = dccp_recvmsg, .backlog_rcv = dccp_v4_do_rcv, - .hash = dccp_v4_hash, + .hash = dccp_hash, .unhash = dccp_unhash, .accept = inet_csk_accept, .get_port = dccp_v4_get_port, .shutdown = dccp_shutdown, - .destroy = dccp_v4_destroy_sock, + .destroy = dccp_destroy_sock, .orphan_count = &dccp_orphan_count, .max_header = MAX_DCCP_HEADER, .obj_size = sizeof(struct dccp_sock), @@ -1168,4 +1060,89 @@ struct proto dccp_prot = { .twsk_prot = &dccp_timewait_sock_ops, }; -EXPORT_SYMBOL_GPL(dccp_prot); +static struct net_protocol dccp_v4_protocol = { + .handler = dccp_v4_rcv, + .err_handler = dccp_v4_err, + .no_policy = 1, +}; + +static const struct proto_ops inet_dccp_ops = { + .family = PF_INET, + .owner = THIS_MODULE, + .release = inet_release, + .bind = inet_bind, + .connect = inet_stream_connect, + .socketpair = sock_no_socketpair, + .accept = inet_accept, + .getname = inet_getname, + /* FIXME: work on tcp_poll to rename it to inet_csk_poll */ + .poll = dccp_poll, + .ioctl = inet_ioctl, + /* FIXME: work on inet_listen to rename it to sock_common_listen */ + .listen = inet_dccp_listen, + .shutdown = inet_shutdown, + .setsockopt = sock_common_setsockopt, + .getsockopt = sock_common_getsockopt, + .sendmsg = inet_sendmsg, + .recvmsg = sock_common_recvmsg, + .mmap = sock_no_mmap, + .sendpage = sock_no_sendpage, +}; + +static struct inet_protosw dccp_v4_protosw = { + .type = SOCK_DCCP, + .protocol = IPPROTO_DCCP, + .prot = &dccp_v4_prot, + .ops = &inet_dccp_ops, + .capability = -1, + .no_check = 0, + .flags = INET_PROTOSW_ICSK, +}; + +static int __init dccp_v4_init(void) +{ + int err = proto_register(&dccp_v4_prot, 1); + + if (err != 0) + goto out; + + err = inet_add_protocol(&dccp_v4_protocol, IPPROTO_DCCP); + if (err != 0) + goto out_proto_unregister; + + inet_register_protosw(&dccp_v4_protosw); + + err = inet_csk_ctl_sock_create(&dccp_v4_ctl_socket, PF_INET, + SOCK_DCCP, IPPROTO_DCCP); + if (err) + goto out_unregister_protosw; +out: + return err; +out_unregister_protosw: + inet_unregister_protosw(&dccp_v4_protosw); + inet_del_protocol(&dccp_v4_protocol, IPPROTO_DCCP); +out_proto_unregister: + proto_unregister(&dccp_v4_prot); + goto out; +} + +static void __exit dccp_v4_exit(void) +{ + inet_unregister_protosw(&dccp_v4_protosw); + inet_del_protocol(&dccp_v4_protocol, IPPROTO_DCCP); + proto_unregister(&dccp_v4_prot); +} + +module_init(dccp_v4_init); +module_exit(dccp_v4_exit); + +/* + * __stringify doesn't likes enums, so use SOCK_DCCP (6) and IPPROTO_DCCP (33) + * values directly, Also cover the case where the protocol is not specified, + * i.e. net-pf-PF_INET-proto-0-type-SOCK_DCCP + */ +MODULE_ALIAS("net-pf-" __stringify(PF_INET) "-proto-33-type-6"); +MODULE_ALIAS("net-pf-" __stringify(PF_INET) "-proto-0-type-6"); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Arnaldo Carvalho de Melo "); +MODULE_DESCRIPTION("DCCP - Datagram Congestion Controlled Protocol"); diff -puN net/dccp/ipv6.c~git-net net/dccp/ipv6.c --- devel/net/dccp/ipv6.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/ipv6.c 2006-02-27 22:37:47.000000000 -0800 @@ -33,6 +33,9 @@ #include "dccp.h" #include "ipv6.h" +/* Socket used for sending RSTs and ACKs */ +static struct socket *dccp_v6_ctl_socket; + static void dccp_v6_ctl_send_reset(struct sk_buff *skb); static void dccp_v6_reqsk_send_ack(struct sk_buff *skb, struct request_sock *req); @@ -53,7 +56,7 @@ static void dccp_v6_hash(struct sock *sk { if (sk->sk_state != DCCP_CLOSED) { if (inet_csk(sk)->icsk_af_ops == &dccp_ipv6_mapped) { - dccp_prot.hash(sk); + dccp_hash(sk); return; } local_bh_disable(); @@ -264,7 +267,7 @@ failure: } static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, - int type, int code, int offset, __u32 info) + int type, int code, int offset, __be32 info) { struct ipv6hdr *hdr = (struct ipv6hdr *)skb->data; const struct dccp_hdr *dh = (struct dccp_hdr *)(skb->data + offset); @@ -568,7 +571,7 @@ static void dccp_v6_ctl_send_reset(struc /* sk = NULL, but it is safe for now. RST socket required. */ if (!ip6_dst_lookup(NULL, &skb->dst, &fl)) { if (xfrm_lookup(&skb->dst, &fl, NULL, 0) >= 0) { - ip6_xmit(NULL, skb, &fl, NULL, 0); + ip6_xmit(dccp_v6_ctl_socket->sk, skb, &fl, NULL, 0); DCCP_INC_STATS_BH(DCCP_MIB_OUTSEGS); DCCP_INC_STATS_BH(DCCP_MIB_OUTRSTS); return; @@ -623,7 +626,7 @@ static void dccp_v6_ctl_send_ack(struct if (!ip6_dst_lookup(NULL, &skb->dst, &fl)) { if (xfrm_lookup(&skb->dst, &fl, NULL, 0) >= 0) { - ip6_xmit(NULL, skb, &fl, NULL, 0); + ip6_xmit(dccp_v6_ctl_socket->sk, skb, &fl, NULL, 0); DCCP_INC_STATS_BH(DCCP_MIB_OUTSEGS); return; } @@ -678,7 +681,7 @@ static int dccp_v6_conn_request(struct s struct dccp_request_sock *dreq; struct inet6_request_sock *ireq6; struct ipv6_pinfo *np = inet6_sk(sk); - const __u32 service = dccp_hdr_request(skb)->dccph_req_service; + const __be32 service = dccp_hdr_request(skb)->dccph_req_service; struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb); __u8 reset_code = DCCP_RESET_CODE_TOO_BUSY; @@ -1146,17 +1149,21 @@ static struct inet_connection_sock_af_op */ static int dccp_v6_init_sock(struct sock *sk) { - int err = dccp_v4_init_sock(sk); + static __u8 dccp_v6_ctl_sock_initialized; + int err = dccp_init_sock(sk, dccp_v6_ctl_sock_initialized); - if (err == 0) + if (err == 0) { + if (unlikely(!dccp_v6_ctl_sock_initialized)) + dccp_v6_ctl_sock_initialized = 1; inet_csk(sk)->icsk_af_ops = &dccp_ipv6_af_ops; + } return err; } static int dccp_v6_destroy_sock(struct sock *sk) { - dccp_v4_destroy_sock(sk); + dccp_destroy_sock(sk); return inet6_destroy_sock(sk); } @@ -1234,8 +1241,16 @@ static int __init dccp_v6_init(void) goto out_unregister_proto; inet6_register_protosw(&dccp_v6_protosw); + + err = inet_csk_ctl_sock_create(&dccp_v6_ctl_socket, PF_INET6, + SOCK_DCCP, IPPROTO_DCCP); + if (err != 0) + goto out_unregister_protosw; out: return err; +out_unregister_protosw: + inet6_del_protocol(&dccp_v6_protocol, IPPROTO_DCCP); + inet6_unregister_protosw(&dccp_v6_protosw); out_unregister_proto: proto_unregister(&dccp_v6_prot); goto out; diff -puN net/dccp/Kconfig~git-net net/dccp/Kconfig --- devel/net/dccp/Kconfig~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/Kconfig 2006-02-27 22:37:47.000000000 -0800 @@ -24,6 +24,10 @@ config INET_DCCP_DIAG def_tristate y if (IP_DCCP = y && INET_DIAG = y) def_tristate m +config IP_DCCP_ACKVEC + depends on IP_DCCP + def_bool N + source "net/dccp/ccids/Kconfig" menu "DCCP Kernel Hacking" @@ -36,15 +40,6 @@ config IP_DCCP_DEBUG Just say N. -config IP_DCCP_UNLOAD_HACK - depends on IP_DCCP=m && IP_DCCP_CCID3=m - bool "DCCP control sock unload hack" - ---help--- - Enable this to be able to unload the dccp module when the it - has only one refcount held, the control sock one. Just execute - "rmmod dccp_ccid3 dccp" - - Just say N. endmenu endmenu diff -puN net/dccp/Makefile~git-net net/dccp/Makefile --- devel/net/dccp/Makefile~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/Makefile 2006-02-27 22:37:47.000000000 -0800 @@ -2,15 +2,18 @@ obj-$(CONFIG_IPV6) += dccp_ipv6.o dccp_ipv6-y := ipv6.o -obj-$(CONFIG_IP_DCCP) += dccp.o +obj-$(CONFIG_IP_DCCP) += dccp.o dccp_ipv4.o -dccp-y := ccid.o input.o ipv4.o minisocks.o options.o output.o proto.o \ - timer.o +dccp-y := ccid.o feat.o input.o minisocks.o options.o output.o proto.o timer.o + +dccp_ipv4-y := ipv4.o dccp-$(CONFIG_IP_DCCP_ACKVEC) += ackvec.o obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o +dccp-$(CONFIG_SYSCTL) += sysctl.o + dccp_diag-y := diag.o obj-y += ccids/ diff -puN net/dccp/minisocks.c~git-net net/dccp/minisocks.c --- devel/net/dccp/minisocks.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/minisocks.c 2006-02-27 22:37:47.000000000 -0800 @@ -22,6 +22,7 @@ #include "ackvec.h" #include "ccid.h" #include "dccp.h" +#include "feat.h" struct inet_timewait_death_row dccp_death_row = { .sysctl_max_tw_buckets = NR_FILE * 2, @@ -114,27 +115,27 @@ struct sock *dccp_create_openreq_child(s newicsk->icsk_rto = DCCP_TIMEOUT_INIT; do_gettimeofday(&newdp->dccps_epoch); + if (dccp_feat_clone(sk, newsk)) + goto out_free; + if (newdp->dccps_options.dccpo_send_ack_vector) { newdp->dccps_hc_rx_ackvec = - dccp_ackvec_alloc(DCCP_MAX_ACKVEC_LEN, - GFP_ATOMIC); - /* - * XXX: We're using the same CCIDs set on the parent, - * i.e. sk_clone copied the master sock and left the - * CCID pointers for this child, that is why we do the - * __ccid_get calls. - */ + dccp_ackvec_alloc(GFP_ATOMIC); if (unlikely(newdp->dccps_hc_rx_ackvec == NULL)) goto out_free; } - if (unlikely(ccid_hc_rx_init(newdp->dccps_hc_rx_ccid, - newsk) != 0 || - ccid_hc_tx_init(newdp->dccps_hc_tx_ccid, - newsk) != 0)) { + newdp->dccps_hc_rx_ccid = + ccid_hc_rx_new(newdp->dccps_options.dccpo_rx_ccid, + newsk, GFP_ATOMIC); + newdp->dccps_hc_tx_ccid = + ccid_hc_tx_new(newdp->dccps_options.dccpo_tx_ccid, + newsk, GFP_ATOMIC); + if (unlikely(newdp->dccps_hc_rx_ccid == NULL || + newdp->dccps_hc_tx_ccid == NULL)) { dccp_ackvec_free(newdp->dccps_hc_rx_ackvec); - ccid_hc_rx_exit(newdp->dccps_hc_rx_ccid, newsk); - ccid_hc_tx_exit(newdp->dccps_hc_tx_ccid, newsk); + ccid_hc_rx_delete(newdp->dccps_hc_rx_ccid, newsk); + ccid_hc_tx_delete(newdp->dccps_hc_tx_ccid, newsk); out_free: /* It is still raw copy of parent, so invalidate * destructor and make plain sk_free() */ @@ -143,9 +144,6 @@ out_free: return NULL; } - __ccid_get(newdp->dccps_hc_rx_ccid); - __ccid_get(newdp->dccps_hc_tx_ccid); - /* * Step 3: Process LISTEN state * diff -puN net/dccp/options.c~git-net net/dccp/options.c --- devel/net/dccp/options.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/options.c 2006-02-27 22:37:47.000000000 -0800 @@ -21,19 +21,23 @@ #include "ackvec.h" #include "ccid.h" #include "dccp.h" +#include "feat.h" -/* stores the default values for new connection. may be changed with sysctl */ -static const struct dccp_options dccpo_default_values = { - .dccpo_sequence_window = DCCPF_INITIAL_SEQUENCE_WINDOW, - .dccpo_rx_ccid = DCCPF_INITIAL_CCID, - .dccpo_tx_ccid = DCCPF_INITIAL_CCID, - .dccpo_send_ack_vector = DCCPF_INITIAL_SEND_ACK_VECTOR, - .dccpo_send_ndp_count = DCCPF_INITIAL_SEND_NDP_COUNT, -}; +int dccp_feat_default_sequence_window = DCCPF_INITIAL_SEQUENCE_WINDOW; +int dccp_feat_default_rx_ccid = DCCPF_INITIAL_CCID; +int dccp_feat_default_tx_ccid = DCCPF_INITIAL_CCID; +int dccp_feat_default_ack_ratio = DCCPF_INITIAL_ACK_RATIO; +int dccp_feat_default_send_ack_vector = DCCPF_INITIAL_SEND_ACK_VECTOR; +int dccp_feat_default_send_ndp_count = DCCPF_INITIAL_SEND_NDP_COUNT; void dccp_options_init(struct dccp_options *dccpo) { - memcpy(dccpo, &dccpo_default_values, sizeof(*dccpo)); + dccpo->dccpo_sequence_window = dccp_feat_default_sequence_window; + dccpo->dccpo_rx_ccid = dccp_feat_default_rx_ccid; + dccpo->dccpo_tx_ccid = dccp_feat_default_tx_ccid; + dccpo->dccpo_ack_ratio = dccp_feat_default_ack_ratio; + dccpo->dccpo_send_ack_vector = dccp_feat_default_send_ack_vector; + dccpo->dccpo_send_ndp_count = dccp_feat_default_send_ndp_count; } static u32 dccp_decode_value_var(const unsigned char *bf, const u8 len) @@ -69,6 +73,8 @@ int dccp_parse_options(struct sock *sk, unsigned char opt, len; unsigned char *value; u32 elapsed_time; + int rc; + int mandatory = 0; memset(opt_recv, 0, sizeof(*opt_recv)); @@ -100,6 +106,11 @@ int dccp_parse_options(struct sock *sk, switch (opt) { case DCCPO_PADDING: break; + case DCCPO_MANDATORY: + if (mandatory) + goto out_invalid_option; + mandatory = 1; + break; case DCCPO_NDP_COUNT: if (len > 3) goto out_invalid_option; @@ -108,6 +119,31 @@ int dccp_parse_options(struct sock *sk, dccp_pr_debug("%sNDP count=%d\n", debug_prefix, opt_recv->dccpor_ndp); break; + case DCCPO_CHANGE_L: + /* fall through */ + case DCCPO_CHANGE_R: + if (len < 2) + goto out_invalid_option; + rc = dccp_feat_change_recv(sk, opt, *value, value + 1, + len - 1); + /* + * When there is a change error, change_recv is + * responsible for dealing with it. i.e. reply with an + * empty confirm. + * If the change was mandatory, then we need to die. + */ + if (rc && mandatory) + goto out_invalid_option; + break; + case DCCPO_CONFIRM_L: + /* fall through */ + case DCCPO_CONFIRM_R: + if (len < 2) + goto out_invalid_option; + if (dccp_feat_confirm_recv(sk, opt, *value, + value + 1, len - 1)) + goto out_invalid_option; + break; case DCCPO_ACK_VECTOR_0: case DCCPO_ACK_VECTOR_1: if (pkt_type == DCCP_PKT_DATA) @@ -121,7 +157,7 @@ int dccp_parse_options(struct sock *sk, if (len != 4) goto out_invalid_option; - opt_recv->dccpor_timestamp = ntohl(*(u32 *)value); + opt_recv->dccpor_timestamp = ntohl(*(__be32 *)value); dp->dccps_timestamp_echo = opt_recv->dccpor_timestamp; dccp_timestamp(sk, &dp->dccps_timestamp_time); @@ -135,7 +171,7 @@ int dccp_parse_options(struct sock *sk, if (len != 4 && len != 6 && len != 8) goto out_invalid_option; - opt_recv->dccpor_timestamp_echo = ntohl(*(u32 *)value); + opt_recv->dccpor_timestamp_echo = ntohl(*(__be32 *)value); dccp_pr_debug("%sTIMESTAMP_ECHO=%u, len=%d, ackno=%llu, ", debug_prefix, @@ -149,9 +185,9 @@ int dccp_parse_options(struct sock *sk, break; if (len == 6) - elapsed_time = ntohs(*(u16 *)(value + 4)); + elapsed_time = ntohs(*(__be16 *)(value + 4)); else - elapsed_time = ntohl(*(u32 *)(value + 4)); + elapsed_time = ntohl(*(__be32 *)(value + 4)); /* Give precedence to the biggest ELAPSED_TIME */ if (elapsed_time > opt_recv->dccpor_elapsed_time) @@ -165,9 +201,9 @@ int dccp_parse_options(struct sock *sk, continue; if (len == 2) - elapsed_time = ntohs(*(u16 *)value); + elapsed_time = ntohs(*(__be16 *)value); else - elapsed_time = ntohl(*(u32 *)value); + elapsed_time = ntohl(*(__be32 *)value); if (elapsed_time > opt_recv->dccpor_elapsed_time) opt_recv->dccpor_elapsed_time = elapsed_time; @@ -208,6 +244,9 @@ int dccp_parse_options(struct sock *sk, sk, opt, len); break; } + + if (opt != DCCPO_MANDATORY) + mandatory = 0; } return 0; @@ -219,6 +258,8 @@ out_invalid_option: return -1; } +EXPORT_SYMBOL_GPL(dccp_parse_options); + static void dccp_encode_value_var(const u32 value, unsigned char *to, const unsigned int len) { @@ -321,10 +362,10 @@ void dccp_insert_option_elapsed_time(str *to++ = len; if (elapsed_time_len == 2) { - const u16 var16 = htons((u16)elapsed_time); + const __be16 var16 = htons((u16)elapsed_time); memcpy(to, &var16, 2); } else { - const u32 var32 = htonl(elapsed_time); + const __be32 var32 = htonl(elapsed_time); memcpy(to, &var32, 4); } @@ -355,14 +396,13 @@ EXPORT_SYMBOL_GPL(dccp_timestamp); void dccp_insert_option_timestamp(struct sock *sk, struct sk_buff *skb) { struct timeval tv; - u32 now; - + __be32 now; + dccp_timestamp(sk, &tv); - now = timeval_usecs(&tv) / 10; + now = htonl(timeval_usecs(&tv) / 10); /* yes this will overflow but that is the point as we want a * 10 usec 32 bit timer which mean it wraps every 11.9 hours */ - now = htonl(now); dccp_insert_option(sk, skb, DCCPO_TIMESTAMP, &now, sizeof(now)); } @@ -377,7 +417,7 @@ static void dccp_insert_option_timestamp "CLIENT TX opt: " : "server TX opt: "; #endif struct timeval now; - u32 tstamp_echo; + __be32 tstamp_echo; u32 elapsed_time; int len, elapsed_time_len; unsigned char *to; @@ -402,12 +442,12 @@ static void dccp_insert_option_timestamp tstamp_echo = htonl(dp->dccps_timestamp_echo); memcpy(to, &tstamp_echo, 4); to += 4; - + if (elapsed_time_len == 2) { - const u16 var16 = htons((u16)elapsed_time); + const __be16 var16 = htons((u16)elapsed_time); memcpy(to, &var16, 2); } else if (elapsed_time_len == 4) { - const u32 var32 = htonl(elapsed_time); + const __be32 var32 = htonl(elapsed_time); memcpy(to, &var32, 4); } @@ -421,6 +461,93 @@ static void dccp_insert_option_timestamp dp->dccps_timestamp_time.tv_usec = 0; } +static int dccp_insert_feat_opt(struct sk_buff *skb, u8 type, u8 feat, + u8 *val, u8 len) +{ + u8 *to; + + if (DCCP_SKB_CB(skb)->dccpd_opt_len + len + 3 > DCCP_MAX_OPT_LEN) { + LIMIT_NETDEBUG(KERN_INFO "DCCP: packet too small" + " to insert feature %d option!\n", feat); + return -1; + } + + DCCP_SKB_CB(skb)->dccpd_opt_len += len + 3; + + to = skb_push(skb, len + 3); + *to++ = type; + *to++ = len + 3; + *to++ = feat; + + if (len) + memcpy(to, val, len); + dccp_pr_debug("option %d feat %d len %d\n", type, feat, len); + + return 0; +} + +static void dccp_insert_feat(struct sock *sk, struct sk_buff *skb) +{ + struct dccp_sock *dp = dccp_sk(sk); + struct dccp_opt_pend *opt, *next; + int change = 0; + + /* confirm any options [NN opts] */ + list_for_each_entry_safe(opt, next, &dp->dccps_options.dccpo_conf, + dccpop_node) { + dccp_insert_feat_opt(skb, opt->dccpop_type, + opt->dccpop_feat, opt->dccpop_val, + opt->dccpop_len); + /* fear empty confirms */ + if (opt->dccpop_val) + kfree(opt->dccpop_val); + kfree(opt); + } + INIT_LIST_HEAD(&dp->dccps_options.dccpo_conf); + + /* see which features we need to send */ + list_for_each_entry(opt, &dp->dccps_options.dccpo_pending, + dccpop_node) { + /* see if we need to send any confirm */ + if (opt->dccpop_sc) { + dccp_insert_feat_opt(skb, opt->dccpop_type + 1, + opt->dccpop_feat, + opt->dccpop_sc->dccpoc_val, + opt->dccpop_sc->dccpoc_len); + + BUG_ON(!opt->dccpop_sc->dccpoc_val); + kfree(opt->dccpop_sc->dccpoc_val); + kfree(opt->dccpop_sc); + opt->dccpop_sc = NULL; + } + + /* any option not confirmed, re-send it */ + if (!opt->dccpop_conf) { + dccp_insert_feat_opt(skb, opt->dccpop_type, + opt->dccpop_feat, opt->dccpop_val, + opt->dccpop_len); + change++; + } + } + + /* Retransmit timer. + * If this is the master listening sock, we don't set a timer on it. It + * should be fine because if the dude doesn't receive our RESPONSE + * [which will contain the CHANGE] he will send another REQUEST which + * will "retrnasmit" the change. + */ + if (change && dp->dccps_role != DCCP_ROLE_LISTEN) { + dccp_pr_debug("reset feat negotiation timer %p\n", sk); + + /* XXX don't reset the timer on re-transmissions. I.e. reset it + * only when sending new stuff i guess. Currently the timer + * never backs off because on re-transmission it just resets it! + */ + inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, + inet_csk(sk)->icsk_rto, DCCP_RTO_MAX); + } +} + void dccp_insert_options(struct sock *sk, struct sk_buff *skb) { struct dccp_sock *dp = dccp_sk(sk); @@ -447,6 +574,17 @@ void dccp_insert_options(struct sock *sk dp->dccps_hc_tx_insert_options = 0; } + /* Feature negotiation */ + switch(DCCP_SKB_CB(skb)->dccpd_type) { + /* Data packets can't do feat negotiation */ + case DCCP_PKT_DATA: + case DCCP_PKT_DATAACK: + break; + default: + dccp_insert_feat(sk, skb); + break; + } + /* XXX: insert other options when appropriate */ if (DCCP_SKB_CB(skb)->dccpd_opt_len != 0) { diff -puN net/dccp/output.c~git-net net/dccp/output.c --- devel/net/dccp/output.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/output.c 2006-02-27 22:37:47.000000000 -0800 @@ -27,7 +27,7 @@ static inline void dccp_event_ack_sent(s inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK); } -static inline void dccp_skb_entail(struct sock *sk, struct sk_buff *skb) +static void dccp_skb_entail(struct sock *sk, struct sk_buff *skb) { skb_set_owner_w(skb, sk); WARN_ON(sk->sk_send_head); @@ -64,6 +64,10 @@ static int dccp_transmit_skb(struct sock case DCCP_PKT_DATAACK: break; + case DCCP_PKT_REQUEST: + set_ack = 0; + /* fall through */ + case DCCP_PKT_SYNC: case DCCP_PKT_SYNCACK: ackno = dcb->dccpd_seq; @@ -310,17 +314,14 @@ struct sk_buff *dccp_make_response(struc dccp_hdr_set_ack(dccp_hdr_ack_bits(skb), dreq->dreq_isr); dccp_hdr_response(skb)->dccph_resp_service = dreq->dreq_service; - dh->dccph_checksum = dccp_v4_checksum(skb, inet_rsk(req)->loc_addr, - inet_rsk(req)->rmt_addr); - DCCP_INC_STATS(DCCP_MIB_OUTSEGS); return skb; } EXPORT_SYMBOL_GPL(dccp_make_response); -struct sk_buff *dccp_make_reset(struct sock *sk, struct dst_entry *dst, - const enum dccp_reset_codes code) +static struct sk_buff *dccp_make_reset(struct sock *sk, struct dst_entry *dst, + const enum dccp_reset_codes code) { struct dccp_hdr *dh; @@ -362,14 +363,34 @@ struct sk_buff *dccp_make_reset(struct s dccp_hdr_set_ack(dccp_hdr_ack_bits(skb), dp->dccps_gsr); dccp_hdr_reset(skb)->dccph_reset_code = code; - - dh->dccph_checksum = dccp_v4_checksum(skb, inet_sk(sk)->saddr, - inet_sk(sk)->daddr); + inet_csk(sk)->icsk_af_ops->send_check(sk, skb->len, skb); DCCP_INC_STATS(DCCP_MIB_OUTSEGS); return skb; } +int dccp_send_reset(struct sock *sk, enum dccp_reset_codes code) +{ + /* + * FIXME: what if rebuild_header fails? + * Should we be doing a rebuild_header here? + */ + int err = inet_sk_rebuild_header(sk); + + if (err == 0) { + struct sk_buff *skb = dccp_make_reset(sk, sk->sk_dst_cache, + code); + if (skb != NULL) { + memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); + err = inet_csk(sk)->icsk_af_ops->queue_xmit(skb, 0); + if (err == NET_XMIT_CN) + err = 0; + } + } + + return err; +} + /* * Do all connect socket setups that can be done AF independent. */ @@ -505,6 +526,8 @@ void dccp_send_sync(struct sock *sk, con dccp_transmit_skb(sk, skb); } +EXPORT_SYMBOL_GPL(dccp_send_sync); + /* * Send a DCCP_PKT_CLOSE/CLOSEREQ. The caller locks the socket for us. This * cannot be allowed to fail queueing a DCCP_PKT_CLOSE/CLOSEREQ frame under diff -puN net/dccp/proto.c~git-net net/dccp/proto.c --- devel/net/dccp/proto.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/proto.c 2006-02-27 22:37:47.000000000 -0800 @@ -23,9 +23,7 @@ #include #include -#include #include -#include #include #include @@ -37,6 +35,7 @@ #include "ccid.h" #include "dccp.h" +#include "feat.h" DEFINE_SNMP_STAT(struct dccp_mib, dccp_statistics) __read_mostly; @@ -46,12 +45,66 @@ atomic_t dccp_orphan_count = ATOMIC_INIT EXPORT_SYMBOL_GPL(dccp_orphan_count); -static struct net_protocol dccp_protocol = { - .handler = dccp_v4_rcv, - .err_handler = dccp_v4_err, - .no_policy = 1, +struct inet_hashinfo __cacheline_aligned dccp_hashinfo = { + .lhash_lock = RW_LOCK_UNLOCKED, + .lhash_users = ATOMIC_INIT(0), + .lhash_wait = __WAIT_QUEUE_HEAD_INITIALIZER(dccp_hashinfo.lhash_wait), }; +EXPORT_SYMBOL_GPL(dccp_hashinfo); + +void dccp_set_state(struct sock *sk, const int state) +{ + const int oldstate = sk->sk_state; + + dccp_pr_debug("%s(%p) %-10.10s -> %s\n", + dccp_role(sk), sk, + dccp_state_name(oldstate), dccp_state_name(state)); + WARN_ON(state == oldstate); + + switch (state) { + case DCCP_OPEN: + if (oldstate != DCCP_OPEN) + DCCP_INC_STATS(DCCP_MIB_CURRESTAB); + break; + + case DCCP_CLOSED: + if (oldstate == DCCP_CLOSING || oldstate == DCCP_OPEN) + DCCP_INC_STATS(DCCP_MIB_ESTABRESETS); + + sk->sk_prot->unhash(sk); + if (inet_csk(sk)->icsk_bind_hash != NULL && + !(sk->sk_userlocks & SOCK_BINDPORT_LOCK)) + inet_put_port(&dccp_hashinfo, sk); + /* fall through */ + default: + if (oldstate == DCCP_OPEN) + DCCP_DEC_STATS(DCCP_MIB_CURRESTAB); + } + + /* Change state AFTER socket is unhashed to avoid closed + * socket sitting in hash tables. + */ + sk->sk_state = state; +} + +EXPORT_SYMBOL_GPL(dccp_set_state); + +void dccp_done(struct sock *sk) +{ + dccp_set_state(sk, DCCP_CLOSED); + dccp_clear_xmit_timers(sk); + + sk->sk_shutdown = SHUTDOWN_MASK; + + if (!sock_flag(sk, SOCK_DEAD)) + sk->sk_state_change(sk); + else + inet_csk_destroy_sock(sk); +} + +EXPORT_SYMBOL_GPL(dccp_done); + const char *dccp_packet_name(const int type) { static const char *dccp_packet_names[] = { @@ -96,6 +149,120 @@ const char *dccp_state_name(const int st EXPORT_SYMBOL_GPL(dccp_state_name); +void dccp_hash(struct sock *sk) +{ + inet_hash(&dccp_hashinfo, sk); +} + +EXPORT_SYMBOL_GPL(dccp_hash); + +void dccp_unhash(struct sock *sk) +{ + inet_unhash(&dccp_hashinfo, sk); +} + +EXPORT_SYMBOL_GPL(dccp_unhash); + +int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized) +{ + struct dccp_sock *dp = dccp_sk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + + dccp_options_init(&dp->dccps_options); + do_gettimeofday(&dp->dccps_epoch); + + /* + * FIXME: We're hardcoding the CCID, and doing this at this point makes + * the listening (master) sock get CCID control blocks, which is not + * necessary, but for now, to not mess with the test userspace apps, + * lets leave it here, later the real solution is to do this in a + * setsockopt(CCIDs-I-want/accept). -acme + */ + if (likely(ctl_sock_initialized)) { + int rc = dccp_feat_init(sk); + + if (rc) + return rc; + + if (dp->dccps_options.dccpo_send_ack_vector) { + dp->dccps_hc_rx_ackvec = dccp_ackvec_alloc(GFP_KERNEL); + if (dp->dccps_hc_rx_ackvec == NULL) + return -ENOMEM; + } + dp->dccps_hc_rx_ccid = + ccid_hc_rx_new(dp->dccps_options.dccpo_rx_ccid, + sk, GFP_KERNEL); + dp->dccps_hc_tx_ccid = + ccid_hc_tx_new(dp->dccps_options.dccpo_tx_ccid, + sk, GFP_KERNEL); + if (unlikely(dp->dccps_hc_rx_ccid == NULL || + dp->dccps_hc_tx_ccid == NULL)) { + ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk); + ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk); + if (dp->dccps_options.dccpo_send_ack_vector) { + dccp_ackvec_free(dp->dccps_hc_rx_ackvec); + dp->dccps_hc_rx_ackvec = NULL; + } + dp->dccps_hc_rx_ccid = dp->dccps_hc_tx_ccid = NULL; + return -ENOMEM; + } + } else { + /* control socket doesn't need feat nego */ + INIT_LIST_HEAD(&dp->dccps_options.dccpo_pending); + INIT_LIST_HEAD(&dp->dccps_options.dccpo_conf); + } + + dccp_init_xmit_timers(sk); + icsk->icsk_rto = DCCP_TIMEOUT_INIT; + sk->sk_state = DCCP_CLOSED; + sk->sk_write_space = dccp_write_space; + icsk->icsk_sync_mss = dccp_sync_mss; + dp->dccps_mss_cache = 536; + dp->dccps_role = DCCP_ROLE_UNDEFINED; + dp->dccps_service = DCCP_SERVICE_INVALID_VALUE; + dp->dccps_l_ack_ratio = dp->dccps_r_ack_ratio = 1; + + return 0; +} + +EXPORT_SYMBOL_GPL(dccp_init_sock); + +int dccp_destroy_sock(struct sock *sk) +{ + struct dccp_sock *dp = dccp_sk(sk); + + /* + * DCCP doesn't use sk_write_queue, just sk_send_head + * for retransmissions + */ + if (sk->sk_send_head != NULL) { + kfree_skb(sk->sk_send_head); + sk->sk_send_head = NULL; + } + + /* Clean up a referenced DCCP bind bucket. */ + if (inet_csk(sk)->icsk_bind_hash != NULL) + inet_put_port(&dccp_hashinfo, sk); + + kfree(dp->dccps_service_list); + dp->dccps_service_list = NULL; + + if (dp->dccps_options.dccpo_send_ack_vector) { + dccp_ackvec_free(dp->dccps_hc_rx_ackvec); + dp->dccps_hc_rx_ackvec = NULL; + } + ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk); + ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk); + dp->dccps_hc_rx_ccid = dp->dccps_hc_tx_ccid = NULL; + + /* clean up feature negotiation state */ + dccp_feat_clean(sk); + + return 0; +} + +EXPORT_SYMBOL_GPL(dccp_destroy_sock); + static inline int dccp_listen_start(struct sock *sk) { struct dccp_sock *dp = dccp_sk(sk); @@ -220,7 +387,7 @@ int dccp_ioctl(struct sock *sk, int cmd, EXPORT_SYMBOL_GPL(dccp_ioctl); -static int dccp_setsockopt_service(struct sock *sk, const u32 service, +static int dccp_setsockopt_service(struct sock *sk, const __be32 service, char __user *optval, int optlen) { struct dccp_sock *dp = dccp_sk(sk); @@ -255,6 +422,39 @@ static int dccp_setsockopt_service(struc return 0; } +/* byte 1 is feature. the rest is the preference list */ +static int dccp_setsockopt_change(struct sock *sk, int type, + struct dccp_so_feat __user *optval) +{ + struct dccp_so_feat opt; + u8 *val; + int rc; + + if (copy_from_user(&opt, optval, sizeof(opt))) + return -EFAULT; + + val = kmalloc(opt.dccpsf_len, GFP_KERNEL); + if (!val) + return -ENOMEM; + + if (copy_from_user(val, opt.dccpsf_val, opt.dccpsf_len)) { + rc = -EFAULT; + goto out_free_val; + } + + rc = dccp_feat_change(sk, type, opt.dccpsf_feat, val, opt.dccpsf_len, + GFP_KERNEL); + if (rc) + goto out_free_val; + +out: + return rc; + +out_free_val: + kfree(val); + goto out; +} + int dccp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { @@ -284,6 +484,25 @@ int dccp_setsockopt(struct sock *sk, int case DCCP_SOCKOPT_PACKET_SIZE: dp->dccps_packet_size = val; break; + + case DCCP_SOCKOPT_CHANGE_L: + if (optlen != sizeof(struct dccp_so_feat)) + err = -EINVAL; + else + err = dccp_setsockopt_change(sk, DCCPO_CHANGE_L, + (struct dccp_so_feat *) + optval); + break; + + case DCCP_SOCKOPT_CHANGE_R: + if (optlen != sizeof(struct dccp_so_feat)) + err = -EINVAL; + else + err = dccp_setsockopt_change(sk, DCCPO_CHANGE_R, + (struct dccp_so_feat *) + optval); + break; + default: err = -ENOPROTOOPT; break; @@ -296,7 +515,7 @@ int dccp_setsockopt(struct sock *sk, int EXPORT_SYMBOL_GPL(dccp_setsockopt); static int dccp_getsockopt_service(struct sock *sk, int len, - u32 __user *optval, + __be32 __user *optval, int __user *optlen) { const struct dccp_sock *dp = dccp_sk(sk); @@ -351,7 +570,7 @@ int dccp_getsockopt(struct sock *sk, int break; case DCCP_SOCKOPT_SERVICE: return dccp_getsockopt_service(sk, len, - (u32 __user *)optval, optlen); + (__be32 __user *)optval, optlen); case 128 ... 191: return ccid_hc_rx_getsockopt(dp->dccps_hc_rx_ccid, sk, optname, len, (u32 __user *)optval, optlen); @@ -679,84 +898,7 @@ void dccp_shutdown(struct sock *sk, int EXPORT_SYMBOL_GPL(dccp_shutdown); -static const struct proto_ops inet_dccp_ops = { - .family = PF_INET, - .owner = THIS_MODULE, - .release = inet_release, - .bind = inet_bind, - .connect = inet_stream_connect, - .socketpair = sock_no_socketpair, - .accept = inet_accept, - .getname = inet_getname, - /* FIXME: work on tcp_poll to rename it to inet_csk_poll */ - .poll = dccp_poll, - .ioctl = inet_ioctl, - /* FIXME: work on inet_listen to rename it to sock_common_listen */ - .listen = inet_dccp_listen, - .shutdown = inet_shutdown, - .setsockopt = sock_common_setsockopt, - .getsockopt = sock_common_getsockopt, - .sendmsg = inet_sendmsg, - .recvmsg = sock_common_recvmsg, - .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, -}; - -extern struct net_proto_family inet_family_ops; - -static struct inet_protosw dccp_v4_protosw = { - .type = SOCK_DCCP, - .protocol = IPPROTO_DCCP, - .prot = &dccp_prot, - .ops = &inet_dccp_ops, - .capability = -1, - .no_check = 0, - .flags = INET_PROTOSW_ICSK, -}; - -/* - * This is the global socket data structure used for responding to - * the Out-of-the-blue (OOTB) packets. A control sock will be created - * for this socket at the initialization time. - */ -struct socket *dccp_ctl_socket; - -static char dccp_ctl_socket_err_msg[] __initdata = - KERN_ERR "DCCP: Failed to create the control socket.\n"; - -static int __init dccp_ctl_sock_init(void) -{ - int rc = sock_create_kern(PF_INET, SOCK_DCCP, IPPROTO_DCCP, - &dccp_ctl_socket); - if (rc < 0) - printk(dccp_ctl_socket_err_msg); - else { - dccp_ctl_socket->sk->sk_allocation = GFP_ATOMIC; - inet_sk(dccp_ctl_socket->sk)->uc_ttl = -1; - - /* Unhash it so that IP input processing does not even - * see it, we do not wish this socket to see incoming - * packets. - */ - dccp_ctl_socket->sk->sk_prot->unhash(dccp_ctl_socket->sk); - } - - return rc; -} - -#ifdef CONFIG_IP_DCCP_UNLOAD_HACK -void dccp_ctl_sock_exit(void) -{ - if (dccp_ctl_socket != NULL) { - sock_release(dccp_ctl_socket); - dccp_ctl_socket = NULL; - } -} - -EXPORT_SYMBOL_GPL(dccp_ctl_sock_exit); -#endif - -static int __init init_dccp_v4_mibs(void) +static int __init dccp_mib_init(void) { int rc = -ENOMEM; @@ -778,6 +920,13 @@ out_free_one: } +static void dccp_mib_exit(void) +{ + free_percpu(dccp_statistics[0]); + free_percpu(dccp_statistics[1]); + dccp_statistics[0] = dccp_statistics[1] = NULL; +} + static int thash_entries; module_param(thash_entries, int, 0444); MODULE_PARM_DESC(thash_entries, "Number of ehash buckets"); @@ -794,17 +943,14 @@ static int __init dccp_init(void) { unsigned long goal; int ehash_order, bhash_order, i; - int rc = proto_register(&dccp_prot, 1); - - if (rc) - goto out; + int rc = -ENOBUFS; dccp_hashinfo.bind_bucket_cachep = kmem_cache_create("dccp_bind_bucket", sizeof(struct inet_bind_bucket), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); if (!dccp_hashinfo.bind_bucket_cachep) - goto out_proto_unregister; + goto out; /* * Size and allocate the main established and bind bucket @@ -866,27 +1012,23 @@ static int __init dccp_init(void) INIT_HLIST_HEAD(&dccp_hashinfo.bhash[i].chain); } - if (init_dccp_v4_mibs()) + rc = dccp_mib_init(); + if (rc) goto out_free_dccp_bhash; - rc = -EAGAIN; - if (inet_add_protocol(&dccp_protocol, IPPROTO_DCCP)) - goto out_free_dccp_v4_mibs; - - inet_register_protosw(&dccp_v4_protosw); + rc = dccp_ackvec_init(); + if (rc) + goto out_free_dccp_mib; - rc = dccp_ctl_sock_init(); + rc = dccp_sysctl_init(); if (rc) - goto out_unregister_protosw; + goto out_ackvec_exit; out: return rc; -out_unregister_protosw: - inet_unregister_protosw(&dccp_v4_protosw); - inet_del_protocol(&dccp_protocol, IPPROTO_DCCP); -out_free_dccp_v4_mibs: - free_percpu(dccp_statistics[0]); - free_percpu(dccp_statistics[1]); - dccp_statistics[0] = dccp_statistics[1] = NULL; +out_ackvec_exit: + dccp_ackvec_exit(); +out_free_dccp_mib: + dccp_mib_exit(); out_free_dccp_bhash: free_pages((unsigned long)dccp_hashinfo.bhash, bhash_order); dccp_hashinfo.bhash = NULL; @@ -896,23 +1038,12 @@ out_free_dccp_ehash: out_free_bind_bucket_cachep: kmem_cache_destroy(dccp_hashinfo.bind_bucket_cachep); dccp_hashinfo.bind_bucket_cachep = NULL; -out_proto_unregister: - proto_unregister(&dccp_prot); goto out; } -static const char dccp_del_proto_err_msg[] __exitdata = - KERN_ERR "can't remove dccp net_protocol\n"; - static void __exit dccp_fini(void) { - inet_unregister_protosw(&dccp_v4_protosw); - - if (inet_del_protocol(&dccp_protocol, IPPROTO_DCCP) < 0) - printk(dccp_del_proto_err_msg); - - free_percpu(dccp_statistics[0]); - free_percpu(dccp_statistics[1]); + dccp_mib_exit(); free_pages((unsigned long)dccp_hashinfo.bhash, get_order(dccp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))); @@ -920,19 +1051,13 @@ static void __exit dccp_fini(void) get_order(dccp_hashinfo.ehash_size * sizeof(struct inet_ehash_bucket))); kmem_cache_destroy(dccp_hashinfo.bind_bucket_cachep); - proto_unregister(&dccp_prot); + dccp_ackvec_exit(); + dccp_sysctl_exit(); } module_init(dccp_init); module_exit(dccp_fini); -/* - * __stringify doesn't likes enums, so use SOCK_DCCP (6) and IPPROTO_DCCP (33) - * values directly, Also cover the case where the protocol is not specified, - * i.e. net-pf-PF_INET-proto-0-type-SOCK_DCCP - */ -MODULE_ALIAS("net-pf-" __stringify(PF_INET) "-proto-33-type-6"); -MODULE_ALIAS("net-pf-" __stringify(PF_INET) "-proto-0-type-6"); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Arnaldo Carvalho de Melo "); MODULE_DESCRIPTION("DCCP - Datagram Congestion Controlled Protocol"); diff -puN /dev/null net/dccp/sysctl.c --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/net/dccp/sysctl.c 2006-02-27 22:37:47.000000000 -0800 @@ -0,0 +1,124 @@ +/* + * net/dccp/sysctl.c + * + * An implementation of the DCCP protocol + * Arnaldo Carvalho de Melo + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License v2 + * as published by the Free Software Foundation. + */ + +#include +#include +#include + +#ifndef CONFIG_SYSCTL +#error This file should not be compiled without CONFIG_SYSCTL defined +#endif + +extern int dccp_feat_default_sequence_window; +extern int dccp_feat_default_rx_ccid; +extern int dccp_feat_default_tx_ccid; +extern int dccp_feat_default_ack_ratio; +extern int dccp_feat_default_send_ack_vector; +extern int dccp_feat_default_send_ndp_count; + +static struct ctl_table dccp_default_table[] = { + { + .ctl_name = NET_DCCP_DEFAULT_SEQ_WINDOW, + .procname = "seq_window", + .data = &dccp_feat_default_sequence_window, + .maxlen = sizeof(dccp_feat_default_sequence_window), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .ctl_name = NET_DCCP_DEFAULT_RX_CCID, + .procname = "rx_ccid", + .data = &dccp_feat_default_rx_ccid, + .maxlen = sizeof(dccp_feat_default_rx_ccid), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .ctl_name = NET_DCCP_DEFAULT_TX_CCID, + .procname = "tx_ccid", + .data = &dccp_feat_default_tx_ccid, + .maxlen = sizeof(dccp_feat_default_tx_ccid), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .ctl_name = NET_DCCP_DEFAULT_ACK_RATIO, + .procname = "ack_ratio", + .data = &dccp_feat_default_ack_ratio, + .maxlen = sizeof(dccp_feat_default_ack_ratio), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .ctl_name = NET_DCCP_DEFAULT_SEND_ACKVEC, + .procname = "send_ackvec", + .data = &dccp_feat_default_send_ack_vector, + .maxlen = sizeof(dccp_feat_default_send_ack_vector), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .ctl_name = NET_DCCP_DEFAULT_SEND_NDP, + .procname = "send_ndp", + .data = &dccp_feat_default_send_ndp_count, + .maxlen = sizeof(dccp_feat_default_send_ndp_count), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { .ctl_name = 0, } +}; + +static struct ctl_table dccp_table[] = { + { + .ctl_name = NET_DCCP_DEFAULT, + .procname = "default", + .mode = 0555, + .child = dccp_default_table, + }, + { .ctl_name = 0, }, +}; + +static struct ctl_table dccp_dir_table[] = { + { + .ctl_name = NET_DCCP, + .procname = "dccp", + .mode = 0555, + .child = dccp_table, + }, + { .ctl_name = 0, }, +}; + +static struct ctl_table dccp_root_table[] = { + { + .ctl_name = CTL_NET, + .procname = "net", + .mode = 0555, + .child = dccp_dir_table, + }, + { .ctl_name = 0, }, +}; + +static struct ctl_table_header *dccp_table_header; + +int __init dccp_sysctl_init(void) +{ + dccp_table_header = register_sysctl_table(dccp_root_table, 1); + + return dccp_table_header != NULL ? 0 : -ENOMEM; +} + +void dccp_sysctl_exit(void) +{ + if (dccp_table_header != NULL) { + unregister_sysctl_table(dccp_table_header); + dccp_table_header = NULL; + } +} diff -puN net/dccp/timer.c~git-net net/dccp/timer.c --- devel/net/dccp/timer.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/dccp/timer.c 2006-02-27 22:37:47.000000000 -0800 @@ -31,7 +31,7 @@ static void dccp_write_err(struct sock * sk->sk_err = sk->sk_err_soft ? : ETIMEDOUT; sk->sk_error_report(sk); - dccp_v4_send_reset(sk, DCCP_RESET_CODE_ABORTED); + dccp_send_reset(sk, DCCP_RESET_CODE_ABORTED); dccp_done(sk); DCCP_INC_STATS_BH(DCCP_MIB_ABORTONTIMEOUT); } @@ -141,6 +141,17 @@ static void dccp_retransmit_timer(struct { struct inet_connection_sock *icsk = inet_csk(sk); + /* retransmit timer is used for feature negotiation throughout + * connection. In this case, no packet is re-transmitted, but rather an + * ack is generated and pending changes are splaced into its options. + */ + if (sk->sk_send_head == NULL) { + dccp_pr_debug("feat negotiation retransmit timeout %p\n", sk); + if (sk->sk_state == DCCP_OPEN) + dccp_send_ack(sk); + goto backoff; + } + /* * sk->sk_send_head has to have one skb with * DCCP_SKB_CB(skb)->dccpd_type set to one of the retransmittable DCCP @@ -177,6 +188,7 @@ static void dccp_retransmit_timer(struct goto out; } +backoff: icsk->icsk_backoff++; icsk->icsk_retransmits++; diff -puN net/ipv4/ah4.c~git-net net/ipv4/ah4.c --- devel/net/ipv4/ah4.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/ah4.c 2006-02-27 22:37:47.000000000 -0800 @@ -97,6 +97,7 @@ static int ah_output(struct xfrm_state * ah->reserved = 0; ah->spi = x->id.spi; ah->seq_no = htonl(++x->replay.oseq); + xfrm_aevent_doreplay(x); ahp->icv(ahp, skb, ah->auth_data); top_iph->tos = iph->tos; diff -puN net/ipv4/esp4.c~git-net net/ipv4/esp4.c --- devel/net/ipv4/esp4.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/esp4.c 2006-02-27 22:37:47.000000000 -0800 @@ -90,6 +90,7 @@ static int esp_output(struct xfrm_state esph->spi = x->id.spi; esph->seq_no = htonl(++x->replay.oseq); + xfrm_aevent_doreplay(x); if (esp->conf.ivlen) crypto_cipher_set_iv(tfm, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); diff -puN net/ipv4/fib_rules.c~git-net net/ipv4/fib_rules.c --- devel/net/ipv4/fib_rules.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/fib_rules.c 2006-02-27 22:37:47.000000000 -0800 @@ -40,6 +40,8 @@ #include #include #include +#include +#include #include #include @@ -52,7 +54,7 @@ struct fib_rule { - struct fib_rule *r_next; + struct hlist_node hlist; atomic_t r_clntref; u32 r_preference; unsigned char r_table; @@ -75,6 +77,7 @@ struct fib_rule #endif char r_ifname[IFNAMSIZ]; int r_dead; + struct rcu_head rcu; }; static struct fib_rule default_rule = { @@ -85,7 +88,6 @@ static struct fib_rule default_rule = { }; static struct fib_rule main_rule = { - .r_next = &default_rule, .r_clntref = ATOMIC_INIT(2), .r_preference = 0x7FFE, .r_table = RT_TABLE_MAIN, @@ -93,23 +95,24 @@ static struct fib_rule main_rule = { }; static struct fib_rule local_rule = { - .r_next = &main_rule, .r_clntref = ATOMIC_INIT(2), .r_table = RT_TABLE_LOCAL, .r_action = RTN_UNICAST, }; -static struct fib_rule *fib_rules = &local_rule; -static DEFINE_RWLOCK(fib_rules_lock); +static struct hlist_head fib_rules; + +/* writer func called from netlink -- rtnl_sem hold*/ int inet_rtm_delrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg) { struct rtattr **rta = arg; struct rtmsg *rtm = NLMSG_DATA(nlh); - struct fib_rule *r, **rp; + struct fib_rule *r; + struct hlist_node *node; int err = -ESRCH; - for (rp=&fib_rules; (r=*rp) != NULL; rp=&r->r_next) { + hlist_for_each_entry(r, node, &fib_rules, hlist) { if ((!rta[RTA_SRC-1] || memcmp(RTA_DATA(rta[RTA_SRC-1]), &r->r_src, 4) == 0) && rtm->rtm_src_len == r->r_src_len && rtm->rtm_dst_len == r->r_dst_len && @@ -126,10 +129,8 @@ int inet_rtm_delrule(struct sk_buff *skb if (r == &local_rule) break; - write_lock_bh(&fib_rules_lock); - *rp = r->r_next; + hlist_del_rcu(&r->hlist); r->r_dead = 1; - write_unlock_bh(&fib_rules_lock); fib_rule_put(r); err = 0; break; @@ -150,21 +151,30 @@ static struct fib_table *fib_empty_table return NULL; } +static inline void fib_rule_put_rcu(struct rcu_head *head) +{ + struct fib_rule *r = container_of(head, struct fib_rule, rcu); + kfree(r); +} + void fib_rule_put(struct fib_rule *r) { if (atomic_dec_and_test(&r->r_clntref)) { if (r->r_dead) - kfree(r); + call_rcu(&r->rcu, fib_rule_put_rcu); else printk("Freeing alive rule %p\n", r); } } +/* writer func called from netlink -- rtnl_sem hold*/ + int inet_rtm_newrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg) { struct rtattr **rta = arg; struct rtmsg *rtm = NLMSG_DATA(nlh); - struct fib_rule *r, *new_r, **rp; + struct fib_rule *r, *new_r, *last = NULL; + struct hlist_node *node = NULL; unsigned char table_id; if (rtm->rtm_src_len > 32 || rtm->rtm_dst_len > 32 || @@ -188,6 +198,7 @@ int inet_rtm_newrule(struct sk_buff *skb if (!new_r) return -ENOMEM; memset(new_r, 0, sizeof(*new_r)); + if (rta[RTA_SRC-1]) memcpy(&new_r->r_src, RTA_DATA(rta[RTA_SRC-1]), 4); if (rta[RTA_DST-1]) @@ -220,28 +231,28 @@ int inet_rtm_newrule(struct sk_buff *skb if (rta[RTA_FLOW-1]) memcpy(&new_r->r_tclassid, RTA_DATA(rta[RTA_FLOW-1]), 4); #endif + r = container_of(fib_rules.first, struct fib_rule, hlist); - rp = &fib_rules; if (!new_r->r_preference) { - r = fib_rules; - if (r && (r = r->r_next) != NULL) { - rp = &fib_rules->r_next; + if (r && r->hlist.next != NULL) { + r = container_of(r->hlist.next, struct fib_rule, hlist); if (r->r_preference) new_r->r_preference = r->r_preference - 1; } } - - while ( (r = *rp) != NULL ) { + + hlist_for_each_entry(r, node, &fib_rules, hlist) { if (r->r_preference > new_r->r_preference) break; - rp = &r->r_next; + last = r; } - - new_r->r_next = r; atomic_inc(&new_r->r_clntref); - write_lock_bh(&fib_rules_lock); - *rp = new_r; - write_unlock_bh(&fib_rules_lock); + + if (last) + hlist_add_after_rcu(&last->hlist, &new_r->hlist); + else + hlist_add_before_rcu(&new_r->hlist, &r->hlist); + return 0; } @@ -254,30 +265,30 @@ u32 fib_rules_tclass(struct fib_result * } #endif +/* callers should hold rtnl semaphore */ static void fib_rules_detach(struct net_device *dev) { + struct hlist_node *node; struct fib_rule *r; - for (r=fib_rules; r; r=r->r_next) { - if (r->r_ifindex == dev->ifindex) { - write_lock_bh(&fib_rules_lock); + hlist_for_each_entry(r, node, &fib_rules, hlist) { + if (r->r_ifindex == dev->ifindex) r->r_ifindex = -1; - write_unlock_bh(&fib_rules_lock); - } + } } +/* callers should hold rtnl semaphore */ + static void fib_rules_attach(struct net_device *dev) { + struct hlist_node *node; struct fib_rule *r; - for (r=fib_rules; r; r=r->r_next) { - if (r->r_ifindex == -1 && strcmp(dev->name, r->r_ifname) == 0) { - write_lock_bh(&fib_rules_lock); + hlist_for_each_entry(r, node, &fib_rules, hlist) { + if (r->r_ifindex == -1 && strcmp(dev->name, r->r_ifname) == 0) r->r_ifindex = dev->ifindex; - write_unlock_bh(&fib_rules_lock); - } } } @@ -286,14 +297,17 @@ int fib_lookup(const struct flowi *flp, int err; struct fib_rule *r, *policy; struct fib_table *tb; + struct hlist_node *node; u32 daddr = flp->fl4_dst; u32 saddr = flp->fl4_src; FRprintk("Lookup: %u.%u.%u.%u <- %u.%u.%u.%u ", NIPQUAD(flp->fl4_dst), NIPQUAD(flp->fl4_src)); - read_lock(&fib_rules_lock); - for (r = fib_rules; r; r=r->r_next) { + + rcu_read_lock(); + + hlist_for_each_entry_rcu(r, node, &fib_rules, hlist) { if (((saddr^r->r_src) & r->r_srcmask) || ((daddr^r->r_dst) & r->r_dstmask) || (r->r_tos && r->r_tos != flp->fl4_tos) || @@ -309,14 +323,14 @@ FRprintk("tb %d r %d ", r->r_table, r->r policy = r; break; case RTN_UNREACHABLE: - read_unlock(&fib_rules_lock); + rcu_read_unlock(); return -ENETUNREACH; default: case RTN_BLACKHOLE: - read_unlock(&fib_rules_lock); + rcu_read_unlock(); return -EINVAL; case RTN_PROHIBIT: - read_unlock(&fib_rules_lock); + rcu_read_unlock(); return -EACCES; } @@ -327,16 +341,16 @@ FRprintk("tb %d r %d ", r->r_table, r->r res->r = policy; if (policy) atomic_inc(&policy->r_clntref); - read_unlock(&fib_rules_lock); + rcu_read_unlock(); return 0; } if (err < 0 && err != -EAGAIN) { - read_unlock(&fib_rules_lock); + rcu_read_unlock(); return err; } } FRprintk("FAILURE\n"); - read_unlock(&fib_rules_lock); + rcu_read_unlock(); return -ENETUNREACH; } @@ -414,20 +428,25 @@ rtattr_failure: return -1; } +/* callers should hold rtnl semaphore */ + int inet_dump_rules(struct sk_buff *skb, struct netlink_callback *cb) { - int idx; + int idx = 0; int s_idx = cb->args[0]; struct fib_rule *r; + struct hlist_node *node; + + rcu_read_lock(); + hlist_for_each_entry(r, node, &fib_rules, hlist) { - read_lock(&fib_rules_lock); - for (r=fib_rules, idx=0; r; r = r->r_next, idx++) { if (idx < s_idx) continue; if (inet_fill_rule(skb, r, cb, NLM_F_MULTI) < 0) break; + idx++; } - read_unlock(&fib_rules_lock); + rcu_read_unlock(); cb->args[0] = idx; return skb->len; @@ -435,5 +454,9 @@ int inet_dump_rules(struct sk_buff *skb, void __init fib_rules_init(void) { + INIT_HLIST_HEAD(&fib_rules); + hlist_add_head(&local_rule.hlist, &fib_rules); + hlist_add_after(&local_rule.hlist, &main_rule.hlist); + hlist_add_after(&main_rule.hlist, &default_rule.hlist); register_netdevice_notifier(&fib_rules_notifier); } diff -puN net/ipv4/fib_trie.c~git-net net/ipv4/fib_trie.c --- devel/net/ipv4/fib_trie.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/fib_trie.c 2006-02-27 22:37:47.000000000 -0800 @@ -50,7 +50,7 @@ * Patrick McHardy */ -#define VERSION "0.404" +#define VERSION "0.406" #include #include @@ -84,7 +84,7 @@ #include "fib_lookup.h" #undef CONFIG_IP_FIB_TRIE_STATS -#define MAX_CHILDS 16384 +#define MAX_STAT_DEPTH 32 #define KEYLENGTH (8*sizeof(t_key)) #define MASK_PFX(k, l) (((l)==0)?0:(k >> (KEYLENGTH-l)) << (KEYLENGTH-l)) @@ -154,7 +154,7 @@ struct trie_stat { unsigned int tnodes; unsigned int leaves; unsigned int nullpointers; - unsigned int nodesizes[MAX_CHILDS]; + unsigned int nodesizes[MAX_STAT_DEPTH]; }; struct trie { @@ -2040,7 +2040,15 @@ rescan: static struct node *fib_trie_get_first(struct fib_trie_iter *iter, struct trie *t) { - struct node *n = rcu_dereference(t->trie); + struct node *n ; + + if(!t) + return NULL; + + n = rcu_dereference(t->trie); + + if(!iter) + return NULL; if (n && IS_TNODE(n)) { iter->tnode = (struct tnode *) n; @@ -2072,7 +2080,9 @@ static void trie_collect_stats(struct tr int i; s->tnodes++; - s->nodesizes[tn->bits]++; + if(tn->bits < MAX_STAT_DEPTH) + s->nodesizes[tn->bits]++; + for (i = 0; i < (1<bits); i++) if (!tn->child[i]) s->nullpointers++; @@ -2102,8 +2112,8 @@ static void trie_show_stats(struct seq_f seq_printf(seq, "\tInternal nodes: %d\n\t", stat->tnodes); bytes += sizeof(struct tnode) * stat->tnodes; - max = MAX_CHILDS-1; - while (max >= 0 && stat->nodesizes[max] == 0) + max = MAX_STAT_DEPTH; + while (max > 0 && stat->nodesizes[max-1] == 0) max--; pointers = 0; diff -puN net/ipv4/inet_connection_sock.c~git-net net/ipv4/inet_connection_sock.c --- devel/net/ipv4/inet_connection_sock.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/inet_connection_sock.c 2006-02-27 22:37:47.000000000 -0800 @@ -648,3 +648,22 @@ void inet_csk_addr2sockaddr(struct sock } EXPORT_SYMBOL_GPL(inet_csk_addr2sockaddr); + +int inet_csk_ctl_sock_create(struct socket **sock, unsigned short family, + unsigned short type, unsigned char protocol) +{ + int rc = sock_create_kern(family, type, protocol, sock); + + if (rc == 0) { + (*sock)->sk->sk_allocation = GFP_ATOMIC; + inet_sk((*sock)->sk)->uc_ttl = -1; + /* + * Unhash it so that IP input processing does not even see it, + * we do not wish this socket to see incoming packets. + */ + (*sock)->sk->sk_prot->unhash((*sock)->sk); + } + return rc; +} + +EXPORT_SYMBOL_GPL(inet_csk_ctl_sock_create); diff -puN net/ipv4/netfilter/arp_tables.c~git-net net/ipv4/netfilter/arp_tables.c --- devel/net/ipv4/netfilter/arp_tables.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/arp_tables.c 2006-02-27 22:37:47.000000000 -0800 @@ -208,6 +208,7 @@ static unsigned int arpt_error(struct sk const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -300,6 +301,7 @@ unsigned int arpt_do_table(struct sk_buf verdict = t->u.kernel.target->target(pskb, in, out, hook, + t->u.kernel.target, t->data, userdata); @@ -480,26 +482,31 @@ static inline int check_entry(struct arp } t->u.kernel.target = target; + ret = xt_check_target(target, NF_ARP, t->u.target_size - sizeof(*t), + name, e->comefrom, 0, 0); + if (ret) + goto err; + if (t->u.kernel.target == &arpt_standard_target) { if (!standard_check(t, size)) { ret = -EINVAL; goto out; } } else if (t->u.kernel.target->checkentry - && !t->u.kernel.target->checkentry(name, e, t->data, + && !t->u.kernel.target->checkentry(name, e, target, t->data, t->u.target_size - sizeof(*t), e->comefrom)) { - module_put(t->u.kernel.target->me); duprintf("arp_tables: check failed for `%s'.\n", t->u.kernel.target->name); ret = -EINVAL; - goto out; + goto err; } (*i)++; return 0; - +err: + module_put(t->u.kernel.target->me); out: return ret; } @@ -555,7 +562,7 @@ static inline int cleanup_entry(struct a t = arpt_get_target(e); if (t->u.kernel.target->destroy) - t->u.kernel.target->destroy(t->data, + t->u.kernel.target->destroy(t->u.kernel.target, t->data, t->u.target_size - sizeof(*t)); module_put(t->u.kernel.target->me); return 0; @@ -1138,11 +1145,13 @@ void arpt_unregister_table(struct arpt_t /* The built-in targets: standard (NULL) and error. */ static struct arpt_target arpt_standard_target = { .name = ARPT_STANDARD_TARGET, + .targetsize = sizeof(int), }; static struct arpt_target arpt_error_target = { .name = ARPT_ERROR_TARGET, .target = arpt_error, + .targetsize = ARPT_FUNCTION_MAXNAMELEN, }; static struct nf_sockopt_ops arpt_sockopts = { diff -puN net/ipv4/netfilter/arpt_mangle.c~git-net net/ipv4/netfilter/arpt_mangle.c --- devel/net/ipv4/netfilter/arpt_mangle.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/arpt_mangle.c 2006-02-27 22:37:47.000000000 -0800 @@ -8,9 +8,10 @@ MODULE_AUTHOR("Bart De Schuymer "); MODULE_DESCRIPTION("Netfilter NAT helper module for PPTP"); @@ -198,7 +200,7 @@ pptp_outbound_pkt(struct sk_buff **pskb, /* only OUT_CALL_REQUEST, IN_CALL_REPLY, CALL_CLEAR_REQUEST pass * down to here */ DEBUGP("altering call id from 0x%04x to 0x%04x\n", - ntohs(*(u_int16_t *)pptpReq + cid_off), ntohs(new_callid)); + ntohs(REQ_CID(pptpReq, cid_off)), ntohs(new_callid)); /* mangle packet */ if (ip_nat_mangle_tcp_packet(pskb, ct, ctinfo, @@ -342,7 +344,7 @@ pptp_inbound_pkt(struct sk_buff **pskb, /* mangle packet */ DEBUGP("altering peer call id from 0x%04x to 0x%04x\n", - ntohs(*(u_int16_t *)pptpReq + pcid_off), ntohs(new_pcid)); + ntohs(REQ_CID(pptpReq, pcid_off)), ntohs(new_pcid)); if (ip_nat_mangle_tcp_packet(pskb, ct, ctinfo, pcid_off + sizeof(struct pptp_pkt_hdr) + @@ -353,7 +355,7 @@ pptp_inbound_pkt(struct sk_buff **pskb, if (new_cid) { DEBUGP("altering call id from 0x%04x to 0x%04x\n", - ntohs(*(u_int16_t *)pptpReq + cid_off), ntohs(new_cid)); + ntohs(REQ_CID(pptpReq, cid_off)), ntohs(new_cid)); if (ip_nat_mangle_tcp_packet(pskb, ct, ctinfo, cid_off + sizeof(struct pptp_pkt_hdr) + sizeof(struct PptpControlHeader), diff -puN net/ipv4/netfilter/ip_nat_rule.c~git-net net/ipv4/netfilter/ip_nat_rule.c --- devel/net/ipv4/netfilter/ip_nat_rule.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ip_nat_rule.c 2006-02-27 22:37:47.000000000 -0800 @@ -103,6 +103,7 @@ static unsigned int ipt_snat_target(stru const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct ipt_target *target, const void *targinfo, void *userinfo) { @@ -145,6 +146,7 @@ static unsigned int ipt_dnat_target(stru const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct ipt_target *target, const void *targinfo, void *userinfo) { @@ -170,6 +172,7 @@ static unsigned int ipt_dnat_target(stru static int ipt_snat_checkentry(const char *tablename, const void *entry, + const struct ipt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) @@ -181,28 +184,12 @@ static int ipt_snat_checkentry(const cha printk("SNAT: multiple ranges no longer supported\n"); return 0; } - - if (targinfosize != IPT_ALIGN(sizeof(struct ip_nat_multi_range_compat))) { - DEBUGP("SNAT: Target size %u wrong for %u ranges\n", - targinfosize, mr->rangesize); - return 0; - } - - /* Only allow these for NAT. */ - if (strcmp(tablename, "nat") != 0) { - DEBUGP("SNAT: wrong table %s\n", tablename); - return 0; - } - - if (hook_mask & ~(1 << NF_IP_POST_ROUTING)) { - DEBUGP("SNAT: hook mask 0x%x bad\n", hook_mask); - return 0; - } return 1; } static int ipt_dnat_checkentry(const char *tablename, const void *entry, + const struct ipt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) @@ -214,24 +201,6 @@ static int ipt_dnat_checkentry(const cha printk("DNAT: multiple ranges no longer supported\n"); return 0; } - - if (targinfosize != IPT_ALIGN(sizeof(struct ip_nat_multi_range_compat))) { - DEBUGP("DNAT: Target size %u wrong for %u ranges\n", - targinfosize, mr->rangesize); - return 0; - } - - /* Only allow these for NAT. */ - if (strcmp(tablename, "nat") != 0) { - DEBUGP("DNAT: wrong table %s\n", tablename); - return 0; - } - - if (hook_mask & ~((1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_OUT))) { - DEBUGP("DNAT: hook mask 0x%x bad\n", hook_mask); - return 0; - } - return 1; } @@ -299,12 +268,18 @@ int ip_nat_rule_find(struct sk_buff **ps static struct ipt_target ipt_snat_reg = { .name = "SNAT", .target = ipt_snat_target, + .targetsize = sizeof(struct ip_nat_multi_range_compat), + .table = "nat", + .hooks = 1 << NF_IP_POST_ROUTING, .checkentry = ipt_snat_checkentry, }; static struct ipt_target ipt_dnat_reg = { .name = "DNAT", .target = ipt_dnat_target, + .targetsize = sizeof(struct ip_nat_multi_range_compat), + .table = "nat", + .hooks = 1 << NF_IP_PRE_ROUTING, .checkentry = ipt_dnat_checkentry, }; diff -puN net/ipv4/netfilter/ip_tables.c~git-net net/ipv4/netfilter/ip_tables.c --- devel/net/ipv4/netfilter/ip_tables.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ip_tables.c 2006-02-27 22:37:47.000000000 -0800 @@ -179,6 +179,7 @@ ipt_error(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -197,8 +198,8 @@ int do_match(struct ipt_entry_match *m, int *hotdrop) { /* Stop iteration if it doesn't match */ - if (!m->u.kernel.match->match(skb, in, out, m->data, offset, - skb->nh.iph->ihl*4, hotdrop)) + if (!m->u.kernel.match->match(skb, in, out, m->u.kernel.match, m->data, + offset, skb->nh.iph->ihl*4, hotdrop)) return 1; else return 0; @@ -305,6 +306,7 @@ ipt_do_table(struct sk_buff **pskb, verdict = t->u.kernel.target->target(pskb, in, out, hook, + t->u.kernel.target, t->data, userdata); @@ -464,7 +466,7 @@ cleanup_match(struct ipt_entry_match *m, return 1; if (m->u.kernel.match->destroy) - m->u.kernel.match->destroy(m->data, + m->u.kernel.match->destroy(m->u.kernel.match, m->data, m->u.match_size - sizeof(*m)); module_put(m->u.kernel.match->me); return 0; @@ -477,21 +479,12 @@ standard_check(const struct ipt_entry_ta struct ipt_standard_target *targ = (void *)t; /* Check standard info. */ - if (t->u.target_size - != IPT_ALIGN(sizeof(struct ipt_standard_target))) { - duprintf("standard_check: target size %u != %u\n", - t->u.target_size, - IPT_ALIGN(sizeof(struct ipt_standard_target))); - return 0; - } - if (targ->verdict >= 0 && targ->verdict > max_offset - sizeof(struct ipt_entry)) { duprintf("ipt_standard_check: bad verdict (%i)\n", targ->verdict); return 0; } - if (targ->verdict < -NF_MAX_VERDICT - 1) { duprintf("ipt_standard_check: bad negative verdict (%i)\n", targ->verdict); @@ -508,6 +501,7 @@ check_match(struct ipt_entry_match *m, unsigned int *i) { struct ipt_match *match; + int ret; match = try_then_request_module(xt_find_match(AF_INET, m->u.user.name, m->u.user.revision), @@ -518,18 +512,27 @@ check_match(struct ipt_entry_match *m, } m->u.kernel.match = match; + ret = xt_check_match(match, AF_INET, m->u.match_size - sizeof(*m), + name, hookmask, ip->proto, + ip->invflags & IPT_INV_PROTO); + if (ret) + goto err; + if (m->u.kernel.match->checkentry - && !m->u.kernel.match->checkentry(name, ip, m->data, + && !m->u.kernel.match->checkentry(name, ip, match, m->data, m->u.match_size - sizeof(*m), hookmask)) { - module_put(m->u.kernel.match->me); duprintf("ip_tables: check failed for `%s'.\n", m->u.kernel.match->name); - return -EINVAL; + ret = -EINVAL; + goto err; } (*i)++; return 0; +err: + module_put(m->u.kernel.match->me); + return ret; } static struct ipt_target ipt_standard_target; @@ -565,26 +568,32 @@ check_entry(struct ipt_entry *e, const c } t->u.kernel.target = target; + ret = xt_check_target(target, AF_INET, t->u.target_size - sizeof(*t), + name, e->comefrom, e->ip.proto, + e->ip.invflags & IPT_INV_PROTO); + if (ret) + goto err; + if (t->u.kernel.target == &ipt_standard_target) { if (!standard_check(t, size)) { ret = -EINVAL; goto cleanup_matches; } } else if (t->u.kernel.target->checkentry - && !t->u.kernel.target->checkentry(name, e, t->data, + && !t->u.kernel.target->checkentry(name, e, target, t->data, t->u.target_size - sizeof(*t), e->comefrom)) { - module_put(t->u.kernel.target->me); duprintf("ip_tables: check failed for `%s'.\n", t->u.kernel.target->name); ret = -EINVAL; - goto cleanup_matches; + goto err; } (*i)++; return 0; - + err: + module_put(t->u.kernel.target->me); cleanup_matches: IPT_MATCH_ITERATE(e, cleanup_match, &j); return ret; @@ -645,7 +654,7 @@ cleanup_entry(struct ipt_entry *e, unsig IPT_MATCH_ITERATE(e, cleanup_match, NULL); t = ipt_get_target(e); if (t->u.kernel.target->destroy) - t->u.kernel.target->destroy(t->data, + t->u.kernel.target->destroy(t->u.kernel.target, t->data, t->u.target_size - sizeof(*t)); module_put(t->u.kernel.target->me); return 0; @@ -1277,6 +1286,7 @@ static int icmp_match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -1310,28 +1320,27 @@ icmp_match(const struct sk_buff *skb, static int icmp_checkentry(const char *tablename, const void *info, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { - const struct ipt_ip *ip = info; const struct ipt_icmp *icmpinfo = matchinfo; - /* Must specify proto == ICMP, and no unknown invflags */ - return ip->proto == IPPROTO_ICMP - && !(ip->invflags & IPT_INV_PROTO) - && matchsize == IPT_ALIGN(sizeof(struct ipt_icmp)) - && !(icmpinfo->invflags & ~IPT_ICMP_INV); + /* Must specify no unknown invflags */ + return !(icmpinfo->invflags & ~IPT_ICMP_INV); } /* The built-in targets: standard (NULL) and error. */ static struct ipt_target ipt_standard_target = { .name = IPT_STANDARD_TARGET, + .targetsize = sizeof(int), }; static struct ipt_target ipt_error_target = { .name = IPT_ERROR_TARGET, .target = ipt_error, + .targetsize = IPT_FUNCTION_MAXNAMELEN, }; static struct nf_sockopt_ops ipt_sockopts = { @@ -1346,8 +1355,10 @@ static struct nf_sockopt_ops ipt_sockopt static struct ipt_match icmp_matchstruct = { .name = "icmp", - .match = &icmp_match, - .checkentry = &icmp_checkentry, + .match = icmp_match, + .matchsize = sizeof(struct ipt_icmp), + .proto = IPPROTO_ICMP, + .checkentry = icmp_checkentry, }; static int __init init(void) diff -puN net/ipv4/netfilter/ipt_addrtype.c~git-net net/ipv4/netfilter/ipt_addrtype.c --- devel/net/ipv4/netfilter/ipt_addrtype.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_addrtype.c 2006-02-27 22:37:47.000000000 -0800 @@ -27,8 +27,9 @@ static inline int match_type(u_int32_t a return !!(mask & (1 << inet_addr_type(addr))); } -static int match(const struct sk_buff *skb, const struct net_device *in, - const struct net_device *out, const void *matchinfo, +static int match(const struct sk_buff *skb, + const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, int *hotdrop) { const struct ipt_addrtype_info *info = matchinfo; @@ -43,23 +44,10 @@ static int match(const struct sk_buff *s return ret; } -static int checkentry(const char *tablename, const void *ip, - void *matchinfo, unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != IPT_ALIGN(sizeof(struct ipt_addrtype_info))) { - printk(KERN_ERR "ipt_addrtype: invalid size (%u != %Zu)\n", - matchsize, IPT_ALIGN(sizeof(struct ipt_addrtype_info))); - return 0; - } - - return 1; -} - static struct ipt_match addrtype_match = { .name = "addrtype", .match = match, - .checkentry = checkentry, + .matchsize = sizeof(struct ipt_addrtype_info), .me = THIS_MODULE }; diff -puN net/ipv4/netfilter/ipt_ah.c~git-net net/ipv4/netfilter/ipt_ah.c --- devel/net/ipv4/netfilter/ipt_ah.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_ah.c 2006-02-27 22:37:47.000000000 -0800 @@ -39,6 +39,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -71,37 +72,27 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *ip_void, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask) { const struct ipt_ah *ahinfo = matchinfo; - const struct ipt_ip *ip = ip_void; - /* Must specify proto == AH, and no unknown invflags */ - if (ip->proto != IPPROTO_AH || (ip->invflags & IPT_INV_PROTO)) { - duprintf("ipt_ah: Protocol %u != %u\n", ip->proto, - IPPROTO_AH); - return 0; - } - if (matchinfosize != IPT_ALIGN(sizeof(struct ipt_ah))) { - duprintf("ipt_ah: matchsize %u != %u\n", - matchinfosize, IPT_ALIGN(sizeof(struct ipt_ah))); - return 0; - } + /* Must specify no unknown invflags */ if (ahinfo->invflags & ~IPT_AH_INV_MASK) { - duprintf("ipt_ah: unknown flags %X\n", - ahinfo->invflags); + duprintf("ipt_ah: unknown flags %X\n", ahinfo->invflags); return 0; } - return 1; } static struct ipt_match ah_match = { .name = "ah", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ipt_ah), + .proto = IPPROTO_AH, + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_CLUSTERIP.c~git-net net/ipv4/netfilter/ipt_CLUSTERIP.c --- devel/net/ipv4/netfilter/ipt_CLUSTERIP.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_CLUSTERIP.c 2006-02-27 22:37:47.000000000 -0800 @@ -311,6 +311,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -380,6 +381,7 @@ target(struct sk_buff **pskb, static int checkentry(const char *tablename, const void *e_void, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) @@ -389,13 +391,6 @@ checkentry(const char *tablename, struct clusterip_config *config; - if (targinfosize != IPT_ALIGN(sizeof(struct ipt_clusterip_tgt_info))) { - printk(KERN_WARNING "CLUSTERIP: targinfosize %u != %Zu\n", - targinfosize, - IPT_ALIGN(sizeof(struct ipt_clusterip_tgt_info))); - return 0; - } - if (cipinfo->hash_mode != CLUSTERIP_HASHMODE_SIP && cipinfo->hash_mode != CLUSTERIP_HASHMODE_SIP_SPT && cipinfo->hash_mode != CLUSTERIP_HASHMODE_SIP_SPT_DPT) { @@ -465,9 +460,10 @@ checkentry(const char *tablename, } /* drop reference count of cluster config when rule is deleted */ -static void destroy(void *matchinfo, unsigned int matchinfosize) +static void destroy(const struct xt_target *target, void *targinfo, + unsigned int targinfosize) { - struct ipt_clusterip_tgt_info *cipinfo = matchinfo; + struct ipt_clusterip_tgt_info *cipinfo = targinfo; /* if no more entries are referencing the config, remove it * from the list and destroy the proc entry */ @@ -476,12 +472,13 @@ static void destroy(void *matchinfo, uns clusterip_config_put(cipinfo->config); } -static struct ipt_target clusterip_tgt = { - .name = "CLUSTERIP", - .target = &target, - .checkentry = &checkentry, - .destroy = &destroy, - .me = THIS_MODULE +static struct ipt_target clusterip_tgt = { + .name = "CLUSTERIP", + .target = target, + .targetsize = sizeof(struct ipt_clusterip_tgt_info), + .checkentry = checkentry, + .destroy = destroy, + .me = THIS_MODULE }; diff -puN net/ipv4/netfilter/ipt_dscp.c~git-net net/ipv4/netfilter/ipt_dscp.c --- devel/net/ipv4/netfilter/ipt_dscp.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_dscp.c 2006-02-27 22:37:47.000000000 -0800 @@ -19,8 +19,9 @@ MODULE_AUTHOR("Harald Welte tos&IPT_DSCP_MASK) == sh_dscp) ^ info->invert; } -static int checkentry(const char *tablename, const void *ip, - void *matchinfo, unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != IPT_ALIGN(sizeof(struct ipt_dscp_info))) - return 0; - - return 1; -} - static struct ipt_match dscp_match = { .name = "dscp", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ipt_dscp_info), .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_DSCP.c~git-net net/ipv4/netfilter/ipt_DSCP.c --- devel/net/ipv4/netfilter/ipt_DSCP.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_DSCP.c 2006-02-27 22:37:47.000000000 -0800 @@ -29,6 +29,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -58,35 +59,25 @@ target(struct sk_buff **pskb, static int checkentry(const char *tablename, const void *e_void, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { const u_int8_t dscp = ((struct ipt_DSCP_info *)targinfo)->dscp; - if (targinfosize != IPT_ALIGN(sizeof(struct ipt_DSCP_info))) { - printk(KERN_WARNING "DSCP: targinfosize %u != %Zu\n", - targinfosize, - IPT_ALIGN(sizeof(struct ipt_DSCP_info))); - return 0; - } - - if (strcmp(tablename, "mangle") != 0) { - printk(KERN_WARNING "DSCP: can only be called from \"mangle\" table, not \"%s\"\n", tablename); - return 0; - } - if ((dscp > IPT_DSCP_MAX)) { printk(KERN_WARNING "DSCP: dscp %x out of range\n", dscp); return 0; } - return 1; } static struct ipt_target ipt_dscp_reg = { .name = "DSCP", .target = target, + .targetsize = sizeof(struct ipt_DSCP_info), + .table = "mangle", .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_ecn.c~git-net net/ipv4/netfilter/ipt_ecn.c --- devel/net/ipv4/netfilter/ipt_ecn.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_ecn.c 2006-02-27 22:37:47.000000000 -0800 @@ -65,8 +65,9 @@ static inline int match_tcp(const struct return 1; } -static int match(const struct sk_buff *skb, const struct net_device *in, - const struct net_device *out, const void *matchinfo, +static int match(const struct sk_buff *skb, + const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, int *hotdrop) { const struct ipt_ecn_info *info = matchinfo; @@ -86,15 +87,13 @@ static int match(const struct sk_buff *s } static int checkentry(const char *tablename, const void *ip_void, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { const struct ipt_ecn_info *info = matchinfo; const struct ipt_ip *ip = ip_void; - if (matchsize != IPT_ALIGN(sizeof(struct ipt_ecn_info))) - return 0; - if (info->operation & IPT_ECN_OP_MATCH_MASK) return 0; @@ -113,8 +112,9 @@ static int checkentry(const char *tablen static struct ipt_match ecn_match = { .name = "ecn", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ipt_ecn_info), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_ECN.c~git-net net/ipv4/netfilter/ipt_ECN.c --- devel/net/ipv4/netfilter/ipt_ECN.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_ECN.c 2006-02-27 22:37:47.000000000 -0800 @@ -94,6 +94,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -114,6 +115,7 @@ target(struct sk_buff **pskb, static int checkentry(const char *tablename, const void *e_void, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) @@ -121,18 +123,6 @@ checkentry(const char *tablename, const struct ipt_ECN_info *einfo = (struct ipt_ECN_info *)targinfo; const struct ipt_entry *e = e_void; - if (targinfosize != IPT_ALIGN(sizeof(struct ipt_ECN_info))) { - printk(KERN_WARNING "ECN: targinfosize %u != %Zu\n", - targinfosize, - IPT_ALIGN(sizeof(struct ipt_ECN_info))); - return 0; - } - - if (strcmp(tablename, "mangle") != 0) { - printk(KERN_WARNING "ECN: can only be called from \"mangle\" table, not \"%s\"\n", tablename); - return 0; - } - if (einfo->operation & IPT_ECN_OP_MASK) { printk(KERN_WARNING "ECN: unsupported ECN operation %x\n", einfo->operation); @@ -143,20 +133,20 @@ checkentry(const char *tablename, einfo->ip_ect); return 0; } - if ((einfo->operation & (IPT_ECN_OP_SET_ECE|IPT_ECN_OP_SET_CWR)) && (e->ip.proto != IPPROTO_TCP || (e->ip.invflags & IPT_INV_PROTO))) { printk(KERN_WARNING "ECN: cannot use TCP operations on a " "non-tcp rule\n"); return 0; } - return 1; } static struct ipt_target ipt_ecn_reg = { .name = "ECN", .target = target, + .targetsize = sizeof(struct ipt_ECN_info), + .table = "mangle", .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_esp.c~git-net net/ipv4/netfilter/ipt_esp.c --- devel/net/ipv4/netfilter/ipt_esp.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_esp.c 2006-02-27 22:37:47.000000000 -0800 @@ -40,6 +40,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -72,37 +73,27 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *ip_void, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask) { const struct ipt_esp *espinfo = matchinfo; - const struct ipt_ip *ip = ip_void; - /* Must specify proto == ESP, and no unknown invflags */ - if (ip->proto != IPPROTO_ESP || (ip->invflags & IPT_INV_PROTO)) { - duprintf("ipt_esp: Protocol %u != %u\n", ip->proto, - IPPROTO_ESP); - return 0; - } - if (matchinfosize != IPT_ALIGN(sizeof(struct ipt_esp))) { - duprintf("ipt_esp: matchsize %u != %u\n", - matchinfosize, IPT_ALIGN(sizeof(struct ipt_esp))); - return 0; - } + /* Must specify no unknown invflags */ if (espinfo->invflags & ~IPT_ESP_INV_MASK) { - duprintf("ipt_esp: unknown flags %X\n", - espinfo->invflags); + duprintf("ipt_esp: unknown flags %X\n", espinfo->invflags); return 0; } - return 1; } static struct ipt_match esp_match = { .name = "esp", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ipt_esp), + .proto = IPPROTO_ESP, + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_hashlimit.c~git-net net/ipv4/netfilter/ipt_hashlimit.c --- devel/net/ipv4/netfilter/ipt_hashlimit.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_hashlimit.c 2006-02-27 22:37:47.000000000 -0800 @@ -427,6 +427,7 @@ static int hashlimit_match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -506,15 +507,13 @@ hashlimit_match(const struct sk_buff *sk static int hashlimit_checkentry(const char *tablename, const void *inf, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { struct ipt_hashlimit_info *r = matchinfo; - if (matchsize != IPT_ALIGN(sizeof(struct ipt_hashlimit_info))) - return 0; - /* Check for overflow. */ if (r->cfg.burst == 0 || user2credits(r->cfg.avg * r->cfg.burst) < @@ -558,19 +557,21 @@ hashlimit_checkentry(const char *tablena } static void -hashlimit_destroy(void *matchinfo, unsigned int matchsize) +hashlimit_destroy(const struct xt_match *match, void *matchinfo, + unsigned int matchsize) { struct ipt_hashlimit_info *r = (struct ipt_hashlimit_info *) matchinfo; htable_put(r->hinfo); } -static struct ipt_match ipt_hashlimit = { - .name = "hashlimit", - .match = hashlimit_match, - .checkentry = hashlimit_checkentry, - .destroy = hashlimit_destroy, - .me = THIS_MODULE +static struct ipt_match ipt_hashlimit = { + .name = "hashlimit", + .match = hashlimit_match, + .matchsize = sizeof(struct ipt_hashlimit_info), + .checkentry = hashlimit_checkentry, + .destroy = hashlimit_destroy, + .me = THIS_MODULE }; /* PROC stuff */ diff -puN net/ipv4/netfilter/ipt_iprange.c~git-net net/ipv4/netfilter/ipt_iprange.c --- devel/net/ipv4/netfilter/ipt_iprange.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_iprange.c 2006-02-27 22:37:47.000000000 -0800 @@ -27,6 +27,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, int *hotdrop) { @@ -62,27 +63,12 @@ match(const struct sk_buff *skb, return 1; } -static int check(const char *tablename, - const void *inf, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - /* verify size */ - if (matchsize != IPT_ALIGN(sizeof(struct ipt_iprange_info))) - return 0; - - return 1; -} - -static struct ipt_match iprange_match = -{ - .list = { NULL, NULL }, - .name = "iprange", - .match = &match, - .checkentry = &check, - .destroy = NULL, - .me = THIS_MODULE +static struct ipt_match iprange_match = { + .name = "iprange", + .match = match, + .matchsize = sizeof(struct ipt_iprange_info), + .destroy = NULL, + .me = THIS_MODULE }; static int __init init(void) diff -puN net/ipv4/netfilter/ipt_LOG.c~git-net net/ipv4/netfilter/ipt_LOG.c --- devel/net/ipv4/netfilter/ipt_LOG.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_LOG.c 2006-02-27 22:37:47.000000000 -0800 @@ -415,6 +415,7 @@ ipt_log_target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -437,35 +438,29 @@ ipt_log_target(struct sk_buff **pskb, static int ipt_log_checkentry(const char *tablename, const void *e, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { const struct ipt_log_info *loginfo = targinfo; - if (targinfosize != IPT_ALIGN(sizeof(struct ipt_log_info))) { - DEBUGP("LOG: targinfosize %u != %u\n", - targinfosize, IPT_ALIGN(sizeof(struct ipt_log_info))); - return 0; - } - if (loginfo->level >= 8) { DEBUGP("LOG: level %u >= 8\n", loginfo->level); return 0; } - if (loginfo->prefix[sizeof(loginfo->prefix)-1] != '\0') { DEBUGP("LOG: prefix term %i\n", loginfo->prefix[sizeof(loginfo->prefix)-1]); return 0; } - return 1; } static struct ipt_target ipt_log_reg = { .name = "LOG", .target = ipt_log_target, + .targetsize = sizeof(struct ipt_log_info), .checkentry = ipt_log_checkentry, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_MASQUERADE.c~git-net net/ipv4/netfilter/ipt_MASQUERADE.c --- devel/net/ipv4/netfilter/ipt_MASQUERADE.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_MASQUERADE.c 2006-02-27 22:37:47.000000000 -0800 @@ -41,25 +41,13 @@ static DEFINE_RWLOCK(masq_lock); static int masquerade_check(const char *tablename, const void *e, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { const struct ip_nat_multi_range_compat *mr = targinfo; - if (strcmp(tablename, "nat") != 0) { - DEBUGP("masquerade_check: bad table `%s'.\n", tablename); - return 0; - } - if (targinfosize != IPT_ALIGN(sizeof(*mr))) { - DEBUGP("masquerade_check: size %u != %u.\n", - targinfosize, sizeof(*mr)); - return 0; - } - if (hook_mask & ~(1 << NF_IP_POST_ROUTING)) { - DEBUGP("masquerade_check: bad hooks %x.\n", hook_mask); - return 0; - } if (mr->range[0].flags & IP_NAT_RANGE_MAP_IPS) { DEBUGP("masquerade_check: bad MAP_IPS.\n"); return 0; @@ -76,6 +64,7 @@ masquerade_target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -179,6 +168,9 @@ static struct notifier_block masq_inet_n static struct ipt_target masquerade = { .name = "MASQUERADE", .target = masquerade_target, + .targetsize = sizeof(struct ip_nat_multi_range_compat), + .table = "nat", + .hooks = 1 << NF_IP_POST_ROUTING, .checkentry = masquerade_check, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_multiport.c~git-net net/ipv4/netfilter/ipt_multiport.c --- devel/net/ipv4/netfilter/ipt_multiport.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_multiport.c 2006-02-27 22:37:47.000000000 -0800 @@ -95,6 +95,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -127,6 +128,7 @@ static int match_v1(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -153,40 +155,19 @@ match_v1(const struct sk_buff *skb, return ports_match_v1(multiinfo, ntohs(pptr[0]), ntohs(pptr[1])); } -/* Called when user tries to insert an entry of this type. */ -static int -checkentry(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - return (matchsize == IPT_ALIGN(sizeof(struct ipt_multiport))); -} - -static int -checkentry_v1(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - return (matchsize == IPT_ALIGN(sizeof(struct ipt_multiport_v1))); -} - static struct ipt_match multiport_match = { .name = "multiport", .revision = 0, - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ipt_multiport), .me = THIS_MODULE, }; static struct ipt_match multiport_match_v1 = { .name = "multiport", .revision = 1, - .match = &match_v1, - .checkentry = &checkentry_v1, + .match = match_v1, + .matchsize = sizeof(struct ipt_multiport_v1), .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_NETMAP.c~git-net net/ipv4/netfilter/ipt_NETMAP.c --- devel/net/ipv4/netfilter/ipt_NETMAP.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_NETMAP.c 2006-02-27 22:37:47.000000000 -0800 @@ -32,25 +32,13 @@ MODULE_DESCRIPTION("iptables 1:1 NAT map static int check(const char *tablename, const void *e, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { const struct ip_nat_multi_range_compat *mr = targinfo; - if (strcmp(tablename, "nat") != 0) { - DEBUGP(MODULENAME":check: bad table `%s'.\n", tablename); - return 0; - } - if (targinfosize != IPT_ALIGN(sizeof(*mr))) { - DEBUGP(MODULENAME":check: size %u.\n", targinfosize); - return 0; - } - if (hook_mask & ~((1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_POST_ROUTING) | - (1 << NF_IP_LOCAL_OUT))) { - DEBUGP(MODULENAME":check: bad hooks %x.\n", hook_mask); - return 0; - } if (!(mr->range[0].flags & IP_NAT_RANGE_MAP_IPS)) { DEBUGP(MODULENAME":check: bad MAP_IPS.\n"); return 0; @@ -67,6 +55,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -101,6 +90,10 @@ target(struct sk_buff **pskb, static struct ipt_target target_module = { .name = MODULENAME, .target = target, + .targetsize = sizeof(struct ip_nat_multi_range_compat), + .table = "nat", + .hooks = (1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_POST_ROUTING) | + (1 << NF_IP_LOCAL_OUT), .checkentry = check, .me = THIS_MODULE }; diff -puN net/ipv4/netfilter/ipt_owner.c~git-net net/ipv4/netfilter/ipt_owner.c --- devel/net/ipv4/netfilter/ipt_owner.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_owner.c 2006-02-27 22:37:47.000000000 -0800 @@ -25,6 +25,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -53,37 +54,27 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { const struct ipt_owner_info *info = matchinfo; - if (hook_mask - & ~((1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_POST_ROUTING))) { - printk("ipt_owner: only valid for LOCAL_OUT or POST_ROUTING.\n"); - return 0; - } - - if (matchsize != IPT_ALIGN(sizeof(struct ipt_owner_info))) { - printk("Matchsize %u != %Zu\n", matchsize, - IPT_ALIGN(sizeof(struct ipt_owner_info))); - return 0; - } - if (info->match & (IPT_OWNER_PID|IPT_OWNER_SID|IPT_OWNER_COMM)) { printk("ipt_owner: pid, sid and command matching " "not supported anymore\n"); return 0; } - return 1; } static struct ipt_match owner_match = { .name = "owner", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ipt_owner_info), + .hooks = (1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_POST_ROUTING), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -L net/ipv4/netfilter/ipt_policy.c -puN net/ipv4/netfilter/ipt_policy.c~git-net /dev/null --- devel/net/ipv4/netfilter/ipt_policy.c +++ /dev/null 2003-09-15 06:40:47.000000000 -0700 @@ -1,176 +0,0 @@ -/* IP tables module for matching IPsec policy - * - * Copyright (c) 2004,2005 Patrick McHardy, - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - */ - -#include -#include -#include -#include -#include -#include - -#include -#include -#include - -MODULE_AUTHOR("Patrick McHardy "); -MODULE_DESCRIPTION("IPtables IPsec policy matching module"); -MODULE_LICENSE("GPL"); - - -static inline int -match_xfrm_state(struct xfrm_state *x, const struct ipt_policy_elem *e) -{ -#define MATCH_ADDR(x,y,z) (!e->match.x || \ - ((e->x.a4.s_addr == (e->y.a4.s_addr & (z))) \ - ^ e->invert.x)) -#define MATCH(x,y) (!e->match.x || ((e->x == (y)) ^ e->invert.x)) - - return MATCH_ADDR(saddr, smask, x->props.saddr.a4) && - MATCH_ADDR(daddr, dmask, x->id.daddr.a4) && - MATCH(proto, x->id.proto) && - MATCH(mode, x->props.mode) && - MATCH(spi, x->id.spi) && - MATCH(reqid, x->props.reqid); -} - -static int -match_policy_in(const struct sk_buff *skb, const struct ipt_policy_info *info) -{ - const struct ipt_policy_elem *e; - struct sec_path *sp = skb->sp; - int strict = info->flags & IPT_POLICY_MATCH_STRICT; - int i, pos; - - if (sp == NULL) - return -1; - if (strict && info->len != sp->len) - return 0; - - for (i = sp->len - 1; i >= 0; i--) { - pos = strict ? i - sp->len + 1 : 0; - if (pos >= info->len) - return 0; - e = &info->pol[pos]; - - if (match_xfrm_state(sp->x[i].xvec, e)) { - if (!strict) - return 1; - } else if (strict) - return 0; - } - - return strict ? 1 : 0; -} - -static int -match_policy_out(const struct sk_buff *skb, const struct ipt_policy_info *info) -{ - const struct ipt_policy_elem *e; - struct dst_entry *dst = skb->dst; - int strict = info->flags & IPT_POLICY_MATCH_STRICT; - int i, pos; - - if (dst->xfrm == NULL) - return -1; - - for (i = 0; dst && dst->xfrm; dst = dst->child, i++) { - pos = strict ? i : 0; - if (pos >= info->len) - return 0; - e = &info->pol[pos]; - - if (match_xfrm_state(dst->xfrm, e)) { - if (!strict) - return 1; - } else if (strict) - return 0; - } - - return strict ? i == info->len : 0; -} - -static int match(const struct sk_buff *skb, - const struct net_device *in, - const struct net_device *out, - const void *matchinfo, - int offset, - unsigned int protoff, - int *hotdrop) -{ - const struct ipt_policy_info *info = matchinfo; - int ret; - - if (info->flags & IPT_POLICY_MATCH_IN) - ret = match_policy_in(skb, info); - else - ret = match_policy_out(skb, info); - - if (ret < 0) - ret = info->flags & IPT_POLICY_MATCH_NONE ? 1 : 0; - else if (info->flags & IPT_POLICY_MATCH_NONE) - ret = 0; - - return ret; -} - -static int checkentry(const char *tablename, const void *ip_void, - void *matchinfo, unsigned int matchsize, - unsigned int hook_mask) -{ - struct ipt_policy_info *info = matchinfo; - - if (matchsize != IPT_ALIGN(sizeof(*info))) { - printk(KERN_ERR "ipt_policy: matchsize %u != %zu\n", - matchsize, IPT_ALIGN(sizeof(*info))); - return 0; - } - if (!(info->flags & (IPT_POLICY_MATCH_IN|IPT_POLICY_MATCH_OUT))) { - printk(KERN_ERR "ipt_policy: neither incoming nor " - "outgoing policy selected\n"); - return 0; - } - if (hook_mask & (1 << NF_IP_PRE_ROUTING | 1 << NF_IP_LOCAL_IN) - && info->flags & IPT_POLICY_MATCH_OUT) { - printk(KERN_ERR "ipt_policy: output policy not valid in " - "PRE_ROUTING and INPUT\n"); - return 0; - } - if (hook_mask & (1 << NF_IP_POST_ROUTING | 1 << NF_IP_LOCAL_OUT) - && info->flags & IPT_POLICY_MATCH_IN) { - printk(KERN_ERR "ipt_policy: input policy not valid in " - "POST_ROUTING and OUTPUT\n"); - return 0; - } - if (info->len > IPT_POLICY_MAX_ELEM) { - printk(KERN_ERR "ipt_policy: too many policy elements\n"); - return 0; - } - - return 1; -} - -static struct ipt_match policy_match = { - .name = "policy", - .match = match, - .checkentry = checkentry, - .me = THIS_MODULE, -}; - -static int __init init(void) -{ - return ipt_register_match(&policy_match); -} - -static void __exit fini(void) -{ - ipt_unregister_match(&policy_match); -} - -module_init(init); -module_exit(fini); diff -puN net/ipv4/netfilter/ipt_recent.c~git-net net/ipv4/netfilter/ipt_recent.c --- devel/net/ipv4/netfilter/ipt_recent.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_recent.c 2006-02-27 22:37:47.000000000 -0800 @@ -102,6 +102,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -318,7 +319,7 @@ static int ip_recent_ctrl(struct file *f skb->nh.iph->daddr = 0; /* Clear ttl since we have no way of knowing it */ skb->nh.iph->ttl = 0; - match(skb,NULL,NULL,info,0,0,NULL); + match(skb,NULL,NULL,NULL,info,0,0,NULL); kfree(skb->nh.iph); out_free_skb: @@ -356,6 +357,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -657,6 +659,7 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) @@ -670,8 +673,6 @@ checkentry(const char *tablename, if(debug) printk(KERN_INFO RECENT_NAME ": checkentry() entered.\n"); #endif - if (matchsize != IPT_ALIGN(sizeof(struct ipt_recent_info))) return 0; - /* seconds and hit_count only valid for CHECK/UPDATE */ if(info->check_set & IPT_RECENT_SET) { flag++; if(info->seconds || info->hit_count) return 0; } if(info->check_set & IPT_RECENT_REMOVE) { flag++; if(info->seconds || info->hit_count) return 0; } @@ -871,7 +872,7 @@ checkentry(const char *tablename, * up its memory. */ static void -destroy(void *matchinfo, unsigned int matchsize) +destroy(const struct xt_match *match, void *matchinfo, unsigned int matchsize) { const struct ipt_recent_info *info = matchinfo; struct recent_ip_tables *curr_table, *last_table; @@ -951,12 +952,13 @@ destroy(void *matchinfo, unsigned int ma /* This is the structure we pass to ipt_register to register our * module with iptables. */ -static struct ipt_match recent_match = { - .name = "recent", - .match = &match, - .checkentry = &checkentry, - .destroy = &destroy, - .me = THIS_MODULE +static struct ipt_match recent_match = { + .name = "recent", + .match = match, + .matchsize = sizeof(struct ipt_recent_info), + .checkentry = checkentry, + .destroy = destroy, + .me = THIS_MODULE }; /* Kernel module initialization. */ diff -puN net/ipv4/netfilter/ipt_REDIRECT.c~git-net net/ipv4/netfilter/ipt_REDIRECT.c --- devel/net/ipv4/netfilter/ipt_REDIRECT.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_REDIRECT.c 2006-02-27 22:37:47.000000000 -0800 @@ -34,24 +34,13 @@ MODULE_DESCRIPTION("iptables REDIRECT ta static int redirect_check(const char *tablename, const void *e, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { const struct ip_nat_multi_range_compat *mr = targinfo; - if (strcmp(tablename, "nat") != 0) { - DEBUGP("redirect_check: bad table `%s'.\n", table); - return 0; - } - if (targinfosize != IPT_ALIGN(sizeof(*mr))) { - DEBUGP("redirect_check: size %u.\n", targinfosize); - return 0; - } - if (hook_mask & ~((1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_OUT))) { - DEBUGP("redirect_check: bad hooks %x.\n", hook_mask); - return 0; - } if (mr->range[0].flags & IP_NAT_RANGE_MAP_IPS) { DEBUGP("redirect_check: bad MAP_IPS.\n"); return 0; @@ -68,6 +57,7 @@ redirect_target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -115,6 +105,9 @@ redirect_target(struct sk_buff **pskb, static struct ipt_target redirect_reg = { .name = "REDIRECT", .target = redirect_target, + .targetsize = sizeof(struct ip_nat_multi_range_compat), + .table = "nat", + .hooks = (1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_OUT), .checkentry = redirect_check, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_REJECT.c~git-net net/ipv4/netfilter/ipt_REJECT.c --- devel/net/ipv4/netfilter/ipt_REJECT.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_REJECT.c 2006-02-27 22:37:47.000000000 -0800 @@ -154,10 +154,6 @@ static void send_reset(struct sk_buff *o /* This packet will not be the same as the other: clear nf fields */ nf_reset(nskb); nskb->nfmark = 0; -#ifdef CONFIG_BRIDGE_NETFILTER - nf_bridge_put(nskb->nf_bridge); - nskb->nf_bridge = NULL; -#endif tcph = (struct tcphdr *)((u_int32_t*)nskb->nh.iph + nskb->nh.iph->ihl); @@ -236,6 +232,7 @@ static unsigned int reject(struct sk_buf const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -283,6 +280,7 @@ static unsigned int reject(struct sk_buf static int check(const char *tablename, const void *e_void, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) @@ -290,23 +288,6 @@ static int check(const char *tablename, const struct ipt_reject_info *rejinfo = targinfo; const struct ipt_entry *e = e_void; - if (targinfosize != IPT_ALIGN(sizeof(struct ipt_reject_info))) { - DEBUGP("REJECT: targinfosize %u != 0\n", targinfosize); - return 0; - } - - /* Only allow these for packet filtering. */ - if (strcmp(tablename, "filter") != 0) { - DEBUGP("REJECT: bad table `%s'.\n", tablename); - return 0; - } - if ((hook_mask & ~((1 << NF_IP_LOCAL_IN) - | (1 << NF_IP_FORWARD) - | (1 << NF_IP_LOCAL_OUT))) != 0) { - DEBUGP("REJECT: bad hook mask %X\n", hook_mask); - return 0; - } - if (rejinfo->with == IPT_ICMP_ECHOREPLY) { printk("REJECT: ECHOREPLY no longer supported.\n"); return 0; @@ -318,13 +299,16 @@ static int check(const char *tablename, return 0; } } - return 1; } static struct ipt_target ipt_reject_reg = { .name = "REJECT", .target = reject, + .targetsize = sizeof(struct ipt_reject_info), + .table = "filter", + .hooks = (1 << NF_IP_LOCAL_IN) | (1 << NF_IP_FORWARD) | + (1 << NF_IP_LOCAL_OUT), .checkentry = check, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_SAME.c~git-net net/ipv4/netfilter/ipt_SAME.c --- devel/net/ipv4/netfilter/ipt_SAME.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_SAME.c 2006-02-27 22:37:47.000000000 -0800 @@ -50,6 +50,7 @@ MODULE_DESCRIPTION("iptables special SNA static int same_check(const char *tablename, const void *e, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) @@ -59,18 +60,6 @@ same_check(const char *tablename, mr->ipnum = 0; - if (strcmp(tablename, "nat") != 0) { - DEBUGP("same_check: bad table `%s'.\n", tablename); - return 0; - } - if (targinfosize != IPT_ALIGN(sizeof(*mr))) { - DEBUGP("same_check: size %u.\n", targinfosize); - return 0; - } - if (hook_mask & ~(1 << NF_IP_PRE_ROUTING | 1 << NF_IP_POST_ROUTING)) { - DEBUGP("same_check: bad hooks %x.\n", hook_mask); - return 0; - } if (mr->rangesize < 1) { DEBUGP("same_check: need at least one dest range.\n"); return 0; @@ -127,7 +116,7 @@ same_check(const char *tablename, } static void -same_destroy(void *targinfo, +same_destroy(const struct xt_target *target, void *targinfo, unsigned int targinfosize) { struct ipt_same_info *mr = targinfo; @@ -143,6 +132,7 @@ same_target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -191,6 +181,9 @@ same_target(struct sk_buff **pskb, static struct ipt_target same_reg = { .name = "SAME", .target = same_target, + .targetsize = sizeof(struct ipt_same_info), + .table = "nat", + .hooks = (1 << NF_IP_PRE_ROUTING | 1 << NF_IP_POST_ROUTING), .checkentry = same_check, .destroy = same_destroy, .me = THIS_MODULE, diff -puN net/ipv4/netfilter/ipt_TCPMSS.c~git-net net/ipv4/netfilter/ipt_TCPMSS.c --- devel/net/ipv4/netfilter/ipt_TCPMSS.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_TCPMSS.c 2006-02-27 22:37:47.000000000 -0800 @@ -48,6 +48,7 @@ ipt_tcpmss_target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -211,6 +212,7 @@ static inline int find_syn_match(const s static int ipt_tcpmss_checkentry(const char *tablename, const void *e_void, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) @@ -218,13 +220,6 @@ ipt_tcpmss_checkentry(const char *tablen const struct ipt_tcpmss_info *tcpmssinfo = targinfo; const struct ipt_entry *e = e_void; - if (targinfosize != IPT_ALIGN(sizeof(struct ipt_tcpmss_info))) { - DEBUGP("ipt_tcpmss_checkentry: targinfosize %u != %u\n", - targinfosize, IPT_ALIGN(sizeof(struct ipt_tcpmss_info))); - return 0; - } - - if((tcpmssinfo->mss == IPT_TCPMSS_CLAMP_PMTU) && ((hook_mask & ~((1 << NF_IP_FORWARD) | (1 << NF_IP_LOCAL_OUT) @@ -233,11 +228,8 @@ ipt_tcpmss_checkentry(const char *tablen return 0; } - if (e->ip.proto == IPPROTO_TCP - && !(e->ip.invflags & IPT_INV_PROTO) - && IPT_MATCH_ITERATE(e, find_syn_match)) + if (IPT_MATCH_ITERATE(e, find_syn_match)) return 1; - printk("TCPMSS: Only works on TCP SYN packets\n"); return 0; } @@ -245,6 +237,8 @@ ipt_tcpmss_checkentry(const char *tablen static struct ipt_target ipt_tcpmss_reg = { .name = "TCPMSS", .target = ipt_tcpmss_target, + .targetsize = sizeof(struct ipt_tcpmss_info), + .proto = IPPROTO_TCP, .checkentry = ipt_tcpmss_checkentry, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_tos.c~git-net net/ipv4/netfilter/ipt_tos.c --- devel/net/ipv4/netfilter/ipt_tos.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_tos.c 2006-02-27 22:37:47.000000000 -0800 @@ -21,6 +21,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -31,23 +32,10 @@ match(const struct sk_buff *skb, return (skb->nh.iph->tos == info->tos) ^ info->invert; } -static int -checkentry(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != IPT_ALIGN(sizeof(struct ipt_tos_info))) - return 0; - - return 1; -} - static struct ipt_match tos_match = { .name = "tos", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ipt_tos_info), .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_TOS.c~git-net net/ipv4/netfilter/ipt_TOS.c --- devel/net/ipv4/netfilter/ipt_TOS.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_TOS.c 2006-02-27 22:37:47.000000000 -0800 @@ -25,6 +25,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -53,24 +54,13 @@ target(struct sk_buff **pskb, static int checkentry(const char *tablename, const void *e_void, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { const u_int8_t tos = ((struct ipt_tos_target_info *)targinfo)->tos; - if (targinfosize != IPT_ALIGN(sizeof(struct ipt_tos_target_info))) { - printk(KERN_WARNING "TOS: targinfosize %u != %Zu\n", - targinfosize, - IPT_ALIGN(sizeof(struct ipt_tos_target_info))); - return 0; - } - - if (strcmp(tablename, "mangle") != 0) { - printk(KERN_WARNING "TOS: can only be called from \"mangle\" table, not \"%s\"\n", tablename); - return 0; - } - if (tos != IPTOS_LOWDELAY && tos != IPTOS_THROUGHPUT && tos != IPTOS_RELIABILITY @@ -79,13 +69,14 @@ checkentry(const char *tablename, printk(KERN_WARNING "TOS: bad tos value %#x\n", tos); return 0; } - return 1; } static struct ipt_target ipt_tos_reg = { .name = "TOS", .target = target, + .targetsize = sizeof(struct ipt_tos_target_info), + .table = "mangle", .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_ttl.c~git-net net/ipv4/netfilter/ipt_ttl.c --- devel/net/ipv4/netfilter/ipt_ttl.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_ttl.c 2006-02-27 22:37:47.000000000 -0800 @@ -19,8 +19,9 @@ MODULE_AUTHOR("Harald Welte mode > IPT_TTL_MAXMODE) { printk(KERN_WARNING "ipt_TTL: invalid or unknown Mode %u\n", info->mode); return 0; } - if ((info->mode != IPT_TTL_SET) && (info->ttl == 0)) return 0; - return 1; } static struct ipt_target ipt_TTL = { .name = "TTL", .target = ipt_ttl_target, + .targetsize = sizeof(struct ipt_TTL_info), + .table = "mangle", .checkentry = ipt_ttl_checkentry, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/ipt_ULOG.c~git-net net/ipv4/netfilter/ipt_ULOG.c --- devel/net/ipv4/netfilter/ipt_ULOG.c~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/ipt_ULOG.c 2006-02-27 22:37:47.000000000 -0800 @@ -303,6 +303,7 @@ static unsigned int ipt_ulog_target(stru const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { struct ipt_ulog_info *loginfo = (struct ipt_ulog_info *) targinfo; @@ -339,42 +340,37 @@ static void ipt_logfn(unsigned int pf, static int ipt_ulog_checkentry(const char *tablename, const void *e, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hookmask) { struct ipt_ulog_info *loginfo = (struct ipt_ulog_info *) targinfo; - if (targinfosize != IPT_ALIGN(sizeof(struct ipt_ulog_info))) { - DEBUGP("ipt_ULOG: targinfosize %u != 0\n", targinfosize); - return 0; - } - if (loginfo->prefix[sizeof(loginfo->prefix) - 1] != '\0') { DEBUGP("ipt_ULOG: prefix term %i\n", loginfo->prefix[sizeof(loginfo->prefix) - 1]); return 0; } - if (loginfo->qthreshold > ULOG_MAX_QLEN) { DEBUGP("ipt_ULOG: queue threshold %i > MAX_QLEN\n", loginfo->qthreshold); return 0; } - return 1; } static struct ipt_target ipt_ulog_reg = { .name = "ULOG", .target = ipt_ulog_target, + .targetsize = sizeof(struct ipt_ulog_info), .checkentry = ipt_ulog_checkentry, .me = THIS_MODULE, }; static struct nf_logger ipt_ulog_logger = { .name = "ipt_ULOG", - .logfn = &ipt_logfn, + .logfn = ipt_logfn, .me = THIS_MODULE, }; diff -puN net/ipv4/netfilter/Kconfig~git-net net/ipv4/netfilter/Kconfig --- devel/net/ipv4/netfilter/Kconfig~git-net 2006-02-27 22:37:46.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/Kconfig 2006-02-27 22:37:47.000000000 -0800 @@ -303,16 +303,6 @@ config IP_NF_MATCH_HASHLIMIT destination IP' or `500pps from any given source IP' with a single IPtables rule. -config IP_NF_MATCH_POLICY - tristate "IPsec policy match support" - depends on IP_NF_IPTABLES && XFRM - help - Policy matching allows you to match packets based on the - IPsec policy that was used during decapsulation/will - be used during encapsulation. - - To compile it as a module, choose M here. If unsure, say N. - # `filter', generic and specific targets config IP_NF_FILTER tristate "Packet filtering" diff -puN net/ipv4/netfilter/Makefile~git-net net/ipv4/netfilter/Makefile --- devel/net/ipv4/netfilter/Makefile~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/Makefile 2006-02-27 22:37:47.000000000 -0800 @@ -57,7 +57,6 @@ obj-$(CONFIG_IP_NF_MATCH_DSCP) += ipt_ds obj-$(CONFIG_IP_NF_MATCH_AH_ESP) += ipt_ah.o ipt_esp.o obj-$(CONFIG_IP_NF_MATCH_TTL) += ipt_ttl.o obj-$(CONFIG_IP_NF_MATCH_ADDRTYPE) += ipt_addrtype.o -obj-$(CONFIG_IP_NF_MATCH_POLICY) += ipt_policy.o # targets obj-$(CONFIG_IP_NF_TARGET_REJECT) += ipt_REJECT.o diff -puN net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c~git-net net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c --- devel/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c 2006-02-27 22:37:47.000000000 -0800 @@ -141,19 +141,21 @@ static unsigned int ipv4_conntrack_help( { struct nf_conn *ct; enum ip_conntrack_info ctinfo; + struct nf_conn_help *help; /* This is where we call the helper: as the packet goes out. */ ct = nf_ct_get(*pskb, &ctinfo); - if (ct && ct->helper) { - unsigned int ret; - ret = ct->helper->help(pskb, - (*pskb)->nh.raw - (*pskb)->data - + (*pskb)->nh.iph->ihl*4, - ct, ctinfo); - if (ret != NF_ACCEPT) - return ret; - } - return NF_ACCEPT; + if (!ct) + return NF_ACCEPT; + + help = nfct_help(ct); + if (!help || !help->helper) + return NF_ACCEPT; + + return help->helper->help(pskb, + (*pskb)->nh.raw - (*pskb)->data + + (*pskb)->nh.iph->ihl*4, + ct, ctinfo); } static unsigned int ipv4_conntrack_defrag(unsigned int hooknum, diff -puN net/ipv4/sysctl_net_ipv4.c~git-net net/ipv4/sysctl_net_ipv4.c --- devel/net/ipv4/sysctl_net_ipv4.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv4/sysctl_net_ipv4.c 2006-02-27 22:37:47.000000000 -0800 @@ -664,6 +664,22 @@ ctl_table ipv4_table[] = { .mode = 0644, .proc_handler = &proc_dointvec, }, + { + .ctl_name = NET_TCP_MTU_PROBING, + .procname = "tcp_mtu_probing", + .data = &sysctl_tcp_mtu_probing, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_TCP_BASE_MSS, + .procname = "tcp_base_mss", + .data = &sysctl_tcp_base_mss, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, { .ctl_name = 0 } }; diff -puN net/ipv4/tcp_input.c~git-net net/ipv4/tcp_input.c --- devel/net/ipv4/tcp_input.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv4/tcp_input.c 2006-02-27 22:37:47.000000000 -0800 @@ -1891,6 +1891,34 @@ static void tcp_try_to_open(struct sock } } +static void tcp_mtup_probe_failed(struct sock *sk) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + icsk->icsk_mtup.search_high = icsk->icsk_mtup.probe_size - 1; + icsk->icsk_mtup.probe_size = 0; +} + +static void tcp_mtup_probe_success(struct sock *sk, struct sk_buff *skb) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + + /* FIXME: breaks with very large cwnd */ + tp->prior_ssthresh = tcp_current_ssthresh(sk); + tp->snd_cwnd = tp->snd_cwnd * + tcp_mss_to_mtu(sk, tp->mss_cache) / + icsk->icsk_mtup.probe_size; + tp->snd_cwnd_cnt = 0; + tp->snd_cwnd_stamp = tcp_time_stamp; + tp->rcv_ssthresh = tcp_current_ssthresh(sk); + + icsk->icsk_mtup.search_low = icsk->icsk_mtup.probe_size; + icsk->icsk_mtup.probe_size = 0; + tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); +} + + /* Process an event, which can update packets-in-flight not trivially. * Main goal of this function is to calculate new estimate for left_out, * taking into account both packets sitting in receiver's buffer and @@ -2023,6 +2051,17 @@ tcp_fastretrans_alert(struct sock *sk, u return; } + /* MTU probe failure: don't reduce cwnd */ + if (icsk->icsk_ca_state < TCP_CA_CWR && + icsk->icsk_mtup.probe_size && + tp->snd_una == tp->mtu_probe.probe_seq_start) { + tcp_mtup_probe_failed(sk); + /* Restores the reduction we did in tcp_mtup_probe() */ + tp->snd_cwnd++; + tcp_simple_retransmit(sk); + return; + } + /* Otherwise enter Recovery state */ if (IsReno(tp)) @@ -2242,6 +2281,13 @@ static int tcp_clean_rtx_queue(struct so acked |= FLAG_SYN_ACKED; tp->retrans_stamp = 0; } + + /* MTU probing checks */ + if (icsk->icsk_mtup.probe_size) { + if (!after(tp->mtu_probe.probe_seq_end, TCP_SKB_CB(skb)->end_seq)) { + tcp_mtup_probe_success(sk, skb); + } + } if (sacked) { if (sacked & TCPCB_RETRANS) { @@ -4101,6 +4147,7 @@ static int tcp_rcv_synsent_state_process if (tp->rx_opt.sack_ok && sysctl_tcp_fack) tp->rx_opt.sack_ok |= 2; + tcp_mtup_init(sk); tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); tcp_initialize_rcv_mss(sk); @@ -4211,6 +4258,7 @@ discard: if (tp->ecn_flags&TCP_ECN_OK) sock_set_flag(sk, SOCK_NO_LARGESEND); + tcp_mtup_init(sk); tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); tcp_initialize_rcv_mss(sk); @@ -4399,6 +4447,7 @@ int tcp_rcv_state_process(struct sock *s */ tp->lsndtime = tcp_time_stamp; + tcp_mtup_init(sk); tcp_initialize_rcv_mss(sk); tcp_init_buffer_space(sk); tcp_fast_path_on(tp); diff -puN net/ipv4/tcp_ipv4.c~git-net net/ipv4/tcp_ipv4.c --- devel/net/ipv4/tcp_ipv4.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv4/tcp_ipv4.c 2006-02-27 22:37:47.000000000 -0800 @@ -900,6 +900,7 @@ struct sock *tcp_v4_syn_recv_sock(struct inet_csk(newsk)->icsk_ext_hdr_len = newinet->opt->optlen; newinet->id = newtp->write_seq ^ jiffies; + tcp_mtup_init(newsk); tcp_sync_mss(newsk, dst_mtu(dst)); newtp->advmss = dst_metric(dst, RTAX_ADVMSS); tcp_initialize_rcv_mss(newsk); @@ -1827,21 +1828,10 @@ struct proto tcp_prot = { .rsk_prot = &tcp_request_sock_ops, }; - - void __init tcp_v4_init(struct net_proto_family *ops) { - int err = sock_create_kern(PF_INET, SOCK_RAW, IPPROTO_TCP, &tcp_socket); - if (err < 0) + if (inet_csk_ctl_sock_create(&tcp_socket, PF_INET, SOCK_RAW, IPPROTO_TCP) < 0) panic("Failed to create the TCP control socket.\n"); - tcp_socket->sk->sk_allocation = GFP_ATOMIC; - inet_sk(tcp_socket->sk)->uc_ttl = -1; - - /* Unhash it so that IP input processing does not even - * see it, we do not wish this socket to see incoming - * packets. - */ - tcp_socket->sk->sk_prot->unhash(tcp_socket->sk); } EXPORT_SYMBOL(ipv4_specific); diff -puN net/ipv4/tcp_output.c~git-net net/ipv4/tcp_output.c --- devel/net/ipv4/tcp_output.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv4/tcp_output.c 2006-02-27 22:37:47.000000000 -0800 @@ -51,6 +51,12 @@ int sysctl_tcp_retrans_collapse = 1; */ int sysctl_tcp_tso_win_divisor = 3; +int sysctl_tcp_mtu_probing = 0; +int sysctl_tcp_base_mss = 512; + +EXPORT_SYMBOL(sysctl_tcp_mtu_probing); +EXPORT_SYMBOL(sysctl_tcp_base_mss); + static void update_send_head(struct sock *sk, struct tcp_sock *tp, struct sk_buff *skb) { @@ -681,6 +687,62 @@ int tcp_trim_head(struct sock *sk, struc return 0; } +/* Not accounting for SACKs here. */ +int tcp_mtu_to_mss(struct sock *sk, int pmtu) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + int mss_now; + + /* Calculate base mss without TCP options: + It is MMS_S - sizeof(tcphdr) of rfc1122 + */ + mss_now = pmtu - icsk->icsk_af_ops->net_header_len - sizeof(struct tcphdr); + + /* Clamp it (mss_clamp does not include tcp options) */ + if (mss_now > tp->rx_opt.mss_clamp) + mss_now = tp->rx_opt.mss_clamp; + + /* Now subtract optional transport overhead */ + mss_now -= icsk->icsk_ext_hdr_len; + + /* Then reserve room for full set of TCP options and 8 bytes of data */ + if (mss_now < 48) + mss_now = 48; + + /* Now subtract TCP options size, not including SACKs */ + mss_now -= tp->tcp_header_len - sizeof(struct tcphdr); + + return mss_now; +} + +/* Inverse of above */ +int tcp_mss_to_mtu(struct sock *sk, int mss) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + int mtu; + + mtu = mss + + tp->tcp_header_len + + icsk->icsk_ext_hdr_len + + icsk->icsk_af_ops->net_header_len; + + return mtu; +} + +void tcp_mtup_init(struct sock *sk) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + + icsk->icsk_mtup.enabled = sysctl_tcp_mtu_probing > 1; + icsk->icsk_mtup.search_high = tp->rx_opt.mss_clamp + sizeof(struct tcphdr) + + icsk->icsk_af_ops->net_header_len; + icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, sysctl_tcp_base_mss); + icsk->icsk_mtup.probe_size = 0; +} + /* This function synchronize snd mss to current pmtu/exthdr set. tp->rx_opt.user_mss is mss set by user by TCP_MAXSEG. It does NOT counts @@ -708,25 +770,12 @@ unsigned int tcp_sync_mss(struct sock *s { struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); - /* Calculate base mss without TCP options: - It is MMS_S - sizeof(tcphdr) of rfc1122 - */ - int mss_now = (pmtu - icsk->icsk_af_ops->net_header_len - - sizeof(struct tcphdr)); - - /* Clamp it (mss_clamp does not include tcp options) */ - if (mss_now > tp->rx_opt.mss_clamp) - mss_now = tp->rx_opt.mss_clamp; - - /* Now subtract optional transport overhead */ - mss_now -= icsk->icsk_ext_hdr_len; + int mss_now; - /* Then reserve room for full set of TCP options and 8 bytes of data */ - if (mss_now < 48) - mss_now = 48; + if (icsk->icsk_mtup.search_high > pmtu) + icsk->icsk_mtup.search_high = pmtu; - /* Now subtract TCP options size, not including SACKs */ - mss_now -= tp->tcp_header_len - sizeof(struct tcphdr); + mss_now = tcp_mtu_to_mss(sk, pmtu); /* Bound mss with half of window */ if (tp->max_window && mss_now > (tp->max_window>>1)) @@ -734,6 +783,8 @@ unsigned int tcp_sync_mss(struct sock *s /* And store cached results */ icsk->icsk_pmtu_cookie = pmtu; + if (icsk->icsk_mtup.enabled) + mss_now = min(mss_now, tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low)); tp->mss_cache = mss_now; return mss_now; @@ -1059,6 +1110,140 @@ static int tcp_tso_should_defer(struct s return 1; } +/* Create a new MTU probe if we are ready. + * Returns 0 if we should wait to probe (no cwnd available), + * 1 if a probe was sent, + * -1 otherwise */ +static int tcp_mtu_probe(struct sock *sk) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + struct sk_buff *skb, *nskb, *next; + int len; + int probe_size; + unsigned int pif; + int copy; + int mss_now; + + /* Not currently probing/verifying, + * not in recovery, + * have enough cwnd, and + * not SACKing (the variable headers throw things off) */ + if (!icsk->icsk_mtup.enabled || + icsk->icsk_mtup.probe_size || + inet_csk(sk)->icsk_ca_state != TCP_CA_Open || + tp->snd_cwnd < 11 || + tp->rx_opt.eff_sacks) + return -1; + + /* Very simple search strategy: just double the MSS. */ + mss_now = tcp_current_mss(sk, 0); + probe_size = 2*tp->mss_cache; + if (probe_size > tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_high)) { + /* TODO: set timer for probe_converge_event */ + return -1; + } + + /* Have enough data in the send queue to probe? */ + len = 0; + if ((skb = sk->sk_send_head) == NULL) + return -1; + while ((len += skb->len) < probe_size && !tcp_skb_is_last(sk, skb)) + skb = skb->next; + if (len < probe_size) + return -1; + + /* Receive window check. */ + if (after(TCP_SKB_CB(skb)->seq + probe_size, tp->snd_una + tp->snd_wnd)) { + if (tp->snd_wnd < probe_size) + return -1; + else + return 0; + } + + /* Do we need to wait to drain cwnd? */ + pif = tcp_packets_in_flight(tp); + if (pif + 2 > tp->snd_cwnd) { + /* With no packets in flight, don't stall. */ + if (pif == 0) + return -1; + else + return 0; + } + + /* We're allowed to probe. Build it now. */ + if ((nskb = sk_stream_alloc_skb(sk, probe_size, GFP_ATOMIC)) == NULL) + return -1; + sk_charge_skb(sk, nskb); + + skb = sk->sk_send_head; + __skb_insert(nskb, skb->prev, skb, &sk->sk_write_queue); + sk->sk_send_head = nskb; + + TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(skb)->seq; + TCP_SKB_CB(nskb)->end_seq = TCP_SKB_CB(skb)->seq + probe_size; + TCP_SKB_CB(nskb)->flags = TCPCB_FLAG_ACK; + TCP_SKB_CB(nskb)->sacked = 0; + nskb->csum = 0; + if (skb->ip_summed == CHECKSUM_HW) + nskb->ip_summed = CHECKSUM_HW; + + len = 0; + while (len < probe_size) { + next = skb->next; + + copy = min_t(int, skb->len, probe_size - len); + if (nskb->ip_summed) + skb_copy_bits(skb, 0, skb_put(nskb, copy), copy); + else + nskb->csum = skb_copy_and_csum_bits(skb, 0, + skb_put(nskb, copy), copy, nskb->csum); + + if (skb->len <= copy) { + /* We've eaten all the data from this skb. + * Throw it away. */ + TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags; + __skb_unlink(skb, &sk->sk_write_queue); + sk_stream_free_skb(sk, skb); + } else { + TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags & + ~(TCPCB_FLAG_FIN|TCPCB_FLAG_PSH); + if (!skb_shinfo(skb)->nr_frags) { + skb_pull(skb, copy); + if (skb->ip_summed != CHECKSUM_HW) + skb->csum = csum_partial(skb->data, skb->len, 0); + } else { + __pskb_trim_head(skb, copy); + tcp_set_skb_tso_segs(sk, skb, mss_now); + } + TCP_SKB_CB(skb)->seq += copy; + } + + len += copy; + skb = next; + } + tcp_init_tso_segs(sk, nskb, nskb->len); + + /* We're ready to send. If this fails, the probe will + * be resegmented into mss-sized pieces by tcp_write_xmit(). */ + TCP_SKB_CB(nskb)->when = tcp_time_stamp; + if (!tcp_transmit_skb(sk, nskb, 1, GFP_ATOMIC)) { + /* Decrement cwnd here because we are sending + * effectively two packets. */ + tp->snd_cwnd--; + update_send_head(sk, tp, nskb); + + icsk->icsk_mtup.probe_size = tcp_mss_to_mtu(sk, nskb->len); + tp->mtu_probe.probe_seq_start = TCP_SKB_CB(nskb)->seq; + tp->mtu_probe.probe_seq_end = TCP_SKB_CB(nskb)->end_seq; + + return 1; + } + + return -1; +} + + /* This routine writes packets to the network. It advances the * send_head. This happens as incoming acks open up the remote * window for us. @@ -1072,6 +1257,7 @@ static int tcp_write_xmit(struct sock *s struct sk_buff *skb; unsigned int tso_segs, sent_pkts; int cwnd_quota; + int result; /* If we are closed, the bytes will have to remain here. * In time closedown will finish, we empty the write queue and all @@ -1081,12 +1267,20 @@ static int tcp_write_xmit(struct sock *s return 0; sent_pkts = 0; + + /* Do MTU probing. */ + if ((result = tcp_mtu_probe(sk)) == 0) { + return 0; + } else if (result > 0) { + sent_pkts = 1; + } + while ((skb = sk->sk_send_head)) { unsigned int limit; tso_segs = tcp_init_tso_segs(sk, skb, mss_now); BUG_ON(!tso_segs); - + cwnd_quota = tcp_cwnd_test(tp, skb); if (!cwnd_quota) break; @@ -1451,9 +1645,15 @@ void tcp_simple_retransmit(struct sock * int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb) { struct tcp_sock *tp = tcp_sk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); unsigned int cur_mss = tcp_current_mss(sk, 0); int err; - + + /* Inconslusive MTU probe */ + if (icsk->icsk_mtup.probe_size) { + icsk->icsk_mtup.probe_size = 0; + } + /* Do not sent more than we queued. 1/4 is reserved for possible * copying overhead: fragmentation, tunneling, mangling etc. */ @@ -1879,6 +2079,7 @@ static void tcp_connect_init(struct sock if (tp->rx_opt.user_mss) tp->rx_opt.mss_clamp = tp->rx_opt.user_mss; tp->max_window = 0; + tcp_mtup_init(sk); tcp_sync_mss(sk, dst_mtu(dst)); if (!tp->window_clamp) @@ -2176,3 +2377,4 @@ EXPORT_SYMBOL(tcp_make_synack); EXPORT_SYMBOL(tcp_simple_retransmit); EXPORT_SYMBOL(tcp_sync_mss); EXPORT_SYMBOL(sysctl_tcp_tso_win_divisor); +EXPORT_SYMBOL(tcp_mtup_init); diff -puN net/ipv4/tcp_timer.c~git-net net/ipv4/tcp_timer.c --- devel/net/ipv4/tcp_timer.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv4/tcp_timer.c 2006-02-27 22:37:47.000000000 -0800 @@ -119,8 +119,10 @@ static int tcp_orphan_retries(struct soc /* A write timeout has occurred. Process the after effects. */ static int tcp_write_timeout(struct sock *sk) { - const struct inet_connection_sock *icsk = inet_csk(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + struct tcp_sock *tp = tcp_sk(sk); int retry_until; + int mss; if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) { if (icsk->icsk_retransmits) @@ -128,25 +130,19 @@ static int tcp_write_timeout(struct sock retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries; } else { if (icsk->icsk_retransmits >= sysctl_tcp_retries1) { - /* NOTE. draft-ietf-tcpimpl-pmtud-01.txt requires pmtu black - hole detection. :-( - - It is place to make it. It is not made. I do not want - to make it. It is disgusting. It does not work in any - case. Let me to cite the same draft, which requires for - us to implement this: - - "The one security concern raised by this memo is that ICMP black holes - are often caused by over-zealous security administrators who block - all ICMP messages. It is vitally important that those who design and - deploy security systems understand the impact of strict filtering on - upper-layer protocols. The safest web site in the world is worthless - if most TCP implementations cannot transfer data from it. It would - be far nicer to have all of the black holes fixed rather than fixing - all of the TCP implementations." - - Golden words :-). - */ + /* Black hole detection */ + if (sysctl_tcp_mtu_probing) { + if (!icsk->icsk_mtup.enabled) { + icsk->icsk_mtup.enabled = 1; + tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); + } else { + mss = min(sysctl_tcp_base_mss, + tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low)/2); + mss = max(mss, 68 - tp->tcp_header_len); + icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss); + tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); + } + } dst_negative_advice(&sk->sk_dst_cache); } diff -puN net/ipv6/addrconf.c~git-net net/ipv6/addrconf.c --- devel/net/ipv6/addrconf.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/addrconf.c 2006-02-27 22:37:47.000000000 -0800 @@ -78,8 +78,6 @@ #ifdef CONFIG_IPV6_PRIVACY #include -#include -#include #endif #include @@ -110,8 +108,6 @@ static int __ipv6_try_regen_rndid(struct static void ipv6_regen_rndid(unsigned long data); static int desync_factor = MAX_DESYNC_FACTOR * HZ; -static struct crypto_tfm *md5_tfm; -static DEFINE_SPINLOCK(md5_tfm_lock); #endif static int ipv6_count_addresses(struct inet6_dev *idev); @@ -169,6 +165,15 @@ struct ipv6_devconf ipv6_devconf = { .max_desync_factor = MAX_DESYNC_FACTOR, #endif .max_addresses = IPV6_MAX_ADDRESSES, + .accept_ra_defrtr = 1, + .accept_ra_pinfo = 1, +#ifdef CONFIG_IPV6_ROUTER_PREF + .accept_ra_rtr_pref = 1, + .rtr_probe_interval = 60 * HZ, +#ifdef CONFIG_IPV6_ROUTE_INFO + .accept_ra_rt_info_max_plen = 0, +#endif +#endif }; static struct ipv6_devconf ipv6_devconf_dflt = { @@ -190,6 +195,15 @@ static struct ipv6_devconf ipv6_devconf_ .max_desync_factor = MAX_DESYNC_FACTOR, #endif .max_addresses = IPV6_MAX_ADDRESSES, + .accept_ra_defrtr = 1, + .accept_ra_pinfo = 1, +#ifdef CONFIG_IPV6_ROUTER_PREF + .accept_ra_rtr_pref = 1, + .rtr_probe_interval = 60 * HZ, +#ifdef CONFIG_IPV6_ROUTE_INFO + .accept_ra_rt_info_max_plen = 0, +#endif +#endif }; /* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */ @@ -371,8 +385,6 @@ static struct inet6_dev * ipv6_add_dev(s in6_dev_hold(ndev); #ifdef CONFIG_IPV6_PRIVACY - get_random_bytes(ndev->rndid, sizeof(ndev->rndid)); - get_random_bytes(ndev->entropy, sizeof(ndev->entropy)); init_timer(&ndev->regen_timer); ndev->regen_timer.function = ipv6_regen_rndid; ndev->regen_timer.data = (unsigned long) ndev; @@ -1305,52 +1317,67 @@ static void addrconf_leave_anycast(struc __ipv6_dev_ac_dec(ifp->idev, &addr); } +static int addrconf_ifid_eui48(u8 *eui, struct net_device *dev) +{ + if (dev->addr_len != ETH_ALEN) + return -1; + memcpy(eui, dev->dev_addr, 3); + memcpy(eui + 5, dev->dev_addr + 3, 3); + + /* + * The zSeries OSA network cards can be shared among various + * OS instances, but the OSA cards have only one MAC address. + * This leads to duplicate address conflicts in conjunction + * with IPv6 if more than one instance uses the same card. + * + * The driver for these cards can deliver a unique 16-bit + * identifier for each instance sharing the same card. It is + * placed instead of 0xFFFE in the interface identifier. The + * "u" bit of the interface identifier is not inverted in this + * case. Hence the resulting interface identifier has local + * scope according to RFC2373. + */ + if (dev->dev_id) { + eui[3] = (dev->dev_id >> 8) & 0xFF; + eui[4] = dev->dev_id & 0xFF; + } else { + eui[3] = 0xFF; + eui[4] = 0xFE; + eui[0] ^= 2; + } + return 0; +} + +static int addrconf_ifid_arcnet(u8 *eui, struct net_device *dev) +{ + /* XXX: inherit EUI-64 from other interface -- yoshfuji */ + if (dev->addr_len != ARCNET_ALEN) + return -1; + memset(eui, 0, 7); + eui[7] = *(u8*)dev->dev_addr; + return 0; +} + +static int addrconf_ifid_infiniband(u8 *eui, struct net_device *dev) +{ + if (dev->addr_len != INFINIBAND_ALEN) + return -1; + memcpy(eui, dev->dev_addr + 12, 8); + eui[0] |= 2; + return 0; +} + static int ipv6_generate_eui64(u8 *eui, struct net_device *dev) { switch (dev->type) { case ARPHRD_ETHER: case ARPHRD_FDDI: case ARPHRD_IEEE802_TR: - if (dev->addr_len != ETH_ALEN) - return -1; - memcpy(eui, dev->dev_addr, 3); - memcpy(eui + 5, dev->dev_addr + 3, 3); - - /* - * The zSeries OSA network cards can be shared among various - * OS instances, but the OSA cards have only one MAC address. - * This leads to duplicate address conflicts in conjunction - * with IPv6 if more than one instance uses the same card. - * - * The driver for these cards can deliver a unique 16-bit - * identifier for each instance sharing the same card. It is - * placed instead of 0xFFFE in the interface identifier. The - * "u" bit of the interface identifier is not inverted in this - * case. Hence the resulting interface identifier has local - * scope according to RFC2373. - */ - if (dev->dev_id) { - eui[3] = (dev->dev_id >> 8) & 0xFF; - eui[4] = dev->dev_id & 0xFF; - } else { - eui[3] = 0xFF; - eui[4] = 0xFE; - eui[0] ^= 2; - } - return 0; + return addrconf_ifid_eui48(eui, dev); case ARPHRD_ARCNET: - /* XXX: inherit EUI-64 from other interface -- yoshfuji */ - if (dev->addr_len != ARCNET_ALEN) - return -1; - memset(eui, 0, 7); - eui[7] = *(u8*)dev->dev_addr; - return 0; + return addrconf_ifid_arcnet(eui, dev); case ARPHRD_INFINIBAND: - if (dev->addr_len != INFINIBAND_ALEN) - return -1; - memcpy(eui, dev->dev_addr + 12, 8); - eui[0] |= 2; - return 0; + return addrconf_ifid_infiniband(eui, dev); } return -1; } @@ -1376,34 +1403,9 @@ static int ipv6_inherit_eui64(u8 *eui, s /* (re)generation of randomized interface identifier (RFC 3041 3.2, 3.5) */ static int __ipv6_regen_rndid(struct inet6_dev *idev) { - struct net_device *dev; - struct scatterlist sg[2]; - - sg_set_buf(&sg[0], idev->entropy, 8); - sg_set_buf(&sg[1], idev->work_eui64, 8); - - dev = idev->dev; - - if (ipv6_generate_eui64(idev->work_eui64, dev)) { - printk(KERN_INFO - "__ipv6_regen_rndid(idev=%p): cannot get EUI64 identifier; use random bytes.\n", - idev); - get_random_bytes(idev->work_eui64, sizeof(idev->work_eui64)); - } regen: - spin_lock(&md5_tfm_lock); - if (unlikely(md5_tfm == NULL)) { - spin_unlock(&md5_tfm_lock); - return -1; - } - crypto_digest_init(md5_tfm); - crypto_digest_update(md5_tfm, sg, 2); - crypto_digest_final(md5_tfm, idev->work_digest); - spin_unlock(&md5_tfm_lock); - - memcpy(idev->rndid, &idev->work_digest[0], 8); + get_random_bytes(idev->rndid, sizeof(idev->rndid)); idev->rndid[0] &= ~0x02; - memcpy(idev->entropy, &idev->work_digest[8], 8); /* * : @@ -2143,7 +2145,6 @@ static void addrconf_ip6_tnl_config(stru return; } ip6_tnl_add_linklocal(idev); - addrconf_add_mroute(dev); } static int addrconf_notify(struct notifier_block *this, unsigned long event, @@ -3133,6 +3134,15 @@ static void inline ipv6_store_devconf(st array[DEVCONF_MAX_DESYNC_FACTOR] = cnf->max_desync_factor; #endif array[DEVCONF_MAX_ADDRESSES] = cnf->max_addresses; + array[DEVCONF_ACCEPT_RA_DEFRTR] = cnf->accept_ra_defrtr; + array[DEVCONF_ACCEPT_RA_PINFO] = cnf->accept_ra_pinfo; +#ifdef CONFIG_IPV6_ROUTER_PREF + array[DEVCONF_ACCEPT_RA_RTR_PREF] = cnf->accept_ra_rtr_pref; + array[DEVCONF_RTR_PROBE_INTERVAL] = cnf->rtr_probe_interval; +#ifdef CONFIV_IPV6_ROUTE_INFO + array[DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN] = cnf->accept_ra_rt_info_max_plen; +#endif +#endif } static int inet6_fill_ifinfo(struct sk_buff *skb, struct inet6_dev *idev, @@ -3586,6 +3596,51 @@ static struct addrconf_sysctl_table .proc_handler = &proc_dointvec, }, { + .ctl_name = NET_IPV6_ACCEPT_RA_DEFRTR, + .procname = "accept_ra_defrtr", + .data = &ipv6_devconf.accept_ra_defrtr, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_ACCEPT_RA_PINFO, + .procname = "accept_ra_pinfo", + .data = &ipv6_devconf.accept_ra_pinfo, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, +#ifdef CONFIG_IPV6_ROUTER_PREF + { + .ctl_name = NET_IPV6_ACCEPT_RA_RTR_PREF, + .procname = "accept_ra_rtr_pref", + .data = &ipv6_devconf.accept_ra_rtr_pref, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = NET_IPV6_RTR_PROBE_INTERVAL, + .procname = "router_probe_interval", + .data = &ipv6_devconf.rtr_probe_interval, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_jiffies, + .strategy = &sysctl_jiffies, + }, +#ifdef CONFIV_IPV6_ROUTE_INFO + { + .ctl_name = NET_IPV6_ACCEPT_RA_RT_INFO_MAX_PLEN, + .procname = "accept_ra_rt_info_max_plen", + .data = &ipv6_devconf.accept_ra_rt_info_max_plen, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, +#endif +#endif + { .ctl_name = 0, /* sentinel */ } }, @@ -3760,13 +3815,6 @@ int __init addrconf_init(void) register_netdevice_notifier(&ipv6_dev_notf); -#ifdef CONFIG_IPV6_PRIVACY - md5_tfm = crypto_alloc_tfm("md5", 0); - if (unlikely(md5_tfm == NULL)) - printk(KERN_WARNING - "failed to load transform for md5\n"); -#endif - addrconf_verify(0); rtnetlink_links[PF_INET6] = inet6_rtnetlink_table; #ifdef CONFIG_SYSCTL @@ -3829,11 +3877,6 @@ void __exit addrconf_cleanup(void) rtnl_unlock(); -#ifdef CONFIG_IPV6_PRIVACY - crypto_free_tfm(md5_tfm); - md5_tfm = NULL; -#endif - #ifdef CONFIG_PROC_FS proc_net_remove("if_inet6"); #endif diff -puN net/ipv6/ah6.c~git-net net/ipv6/ah6.c --- devel/net/ipv6/ah6.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/ah6.c 2006-02-27 22:37:47.000000000 -0800 @@ -213,6 +213,7 @@ static int ah6_output(struct xfrm_state ah->reserved = 0; ah->spi = x->id.spi; ah->seq_no = htonl(++x->replay.oseq); + xfrm_aevent_doreplay(x); ahp->icv(ahp, skb, ah->auth_data); err = 0; diff -puN net/ipv6/esp6.c~git-net net/ipv6/esp6.c --- devel/net/ipv6/esp6.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/esp6.c 2006-02-27 22:37:47.000000000 -0800 @@ -94,6 +94,7 @@ static int esp6_output(struct xfrm_state esph->spi = x->id.spi; esph->seq_no = htonl(++x->replay.oseq); + xfrm_aevent_doreplay(x); if (esp->conf.ivlen) crypto_cipher_set_iv(tfm, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); diff -puN net/ipv6/ip6_fib.c~git-net net/ipv6/ip6_fib.c --- devel/net/ipv6/ip6_fib.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/ip6_fib.c 2006-02-27 22:37:47.000000000 -0800 @@ -1105,7 +1105,6 @@ static int fib6_age(struct rt6_info *rt, if (rt->rt6i_flags&RTF_EXPIRES && rt->rt6i_expires) { if (time_after(now, rt->rt6i_expires)) { RT6_TRACE("expiring %p\n", rt); - rt6_reset_dflt_pointer(rt); return -1; } gc_args.more++; diff -puN net/ipv6/ip6_output.c~git-net net/ipv6/ip6_output.c --- devel/net/ipv6/ip6_output.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/ip6_output.c 2006-02-27 22:37:47.000000000 -0800 @@ -733,28 +733,29 @@ int ip6_dst_lookup(struct sock *sk, stru if (*dst) { struct rt6_info *rt = (struct rt6_info*)*dst; - /* Yes, checking route validity in not connected - case is not very simple. Take into account, - that we do not support routing by source, TOS, - and MSG_DONTROUTE --ANK (980726) - - 1. If route was host route, check that - cached destination is current. - If it is network route, we still may - check its validity using saved pointer - to the last used address: daddr_cache. - We do not want to save whole address now, - (because main consumer of this service - is tcp, which has not this problem), - so that the last trick works only on connected - sockets. - 2. oif also should be the same. - */ - + /* Yes, checking route validity in not connected + * case is not very simple. Take into account, + * that we do not support routing by source, TOS, + * and MSG_DONTROUTE --ANK (980726) + * + * 1. If route was host route, check that + * cached destination is current. + * If it is network route, we still may + * check its validity using saved pointer + * to the last used address: daddr_cache. + * We do not want to save whole address now, + * (because main consumer of this service + * is tcp, which has not this problem), + * so that the last trick works only on connected + * sockets. + * 2. oif also should be the same. + */ if (((rt->rt6i_dst.plen != 128 || - !ipv6_addr_equal(&fl->fl6_dst, &rt->rt6i_dst.addr)) + !ipv6_addr_equal(&fl->fl6_dst, + &rt->rt6i_dst.addr)) && (np->daddr_cache == NULL || - !ipv6_addr_equal(&fl->fl6_dst, np->daddr_cache))) + !ipv6_addr_equal(&fl->fl6_dst, + np->daddr_cache))) || (fl->oif && fl->oif != (*dst)->dev->ifindex)) { dst_release(*dst); *dst = NULL; diff -puN net/ipv6/ipcomp6.c~git-net net/ipv6/ipcomp6.c --- devel/net/ipv6/ipcomp6.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/ipcomp6.c 2006-02-27 22:37:47.000000000 -0800 @@ -286,8 +286,8 @@ static void ipcomp6_free_scratches(void) for_each_cpu(i) { void *scratch = *per_cpu_ptr(scratches, i); - if (scratch) - vfree(scratch); + + vfree(scratch); } free_percpu(scratches); diff -puN net/ipv6/Kconfig~git-net net/ipv6/Kconfig --- devel/net/ipv6/Kconfig~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/Kconfig 2006-02-27 22:37:47.000000000 -0800 @@ -6,8 +6,6 @@ config IPV6 tristate "The IPv6 protocol" default m - select CRYPTO if IPV6_PRIVACY - select CRYPTO_MD5 if IPV6_PRIVACY ---help--- This is complemental support for the IP version 6. You will still be able to do traditional IPv4 networking as well. @@ -22,7 +20,7 @@ config IPV6 module will be called ipv6. config IPV6_PRIVACY - bool "IPv6: Privacy Extensions (RFC 3041) support" + bool "IPv6: Privacy Extensions support" depends on IPV6 ---help--- Privacy Extensions for Stateless Address Autoconfiguration in IPv6 @@ -30,6 +28,9 @@ config IPV6_PRIVACY pseudo-random global-scope unicast address(es) will assigned to your interface(s). + We use our standard pseudo random algorithm to generate randomized + interface identifier, instead of one described in RFC 3041. + By default, kernel do not generate temporary addresses. To use temporary addresses, do @@ -37,6 +38,25 @@ config IPV6_PRIVACY See for details. +config IPV6_ROUTER_PREF + bool "IPv6: Router Preference (RFC 4191) support" + depends on IPV6 + ---help--- + Router Preference is an optional extension to the Router + Advertisement message to improve the ability of hosts + to pick more appropriate router, especially when the hosts + is placed in a multi-homed network. + + If unsure, say N. + +config IPV6_ROUTE_INFO + bool "IPv6: Route Information (RFC 4191) support (EXPERIMENTAL)" + depends on IPV6_ROUTER_PREF && EXPERIMENTAL + ---help--- + This is experimental support of Route Information. + + If unsure, say N. + config INET6_AH tristate "IPv6: AH transformation" depends on IPV6 diff -puN net/ipv6/ndisc.c~git-net net/ipv6/ndisc.c --- devel/net/ipv6/ndisc.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/ndisc.c 2006-02-27 22:37:47.000000000 -0800 @@ -156,7 +156,11 @@ struct neigh_table nd_tbl = { /* ND options */ struct ndisc_options { - struct nd_opt_hdr *nd_opt_array[__ND_OPT_MAX]; + struct nd_opt_hdr *nd_opt_array[__ND_OPT_ARRAY_MAX]; +#ifdef CONFIG_IPV6_ROUTE_INFO + struct nd_opt_hdr *nd_opts_ri; + struct nd_opt_hdr *nd_opts_ri_end; +#endif }; #define nd_opts_src_lladdr nd_opt_array[ND_OPT_SOURCE_LL_ADDR] @@ -255,6 +259,13 @@ static struct ndisc_options *ndisc_parse if (ndopts->nd_opt_array[nd_opt->nd_opt_type] == 0) ndopts->nd_opt_array[nd_opt->nd_opt_type] = nd_opt; break; +#ifdef CONFIG_IPV6_ROUTE_INFO + case ND_OPT_ROUTE_INFO: + ndopts->nd_opts_ri_end = nd_opt; + if (!ndopts->nd_opts_ri) + ndopts->nd_opts_ri = nd_opt; + break; +#endif default: /* * Unknown options must be silently ignored, @@ -1019,10 +1030,11 @@ static void ndisc_router_discovery(struc struct ra_msg *ra_msg = (struct ra_msg *) skb->h.raw; struct neighbour *neigh = NULL; struct inet6_dev *in6_dev; - struct rt6_info *rt; + struct rt6_info *rt = NULL; int lifetime; struct ndisc_options ndopts; int optlen; + unsigned int pref = 0; __u8 * opt = (__u8 *)(ra_msg + 1); @@ -1081,8 +1093,19 @@ static void ndisc_router_discovery(struc (ra_msg->icmph.icmp6_addrconf_other ? IF_RA_OTHERCONF : 0); + if (!in6_dev->cnf.accept_ra_defrtr) + goto skip_defrtr; + lifetime = ntohs(ra_msg->icmph.icmp6_rt_lifetime); +#ifdef CONFIG_IPV6_ROUTER_PREF + pref = ra_msg->icmph.icmp6_router_pref; + /* 10b is handled as if it were 00b (medium) */ + if (pref == ICMPV6_ROUTER_PREF_INVALID || + in6_dev->cnf.accept_ra_rtr_pref) + pref = ICMPV6_ROUTER_PREF_MEDIUM; +#endif + rt = rt6_get_dflt_router(&skb->nh.ipv6h->saddr, skb->dev); if (rt) @@ -1098,7 +1121,7 @@ static void ndisc_router_discovery(struc ND_PRINTK3(KERN_DEBUG "ICMPv6 RA: adding default router.\n"); - rt = rt6_add_dflt_router(&skb->nh.ipv6h->saddr, skb->dev); + rt = rt6_add_dflt_router(&skb->nh.ipv6h->saddr, skb->dev, pref); if (rt == NULL) { ND_PRINTK0(KERN_ERR "ICMPv6 RA: %s() failed to add default route.\n", @@ -1117,6 +1140,8 @@ static void ndisc_router_discovery(struc return; } neigh->flags |= NTF_ROUTER; + } else if (rt) { + rt->rt6i_flags |= (rt->rt6i_flags & ~RTF_PREF_MASK) | RTF_PREF(pref); } if (rt) @@ -1128,6 +1153,8 @@ static void ndisc_router_discovery(struc rt->u.dst.metrics[RTAX_HOPLIMIT-1] = ra_msg->icmph.icmp6_hop_limit; } +skip_defrtr: + /* * Update Reachable Time and Retrans Timer */ @@ -1186,7 +1213,21 @@ static void ndisc_router_discovery(struc NEIGH_UPDATE_F_ISROUTER); } - if (ndopts.nd_opts_pi) { +#ifdef CONFIG_IPV6_ROUTE_INFO + if (in6_dev->cnf.accept_ra_rtr_pref && ndopts.nd_opts_ri) { + struct nd_opt_hdr *p; + for (p = ndopts.nd_opts_ri; + p; + p = ndisc_next_option(p, ndopts.nd_opts_ri_end)) { + if (((struct route_info *)p)->prefix_len > in6_dev->cnf.accept_ra_rt_info_max_plen) + continue; + rt6_route_rcv(skb->dev, (u8*)p, (p->nd_opt_len) << 3, + &skb->nh.ipv6h->saddr); + } + } +#endif + + if (in6_dev->cnf.accept_ra_pinfo && ndopts.nd_opts_pi) { struct nd_opt_hdr *p; for (p = ndopts.nd_opts_pi; p; diff -puN net/ipv6/netfilter/ip6_tables.c~git-net net/ipv6/netfilter/ip6_tables.c --- devel/net/ipv6/netfilter/ip6_tables.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6_tables.c 2006-02-27 22:37:47.000000000 -0800 @@ -94,19 +94,6 @@ do { \ #define up(x) do { printk("UP:%u:" #x "\n", __LINE__); up(x); } while(0) #endif -int -ip6_masked_addrcmp(const struct in6_addr *addr1, const struct in6_addr *mask, - const struct in6_addr *addr2) -{ - int i; - for( i = 0; i < 16; i++){ - if((addr1->s6_addr[i] & mask->s6_addr[i]) != - (addr2->s6_addr[i] & mask->s6_addr[i])) - return 1; - } - return 0; -} - /* Check for an extension */ int ip6t_ext_hdr(u8 nexthdr) @@ -135,10 +122,10 @@ ip6_packet_match(const struct sk_buff *s #define FWINV(bool,invflg) ((bool) ^ !!(ip6info->invflags & invflg)) - if (FWINV(ip6_masked_addrcmp(&ipv6->saddr, &ip6info->smsk, - &ip6info->src), IP6T_INV_SRCIP) - || FWINV(ip6_masked_addrcmp(&ipv6->daddr, &ip6info->dmsk, - &ip6info->dst), IP6T_INV_DSTIP)) { + if (FWINV(ipv6_masked_addr_cmp(&ipv6->saddr, &ip6info->smsk, + &ip6info->src), IP6T_INV_SRCIP) + || FWINV(ipv6_masked_addr_cmp(&ipv6->daddr, &ip6info->dmsk, + &ip6info->dst), IP6T_INV_DSTIP)) { dprintf("Source or dest mismatch.\n"); /* dprintf("SRC: %u. Mask: %u. Target: %u.%s\n", ip->saddr, @@ -232,6 +219,7 @@ ip6t_error(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -251,7 +239,7 @@ int do_match(struct ip6t_entry_match *m, int *hotdrop) { /* Stop iteration if it doesn't match */ - if (!m->u.kernel.match->match(skb, in, out, m->data, + if (!m->u.kernel.match->match(skb, in, out, m->u.kernel.match, m->data, offset, protoff, hotdrop)) return 1; else @@ -373,6 +361,7 @@ ip6t_do_table(struct sk_buff **pskb, verdict = t->u.kernel.target->target(pskb, in, out, hook, + t->u.kernel.target, t->data, userdata); @@ -531,7 +520,7 @@ cleanup_match(struct ip6t_entry_match *m return 1; if (m->u.kernel.match->destroy) - m->u.kernel.match->destroy(m->data, + m->u.kernel.match->destroy(m->u.kernel.match, m->data, m->u.match_size - sizeof(*m)); module_put(m->u.kernel.match->me); return 0; @@ -544,21 +533,12 @@ standard_check(const struct ip6t_entry_t struct ip6t_standard_target *targ = (void *)t; /* Check standard info. */ - if (t->u.target_size - != IP6T_ALIGN(sizeof(struct ip6t_standard_target))) { - duprintf("standard_check: target size %u != %u\n", - t->u.target_size, - IP6T_ALIGN(sizeof(struct ip6t_standard_target))); - return 0; - } - if (targ->verdict >= 0 && targ->verdict > max_offset - sizeof(struct ip6t_entry)) { duprintf("ip6t_standard_check: bad verdict (%i)\n", targ->verdict); return 0; } - if (targ->verdict < -NF_MAX_VERDICT - 1) { duprintf("ip6t_standard_check: bad negative verdict (%i)\n", targ->verdict); @@ -575,6 +555,7 @@ check_match(struct ip6t_entry_match *m, unsigned int *i) { struct ip6t_match *match; + int ret; match = try_then_request_module(xt_find_match(AF_INET6, m->u.user.name, m->u.user.revision), @@ -585,18 +566,27 @@ check_match(struct ip6t_entry_match *m, } m->u.kernel.match = match; + ret = xt_check_match(match, AF_INET6, m->u.match_size - sizeof(*m), + name, hookmask, ipv6->proto, + ipv6->invflags & IP6T_INV_PROTO); + if (ret) + goto err; + if (m->u.kernel.match->checkentry - && !m->u.kernel.match->checkentry(name, ipv6, m->data, + && !m->u.kernel.match->checkentry(name, ipv6, match, m->data, m->u.match_size - sizeof(*m), hookmask)) { - module_put(m->u.kernel.match->me); duprintf("ip_tables: check failed for `%s'.\n", m->u.kernel.match->name); - return -EINVAL; + ret = -EINVAL; + goto err; } (*i)++; return 0; +err: + module_put(m->u.kernel.match->me); + return ret; } static struct ip6t_target ip6t_standard_target; @@ -632,26 +622,32 @@ check_entry(struct ip6t_entry *e, const } t->u.kernel.target = target; + ret = xt_check_target(target, AF_INET6, t->u.target_size - sizeof(*t), + name, e->comefrom, e->ipv6.proto, + e->ipv6.invflags & IP6T_INV_PROTO); + if (ret) + goto err; + if (t->u.kernel.target == &ip6t_standard_target) { if (!standard_check(t, size)) { ret = -EINVAL; goto cleanup_matches; } } else if (t->u.kernel.target->checkentry - && !t->u.kernel.target->checkentry(name, e, t->data, + && !t->u.kernel.target->checkentry(name, e, target, t->data, t->u.target_size - sizeof(*t), e->comefrom)) { - module_put(t->u.kernel.target->me); duprintf("ip_tables: check failed for `%s'.\n", t->u.kernel.target->name); ret = -EINVAL; - goto cleanup_matches; + goto err; } (*i)++; return 0; - + err: + module_put(t->u.kernel.target->me); cleanup_matches: IP6T_MATCH_ITERATE(e, cleanup_match, &j); return ret; @@ -712,7 +708,7 @@ cleanup_entry(struct ip6t_entry *e, unsi IP6T_MATCH_ITERATE(e, cleanup_match, NULL); t = ip6t_get_target(e); if (t->u.kernel.target->destroy) - t->u.kernel.target->destroy(t->data, + t->u.kernel.target->destroy(t->u.kernel.target, t->data, t->u.target_size - sizeof(*t)); module_put(t->u.kernel.target->me); return 0; @@ -1333,6 +1329,7 @@ static int icmp6_match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -1365,28 +1362,27 @@ icmp6_match(const struct sk_buff *skb, static int icmp6_checkentry(const char *tablename, const void *entry, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { - const struct ip6t_ip6 *ipv6 = entry; const struct ip6t_icmp *icmpinfo = matchinfo; - /* Must specify proto == ICMP, and no unknown invflags */ - return ipv6->proto == IPPROTO_ICMPV6 - && !(ipv6->invflags & IP6T_INV_PROTO) - && matchsize == IP6T_ALIGN(sizeof(struct ip6t_icmp)) - && !(icmpinfo->invflags & ~IP6T_ICMP_INV); + /* Must specify no unknown invflags */ + return !(icmpinfo->invflags & ~IP6T_ICMP_INV); } /* The built-in targets: standard (NULL) and error. */ static struct ip6t_target ip6t_standard_target = { .name = IP6T_STANDARD_TARGET, + .targetsize = sizeof(int), }; static struct ip6t_target ip6t_error_target = { .name = IP6T_ERROR_TARGET, .target = ip6t_error, + .targetsize = IP6T_FUNCTION_MAXNAMELEN, }; static struct nf_sockopt_ops ip6t_sockopts = { @@ -1402,7 +1398,9 @@ static struct nf_sockopt_ops ip6t_sockop static struct ip6t_match icmp6_matchstruct = { .name = "icmp6", .match = &icmp6_match, - .checkentry = &icmp6_checkentry, + .matchsize = sizeof(struct ip6t_icmp), + .checkentry = icmp6_checkentry, + .proto = IPPROTO_ICMPV6, }; static int __init init(void) @@ -1515,7 +1513,6 @@ EXPORT_SYMBOL(ip6t_unregister_table); EXPORT_SYMBOL(ip6t_do_table); EXPORT_SYMBOL(ip6t_ext_hdr); EXPORT_SYMBOL(ipv6_find_hdr); -EXPORT_SYMBOL(ip6_masked_addrcmp); module_init(init); module_exit(fini); diff -puN net/ipv6/netfilter/ip6t_ah.c~git-net net/ipv6/netfilter/ip6t_ah.c --- devel/net/ipv6/netfilter/ip6t_ah.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_ah.c 2006-02-27 22:37:47.000000000 -0800 @@ -44,6 +44,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -99,17 +100,13 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *entry, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask) { const struct ip6t_ah *ahinfo = matchinfo; - if (matchinfosize != IP6T_ALIGN(sizeof(struct ip6t_ah))) { - DEBUGP("ip6t_ah: matchsize %u != %u\n", - matchinfosize, IP6T_ALIGN(sizeof(struct ip6t_ah))); - return 0; - } if (ahinfo->invflags & ~IP6T_AH_INV_MASK) { DEBUGP("ip6t_ah: unknown flags %X\n", ahinfo->invflags); return 0; @@ -119,8 +116,9 @@ checkentry(const char *tablename, static struct ip6t_match ah_match = { .name = "ah", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_ah), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_dst.c~git-net net/ipv6/netfilter/ip6t_dst.c --- devel/net/ipv6/netfilter/ip6t_dst.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_dst.c 2006-02-27 22:37:47.000000000 -0800 @@ -55,6 +55,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -179,22 +180,17 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *info, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask) { const struct ip6t_opts *optsinfo = matchinfo; - if (matchinfosize != IP6T_ALIGN(sizeof(struct ip6t_opts))) { - DEBUGP("ip6t_opts: matchsize %u != %u\n", - matchinfosize, IP6T_ALIGN(sizeof(struct ip6t_opts))); - return 0; - } if (optsinfo->invflags & ~IP6T_OPTS_INV_MASK) { DEBUGP("ip6t_opts: unknown flags %X\n", optsinfo->invflags); return 0; } - return 1; } @@ -204,8 +200,9 @@ static struct ip6t_match opts_match = { #else .name = "dst", #endif - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_opts), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_esp.c~git-net net/ipv6/netfilter/ip6t_esp.c --- devel/net/ipv6/netfilter/ip6t_esp.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_esp.c 2006-02-27 22:37:47.000000000 -0800 @@ -44,6 +44,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -77,17 +78,13 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask) { const struct ip6t_esp *espinfo = matchinfo; - if (matchinfosize != IP6T_ALIGN(sizeof(struct ip6t_esp))) { - DEBUGP("ip6t_esp: matchsize %u != %u\n", - matchinfosize, IP6T_ALIGN(sizeof(struct ip6t_esp))); - return 0; - } if (espinfo->invflags & ~IP6T_ESP_INV_MASK) { DEBUGP("ip6t_esp: unknown flags %X\n", espinfo->invflags); @@ -98,8 +95,9 @@ checkentry(const char *tablename, static struct ip6t_match esp_match = { .name = "esp", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_esp), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_eui64.c~git-net net/ipv6/netfilter/ip6t_eui64.c --- devel/net/ipv6/netfilter/ip6t_eui64.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_eui64.c 2006-02-27 22:37:47.000000000 -0800 @@ -22,6 +22,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -60,30 +61,12 @@ match(const struct sk_buff *skb, return 0; } -static int -ip6t_eui64_checkentry(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (hook_mask - & ~((1 << NF_IP6_PRE_ROUTING) | (1 << NF_IP6_LOCAL_IN) | - (1 << NF_IP6_FORWARD))) { - printk("ip6t_eui64: only valid for PRE_ROUTING, LOCAL_IN or FORWARD.\n"); - return 0; - } - - if (matchsize != IP6T_ALIGN(sizeof(int))) - return 0; - - return 1; -} - static struct ip6t_match eui64_match = { .name = "eui64", - .match = &match, - .checkentry = &ip6t_eui64_checkentry, + .match = match, + .matchsize = sizeof(int), + .hooks = (1 << NF_IP6_PRE_ROUTING) | (1 << NF_IP6_LOCAL_IN) | + (1 << NF_IP6_FORWARD), .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_frag.c~git-net net/ipv6/netfilter/ip6t_frag.c --- devel/net/ipv6/netfilter/ip6t_frag.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_frag.c 2006-02-27 22:37:47.000000000 -0800 @@ -43,6 +43,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -116,29 +117,25 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask) { const struct ip6t_frag *fraginfo = matchinfo; - if (matchinfosize != IP6T_ALIGN(sizeof(struct ip6t_frag))) { - DEBUGP("ip6t_frag: matchsize %u != %u\n", - matchinfosize, IP6T_ALIGN(sizeof(struct ip6t_frag))); - return 0; - } if (fraginfo->invflags & ~IP6T_FRAG_INV_MASK) { DEBUGP("ip6t_frag: unknown flags %X\n", fraginfo->invflags); return 0; } - return 1; } static struct ip6t_match frag_match = { .name = "frag", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_frag), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_hbh.c~git-net net/ipv6/netfilter/ip6t_hbh.c --- devel/net/ipv6/netfilter/ip6t_hbh.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_hbh.c 2006-02-27 22:37:47.000000000 -0800 @@ -55,6 +55,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -179,22 +180,17 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *entry, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask) { const struct ip6t_opts *optsinfo = matchinfo; - if (matchinfosize != IP6T_ALIGN(sizeof(struct ip6t_opts))) { - DEBUGP("ip6t_opts: matchsize %u != %u\n", - matchinfosize, IP6T_ALIGN(sizeof(struct ip6t_opts))); - return 0; - } if (optsinfo->invflags & ~IP6T_OPTS_INV_MASK) { DEBUGP("ip6t_opts: unknown flags %X\n", optsinfo->invflags); return 0; } - return 1; } @@ -204,8 +200,9 @@ static struct ip6t_match opts_match = { #else .name = "dst", #endif - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_opts), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_hl.c~git-net net/ipv6/netfilter/ip6t_hl.c --- devel/net/ipv6/netfilter/ip6t_hl.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_hl.c 2006-02-27 22:37:47.000000000 -0800 @@ -18,10 +18,10 @@ MODULE_AUTHOR("Maciej Soltysiak nh.ipv6h; @@ -48,20 +48,10 @@ static int match(const struct sk_buff *s return 0; } -static int checkentry(const char *tablename, const void *entry, - void *matchinfo, unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != IP6T_ALIGN(sizeof(struct ip6t_hl_info))) - return 0; - - return 1; -} - static struct ip6t_match hl_match = { .name = "hl", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_hl_info), .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_HL.c~git-net net/ipv6/netfilter/ip6t_HL.c --- devel/net/ipv6/netfilter/ip6t_HL.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_HL.c 2006-02-27 22:37:47.000000000 -0800 @@ -21,6 +21,7 @@ static unsigned int ip6t_hl_target(struc const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { struct ipv6hdr *ip6h; @@ -63,43 +64,31 @@ static unsigned int ip6t_hl_target(struc static int ip6t_hl_checkentry(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { struct ip6t_HL_info *info = targinfo; - if (targinfosize != IP6T_ALIGN(sizeof(struct ip6t_HL_info))) { - printk(KERN_WARNING "ip6t_HL: targinfosize %u != %Zu\n", - targinfosize, - IP6T_ALIGN(sizeof(struct ip6t_HL_info))); - return 0; - } - - if (strcmp(tablename, "mangle")) { - printk(KERN_WARNING "ip6t_HL: can only be called from " - "\"mangle\" table, not \"%s\"\n", tablename); - return 0; - } - if (info->mode > IP6T_HL_MAXMODE) { printk(KERN_WARNING "ip6t_HL: invalid or unknown Mode %u\n", info->mode); return 0; } - if ((info->mode != IP6T_HL_SET) && (info->hop_limit == 0)) { printk(KERN_WARNING "ip6t_HL: increment/decrement doesn't " "make sense with value 0\n"); return 0; } - return 1; } static struct ip6t_target ip6t_HL = { .name = "HL", .target = ip6t_hl_target, + .targetsize = sizeof(struct ip6t_HL_info), + .table = "mangle", .checkentry = ip6t_hl_checkentry, .me = THIS_MODULE }; diff -puN net/ipv6/netfilter/ip6t_ipv6header.c~git-net net/ipv6/netfilter/ip6t_ipv6header.c --- devel/net/ipv6/netfilter/ip6t_ipv6header.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_ipv6header.c 2006-02-27 22:37:47.000000000 -0800 @@ -29,6 +29,7 @@ static int ipv6header_match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -125,17 +126,13 @@ ipv6header_match(const struct sk_buff *s static int ipv6header_checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { const struct ip6t_ipv6header_info *info = matchinfo; - /* Check for obvious errors */ - /* This match is valid in all hooks! */ - if (matchsize != IP6T_ALIGN(sizeof(struct ip6t_ipv6header_info))) - return 0; - /* invflags is 0 or 0xff in hard mode */ if ((!info->modeflag) && info->invflags != 0x00 && info->invflags != 0xFF) @@ -147,6 +144,7 @@ ipv6header_checkentry(const char *tablen static struct ip6t_match ip6t_ipv6header_match = { .name = "ipv6header", .match = &ipv6header_match, + .matchsize = sizeof(struct ip6t_ipv6header_info), .checkentry = &ipv6header_checkentry, .destroy = NULL, .me = THIS_MODULE, diff -puN net/ipv6/netfilter/ip6t_LOG.c~git-net net/ipv6/netfilter/ip6t_LOG.c --- devel/net/ipv6/netfilter/ip6t_LOG.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_LOG.c 2006-02-27 22:37:47.000000000 -0800 @@ -426,6 +426,7 @@ ip6t_log_target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -449,35 +450,29 @@ ip6t_log_target(struct sk_buff **pskb, static int ip6t_log_checkentry(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { const struct ip6t_log_info *loginfo = targinfo; - if (targinfosize != IP6T_ALIGN(sizeof(struct ip6t_log_info))) { - DEBUGP("LOG: targinfosize %u != %u\n", - targinfosize, IP6T_ALIGN(sizeof(struct ip6t_log_info))); - return 0; - } - if (loginfo->level >= 8) { DEBUGP("LOG: level %u >= 8\n", loginfo->level); return 0; } - if (loginfo->prefix[sizeof(loginfo->prefix)-1] != '\0') { DEBUGP("LOG: prefix term %i\n", loginfo->prefix[sizeof(loginfo->prefix)-1]); return 0; } - return 1; } static struct ip6t_target ip6t_log_reg = { .name = "LOG", .target = ip6t_log_target, + .targetsize = sizeof(struct ip6t_log_info), .checkentry = ip6t_log_checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_multiport.c~git-net net/ipv6/netfilter/ip6t_multiport.c --- devel/net/ipv6/netfilter/ip6t_multiport.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_multiport.c 2006-02-27 22:37:47.000000000 -0800 @@ -51,6 +51,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -85,6 +86,7 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *info, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) @@ -92,13 +94,9 @@ checkentry(const char *tablename, const struct ip6t_ip6 *ip = info; const struct ip6t_multiport *multiinfo = matchinfo; - if (matchsize != IP6T_ALIGN(sizeof(struct ip6t_multiport))) - return 0; - /* Must specify proto == TCP/UDP, no unknown flags or bad count */ return (ip->proto == IPPROTO_TCP || ip->proto == IPPROTO_UDP) && !(ip->invflags & IP6T_INV_PROTO) - && matchsize == IP6T_ALIGN(sizeof(struct ip6t_multiport)) && (multiinfo->flags == IP6T_MULTIPORT_SOURCE || multiinfo->flags == IP6T_MULTIPORT_DESTINATION || multiinfo->flags == IP6T_MULTIPORT_EITHER) @@ -107,8 +105,9 @@ checkentry(const char *tablename, static struct ip6t_match multiport_match = { .name = "multiport", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_multiport), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/ip6t_owner.c~git-net net/ipv6/netfilter/ip6t_owner.c --- devel/net/ipv6/netfilter/ip6t_owner.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_owner.c 2006-02-27 22:37:47.000000000 -0800 @@ -26,6 +26,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -54,34 +55,27 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { const struct ip6t_owner_info *info = matchinfo; - if (hook_mask - & ~((1 << NF_IP6_LOCAL_OUT) | (1 << NF_IP6_POST_ROUTING))) { - printk("ip6t_owner: only valid for LOCAL_OUT or POST_ROUTING.\n"); - return 0; - } - - if (matchsize != IP6T_ALIGN(sizeof(struct ip6t_owner_info))) - return 0; - if (info->match & (IP6T_OWNER_PID | IP6T_OWNER_SID)) { printk("ipt_owner: pid and sid matching " "not supported anymore\n"); return 0; } - return 1; } static struct ip6t_match owner_match = { .name = "owner", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_owner_info), + .hooks = (1 << NF_IP6_LOCAL_OUT) | (1 << NF_IP6_POST_ROUTING), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -L net/ipv6/netfilter/ip6t_policy.c -puN net/ipv6/netfilter/ip6t_policy.c~git-net /dev/null --- devel/net/ipv6/netfilter/ip6t_policy.c +++ /dev/null 2003-09-15 06:40:47.000000000 -0700 @@ -1,176 +0,0 @@ -/* IP tables module for matching IPsec policy - * - * Copyright (c) 2004,2005 Patrick McHardy, - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - */ - -#include -#include -#include -#include -#include -#include - -#include -#include -#include - -MODULE_AUTHOR("Patrick McHardy "); -MODULE_DESCRIPTION("IPtables IPsec policy matching module"); -MODULE_LICENSE("GPL"); - - -static inline int -match_xfrm_state(struct xfrm_state *x, const struct ip6t_policy_elem *e) -{ -#define MATCH_ADDR(x,y,z) (!e->match.x || \ - ((!ip6_masked_addrcmp(&e->x.a6, &e->y.a6, z)) \ - ^ e->invert.x)) -#define MATCH(x,y) (!e->match.x || ((e->x == (y)) ^ e->invert.x)) - - return MATCH_ADDR(saddr, smask, (struct in6_addr *)&x->props.saddr.a6) && - MATCH_ADDR(daddr, dmask, (struct in6_addr *)&x->id.daddr.a6) && - MATCH(proto, x->id.proto) && - MATCH(mode, x->props.mode) && - MATCH(spi, x->id.spi) && - MATCH(reqid, x->props.reqid); -} - -static int -match_policy_in(const struct sk_buff *skb, const struct ip6t_policy_info *info) -{ - const struct ip6t_policy_elem *e; - struct sec_path *sp = skb->sp; - int strict = info->flags & IP6T_POLICY_MATCH_STRICT; - int i, pos; - - if (sp == NULL) - return -1; - if (strict && info->len != sp->len) - return 0; - - for (i = sp->len - 1; i >= 0; i--) { - pos = strict ? i - sp->len + 1 : 0; - if (pos >= info->len) - return 0; - e = &info->pol[pos]; - - if (match_xfrm_state(sp->x[i].xvec, e)) { - if (!strict) - return 1; - } else if (strict) - return 0; - } - - return strict ? 1 : 0; -} - -static int -match_policy_out(const struct sk_buff *skb, const struct ip6t_policy_info *info) -{ - const struct ip6t_policy_elem *e; - struct dst_entry *dst = skb->dst; - int strict = info->flags & IP6T_POLICY_MATCH_STRICT; - int i, pos; - - if (dst->xfrm == NULL) - return -1; - - for (i = 0; dst && dst->xfrm; dst = dst->child, i++) { - pos = strict ? i : 0; - if (pos >= info->len) - return 0; - e = &info->pol[pos]; - - if (match_xfrm_state(dst->xfrm, e)) { - if (!strict) - return 1; - } else if (strict) - return 0; - } - - return strict ? i == info->len : 0; -} - -static int match(const struct sk_buff *skb, - const struct net_device *in, - const struct net_device *out, - const void *matchinfo, - int offset, - unsigned int protoff, - int *hotdrop) -{ - const struct ip6t_policy_info *info = matchinfo; - int ret; - - if (info->flags & IP6T_POLICY_MATCH_IN) - ret = match_policy_in(skb, info); - else - ret = match_policy_out(skb, info); - - if (ret < 0) - ret = info->flags & IP6T_POLICY_MATCH_NONE ? 1 : 0; - else if (info->flags & IP6T_POLICY_MATCH_NONE) - ret = 0; - - return ret; -} - -static int checkentry(const char *tablename, const void *ip_void, - void *matchinfo, unsigned int matchsize, - unsigned int hook_mask) -{ - struct ip6t_policy_info *info = matchinfo; - - if (matchsize != IP6T_ALIGN(sizeof(*info))) { - printk(KERN_ERR "ip6t_policy: matchsize %u != %zu\n", - matchsize, IP6T_ALIGN(sizeof(*info))); - return 0; - } - if (!(info->flags & (IP6T_POLICY_MATCH_IN|IP6T_POLICY_MATCH_OUT))) { - printk(KERN_ERR "ip6t_policy: neither incoming nor " - "outgoing policy selected\n"); - return 0; - } - if (hook_mask & (1 << NF_IP6_PRE_ROUTING | 1 << NF_IP6_LOCAL_IN) - && info->flags & IP6T_POLICY_MATCH_OUT) { - printk(KERN_ERR "ip6t_policy: output policy not valid in " - "PRE_ROUTING and INPUT\n"); - return 0; - } - if (hook_mask & (1 << NF_IP6_POST_ROUTING | 1 << NF_IP6_LOCAL_OUT) - && info->flags & IP6T_POLICY_MATCH_IN) { - printk(KERN_ERR "ip6t_policy: input policy not valid in " - "POST_ROUTING and OUTPUT\n"); - return 0; - } - if (info->len > IP6T_POLICY_MAX_ELEM) { - printk(KERN_ERR "ip6t_policy: too many policy elements\n"); - return 0; - } - - return 1; -} - -static struct ip6t_match policy_match = { - .name = "policy", - .match = match, - .checkentry = checkentry, - .me = THIS_MODULE, -}; - -static int __init init(void) -{ - return ip6t_register_match(&policy_match); -} - -static void __exit fini(void) -{ - ip6t_unregister_match(&policy_match); -} - -module_init(init); -module_exit(fini); diff -puN net/ipv6/netfilter/ip6t_REJECT.c~git-net net/ipv6/netfilter/ip6t_REJECT.c --- devel/net/ipv6/netfilter/ip6t_REJECT.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_REJECT.c 2006-02-27 22:37:47.000000000 -0800 @@ -179,6 +179,7 @@ static unsigned int reject6_target(struc const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -221,6 +222,7 @@ static unsigned int reject6_target(struc static int check(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) @@ -228,24 +230,6 @@ static int check(const char *tablename, const struct ip6t_reject_info *rejinfo = targinfo; const struct ip6t_entry *e = entry; - if (targinfosize != IP6T_ALIGN(sizeof(struct ip6t_reject_info))) { - DEBUGP("ip6t_REJECT: targinfosize %u != 0\n", targinfosize); - return 0; - } - - /* Only allow these for packet filtering. */ - if (strcmp(tablename, "filter") != 0) { - DEBUGP("ip6t_REJECT: bad table `%s'.\n", tablename); - return 0; - } - - if ((hook_mask & ~((1 << NF_IP6_LOCAL_IN) - | (1 << NF_IP6_FORWARD) - | (1 << NF_IP6_LOCAL_OUT))) != 0) { - DEBUGP("ip6t_REJECT: bad hook mask %X\n", hook_mask); - return 0; - } - if (rejinfo->with == IP6T_ICMP6_ECHOREPLY) { printk("ip6t_REJECT: ECHOREPLY is not supported.\n"); return 0; @@ -257,13 +241,16 @@ static int check(const char *tablename, return 0; } } - return 1; } static struct ip6t_target ip6t_reject_reg = { .name = "REJECT", .target = reject6_target, + .targetsize = sizeof(struct ip6t_reject_info), + .table = "filter", + .hooks = (1 << NF_IP6_LOCAL_IN) | (1 << NF_IP6_FORWARD) | + (1 << NF_IP6_LOCAL_OUT), .checkentry = check, .me = THIS_MODULE }; diff -puN net/ipv6/netfilter/ip6t_rt.c~git-net net/ipv6/netfilter/ip6t_rt.c --- devel/net/ipv6/netfilter/ip6t_rt.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/ip6t_rt.c 2006-02-27 22:37:47.000000000 -0800 @@ -45,6 +45,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -194,17 +195,13 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *entry, + const struct xt_match *match, void *matchinfo, unsigned int matchinfosize, unsigned int hook_mask) { const struct ip6t_rt *rtinfo = matchinfo; - if (matchinfosize != IP6T_ALIGN(sizeof(struct ip6t_rt))) { - DEBUGP("ip6t_rt: matchsize %u != %u\n", - matchinfosize, IP6T_ALIGN(sizeof(struct ip6t_rt))); - return 0; - } if (rtinfo->invflags & ~IP6T_RT_INV_MASK) { DEBUGP("ip6t_rt: unknown flags %X\n", rtinfo->invflags); return 0; @@ -222,8 +219,9 @@ checkentry(const char *tablename, static struct ip6t_match rt_match = { .name = "rt", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct ip6t_rt), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/ipv6/netfilter/Kconfig~git-net net/ipv6/netfilter/Kconfig --- devel/net/ipv6/netfilter/Kconfig~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/Kconfig 2006-02-27 22:37:47.000000000 -0800 @@ -133,16 +133,6 @@ config IP6_NF_MATCH_EUI64 To compile it as a module, choose M here. If unsure, say N. -config IP6_NF_MATCH_POLICY - tristate "IPsec policy match support" - depends on IP6_NF_IPTABLES && XFRM - help - Policy matching allows you to match packets based on the - IPsec policy that was used during decapsulation/will - be used during encapsulation. - - To compile it as a module, choose M here. If unsure, say N. - # The targets config IP6_NF_FILTER tristate "Packet filtering" diff -puN net/ipv6/netfilter/Makefile~git-net net/ipv6/netfilter/Makefile --- devel/net/ipv6/netfilter/Makefile~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/Makefile 2006-02-27 22:37:47.000000000 -0800 @@ -9,7 +9,6 @@ obj-$(CONFIG_IP6_NF_MATCH_OPTS) += ip6t_ obj-$(CONFIG_IP6_NF_MATCH_IPV6HEADER) += ip6t_ipv6header.o obj-$(CONFIG_IP6_NF_MATCH_FRAG) += ip6t_frag.o obj-$(CONFIG_IP6_NF_MATCH_AHESP) += ip6t_esp.o ip6t_ah.o -obj-$(CONFIG_IP6_NF_MATCH_POLICY) += ip6t_policy.o obj-$(CONFIG_IP6_NF_MATCH_EUI64) += ip6t_eui64.o obj-$(CONFIG_IP6_NF_MATCH_MULTIPORT) += ip6t_multiport.o obj-$(CONFIG_IP6_NF_MATCH_OWNER) += ip6t_owner.o diff -puN net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c~git-net net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c --- devel/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c 2006-02-27 22:37:47.000000000 -0800 @@ -179,31 +179,36 @@ static unsigned int ipv6_confirm(unsigne int (*okfn)(struct sk_buff *)) { struct nf_conn *ct; + struct nf_conn_help *help; enum ip_conntrack_info ctinfo; + unsigned int ret, protoff; + unsigned int extoff = (u8*)((*pskb)->nh.ipv6h + 1) + - (*pskb)->data; + unsigned char pnum = (*pskb)->nh.ipv6h->nexthdr; + /* This is where we call the helper: as the packet goes out. */ ct = nf_ct_get(*pskb, &ctinfo); - if (ct && ct->helper) { - unsigned int ret, protoff; - unsigned int extoff = (u8*)((*pskb)->nh.ipv6h + 1) - - (*pskb)->data; - unsigned char pnum = (*pskb)->nh.ipv6h->nexthdr; - - protoff = nf_ct_ipv6_skip_exthdr(*pskb, extoff, &pnum, - (*pskb)->len - extoff); - if (protoff < 0 || protoff > (*pskb)->len || - pnum == NEXTHDR_FRAGMENT) { - DEBUGP("proto header not found\n"); - return NF_ACCEPT; - } + if (!ct) + goto out; - ret = ct->helper->help(pskb, protoff, ct, ctinfo); - if (ret != NF_ACCEPT) - return ret; + help = nfct_help(ct); + if (!help || !help->helper) + goto out; + + protoff = nf_ct_ipv6_skip_exthdr(*pskb, extoff, &pnum, + (*pskb)->len - extoff); + if (protoff < 0 || protoff > (*pskb)->len || + pnum == NEXTHDR_FRAGMENT) { + DEBUGP("proto header not found\n"); + return NF_ACCEPT; } + ret = help->helper->help(pskb, protoff, ct, ctinfo); + if (ret != NF_ACCEPT) + return ret; +out: /* We've seen it coming out the other side: confirm it */ - return nf_conntrack_confirm(pskb); } diff -puN net/ipv6/netfilter/nf_conntrack_reasm.c~git-net net/ipv6/netfilter/nf_conntrack_reasm.c --- devel/net/ipv6/netfilter/nf_conntrack_reasm.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/netfilter/nf_conntrack_reasm.c 2006-02-27 22:37:47.000000000 -0800 @@ -313,8 +313,8 @@ static struct nf_ct_frag6_queue *nf_ct_f #ifdef CONFIG_SMP hlist_for_each_entry(fq, n, &nf_ct_frag6_hash[hash], list) { if (fq->id == fq_in->id && - !ipv6_addr_cmp(&fq_in->saddr, &fq->saddr) && - !ipv6_addr_cmp(&fq_in->daddr, &fq->daddr)) { + ipv6_addr_equal(&fq_in->saddr, &fq->saddr) && + ipv6_addr_equal(&fq_in->daddr, &fq->daddr)) { atomic_inc(&fq->refcnt); write_unlock(&nf_ct_frag6_lock); fq_in->last_in |= COMPLETE; @@ -376,8 +376,8 @@ fq_find(u32 id, struct in6_addr *src, st read_lock(&nf_ct_frag6_lock); hlist_for_each_entry(fq, n, &nf_ct_frag6_hash[hash], list) { if (fq->id == id && - !ipv6_addr_cmp(src, &fq->saddr) && - !ipv6_addr_cmp(dst, &fq->daddr)) { + ipv6_addr_equal(src, &fq->saddr) && + ipv6_addr_equal(dst, &fq->daddr)) { atomic_inc(&fq->refcnt); read_unlock(&nf_ct_frag6_lock); return fq; diff -puN net/ipv6/route.c~git-net net/ipv6/route.c --- devel/net/ipv6/route.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/route.c 2006-02-27 22:37:47.000000000 -0800 @@ -72,6 +72,10 @@ #define RT6_TRACE(x...) do { ; } while (0) #endif +#define CLONE_OFFLINK_ROUTE 0 + +#define RT6_SELECT_F_IFACE 0x1 +#define RT6_SELECT_F_REACHABLE 0x2 static int ip6_rt_max_size = 4096; static int ip6_rt_gc_min_interval = HZ / 2; @@ -94,6 +98,14 @@ static int ip6_pkt_discard_out(struct s static void ip6_link_failure(struct sk_buff *skb); static void ip6_rt_update_pmtu(struct dst_entry *dst, u32 mtu); +#ifdef CONFIG_IPV6_ROUTE_INFO +static struct rt6_info *rt6_add_route_info(struct in6_addr *prefix, int prefixlen, + struct in6_addr *gwaddr, int ifindex, + unsigned pref); +static struct rt6_info *rt6_get_route_info(struct in6_addr *prefix, int prefixlen, + struct in6_addr *gwaddr, int ifindex); +#endif + static struct dst_ops ip6_dst_ops = { .family = AF_INET6, .protocol = __constant_htons(ETH_P_IPV6), @@ -214,150 +226,211 @@ static __inline__ struct rt6_info *rt6_d return rt; } +#ifdef CONFIG_IPV6_ROUTER_PREF +static void rt6_probe(struct rt6_info *rt) +{ + struct neighbour *neigh = rt ? rt->rt6i_nexthop : NULL; + /* + * Okay, this does not seem to be appropriate + * for now, however, we need to check if it + * is really so; aka Router Reachability Probing. + * + * Router Reachability Probe MUST be rate-limited + * to no more than one per minute. + */ + if (!neigh || (neigh->nud_state & NUD_VALID)) + return; + read_lock_bh(&neigh->lock); + if (!(neigh->nud_state & NUD_VALID) && + time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) { + struct in6_addr mcaddr; + struct in6_addr *target; + + neigh->updated = jiffies; + read_unlock_bh(&neigh->lock); + + target = (struct in6_addr *)&neigh->primary_key; + addrconf_addr_solict_mult(target, &mcaddr); + ndisc_send_ns(rt->rt6i_dev, NULL, target, &mcaddr, NULL); + } else + read_unlock_bh(&neigh->lock); +} +#else +static inline void rt6_probe(struct rt6_info *rt) +{ + return; +} +#endif + /* - * pointer to the last default router chosen. BH is disabled locally. + * Default Router Selection (RFC 2461 6.3.6) */ -static struct rt6_info *rt6_dflt_pointer; -static DEFINE_SPINLOCK(rt6_dflt_lock); +static int inline rt6_check_dev(struct rt6_info *rt, int oif) +{ + struct net_device *dev = rt->rt6i_dev; + if (!oif || dev->ifindex == oif) + return 2; + if ((dev->flags & IFF_LOOPBACK) && + rt->rt6i_idev && rt->rt6i_idev->dev->ifindex == oif) + return 1; + return 0; +} -void rt6_reset_dflt_pointer(struct rt6_info *rt) +static int inline rt6_check_neigh(struct rt6_info *rt) { - spin_lock_bh(&rt6_dflt_lock); - if (rt == NULL || rt == rt6_dflt_pointer) { - RT6_TRACE("reset default router: %p->NULL\n", rt6_dflt_pointer); - rt6_dflt_pointer = NULL; + struct neighbour *neigh = rt->rt6i_nexthop; + int m = 0; + if (neigh) { + read_lock_bh(&neigh->lock); + if (neigh->nud_state & NUD_VALID) + m = 1; + read_unlock_bh(&neigh->lock); } - spin_unlock_bh(&rt6_dflt_lock); + return m; } -/* Default Router Selection (RFC 2461 6.3.6) */ -static struct rt6_info *rt6_best_dflt(struct rt6_info *rt, int oif) +static int rt6_score_route(struct rt6_info *rt, int oif, + int strict) { - struct rt6_info *match = NULL; - struct rt6_info *sprt; - int mpri = 0; - - for (sprt = rt; sprt; sprt = sprt->u.next) { - struct neighbour *neigh; - int m = 0; - - if (!oif || - (sprt->rt6i_dev && - sprt->rt6i_dev->ifindex == oif)) - m += 8; + int m = rt6_check_dev(rt, oif); + if (!m && (strict & RT6_SELECT_F_IFACE)) + return -1; +#ifdef CONFIG_IPV6_ROUTER_PREF + m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(rt->rt6i_flags)) << 2; +#endif + if (rt6_check_neigh(rt)) + m |= 16; + else if (strict & RT6_SELECT_F_REACHABLE) + return -1; + return m; +} - if (rt6_check_expired(sprt)) - continue; +static struct rt6_info *rt6_select(struct rt6_info **head, int oif, + int strict) +{ + struct rt6_info *match = NULL, *last = NULL; + struct rt6_info *rt, *rt0 = *head; + u32 metric; + int mpri = -1; - if (sprt == rt6_dflt_pointer) - m += 4; + RT6_TRACE("%s(head=%p(*head=%p), oif=%d)\n", + __FUNCTION__, head, head ? *head : NULL, oif); - if ((neigh = sprt->rt6i_nexthop) != NULL) { - read_lock_bh(&neigh->lock); - switch (neigh->nud_state) { - case NUD_REACHABLE: - m += 3; - break; + for (rt = rt0, metric = rt0->rt6i_metric; + rt && rt->rt6i_metric == metric; + rt = rt->u.next) { + int m; - case NUD_STALE: - case NUD_DELAY: - case NUD_PROBE: - m += 2; - break; + if (rt6_check_expired(rt)) + continue; - case NUD_NOARP: - case NUD_PERMANENT: - m += 1; - break; + last = rt; - case NUD_INCOMPLETE: - default: - read_unlock_bh(&neigh->lock); - continue; - } - read_unlock_bh(&neigh->lock); - } else { + m = rt6_score_route(rt, oif, strict); + if (m < 0) continue; - } - if (m > mpri || m >= 12) { - match = sprt; + if (m > mpri) { + rt6_probe(match); + match = rt; mpri = m; - if (m >= 12) { - /* we choose the last default router if it - * is in (probably) reachable state. - * If route changed, we should do pmtu - * discovery. --yoshfuji - */ - break; - } + } else { + rt6_probe(rt); } } - spin_lock(&rt6_dflt_lock); - if (!match) { - /* - * No default routers are known to be reachable. - * SHOULD round robin - */ - if (rt6_dflt_pointer) { - for (sprt = rt6_dflt_pointer->u.next; - sprt; sprt = sprt->u.next) { - if (sprt->u.dst.obsolete <= 0 && - sprt->u.dst.error == 0 && - !rt6_check_expired(sprt)) { - match = sprt; - break; - } - } - for (sprt = rt; - !match && sprt; - sprt = sprt->u.next) { - if (sprt->u.dst.obsolete <= 0 && - sprt->u.dst.error == 0 && - !rt6_check_expired(sprt)) { - match = sprt; - break; - } - if (sprt == rt6_dflt_pointer) - break; - } - } + if (!match && + (strict & RT6_SELECT_F_REACHABLE) && + last && last != rt0) { + /* no entries matched; do round-robin */ + *head = rt0->u.next; + rt0->u.next = last->u.next; + last->u.next = rt0; } - if (match) { - if (rt6_dflt_pointer != match) - RT6_TRACE("changed default router: %p->%p\n", - rt6_dflt_pointer, match); - rt6_dflt_pointer = match; + RT6_TRACE("%s() => %p, score=%d\n", + __FUNCTION__, match, mpri); + + return (match ? match : &ip6_null_entry); +} + +#ifdef CONFIG_IPV6_ROUTE_INFO +int rt6_route_rcv(struct net_device *dev, u8 *opt, int len, + struct in6_addr *gwaddr) +{ + struct route_info *rinfo = (struct route_info *) opt; + struct in6_addr prefix_buf, *prefix; + unsigned int pref; + u32 lifetime; + struct rt6_info *rt; + + if (len < sizeof(struct route_info)) { + return -EINVAL; } - spin_unlock(&rt6_dflt_lock); - if (!match) { - /* - * Last Resort: if no default routers found, - * use addrconf default route. - * We don't record this route. - */ - for (sprt = ip6_routing_table.leaf; - sprt; sprt = sprt->u.next) { - if (!rt6_check_expired(sprt) && - (sprt->rt6i_flags & RTF_DEFAULT) && - (!oif || - (sprt->rt6i_dev && - sprt->rt6i_dev->ifindex == oif))) { - match = sprt; - break; - } + /* Sanity check for prefix_len and length */ + if (rinfo->length > 3) { + return -EINVAL; + } else if (rinfo->prefix_len > 128) { + return -EINVAL; + } else if (rinfo->prefix_len > 64) { + if (rinfo->length < 2) { + return -EINVAL; } - if (!match) { - /* no default route. give up. */ - match = &ip6_null_entry; + } else if (rinfo->prefix_len > 0) { + if (rinfo->length < 1) { + return -EINVAL; } } - return match; + pref = rinfo->route_pref; + if (pref == ICMPV6_ROUTER_PREF_INVALID) + pref = ICMPV6_ROUTER_PREF_MEDIUM; + + lifetime = htonl(rinfo->lifetime); + if (lifetime == 0xffffffff) { + /* infinity */ + } else if (lifetime > 0x7fffffff/HZ) { + /* Avoid arithmetic overflow */ + lifetime = 0x7fffffff/HZ - 1; + } + + if (rinfo->length == 3) + prefix = (struct in6_addr *)rinfo->prefix; + else { + /* this function is safe */ + ipv6_addr_prefix(&prefix_buf, + (struct in6_addr *)rinfo->prefix, + rinfo->prefix_len); + prefix = &prefix_buf; + } + + rt = rt6_get_route_info(prefix, rinfo->prefix_len, gwaddr, dev->ifindex); + + if (rt && !lifetime) { + ip6_del_rt(rt, NULL, NULL, NULL); + rt = NULL; + } + + if (!rt && lifetime) + rt = rt6_add_route_info(prefix, rinfo->prefix_len, gwaddr, dev->ifindex, + pref); + else if (rt) + rt->rt6i_flags = RTF_ROUTEINFO | + (rt->rt6i_flags & ~RTF_PREF_MASK) | RTF_PREF(pref); + + if (rt) { + if (lifetime == 0xffffffff) { + rt->rt6i_flags &= ~RTF_EXPIRES; + } else { + rt->rt6i_expires = jiffies + HZ * lifetime; + rt->rt6i_flags |= RTF_EXPIRES; + } + dst_release(&rt->u.dst); + } + return 0; } +#endif struct rt6_info *rt6_lookup(struct in6_addr *daddr, struct in6_addr *saddr, int oif, int strict) @@ -397,14 +470,9 @@ int ip6_ins_rt(struct rt6_info *rt, stru return err; } -/* No rt6_lock! If COW failed, the function returns dead route entry - with dst->error set to errno value. - */ - -static struct rt6_info *rt6_cow(struct rt6_info *ort, struct in6_addr *daddr, - struct in6_addr *saddr, struct netlink_skb_parms *req) +static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort, struct in6_addr *daddr, + struct in6_addr *saddr) { - int err; struct rt6_info *rt; /* @@ -435,25 +503,30 @@ static struct rt6_info *rt6_cow(struct r rt->rt6i_nexthop = ndisc_get_neigh(rt->rt6i_dev, &rt->rt6i_gateway); - dst_hold(&rt->u.dst); - - err = ip6_ins_rt(rt, NULL, NULL, req); - if (err == 0) - return rt; + } - rt->u.dst.error = err; + return rt; +} - return rt; +static struct rt6_info *rt6_alloc_clone(struct rt6_info *ort, struct in6_addr *daddr) +{ + struct rt6_info *rt = ip6_rt_copy(ort); + if (rt) { + ipv6_addr_copy(&rt->rt6i_dst.addr, daddr); + rt->rt6i_dst.plen = 128; + rt->rt6i_flags |= RTF_CACHE; + if (rt->rt6i_flags & RTF_REJECT) + rt->u.dst.error = ort->u.dst.error; + rt->u.dst.flags |= DST_HOST; + rt->rt6i_nexthop = neigh_clone(ort->rt6i_nexthop); } - dst_hold(&ip6_null_entry.u.dst); - return &ip6_null_entry; + return rt; } #define BACKTRACK() \ -if (rt == &ip6_null_entry && strict) { \ +if (rt == &ip6_null_entry) { \ while ((fn = fn->parent) != NULL) { \ if (fn->fn_flags & RTN_ROOT) { \ - dst_hold(&rt->u.dst); \ goto out; \ } \ if (fn->fn_flags & RTN_RTINFO) \ @@ -465,115 +538,138 @@ if (rt == &ip6_null_entry && strict) { \ void ip6_route_input(struct sk_buff *skb) { struct fib6_node *fn; - struct rt6_info *rt; + struct rt6_info *rt, *nrt; int strict; int attempts = 3; + int err; + int reachable = RT6_SELECT_F_REACHABLE; - strict = ipv6_addr_type(&skb->nh.ipv6h->daddr) & (IPV6_ADDR_MULTICAST|IPV6_ADDR_LINKLOCAL); + strict = ipv6_addr_type(&skb->nh.ipv6h->daddr) & (IPV6_ADDR_MULTICAST|IPV6_ADDR_LINKLOCAL) ? RT6_SELECT_F_IFACE : 0; relookup: read_lock_bh(&rt6_lock); +restart_2: fn = fib6_lookup(&ip6_routing_table, &skb->nh.ipv6h->daddr, &skb->nh.ipv6h->saddr); restart: - rt = fn->leaf; - - if ((rt->rt6i_flags & RTF_CACHE)) { - rt = rt6_device_match(rt, skb->dev->ifindex, strict); - BACKTRACK(); - dst_hold(&rt->u.dst); - goto out; - } - - rt = rt6_device_match(rt, skb->dev->ifindex, strict); + rt = rt6_select(&fn->leaf, skb->dev->ifindex, strict | reachable); BACKTRACK(); + if (rt == &ip6_null_entry || + rt->rt6i_flags & RTF_CACHE) + goto out; - if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP)) { - struct rt6_info *nrt; - dst_hold(&rt->u.dst); - read_unlock_bh(&rt6_lock); + dst_hold(&rt->u.dst); + read_unlock_bh(&rt6_lock); - nrt = rt6_cow(rt, &skb->nh.ipv6h->daddr, - &skb->nh.ipv6h->saddr, - &NETLINK_CB(skb)); + if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP)) + nrt = rt6_alloc_cow(rt, &skb->nh.ipv6h->daddr, &skb->nh.ipv6h->saddr); + else { +#if CLONE_OFFLINK_ROUTE + nrt = rt6_alloc_clone(rt, &skb->nh.ipv6h->daddr); +#else + goto out2; +#endif + } - dst_release(&rt->u.dst); - rt = nrt; + dst_release(&rt->u.dst); + rt = nrt ? : &ip6_null_entry; - if (rt->u.dst.error != -EEXIST || --attempts <= 0) + dst_hold(&rt->u.dst); + if (nrt) { + err = ip6_ins_rt(nrt, NULL, NULL, &NETLINK_CB(skb)); + if (!err) goto out2; - - /* Race condition! In the gap, when rt6_lock was - released someone could insert this route. Relookup. - */ - dst_release(&rt->u.dst); - goto relookup; } - dst_hold(&rt->u.dst); + + if (--attempts <= 0) + goto out2; + + /* + * Race condition! In the gap, when rt6_lock was + * released someone could insert this route. Relookup. + */ + dst_release(&rt->u.dst); + goto relookup; out: + if (reachable) { + reachable = 0; + goto restart_2; + } + dst_hold(&rt->u.dst); read_unlock_bh(&rt6_lock); out2: rt->u.dst.lastuse = jiffies; rt->u.dst.__use++; skb->dst = (struct dst_entry *) rt; + return; } struct dst_entry * ip6_route_output(struct sock *sk, struct flowi *fl) { struct fib6_node *fn; - struct rt6_info *rt; + struct rt6_info *rt, *nrt; int strict; int attempts = 3; + int err; + int reachable = RT6_SELECT_F_REACHABLE; - strict = ipv6_addr_type(&fl->fl6_dst) & (IPV6_ADDR_MULTICAST|IPV6_ADDR_LINKLOCAL); + strict = ipv6_addr_type(&fl->fl6_dst) & (IPV6_ADDR_MULTICAST|IPV6_ADDR_LINKLOCAL) ? RT6_SELECT_F_IFACE : 0; relookup: read_lock_bh(&rt6_lock); +restart_2: fn = fib6_lookup(&ip6_routing_table, &fl->fl6_dst, &fl->fl6_src); restart: - rt = fn->leaf; - - if ((rt->rt6i_flags & RTF_CACHE)) { - rt = rt6_device_match(rt, fl->oif, strict); - BACKTRACK(); - dst_hold(&rt->u.dst); + rt = rt6_select(&fn->leaf, fl->oif, strict | reachable); + BACKTRACK(); + if (rt == &ip6_null_entry || + rt->rt6i_flags & RTF_CACHE) goto out; - } - if (rt->rt6i_flags & RTF_DEFAULT) { - if (rt->rt6i_metric >= IP6_RT_PRIO_ADDRCONF) - rt = rt6_best_dflt(rt, fl->oif); - } else { - rt = rt6_device_match(rt, fl->oif, strict); - BACKTRACK(); - } - if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP)) { - struct rt6_info *nrt; - dst_hold(&rt->u.dst); - read_unlock_bh(&rt6_lock); + dst_hold(&rt->u.dst); + read_unlock_bh(&rt6_lock); - nrt = rt6_cow(rt, &fl->fl6_dst, &fl->fl6_src, NULL); + if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP)) + nrt = rt6_alloc_cow(rt, &fl->fl6_dst, &fl->fl6_src); + else { +#if CLONE_OFFLINK_ROUTE + nrt = rt6_alloc_clone(rt, &fl->fl6_dst); +#else + goto out2; +#endif + } - dst_release(&rt->u.dst); - rt = nrt; + dst_release(&rt->u.dst); + rt = nrt ? : &ip6_null_entry; - if (rt->u.dst.error != -EEXIST || --attempts <= 0) + dst_hold(&rt->u.dst); + if (nrt) { + err = ip6_ins_rt(nrt, NULL, NULL, NULL); + if (!err) goto out2; - - /* Race condition! In the gap, when rt6_lock was - released someone could insert this route. Relookup. - */ - dst_release(&rt->u.dst); - goto relookup; } - dst_hold(&rt->u.dst); + + if (--attempts <= 0) + goto out2; + + /* + * Race condition! In the gap, when rt6_lock was + * released someone could insert this route. Relookup. + */ + dst_release(&rt->u.dst); + goto relookup; out: + if (reachable) { + reachable = 0; + goto restart_2; + } + dst_hold(&rt->u.dst); read_unlock_bh(&rt6_lock); out2: rt->u.dst.lastuse = jiffies; @@ -999,8 +1095,6 @@ int ip6_del_rt(struct rt6_info *rt, stru write_lock_bh(&rt6_lock); - rt6_reset_dflt_pointer(NULL); - err = fib6_del(rt, nlh, _rtattr, req); dst_release(&rt->u.dst); @@ -1050,59 +1144,63 @@ static int ip6_route_del(struct in6_rtms void rt6_redirect(struct in6_addr *dest, struct in6_addr *saddr, struct neighbour *neigh, u8 *lladdr, int on_link) { - struct rt6_info *rt, *nrt; - - /* Locate old route to this destination. */ - rt = rt6_lookup(dest, NULL, neigh->dev->ifindex, 1); - - if (rt == NULL) - return; - - if (neigh->dev != rt->rt6i_dev) - goto out; + struct rt6_info *rt, *nrt = NULL; + int strict; + struct fib6_node *fn; /* - * Current route is on-link; redirect is always invalid. - * - * Seems, previous statement is not true. It could - * be node, which looks for us as on-link (f.e. proxy ndisc) - * But then router serving it might decide, that we should - * know truth 8)8) --ANK (980726). + * Get the "current" route for this destination and + * check if the redirect has come from approriate router. + * + * RFC 2461 specifies that redirects should only be + * accepted if they come from the nexthop to the target. + * Due to the way the routes are chosen, this notion + * is a bit fuzzy and one might need to check all possible + * routes. */ - if (!(rt->rt6i_flags&RTF_GATEWAY)) - goto out; + strict = ipv6_addr_type(dest) & (IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL); - /* - * RFC 2461 specifies that redirects should only be - * accepted if they come from the nexthop to the target. - * Due to the way default routers are chosen, this notion - * is a bit fuzzy and one might need to check all default - * routers. - */ - if (!ipv6_addr_equal(saddr, &rt->rt6i_gateway)) { - if (rt->rt6i_flags & RTF_DEFAULT) { - struct rt6_info *rt1; - - read_lock(&rt6_lock); - for (rt1 = ip6_routing_table.leaf; rt1; rt1 = rt1->u.next) { - if (ipv6_addr_equal(saddr, &rt1->rt6i_gateway)) { - dst_hold(&rt1->u.dst); - dst_release(&rt->u.dst); - read_unlock(&rt6_lock); - rt = rt1; - goto source_ok; - } - } - read_unlock(&rt6_lock); + read_lock_bh(&rt6_lock); + fn = fib6_lookup(&ip6_routing_table, dest, NULL); +restart: + for (rt = fn->leaf; rt; rt = rt->u.next) { + /* + * Current route is on-link; redirect is always invalid. + * + * Seems, previous statement is not true. It could + * be node, which looks for us as on-link (f.e. proxy ndisc) + * But then router serving it might decide, that we should + * know truth 8)8) --ANK (980726). + */ + if (rt6_check_expired(rt)) + continue; + if (!(rt->rt6i_flags & RTF_GATEWAY)) + continue; + if (neigh->dev != rt->rt6i_dev) + continue; + if (!ipv6_addr_equal(saddr, &rt->rt6i_gateway)) + continue; + break; + } + if (rt) + dst_hold(&rt->u.dst); + else if (strict) { + while ((fn = fn->parent) != NULL) { + if (fn->fn_flags & RTN_ROOT) + break; + if (fn->fn_flags & RTN_RTINFO) + goto restart; } + } + read_unlock_bh(&rt6_lock); + + if (!rt) { if (net_ratelimit()) printk(KERN_DEBUG "rt6_redirect: source isn't a valid nexthop " "for redirect target\n"); - goto out; + return; } -source_ok: - /* * We have finally decided to accept it. */ @@ -1210,38 +1308,27 @@ void rt6_pmtu_discovery(struct in6_addr 1. It is connected route. Action: COW 2. It is gatewayed route or NONEXTHOP route. Action: clone it. */ - if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP)) { - nrt = rt6_cow(rt, daddr, saddr, NULL); - if (!nrt->u.dst.error) { - nrt->u.dst.metrics[RTAX_MTU-1] = pmtu; - if (allfrag) - nrt->u.dst.metrics[RTAX_FEATURES-1] |= RTAX_FEATURE_ALLFRAG; - /* According to RFC 1981, detecting PMTU increase shouldn't be - happened within 5 mins, the recommended timer is 10 mins. - Here this route expiration time is set to ip6_rt_mtu_expires - which is 10 mins. After 10 mins the decreased pmtu is expired - and detecting PMTU increase will be automatically happened. - */ - dst_set_expires(&nrt->u.dst, ip6_rt_mtu_expires); - nrt->rt6i_flags |= RTF_DYNAMIC|RTF_EXPIRES; - } - dst_release(&nrt->u.dst); - } else { - nrt = ip6_rt_copy(rt); - if (nrt == NULL) - goto out; - ipv6_addr_copy(&nrt->rt6i_dst.addr, daddr); - nrt->rt6i_dst.plen = 128; - nrt->u.dst.flags |= DST_HOST; - nrt->rt6i_nexthop = neigh_clone(rt->rt6i_nexthop); - dst_set_expires(&nrt->u.dst, ip6_rt_mtu_expires); - nrt->rt6i_flags |= RTF_DYNAMIC|RTF_CACHE|RTF_EXPIRES; + if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP)) + nrt = rt6_alloc_cow(rt, daddr, saddr); + else + nrt = rt6_alloc_clone(rt, daddr); + + if (nrt) { nrt->u.dst.metrics[RTAX_MTU-1] = pmtu; if (allfrag) nrt->u.dst.metrics[RTAX_FEATURES-1] |= RTAX_FEATURE_ALLFRAG; + + /* According to RFC 1981, detecting PMTU increase shouldn't be + * happened within 5 mins, the recommended timer is 10 mins. + * Here this route expiration time is set to ip6_rt_mtu_expires + * which is 10 mins. After 10 mins the decreased pmtu is expired + * and detecting PMTU increase will be automatically happened. + */ + dst_set_expires(&nrt->u.dst, ip6_rt_mtu_expires); + nrt->rt6i_flags |= RTF_DYNAMIC|RTF_EXPIRES; + ip6_ins_rt(nrt, NULL, NULL, NULL); } - out: dst_release(&rt->u.dst); } @@ -1280,6 +1367,57 @@ static struct rt6_info * ip6_rt_copy(str return rt; } +#ifdef CONFIG_IPV6_ROUTE_INFO +static struct rt6_info *rt6_get_route_info(struct in6_addr *prefix, int prefixlen, + struct in6_addr *gwaddr, int ifindex) +{ + struct fib6_node *fn; + struct rt6_info *rt = NULL; + + write_lock_bh(&rt6_lock); + fn = fib6_locate(&ip6_routing_table, prefix ,prefixlen, NULL, 0); + if (!fn) + goto out; + + for (rt = fn->leaf; rt; rt = rt->u.next) { + if (rt->rt6i_dev->ifindex != ifindex) + continue; + if ((rt->rt6i_flags & (RTF_ROUTEINFO|RTF_GATEWAY)) != (RTF_ROUTEINFO|RTF_GATEWAY)) + continue; + if (!ipv6_addr_equal(&rt->rt6i_gateway, gwaddr)) + continue; + dst_hold(&rt->u.dst); + break; + } +out: + write_unlock_bh(&rt6_lock); + return rt; +} + +static struct rt6_info *rt6_add_route_info(struct in6_addr *prefix, int prefixlen, + struct in6_addr *gwaddr, int ifindex, + unsigned pref) +{ + struct in6_rtmsg rtmsg; + + memset(&rtmsg, 0, sizeof(rtmsg)); + rtmsg.rtmsg_type = RTMSG_NEWROUTE; + ipv6_addr_copy(&rtmsg.rtmsg_dst, prefix); + rtmsg.rtmsg_dst_len = prefixlen; + ipv6_addr_copy(&rtmsg.rtmsg_gateway, gwaddr); + rtmsg.rtmsg_metric = 1024; + rtmsg.rtmsg_flags = RTF_GATEWAY | RTF_ADDRCONF | RTF_ROUTEINFO | RTF_UP | RTF_PREF(pref); + /* We should treat it as a default route if prefix length is 0. */ + if (!prefixlen) + rtmsg.rtmsg_flags |= RTF_DEFAULT; + rtmsg.rtmsg_ifindex = ifindex; + + ip6_route_add(&rtmsg, NULL, NULL, NULL); + + return rt6_get_route_info(prefix, prefixlen, gwaddr, ifindex); +} +#endif + struct rt6_info *rt6_get_dflt_router(struct in6_addr *addr, struct net_device *dev) { struct rt6_info *rt; @@ -1290,6 +1428,7 @@ struct rt6_info *rt6_get_dflt_router(str write_lock_bh(&rt6_lock); for (rt = fn->leaf; rt; rt=rt->u.next) { if (dev == rt->rt6i_dev && + ((rt->rt6i_flags & (RTF_ADDRCONF | RTF_DEFAULT)) == (RTF_ADDRCONF | RTF_DEFAULT)) && ipv6_addr_equal(&rt->rt6i_gateway, addr)) break; } @@ -1300,7 +1439,8 @@ struct rt6_info *rt6_get_dflt_router(str } struct rt6_info *rt6_add_dflt_router(struct in6_addr *gwaddr, - struct net_device *dev) + struct net_device *dev, + unsigned int pref) { struct in6_rtmsg rtmsg; @@ -1308,7 +1448,8 @@ struct rt6_info *rt6_add_dflt_router(str rtmsg.rtmsg_type = RTMSG_NEWROUTE; ipv6_addr_copy(&rtmsg.rtmsg_gateway, gwaddr); rtmsg.rtmsg_metric = 1024; - rtmsg.rtmsg_flags = RTF_GATEWAY | RTF_ADDRCONF | RTF_DEFAULT | RTF_UP | RTF_EXPIRES; + rtmsg.rtmsg_flags = RTF_GATEWAY | RTF_ADDRCONF | RTF_DEFAULT | RTF_UP | RTF_EXPIRES | + RTF_PREF(pref); rtmsg.rtmsg_ifindex = dev->ifindex; @@ -1326,8 +1467,6 @@ restart: if (rt->rt6i_flags & (RTF_DEFAULT | RTF_ADDRCONF)) { dst_hold(&rt->u.dst); - rt6_reset_dflt_pointer(NULL); - read_unlock_bh(&rt6_lock); ip6_del_rt(rt, NULL, NULL, NULL); diff -puN net/ipv6/tcp_ipv6.c~git-net net/ipv6/tcp_ipv6.c --- devel/net/ipv6/tcp_ipv6.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/ipv6/tcp_ipv6.c 2006-02-27 22:37:47.000000000 -0800 @@ -987,6 +987,7 @@ static struct sock * tcp_v6_syn_recv_soc inet_csk(newsk)->icsk_ext_hdr_len = (newnp->opt->opt_nflen + newnp->opt->opt_flen); + tcp_mtup_init(newsk); tcp_sync_mss(newsk, dst_mtu(dst)); newtp->advmss = dst_metric(dst, RTAX_ADVMSS); tcp_initialize_rcv_mss(newsk); @@ -1604,21 +1605,12 @@ static struct inet_protosw tcpv6_protosw void __init tcpv6_init(void) { - int err; - /* register inet6 protocol */ if (inet6_add_protocol(&tcpv6_protocol, IPPROTO_TCP) < 0) printk(KERN_ERR "tcpv6_init: Could not register protocol\n"); inet6_register_protosw(&tcpv6_protosw); - err = sock_create_kern(PF_INET6, SOCK_RAW, IPPROTO_TCP, &tcp6_socket); - if (err < 0) + if (inet_csk_ctl_sock_create(&tcp6_socket, PF_INET6, SOCK_RAW, + IPPROTO_TCP) < 0) panic("Failed to create the TCPv6 control socket.\n"); - tcp6_socket->sk->sk_allocation = GFP_ATOMIC; - - /* Unhash it so that IP input processing does not even - * see it, we do not wish this socket to see incoming - * packets. - */ - tcp6_socket->sk->sk_prot->unhash(tcp6_socket->sk); } diff -puN net/key/af_key.c~git-net net/key/af_key.c --- devel/net/key/af_key.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/key/af_key.c 2006-02-27 22:37:47.000000000 -0800 @@ -2651,6 +2651,8 @@ static int pfkey_send_notify(struct xfrm return key_notify_sa(x, c); case XFRM_MSG_FLUSHSA: return key_notify_sa_flush(c); + case XFRM_MSG_NEWAE: /* not yet supported */ + break; default: printk("pfkey: Unknown SA event %d\n", c->event); break; diff -puN net/llc/af_llc.c~git-net net/llc/af_llc.c --- devel/net/llc/af_llc.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/llc/af_llc.c 2006-02-27 22:37:47.000000000 -0800 @@ -54,7 +54,7 @@ static int llc_ui_wait_for_busy_core(str * * Return the next unused link number for a given sap. */ -static __inline__ u16 llc_ui_next_link_no(int sap) +static inline u16 llc_ui_next_link_no(int sap) { return llc_ui_sap_link_no_max[sap]++; } @@ -65,7 +65,7 @@ static __inline__ u16 llc_ui_next_link_n * * Given an ARP header type return the corresponding ethernet protocol. */ -static __inline__ u16 llc_proto_type(u16 arphrd) +static inline u16 llc_proto_type(u16 arphrd) { return arphrd == ARPHRD_IEEE802_TR ? htons(ETH_P_TR_802_2) : htons(ETH_P_802_2); @@ -75,7 +75,7 @@ static __inline__ u16 llc_proto_type(u16 * llc_ui_addr_null - determines if a address structure is null * @addr: Address to test if null. */ -static __inline__ u8 llc_ui_addr_null(struct sockaddr_llc *addr) +static inline u8 llc_ui_addr_null(struct sockaddr_llc *addr) { return !memcmp(addr, &llc_ui_addrnull, sizeof(*addr)); } @@ -89,8 +89,7 @@ static __inline__ u8 llc_ui_addr_null(st * operation the user would like to perform and the type of socket. * Returns the correct llc header length. */ -static __inline__ u8 llc_ui_header_len(struct sock *sk, - struct sockaddr_llc *addr) +static inline u8 llc_ui_header_len(struct sock *sk, struct sockaddr_llc *addr) { u8 rc = LLC_PDU_LEN_U; @@ -138,7 +137,7 @@ static void llc_ui_sk_init(struct socket } static struct proto llc_proto = { - .name = "DDP", + .name = "LLC", .owner = THIS_MODULE, .obj_size = sizeof(struct llc_sock), }; @@ -188,8 +187,10 @@ static int llc_ui_release(struct socket llc->laddr.lsap, llc->daddr.lsap); if (!llc_send_disc(sk)) llc_ui_wait_for_disc(sk, sk->sk_rcvtimeo); - if (!sock_flag(sk, SOCK_ZAPPED)) + if (!sock_flag(sk, SOCK_ZAPPED)) { + llc_sap_put(llc->sap); llc_sap_remove_socket(llc->sap, sk); + } release_sock(sk); if (llc->dev) dev_put(llc->dev); diff -puN net/llc/llc_core.c~git-net net/llc/llc_core.c --- devel/net/llc/llc_core.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/llc/llc_core.c 2006-02-27 22:37:47.000000000 -0800 @@ -127,7 +127,6 @@ struct llc_sap *llc_sap_open(unsigned ch goto out; sap->laddr.lsap = lsap; sap->rcv_func = func; - llc_sap_hold(sap); llc_add_sap(sap); out: write_unlock_bh(&llc_sap_list_lock); diff -puN net/netfilter/Kconfig~git-net net/netfilter/Kconfig --- devel/net/netfilter/Kconfig~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/Kconfig 2006-02-27 22:37:47.000000000 -0800 @@ -279,6 +279,16 @@ config NETFILTER_XT_MATCH_MARK To compile it as a module, choose M here. If unsure, say N. +config NETFILTER_XT_MATCH_POLICY + tristate 'IPsec "policy" match support' + depends on NETFILTER_XTABLES && XFRM + help + Policy matching allows you to match packets based on the + IPsec policy that was used during decapsulation/will + be used during encapsulation. + + To compile it as a module, choose M here. If unsure, say N. + config NETFILTER_XT_MATCH_PHYSDEV tristate '"physdev" match support' depends on NETFILTER_XTABLES && BRIDGE_NETFILTER diff -puN net/netfilter/Makefile~git-net net/netfilter/Makefile --- devel/net/netfilter/Makefile~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/Makefile 2006-02-27 22:37:47.000000000 -0800 @@ -40,6 +40,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_LENGTH) obj-$(CONFIG_NETFILTER_XT_MATCH_LIMIT) += xt_limit.o obj-$(CONFIG_NETFILTER_XT_MATCH_MAC) += xt_mac.o obj-$(CONFIG_NETFILTER_XT_MATCH_MARK) += xt_mark.o +obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o obj-$(CONFIG_NETFILTER_XT_MATCH_SCTP) += xt_sctp.o diff -puN net/netfilter/nf_conntrack_core.c~git-net net/netfilter/nf_conntrack_core.c --- devel/net/netfilter/nf_conntrack_core.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/nf_conntrack_core.c 2006-02-27 22:37:47.000000000 -0800 @@ -3,7 +3,7 @@ extension. */ /* (C) 1999-2001 Paul `Rusty' Russell - * (C) 2002-2005 Netfilter Core Team + * (C) 2002-2006 Netfilter Core Team * (C) 2003,2004 USAGI/WIDE Project * * This program is free software; you can redistribute it and/or modify @@ -20,6 +20,9 @@ * - generalize L3 protocol denendent part. * 23 Mar 2004: Yasuyuki Kozakai @USAGI * - add support various size of conntrack structures. + * 26 Jan 2006: Harald Welte + * - restructure nf_conn (introduce nf_conn_help) + * - redesign 'features' how they were originally intended * * Derived from net/ipv4/netfilter/ip_conntrack_core.c */ @@ -55,7 +58,7 @@ #include #include -#define NF_CONNTRACK_VERSION "0.4.1" +#define NF_CONNTRACK_VERSION "0.5.0" #if 0 #define DEBUGP printk @@ -259,21 +262,8 @@ static inline u_int32_t hash_conntrack(c nf_conntrack_hash_rnd); } -/* Initialize "struct nf_conn" which has spaces for helper */ -static int -init_conntrack_for_helper(struct nf_conn *conntrack, u_int32_t features) -{ - - conntrack->help = (union nf_conntrack_help *) - (((unsigned long)conntrack->data - + (__alignof__(union nf_conntrack_help) - 1)) - & (~((unsigned long)(__alignof__(union nf_conntrack_help) -1)))); - return 0; -} - int nf_conntrack_register_cache(u_int32_t features, const char *name, - size_t size, - int (*init)(struct nf_conn *, u_int32_t)) + size_t size) { int ret = 0; char *cache_name; @@ -296,8 +286,7 @@ int nf_conntrack_register_cache(u_int32_ DEBUGP("nf_conntrack_register_cache: already resisterd.\n"); if ((!strncmp(nf_ct_cache[features].name, name, NF_CT_FEATURES_NAMELEN)) - && nf_ct_cache[features].size == size - && nf_ct_cache[features].init_conntrack == init) { + && nf_ct_cache[features].size == size) { DEBUGP("nf_conntrack_register_cache: reusing.\n"); nf_ct_cache[features].use++; ret = 0; @@ -340,7 +329,6 @@ int nf_conntrack_register_cache(u_int32_ write_lock_bh(&nf_ct_cache_lock); nf_ct_cache[features].use = 1; nf_ct_cache[features].size = size; - nf_ct_cache[features].init_conntrack = init; nf_ct_cache[features].cachep = cachep; nf_ct_cache[features].name = cache_name; write_unlock_bh(&nf_ct_cache_lock); @@ -377,7 +365,6 @@ void nf_conntrack_unregister_cache(u_int name = nf_ct_cache[features].name; nf_ct_cache[features].cachep = NULL; nf_ct_cache[features].name = NULL; - nf_ct_cache[features].init_conntrack = NULL; nf_ct_cache[features].size = 0; write_unlock_bh(&nf_ct_cache_lock); @@ -432,11 +419,15 @@ nf_ct_invert_tuple(struct nf_conntrack_t /* nf_conntrack_expect helper functions */ void nf_ct_unlink_expect(struct nf_conntrack_expect *exp) { + struct nf_conn_help *master_help = nfct_help(exp->master); + + NF_CT_ASSERT(master_help); ASSERT_WRITE_LOCK(&nf_conntrack_lock); NF_CT_ASSERT(!timer_pending(&exp->timeout)); + list_del(&exp->list); NF_CT_STAT_INC(expect_delete); - exp->master->expecting--; + master_help->expecting--; nf_conntrack_expect_put(exp); } @@ -508,9 +499,10 @@ find_expectation(const struct nf_conntra void nf_ct_remove_expectations(struct nf_conn *ct) { struct nf_conntrack_expect *i, *tmp; + struct nf_conn_help *help = nfct_help(ct); /* Optimization: most connection never expect any others. */ - if (ct->expecting == 0) + if (!help || help->expecting == 0) return; list_for_each_entry_safe(i, tmp, &nf_conntrack_expect_list, list) { @@ -713,6 +705,7 @@ __nf_conntrack_confirm(struct sk_buff ** conntrack_tuple_cmp, struct nf_conntrack_tuple_hash *, &ct->tuplehash[IP_CT_DIR_REPLY].tuple, NULL)) { + struct nf_conn_help *help; /* Remove from unconfirmed list */ list_del(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list); @@ -726,7 +719,8 @@ __nf_conntrack_confirm(struct sk_buff ** set_bit(IPS_CONFIRMED_BIT, &ct->status); NF_CT_STAT_INC(insert); write_unlock_bh(&nf_conntrack_lock); - if (ct->helper) + help = nfct_help(ct); + if (help && help->helper) nf_conntrack_event_cache(IPCT_HELPER, *pskb); #ifdef CONFIG_NF_NAT_NEEDED if (test_bit(IPS_SRC_NAT_DONE_BIT, &ct->status) || @@ -842,8 +836,9 @@ __nf_conntrack_alloc(const struct nf_con { struct nf_conn *conntrack = NULL; u_int32_t features = 0; + struct nf_conntrack_helper *helper; - if (!nf_conntrack_hash_rnd_initted) { + if (unlikely(!nf_conntrack_hash_rnd_initted)) { get_random_bytes(&nf_conntrack_hash_rnd, 4); nf_conntrack_hash_rnd_initted = 1; } @@ -863,8 +858,11 @@ __nf_conntrack_alloc(const struct nf_con /* find features needed by this conntrack. */ features = l3proto->get_features(orig); + + /* FIXME: protect helper list per RCU */ read_lock_bh(&nf_conntrack_lock); - if (__nf_ct_helper_find(repl) != NULL) + helper = __nf_ct_helper_find(repl); + if (helper) features |= NF_CT_F_HELP; read_unlock_bh(&nf_conntrack_lock); @@ -872,7 +870,7 @@ __nf_conntrack_alloc(const struct nf_con read_lock_bh(&nf_ct_cache_lock); - if (!nf_ct_cache[features].use) { + if (unlikely(!nf_ct_cache[features].use)) { DEBUGP("nf_conntrack_alloc: not supported features = 0x%x\n", features); goto out; @@ -886,12 +884,10 @@ __nf_conntrack_alloc(const struct nf_con memset(conntrack, 0, nf_ct_cache[features].size); conntrack->features = features; - if (nf_ct_cache[features].init_conntrack && - nf_ct_cache[features].init_conntrack(conntrack, features) < 0) { - DEBUGP("nf_conntrack_alloc: failed to init\n"); - kmem_cache_free(nf_ct_cache[features].cachep, conntrack); - conntrack = NULL; - goto out; + if (helper) { + struct nf_conn_help *help = nfct_help(conntrack); + NF_CT_ASSERT(help); + help->helper = helper; } atomic_set(&conntrack->ct_general.use, 1); @@ -972,11 +968,8 @@ init_conntrack(const struct nf_conntrack #endif nf_conntrack_get(&conntrack->master->ct_general); NF_CT_STAT_INC(expect_new); - } else { - conntrack->helper = __nf_ct_helper_find(&repl_tuple); - + } else NF_CT_STAT_INC(new); - } /* Overload tuple linked list to put us in unconfirmed list. */ list_add(&conntrack->tuplehash[IP_CT_DIR_ORIGINAL].list, &unconfirmed); @@ -1206,14 +1199,16 @@ void nf_conntrack_expect_put(struct nf_c static void nf_conntrack_expect_insert(struct nf_conntrack_expect *exp) { + struct nf_conn_help *master_help = nfct_help(exp->master); + atomic_inc(&exp->use); - exp->master->expecting++; + master_help->expecting++; list_add(&exp->list, &nf_conntrack_expect_list); init_timer(&exp->timeout); exp->timeout.data = (unsigned long)exp; exp->timeout.function = expectation_timed_out; - exp->timeout.expires = jiffies + exp->master->helper->timeout * HZ; + exp->timeout.expires = jiffies + master_help->helper->timeout * HZ; add_timer(&exp->timeout); exp->id = ++nf_conntrack_expect_next_id; @@ -1239,10 +1234,12 @@ static void evict_oldest_expect(struct n static inline int refresh_timer(struct nf_conntrack_expect *i) { + struct nf_conn_help *master_help = nfct_help(i->master); + if (!del_timer(&i->timeout)) return 0; - i->timeout.expires = jiffies + i->master->helper->timeout*HZ; + i->timeout.expires = jiffies + master_help->helper->timeout*HZ; add_timer(&i->timeout); return 1; } @@ -1251,8 +1248,11 @@ int nf_conntrack_expect_related(struct n { struct nf_conntrack_expect *i; struct nf_conn *master = expect->master; + struct nf_conn_help *master_help = nfct_help(master); int ret; + NF_CT_ASSERT(master_help); + DEBUGP("nf_conntrack_expect_related %p\n", related_to); DEBUGP("tuple: "); NF_CT_DUMP_TUPLE(&expect->tuple); DEBUGP("mask: "); NF_CT_DUMP_TUPLE(&expect->mask); @@ -1271,8 +1271,8 @@ int nf_conntrack_expect_related(struct n } } /* Will be over limit? */ - if (master->helper->max_expected && - master->expecting >= master->helper->max_expected) + if (master_help->helper->max_expected && + master_help->expecting >= master_help->helper->max_expected) evict_oldest_expect(master); nf_conntrack_expect_insert(expect); @@ -1283,24 +1283,6 @@ out: return ret; } -/* Alter reply tuple (maybe alter helper). This is for NAT, and is - implicitly racy: see __nf_conntrack_confirm */ -void nf_conntrack_alter_reply(struct nf_conn *conntrack, - const struct nf_conntrack_tuple *newreply) -{ - write_lock_bh(&nf_conntrack_lock); - /* Should be unconfirmed, so not in hash table yet */ - NF_CT_ASSERT(!nf_ct_is_confirmed(conntrack)); - - DEBUGP("Altering reply tuple of %p to ", conntrack); - NF_CT_DUMP_TUPLE(newreply); - - conntrack->tuplehash[IP_CT_DIR_REPLY].tuple = *newreply; - if (!conntrack->master && conntrack->expecting == 0) - conntrack->helper = __nf_ct_helper_find(newreply); - write_unlock_bh(&nf_conntrack_lock); -} - int nf_conntrack_helper_register(struct nf_conntrack_helper *me) { int ret; @@ -1308,9 +1290,8 @@ int nf_conntrack_helper_register(struct ret = nf_conntrack_register_cache(NF_CT_F_HELP, "nf_conntrack:help", sizeof(struct nf_conn) - + sizeof(union nf_conntrack_help) - + __alignof__(union nf_conntrack_help), - init_conntrack_for_helper); + + sizeof(struct nf_conn_help) + + __alignof__(struct nf_conn_help)); if (ret < 0) { printk(KERN_ERR "nf_conntrack_helper_reigster: Unable to create slab cache for conntracks\n"); return ret; @@ -1338,9 +1319,12 @@ __nf_conntrack_helper_find_byname(const static inline int unhelp(struct nf_conntrack_tuple_hash *i, const struct nf_conntrack_helper *me) { - if (nf_ct_tuplehash_to_ctrack(i)->helper == me) { - nf_conntrack_event(IPCT_HELPER, nf_ct_tuplehash_to_ctrack(i)); - nf_ct_tuplehash_to_ctrack(i)->helper = NULL; + struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(i); + struct nf_conn_help *help = nfct_help(ct); + + if (help && help->helper == me) { + nf_conntrack_event(IPCT_HELPER, ct); + help->helper = NULL; } return 0; } @@ -1356,7 +1340,8 @@ void nf_conntrack_helper_unregister(stru /* Get rid of expectations */ list_for_each_entry_safe(exp, tmp, &nf_conntrack_expect_list, list) { - if (exp->master->helper == me && del_timer(&exp->timeout)) { + struct nf_conn_help *help = nfct_help(exp->master); + if (help->helper == me && del_timer(&exp->timeout)) { nf_ct_unlink_expect(exp); nf_conntrack_expect_put(exp); } @@ -1697,7 +1682,7 @@ int __init nf_conntrack_init(void) } ret = nf_conntrack_register_cache(NF_CT_F_BASIC, "nf_conntrack:basic", - sizeof(struct nf_conn), NULL); + sizeof(struct nf_conn)); if (ret < 0) { printk(KERN_ERR "Unable to create nf_conn slab cache\n"); goto err_free_hash; diff -puN net/netfilter/nf_conntrack_ftp.c~git-net net/netfilter/nf_conntrack_ftp.c --- devel/net/netfilter/nf_conntrack_ftp.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/nf_conntrack_ftp.c 2006-02-27 22:37:47.000000000 -0800 @@ -440,7 +440,7 @@ static int help(struct sk_buff **pskb, u32 seq; int dir = CTINFO2DIR(ctinfo); unsigned int matchlen, matchoff; - struct ip_ct_ftp_master *ct_ftp_info = &ct->help->ct_ftp_info; + struct ip_ct_ftp_master *ct_ftp_info = &nfct_help(ct)->help.ct_ftp_info; struct nf_conntrack_expect *exp; struct nf_conntrack_man cmd = {}; diff -puN net/netfilter/nf_conntrack_netlink.c~git-net net/netfilter/nf_conntrack_netlink.c --- devel/net/netfilter/nf_conntrack_netlink.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/nf_conntrack_netlink.c 2006-02-27 22:37:47.000000000 -0800 @@ -2,7 +2,7 @@ * protocol helpers and general trouble making from userspace. * * (C) 2001 by Jay Schulist - * (C) 2002-2005 by Harald Welte + * (C) 2002-2006 by Harald Welte * (C) 2003 by Patrick Mchardy * (C) 2005 by Pablo Neira Ayuso * @@ -44,7 +44,7 @@ MODULE_LICENSE("GPL"); -static char __initdata version[] = "0.92"; +static char __initdata version[] = "0.93"; #if 0 #define DEBUGP printk @@ -165,15 +165,16 @@ static inline int ctnetlink_dump_helpinfo(struct sk_buff *skb, const struct nf_conn *ct) { struct nfattr *nest_helper; + const struct nf_conn_help *help = nfct_help(ct); - if (!ct->helper) + if (!help || !help->helper) return 0; nest_helper = NFA_NEST(skb, CTA_HELP); - NFA_PUT(skb, CTA_HELP_NAME, strlen(ct->helper->name), ct->helper->name); + NFA_PUT(skb, CTA_HELP_NAME, strlen(help->helper->name), help->helper->name); - if (ct->helper->to_nfattr) - ct->helper->to_nfattr(skb, ct); + if (help->helper->to_nfattr) + help->helper->to_nfattr(skb, ct); NFA_NEST_END(skb, nest_helper); @@ -337,9 +338,10 @@ static int ctnetlink_conntrack_event(str group = NFNLGRP_CONNTRACK_UPDATE; } else return NOTIFY_DONE; - - /* FIXME: Check if there are any listeners before, don't hurt performance */ - + + if (!nfnetlink_has_listeners(group)) + return NOTIFY_DONE; + skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); if (!skb) return NOTIFY_DONE; @@ -903,11 +905,17 @@ static inline int ctnetlink_change_helper(struct nf_conn *ct, struct nfattr *cda[]) { struct nf_conntrack_helper *helper; + struct nf_conn_help *help = nfct_help(ct); char *helpname; int err; DEBUGP("entered %s\n", __FUNCTION__); + if (!help) { + /* FIXME: we need to reallocate and rehash */ + return -EBUSY; + } + /* don't change helper of sibling connections */ if (ct->master) return -EINVAL; @@ -924,18 +932,18 @@ ctnetlink_change_helper(struct nf_conn * return -EINVAL; } - if (ct->helper) { + if (help->helper) { if (!helper) { /* we had a helper before ... */ nf_ct_remove_expectations(ct); - ct->helper = NULL; + help->helper = NULL; } else { /* need to zero data of old helper */ - memset(&ct->help, 0, sizeof(ct->help)); + memset(&help->help, 0, sizeof(help->help)); } } - ct->helper = helper; + help->helper = helper; return 0; } @@ -1050,14 +1058,9 @@ ctnetlink_create_conntrack(struct nfattr ct->mark = ntohl(*(u_int32_t *)NFA_DATA(cda[CTA_MARK-1])); #endif - ct->helper = nf_ct_helper_find_get(rtuple); - add_timer(&ct->timeout); nf_conntrack_hash_insert(ct); - if (ct->helper) - nf_ct_helper_put(ct->helper); - DEBUGP("conntrack with id %u inserted\n", ct->id); return 0; @@ -1417,7 +1420,8 @@ ctnetlink_del_expect(struct sock *ctnl, } list_for_each_entry_safe(exp, tmp, &nf_conntrack_expect_list, list) { - if (exp->master->helper == h + struct nf_conn_help *m_help = nfct_help(exp->master); + if (m_help->helper == h && del_timer(&exp->timeout)) { nf_ct_unlink_expect(exp); nf_conntrack_expect_put(exp); @@ -1452,6 +1456,7 @@ ctnetlink_create_expect(struct nfattr *c struct nf_conntrack_tuple_hash *h = NULL; struct nf_conntrack_expect *exp; struct nf_conn *ct; + struct nf_conn_help *help; int err = 0; DEBUGP("entered %s\n", __FUNCTION__); @@ -1472,8 +1477,9 @@ ctnetlink_create_expect(struct nfattr *c if (!h) return -ENOENT; ct = nf_ct_tuplehash_to_ctrack(h); + help = nfct_help(ct); - if (!ct->helper) { + if (!help || !help->helper) { /* such conntrack hasn't got any helper, abort */ err = -EINVAL; goto out; diff -puN net/netfilter/nf_conntrack_standalone.c~git-net net/netfilter/nf_conntrack_standalone.c --- devel/net/netfilter/nf_conntrack_standalone.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/nf_conntrack_standalone.c 2006-02-27 22:37:47.000000000 -0800 @@ -839,7 +839,6 @@ EXPORT_SYMBOL(nf_conntrack_l3proto_unreg EXPORT_SYMBOL(nf_conntrack_protocol_register); EXPORT_SYMBOL(nf_conntrack_protocol_unregister); EXPORT_SYMBOL(nf_ct_invert_tuplepr); -EXPORT_SYMBOL(nf_conntrack_alter_reply); EXPORT_SYMBOL(nf_conntrack_destroyed); EXPORT_SYMBOL(need_conntrack); EXPORT_SYMBOL(nf_conntrack_helper_register); diff -puN net/netfilter/nfnetlink.c~git-net net/netfilter/nfnetlink.c --- devel/net/netfilter/nfnetlink.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/nfnetlink.c 2006-02-27 22:37:47.000000000 -0800 @@ -191,6 +191,12 @@ nfnetlink_check_attributes(struct nfnetl return 0; } +int nfnetlink_has_listeners(unsigned int group) +{ + return netlink_has_listeners(nfnl, group); +} +EXPORT_SYMBOL_GPL(nfnetlink_has_listeners); + int nfnetlink_send(struct sk_buff *skb, u32 pid, unsigned group, int echo) { gfp_t allocation = in_interrupt() ? GFP_ATOMIC : GFP_KERNEL; diff -puN net/netfilter/nfnetlink_log.c~git-net net/netfilter/nfnetlink_log.c --- devel/net/netfilter/nfnetlink_log.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/nfnetlink_log.c 2006-02-27 22:37:47.000000000 -0800 @@ -11,6 +11,10 @@ * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. * + * 2006-01-26 Harald Welte + * - Add optional local and global sequence number to detect lost + * events from userspace + * */ #include #include @@ -68,11 +72,14 @@ struct nfulnl_instance { unsigned int nlbufsiz; /* netlink buffer allocation size */ unsigned int qthreshold; /* threshold of the queue */ u_int32_t copy_range; + u_int32_t seq; /* instance-local sequential counter */ u_int16_t group_num; /* number of this queue */ + u_int16_t flags; u_int8_t copy_mode; }; static DEFINE_RWLOCK(instances_lock); +static atomic_t global_seq; #define INSTANCE_BUCKETS 16 static struct hlist_head instance_table[INSTANCE_BUCKETS]; @@ -310,6 +317,16 @@ nfulnl_set_qthresh(struct nfulnl_instanc return 0; } +static int +nfulnl_set_flags(struct nfulnl_instance *inst, u_int16_t flags) +{ + spin_lock_bh(&inst->lock); + inst->flags = ntohs(flags); + spin_unlock_bh(&inst->lock); + + return 0; +} + static struct sk_buff *nfulnl_alloc_skb(unsigned int inst_size, unsigned int pkt_size) { @@ -377,6 +394,8 @@ static void nfulnl_timer(unsigned long d spin_unlock_bh(&inst->lock); } +/* This is an inline function, we don't really care about a long + * list of arguments */ static inline int __build_packet_message(struct nfulnl_instance *inst, const struct sk_buff *skb, @@ -515,6 +534,17 @@ __build_packet_message(struct nfulnl_ins read_unlock_bh(&skb->sk->sk_callback_lock); } + /* local sequence number */ + if (inst->flags & NFULNL_CFG_F_SEQ) { + tmp_uint = htonl(inst->seq++); + NFA_PUT(inst->skb, NFULA_SEQ, sizeof(tmp_uint), &tmp_uint); + } + /* global sequence number */ + if (inst->flags & NFULNL_CFG_F_SEQ_GLOBAL) { + tmp_uint = atomic_inc_return(&global_seq); + NFA_PUT(inst->skb, NFULA_SEQ_GLOBAL, sizeof(tmp_uint), &tmp_uint); + } + if (data_len) { struct nfattr *nfa; int size = NFA_LENGTH(data_len); @@ -607,6 +637,11 @@ nfulnl_log_packet(unsigned int pf, spin_lock_bh(&inst->lock); + if (inst->flags & NFULNL_CFG_F_SEQ) + size += NFA_SPACE(sizeof(u_int32_t)); + if (inst->flags & NFULNL_CFG_F_SEQ_GLOBAL) + size += NFA_SPACE(sizeof(u_int32_t)); + qthreshold = inst->qthreshold; /* per-rule qthreshold overrides per-instance */ if (qthreshold > li->u.ulog.qthreshold) @@ -736,10 +771,14 @@ static const int nfula_min[NFULA_MAX] = [NFULA_TIMESTAMP-1] = sizeof(struct nfulnl_msg_packet_timestamp), [NFULA_IFINDEX_INDEV-1] = sizeof(u_int32_t), [NFULA_IFINDEX_OUTDEV-1]= sizeof(u_int32_t), + [NFULA_IFINDEX_PHYSINDEV-1] = sizeof(u_int32_t), + [NFULA_IFINDEX_PHYSOUTDEV-1] = sizeof(u_int32_t), [NFULA_HWADDR-1] = sizeof(struct nfulnl_msg_packet_hw), [NFULA_PAYLOAD-1] = 0, [NFULA_PREFIX-1] = 0, [NFULA_UID-1] = sizeof(u_int32_t), + [NFULA_SEQ-1] = sizeof(u_int32_t), + [NFULA_SEQ_GLOBAL-1] = sizeof(u_int32_t), }; static const int nfula_cfg_min[NFULA_CFG_MAX] = { @@ -748,6 +787,7 @@ static const int nfula_cfg_min[NFULA_CFG [NFULA_CFG_TIMEOUT-1] = sizeof(u_int32_t), [NFULA_CFG_QTHRESH-1] = sizeof(u_int32_t), [NFULA_CFG_NLBUFSIZ-1] = sizeof(u_int32_t), + [NFULA_CFG_FLAGS-1] = sizeof(u_int16_t), }; static int @@ -859,6 +899,12 @@ nfulnl_recv_config(struct sock *ctnl, st nfulnl_set_qthresh(inst, ntohl(qthresh)); } + if (nfula[NFULA_CFG_FLAGS-1]) { + u_int16_t flags = + *(u_int16_t *)NFA_DATA(nfula[NFULA_CFG_FLAGS-1]); + nfulnl_set_flags(inst, ntohl(flags)); + } + out_put: instance_put(inst); return ret; diff -puN net/netfilter/x_tables.c~git-net net/netfilter/x_tables.c --- devel/net/netfilter/x_tables.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/x_tables.c 2006-02-27 22:37:47.000000000 -0800 @@ -52,6 +52,12 @@ enum { MATCH, }; +static const char *xt_prefix[NPROTO] = { + [AF_INET] = "ip", + [AF_INET6] = "ip6", + [NF_ARP] = "arp", +}; + /* Registration hooks for targets. */ int xt_register_target(int af, struct xt_target *target) @@ -158,18 +164,12 @@ struct xt_target *xt_find_target(int af, } EXPORT_SYMBOL(xt_find_target); -static const char *xt_prefix[NPROTO] = { - [AF_INET] = "ipt_%s", - [AF_INET6] = "ip6t_%s", - [NF_ARP] = "arpt_%s", -}; - struct xt_target *xt_request_find_target(int af, const char *name, u8 revision) { struct xt_target *target; target = try_then_request_module(xt_find_target(af, name, revision), - xt_prefix[af], name); + "%st_%s", xt_prefix[af], name); if (IS_ERR(target) || !target) return NULL; return target; @@ -237,6 +237,64 @@ int xt_find_revision(int af, const char } EXPORT_SYMBOL_GPL(xt_find_revision); +int xt_check_match(const struct xt_match *match, unsigned short family, + unsigned int size, const char *table, unsigned int hook_mask, + unsigned short proto, int inv_proto) +{ + if (XT_ALIGN(match->matchsize) != size) { + printk("%s_tables: %s match: invalid size %Zu != %u\n", + xt_prefix[family], match->name, + XT_ALIGN(match->matchsize), size); + return -EINVAL; + } + if (match->table && strcmp(match->table, table)) { + printk("%s_tables: %s match: only valid in %s table, not %s\n", + xt_prefix[family], match->name, match->table, table); + return -EINVAL; + } + if (match->hooks && (hook_mask & ~match->hooks) != 0) { + printk("%s_tables: %s match: bad hook_mask %u\n", + xt_prefix[family], match->name, hook_mask); + return -EINVAL; + } + if (match->proto && (match->proto != proto || inv_proto)) { + printk("%s_tables: %s match: only valid for protocol %u\n", + xt_prefix[family], match->name, match->proto); + return -EINVAL; + } + return 0; +} +EXPORT_SYMBOL_GPL(xt_check_match); + +int xt_check_target(const struct xt_target *target, unsigned short family, + unsigned int size, const char *table, unsigned int hook_mask, + unsigned short proto, int inv_proto) +{ + if (XT_ALIGN(target->targetsize) != size) { + printk("%s_tables: %s target: invalid size %Zu != %u\n", + xt_prefix[family], target->name, + XT_ALIGN(target->targetsize), size); + return -EINVAL; + } + if (target->table && strcmp(target->table, table)) { + printk("%s_tables: %s target: only valid in %s table, not %s\n", + xt_prefix[family], target->name, target->table, table); + return -EINVAL; + } + if (target->hooks && (hook_mask & ~target->hooks) != 0) { + printk("%s_tables: %s target: bad hook_mask %u\n", + xt_prefix[family], target->name, hook_mask); + return -EINVAL; + } + if (target->proto && (target->proto != proto || inv_proto)) { + printk("%s_tables: %s target: only valid for protocol %u\n", + xt_prefix[family], target->name, target->proto); + return -EINVAL; + } + return 0; +} +EXPORT_SYMBOL_GPL(xt_check_target); + struct xt_table_info *xt_alloc_table_info(unsigned int size) { struct xt_table_info *newinfo; diff -puN net/netfilter/xt_CLASSIFY.c~git-net net/netfilter/xt_CLASSIFY.c --- devel/net/netfilter/xt_CLASSIFY.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_CLASSIFY.c 2006-02-27 22:37:47.000000000 -0800 @@ -28,6 +28,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -39,47 +40,22 @@ target(struct sk_buff **pskb, return XT_CONTINUE; } -static int -checkentry(const char *tablename, - const void *e, - void *targinfo, - unsigned int targinfosize, - unsigned int hook_mask) -{ - if (targinfosize != XT_ALIGN(sizeof(struct xt_classify_target_info))){ - printk(KERN_ERR "CLASSIFY: invalid size (%u != %Zu).\n", - targinfosize, - XT_ALIGN(sizeof(struct xt_classify_target_info))); - return 0; - } - - if (hook_mask & ~((1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_FORWARD) | - (1 << NF_IP_POST_ROUTING))) { - printk(KERN_ERR "CLASSIFY: only valid in LOCAL_OUT, FORWARD " - "and POST_ROUTING.\n"); - return 0; - } - - if (strcmp(tablename, "mangle") != 0) { - printk(KERN_ERR "CLASSIFY: can only be called from " - "\"mangle\" table, not \"%s\".\n", - tablename); - return 0; - } - - return 1; -} - static struct xt_target classify_reg = { .name = "CLASSIFY", .target = target, - .checkentry = checkentry, + .targetsize = sizeof(struct xt_classify_target_info), + .table = "mangle", + .hooks = (1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_FORWARD) | + (1 << NF_IP_POST_ROUTING), .me = THIS_MODULE, }; static struct xt_target classify6_reg = { .name = "CLASSIFY", .target = target, - .checkentry = checkentry, + .targetsize = sizeof(struct xt_classify_target_info), + .table = "mangle", + .hooks = (1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_FORWARD) | + (1 << NF_IP_POST_ROUTING), .me = THIS_MODULE, }; diff -puN net/netfilter/xt_comment.c~git-net net/netfilter/xt_comment.c --- devel/net/netfilter/xt_comment.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_comment.c 2006-02-27 22:37:47.000000000 -0800 @@ -19,6 +19,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protooff, @@ -28,30 +29,17 @@ match(const struct sk_buff *skb, return 1; } -static int -checkentry(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - /* Check the size */ - if (matchsize != XT_ALIGN(sizeof(struct xt_comment_info))) - return 0; - return 1; -} - static struct xt_match comment_match = { .name = "comment", .match = match, - .checkentry = checkentry, + .matchsize = sizeof(struct xt_comment_info), .me = THIS_MODULE }; static struct xt_match comment6_match = { .name = "comment", .match = match, - .checkentry = checkentry, + .matchsize = sizeof(struct xt_comment_info), .me = THIS_MODULE }; diff -puN net/netfilter/xt_connbytes.c~git-net net/netfilter/xt_connbytes.c --- devel/net/netfilter/xt_connbytes.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_connbytes.c 2006-02-27 22:37:47.000000000 -0800 @@ -44,6 +44,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -122,15 +123,13 @@ match(const struct sk_buff *skb, static int check(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { const struct xt_connbytes_info *sinfo = matchinfo; - if (matchsize != XT_ALIGN(sizeof(struct xt_connbytes_info))) - return 0; - if (sinfo->what != XT_CONNBYTES_PKTS && sinfo->what != XT_CONNBYTES_BYTES && sinfo->what != XT_CONNBYTES_AVGPKT) @@ -146,14 +145,16 @@ static int check(const char *tablename, static struct xt_match connbytes_match = { .name = "connbytes", - .match = &match, - .checkentry = &check, + .match = match, + .checkentry = check, + .matchsize = sizeof(struct xt_connbytes_info), .me = THIS_MODULE }; static struct xt_match connbytes6_match = { .name = "connbytes", - .match = &match, - .checkentry = &check, + .match = match, + .checkentry = check, + .matchsize = sizeof(struct xt_connbytes_info), .me = THIS_MODULE }; diff -puN net/netfilter/xt_connmark.c~git-net net/netfilter/xt_connmark.c --- devel/net/netfilter/xt_connmark.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_connmark.c 2006-02-27 22:37:47.000000000 -0800 @@ -35,6 +35,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -52,37 +53,36 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { - struct xt_connmark_info *cm = - (struct xt_connmark_info *)matchinfo; - if (matchsize != XT_ALIGN(sizeof(struct xt_connmark_info))) - return 0; + struct xt_connmark_info *cm = (struct xt_connmark_info *)matchinfo; if (cm->mark > 0xffffffff || cm->mask > 0xffffffff) { printk(KERN_WARNING "connmark: only support 32bit mark\n"); return 0; } - return 1; } static struct xt_match connmark_match = { - .name = "connmark", - .match = &match, - .checkentry = &checkentry, - .me = THIS_MODULE + .name = "connmark", + .match = match, + .matchsize = sizeof(struct xt_connmark_info), + .checkentry = checkentry, + .me = THIS_MODULE }; + static struct xt_match connmark6_match = { - .name = "connmark", - .match = &match, - .checkentry = &checkentry, - .me = THIS_MODULE + .name = "connmark", + .match = match, + .matchsize = sizeof(struct xt_connmark_info), + .checkentry = checkentry, + .me = THIS_MODULE }; - static int __init init(void) { int ret; diff -puN net/netfilter/xt_CONNMARK.c~git-net net/netfilter/xt_CONNMARK.c --- devel/net/netfilter/xt_CONNMARK.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_CONNMARK.c 2006-02-27 22:37:47.000000000 -0800 @@ -37,6 +37,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -74,17 +75,12 @@ target(struct sk_buff **pskb, static int checkentry(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { struct xt_connmark_target_info *matchinfo = targinfo; - if (targinfosize != XT_ALIGN(sizeof(struct xt_connmark_target_info))) { - printk(KERN_WARNING "CONNMARK: targinfosize %u != %Zu\n", - targinfosize, - XT_ALIGN(sizeof(struct xt_connmark_target_info))); - return 0; - } if (matchinfo->mode == XT_CONNMARK_RESTORE) { if (strcmp(tablename, "mangle") != 0) { @@ -102,16 +98,19 @@ checkentry(const char *tablename, } static struct xt_target connmark_reg = { - .name = "CONNMARK", - .target = &target, - .checkentry = &checkentry, - .me = THIS_MODULE + .name = "CONNMARK", + .target = target, + .targetsize = sizeof(struct xt_connmark_target_info), + .checkentry = checkentry, + .me = THIS_MODULE }; + static struct xt_target connmark6_reg = { - .name = "CONNMARK", - .target = &target, - .checkentry = &checkentry, - .me = THIS_MODULE + .name = "CONNMARK", + .target = target, + .targetsize = sizeof(struct xt_connmark_target_info), + .checkentry = checkentry, + .me = THIS_MODULE }; static int __init init(void) diff -puN net/netfilter/xt_conntrack.c~git-net net/netfilter/xt_conntrack.c --- devel/net/netfilter/xt_conntrack.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_conntrack.c 2006-02-27 22:37:47.000000000 -0800 @@ -32,6 +32,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -118,6 +119,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -201,22 +203,10 @@ match(const struct sk_buff *skb, #endif /* CONFIG_NF_IP_CONNTRACK */ -static int check(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != XT_ALIGN(sizeof(struct xt_conntrack_info))) - return 0; - - return 1; -} - static struct xt_match conntrack_match = { .name = "conntrack", - .match = &match, - .checkentry = &check, + .match = match, + .matchsize = sizeof(struct xt_conntrack_info), .me = THIS_MODULE, }; diff -puN net/netfilter/xt_dccp.c~git-net net/netfilter/xt_dccp.c --- devel/net/netfilter/xt_dccp.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_dccp.c 2006-02-27 22:37:47.000000000 -0800 @@ -95,6 +95,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -129,56 +130,34 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *inf, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { - const struct ipt_ip *ip = inf; - const struct xt_dccp_info *info; + const struct xt_dccp_info *info = matchinfo; - info = (const struct xt_dccp_info *)matchinfo; - - return ip->proto == IPPROTO_DCCP - && !(ip->invflags & XT_INV_PROTO) - && matchsize == XT_ALIGN(sizeof(struct xt_dccp_info)) - && !(info->flags & ~XT_DCCP_VALID_FLAGS) - && !(info->invflags & ~XT_DCCP_VALID_FLAGS) - && !(info->invflags & ~info->flags); -} - -static int -checkentry6(const char *tablename, - const void *inf, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - const struct ip6t_ip6 *ip = inf; - const struct xt_dccp_info *info; - - info = (const struct xt_dccp_info *)matchinfo; - - return ip->proto == IPPROTO_DCCP - && !(ip->invflags & XT_INV_PROTO) - && matchsize == XT_ALIGN(sizeof(struct xt_dccp_info)) - && !(info->flags & ~XT_DCCP_VALID_FLAGS) + return !(info->flags & ~XT_DCCP_VALID_FLAGS) && !(info->invflags & ~XT_DCCP_VALID_FLAGS) && !(info->invflags & ~info->flags); } - static struct xt_match dccp_match = { .name = "dccp", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_dccp_info), + .proto = IPPROTO_DCCP, + .checkentry = checkentry, .me = THIS_MODULE, }; static struct xt_match dccp6_match = { .name = "dccp", - .match = &match, - .checkentry = &checkentry6, + .match = match, + .matchsize = sizeof(struct xt_dccp_info), + .proto = IPPROTO_DCCP, + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/netfilter/xt_helper.c~git-net net/netfilter/xt_helper.c --- devel/net/netfilter/xt_helper.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_helper.c 2006-02-27 22:37:47.000000000 -0800 @@ -42,6 +42,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -89,6 +90,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -96,6 +98,7 @@ match(const struct sk_buff *skb, { const struct xt_helper_info *info = matchinfo; struct nf_conn *ct; + struct nf_conn_help *master_help; enum ip_conntrack_info ctinfo; int ret = info->invert; @@ -111,7 +114,8 @@ match(const struct sk_buff *skb, } read_lock_bh(&nf_conntrack_lock); - if (!ct->master->helper) { + master_help = nfct_help(ct->master); + if (!master_help || !master_help->helper) { DEBUGP("xt_helper: master ct %p has no helper\n", exp->expectant); goto out_unlock; @@ -123,8 +127,8 @@ match(const struct sk_buff *skb, if (info->name[0] == '\0') ret ^= 1; else - ret ^= !strncmp(ct->master->helper->name, info->name, - strlen(ct->master->helper->name)); + ret ^= !strncmp(master_help->helper->name, info->name, + strlen(master_help->helper->name)); out_unlock: read_unlock_bh(&nf_conntrack_lock); return ret; @@ -133,6 +137,7 @@ out_unlock: static int check(const char *tablename, const void *inf, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) @@ -140,24 +145,21 @@ static int check(const char *tablename, struct xt_helper_info *info = matchinfo; info->name[29] = '\0'; - - /* verify size */ - if (matchsize != XT_ALIGN(sizeof(struct xt_helper_info))) - return 0; - return 1; } static struct xt_match helper_match = { .name = "helper", - .match = &match, - .checkentry = &check, + .match = match, + .matchsize = sizeof(struct xt_helper_info), + .checkentry = check, .me = THIS_MODULE, }; static struct xt_match helper6_match = { .name = "helper", - .match = &match, - .checkentry = &check, + .match = match, + .matchsize = sizeof(struct xt_helper_info), + .checkentry = check, .me = THIS_MODULE, }; diff -puN net/netfilter/xt_length.c~git-net net/netfilter/xt_length.c --- devel/net/netfilter/xt_length.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_length.c 2006-02-27 22:37:47.000000000 -0800 @@ -24,6 +24,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -39,6 +40,7 @@ static int match6(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -50,29 +52,17 @@ match6(const struct sk_buff *skb, return (pktlen >= info->min && pktlen <= info->max) ^ info->invert; } -static int -checkentry(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != XT_ALIGN(sizeof(struct xt_length_info))) - return 0; - - return 1; -} - static struct xt_match length_match = { .name = "length", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_length_info), .me = THIS_MODULE, }; + static struct xt_match length6_match = { .name = "length", - .match = &match6, - .checkentry = &checkentry, + .match = match6, + .matchsize = sizeof(struct xt_length_info), .me = THIS_MODULE, }; diff -puN net/netfilter/xt_limit.c~git-net net/netfilter/xt_limit.c --- devel/net/netfilter/xt_limit.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_limit.c 2006-02-27 22:37:47.000000000 -0800 @@ -68,6 +68,7 @@ static int ipt_limit_match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -107,15 +108,13 @@ user2credits(u_int32_t user) static int ipt_limit_checkentry(const char *tablename, const void *inf, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { struct xt_rateinfo *r = matchinfo; - if (matchsize != XT_ALIGN(sizeof(struct xt_rateinfo))) - return 0; - /* Check for overflow. */ if (r->burst == 0 || user2credits(r->avg * r->burst) < user2credits(r->avg)) { @@ -140,12 +139,14 @@ ipt_limit_checkentry(const char *tablena static struct xt_match ipt_limit_reg = { .name = "limit", .match = ipt_limit_match, + .matchsize = sizeof(struct xt_rateinfo), .checkentry = ipt_limit_checkentry, .me = THIS_MODULE, }; static struct xt_match limit6_reg = { .name = "limit", .match = ipt_limit_match, + .matchsize = sizeof(struct xt_rateinfo), .checkentry = ipt_limit_checkentry, .me = THIS_MODULE, }; diff -puN net/netfilter/xt_mac.c~git-net net/netfilter/xt_mac.c --- devel/net/netfilter/xt_mac.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_mac.c 2006-02-27 22:37:47.000000000 -0800 @@ -27,6 +27,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -42,37 +43,20 @@ match(const struct sk_buff *skb, ^ info->invert)); } -static int -ipt_mac_checkentry(const char *tablename, - const void *inf, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - /* FORWARD isn't always valid, but it's nice to be able to do --RR */ - if (hook_mask - & ~((1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_IN) - | (1 << NF_IP_FORWARD))) { - printk("xt_mac: only valid for PRE_ROUTING, LOCAL_IN or FORWARD.\n"); - return 0; - } - - if (matchsize != XT_ALIGN(sizeof(struct xt_mac_info))) - return 0; - - return 1; -} - static struct xt_match mac_match = { .name = "mac", - .match = &match, - .checkentry = &ipt_mac_checkentry, + .match = match, + .matchsize = sizeof(struct xt_mac_info), + .hooks = (1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_IN) | + (1 << NF_IP_FORWARD), .me = THIS_MODULE, }; static struct xt_match mac6_match = { .name = "mac", - .match = &match, - .checkentry = &ipt_mac_checkentry, + .match = match, + .matchsize = sizeof(struct xt_mac_info), + .hooks = (1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_IN) | + (1 << NF_IP_FORWARD), .me = THIS_MODULE, }; diff -puN net/netfilter/xt_mark.c~git-net net/netfilter/xt_mark.c --- devel/net/netfilter/xt_mark.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_mark.c 2006-02-27 22:37:47.000000000 -0800 @@ -23,6 +23,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -36,34 +37,33 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *entry, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { struct xt_mark_info *minfo = (struct xt_mark_info *) matchinfo; - if (matchsize != XT_ALIGN(sizeof(struct xt_mark_info))) - return 0; - if (minfo->mark > 0xffffffff || minfo->mask > 0xffffffff) { printk(KERN_WARNING "mark: only supports 32bit mark\n"); return 0; } - return 1; } static struct xt_match mark_match = { .name = "mark", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_mark_info), + .checkentry = checkentry, .me = THIS_MODULE, }; static struct xt_match mark6_match = { .name = "mark", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_mark_info), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/netfilter/xt_MARK.c~git-net net/netfilter/xt_MARK.c --- devel/net/netfilter/xt_MARK.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_MARK.c 2006-02-27 22:37:47.000000000 -0800 @@ -26,6 +26,7 @@ target_v0(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -42,6 +43,7 @@ target_v1(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -72,53 +74,30 @@ target_v1(struct sk_buff **pskb, static int checkentry_v0(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { struct xt_mark_target_info *markinfo = targinfo; - if (targinfosize != XT_ALIGN(sizeof(struct xt_mark_target_info))) { - printk(KERN_WARNING "MARK: targinfosize %u != %Zu\n", - targinfosize, - XT_ALIGN(sizeof(struct xt_mark_target_info))); - return 0; - } - - if (strcmp(tablename, "mangle") != 0) { - printk(KERN_WARNING "MARK: can only be called from \"mangle\" table, not \"%s\"\n", tablename); - return 0; - } - if (markinfo->mark > 0xffffffff) { printk(KERN_WARNING "MARK: Only supports 32bit wide mark\n"); return 0; } - return 1; } static int checkentry_v1(const char *tablename, const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) { struct xt_mark_target_info_v1 *markinfo = targinfo; - if (targinfosize != XT_ALIGN(sizeof(struct xt_mark_target_info_v1))){ - printk(KERN_WARNING "MARK: targinfosize %u != %Zu\n", - targinfosize, - XT_ALIGN(sizeof(struct xt_mark_target_info_v1))); - return 0; - } - - if (strcmp(tablename, "mangle") != 0) { - printk(KERN_WARNING "MARK: can only be called from \"mangle\" table, not \"%s\"\n", tablename); - return 0; - } - if (markinfo->mode != XT_MARK_SET && markinfo->mode != XT_MARK_AND && markinfo->mode != XT_MARK_OR) { @@ -126,18 +105,18 @@ checkentry_v1(const char *tablename, markinfo->mode); return 0; } - if (markinfo->mark > 0xffffffff) { printk(KERN_WARNING "MARK: Only supports 32bit wide mark\n"); return 0; } - return 1; } static struct xt_target ipt_mark_reg_v0 = { .name = "MARK", .target = target_v0, + .targetsize = sizeof(struct xt_mark_target_info), + .table = "mangle", .checkentry = checkentry_v0, .me = THIS_MODULE, .revision = 0, @@ -146,6 +125,8 @@ static struct xt_target ipt_mark_reg_v0 static struct xt_target ipt_mark_reg_v1 = { .name = "MARK", .target = target_v1, + .targetsize = sizeof(struct xt_mark_target_info_v1), + .table = "mangle", .checkentry = checkentry_v1, .me = THIS_MODULE, .revision = 1, @@ -154,6 +135,8 @@ static struct xt_target ipt_mark_reg_v1 static struct xt_target ip6t_mark_reg_v0 = { .name = "MARK", .target = target_v0, + .targetsize = sizeof(struct xt_mark_target_info), + .table = "mangle", .checkentry = checkentry_v0, .me = THIS_MODULE, .revision = 0, diff -puN net/netfilter/xt_NFQUEUE.c~git-net net/netfilter/xt_NFQUEUE.c --- devel/net/netfilter/xt_NFQUEUE.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_NFQUEUE.c 2006-02-27 22:37:47.000000000 -0800 @@ -28,6 +28,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -36,41 +37,24 @@ target(struct sk_buff **pskb, return NF_QUEUE_NR(tinfo->queuenum); } -static int -checkentry(const char *tablename, - const void *entry, - void *targinfo, - unsigned int targinfosize, - unsigned int hook_mask) -{ - if (targinfosize != XT_ALIGN(sizeof(struct xt_NFQ_info))) { - printk(KERN_WARNING "NFQUEUE: targinfosize %u != %Zu\n", - targinfosize, - XT_ALIGN(sizeof(struct xt_NFQ_info))); - return 0; - } - - return 1; -} - static struct xt_target ipt_NFQ_reg = { .name = "NFQUEUE", .target = target, - .checkentry = checkentry, + .targetsize = sizeof(struct xt_NFQ_info), .me = THIS_MODULE, }; static struct xt_target ip6t_NFQ_reg = { .name = "NFQUEUE", .target = target, - .checkentry = checkentry, + .targetsize = sizeof(struct xt_NFQ_info), .me = THIS_MODULE, }; static struct xt_target arpt_NFQ_reg = { .name = "NFQUEUE", .target = target, - .checkentry = checkentry, + .targetsize = sizeof(struct xt_NFQ_info), .me = THIS_MODULE, }; diff -puN net/netfilter/xt_NOTRACK.c~git-net net/netfilter/xt_NOTRACK.c --- devel/net/netfilter/xt_NOTRACK.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_NOTRACK.c 2006-02-27 22:37:47.000000000 -0800 @@ -15,6 +15,7 @@ target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -33,38 +34,20 @@ target(struct sk_buff **pskb, return XT_CONTINUE; } -static int -checkentry(const char *tablename, - const void *entry, - void *targinfo, - unsigned int targinfosize, - unsigned int hook_mask) -{ - if (targinfosize != 0) { - printk(KERN_WARNING "NOTRACK: targinfosize %u != 0\n", - targinfosize); - return 0; - } - - if (strcmp(tablename, "raw") != 0) { - printk(KERN_WARNING "NOTRACK: can only be called from \"raw\" table, not \"%s\"\n", tablename); - return 0; - } - - return 1; -} - -static struct xt_target notrack_reg = { - .name = "NOTRACK", - .target = target, - .checkentry = checkentry, - .me = THIS_MODULE, +static struct xt_target notrack_reg = { + .name = "NOTRACK", + .target = target, + .targetsize = 0, + .table = "raw", + .me = THIS_MODULE, }; -static struct xt_target notrack6_reg = { - .name = "NOTRACK", - .target = target, - .checkentry = checkentry, - .me = THIS_MODULE, + +static struct xt_target notrack6_reg = { + .name = "NOTRACK", + .target = target, + .targetsize = 0, + .table = "raw", + .me = THIS_MODULE, }; static int __init init(void) diff -puN net/netfilter/xt_physdev.c~git-net net/netfilter/xt_physdev.c --- devel/net/netfilter/xt_physdev.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_physdev.c 2006-02-27 22:37:47.000000000 -0800 @@ -26,6 +26,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -102,14 +103,13 @@ match_outdev: static int checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { const struct xt_physdev_info *info = matchinfo; - if (matchsize != XT_ALIGN(sizeof(struct xt_physdev_info))) - return 0; if (!(info->bitmask & XT_PHYSDEV_OP_MASK) || info->bitmask & ~XT_PHYSDEV_OP_MASK) return 0; @@ -118,15 +118,17 @@ checkentry(const char *tablename, static struct xt_match physdev_match = { .name = "physdev", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_physdev_info), + .checkentry = checkentry, .me = THIS_MODULE, }; static struct xt_match physdev6_match = { .name = "physdev", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_physdev_info), + .checkentry = checkentry, .me = THIS_MODULE, }; diff -puN net/netfilter/xt_pkttype.c~git-net net/netfilter/xt_pkttype.c --- devel/net/netfilter/xt_pkttype.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_pkttype.c 2006-02-27 22:37:47.000000000 -0800 @@ -22,6 +22,7 @@ MODULE_ALIAS("ip6t_pkttype"); static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -32,32 +33,20 @@ static int match(const struct sk_buff *s return (skb->pkt_type == info->pkttype) ^ info->invert; } -static int checkentry(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != XT_ALIGN(sizeof(struct xt_pkttype_info))) - return 0; - - return 1; -} - static struct xt_match pkttype_match = { .name = "pkttype", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_pkttype_info), .me = THIS_MODULE, }; + static struct xt_match pkttype6_match = { .name = "pkttype", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_pkttype_info), .me = THIS_MODULE, }; - static int __init init(void) { int ret; diff -puN /dev/null net/netfilter/xt_policy.c --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/net/netfilter/xt_policy.c 2006-02-27 22:37:47.000000000 -0800 @@ -0,0 +1,209 @@ +/* IP tables module for matching IPsec policy + * + * Copyright (c) 2004,2005 Patrick McHardy, + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include +#include +#include +#include +#include +#include + +#include +#include + +MODULE_AUTHOR("Patrick McHardy "); +MODULE_DESCRIPTION("Xtables IPsec policy matching module"); +MODULE_LICENSE("GPL"); + +static inline int +xt_addr_cmp(const union xt_policy_addr *a1, const union xt_policy_addr *m, + const union xt_policy_addr *a2, unsigned short family) +{ + switch (family) { + case AF_INET: + return (a1->a4.s_addr ^ a2->a4.s_addr) & m->a4.s_addr; + case AF_INET6: + return ipv6_masked_addr_cmp(&a1->a6, &m->a6, &a2->a6); + } + return 0; +} + +static inline int +match_xfrm_state(struct xfrm_state *x, const struct xt_policy_elem *e, + unsigned short family) +{ +#define MATCH_ADDR(x,y,z) (!e->match.x || \ + (xt_addr_cmp(&e->x, &e->y, z, family) \ + ^ e->invert.x)) +#define MATCH(x,y) (!e->match.x || ((e->x == (y)) ^ e->invert.x)) + + return MATCH_ADDR(saddr, smask, (union xt_policy_addr *)&x->props.saddr) && + MATCH_ADDR(daddr, dmask, (union xt_policy_addr *)&x->id.daddr.a4) && + MATCH(proto, x->id.proto) && + MATCH(mode, x->props.mode) && + MATCH(spi, x->id.spi) && + MATCH(reqid, x->props.reqid); +} + +static int +match_policy_in(const struct sk_buff *skb, const struct xt_policy_info *info, + unsigned short family) +{ + const struct xt_policy_elem *e; + struct sec_path *sp = skb->sp; + int strict = info->flags & XT_POLICY_MATCH_STRICT; + int i, pos; + + if (sp == NULL) + return -1; + if (strict && info->len != sp->len) + return 0; + + for (i = sp->len - 1; i >= 0; i--) { + pos = strict ? i - sp->len + 1 : 0; + if (pos >= info->len) + return 0; + e = &info->pol[pos]; + + if (match_xfrm_state(sp->x[i].xvec, e, family)) { + if (!strict) + return 1; + } else if (strict) + return 0; + } + + return strict ? 1 : 0; +} + +static int +match_policy_out(const struct sk_buff *skb, const struct xt_policy_info *info, + unsigned short family) +{ + const struct xt_policy_elem *e; + struct dst_entry *dst = skb->dst; + int strict = info->flags & XT_POLICY_MATCH_STRICT; + int i, pos; + + if (dst->xfrm == NULL) + return -1; + + for (i = 0; dst && dst->xfrm; dst = dst->child, i++) { + pos = strict ? i : 0; + if (pos >= info->len) + return 0; + e = &info->pol[pos]; + + if (match_xfrm_state(dst->xfrm, e, family)) { + if (!strict) + return 1; + } else if (strict) + return 0; + } + + return strict ? i == info->len : 0; +} + +static int match(const struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + const struct xt_match *match, + const void *matchinfo, + int offset, + unsigned int protoff, + int *hotdrop) +{ + const struct xt_policy_info *info = matchinfo; + int ret; + + if (info->flags & XT_POLICY_MATCH_IN) + ret = match_policy_in(skb, info, match->family); + else + ret = match_policy_out(skb, info, match->family); + + if (ret < 0) + ret = info->flags & XT_POLICY_MATCH_NONE ? 1 : 0; + else if (info->flags & XT_POLICY_MATCH_NONE) + ret = 0; + + return ret; +} + +static int checkentry(const char *tablename, const void *ip_void, + const struct xt_match *match, + void *matchinfo, unsigned int matchsize, + unsigned int hook_mask) +{ + struct xt_policy_info *info = matchinfo; + + if (!(info->flags & (XT_POLICY_MATCH_IN|XT_POLICY_MATCH_OUT))) { + printk(KERN_ERR "xt_policy: neither incoming nor " + "outgoing policy selected\n"); + return 0; + } + /* hook values are equal for IPv4 and IPv6 */ + if (hook_mask & (1 << NF_IP_PRE_ROUTING | 1 << NF_IP_LOCAL_IN) + && info->flags & XT_POLICY_MATCH_OUT) { + printk(KERN_ERR "xt_policy: output policy not valid in " + "PRE_ROUTING and INPUT\n"); + return 0; + } + if (hook_mask & (1 << NF_IP_POST_ROUTING | 1 << NF_IP_LOCAL_OUT) + && info->flags & XT_POLICY_MATCH_IN) { + printk(KERN_ERR "xt_policy: input policy not valid in " + "POST_ROUTING and OUTPUT\n"); + return 0; + } + if (info->len > XT_POLICY_MAX_ELEM) { + printk(KERN_ERR "xt_policy: too many policy elements\n"); + return 0; + } + return 1; +} + +static struct xt_match policy_match = { + .name = "policy", + .family = AF_INET, + .match = match, + .matchsize = sizeof(struct xt_policy_info), + .checkentry = checkentry, + .me = THIS_MODULE, +}; + +static struct xt_match policy6_match = { + .name = "policy", + .family = AF_INET6, + .match = match, + .matchsize = sizeof(struct xt_policy_info), + .checkentry = checkentry, + .me = THIS_MODULE, +}; + +static int __init init(void) +{ + int ret; + + ret = xt_register_match(AF_INET, &policy_match); + if (ret) + return ret; + ret = xt_register_match(AF_INET6, &policy6_match); + if (ret) + xt_unregister_match(AF_INET, &policy_match); + return ret; +} + +static void __exit fini(void) +{ + xt_unregister_match(AF_INET6, &policy6_match); + xt_unregister_match(AF_INET, &policy_match); +} + +module_init(init); +module_exit(fini); +MODULE_ALIAS("ipt_policy"); +MODULE_ALIAS("ip6t_policy"); diff -puN net/netfilter/xt_realm.c~git-net net/netfilter/xt_realm.c --- devel/net/netfilter/xt_realm.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_realm.c 2006-02-27 22:37:47.000000000 -0800 @@ -27,6 +27,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -38,30 +39,12 @@ match(const struct sk_buff *skb, return (info->id == (dst->tclassid & info->mask)) ^ info->invert; } -static int check(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (hook_mask - & ~((1 << NF_IP_POST_ROUTING) | (1 << NF_IP_FORWARD) | - (1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_LOCAL_IN))) { - printk("xt_realm: only valid for POST_ROUTING, LOCAL_OUT, " - "LOCAL_IN or FORWARD.\n"); - return 0; - } - if (matchsize != XT_ALIGN(sizeof(struct xt_realm_info))) { - printk("xt_realm: invalid matchsize.\n"); - return 0; - } - return 1; -} - static struct xt_match realm_match = { .name = "realm", - .match = match, - .checkentry = check, + .match = match, + .matchsize = sizeof(struct xt_realm_info), + .hooks = (1 << NF_IP_POST_ROUTING) | (1 << NF_IP_FORWARD) | + (1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_LOCAL_IN), .me = THIS_MODULE }; diff -puN net/netfilter/xt_sctp.c~git-net net/netfilter/xt_sctp.c --- devel/net/netfilter/xt_sctp.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_sctp.c 2006-02-27 22:37:47.000000000 -0800 @@ -123,6 +123,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -162,19 +163,14 @@ match(const struct sk_buff *skb, static int checkentry(const char *tablename, const void *inf, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { - const struct xt_sctp_info *info; - const struct ipt_ip *ip = inf; - - info = (const struct xt_sctp_info *)matchinfo; + const struct xt_sctp_info *info = matchinfo; - return ip->proto == IPPROTO_SCTP - && !(ip->invflags & XT_INV_PROTO) - && matchsize == XT_ALIGN(sizeof(struct xt_sctp_info)) - && !(info->flags & ~XT_SCTP_VALID_FLAGS) + return !(info->flags & ~XT_SCTP_VALID_FLAGS) && !(info->invflags & ~XT_SCTP_VALID_FLAGS) && !(info->invflags & ~info->flags) && ((!(info->flags & XT_SCTP_CHUNK_TYPES)) || @@ -184,47 +180,23 @@ checkentry(const char *tablename, | SCTP_CHUNK_MATCH_ONLY))); } -static int -checkentry6(const char *tablename, - const void *inf, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - const struct xt_sctp_info *info; - const struct ip6t_ip6 *ip = inf; - - info = (const struct xt_sctp_info *)matchinfo; - - return ip->proto == IPPROTO_SCTP - && !(ip->invflags & XT_INV_PROTO) - && matchsize == XT_ALIGN(sizeof(struct xt_sctp_info)) - && !(info->flags & ~XT_SCTP_VALID_FLAGS) - && !(info->invflags & ~XT_SCTP_VALID_FLAGS) - && !(info->invflags & ~info->flags) - && ((!(info->flags & XT_SCTP_CHUNK_TYPES)) || - (info->chunk_match_type & - (SCTP_CHUNK_MATCH_ALL - | SCTP_CHUNK_MATCH_ANY - | SCTP_CHUNK_MATCH_ONLY))); -} - - -static struct xt_match sctp_match = -{ - .name = "sctp", - .match = &match, - .checkentry = &checkentry, - .me = THIS_MODULE -}; -static struct xt_match sctp6_match = -{ - .name = "sctp", - .match = &match, - .checkentry = &checkentry6, - .me = THIS_MODULE +static struct xt_match sctp_match = { + .name = "sctp", + .match = match, + .matchsize = sizeof(struct xt_sctp_info), + .proto = IPPROTO_SCTP, + .checkentry = checkentry, + .me = THIS_MODULE }; +static struct xt_match sctp6_match = { + .name = "sctp", + .match = match, + .matchsize = sizeof(struct xt_sctp_info), + .proto = IPPROTO_SCTP, + .checkentry = checkentry, + .me = THIS_MODULE +}; static int __init init(void) { diff -puN net/netfilter/xt_state.c~git-net net/netfilter/xt_state.c --- devel/net/netfilter/xt_state.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_state.c 2006-02-27 22:37:47.000000000 -0800 @@ -24,6 +24,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -43,29 +44,17 @@ match(const struct sk_buff *skb, return (sinfo->statemask & statebit); } -static int check(const char *tablename, - const void *ip, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - if (matchsize != XT_ALIGN(sizeof(struct xt_state_info))) - return 0; - - return 1; -} - static struct xt_match state_match = { .name = "state", - .match = &match, - .checkentry = &check, + .match = match, + .matchsize = sizeof(struct xt_state_info), .me = THIS_MODULE, }; static struct xt_match state6_match = { .name = "state", - .match = &match, - .checkentry = &check, + .match = match, + .matchsize = sizeof(struct xt_state_info), .me = THIS_MODULE, }; diff -puN net/netfilter/xt_string.c~git-net net/netfilter/xt_string.c --- devel/net/netfilter/xt_string.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_string.c 2006-02-27 22:37:47.000000000 -0800 @@ -24,6 +24,7 @@ MODULE_ALIAS("ip6t_string"); static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -43,6 +44,7 @@ static int match(const struct sk_buff *s static int checkentry(const char *tablename, const void *ip, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) @@ -50,9 +52,6 @@ static int checkentry(const char *tablen struct xt_string_info *conf = matchinfo; struct ts_config *ts_conf; - if (matchsize != XT_ALIGN(sizeof(struct xt_string_info))) - return 0; - /* Damn, can't handle this case properly with iptables... */ if (conf->from_offset > conf->to_offset) return 0; @@ -67,7 +66,8 @@ static int checkentry(const char *tablen return 1; } -static void destroy(void *matchinfo, unsigned int matchsize) +static void destroy(const struct xt_match *match, void *matchinfo, + unsigned int matchsize) { textsearch_destroy(STRING_TEXT_PRIV(matchinfo)->config); } @@ -75,6 +75,7 @@ static void destroy(void *matchinfo, uns static struct xt_match string_match = { .name = "string", .match = match, + .matchsize = sizeof(struct xt_string_info), .checkentry = checkentry, .destroy = destroy, .me = THIS_MODULE @@ -82,6 +83,7 @@ static struct xt_match string_match = { static struct xt_match string6_match = { .name = "string", .match = match, + .matchsize = sizeof(struct xt_string_info), .checkentry = checkentry, .destroy = destroy, .me = THIS_MODULE diff -puN net/netfilter/xt_tcpmss.c~git-net net/netfilter/xt_tcpmss.c --- devel/net/netfilter/xt_tcpmss.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_tcpmss.c 2006-02-27 22:37:47.000000000 -0800 @@ -81,6 +81,7 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -92,58 +93,19 @@ match(const struct sk_buff *skb, info->invert, hotdrop); } -static int -checkentry(const char *tablename, - const void *ipinfo, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - const struct ipt_ip *ip = ipinfo; - if (matchsize != XT_ALIGN(sizeof(struct xt_tcpmss_match_info))) - return 0; - - /* Must specify -p tcp */ - if (ip->proto != IPPROTO_TCP || (ip->invflags & IPT_INV_PROTO)) { - printk("tcpmss: Only works on TCP packets\n"); - return 0; - } - - return 1; -} - -static int -checkentry6(const char *tablename, - const void *ipinfo, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - const struct ip6t_ip6 *ip = ipinfo; - - if (matchsize != XT_ALIGN(sizeof(struct xt_tcpmss_match_info))) - return 0; - - /* Must specify -p tcp */ - if (ip->proto != IPPROTO_TCP || (ip->invflags & XT_INV_PROTO)) { - printk("tcpmss: Only works on TCP packets\n"); - return 0; - } - - return 1; -} - static struct xt_match tcpmss_match = { .name = "tcpmss", - .match = &match, - .checkentry = &checkentry, + .match = match, + .matchsize = sizeof(struct xt_tcpmss_match_info), + .proto = IPPROTO_TCP, .me = THIS_MODULE, }; static struct xt_match tcpmss6_match = { .name = "tcpmss", - .match = &match, - .checkentry = &checkentry6, + .match = match, + .matchsize = sizeof(struct xt_tcpmss_match_info), + .proto = IPPROTO_TCP, .me = THIS_MODULE, }; diff -puN net/netfilter/xt_tcpudp.c~git-net net/netfilter/xt_tcpudp.c --- devel/net/netfilter/xt_tcpudp.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netfilter/xt_tcpudp.c 2006-02-27 22:37:47.000000000 -0800 @@ -74,6 +74,7 @@ static int tcp_match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -138,43 +139,22 @@ tcp_match(const struct sk_buff *skb, static int tcp_checkentry(const char *tablename, const void *info, + const struct xt_match *match, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { - const struct ipt_ip *ip = info; const struct xt_tcp *tcpinfo = matchinfo; - /* Must specify proto == TCP, and no unknown invflags */ - return ip->proto == IPPROTO_TCP - && !(ip->invflags & XT_INV_PROTO) - && matchsize == XT_ALIGN(sizeof(struct xt_tcp)) - && !(tcpinfo->invflags & ~XT_TCP_INV_MASK); + /* Must specify no unknown invflags */ + return !(tcpinfo->invflags & ~XT_TCP_INV_MASK); } -/* Called when user tries to insert an entry of this type. */ -static int -tcp6_checkentry(const char *tablename, - const void *entry, - void *matchinfo, - unsigned int matchsize, - unsigned int hook_mask) -{ - const struct ip6t_ip6 *ipv6 = entry; - const struct xt_tcp *tcpinfo = matchinfo; - - /* Must specify proto == TCP, and no unknown invflags */ - return ipv6->proto == IPPROTO_TCP - && !(ipv6->invflags & XT_INV_PROTO) - && matchsize == XT_ALIGN(sizeof(struct xt_tcp)) - && !(tcpinfo->invflags & ~XT_TCP_INV_MASK); -} - - static int udp_match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, + const struct xt_match *match, const void *matchinfo, int offset, unsigned int protoff, @@ -208,87 +188,49 @@ udp_match(const struct sk_buff *skb, static int udp_checkentry(const char *tablename, const void *info, + const struct xt_match *match, void *matchinfo, - unsigned int matchinfosize, - unsigned int hook_mask) -{ - const struct ipt_ip *ip = info; - const struct xt_udp *udpinfo = matchinfo; - - /* Must specify proto == UDP, and no unknown invflags */ - if (ip->proto != IPPROTO_UDP || (ip->invflags & XT_INV_PROTO)) { - duprintf("ipt_udp: Protocol %u != %u\n", ip->proto, - IPPROTO_UDP); - return 0; - } - if (matchinfosize != XT_ALIGN(sizeof(struct xt_udp))) { - duprintf("ipt_udp: matchsize %u != %u\n", - matchinfosize, XT_ALIGN(sizeof(struct xt_udp))); - return 0; - } - if (udpinfo->invflags & ~XT_UDP_INV_MASK) { - duprintf("ipt_udp: unknown flags %X\n", - udpinfo->invflags); - return 0; - } - - return 1; -} - -/* Called when user tries to insert an entry of this type. */ -static int -udp6_checkentry(const char *tablename, - const void *entry, - void *matchinfo, - unsigned int matchinfosize, + unsigned int matchsize, unsigned int hook_mask) { - const struct ip6t_ip6 *ipv6 = entry; - const struct xt_udp *udpinfo = matchinfo; + const struct xt_tcp *udpinfo = matchinfo; - /* Must specify proto == UDP, and no unknown invflags */ - if (ipv6->proto != IPPROTO_UDP || (ipv6->invflags & XT_INV_PROTO)) { - duprintf("ip6t_udp: Protocol %u != %u\n", ipv6->proto, - IPPROTO_UDP); - return 0; - } - if (matchinfosize != XT_ALIGN(sizeof(struct xt_udp))) { - duprintf("ip6t_udp: matchsize %u != %u\n", - matchinfosize, XT_ALIGN(sizeof(struct xt_udp))); - return 0; - } - if (udpinfo->invflags & ~XT_UDP_INV_MASK) { - duprintf("ip6t_udp: unknown flags %X\n", - udpinfo->invflags); - return 0; - } - - return 1; + /* Must specify no unknown invflags */ + return !(udpinfo->invflags & ~XT_UDP_INV_MASK); } static struct xt_match tcp_matchstruct = { .name = "tcp", - .match = &tcp_match, - .checkentry = &tcp_checkentry, + .match = tcp_match, + .matchsize = sizeof(struct xt_tcp), + .proto = IPPROTO_TCP, + .checkentry = tcp_checkentry, .me = THIS_MODULE, }; + static struct xt_match tcp6_matchstruct = { .name = "tcp", - .match = &tcp_match, - .checkentry = &tcp6_checkentry, + .match = tcp_match, + .matchsize = sizeof(struct xt_tcp), + .proto = IPPROTO_TCP, + .checkentry = tcp_checkentry, .me = THIS_MODULE, }; static struct xt_match udp_matchstruct = { .name = "udp", - .match = &udp_match, - .checkentry = &udp_checkentry, + .match = udp_match, + .matchsize = sizeof(struct xt_udp), + .proto = IPPROTO_UDP, + .checkentry = udp_checkentry, .me = THIS_MODULE, }; static struct xt_match udp6_matchstruct = { .name = "udp", - .match = &udp_match, - .checkentry = &udp6_checkentry, + .match = udp_match, + .matchsize = sizeof(struct xt_udp), + .proto = IPPROTO_UDP, + .checkentry = udp_checkentry, .me = THIS_MODULE, }; diff -puN net/netlink/af_netlink.c~git-net net/netlink/af_netlink.c --- devel/net/netlink/af_netlink.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/netlink/af_netlink.c 2006-02-27 22:37:47.000000000 -0800 @@ -106,6 +106,7 @@ struct nl_pid_hash { struct netlink_table { struct nl_pid_hash hash; struct hlist_head mc_list; + unsigned long *listeners; unsigned int nl_nonroot; unsigned int groups; struct module *module; @@ -296,6 +297,24 @@ static inline int nl_pid_hash_dilute(str static const struct proto_ops netlink_ops; +static void +netlink_update_listeners(struct sock *sk) +{ + struct netlink_table *tbl = &nl_table[sk->sk_protocol]; + struct hlist_node *node; + unsigned long mask; + unsigned int i; + + for (i = 0; i < NLGRPSZ(tbl->groups)/sizeof(unsigned long); i++) { + mask = 0; + sk_for_each_bound(sk, node, &tbl->mc_list) + mask |= nlk_sk(sk)->groups[i]; + tbl->listeners[i] = mask; + } + /* this function is only called with the netlink table "grabbed", which + * makes sure updates are visible before bind or setsockopt return. */ +} + static int netlink_insert(struct sock *sk, u32 pid) { struct nl_pid_hash *hash = &nl_table[sk->sk_protocol].hash; @@ -456,12 +475,14 @@ static int netlink_release(struct socket if (nlk->module) module_put(nlk->module); + netlink_table_grab(); if (nlk->flags & NETLINK_KERNEL_SOCKET) { - netlink_table_grab(); + kfree(nl_table[sk->sk_protocol].listeners); nl_table[sk->sk_protocol].module = NULL; nl_table[sk->sk_protocol].registered = 0; - netlink_table_ungrab(); - } + } else if (nlk->subscriptions) + netlink_update_listeners(sk); + netlink_table_ungrab(); kfree(nlk->groups); nlk->groups = NULL; @@ -589,6 +610,7 @@ static int netlink_bind(struct socket *s hweight32(nladdr->nl_groups) - hweight32(nlk->groups[0])); nlk->groups[0] = (nlk->groups[0] & ~0xffffffffUL) | nladdr->nl_groups; + netlink_update_listeners(sk); netlink_table_ungrab(); return 0; @@ -807,6 +829,17 @@ retry: return netlink_sendskb(sk, skb, ssk->sk_protocol); } +int netlink_has_listeners(struct sock *sk, unsigned int group) +{ + int res = 0; + + BUG_ON(!(nlk_sk(sk)->flags & NETLINK_KERNEL_SOCKET)); + if (group - 1 < nl_table[sk->sk_protocol].groups) + res = test_bit(group - 1, nl_table[sk->sk_protocol].listeners); + return res; +} +EXPORT_SYMBOL_GPL(netlink_has_listeners); + static __inline__ int netlink_broadcast_deliver(struct sock *sk, struct sk_buff *skb) { struct netlink_sock *nlk = nlk_sk(sk); @@ -1011,6 +1044,7 @@ static int netlink_setsockopt(struct soc else __clear_bit(val - 1, nlk->groups); netlink_update_subscriptions(sk, subscriptions); + netlink_update_listeners(sk); netlink_table_ungrab(); err = 0; break; @@ -1236,6 +1270,7 @@ netlink_kernel_create(int unit, unsigned struct socket *sock; struct sock *sk; struct netlink_sock *nlk; + unsigned long *listeners = NULL; if (!nl_table) return NULL; @@ -1249,6 +1284,13 @@ netlink_kernel_create(int unit, unsigned if (__netlink_create(sock, unit) < 0) goto out_sock_release; + if (groups < 32) + groups = 32; + + listeners = kzalloc(NLGRPSZ(groups), GFP_KERNEL); + if (!listeners) + goto out_sock_release; + sk = sock->sk; sk->sk_data_ready = netlink_data_ready; if (input) @@ -1261,7 +1303,8 @@ netlink_kernel_create(int unit, unsigned nlk->flags |= NETLINK_KERNEL_SOCKET; netlink_table_grab(); - nl_table[unit].groups = groups < 32 ? 32 : groups; + nl_table[unit].groups = groups; + nl_table[unit].listeners = listeners; nl_table[unit].module = module; nl_table[unit].registered = 1; netlink_table_ungrab(); @@ -1269,6 +1312,7 @@ netlink_kernel_create(int unit, unsigned return sk; out_sock_release: + kfree(listeners); sock_release(sock); return NULL; } diff -puN net/sched/act_ipt.c~git-net net/sched/act_ipt.c --- devel/net/sched/act_ipt.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/sched/act_ipt.c 2006-02-27 22:37:47.000000000 -0800 @@ -70,7 +70,8 @@ ipt_init_target(struct ipt_entry_target t->u.kernel.target = target; if (t->u.kernel.target->checkentry - && !t->u.kernel.target->checkentry(table, NULL, t->data, + && !t->u.kernel.target->checkentry(table, NULL, + t->u.kernel.target, t->data, t->u.target_size - sizeof(*t), hook)) { DPRINTK("ipt_init_target: check failed for `%s'.\n", @@ -86,7 +87,7 @@ static void ipt_destroy_target(struct ipt_entry_target *t) { if (t->u.kernel.target->destroy) - t->u.kernel.target->destroy(t->data, + t->u.kernel.target->destroy(t->u.kernel.target, t->data, t->u.target_size - sizeof(*t)); module_put(t->u.kernel.target->me); } @@ -224,8 +225,9 @@ tcf_ipt(struct sk_buff *skb, struct tc_a /* iptables targets take a double skb pointer in case the skb * needs to be replaced. We don't own the skb, so this must not * happen. The pskb_expand_head above should make sure of this */ - ret = p->t->u.kernel.target->target(&skb, skb->dev, NULL, - p->hook, p->t->data, NULL); + ret = p->t->u.kernel.target->target(&skb, skb->dev, NULL, p->hook, + p->t->u.kernel.target, p->t->data, + NULL); switch (ret) { case NF_ACCEPT: result = TC_ACT_OK; diff -puN net/sched/sch_atm.c~git-net net/sched/sch_atm.c --- devel/net/sched/sch_atm.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/sched/sch_atm.c 2006-02-27 22:37:47.000000000 -0800 @@ -638,6 +638,7 @@ static int atm_tc_dump_class(struct Qdis sch,p,flow,skb,tcm); if (!find_flow(p,flow)) return -EINVAL; tcm->tcm_handle = flow->classid; + tcm->tcm_info = flow->q->handle; rta = (struct rtattr *) b; RTA_PUT(skb,TCA_OPTIONS,0,NULL); RTA_PUT(skb,TCA_ATM_HDR,flow->hdr_len,flow->hdr); diff -puN net/sched/sch_dsmark.c~git-net net/sched/sch_dsmark.c --- devel/net/sched/sch_dsmark.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/sched/sch_dsmark.c 2006-02-27 22:37:47.000000000 -0800 @@ -438,6 +438,7 @@ static int dsmark_dump_class(struct Qdis return -EINVAL; tcm->tcm_handle = TC_H_MAKE(TC_H_MAJ(sch->handle), cl-1); + tcm->tcm_info = p->q->handle; opts = RTA_NEST(skb, TCA_OPTIONS); RTA_PUT_U8(skb,TCA_DSMARK_MASK, p->mask[cl-1]); diff -puN net/sched/sch_netem.c~git-net net/sched/sch_netem.c --- devel/net/sched/sch_netem.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/sched/sch_netem.c 2006-02-27 22:37:47.000000000 -0800 @@ -252,9 +252,9 @@ static int netem_requeue(struct sk_buff static unsigned int netem_drop(struct Qdisc* sch) { struct netem_sched_data *q = qdisc_priv(sch); - unsigned int len; + unsigned int len = 0; - if ((len = q->qdisc->ops->drop(q->qdisc)) != 0) { + if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) { sch->q.qlen--; sch->qstats.drops++; } diff -puN net/sched/sch_prio.c~git-net net/sched/sch_prio.c --- devel/net/sched/sch_prio.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/sched/sch_prio.c 2006-02-27 22:37:47.000000000 -0800 @@ -165,7 +165,7 @@ static unsigned int prio_drop(struct Qdi for (prio = q->bands-1; prio >= 0; prio--) { qdisc = q->queues[prio]; - if ((len = qdisc->ops->drop(qdisc)) != 0) { + if (qdisc->ops->drop && (len = qdisc->ops->drop(qdisc)) != 0) { sch->q.qlen--; return len; } diff -puN net/sched/sch_red.c~git-net net/sched/sch_red.c --- devel/net/sched/sch_red.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/sched/sch_red.c 2006-02-27 22:37:47.000000000 -0800 @@ -44,6 +44,7 @@ struct red_sched_data unsigned char flags; struct red_parms parms; struct red_stats stats; + struct Qdisc *qdisc; }; static inline int red_use_ecn(struct red_sched_data *q) @@ -59,8 +60,10 @@ static inline int red_use_harddrop(struc static int red_enqueue(struct sk_buff *skb, struct Qdisc* sch) { struct red_sched_data *q = qdisc_priv(sch); + struct Qdisc *child = q->qdisc; + int ret; - q->parms.qavg = red_calc_qavg(&q->parms, sch->qstats.backlog); + q->parms.qavg = red_calc_qavg(&q->parms, child->qstats.backlog); if (red_is_idling(&q->parms)) red_end_of_idle_period(&q->parms); @@ -91,11 +94,16 @@ static int red_enqueue(struct sk_buff *s break; } - if (sch->qstats.backlog + skb->len <= q->limit) - return qdisc_enqueue_tail(skb, sch); - - q->stats.pdrop++; - return qdisc_drop(skb, sch); + ret = child->enqueue(skb, child); + if (likely(ret == NET_XMIT_SUCCESS)) { + sch->bstats.bytes += skb->len; + sch->bstats.packets++; + sch->q.qlen++; + } else { + q->stats.pdrop++; + sch->qstats.drops++; + } + return ret; congestion_drop: qdisc_drop(skb, sch); @@ -105,21 +113,30 @@ congestion_drop: static int red_requeue(struct sk_buff *skb, struct Qdisc* sch) { struct red_sched_data *q = qdisc_priv(sch); + struct Qdisc *child = q->qdisc; + int ret; if (red_is_idling(&q->parms)) red_end_of_idle_period(&q->parms); - return qdisc_requeue(skb, sch); + ret = child->ops->requeue(skb, child); + if (likely(ret == NET_XMIT_SUCCESS)) { + sch->qstats.requeues++; + sch->q.qlen++; + } + return ret; } static struct sk_buff * red_dequeue(struct Qdisc* sch) { struct sk_buff *skb; struct red_sched_data *q = qdisc_priv(sch); + struct Qdisc *child = q->qdisc; - skb = qdisc_dequeue_head(sch); - - if (skb == NULL && !red_is_idling(&q->parms)) + skb = child->dequeue(child); + if (skb) + sch->q.qlen--; + else if (!red_is_idling(&q->parms)) red_start_of_idle_period(&q->parms); return skb; @@ -127,14 +144,14 @@ static struct sk_buff * red_dequeue(stru static unsigned int red_drop(struct Qdisc* sch) { - struct sk_buff *skb; struct red_sched_data *q = qdisc_priv(sch); + struct Qdisc *child = q->qdisc; + unsigned int len; - skb = qdisc_dequeue_tail(sch); - if (skb) { - unsigned int len = skb->len; + if (child->ops->drop && (len = child->ops->drop(child)) > 0) { q->stats.other++; - qdisc_drop(skb, sch); + sch->qstats.drops++; + sch->q.qlen--; return len; } @@ -148,15 +165,48 @@ static void red_reset(struct Qdisc* sch) { struct red_sched_data *q = qdisc_priv(sch); - qdisc_reset_queue(sch); + qdisc_reset(q->qdisc); + sch->q.qlen = 0; red_restart(&q->parms); } +static void red_destroy(struct Qdisc *sch) +{ + struct red_sched_data *q = qdisc_priv(sch); + qdisc_destroy(q->qdisc); +} + +static struct Qdisc *red_create_dflt(struct net_device *dev, u32 limit) +{ + struct Qdisc *q = qdisc_create_dflt(dev, &bfifo_qdisc_ops); + struct rtattr *rta; + int ret; + + if (q) { + rta = kmalloc(RTA_LENGTH(sizeof(struct tc_fifo_qopt)), + GFP_KERNEL); + if (rta) { + rta->rta_type = RTM_NEWQDISC; + rta->rta_len = RTA_LENGTH(sizeof(struct tc_fifo_qopt)); + ((struct tc_fifo_qopt *)RTA_DATA(rta))->limit = limit; + + ret = q->ops->change(q, rta); + kfree(rta); + + if (ret == 0) + return q; + } + qdisc_destroy(q); + } + return NULL; +} + static int red_change(struct Qdisc *sch, struct rtattr *opt) { struct red_sched_data *q = qdisc_priv(sch); struct rtattr *tb[TCA_RED_MAX]; struct tc_red_qopt *ctl; + struct Qdisc *child = NULL; if (opt == NULL || rtattr_parse_nested(tb, TCA_RED_MAX, opt)) return -EINVAL; @@ -169,9 +219,17 @@ static int red_change(struct Qdisc *sch, ctl = RTA_DATA(tb[TCA_RED_PARMS-1]); + if (ctl->limit > 0) { + child = red_create_dflt(sch->dev, ctl->limit); + if (child == NULL) + return -ENOMEM; + } + sch_tree_lock(sch); q->flags = ctl->flags; q->limit = ctl->limit; + if (child) + qdisc_destroy(xchg(&q->qdisc, child)); red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog, ctl->Plog, ctl->Scell_log, @@ -186,6 +244,9 @@ static int red_change(struct Qdisc *sch, static int red_init(struct Qdisc* sch, struct rtattr *opt) { + struct red_sched_data *q = qdisc_priv(sch); + + q->qdisc = &noop_qdisc; return red_change(sch, opt); } @@ -224,15 +285,101 @@ static int red_dump_stats(struct Qdisc * return gnet_stats_copy_app(d, &st, sizeof(st)); } +static int red_dump_class(struct Qdisc *sch, unsigned long cl, + struct sk_buff *skb, struct tcmsg *tcm) +{ + struct red_sched_data *q = qdisc_priv(sch); + + if (cl != 1) + return -ENOENT; + tcm->tcm_handle |= TC_H_MIN(1); + tcm->tcm_info = q->qdisc->handle; + return 0; +} + +static int red_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new, + struct Qdisc **old) +{ + struct red_sched_data *q = qdisc_priv(sch); + + if (new == NULL) + new = &noop_qdisc; + + sch_tree_lock(sch); + *old = xchg(&q->qdisc, new); + qdisc_reset(*old); + sch->q.qlen = 0; + sch_tree_unlock(sch); + return 0; +} + +static struct Qdisc *red_leaf(struct Qdisc *sch, unsigned long arg) +{ + struct red_sched_data *q = qdisc_priv(sch); + return q->qdisc; +} + +static unsigned long red_get(struct Qdisc *sch, u32 classid) +{ + return 1; +} + +static void red_put(struct Qdisc *sch, unsigned long arg) +{ + return; +} + +static int red_change_class(struct Qdisc *sch, u32 classid, u32 parentid, + struct rtattr **tca, unsigned long *arg) +{ + return -ENOSYS; +} + +static int red_delete(struct Qdisc *sch, unsigned long cl) +{ + return -ENOSYS; +} + +static void red_walk(struct Qdisc *sch, struct qdisc_walker *walker) +{ + if (!walker->stop) { + if (walker->count >= walker->skip) + if (walker->fn(sch, 1, walker) < 0) { + walker->stop = 1; + return; + } + walker->count++; + } +} + +static struct tcf_proto **red_find_tcf(struct Qdisc *sch, unsigned long cl) +{ + return NULL; +} + +static struct Qdisc_class_ops red_class_ops = { + .graft = red_graft, + .leaf = red_leaf, + .get = red_get, + .put = red_put, + .change = red_change_class, + .delete = red_delete, + .walk = red_walk, + .tcf_chain = red_find_tcf, + .dump = red_dump_class, +}; + static struct Qdisc_ops red_qdisc_ops = { .id = "red", .priv_size = sizeof(struct red_sched_data), + .cl_ops = &red_class_ops, .enqueue = red_enqueue, .dequeue = red_dequeue, .requeue = red_requeue, .drop = red_drop, .init = red_init, .reset = red_reset, + .destroy = red_destroy, .change = red_change, .dump = red_dump, .dump_stats = red_dump_stats, diff -puN net/sched/sch_sfq.c~git-net net/sched/sch_sfq.c --- devel/net/sched/sch_sfq.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/sched/sch_sfq.c 2006-02-27 22:37:47.000000000 -0800 @@ -232,6 +232,7 @@ static unsigned int sfq_drop(struct Qdis sfq_dec(q, x); sch->q.qlen--; sch->qstats.drops++; + sch->qstats.backlog -= len; return len; } @@ -248,6 +249,7 @@ static unsigned int sfq_drop(struct Qdis sch->q.qlen--; q->ht[q->hash[d]] = SFQ_DEPTH; sch->qstats.drops++; + sch->qstats.backlog -= len; return len; } @@ -266,6 +268,7 @@ sfq_enqueue(struct sk_buff *skb, struct q->ht[hash] = x = q->dep[SFQ_DEPTH].next; q->hash[x] = hash; } + sch->qstats.backlog += skb->len; __skb_queue_tail(&q->qs[x], skb); sfq_inc(q, x); if (q->qs[x].qlen == 1) { /* The flow is new */ @@ -301,6 +304,7 @@ sfq_requeue(struct sk_buff *skb, struct q->ht[hash] = x = q->dep[SFQ_DEPTH].next; q->hash[x] = hash; } + sch->qstats.backlog += skb->len; __skb_queue_head(&q->qs[x], skb); sfq_inc(q, x); if (q->qs[x].qlen == 1) { /* The flow is new */ @@ -344,6 +348,7 @@ sfq_dequeue(struct Qdisc* sch) skb = __skb_dequeue(&q->qs[a]); sfq_dec(q, a); sch->q.qlen--; + sch->qstats.backlog -= skb->len; /* Is the slot empty? */ if (q->qs[a].qlen == 0) { diff -puN net/sched/sch_tbf.c~git-net net/sched/sch_tbf.c --- devel/net/sched/sch_tbf.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/sched/sch_tbf.c 2006-02-27 22:37:47.000000000 -0800 @@ -177,9 +177,9 @@ static int tbf_requeue(struct sk_buff *s static unsigned int tbf_drop(struct Qdisc* sch) { struct tbf_sched_data *q = qdisc_priv(sch); - unsigned int len; + unsigned int len = 0; - if ((len = q->qdisc->ops->drop(q->qdisc)) != 0) { + if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) { sch->q.qlen--; sch->qstats.drops++; } @@ -341,13 +341,14 @@ static int tbf_change(struct Qdisc* sch, if (max_size < 0) goto done; - if (q->qdisc == &noop_qdisc) { + if (qopt->limit > 0) { if ((child = tbf_create_dflt_qdisc(sch->dev, qopt->limit)) == NULL) goto done; } sch_tree_lock(sch); - if (child) q->qdisc = child; + if (child) + qdisc_destroy(xchg(&q->qdisc, child)); q->limit = qopt->limit; q->mtu = qopt->mtu; q->max_size = max_size; diff -puN net/socket.c~git-net net/socket.c --- devel/net/socket.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/socket.c 2006-02-27 22:37:47.000000000 -0800 @@ -348,8 +348,8 @@ static struct dentry_operations sockfs_d /* * Obtains the first available file descriptor and sets it up for use. * - * This function creates file structure and maps it to fd space - * of current process. On success it returns file descriptor + * These functions create file structures and maps them to fd space + * of the current process. On success it returns file descriptor * and file struct implicitly stored in sock->file. * Note that another thread may close file descriptor before we return * from this function. We use the fact that now we do not refer @@ -362,52 +362,67 @@ static struct dentry_operations sockfs_d * but we take care of internal coherence yet. */ -int sock_map_fd(struct socket *sock) +static int sock_alloc_fd(struct file **filep) { int fd; - struct qstr this; - char name[32]; - - /* - * Find a file descriptor suitable for return to the user. - */ fd = get_unused_fd(); - if (fd >= 0) { + if (likely(fd >= 0)) { struct file *file = get_empty_filp(); - if (!file) { + *filep = file; + if (unlikely(!file)) { put_unused_fd(fd); - fd = -ENFILE; - goto out; + return -ENFILE; } + } else + *filep = NULL; + return fd; +} - this.len = sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino); - this.name = name; - this.hash = SOCK_INODE(sock)->i_ino; - - file->f_dentry = d_alloc(sock_mnt->mnt_sb->s_root, &this); - if (!file->f_dentry) { - put_filp(file); +static int sock_attach_fd(struct socket *sock, struct file *file) +{ + struct qstr this; + char name[32]; + + this.len = sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino); + this.name = name; + this.hash = SOCK_INODE(sock)->i_ino; + + file->f_dentry = d_alloc(sock_mnt->mnt_sb->s_root, &this); + if (unlikely(!file->f_dentry)) + return -ENOMEM; + + file->f_dentry->d_op = &sockfs_dentry_operations; + d_add(file->f_dentry, SOCK_INODE(sock)); + file->f_vfsmnt = mntget(sock_mnt); + file->f_mapping = file->f_dentry->d_inode->i_mapping; + + sock->file = file; + file->f_op = SOCK_INODE(sock)->i_fop = &socket_file_ops; + file->f_mode = FMODE_READ | FMODE_WRITE; + file->f_flags = O_RDWR; + file->f_pos = 0; + file->private_data = sock; + + return 0; +} + +int sock_map_fd(struct socket *sock) +{ + struct file *newfile; + int fd = sock_alloc_fd(&newfile); + + if (likely(fd >= 0)) { + int err = sock_attach_fd(sock, newfile); + + if (unlikely(err < 0)) { + put_filp(newfile); put_unused_fd(fd); - fd = -ENOMEM; - goto out; + return err; } - file->f_dentry->d_op = &sockfs_dentry_operations; - d_add(file->f_dentry, SOCK_INODE(sock)); - file->f_vfsmnt = mntget(sock_mnt); - file->f_mapping = file->f_dentry->d_inode->i_mapping; - - sock->file = file; - file->f_op = SOCK_INODE(sock)->i_fop = &socket_file_ops; - file->f_mode = FMODE_READ | FMODE_WRITE; - file->f_flags = O_RDWR; - file->f_pos = 0; - file->private_data = sock; - fd_install(fd, file); + fd_install(fd, newfile); } - -out: return fd; } @@ -1349,7 +1364,8 @@ asmlinkage long sys_listen(int fd, int b asmlinkage long sys_accept(int fd, struct sockaddr __user *upeer_sockaddr, int __user *upeer_addrlen) { struct socket *sock, *newsock; - int err, len; + struct file *newfile; + int err, len, newfd; char address[MAX_SOCK_ADDR]; sock = sockfd_lookup(fd, &err); @@ -1369,28 +1385,38 @@ asmlinkage long sys_accept(int fd, struc */ __module_get(newsock->ops->owner); + newfd = sock_alloc_fd(&newfile); + if (unlikely(newfd < 0)) { + err = newfd; + goto out_release; + } + + err = sock_attach_fd(newsock, newfile); + if (err < 0) + goto out_fd; + err = security_socket_accept(sock, newsock); if (err) - goto out_release; + goto out_fd; err = sock->ops->accept(sock, newsock, sock->file->f_flags); if (err < 0) - goto out_release; + goto out_fd; if (upeer_sockaddr) { if(newsock->ops->getname(newsock, (struct sockaddr *)address, &len, 2)<0) { err = -ECONNABORTED; - goto out_release; + goto out_fd; } err = move_addr_to_user(address, len, upeer_sockaddr, upeer_addrlen); if (err < 0) - goto out_release; + goto out_fd; } /* File flags are not inherited via accept() unlike another OSes. */ - if ((err = sock_map_fd(newsock)) < 0) - goto out_release; + fd_install(newfd, newfile); + err = newfd; security_socket_post_accept(sock, newsock); @@ -1398,6 +1424,9 @@ out_put: sockfd_put(sock); out: return err; +out_fd: + put_filp(newfile); + put_unused_fd(newfd); out_release: sock_release(newsock); goto out_put; diff -puN net/unix/af_unix.c~git-net net/unix/af_unix.c --- devel/net/unix/af_unix.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/unix/af_unix.c 2006-02-27 22:37:47.000000000 -0800 @@ -1427,15 +1427,15 @@ static int unix_stream_sendmsg(struct ki while(sent < len) { /* - * Optimisation for the fact that under 0.01% of X messages typically - * need breaking up. + * Optimisation for the fact that under 0.01% of X + * messages typically need breaking up. */ - size=len-sent; + size = len-sent; /* Keep two messages in the pipe so it schedules better */ - if (size > sk->sk_sndbuf / 2 - 64) - size = sk->sk_sndbuf / 2 - 64; + if (size > ((sk->sk_sndbuf >> 1) - 64)) + size = (sk->sk_sndbuf >> 1) - 64; if (size > SKB_MAX_ALLOC) size = SKB_MAX_ALLOC; diff -puN net/xfrm/xfrm_policy.c~git-net net/xfrm/xfrm_policy.c --- devel/net/xfrm/xfrm_policy.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/xfrm/xfrm_policy.c 2006-02-27 22:37:47.000000000 -0800 @@ -203,7 +203,7 @@ static void xfrm_policy_timer(unsigned l } if (warn) - km_policy_expired(xp, dir, 0); + km_policy_expired(xp, dir, 0, 0); if (next != LONG_MAX && !mod_timer(&xp->timer, jiffies + make_jiffies(next))) xfrm_pol_hold(xp); @@ -216,7 +216,7 @@ out: expired: read_unlock(&xp->lock); if (!xfrm_policy_delete(xp, dir)) - km_policy_expired(xp, dir, 1); + km_policy_expired(xp, dir, 1, 0); xfrm_pol_put(xp); } @@ -621,6 +621,7 @@ int xfrm_policy_delete(struct xfrm_polic } return -ENOENT; } +EXPORT_SYMBOL(xfrm_policy_delete); int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol) { diff -puN net/xfrm/xfrm_state.c~git-net net/xfrm/xfrm_state.c --- devel/net/xfrm/xfrm_state.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/xfrm/xfrm_state.c 2006-02-27 22:37:47.000000000 -0800 @@ -20,6 +20,15 @@ #include #include +struct sock *xfrm_nl; +EXPORT_SYMBOL(xfrm_nl); + +u32 sysctl_xfrm_aevent_etime = XFRM_AE_ETIME; +EXPORT_SYMBOL(sysctl_xfrm_aevent_etime); + +u32 sysctl_xfrm_aevent_rseqth = XFRM_AE_SEQT_SIZE; +EXPORT_SYMBOL(sysctl_xfrm_aevent_rseqth); + /* Each xfrm_state may be linked to two tables: 1. Hash table by (spi,daddr,ah/esp) to find SA by SPI. (input,ctl) @@ -50,18 +59,20 @@ static DEFINE_SPINLOCK(xfrm_state_gc_loc static int xfrm_state_gc_flush_bundles; -static int __xfrm_state_delete(struct xfrm_state *x); +int __xfrm_state_delete(struct xfrm_state *x); static struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned short family); static void xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo); -static int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol); -static void km_state_expired(struct xfrm_state *x, int hard); +int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol); +void km_state_expired(struct xfrm_state *x, int hard, u32 pid); static void xfrm_state_gc_destroy(struct xfrm_state *x) { if (del_timer(&x->timer)) BUG(); + if (del_timer(&x->rtimer)) + BUG(); kfree(x->aalg); kfree(x->ealg); kfree(x->calg); @@ -153,7 +164,7 @@ static void xfrm_timer_handler(unsigned x->km.dying = warn; if (warn) - km_state_expired(x, 0); + km_state_expired(x, 0, 0); resched: if (next != LONG_MAX && !mod_timer(&x->timer, jiffies + make_jiffies(next))) @@ -168,13 +179,15 @@ expired: goto resched; } if (!__xfrm_state_delete(x) && x->id.spi) - km_state_expired(x, 1); + km_state_expired(x, 1, 0); out: spin_unlock(&x->lock); xfrm_state_put(x); } +static void xfrm_replay_timer_handler(unsigned long data); + struct xfrm_state *xfrm_state_alloc(void) { struct xfrm_state *x; @@ -190,11 +203,16 @@ struct xfrm_state *xfrm_state_alloc(void init_timer(&x->timer); x->timer.function = xfrm_timer_handler; x->timer.data = (unsigned long)x; + init_timer(&x->rtimer); + x->rtimer.function = xfrm_replay_timer_handler; + x->rtimer.data = (unsigned long)x; x->curlft.add_time = (unsigned long)xtime.tv_sec; x->lft.soft_byte_limit = XFRM_INF; x->lft.soft_packet_limit = XFRM_INF; x->lft.hard_byte_limit = XFRM_INF; x->lft.hard_packet_limit = XFRM_INF; + x->replay_maxage = 0; + x->replay_maxdiff = 0; spin_lock_init(&x->lock); } return x; @@ -212,7 +230,7 @@ void __xfrm_state_destroy(struct xfrm_st } EXPORT_SYMBOL(__xfrm_state_destroy); -static int __xfrm_state_delete(struct xfrm_state *x) +int __xfrm_state_delete(struct xfrm_state *x) { int err = -ESRCH; @@ -249,6 +267,7 @@ static int __xfrm_state_delete(struct xf return err; } +EXPORT_SYMBOL(__xfrm_state_delete); int xfrm_state_delete(struct xfrm_state *x) { @@ -426,6 +445,10 @@ static void __xfrm_state_insert(struct x if (!mod_timer(&x->timer, jiffies + HZ)) xfrm_state_hold(x); + if (x->replay_maxage && + !mod_timer(&x->rtimer, jiffies + x->replay_maxage)) + xfrm_state_hold(x); + wake_up(&km_waitq); } @@ -580,7 +603,7 @@ int xfrm_state_check_expire(struct xfrm_ (x->curlft.bytes >= x->lft.soft_byte_limit || x->curlft.packets >= x->lft.soft_packet_limit)) { x->km.dying = 1; - km_state_expired(x, 0); + km_state_expired(x, 0, 0); } return 0; } @@ -762,6 +785,61 @@ out: } EXPORT_SYMBOL(xfrm_state_walk); + +void xfrm_replay_notify(struct xfrm_state *x, int event) +{ + struct km_event c; + /* we send notify messages in case + * 1. we updated on of the sequence numbers, and the seqno difference + * is at least x->replay_maxdiff, in this case we also update the + * timeout of our timer function + * 2. if x->replay_maxage has elapsed since last update, + * and there were changes + * + * The state structure must be locked! + */ + + switch (event) { + case XFRM_REPLAY_UPDATE: + if (x->replay_maxdiff && + (x->replay.seq - x->preplay.seq < x->replay_maxdiff) && + (x->replay.oseq - x->preplay.oseq < x->replay_maxdiff)) + return; + + break; + + case XFRM_REPLAY_TIMEOUT: + if ((x->replay.seq == x->preplay.seq) && + (x->replay.bitmap == x->preplay.bitmap) && + (x->replay.oseq == x->preplay.oseq)) + return; + + break; + } + + memcpy(&x->preplay, &x->replay, sizeof(struct xfrm_replay_state)); + c.event = XFRM_MSG_NEWAE; + c.data.aevent = event; + km_state_notify(x, &c); + + if (x->replay_maxage && + !mod_timer(&x->rtimer, jiffies + x->replay_maxage)) + xfrm_state_hold(x); +} +EXPORT_SYMBOL(xfrm_replay_notify); + +static void xfrm_replay_timer_handler(unsigned long data) +{ + struct xfrm_state *x = (struct xfrm_state*)data; + + spin_lock(&x->lock); + + if (xfrm_aevent_is_on() && x->km.state == XFRM_STATE_VALID) + xfrm_replay_notify(x, XFRM_REPLAY_TIMEOUT); + + spin_unlock(&x->lock); +} + int xfrm_replay_check(struct xfrm_state *x, u32 seq) { u32 diff; @@ -805,6 +883,9 @@ void xfrm_replay_advance(struct xfrm_sta diff = x->replay.seq - seq; x->replay.bitmap |= (1U << diff); } + + if (xfrm_aevent_is_on()) + xfrm_replay_notify(x, XFRM_REPLAY_UPDATE); } EXPORT_SYMBOL(xfrm_replay_advance); @@ -835,11 +916,12 @@ void km_state_notify(struct xfrm_state * EXPORT_SYMBOL(km_policy_notify); EXPORT_SYMBOL(km_state_notify); -static void km_state_expired(struct xfrm_state *x, int hard) +void km_state_expired(struct xfrm_state *x, int hard, u32 pid) { struct km_event c; c.data.hard = hard; + c.pid = pid; c.event = XFRM_MSG_EXPIRE; km_state_notify(x, &c); @@ -847,11 +929,12 @@ static void km_state_expired(struct xfrm wake_up(&km_waitq); } +EXPORT_SYMBOL(km_state_expired); /* * We send to all registered managers regardless of failure * We are happy with one success */ -static int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol) +int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol) { int err = -EINVAL, acqret; struct xfrm_mgr *km; @@ -865,6 +948,7 @@ static int km_query(struct xfrm_state *x read_unlock(&xfrm_km_lock); return err; } +EXPORT_SYMBOL(km_query); int km_new_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr, u16 sport) { @@ -883,17 +967,19 @@ int km_new_mapping(struct xfrm_state *x, } EXPORT_SYMBOL(km_new_mapping); -void km_policy_expired(struct xfrm_policy *pol, int dir, int hard) +void km_policy_expired(struct xfrm_policy *pol, int dir, int hard, u32 pid) { struct km_event c; c.data.hard = hard; + c.pid = pid; c.event = XFRM_MSG_POLEXPIRE; km_policy_notify(pol, dir, &c); if (hard) wake_up(&km_waitq); } +EXPORT_SYMBOL(km_policy_expired); int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen) { diff -puN net/xfrm/xfrm_user.c~git-net net/xfrm/xfrm_user.c --- devel/net/xfrm/xfrm_user.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/net/xfrm/xfrm_user.c 2006-02-27 22:37:47.000000000 -0800 @@ -28,8 +28,6 @@ #include #include -static struct sock *xfrm_nl; - static int verify_one_alg(struct rtattr **xfrma, enum xfrm_attr_type_t type) { struct rtattr *rt = xfrma[type - 1]; @@ -276,6 +274,56 @@ static void copy_from_user_state(struct x->props.flags = p->flags; } +/* + * someday when pfkey also has support, we could have the code + * somehow made shareable and move it to xfrm_state.c - JHS + * +*/ +static int xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma) +{ + int err = - EINVAL; + struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL-1]; + struct rtattr *lt = xfrma[XFRMA_LTIME_VAL-1]; + struct rtattr *et = xfrma[XFRMA_ETIMER_THRESH-1]; + struct rtattr *rt = xfrma[XFRMA_REPLAY_THRESH-1]; + + if (rp) { + struct xfrm_replay_state *replay; + if (RTA_PAYLOAD(rp) < sizeof(*replay)) + goto error; + replay = RTA_DATA(rp); + memcpy(&x->replay, replay, sizeof(*replay)); + memcpy(&x->preplay, replay, sizeof(*replay)); + } + + if (lt) { + struct xfrm_lifetime_cur *ltime; + if (RTA_PAYLOAD(lt) < sizeof(*ltime)) + goto error; + ltime = RTA_DATA(lt); + x->curlft.bytes = ltime->bytes; + x->curlft.packets = ltime->packets; + x->curlft.add_time = ltime->add_time; + x->curlft.use_time = ltime->use_time; + } + + if (et) { + if (RTA_PAYLOAD(et) < sizeof(u32)) + goto error; + x->replay_maxage = *(u32*)RTA_DATA(et); + } + + if (rt) { + if (RTA_PAYLOAD(rt) < sizeof(u32)) + goto error; + x->replay_maxdiff = *(u32*)RTA_DATA(rt); + } + + return 0; +error: + return err; +} + static struct xfrm_state *xfrm_state_construct(struct xfrm_usersa_info *p, struct rtattr **xfrma, int *errp) @@ -311,6 +359,18 @@ static struct xfrm_state *xfrm_state_con goto error; x->km.seq = p->seq; + x->replay_maxdiff = sysctl_xfrm_aevent_rseqth; + /* sysctl_xfrm_aevent_etime is in 100ms units */ + x->replay_maxage = (sysctl_xfrm_aevent_etime*HZ)/XFRM_AE_ETH_M; + x->preplay.bitmap = 0; + x->preplay.seq = x->replay.seq+x->replay_maxdiff; + x->preplay.oseq = x->replay.oseq +x->replay_maxdiff; + + /* override default values from above */ + + err = xfrm_update_ae_params(x, (struct rtattr **)xfrma); + if (err < 0) + goto error; return x; @@ -1025,9 +1085,142 @@ static int xfrm_flush_sa(struct sk_buff return 0; } -static int xfrm_flush_policy(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) + +static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, struct km_event *c) +{ + struct xfrm_aevent_id *id; + struct nlmsghdr *nlh; + struct xfrm_lifetime_cur ltime; + unsigned char *b = skb->tail; + + nlh = NLMSG_PUT(skb, c->pid, c->seq, XFRM_MSG_NEWAE, sizeof(*id)); + id = NLMSG_DATA(nlh); + nlh->nlmsg_flags = 0; + + id->sa_id.daddr = x->id.daddr; + id->sa_id.spi = x->id.spi; + id->sa_id.family = x->props.family; + id->sa_id.proto = x->id.proto; + id->flags = c->data.aevent; + + RTA_PUT(skb, XFRMA_REPLAY_VAL, sizeof(x->replay), &x->replay); + + ltime.bytes = x->curlft.bytes; + ltime.packets = x->curlft.packets; + ltime.add_time = x->curlft.add_time; + ltime.use_time = x->curlft.use_time; + + RTA_PUT(skb, XFRMA_LTIME_VAL, sizeof(struct xfrm_lifetime_cur), <ime); + + if (id->flags&XFRM_AE_RTHR) { + RTA_PUT(skb,XFRMA_REPLAY_THRESH,sizeof(u32),&x->replay_maxdiff); + } + + if (id->flags&XFRM_AE_ETHR) { + u32 etimer = x->replay_maxage*10/HZ; + RTA_PUT(skb,XFRMA_ETIMER_THRESH,sizeof(u32),&etimer); + } + + nlh->nlmsg_len = skb->tail - b; + return skb->len; + +rtattr_failure: +nlmsg_failure: + skb_trim(skb, b - skb->data); + return -1; +} + +static int xfrm_get_ae(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { + struct xfrm_state *x; + struct sk_buff *r_skb; + int err; struct km_event c; + struct xfrm_aevent_id *p = NLMSG_DATA(nlh); + int len = NLMSG_LENGTH(sizeof(struct xfrm_aevent_id)); + struct xfrm_usersa_id *id = &p->sa_id; + + len += RTA_SPACE(sizeof(struct xfrm_replay_state)); + len += RTA_SPACE(sizeof(struct xfrm_lifetime_cur)); + + if (p->flags&XFRM_AE_RTHR) + len+=RTA_SPACE(sizeof(u32)); + + if (p->flags&XFRM_AE_ETHR) + len+=RTA_SPACE(sizeof(u32)); + + r_skb = alloc_skb(len, GFP_ATOMIC); + if (r_skb == NULL) + return -ENOMEM; + + x = xfrm_state_lookup(&id->daddr, id->spi, id->proto, id->family); + if (x == NULL) { + kfree(r_skb); + return -ESRCH; + } + + /* + * XXX: is this lock really needed - none of the other + * gets lock (the concern is things getting updated + * while we are still reading) - jhs + */ + spin_lock_bh(&x->lock); + c.data.aevent = p->flags; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + + if (build_aevent(r_skb, x, &c) < 0) + BUG(); + err = netlink_unicast(xfrm_nl, r_skb, + NETLINK_CB(skb).pid, MSG_DONTWAIT); + spin_unlock_bh(&x->lock); + xfrm_state_put(x); + return err; +} + +static int xfrm_new_ae(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) +{ + struct xfrm_state *x; + struct km_event c; + int err = - EINVAL; + struct xfrm_aevent_id *p = NLMSG_DATA(nlh); + struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL-1]; + struct rtattr *lt = xfrma[XFRMA_LTIME_VAL-1]; + + if (!lt && !rp) + return err; + + /* pedantic mode - thou shalt sayeth replaceth */ + if (!(nlh->nlmsg_flags&NLM_F_REPLACE)) + return err; + + x = xfrm_state_lookup(&p->sa_id.daddr, p->sa_id.spi, p->sa_id.proto, p->sa_id.family); + if (x == NULL) + return -ESRCH; + + if (x->km.state != XFRM_STATE_VALID) + goto out; + + spin_lock_bh(&x->lock); + err = xfrm_update_ae_params(x,(struct rtattr **)xfrma); + spin_unlock_bh(&x->lock); + if (err < 0) + goto out; + + c.event = nlh->nlmsg_type; + c.seq = nlh->nlmsg_seq; + c.pid = nlh->nlmsg_pid; + c.data.aevent = XFRM_AE_CU; + km_state_notify(x, &c); + err = 0; +out: + xfrm_state_put(x); + return err; +} + +static int xfrm_flush_policy(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) +{ +struct km_event c; xfrm_policy_flush(); c.event = nlh->nlmsg_type; @@ -1037,6 +1230,139 @@ static int xfrm_flush_policy(struct sk_b return 0; } +static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) +{ + struct xfrm_policy *xp; + struct xfrm_user_polexpire *up = NLMSG_DATA(nlh); + struct xfrm_userpolicy_info *p = &up->pol; + int err = -ENOENT; + + if (p->index) + xp = xfrm_policy_byid(p->dir, p->index, 0); + else { + struct rtattr **rtattrs = (struct rtattr **)xfrma; + struct rtattr *rt = rtattrs[XFRMA_SEC_CTX-1]; + struct xfrm_policy tmp; + + err = verify_sec_ctx_len(rtattrs); + if (err) + return err; + + memset(&tmp, 0, sizeof(struct xfrm_policy)); + if (rt) { + struct xfrm_user_sec_ctx *uctx = RTA_DATA(rt); + + if ((err = security_xfrm_policy_alloc(&tmp, uctx))) + return err; + } + xp = xfrm_policy_bysel_ctx(p->dir, &p->sel, tmp.security, 0); + security_xfrm_policy_free(&tmp); + } + + if (xp == NULL) + return err; + read_lock(&xp->lock); + if (xp->dead) { + read_unlock(&xp->lock); + goto out; + } + + read_unlock(&xp->lock); + err = 0; + if (up->hard) { + xfrm_policy_delete(xp, p->dir); + } else { + // reset the timers here? + printk("Dont know what to do with soft policy expire\n"); + } + km_policy_expired(xp, p->dir, up->hard, current->pid); + +out: + xfrm_pol_put(xp); + return err; +} + +static int xfrm_add_sa_expire(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) +{ + struct xfrm_state *x; + int err; + struct xfrm_user_expire *ue = NLMSG_DATA(nlh); + struct xfrm_usersa_info *p = &ue->state; + + x = xfrm_state_lookup(&p->id.daddr, p->id.spi, p->id.proto, p->family); + err = -ENOENT; + + if (x == NULL) + return err; + + err = -EINVAL; + + spin_lock_bh(&x->lock); + if (x->km.state != XFRM_STATE_VALID) + goto out; + km_state_expired(x, ue->hard, current->pid); + + if (ue->hard) + __xfrm_state_delete(x); +out: + spin_unlock_bh(&x->lock); + xfrm_state_put(x); + return err; +} + +static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) +{ + struct xfrm_policy *xp; + struct xfrm_user_tmpl *ut; + int i; + struct rtattr *rt = xfrma[XFRMA_TMPL-1]; + + struct xfrm_user_acquire *ua = NLMSG_DATA(nlh); + struct xfrm_state *x = xfrm_state_alloc(); + int err = -ENOMEM; + + if (!x) + return err; + + err = verify_newpolicy_info(&ua->policy); + if (err) { + printk("BAD policy passed\n"); + kfree(x); + return err; + } + + /* build an XP */ + xp = xfrm_policy_construct(&ua->policy, (struct rtattr **) xfrma, &err); if (!xp) { + kfree(x); + return err; + } + + memcpy(&x->id, &ua->id, sizeof(ua->id)); + memcpy(&x->props.saddr, &ua->saddr, sizeof(ua->saddr)); + memcpy(&x->sel, &ua->sel, sizeof(ua->sel)); + + ut = RTA_DATA(rt); + /* extract the templates and for each call km_key */ + for (i = 0; i < xp->xfrm_nr; i++, ut++) { + struct xfrm_tmpl *t = &xp->xfrm_vec[i]; + memcpy(&x->id, &t->id, sizeof(x->id)); + x->props.mode = t->mode; + x->props.reqid = t->reqid; + x->props.family = ut->family; + t->aalgos = ua->aalgos; + t->ealgos = ua->ealgos; + t->calgos = ua->calgos; + err = km_query(x, t, xp); + + } + + kfree(x); + kfree(xp); + + return 0; +} + + #define XMSGSIZE(type) NLMSG_LENGTH(sizeof(struct type)) static const int xfrm_msg_min[XFRM_NR_MSGTYPES] = { @@ -1054,6 +1380,8 @@ static const int xfrm_msg_min[XFRM_NR_MS [XFRM_MSG_POLEXPIRE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_polexpire), [XFRM_MSG_FLUSHSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_flush), [XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = NLMSG_LENGTH(0), + [XFRM_MSG_NEWAE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id), + [XFRM_MSG_GETAE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id), }; #undef XMSGSIZE @@ -1071,10 +1399,15 @@ static struct xfrm_link { [XFRM_MSG_GETPOLICY - XFRM_MSG_BASE] = { .doit = xfrm_get_policy, .dump = xfrm_dump_policy }, [XFRM_MSG_ALLOCSPI - XFRM_MSG_BASE] = { .doit = xfrm_alloc_userspi }, + [XFRM_MSG_ACQUIRE - XFRM_MSG_BASE] = { .doit = xfrm_add_acquire }, + [XFRM_MSG_EXPIRE - XFRM_MSG_BASE] = { .doit = xfrm_add_sa_expire }, [XFRM_MSG_UPDPOLICY - XFRM_MSG_BASE] = { .doit = xfrm_add_policy }, [XFRM_MSG_UPDSA - XFRM_MSG_BASE] = { .doit = xfrm_add_sa }, + [XFRM_MSG_POLEXPIRE - XFRM_MSG_BASE] = { .doit = xfrm_add_pol_expire}, [XFRM_MSG_FLUSHSA - XFRM_MSG_BASE] = { .doit = xfrm_flush_sa }, [XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = { .doit = xfrm_flush_policy }, + [XFRM_MSG_NEWAE - XFRM_MSG_BASE] = { .doit = xfrm_new_ae }, + [XFRM_MSG_GETAE - XFRM_MSG_BASE] = { .doit = xfrm_get_ae }, }; static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, int *errp) @@ -1163,19 +1496,19 @@ static void xfrm_netlink_rcv(struct sock } while (qlen); } -static int build_expire(struct sk_buff *skb, struct xfrm_state *x, int hard) +static int build_expire(struct sk_buff *skb, struct xfrm_state *x, struct km_event *c) { struct xfrm_user_expire *ue; struct nlmsghdr *nlh; unsigned char *b = skb->tail; - nlh = NLMSG_PUT(skb, 0, 0, XFRM_MSG_EXPIRE, + nlh = NLMSG_PUT(skb, c->pid, 0, XFRM_MSG_EXPIRE, sizeof(*ue)); ue = NLMSG_DATA(nlh); nlh->nlmsg_flags = 0; copy_to_user_state(x, &ue->state); - ue->hard = (hard != 0) ? 1 : 0; + ue->hard = (c->data.hard != 0) ? 1 : 0; nlh->nlmsg_len = skb->tail - b; return skb->len; @@ -1194,13 +1527,31 @@ static int xfrm_exp_state_notify(struct if (skb == NULL) return -ENOMEM; - if (build_expire(skb, x, c->data.hard) < 0) + if (build_expire(skb, x, c) < 0) BUG(); NETLINK_CB(skb).dst_group = XFRMNLGRP_EXPIRE; return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC); } +static int xfrm_aevent_state_notify(struct xfrm_state *x, struct km_event *c) +{ + struct sk_buff *skb; + int len = NLMSG_LENGTH(sizeof(struct xfrm_aevent_id)); + + len += RTA_SPACE(sizeof(struct xfrm_replay_state)); + len += RTA_SPACE(sizeof(struct xfrm_lifetime_cur)); + skb = alloc_skb(len, GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + + if (build_aevent(skb, x, c) < 0) + BUG(); + + NETLINK_CB(skb).dst_group = XFRMNLGRP_AEVENTS; + return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_AEVENTS, GFP_ATOMIC); +} + static int xfrm_notify_sa_flush(struct km_event *c) { struct xfrm_usersa_flush *p; @@ -1313,6 +1664,8 @@ static int xfrm_send_state_notify(struct switch (c->event) { case XFRM_MSG_EXPIRE: return xfrm_exp_state_notify(x, c); + case XFRM_MSG_NEWAE: + return xfrm_aevent_state_notify(x, c); case XFRM_MSG_DELSA: case XFRM_MSG_UPDSA: case XFRM_MSG_NEWSA: @@ -1443,13 +1796,14 @@ static struct xfrm_policy *xfrm_compile_ } static int build_polexpire(struct sk_buff *skb, struct xfrm_policy *xp, - int dir, int hard) + int dir, struct km_event *c) { struct xfrm_user_polexpire *upe; struct nlmsghdr *nlh; + int hard = c->data.hard; unsigned char *b = skb->tail; - nlh = NLMSG_PUT(skb, 0, 0, XFRM_MSG_POLEXPIRE, sizeof(*upe)); + nlh = NLMSG_PUT(skb, c->pid, 0, XFRM_MSG_POLEXPIRE, sizeof(*upe)); upe = NLMSG_DATA(nlh); nlh->nlmsg_flags = 0; @@ -1480,7 +1834,7 @@ static int xfrm_exp_policy_notify(struct if (skb == NULL) return -ENOMEM; - if (build_polexpire(skb, xp, dir, c->data.hard) < 0) + if (build_polexpire(skb, xp, dir, c) < 0) BUG(); NETLINK_CB(skb).dst_group = XFRMNLGRP_EXPIRE; @@ -1618,3 +1972,4 @@ module_init(xfrm_user_init); module_exit(xfrm_user_exit); MODULE_LICENSE("GPL"); MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_XFRM); + diff -puN security/selinux/nlmsgtab.c~git-net security/selinux/nlmsgtab.c --- devel/security/selinux/nlmsgtab.c~git-net 2006-02-27 22:37:47.000000000 -0800 +++ devel-akpm/security/selinux/nlmsgtab.c 2006-02-27 22:37:47.000000000 -0800 @@ -88,8 +88,15 @@ static struct nlmsg_perm nlmsg_xfrm_perm { XFRM_MSG_DELPOLICY, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, { XFRM_MSG_GETPOLICY, NETLINK_XFRM_SOCKET__NLMSG_READ }, { XFRM_MSG_ALLOCSPI, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_ACQUIRE, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_EXPIRE, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, { XFRM_MSG_UPDPOLICY, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, { XFRM_MSG_UPDSA, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_POLEXPIRE, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_FLUSHSA, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_FLUSHPOLICY, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_NEWAE, NETLINK_XFRM_SOCKET__NLMSG_WRITE }, + { XFRM_MSG_GETAE, NETLINK_XFRM_SOCKET__NLMSG_READ }, }; static struct nlmsg_perm nlmsg_audit_perms[] = _