From roland@redhat.com Fri Oct 22 21:29:24 2004
Date: Fri, 22 Oct 2004 21:29:24 -0700
Message-Id: <200410230429.i9N4TOZK027399@magilla.sf.frob.com>
From: Roland McGrath
To: Andrew Morton, Linus Torvalds, Jesse Barnes,
    Linux Kernel Mailing List
Subject: Re: Fw: BUG_ONs in signal.c?
In-Reply-To: Roland McGrath's message of Friday, 22 October 2004 21:14:39 -0700 <200410230414.i9N4Edia027359@magilla.sf.frob.com>

Oh, duh. The race is obvious. Sorry for the confusion there.
I think this is the way to fix it.

--- linux-2.6/kernel/signal.c 19 Oct 2004 15:03:02 -0000 1.143
+++ linux-2.6/kernel/signal.c 23 Oct 2004 04:23:31 -0000
@@ -1909,22 +1910,16 @@ relock:
 		 * Anything else is fatal, maybe with a core dump.
 		 */
 		current->flags |= PF_SIGNALED;
-		if (sig_kernel_coredump(signr) &&
-		    do_coredump((long)signr, signr, regs)) {
+		if (sig_kernel_coredump(signr)) {
 			/*
-			 * That killed all other threads in the group and
-			 * synchronized with their demise, so there can't
-			 * be any more left to kill now. The group_exit
-			 * flags are set by do_coredump. Note that
-			 * thread_group_empty won't always be true yet,
-			 * because those threads were blocked in __exit_mm
-			 * and we just let them go to finish dying.
-			 */
-			const int code = signr | 0x80;
-			BUG_ON(!current->signal->group_exit);
-			BUG_ON(current->signal->group_exit_code != code);
-			do_exit(code);
-			/* NOTREACHED */
+			 * If it was able to dump core, this kills all
+			 * other threads in the group and synchronizes with
+			 * their demise. If we lost the race with another
+			 * thread getting here, it set group_exit_code
+			 * first and our do_group_exit call below will use
+			 * that value and ignore the one we pass it.
+			 */
+			do_coredump((long)signr, signr, regs);
 		}
 
 		/*

While looking at this, I noticed a bug (not directly related) in
do_coredump. It was setting the "core dumped" flag even when the format
dumping hook failed (e.g. for memory allocation failures).

--- linux-2.6/fs/exec.c 19 Oct 2004 15:05:13 -0000 1.146
+++ linux-2.6/fs/exec.c 23 Oct 2004 04:23:42 -0000
@@ -1417,7 +1417,8 @@ int do_coredump(long signr, int exit_cod
 
 	retval = binfmt->core_dump(signr, regs, file);
 
-	current->signal->group_exit_code |= 0x80;
+	if (retval)
+		current->signal->group_exit_code |= 0x80;
 close_fail:
 	filp_close(file, NULL);
 fail_unlock:

Thanks,
Roland
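
A note on the two changes above, for readers who want the intended semantics in isolation. What follows is only a single-threaded userspace sketch, not kernel code: struct group_state, model_coredump and model_group_exit are hypothetical names invented for illustration, and the real locking and thread teardown are left out. It assumes the first thread to start the group exit records the exit code, that later callers get that recorded code back regardless of what they pass, and that the 0x80 "core dumped" bit is set only when the dump succeeded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared per-group state (not the kernel's struct). */
struct group_state {
	bool group_exit;      /* set once some thread has begun a group exit */
	int  group_exit_code; /* code recorded by the first thread to get there */
};

/*
 * Models the core dump attempt. A thread that loses the race changes
 * nothing. The winner records the signal number and sets the 0x80
 * "core dumped" bit only if the dump succeeded.
 */
static void model_coredump(struct group_state *g, int signr, bool dump_ok)
{
	assert(signr > 0 && signr < 0x80);
	if (g->group_exit)
		return;                     /* another thread got here first */
	g->group_exit = true;
	g->group_exit_code = signr;
	if (dump_ok)
		g->group_exit_code |= 0x80; /* flag only a successful dump */
}

/*
 * Models the group exit. If a code was already recorded, it wins and
 * the caller's code is ignored; otherwise the caller's code is recorded.
 */
static int model_group_exit(struct group_state *g, int code)
{
	if (!g->group_exit) {
		g->group_exit = true;
		g->group_exit_code = code;
	}
	return g->group_exit_code;
}
```

In this model a thread that arrives second with a different signal still exits with the first thread's code, which is the case the removed BUG_ONs could not tolerate.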