From: Kenji Kaneshige

This patch fixes the following bug, found in our test.

==
Pid: 7905, CPU 4, comm: memload
psr : 0000121008022018 ifs : 800000000000038a ip : [] Not tainted
ip is at handle_IPI+0x101/0x380
unat: 0000000000000000 pfs : 000000000000040b rsc : 0000000000000003
rnat: 0000000000565aa9 bsps: a0000001000943b0 pr : 00000000005685a9
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001000e4340 b6 : a000000100054580 b7 : a000000000010640
f6 : 000000000000000000000 f7 : 000000000000000000000
f8 : 000000000000000000000 f9 : 1003effffffffffffc000
f10 : 1003e0000000000000000 f11 : 000000000000000000000
r1 : a000000100c14e40 r2 : fffffffffffffffe r3 : fffffffffffffffe
r8 : 0000000000000001 r9 : 0000000000000000 r10 : 0000000000000000
r11 : 0000000000017d00 r12 : e0000001341dfe30 r13 : e0000001341d8000
r14 : 0000000000000001 r15 : e0000140040165f8 r16 : a000000100861400
r17 : 0000000000000000 r18 : 0000000000000010 r19 : a000000100a2cc48
r20 : 00000000000002fa r21 : ffffffffffff0028 r22 : a0000001009ce2c0
r23 : a0000001009ce298 r24 : a000000100011a60 r25 : e0000001341dfe58
r26 : c000000000000008 r27 : 000000000000000f r28 : a000000000010620
r29 : a000000100879120 r30 : 0000000000000008 r31 : 0000000000550a41

Call Trace:
 [] show_stack+0x40/0xa0
                                sp=e0000001341df9c0 bsp=e0000001341d9338
 [] show_regs+0x840/0x880
                                sp=e0000001341dfb90 bsp=e0000001341d92e0
 [] die+0x1c0/0x2a0
                                sp=e0000001341dfb90 bsp=e0000001341d9298
 [] ia64_do_page_fault+0x8a0/0x9e0
                                sp=e0000001341dfbb0 bsp=e0000001341d9248
 [] __ia64_leave_kernel+0x0/0x280
                                sp=e0000001341dfc60 bsp=e0000001341d9248
 [] handle_IPI+0x100/0x380
                                sp=e0000001341dfe30 bsp=e0000001341d91f0
 [] handle_IRQ_event+0xa0/0x140
                                sp=e0000001341dfe30 bsp=e0000001341d91b0
 [] __do_IRQ+0x130/0x3e0
                                sp=e0000001341dfe30 bsp=e0000001341d9168
 [] ia64_handle_irq+0xf0/0x1a0
                                sp=e0000001341dfe30 bsp=e0000001341d9138
 [] __ia64_leave_kernel+0x0/0x280
                                sp=e0000001341dfe30 bsp=e0000001341d9138
<3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():1, irqs_disabled():0
==

This happens because handle_IPI() dereferences call_data, which is NULL.

The scenario is:
1. on CPU A: smp_call_function() is called.
2. on CPU A: ncpus = num_online_cpus() - 1 is evaluated, setting ncpus=X.
3. on CPU B: through cpu hotplug, some CPUs come online. This changes
   cpu_online_map.
4. on CPU A: smp_call_function() sends the IPI. The targets of the IPI are
   determined by cpu_online_map, which has changed in the meantime, so
   IPIs are sent to *X+1* processors.
5. on CPU A: it waits for X (=ncpus) acks.
6. on CPU A: after receiving all X acks, it sets call_data = NULL.
7. on some other CPU: a CPU that receives its IPI *late* (delayed by some
   irq-disable operation) accesses call_data and panics.

This patch moves the ncpus = num_online_cpus() - 1 computation under the
IPI call lock (call_lock), so the count is taken while the lock is held.
The x86 implementation already does this.

Signed-off-by: Kenji Kaneshige
Acked-by: Satoru Takeuchi
Acked-by: KAMEZAWA Hiroyuki
Cc: "Luck, Tony"
Signed-off-by: Andrew Morton
---

 arch/ia64/kernel/smp.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff -puN arch/ia64/kernel/smp.c~ia64-cpu-hotplug-fix-conflict-between-cpu-hot-add-and-ipi arch/ia64/kernel/smp.c
--- a/arch/ia64/kernel/smp.c~ia64-cpu-hotplug-fix-conflict-between-cpu-hot-add-and-ipi
+++ a/arch/ia64/kernel/smp.c
@@ -382,10 +382,14 @@ int
 smp_call_function (void (*func) (void *info), void *info, int nonatomic, int wait)
 {
 	struct call_data_struct data;
-	int cpus = num_online_cpus()-1;
+	int cpus;
 
-	if (!cpus)
+	spin_lock(&call_lock);
+	cpus = num_online_cpus() - 1;
+	if (!cpus) {
+		spin_unlock(&call_lock);
 		return 0;
+	}
 
 	/* Can deadlock when called with interrupts disabled */
 	WARN_ON(irqs_disabled());
@@ -397,8 +401,6 @@ smp_call_function (void (*func) (void *i
 	if (wait)
 		atomic_set(&data.finished, 0);
 
-	spin_lock(&call_lock);
-
 	call_data = &data;
 	mb();	/* ensure store to call_data precedes setting of IPI_CALL_FUNC */
 	send_IPI_allbutself(IPI_CALL_FUNC);
_
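
For illustration only, the counting mismatch in the scenario above can be
modelled in a few lines of plain C. This is a toy, single-threaded sketch and
not kernel code: struct call_result, call_unlocked_count(),
call_locked_count() and stray_ipis() are hypothetical names, and the model
assumes that the hot-add path updates the online map under the same lock, so
new CPUs cannot appear inside the critical section.

```c
/* Toy model of the race; every name below is hypothetical. */
struct call_result {
	int acks_expected;	/* what the caller waits for (ncpus = X) */
	int ipis_sent;		/* targets taken from the online map at send time */
};

/*
 * Old ordering: the CPU count is sampled before the lock is taken, so a
 * hot-add can slip in between the sample and the IPI broadcast.
 */
static struct call_result call_unlocked_count(int *online, int hot_added)
{
	struct call_result r;

	r.acks_expected = *online - 1;	/* step 2: sample outside the lock */
	*online += hot_added;		/* step 3: CPUs come online */
	r.ipis_sent = *online - 1;	/* step 4: IPI to all but self */
	return r;
}

/*
 * New ordering: count and send sit in one critical section. Assumption:
 * the hot-add path serialises on the same lock, so the new CPUs can only
 * become visible after the section ends.
 */
static struct call_result call_locked_count(int *online, int hot_added)
{
	struct call_result r;

	/* lock would be taken here */
	r.acks_expected = *online - 1;
	r.ipis_sent = *online - 1;
	/* lock would be released here */
	*online += hot_added;		/* hot-add completes afterwards */
	return r;
}

/* IPIs that may still be in flight once the caller has cleared call_data. */
static int stray_ipis(struct call_result r)
{
	return r.ipis_sent - r.acks_expected;
}
```

Starting from four online CPUs with one CPU hot-added, the first ordering
leaves one stray IPI (4 sent, 3 acks awaited), while the second leaves none.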