From: "Eric W. Biederman" Currently move_native_irq disables and renables the irq we are migrating to ensure we don't take that irq when we are actually doing the migration operation. Disabling the irq needs to happen but sometimes doing the work is move_native_irq is too late. On x86 with ioapics the irq move sequences needs to be: edge_triggered: mask irq. move irq. unmask irq. ack irq. level_triggered: mask irq. ack irq. move irq. unmask irq. We can easily perform the edge triggered sequence, with the current defintion of move_native_irq. However the level triggered case does not map well. For that I have added move_masked_irq, to allow me to disable the irqs around both the ack and the move. Q: Why have we not seen this problem earlier? A: The only symptom I have been able to reproduce is that if we change the vector before acknowleding an irq the wrong irq is acknowledged. Since we currently are not reprogramming the irq vector during migration no problems show up. We have to mask the irq before we acknowledge the irq or else we could hit a window where an irq is asserted just before we acknowledge it. Edge triggered irqs do not have this problem because acknowledgements do not propogate in the same way. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton --- include/linux/irq.h | 6 ++++++ kernel/irq/migration.c | 28 +++++++++++++++++++++------- 2 files changed, 27 insertions(+), 7 deletions(-) diff -puN include/linux/irq.h~genirq-irq-add-moved_masked_irq include/linux/irq.h --- a/include/linux/irq.h~genirq-irq-add-moved_masked_irq +++ a/include/linux/irq.h @@ -206,6 +206,7 @@ static inline void set_native_irq_info(i void set_pending_irq(unsigned int irq, cpumask_t mask); void move_native_irq(int irq); +void move_masked_irq(int irq); #ifdef CONFIG_PCI_MSI /* @@ -247,6 +248,10 @@ static inline void move_native_irq(int i { } +static inline void move_masked_irq(int irq) +{ +} + static inline void set_pending_irq(unsigned int irq, cpumask_t mask) { } @@ -262,6 +267,7 @@ static inline void set_irq_info(int irq, #define move_irq(x) #define move_native_irq(x) +#define move_masked_irq(x) #endif /* CONFIG_SMP */ diff -puN kernel/irq/migration.c~genirq-irq-add-moved_masked_irq kernel/irq/migration.c --- a/kernel/irq/migration.c~genirq-irq-add-moved_masked_irq +++ a/kernel/irq/migration.c @@ -12,7 +12,7 @@ void set_pending_irq(unsigned int irq, c spin_unlock_irqrestore(&desc->lock, flags); } -void move_native_irq(int irq) +void move_masked_irq(int irq) { struct irq_desc *desc = irq_desc + irq; cpumask_t tmp; @@ -48,15 +48,29 @@ void move_native_irq(int irq) * when an active trigger is comming in. This could * cause some ioapics to mal-function. * Being paranoid i guess! + * + * For correct operation this depends on the caller + * masking the irqs. */ if (likely(!cpus_empty(tmp))) { - if (likely(!(desc->status & IRQ_DISABLED))) - desc->chip->disable(irq); - desc->chip->set_affinity(irq,tmp); - - if (likely(!(desc->status & IRQ_DISABLED))) - desc->chip->enable(irq); } cpus_clear(irq_desc[irq].pending_mask); } + +void move_native_irq(int irq) +{ + struct irq_desc *desc = irq_desc + irq; + + if (likely(!(desc->status & IRQ_MOVE_PENDING))) + return; + + if (likely(!(desc->status & IRQ_DISABLED))) + desc->chip->disable(irq); + + move_masked_irq(irq); + + if (likely(!(desc->status & IRQ_DISABLED))) + desc->chip->enable(irq); +} + _