From: "Nicholas Piggin" <npiggin@gmail.com>
To: "Timothy Pearson" <tpearson@raptorengineering.com>, "Michael Ellerman" <mpe@ellerman.id.au>
Cc: "Jens Axboe" <axboe@kernel.dk>, "regressions" <regressions@lists.linux.dev>, "christophe leroy" <christophe.leroy@csgroup.eu>, "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register save
Date: Tue, 21 Nov 2023 17:56:10 +1000
Message-ID: <CX4BUXJD624S.2U2UZ9M6H198U@wheely>
In-Reply-To: <608737213.48890358.1700529837699.JavaMail.zimbra@raptorengineeringinc.com>

On Tue Nov 21, 2023 at 11:23 AM AEST, Timothy Pearson wrote:
>
>
> ----- Original Message -----
> > From: "Michael Ellerman" <mpe@ellerman.id.au>
> > To: "Timothy Pearson" <tpearson@raptorengineering.com>
> > Cc: "Jens Axboe" <axboe@kernel.dk>, "regressions" <regressions@lists.linux.dev>, "npiggin" <npiggin@gmail.com>,
> > "christophe leroy" <christophe.leroy@csgroup.eu>, "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>
> > Sent: Monday, November 20, 2023 5:39:52 PM
> > Subject: Re: [PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register save
>
> > Timothy Pearson <tpearson@raptorengineering.com> writes:
> >> ----- Original Message -----
> >>> From: "Michael Ellerman" <mpe@ellerman.id.au>
> > ...
> >>>
> >>> But we now have a new path, because io-uring can call copy_process() via
> >>> create_io_thread() from the signal handling path. That's OK if the signal is
> >>> handled as we return from a syscall, but it's not OK if the signal is handled
> >>> due to some other interrupt.
> >>>
> >>> Which is:
> >>>
> >>>   interrupt_return_srr_user()
> >>>     interrupt_exit_user_prepare()
> >>>       interrupt_exit_user_prepare_main()
> >>>         do_notify_resume()
> >>>           get_signal()
> >>>             task_work_run()
> >>>               create_worker_cb()
> >>>                 create_io_worker()
> >>>                   copy_process()
> >>>                     dup_task_struct()
> >>>                       arch_dup_task_struct()
> >>>                         flush_all_to_thread()
> >>>                           save_all()
> >>>                             if (tsk->thread.regs->msr & MSR_FP)
> >>>                               save_fpu()
> >>>                               # fr0 is clobbered and potentially live in userspace
> >>>
> >>>
> >>> So tldr I think the corruption is only an issue since io-uring started doing
> >>> the clone via signal, which I think matches the observed timeline of this bug
> >>> appearing.
> >>
> >> I agree the corruption really only started showing up in earnest on
> >> io_uring clone-via-signal, as this was confirmed several times in the
> >> course of debugging.
> >
> > Thanks.
> >
> >> Note as well that I may very well have a wrong call order in the
> >> commit message, since I was relying on a couple of WARN_ON() macros I
> >> inserted to check for a similar (but not identical) condition and
> >> didn't spend much time getting new traces after identifying the root
> >> cause.
> >
> > Yep no worries. I'll reword it to incorporate the full path from my mail.
> >
> >> I went back and grabbed some real world system-wide stack traces, since I now
> >> know what to trigger on. A typical example is:
> >>
> >>   interrupt_return_srr_user()
> >>     interrupt_exit_user_prepare()
> >>       interrupt_exit_user_prepare_main()
> >>         schedule()
> >>           __schedule()
> >>             __switch_to()
> >>               giveup_all()
> >>               # tsk->thread.regs->msr MSR_FP is still set here
> >>                 __giveup_fpu()
> >>                   save_fpu()
> >>                   # fr0 is clobbered and potentially live in userspace
> >
> > fr0 is not live there.
> >
> > __giveup_fpu() does roughly:
> >
> >	msr = tsk->thread.regs->msr;
> >	msr &= ~(MSR_FP|MSR_FE0|MSR_FE1);
> >	msr &= ~MSR_VSX;
> >	tsk->thread.regs->msr = msr;
> >
> > ie. it clears the FP etc. bits from the task's MSR.
> > That means the FP
> > state will be reloaded from the thread struct before the task is run again.
> >
> > Also on that path we're switching to another task, so we'll be reloading
> > the other task's FP state before returning to userspace.
> >
> > So I don't see any bug there.

> Yeah, you're right. I was trying to get traces while doing something else, and
> didn't think that all the way through, sorry. :) It's not going to be super
> easy to get a real trace (I was triggering the WARN_ON() from fr0 getting set
> to FPSCR), so let's just assume it's mainly the path you manually found above
> and update the commit message accordingly.

> > There's only two places that call save_fpu() and skip the giveup logic,
> > which is save_all() and kvmppc_save_user_regs().

> Now that's interesting as well, since it might explain some issues I've seen
> for years on a specific QEMU workload. Once this is backported to stable I'll
> need to get the kernels updated on those boxes and see if the issues
> disappear...

The KVM issue is actually slightly different. You need this (lightly verified
to solve the issue so far).

---
From c12fbed0e12207058282a2411da59b43b1f96c49 Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin@gmail.com>
Date: Tue, 21 Nov 2023 17:03:55 +1000
Subject: [PATCH] KVM: PPC: Book3S HV: Fix KVM_RUN ioctl clobbering FP/VEC
 registers

Before running a guest, the host process's FP/VEC registers are saved
away like a context switch or a kernel use of those regs. The guest
FP/VEC registers can then be loaded as needed. The host process
registers would be restored lazily when it uses FP/VEC again.

KVM HV did not do this correctly. The registers are saved in the thread
struct, but the FP/VEC/VSX bits remain enabled in the user MSR, leading
the kernel to think they are still valid, even after they are clobbered
by guest registers. This leads to the host process getting guest FP/VEC
register values.
KVM must be invoked by ioctl in this path, and almost certainly that
means a C call to a wrapper function, but that still leaves a real
possibility of non-volatile register corruption causing problems for the
QEMU process.
---
 arch/powerpc/kernel/process.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 392404688cec..9452a54d356c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1198,11 +1198,11 @@ void kvmppc_save_user_regs(void)
 
 	usermsr = current->thread.regs->msr;
 
+	/* Caller has enabled FP/VEC/VSX/TM in MSR */
 	if (usermsr & MSR_FP)
-		save_fpu(current);
-
+		__giveup_fpu(current);
 	if (usermsr & MSR_VEC)
-		save_altivec(current);
+		__giveup_altivec(current);
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	if (usermsr & MSR_TM) {
-- 
2.42.0