From: "Nicholas Piggin" <npiggin@gmail.com>
To: "Timothy Pearson" <tpearson@raptorengineering.com>, "Michael Ellerman" <mpe@ellerman.id.au>
Cc: "Jens Axboe" <axboe@kernel.dk>, "regressions" <regressions@lists.linux.dev>, "christophe leroy" <christophe.leroy@csgroup.eu>, "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register save
Date: Tue, 21 Nov 2023 17:56:10 +1000
Message-ID: <CX4BUXJD624S.2U2UZ9M6H198U@wheely>
In-Reply-To: <608737213.48890358.1700529837699.JavaMail.zimbra@raptorengineeringinc.com>

On Tue Nov 21, 2023 at 11:23 AM AEST, Timothy Pearson wrote:
>
>
> ----- Original Message -----
> > From: "Michael Ellerman" <mpe@ellerman.id.au>
> > To: "Timothy Pearson" <tpearson@raptorengineering.com>
> > Cc: "Jens Axboe" <axboe@kernel.dk>, "regressions" <regressions@lists.linux.dev>, "npiggin" <npiggin@gmail.com>,
> > "christophe leroy" <christophe.leroy@csgroup.eu>, "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>
> > Sent: Monday, November 20, 2023 5:39:52 PM
> > Subject: Re: [PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register save
>
> > Timothy Pearson <tpearson@raptorengineering.com> writes:
> >> ----- Original Message -----
> >>> From: "Michael Ellerman" <mpe@ellerman.id.au>
> > ...
> >>>
> >>> But we now have a new path, because io-uring can call copy_process() via
> >>> create_io_thread() from the signal handling path. That's OK if the signal is
> >>> handled as we return from a syscall, but it's not OK if the signal is handled
> >>> due to some other interrupt.
> >>>
> >>> Which is:
> >>>
> >>>   interrupt_return_srr_user()
> >>>     interrupt_exit_user_prepare()
> >>>       interrupt_exit_user_prepare_main()
> >>>         do_notify_resume()
> >>>           get_signal()
> >>>             task_work_run()
> >>>               create_worker_cb()
> >>>                 create_io_worker()
> >>>                   copy_process()
> >>>                     dup_task_struct()
> >>>                       arch_dup_task_struct()
> >>>                         flush_all_to_thread()
> >>>                           save_all()
> >>>                             if (tsk->thread.regs->msr & MSR_FP)
> >>>                               save_fpu()
> >>>                               # fr0 is clobbered and potentially live in userspace
> >>>
> >>>
> >>> So tldr I think the corruption is only an issue since io-uring started doing
> >>> the clone via signal, which I think matches the observed timeline of this bug
> >>> appearing.
> >>
> >> I agree the corruption really only started showing up in earnest on
> >> io_uring clone-via-signal, as this was confirmed several times in the
> >> course of debugging.
> >
> > Thanks.
> >
> >> Note as well that I may very well have a wrong call order in the
> >> commit message, since I was relying on a couple of WARN_ON() macros I
> >> inserted to check for a similar (but not identical) condition and
> >> didn't spend much time getting new traces after identifying the root
> >> cause.
> >
> > Yep no worries. I'll reword it to incorporate the full path from my mail.
> >
> >> I went back and grabbed some real world system-wide stack traces, since I now
> >> know what to trigger on. A typical example is:
> >>
> >>   interrupt_return_srr_user()
> >>     interrupt_exit_user_prepare()
> >>       interrupt_exit_user_prepare_main()
> >>         schedule()
> >>           __schedule()
> >>             __switch_to()
> >>               giveup_all()
> >>               # tsk->thread.regs->msr MSR_FP is still set here
> >>                 __giveup_fpu()
> >>                   save_fpu()
> >>                   # fr0 is clobbered and potentially live in userspace
> >
> > fr0 is not live there.
> >
> > __giveup_fpu() does roughly:
> >
> >	msr = tsk->thread.regs->msr;
> >	msr &= ~(MSR_FP|MSR_FE0|MSR_FE1);
> >	msr &= ~MSR_VSX;
> >	tsk->thread.regs->msr = msr;
> >
> > ie. it clears the FP etc. bits from the task's MSR.
> > That means the FP
> > state will be reloaded from the thread struct before the task is run again.
> >
> > Also on that path we're switching to another task, so we'll be reloading
> > the other task's FP state before returning to userspace.
> >
> > So I don't see any bug there.

> Yeah, you're right. I was trying to get traces while doing something else, and
> didn't think that all the way through, sorry. :) It's not going to be super
> easy to get a real trace (I was triggering the WARN_ON() from fr0 getting set
> to FPSCR), so let's just assume it's mainly the path you manually found above
> and update the commit message accordingly.

> > There's only two places that call save_fpu() and skip the giveup logic,
> > which is save_all() and kvmppc_save_user_regs().

> Now that's interesting as well, since it might explain some issues I've seen
> for years on a specific QEMU workload. Once this is backported to stable I'll
> need to get the kernels updated on those boxes and see if the issues
> disappear...

The KVM issue is actually slightly different. You need this (lightly verified
to solve the issue so far).

---
From c12fbed0e12207058282a2411da59b43b1f96c49 Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin@gmail.com>
Date: Tue, 21 Nov 2023 17:03:55 +1000
Subject: [PATCH] KVM: PPC: Book3S HV: Fix KVM_RUN ioctl clobbering FP/VEC
 registers

Before running a guest, the host process's FP/VEC registers are saved
away like a context switch or a kernel use of those regs. The guest
FP/VEC registers can then be loaded as needed. The host process
registers would be restored lazily when it uses FP/VEC again.

KVM HV did not do this correctly. The registers are saved in the thread
struct, but the FP/VEC/VSX bits remain enabled in the user MSR, leading
the kernel to think they are still valid, even after they are clobbered
by guest registers. This leads to the host process getting guest FP/VEC
register values.
KVM must be invoked by ioctl in this path, and almost certainly that
means a C call to a wrapper function, but that still leaves a real
possibility of non-volatile register corruption causing problems for the
QEMU process.
---
 arch/powerpc/kernel/process.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 392404688cec..9452a54d356c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1198,11 +1198,11 @@ void kvmppc_save_user_regs(void)
 
 	usermsr = current->thread.regs->msr;
 
+	/* Caller has enabled FP/VEC/VSX/TM in MSR */
 	if (usermsr & MSR_FP)
-		save_fpu(current);
-
+		__giveup_fpu(current);
 	if (usermsr & MSR_VEC)
-		save_altivec(current);
+		__giveup_altivec(current);
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	if (usermsr & MSR_TM) {
-- 
2.42.0