From: "Nicholas Piggin" <npiggin@gmail.com>
To: "Michael Ellerman" <mpe@ellerman.id.au>,
	"Timothy Pearson" <tpearson@raptorengineering.com>,
	"Jens Axboe" <axboe@kernel.dk>,
	"regressions" <regressions@lists.linux.dev>,
	"christophe leroy" <christophe.leroy@csgroup.eu>,
	"linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register save
Date: Tue, 21 Nov 2023 00:32:41 +1000
Message-ID: <CX3PNZ6WVZT2.1FF3ZLUTDCX6R@wheely>
In-Reply-To: <877cmc7ve9.fsf@mail.lhotse>

Yeah, awesome.

On Mon Nov 20, 2023 at 5:10 PM AEST, Michael Ellerman wrote:
> Hi Timothy,
>
> Great work debugging this. I think your fix is good, but I want to understand it
> a bit more to make sure I can explain why we haven't seen it outside of io-uring.

Analysis seems right to me.

Probably the best minimal fix. But I wonder if we should just use a
single path for saving/flushing/giving up, i.e. always use giveup
instead of save?

KVM looks odd too, and actually gets this wrong, in a way that's not
fixed by Timothy's patch, because it's just not restoring userspace
registers at all. Fortunately QEMU isn't in the habit of using
non-volatile FP/VEC registers over a VCPU ioctl, but there's no reason
it couldn't, since GCC/LLVM can easily use them. KVM really wants to be
using giveup.

Thanks,
Nick

> If this can be triggered outside of io-uring then I have even more backporting
> in my future :}
>
> Typically save_fpu() is called from __giveup_fpu() which saves the FP regs and
> also *turns off FP* in the task's MSR, meaning the kernel will reload the FP regs
> from the thread struct before letting the task use FP again. So in that case
> save_fpu() is free to clobber fr0 because the FP regs no longer hold live values
> for the task.
>
> There is another case though, which is the path via:
>   copy_process()
>     dup_task_struct()
>       arch_dup_task_struct()
>         flush_all_to_thread()
>           save_all()
>
> That path saves the FP regs but leaves them live. That's meant as an
> optimisation for a process that's using FP/VSX and then calls fork(); leaving
> the regs live means the parent process doesn't have to take a fault after the
> fork to get its FP regs back.
>
> That path does clobber fr0, but fr0 is volatile across a syscall, and the only
> way to reach copy_process() from userspace is via a syscall. So in normal usage
> fr0 being clobbered across a syscall shouldn't cause data corruption.
>
> Even if we handle a signal on the return from the fork() syscall, the worst that
> happens is that the task's thread struct holds the clobbered fr0, but the task
> doesn't care (because fr0 is volatile across the syscall anyway).
>
> That path is something like:
>
> system_call_vectored_common()
>   system_call_exception()
>     sys_fork()
>       kernel_clone()
>         copy_process()
>           dup_task_struct()
>             arch_dup_task_struct()
>               flush_all_to_thread()
>                 save_all()
>                   if (tsk->thread.regs->msr & MSR_FP)
>                     save_fpu()
>                     # does not clear MSR_FP from regs->msr
>   syscall_exit_prepare()
>     interrupt_exit_user_prepare_main()
>       do_notify_resume()
>         get_signal()
>         handle_rt_signal64()
>           prepare_setup_sigcontext()
>             flush_fp_to_thread()
>               if (tsk->thread.regs->msr & MSR_FP)
>                 giveup_fpu()
>                   __giveup_fpu
>                     save_fpu()
>                     # clobbered fr0 is saved, but task considers it volatile
>                     # across syscall anyway
>
>
> But we now have a new path, because io-uring can call copy_process() via
> create_io_thread() from the signal handling path. That's OK if the signal is
> handled as we return from a syscall, but it's not OK if the signal is handled
> due to some other interrupt.
>
> Which is:
>
> interrupt_return_srr_user()
>   interrupt_exit_user_prepare()
>     interrupt_exit_user_prepare_main()
>       do_notify_resume()
>         get_signal()
>           task_work_run()
>             create_worker_cb()
>               create_io_worker()
>                 copy_process()
>                   dup_task_struct()
>                     arch_dup_task_struct()
>                       flush_all_to_thread()
>                         save_all()
>                           if (tsk->thread.regs->msr & MSR_FP)
>                             save_fpu()
>                             # fr0 is clobbered and potentially live in userspace
>
>
> So tldr I think the corruption is only an issue since io-uring started doing
> the clone via signal, which I think matches the observed timeline of this bug
> appearing.
>
> Gotta run home, will have a closer look at the actual patch later on.
>
> cheers
>
>
> Timothy Pearson <tpearson@raptorengineering.com> writes:
> > During floating point and vector save to thread data fr0/vs0 are clobbered
> > by the FPSCR/VSCR store routine.  This leads to userspace register corruption
> > and application data corruption / crash under the following rare condition:
> >
> >  * A userspace thread is executing with VSX/FP mode enabled
> >  * The userspace thread is making active use of fr0 and/or vs0
> >  * An IPI is taken in kernel mode, forcing the userspace thread to reschedule
> >  * The userspace thread is interrupted by the IPI before accessing data it
> >    previously stored in fr0/vs0
> >  * The thread being switched in by the IPI has a pending signal
> >
> > If these exact criteria are met, then the following sequence happens:
> >
> >  * The existing thread FP storage is still valid before the IPI, due to a
> >    prior call to save_fpu() or store_fp_state().  Note that the current
> >    fr0/vs0 registers have been clobbered, so the FP/VSX state in registers
> >    is now invalid pending a call to restore_fp()/restore_altivec().
> >  * IPI -- FP/VSX register state remains invalid
> >  * interrupt_exit_user_prepare_main() calls do_notify_resume(),
> >    due to the pending signal
> >  * do_notify_resume() eventually calls save_fpu() via giveup_fpu(), which
> >    merrily reads and saves the invalid FP/VSX state to thread local storage.
> >  * interrupt_exit_user_prepare_main() calls restore_math(), writing the invalid
> >    FP/VSX state back to registers.
> >  * Execution is released to userspace, and the application crashes or corrupts
> >    data.
> >
> > Without the pending signal, do_notify_resume() is never called, therefore the
> > invalid register state doesn't matter as it is overwritten nearly immediately
> > by interrupt_exit_user_prepare_main() calling restore_math() before return
> > to userspace.
> >
> > Restore fr0/vs0 after FPSCR/VSCR store has completed for both the fp and
> > altivec register save paths.
> >
> > Tested under QEMU in kvm mode, running on a Talos II workstation with dual
> > POWER9 DD2.2 CPUs.
> >
> > Closes: https://lore.kernel.org/all/480932026.45576726.1699374859845.JavaMail.zimbra@raptorengineeringinc.com/
> > Closes: https://lore.kernel.org/linuxppc-dev/480221078.47953493.1700206777956.JavaMail.zimbra@raptorengineeringinc.com/
> > Tested-by: Timothy Pearson <tpearson@raptorengineering.com>
> > Tested-by: Jens Axboe <axboe@kernel.dk>
> > Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
> > ---
> >  arch/powerpc/kernel/fpu.S    | 13 +++++++++++++
> >  arch/powerpc/kernel/vector.S |  2 ++
> >  2 files changed, 15 insertions(+)
> >
> > diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
> > index 6a9acfb690c9..2f8f3f93cbb6 100644
> > --- a/arch/powerpc/kernel/fpu.S
> > +++ b/arch/powerpc/kernel/fpu.S
> > @@ -23,6 +23,15 @@
> >  #include <asm/feature-fixups.h>
> >  
> >  #ifdef CONFIG_VSX
> > +#define __REST_1FPVSR(n,c,base)						\
> > +BEGIN_FTR_SECTION							\
> > +	b	2f;							\
> > +END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
> > +	REST_FPR(n,base);						\
> > +	b	3f;							\
> > +2:	REST_VSR(n,c,base);						\
> > +3:
> > +
> >  #define __REST_32FPVSRS(n,c,base)					\
> >  BEGIN_FTR_SECTION							\
> >  	b	2f;							\
> > @@ -41,9 +50,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
> >  2:	SAVE_32VSRS(n,c,base);						\
> >  3:
> >  #else
> > +#define __REST_1FPVSR(n,b,base)		REST_FPR(n, base)
> >  #define __REST_32FPVSRS(n,b,base)	REST_32FPRS(n, base)
> >  #define __SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
> >  #endif
> > +#define REST_1FPVSR(n,c,base)   __REST_1FPVSR(n,__REG_##c,__REG_##base)
> >  #define REST_32FPVSRS(n,c,base) __REST_32FPVSRS(n,__REG_##c,__REG_##base)
> >  #define SAVE_32FPVSRS(n,c,base) __SAVE_32FPVSRS(n,__REG_##c,__REG_##base)
> >  
> > @@ -67,6 +78,7 @@ _GLOBAL(store_fp_state)
> >  	SAVE_32FPVSRS(0, R4, R3)
> >  	mffs	fr0
> >  	stfd	fr0,FPSTATE_FPSCR(r3)
> > +	REST_1FPVSR(0, R4, R3)
> >  	blr
> >  EXPORT_SYMBOL(store_fp_state)
> >  
> > @@ -138,4 +150,5 @@ _GLOBAL(save_fpu)
> >  2:	SAVE_32FPVSRS(0, R4, R6)
> >  	mffs	fr0
> >  	stfd	fr0,FPSTATE_FPSCR(r6)
> > +	REST_1FPVSR(0, R4, R6)
> >  	blr
> > diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
> > index 4094e4c4c77a..80b3f6e476b6 100644
> > --- a/arch/powerpc/kernel/vector.S
> > +++ b/arch/powerpc/kernel/vector.S
> > @@ -33,6 +33,7 @@ _GLOBAL(store_vr_state)
> >  	mfvscr	v0
> >  	li	r4, VRSTATE_VSCR
> >  	stvx	v0, r4, r3
> > +	lvx	v0, 0, r3
> >  	blr
> >  EXPORT_SYMBOL(store_vr_state)
> >  
> > @@ -109,6 +110,7 @@ _GLOBAL(save_altivec)
> >  	mfvscr	v0
> >  	li	r4,VRSTATE_VSCR
> >  	stvx	v0,r4,r7
> > +	lvx	v0,0,r7
> >  	blr
> >  
> >  #ifdef CONFIG_VSX
> > -- 
> > 2.39.2



