All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Nicholas Piggin" <npiggin@gmail.com>
To: "Aditya Gupta" <adityag@linux.ibm.com>, <linuxppc-dev@ozlabs.org>,
	<mpe@ellerman.id.au>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>,
	"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	Hari Bathini <hbathini@linux.ibm.com>
Subject: Re: [RFC PATCH v2 1/1] powerpc: update ppc_save_regs to save current r1 in pt_regs
Date: Thu, 15 Jun 2023 22:10:01 +1000	[thread overview]
Message-ID: <CTD7OOB3NTH8.1QSVBACQ2VI3V@wheely> (raw)
In-Reply-To: <20230615091047.90433-1-adityag@linux.ibm.com>

On Thu Jun 15, 2023 at 7:10 PM AEST, Aditya Gupta wrote:
> ppc_save_regs() skips one stack frame while saving the CPU register states.
> Instead of saving current R1, it pulls the previous stack frame pointer.
>
> When vmcores caused by direct panic call (such as `echo c >
> /proc/sysrq-trigger`), are debugged with gdb, gdb fails to show the
> backtrace correctly. On further analysis, it was found that it was because
> of mismatch between r1 and NIP.
>
> GDB uses NIP to get current function symbol and uses corresponding debug
> info of that function to unwind previous frames, but due to the
> mismatching r1 and NIP, the unwinding does not work, and it fails to
> unwind to the 2nd frame and hence does not show the backtrace.
>
> GDB backtrace with vmcore of kernel without this patch:
>
> ---------
> (gdb) bt
>  #0  0xc0000000002a53e8 in crash_setup_regs (oldregs=<optimized out>,
>     newregs=0xc000000004f8f8d8) at ./arch/powerpc/include/asm/kexec.h:69
>  #1  __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:974
>  #2  0x0000000000000063 in ?? ()
>  #3  0xc000000003579320 in ?? ()
> ---------
>
> Further analysis revealed that the mismatch occurred because
> "ppc_save_regs" was saving the previous stack's SP instead of the current
> r1. This patch fixes this by storing current r1 in the saved pt_regs.
>
> GDB backtrace with vmcore of patched kernel:
>
> --------
> (gdb) bt
>  #0  0xc0000000002a53e8 in crash_setup_regs (oldregs=0x0, newregs=0xc00000000670b8d8)
>     at ./arch/powerpc/include/asm/kexec.h:69
>  #1  __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:974
>  #2  0xc000000000168918 in panic (fmt=fmt@entry=0xc000000001654a60 "sysrq triggered crash\n")
>     at kernel/panic.c:358
>  #3  0xc000000000b735f8 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:155
>  #4  0xc000000000b742cc in __handle_sysrq (key=key@entry=99, check_mask=check_mask@entry=false)
>     at drivers/tty/sysrq.c:602
>  #5  0xc000000000b7506c in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>,
>     count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1163
>  #6  0xc00000000069a7bc in pde_write (ppos=<optimized out>, count=<optimized out>,
>     buf=<optimized out>, file=<optimized out>, pde=0xc00000000362cb40) at fs/proc/inode.c:340
>  #7  proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>,
>     ppos=<optimized out>) at fs/proc/inode.c:352
>  #8  0xc0000000005b3bbc in vfs_write (file=file@entry=0xc000000006aa6b00,
>     buf=buf@entry=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>,
>     count=count@entry=2, pos=pos@entry=0xc00000000670bda0) at fs/read_write.c:582
>  #9  0xc0000000005b4264 in ksys_write (fd=<optimized out>,
>     buf=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=2)
>     at fs/read_write.c:637
>  #10 0xc00000000002ea2c in system_call_exception (regs=0xc00000000670be80, r0=<optimized out>)
>     at arch/powerpc/kernel/syscall.c:171
>  #11 0xc00000000000c270 in system_call_vectored_common ()
>     at arch/powerpc/kernel/interrupt_64.S:192
> --------
>
> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>

So this now saves regs as though it was an interrupt taken in the
caller, at the instruction after the call to ppc_save_regs, whereas
previously the NIP was there, but R1 came from the caller's caller
and that mismatch is what causes gdb's dwarf unwinder to go haywire.

Nice catch, and I think I follow the fix and I think I agree with it.
Before the bug was introduced, NIP also came from the grandparent.
Which is what xmon presumably wanted, but since then I think maybe it
makes more sense to just have the parent caller.

Reivewed-by: Nicholas Piggin <npiggin@gmail.com>
Fixes: d16a58f8854b1 ("powerpc: Improve ppc_save_regs()")

Thanks,
Nick

  reply	other threads:[~2023-06-15 12:11 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-15  9:10 [RFC PATCH v2 1/1] powerpc: update ppc_save_regs to save current r1 in pt_regs Aditya Gupta
2023-06-15 12:10 ` Nicholas Piggin [this message]
2023-06-19  3:48   ` Aditya Gupta
2023-07-03  5:26 ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CTD7OOB3NTH8.1QSVBACQ2VI3V@wheely \
    --to=npiggin@gmail.com \
    --cc=adityag@linux.ibm.com \
    --cc=hbathini@linux.ibm.com \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=mahesh@linux.ibm.com \
    --cc=mpe@ellerman.id.au \
    --cc=naveen.n.rao@linux.vnet.ibm.com \
    --cc=sourabhjain@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.