From: Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> To: Michael Ellerman <mpe@ellerman.id.au>, Nicholas Piggin <npiggin@gmail.com>, Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Nathan Lynch <nathanl@linux.ibm.com>, Tyrel Datwyler <tyreld@linux.ibm.com>, Nick Child <nnac123@linux.ibm.com>, Andrew Donnellan <ajd@linux.ibm.com>, Scott Cheloha <cheloha@linux.ibm.com>, Laurent Dufour <ldufour@linux.ibm.com>, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas() Date: Mon, 06 Mar 2023 15:33:47 -0600 [thread overview] Message-ID: <20230220-rtas-queue-for-6-4-v1-8-010e4416f13f@linux.ibm.com> (raw) In-Reply-To: <20230220-rtas-queue-for-6-4-v1-0-010e4416f13f@linux.ibm.com> From: Nathan Lynch <nathanl@linux.ibm.com> The kernel can handle retrying RTAS function calls in response to -2/990x in the sys_rtas() handler instead of relaying the intermediate status to user space. Justifications: * Currently it's nondeterministic and quite variable in practice whether a retry status is returned for any given invocation of sys_rtas(). Therefore user space code cannot be expecting a retry result without already being broken. * This tends to significantly reduce the total number of system calls issued by programs such as drmgr which make use of sys_rtas(), improving the experience of tracing and debugging such programs. This is the main motivation for me: I think this change will make it easier for us to characterize current sys_rtas() use cases as we move them to other interfaces over time. * It reduces the number of opportunities for user space to leave complex operations, such as those associated with DLPAR, incomplete and diffcult to recover. * We can expect performance improvements for existing sys_rtas() users, not only because of overall reduction in the number of system calls issued, but also due to the better handling of -2/990x in the kernel. For example, librtas still sleeps for 1ms on -2, which is completely unnecessary. Performance differences for PHB add and remove on a small P10 PowerVM partition are included below. For add, elapsed time is slightly reduced. For remove, there are more significant improvements: the number of context switches is reduced by an order of magnitude, and elapsed time is reduced by over half. (- before, + after): Performance counter stats for 'drmgr -c phb -a -s PHB 23' (5 runs): - 1,847.58 msec task-clock # 0.135 CPUs utilized ( +- 14.15% ) - 10,867 cs # 9.800 K/sec ( +- 14.14% ) + 1,901.15 msec task-clock # 0.148 CPUs utilized ( +- 14.13% ) + 10,451 cs # 9.158 K/sec ( +- 14.14% ) - 13.656557 +- 0.000124 seconds time elapsed ( +- 0.00% ) + 12.88080 +- 0.00404 seconds time elapsed ( +- 0.03% ) Performance counter stats for 'drmgr -c phb -r -s PHB 23' (5 runs): - 1,473.75 msec task-clock # 0.092 CPUs utilized ( +- 14.15% ) - 2,652 cs # 3.000 K/sec ( +- 14.16% ) + 1,444.55 msec task-clock # 0.221 CPUs utilized ( +- 14.14% ) + 104 cs # 119.957 /sec ( +- 14.63% ) - 15.99718 +- 0.00801 seconds time elapsed ( +- 0.05% ) + 6.54256 +- 0.00830 seconds time elapsed ( +- 0.13% ) Move the existing rtas_lock-guarded critical section in sys_rtas() into a conventional rtas_busy_delay()-based loop, returning to user space only when a final success or failure result is available. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> --- arch/powerpc/kernel/rtas.c | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 47a2aa43d7d4..c330a22ccc70 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1798,7 +1798,6 @@ static bool block_rtas_call(int token, int nargs, /* We assume to be passed big endian arguments */ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) { - struct pin_cookie cookie; struct rtas_args args; unsigned long flags; char *buff_copy, *errbuf = NULL; @@ -1866,20 +1865,25 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) buff_copy = get_errorlog_buffer(); - raw_spin_lock_irqsave(&rtas_lock, flags); - cookie = lockdep_pin_lock(&rtas_lock); + do { + struct pin_cookie cookie; - rtas_args = args; - do_enter_rtas(&rtas_args); - args = rtas_args; + raw_spin_lock_irqsave(&rtas_lock, flags); + cookie = lockdep_pin_lock(&rtas_lock); - /* A -1 return code indicates that the last command couldn't - be completed due to a hardware error. */ - if (be32_to_cpu(args.rets[0]) == -1) - errbuf = __fetch_rtas_last_error(buff_copy); + rtas_args = args; + do_enter_rtas(&rtas_args); + args = rtas_args; - lockdep_unpin_lock(&rtas_lock, cookie); - raw_spin_unlock_irqrestore(&rtas_lock, flags); + /* + * Handle error record retrieval before releasing the lock. + */ + if (be32_to_cpu(args.rets[0]) == -1) + errbuf = __fetch_rtas_last_error(buff_copy); + + lockdep_unpin_lock(&rtas_lock, cookie); + raw_spin_unlock_irqrestore(&rtas_lock, flags); + } while (rtas_busy_delay(be32_to_cpu(args.rets[0]))); if (buff_copy) { if (errbuf) -- 2.39.1
WARNING: multiple messages have this Message-ID (diff)
From: Nathan Lynch <nathanl@linux.ibm.com> To: Michael Ellerman <mpe@ellerman.id.au>, Nicholas Piggin <npiggin@gmail.com>, Christophe Leroy <christophe.leroy@csgroup.eu> Cc: linuxppc-dev@lists.ozlabs.org, Tyrel Datwyler <tyreld@linux.ibm.com>, Scott Cheloha <cheloha@linux.ibm.com>, Andrew Donnellan <ajd@linux.ibm.com>, Nick Child <nnac123@linux.ibm.com>, Laurent Dufour <ldufour@linux.ibm.com>, Nathan Lynch <nathanl@linux.ibm.com> Subject: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas() Date: Mon, 06 Mar 2023 15:33:47 -0600 [thread overview] Message-ID: <20230220-rtas-queue-for-6-4-v1-8-010e4416f13f@linux.ibm.com> (raw) In-Reply-To: <20230220-rtas-queue-for-6-4-v1-0-010e4416f13f@linux.ibm.com> The kernel can handle retrying RTAS function calls in response to -2/990x in the sys_rtas() handler instead of relaying the intermediate status to user space. Justifications: * Currently it's nondeterministic and quite variable in practice whether a retry status is returned for any given invocation of sys_rtas(). Therefore user space code cannot be expecting a retry result without already being broken. * This tends to significantly reduce the total number of system calls issued by programs such as drmgr which make use of sys_rtas(), improving the experience of tracing and debugging such programs. This is the main motivation for me: I think this change will make it easier for us to characterize current sys_rtas() use cases as we move them to other interfaces over time. * It reduces the number of opportunities for user space to leave complex operations, such as those associated with DLPAR, incomplete and diffcult to recover. * We can expect performance improvements for existing sys_rtas() users, not only because of overall reduction in the number of system calls issued, but also due to the better handling of -2/990x in the kernel. For example, librtas still sleeps for 1ms on -2, which is completely unnecessary. Performance differences for PHB add and remove on a small P10 PowerVM partition are included below. For add, elapsed time is slightly reduced. For remove, there are more significant improvements: the number of context switches is reduced by an order of magnitude, and elapsed time is reduced by over half. (- before, + after): Performance counter stats for 'drmgr -c phb -a -s PHB 23' (5 runs): - 1,847.58 msec task-clock # 0.135 CPUs utilized ( +- 14.15% ) - 10,867 cs # 9.800 K/sec ( +- 14.14% ) + 1,901.15 msec task-clock # 0.148 CPUs utilized ( +- 14.13% ) + 10,451 cs # 9.158 K/sec ( +- 14.14% ) - 13.656557 +- 0.000124 seconds time elapsed ( +- 0.00% ) + 12.88080 +- 0.00404 seconds time elapsed ( +- 0.03% ) Performance counter stats for 'drmgr -c phb -r -s PHB 23' (5 runs): - 1,473.75 msec task-clock # 0.092 CPUs utilized ( +- 14.15% ) - 2,652 cs # 3.000 K/sec ( +- 14.16% ) + 1,444.55 msec task-clock # 0.221 CPUs utilized ( +- 14.14% ) + 104 cs # 119.957 /sec ( +- 14.63% ) - 15.99718 +- 0.00801 seconds time elapsed ( +- 0.05% ) + 6.54256 +- 0.00830 seconds time elapsed ( +- 0.13% ) Move the existing rtas_lock-guarded critical section in sys_rtas() into a conventional rtas_busy_delay()-based loop, returning to user space only when a final success or failure result is available. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> --- arch/powerpc/kernel/rtas.c | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 47a2aa43d7d4..c330a22ccc70 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1798,7 +1798,6 @@ static bool block_rtas_call(int token, int nargs, /* We assume to be passed big endian arguments */ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) { - struct pin_cookie cookie; struct rtas_args args; unsigned long flags; char *buff_copy, *errbuf = NULL; @@ -1866,20 +1865,25 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) buff_copy = get_errorlog_buffer(); - raw_spin_lock_irqsave(&rtas_lock, flags); - cookie = lockdep_pin_lock(&rtas_lock); + do { + struct pin_cookie cookie; - rtas_args = args; - do_enter_rtas(&rtas_args); - args = rtas_args; + raw_spin_lock_irqsave(&rtas_lock, flags); + cookie = lockdep_pin_lock(&rtas_lock); - /* A -1 return code indicates that the last command couldn't - be completed due to a hardware error. */ - if (be32_to_cpu(args.rets[0]) == -1) - errbuf = __fetch_rtas_last_error(buff_copy); + rtas_args = args; + do_enter_rtas(&rtas_args); + args = rtas_args; - lockdep_unpin_lock(&rtas_lock, cookie); - raw_spin_unlock_irqrestore(&rtas_lock, flags); + /* + * Handle error record retrieval before releasing the lock. + */ + if (be32_to_cpu(args.rets[0]) == -1) + errbuf = __fetch_rtas_last_error(buff_copy); + + lockdep_unpin_lock(&rtas_lock, cookie); + raw_spin_unlock_irqrestore(&rtas_lock, flags); + } while (rtas_busy_delay(be32_to_cpu(args.rets[0]))); if (buff_copy) { if (errbuf) -- 2.39.1
next prev parent reply other threads:[~2023-03-06 21:38 UTC|newest] Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top 2023-03-06 21:33 [PATCH 0/8] RTAS changes for 6.4 Nathan Lynch via B4 Relay 2023-03-06 21:33 ` Nathan Lynch 2023-03-06 21:33 ` [PATCH 1/8] powerpc/rtas: ensure 8-byte alignment for struct rtas_args Nathan Lynch 2023-03-06 21:33 ` Nathan Lynch via B4 Relay 2023-03-23 4:00 ` Andrew Donnellan 2023-03-06 21:33 ` [PATCH 2/8] powerpc/rtas: use memmove for potentially overlapping buffer copy Nathan Lynch 2023-03-06 21:33 ` Nathan Lynch via B4 Relay 2023-03-23 4:09 ` Andrew Donnellan 2023-03-06 21:33 ` [PATCH 3/8] powerpc/rtas: rtas_call_unlocked() kerneldoc Nathan Lynch via B4 Relay 2023-03-06 21:33 ` Nathan Lynch 2023-03-23 4:15 ` Andrew Donnellan 2023-03-06 21:33 ` [PATCH 4/8] powerpc/rtas: fix miswording in rtas_function kerneldoc Nathan Lynch via B4 Relay 2023-03-06 21:33 ` Nathan Lynch 2023-03-23 0:17 ` Andrew Donnellan 2023-03-06 21:33 ` [PATCH 5/8] powerpc/rtas: rename va_rtas_call_unlocked() to va_rtas_call() Nathan Lynch 2023-03-06 21:33 ` Nathan Lynch via B4 Relay 2023-03-23 4:17 ` Andrew Donnellan 2023-03-23 16:11 ` Nathan Lynch 2023-03-29 12:24 ` Michael Ellerman 2023-03-06 21:33 ` [PATCH 6/8] powerpc/rtas: lockdep annotations Nathan Lynch 2023-03-06 21:33 ` Nathan Lynch via B4 Relay 2023-03-23 6:01 ` Andrew Donnellan 2023-03-06 21:33 ` [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked() Nathan Lynch via B4 Relay 2023-03-06 21:33 ` Nathan Lynch 2023-03-23 4:25 ` Andrew Donnellan 2023-03-23 12:17 ` Nathan Lynch 2023-03-24 0:56 ` Nathan Lynch 2023-03-29 12:20 ` Michael Ellerman 2023-03-29 16:23 ` Nathan Lynch 2023-03-06 21:33 ` Nathan Lynch via B4 Relay [this message] 2023-03-06 21:33 ` [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas() Nathan Lynch 2023-03-23 6:26 ` Andrew Donnellan 2023-03-23 19:39 ` Nathan Lynch 2023-03-23 9:44 ` Michael Ellerman 2023-03-23 13:40 ` Nathan Lynch 2024-01-25 15:55 ` Christophe Leroy 2024-01-25 16:33 ` Nathan Lynch 2024-01-25 16:46 ` Christophe Leroy 2024-01-25 17:23 ` Nathan Lynch 2023-04-06 1:09 ` (subset) [PATCH 0/8] RTAS changes for 6.4 Michael Ellerman 2023-04-26 12:12 ` Michael Ellerman
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20230220-rtas-queue-for-6-4-v1-8-010e4416f13f@linux.ibm.com \ --to=devnull+nathanl.linux.ibm.com@kernel.org \ --cc=ajd@linux.ibm.com \ --cc=cheloha@linux.ibm.com \ --cc=christophe.leroy@csgroup.eu \ --cc=ldufour@linux.ibm.com \ --cc=linuxppc-dev@lists.ozlabs.org \ --cc=mpe@ellerman.id.au \ --cc=nathanl@linux.ibm.com \ --cc=nnac123@linux.ibm.com \ --cc=npiggin@gmail.com \ --cc=tyreld@linux.ibm.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.