All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
To: Michael Ellerman <mpe@ellerman.id.au>,
	 Nicholas Piggin <npiggin@gmail.com>,
	 Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Nathan Lynch <nathanl@linux.ibm.com>,
	Tyrel Datwyler <tyreld@linux.ibm.com>,
	Nick Child <nnac123@linux.ibm.com>,
	Andrew Donnellan <ajd@linux.ibm.com>,
	Scott Cheloha <cheloha@linux.ibm.com>,
	Laurent Dufour <ldufour@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()
Date: Mon, 06 Mar 2023 15:33:47 -0600	[thread overview]
Message-ID: <20230220-rtas-queue-for-6-4-v1-8-010e4416f13f@linux.ibm.com> (raw)
In-Reply-To: <20230220-rtas-queue-for-6-4-v1-0-010e4416f13f@linux.ibm.com>

From: Nathan Lynch <nathanl@linux.ibm.com>

The kernel can handle retrying RTAS function calls in response to
-2/990x in the sys_rtas() handler instead of relaying the intermediate
status to user space.

Justifications:

* Currently it's nondeterministic and quite variable in practice
  whether a retry status is returned for any given invocation of
  sys_rtas(). Therefore user space code cannot be expecting a retry
  result without already being broken.

* This tends to significantly reduce the total number of system calls
  issued by programs such as drmgr which make use of sys_rtas(),
  improving the experience of tracing and debugging such
  programs. This is the main motivation for me: I think this change
  will make it easier for us to characterize current sys_rtas() use
  cases as we move them to other interfaces over time.

* It reduces the number of opportunities for user space to leave
  complex operations, such as those associated with DLPAR, incomplete
  and diffcult to recover.

* We can expect performance improvements for existing sys_rtas()
  users, not only because of overall reduction in the number of system
  calls issued, but also due to the better handling of -2/990x in the
  kernel. For example, librtas still sleeps for 1ms on -2, which is
  completely unnecessary.

Performance differences for PHB add and remove on a small P10 PowerVM
partition are included below. For add, elapsed time is slightly
reduced. For remove, there are more significant improvements: the
number of context switches is reduced by an order of magnitude, and
elapsed time is reduced by over half.

(- before, + after):

  Performance counter stats for 'drmgr -c phb -a -s PHB 23' (5 runs):

-          1,847.58 msec task-clock                       #    0.135 CPUs utilized               ( +- 14.15% )
-            10,867      cs                               #    9.800 K/sec                       ( +- 14.14% )
+          1,901.15 msec task-clock                       #    0.148 CPUs utilized               ( +- 14.13% )
+            10,451      cs                               #    9.158 K/sec                       ( +- 14.14% )

-         13.656557 +- 0.000124 seconds time elapsed  ( +-  0.00% )
+          12.88080 +- 0.00404 seconds time elapsed  ( +-  0.03% )

  Performance counter stats for 'drmgr -c phb -r -s PHB 23' (5 runs):

-          1,473.75 msec task-clock                       #    0.092 CPUs utilized               ( +- 14.15% )
-             2,652      cs                               #    3.000 K/sec                       ( +- 14.16% )
+          1,444.55 msec task-clock                       #    0.221 CPUs utilized               ( +- 14.14% )
+               104      cs                               #  119.957 /sec                        ( +- 14.63% )

-          15.99718 +- 0.00801 seconds time elapsed  ( +-  0.05% )
+           6.54256 +- 0.00830 seconds time elapsed  ( +-  0.13% )

Move the existing rtas_lock-guarded critical section in sys_rtas()
into a conventional rtas_busy_delay()-based loop, returning to user
space only when a final success or failure result is available.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 47a2aa43d7d4..c330a22ccc70 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1798,7 +1798,6 @@ static bool block_rtas_call(int token, int nargs,
 /* We assume to be passed big endian arguments */
 SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 {
-	struct pin_cookie cookie;
 	struct rtas_args args;
 	unsigned long flags;
 	char *buff_copy, *errbuf = NULL;
@@ -1866,20 +1865,25 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 
 	buff_copy = get_errorlog_buffer();
 
-	raw_spin_lock_irqsave(&rtas_lock, flags);
-	cookie = lockdep_pin_lock(&rtas_lock);
+	do {
+		struct pin_cookie cookie;
 
-	rtas_args = args;
-	do_enter_rtas(&rtas_args);
-	args = rtas_args;
+		raw_spin_lock_irqsave(&rtas_lock, flags);
+		cookie = lockdep_pin_lock(&rtas_lock);
 
-	/* A -1 return code indicates that the last command couldn't
-	   be completed due to a hardware error. */
-	if (be32_to_cpu(args.rets[0]) == -1)
-		errbuf = __fetch_rtas_last_error(buff_copy);
+		rtas_args = args;
+		do_enter_rtas(&rtas_args);
+		args = rtas_args;
 
-	lockdep_unpin_lock(&rtas_lock, cookie);
-	raw_spin_unlock_irqrestore(&rtas_lock, flags);
+		/*
+		 * Handle error record retrieval before releasing the lock.
+		 */
+		if (be32_to_cpu(args.rets[0]) == -1)
+			errbuf = __fetch_rtas_last_error(buff_copy);
+
+		lockdep_unpin_lock(&rtas_lock, cookie);
+		raw_spin_unlock_irqrestore(&rtas_lock, flags);
+	} while (rtas_busy_delay(be32_to_cpu(args.rets[0])));
 
 	if (buff_copy) {
 		if (errbuf)

-- 
2.39.1


WARNING: multiple messages have this Message-ID (diff)
From: Nathan Lynch <nathanl@linux.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>,
	 Nicholas Piggin <npiggin@gmail.com>,
	 Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linuxppc-dev@lists.ozlabs.org,
	Tyrel Datwyler <tyreld@linux.ibm.com>,
	 Scott Cheloha <cheloha@linux.ibm.com>,
	Andrew Donnellan <ajd@linux.ibm.com>,
	 Nick Child <nnac123@linux.ibm.com>,
	Laurent Dufour <ldufour@linux.ibm.com>,
	 Nathan Lynch <nathanl@linux.ibm.com>
Subject: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()
Date: Mon, 06 Mar 2023 15:33:47 -0600	[thread overview]
Message-ID: <20230220-rtas-queue-for-6-4-v1-8-010e4416f13f@linux.ibm.com> (raw)
In-Reply-To: <20230220-rtas-queue-for-6-4-v1-0-010e4416f13f@linux.ibm.com>

The kernel can handle retrying RTAS function calls in response to
-2/990x in the sys_rtas() handler instead of relaying the intermediate
status to user space.

Justifications:

* Currently it's nondeterministic and quite variable in practice
  whether a retry status is returned for any given invocation of
  sys_rtas(). Therefore user space code cannot be expecting a retry
  result without already being broken.

* This tends to significantly reduce the total number of system calls
  issued by programs such as drmgr which make use of sys_rtas(),
  improving the experience of tracing and debugging such
  programs. This is the main motivation for me: I think this change
  will make it easier for us to characterize current sys_rtas() use
  cases as we move them to other interfaces over time.

* It reduces the number of opportunities for user space to leave
  complex operations, such as those associated with DLPAR, incomplete
  and diffcult to recover.

* We can expect performance improvements for existing sys_rtas()
  users, not only because of overall reduction in the number of system
  calls issued, but also due to the better handling of -2/990x in the
  kernel. For example, librtas still sleeps for 1ms on -2, which is
  completely unnecessary.

Performance differences for PHB add and remove on a small P10 PowerVM
partition are included below. For add, elapsed time is slightly
reduced. For remove, there are more significant improvements: the
number of context switches is reduced by an order of magnitude, and
elapsed time is reduced by over half.

(- before, + after):

  Performance counter stats for 'drmgr -c phb -a -s PHB 23' (5 runs):

-          1,847.58 msec task-clock                       #    0.135 CPUs utilized               ( +- 14.15% )
-            10,867      cs                               #    9.800 K/sec                       ( +- 14.14% )
+          1,901.15 msec task-clock                       #    0.148 CPUs utilized               ( +- 14.13% )
+            10,451      cs                               #    9.158 K/sec                       ( +- 14.14% )

-         13.656557 +- 0.000124 seconds time elapsed  ( +-  0.00% )
+          12.88080 +- 0.00404 seconds time elapsed  ( +-  0.03% )

  Performance counter stats for 'drmgr -c phb -r -s PHB 23' (5 runs):

-          1,473.75 msec task-clock                       #    0.092 CPUs utilized               ( +- 14.15% )
-             2,652      cs                               #    3.000 K/sec                       ( +- 14.16% )
+          1,444.55 msec task-clock                       #    0.221 CPUs utilized               ( +- 14.14% )
+               104      cs                               #  119.957 /sec                        ( +- 14.63% )

-          15.99718 +- 0.00801 seconds time elapsed  ( +-  0.05% )
+           6.54256 +- 0.00830 seconds time elapsed  ( +-  0.13% )

Move the existing rtas_lock-guarded critical section in sys_rtas()
into a conventional rtas_busy_delay()-based loop, returning to user
space only when a final success or failure result is available.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 47a2aa43d7d4..c330a22ccc70 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1798,7 +1798,6 @@ static bool block_rtas_call(int token, int nargs,
 /* We assume to be passed big endian arguments */
 SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 {
-	struct pin_cookie cookie;
 	struct rtas_args args;
 	unsigned long flags;
 	char *buff_copy, *errbuf = NULL;
@@ -1866,20 +1865,25 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 
 	buff_copy = get_errorlog_buffer();
 
-	raw_spin_lock_irqsave(&rtas_lock, flags);
-	cookie = lockdep_pin_lock(&rtas_lock);
+	do {
+		struct pin_cookie cookie;
 
-	rtas_args = args;
-	do_enter_rtas(&rtas_args);
-	args = rtas_args;
+		raw_spin_lock_irqsave(&rtas_lock, flags);
+		cookie = lockdep_pin_lock(&rtas_lock);
 
-	/* A -1 return code indicates that the last command couldn't
-	   be completed due to a hardware error. */
-	if (be32_to_cpu(args.rets[0]) == -1)
-		errbuf = __fetch_rtas_last_error(buff_copy);
+		rtas_args = args;
+		do_enter_rtas(&rtas_args);
+		args = rtas_args;
 
-	lockdep_unpin_lock(&rtas_lock, cookie);
-	raw_spin_unlock_irqrestore(&rtas_lock, flags);
+		/*
+		 * Handle error record retrieval before releasing the lock.
+		 */
+		if (be32_to_cpu(args.rets[0]) == -1)
+			errbuf = __fetch_rtas_last_error(buff_copy);
+
+		lockdep_unpin_lock(&rtas_lock, cookie);
+		raw_spin_unlock_irqrestore(&rtas_lock, flags);
+	} while (rtas_busy_delay(be32_to_cpu(args.rets[0])));
 
 	if (buff_copy) {
 		if (errbuf)

-- 
2.39.1


  parent reply	other threads:[~2023-03-06 21:38 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-06 21:33 [PATCH 0/8] RTAS changes for 6.4 Nathan Lynch via B4 Relay
2023-03-06 21:33 ` Nathan Lynch
2023-03-06 21:33 ` [PATCH 1/8] powerpc/rtas: ensure 8-byte alignment for struct rtas_args Nathan Lynch
2023-03-06 21:33   ` Nathan Lynch via B4 Relay
2023-03-23  4:00   ` Andrew Donnellan
2023-03-06 21:33 ` [PATCH 2/8] powerpc/rtas: use memmove for potentially overlapping buffer copy Nathan Lynch
2023-03-06 21:33   ` Nathan Lynch via B4 Relay
2023-03-23  4:09   ` Andrew Donnellan
2023-03-06 21:33 ` [PATCH 3/8] powerpc/rtas: rtas_call_unlocked() kerneldoc Nathan Lynch via B4 Relay
2023-03-06 21:33   ` Nathan Lynch
2023-03-23  4:15   ` Andrew Donnellan
2023-03-06 21:33 ` [PATCH 4/8] powerpc/rtas: fix miswording in rtas_function kerneldoc Nathan Lynch via B4 Relay
2023-03-06 21:33   ` Nathan Lynch
2023-03-23  0:17   ` Andrew Donnellan
2023-03-06 21:33 ` [PATCH 5/8] powerpc/rtas: rename va_rtas_call_unlocked() to va_rtas_call() Nathan Lynch
2023-03-06 21:33   ` Nathan Lynch via B4 Relay
2023-03-23  4:17   ` Andrew Donnellan
2023-03-23 16:11     ` Nathan Lynch
2023-03-29 12:24   ` Michael Ellerman
2023-03-06 21:33 ` [PATCH 6/8] powerpc/rtas: lockdep annotations Nathan Lynch
2023-03-06 21:33   ` Nathan Lynch via B4 Relay
2023-03-23  6:01   ` Andrew Donnellan
2023-03-06 21:33 ` [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked() Nathan Lynch via B4 Relay
2023-03-06 21:33   ` Nathan Lynch
2023-03-23  4:25   ` Andrew Donnellan
2023-03-23 12:17     ` Nathan Lynch
2023-03-24  0:56       ` Nathan Lynch
2023-03-29 12:20         ` Michael Ellerman
2023-03-29 16:23           ` Nathan Lynch
2023-03-06 21:33 ` Nathan Lynch via B4 Relay [this message]
2023-03-06 21:33   ` [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas() Nathan Lynch
2023-03-23  6:26   ` Andrew Donnellan
2023-03-23 19:39     ` Nathan Lynch
2023-03-23  9:44   ` Michael Ellerman
2023-03-23 13:40     ` Nathan Lynch
2024-01-25 15:55   ` Christophe Leroy
2024-01-25 16:33     ` Nathan Lynch
2024-01-25 16:46       ` Christophe Leroy
2024-01-25 17:23         ` Nathan Lynch
2023-04-06  1:09 ` (subset) [PATCH 0/8] RTAS changes for 6.4 Michael Ellerman
2023-04-26 12:12 ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230220-rtas-queue-for-6-4-v1-8-010e4416f13f@linux.ibm.com \
    --to=devnull+nathanl.linux.ibm.com@kernel.org \
    --cc=ajd@linux.ibm.com \
    --cc=cheloha@linux.ibm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=ldufour@linux.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=nathanl@linux.ibm.com \
    --cc=nnac123@linux.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=tyreld@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.