From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: Michael Ellerman <mpe@ellerman.id.au>,
Matthew Wilcox <willy@infradead.org>,
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: "linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
"npiggin@gmail.com" <npiggin@gmail.com>
Subject: Re: [PATCH v2] powerpc/mm: Avoid calling arch_enter/leave_lazy_mmu() in set_ptes
Date: Sat, 11 Nov 2023 10:33:50 +0000
Message-ID: <e381f776-8284-3720-53dd-7ee08878f56e@csgroup.eu>
In-Reply-To: <87bkccgz9b.fsf@mail.lhotse>

On 02/11/2023 at 12:39, Michael Ellerman wrote:
> Matthew Wilcox <willy@infradead.org> writes:
>> On Tue, Oct 24, 2023 at 08:06:04PM +0530, Aneesh Kumar K.V wrote:
>>> ptep++;
>>> - pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
>>> addr += PAGE_SIZE;
>>> + /*
>>> + * increment the pfn.
>>> + */
>>> + pte = pfn_pte(pte_pfn(pte) + 1, pte_pgprot((pte)));
>>
>> when i looked at this, it generated shit code. did you check?
>
> I didn't look ...
>
> <goes and looks>
>
> It's not super clear cut. There's some difference because pfn_pte()
> contains two extra VM_BUG_ONs.
>
> But with DEBUG_VM *off* the version using pfn_pte() generates *better*
> code, or at least less code, ~160 instructions vs ~200.
>
> For some reason the version using PTE_RPN_SHIFT seems to be byte
> swapping the pte an extra two times, each of which generates ~8
> instructions. But I can't see why.
>
> I tried a few other things and couldn't come up with anything that
> generated better code. But I'll keep poking at it tomorrow.

On PPC32 the version using PTE_RPN_SHIFT is better. Here is what the
main loop of set_ptes() looks like:
22c: 55 29 f0 be srwi r9,r9,2
230: 7d 29 03 a6 mtctr r9
234: 39 3f 10 00 addi r9,r31,4096
238: 39 1f 20 00 addi r8,r31,8192
23c: 39 5f 30 00 addi r10,r31,12288
240: 3b ff 40 00 addi r31,r31,16384
244: 91 3e 00 04 stw r9,4(r30)
248: 91 1e 00 08 stw r8,8(r30)
24c: 91 5e 00 0c stw r10,12(r30)
250: 97 fe 00 10 stwu r31,16(r30)
254: 42 00 ff e0 bdnz 234 <set_ptes+0x78>
With the version using pfn_pte(), the main loop is:
218: 54 e9 f8 7e srwi r9,r7,1
21c: 7d 29 03 a6 mtctr r9
220: 57 e9 00 26 clrrwi r9,r31,12
224: 39 29 10 00 addi r9,r9,4096
228: 57 ff 05 3e clrlwi r31,r31,20
22c: 7d 29 fb 78 or r9,r9,r31
230: 55 3f 00 26 clrrwi r31,r9,12
234: 3b ff 10 00 addi r31,r31,4096
238: 55 28 05 3e clrlwi r8,r9,20
23c: 7f ff 43 78 or r31,r31,r8
240: 91 3d 00 04 stw r9,4(r29)
244: 93 fd 00 08 stw r31,8(r29)
248: 3b bd 00 08 addi r29,r29,8
24c: 42 00 ff d4 bdnz 220 <set_ptes+0x64>
Not only is the loop bigger (12 instructions per iteration versus 9),
it is also only unrolled by 2, while the first one is unrolled by 4
(r7 and r9 contain the same value).
Therefore, although the PTE_RPN_SHIFT version is 87 instructions in
total while the pfn_pte() version is only 81, the former looks better.
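
For reference, the two ways of advancing the PTE can be sketched in
plain userspace C as below. This is only an illustration, not the
kernel code: it assumes a 32-bit PTE with PTE_RPN_SHIFT == 12 (4K
pages), which matches the 4096 increments and the 12/20-bit masks in
the listings above. The pte_t typedef, PTE_FLAGS_MASK and the helpers
next_pte_shift(), next_pte_pfn() and check_same() are hypothetical
names made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration only: 32-bit PTE, 4K pages. */
#define PTE_RPN_SHIFT	12
#define PTE_FLAGS_MASK	((UINT32_C(1) << PTE_RPN_SHIFT) - 1)

typedef uint32_t pte_t;	/* hypothetical stand-in for the kernel type */

/* PTE_RPN_SHIFT form: add one page frame to the raw PTE value. */
static pte_t next_pte_shift(pte_t pte)
{
	return pte + (UINT32_C(1) << PTE_RPN_SHIFT);
}

/* pfn_pte() form: split into pfn and flags, bump the pfn, recombine. */
static pte_t next_pte_pfn(pte_t pte)
{
	uint32_t pfn = pte >> PTE_RPN_SHIFT;
	uint32_t prot = pte & PTE_FLAGS_MASK;

	return ((pfn + 1) << PTE_RPN_SHIFT) | prot;
}

/* Walk n consecutive PTEs both ways and check the values agree. */
static void check_same(pte_t first, unsigned int n)
{
	pte_t a = first, b = first;

	while (n--) {
		a = next_pte_shift(a);
		b = next_pte_pfn(b);
		assert(a == b);
	}
}
```

Both helpers produce the same value, so the difference is only in the
generated code. The clrrwi/addi/clrlwi/or sequence in the second
listing appears to be the mask, add, mask, or of the split-and-recombine
form, which the compiler did not fold back into a single add.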
Christophe