From: Joao Martins <joao.m.martins@oracle.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	Mike Kravetz <mike.kravetz@oracle.com>,
	mpe@ellerman.id.au, Dan Williams <dan.j.williams@intel.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	npiggin@gmail.com, linuxppc-dev@lists.ozlabs.org,
	Muchun Song <muchun.song@linux.dev>,
	Will Deacon <will@kernel.org>,
	christophe.leroy@csgroup.eu
Subject: Re: [PATCH v2 08/16] mm/vmemmap: Improve vmemmap_can_optimize and allow architectures to override
Date: Tue, 20 Jun 2023 12:53:21 +0100	[thread overview]
Message-ID: <ed1057ce-2d8d-1053-9f54-2801cfed9de4@oracle.com> (raw)
In-Reply-To: <20230616110826.344417-9-aneesh.kumar@linux.ibm.com>

On 16/06/2023 12:08, Aneesh Kumar K.V wrote:
> dax vmemmap optimization requires a minimum of 2 PAGE_SIZE area within
> vmemmap such that tail page mapping can point to the second PAGE_SIZE area.
> Enforce that in vmemmap_can_optimize() function.
> 
> Architectures like powerpc also want to enable vmemmap optimization
> conditionally (only with radix MMU translation). Hence allow architecture
> override.
> 
This makes sense. The enforcement here is not just for correctness, but presumably
also because you want to use VMEMMAP_RESERVE_NR?
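
For the record, the arithmetic behind the minimum (a sketch assuming 4K PAGE_SIZE
and a 64-byte struct page, both of which are config-dependent): a 2M PMD-sized
devdax page has 512 struct pages, i.e. 512 * 64 = 32K of memmap, or 8 vmemmap
pages. The optimization keeps the first vmemmap page (head plus initial tails)
and a second page that all remaining tail PTEs alias, hence nr_vmemmap_pages
must exceed VMEMMAP_RESERVE_NR (2) for there to be anything to deduplicate.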

I would suggest having two patches: one for the refactor and another for the
override, but I don't feel particularly strongly about it.

> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  include/linux/mm.h | 30 ++++++++++++++++++++++++++----
>  mm/mm_init.c       |  2 +-
>  2 files changed, 27 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 27ce77080c79..9a45e61cd83f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -31,6 +31,8 @@
>  #include <linux/memremap.h>
>  #include <linux/slab.h>
>  
> +#include <asm/page.h>
> +

Why is this include needed?

>  struct mempolicy;
>  struct anon_vma;
>  struct anon_vma_chain;
> @@ -3550,13 +3552,33 @@ void vmemmap_free(unsigned long start, unsigned long end,
>  		struct vmem_altmap *altmap);
>  #endif
>  
> +#define VMEMMAP_RESERVE_NR	2

see below

>  #ifdef CONFIG_ARCH_WANT_OPTIMIZE_VMEMMAP
> -static inline bool vmemmap_can_optimize(struct vmem_altmap *altmap,
> -					   struct dev_pagemap *pgmap)
> +static inline bool __vmemmap_can_optimize(struct vmem_altmap *altmap,
> +					  struct dev_pagemap *pgmap)
>  {
> -	return is_power_of_2(sizeof(struct page)) &&
> -		pgmap && (pgmap_vmemmap_nr(pgmap) > 1) && !altmap;
> +	if (pgmap) {
> +		unsigned long nr_pages;
> +		unsigned long nr_vmemmap_pages;
> +
> +		nr_pages = pgmap_vmemmap_nr(pgmap);
> +		nr_vmemmap_pages = ((nr_pages * sizeof(struct page)) >> PAGE_SHIFT);
> +		/*
> +		 * For vmemmap optimization with DAX we need minimum 2 vmemmap



> +		 * pages. See layout diagram in Documentation/mm/vmemmap_dedup.rst
> +		 */
> +		return is_power_of_2(sizeof(struct page)) &&
> +			(nr_vmemmap_pages > VMEMMAP_RESERVE_NR) && !altmap;
> +	}

It would be more readable (i.e. less indentation) if you just reverse this:

	unsigned long nr_vmemmap_pages;

	if (!pgmap || !is_power_of_2(sizeof(struct page)))
		return false;

	nr_vmemmap_pages = ((pgmap_vmemmap_nr(pgmap) *
			     sizeof(struct page)) >> PAGE_SHIFT);

	/*
	 * For vmemmap optimization with DAX we need minimum 2 vmemmap
	 * pages. See layout diagram in Documentation/mm/vmemmap_dedup.rst
	 */
	return (nr_vmemmap_pages > VMEMMAP_RESERVE_NR) && !altmap;


> +	return false;
>  }
> +/*
> + * If we don't have an architecture override, use the generic rule
> + */
> +#ifndef vmemmap_can_optimize
> +#define vmemmap_can_optimize __vmemmap_can_optimize
> +#endif
> +

The sparse-vmemmap code is trivial to change to dedup down to a single vmemmap
page (e.g. to align with hugetlb), and hopefully the architecture override allows
for that too. Which is to say: should VMEMMAP_RESERVE_NR get a similar override
to the one above?
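
Concretely, a minimal sketch of what I mean (hypothetical, not part of this
series), mirroring the vmemmap_can_optimize fallback:

	/*
	 * An arch header could define its own VMEMMAP_RESERVE_NR (e.g. 1
	 * for hugetlb-style single-page dedup) before mm.h supplies the
	 * generic default.
	 */
	#ifndef VMEMMAP_RESERVE_NR
	#define VMEMMAP_RESERVE_NR	2
	#endif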

>  #else
>  static inline bool vmemmap_can_optimize(struct vmem_altmap *altmap,
>  					   struct dev_pagemap *pgmap)
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 7f7f9c677854..d1676afc94f1 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1020,7 +1020,7 @@ static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
>  	if (!vmemmap_can_optimize(altmap, pgmap))
>  		return pgmap_vmemmap_nr(pgmap);
>  
> -	return 2 * (PAGE_SIZE / sizeof(struct page));
> +	return VMEMMAP_RESERVE_NR * (PAGE_SIZE / sizeof(struct page));
>  }
>  
>  static void __ref memmap_init_compound(struct page *head,
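
The mm_init.c change looks right to me. As a sanity check (assuming the common
4K PAGE_SIZE and 64-byte struct page, both of which vary by config),
compound_nr_pages() now returns 2 * (4096 / 64) = 128, i.e. only the 128 struct
pages backed by the two reserved vmemmap pages get individually initialized by
memmap_init_compound().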

