Hi,
This patch series addresses the VM_READ_CAPS/VM_WRITE_CAPS flags issue: https://git.morello-project.org/morello/kernel/linux/-/issues/36
io_uring system uses buffers shared with userspace to read the io events and report their results. The structs that populate the submission and completion queues can contain capabilities. Shared mappings don't have the Load/Store capabilities permission to avoid leaking capabilities outside their original address space, so add two new VM flags that would allow the kernel to set up such mappings.
While at it, also fix pte_modify to allow setting PTE_*_CAPS flags, add new the new rc/wc smaps flags, and remove the automatic addition of PTE_*_CAPS to user mappings.
To note: this wouldn't allow userspace to make arbitrary shared mappings with tag access, the new VM flags would be for internal use only for the time being.
Review branch: https://git.morello-project.org/tudcre01/linux/-/commits/vm_rw_caps_v1/
Thanks, Tudor
--- Tudor Cretu (3): arm64: pgtable: Allow PTE_*_CAPS in the pte_modify mask arm64: mm: Add VM_READ_CAPS and VM_WRITE_CAPS flags arm64: mm: Use VM_*_CAPS instead of enabling PTE_*_CAPS to default user mappings
arch/arm64/Kconfig | 1 + arch/arm64/include/asm/mman.h | 19 +++++++++++++++++-- arch/arm64/include/asm/page.h | 2 +- arch/arm64/include/asm/pgtable-prot.h | 12 +++++------- arch/arm64/include/asm/pgtable.h | 2 +- fs/proc/task_mmu.c | 4 ++++ include/linux/mm.h | 9 +++++++++ 7 files changed, 38 insertions(+), 11 deletions(-)
Don't exclude PTE_READ_CAPS and PTE_WRITE_CAPS when calling pte_modify.
Signed-off-by: Tudor Cretu tudor.cretu@arm.com --- arch/arm64/include/asm/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index dff2b483ea50..642f561e55a9 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -761,7 +761,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) */ const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY | PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP | - PTE_ATTRINDX_MASK; + PTE_ATTRINDX_MASK | PTE_LOAD_CAPS | PTE_STORE_CAPS; /* preserve the hardware dirty information */ if (pte_hw_dirty(pte)) pte = pte_mkdirty(pte);
On 02/11/2022 12:08, Tudor Cretu wrote:
Don't exclude PTE_READ_CAPS and PTE_WRITE_CAPS when calling pte_modify.
I would be good to explain why, as this is just saying what the diff is. In fact, I am not entirely sure I understand why we need this now (and we didn't need it before).
Kevin
Signed-off-by: Tudor Cretu tudor.cretu@arm.com
arch/arm64/include/asm/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index dff2b483ea50..642f561e55a9 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -761,7 +761,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) */ const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY | PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP |
PTE_ATTRINDX_MASK;
/* preserve the hardware dirty information */ if (pte_hw_dirty(pte)) pte = pte_mkdirty(pte);PTE_ATTRINDX_MASK | PTE_LOAD_CAPS | PTE_STORE_CAPS;
On 11-11-2022 10:12, Kevin Brodsky wrote:
On 02/11/2022 12:08, Tudor Cretu wrote:
Don't exclude PTE_READ_CAPS and PTE_WRITE_CAPS when calling pte_modify.
I would be good to explain why, as this is just saying what the diff is. In fact, I am not entirely sure I understand why we need this now (and we didn't need it before).
Indeed, I was not clear about this patch. I didn't find any issue that requires this patch now, it's just a fix that I have found while looking at changes need for MTE. pte_modify is used in change_pte_range, which by backtracking a bit, it looks to be used in a fair number of contexts. I think PTE_LOAD_CAPS and PTE_STORE_CAPS should remain part of the pteval if such a function is applied.
I don't have a strong opinion on merging this patch though, so if you think it's not useful, I'm happy to drop it in v2.
Thanks, Tudor
Kevin
Signed-off-by: Tudor Cretu tudor.cretu@arm.com
arch/arm64/include/asm/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index dff2b483ea50..642f561e55a9 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -761,7 +761,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) */ const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY | PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP |
PTE_ATTRINDX_MASK;
/* preserve the hardware dirty information */ if (pte_hw_dirty(pte)) pte = pte_mkdirty(pte);PTE_ATTRINDX_MASK | PTE_LOAD_CAPS | PTE_STORE_CAPS;
On 18/11/2022 11:01, Tudor Cretu wrote:
On 11-11-2022 10:12, Kevin Brodsky wrote:
On 02/11/2022 12:08, Tudor Cretu wrote:
Don't exclude PTE_READ_CAPS and PTE_WRITE_CAPS when calling pte_modify.
I would be good to explain why, as this is just saying what the diff is. In fact, I am not entirely sure I understand why we need this now (and we didn't need it before).
Indeed, I was not clear about this patch. I didn't find any issue that requires this patch now, it's just a fix that I have found while looking at changes need for MTE. pte_modify is used in change_pte_range, which by backtracking a bit, it looks to be used in a fair number of contexts.
Right I suspected this was to align with MTE. AFAICT change_pte_range() is ultimately only used by mprotect() and related operations. Having looked a little bit more, I have a better idea of what happens now: without this change, a change in VM_*_CAPS during mprotect() would simply be ignored (because the value of PTE_*_CAPS is always preserved as-is).
Here is my concern: in this series, VM_*_CAPS still cannot be changed by mprotect(), specifically the value of newflags calculated in do_mprotect_pkey() will always have its VM_*_CAPS fields taken directly from vma->vm_flags. So it seems to me a bit inconsistent to allow updating PTE_*_CAPS while VM_*_CAPS still cannot be changed (that would require updating arch_calc_vm_prot_bits() and probably other bits too).
It remains the case that this patch is perfectly fine, I would just prefer to get it in at the same time as allowing changes to VM_*_CAPS.
Kevin
I think PTE_LOAD_CAPS and PTE_STORE_CAPS should remain part of the pteval if such a function is applied.
I don't have a strong opinion on merging this patch though, so if you think it's not useful, I'm happy to drop it in v2.
Thanks, Tudor
Kevin
Signed-off-by: Tudor Cretu tudor.cretu@arm.com
arch/arm64/include/asm/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index dff2b483ea50..642f561e55a9 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -761,7 +761,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) */ const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY | PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP | - PTE_ATTRINDX_MASK; + PTE_ATTRINDX_MASK | PTE_LOAD_CAPS | PTE_STORE_CAPS; /* preserve the hardware dirty information */ if (pte_hw_dirty(pte)) pte = pte_mkdirty(pte);
Some systems (e.g. io_uring) need to load/store capabilities on buffers shared with the userspace. Shared mappings don't have load/store capabilities permissions by default, so add two new VM flags that would allow to set up such mappings.
Note: this wouldn't allow userspace to make arbitrary shared mappings with tag access as the flags are not exposed; the new VM flags would be for internal use only.
Signed-off-by: Tudor Cretu tudor.cretu@arm.com --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/mman.h | 6 ++++++ fs/proc/task_mmu.c | 4 ++++ include/linux/mm.h | 9 +++++++++ 4 files changed, 20 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index c784d8664a40..e8e6b0f21a91 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1971,6 +1971,7 @@ config ARM64_MORELLO depends on CC_HAS_MORELLO select ARCH_NO_SWAP select ARCH_HAS_CHERI_CAPABILITIES + select ARCH_USES_HIGH_VMA_FLAGS help The Morello architecture is an experimental extension to Armv8.2-A, which extends the AArch64 state with the principles proposed in diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h index e3e28f7daf62..eb0b862121a2 100644 --- a/arch/arm64/include/asm/mman.h +++ b/arch/arm64/include/asm/mman.h @@ -55,6 +55,12 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags) if (vm_flags & VM_MTE) prot |= PTE_ATTRINDX(MT_NORMAL_TAGGED);
+ if (vm_flags & VM_READ_CAPS) + prot |= PTE_LOAD_CAPS; + + if (vm_flags & VM_WRITE_CAPS) + prot |= PTE_STORE_CAPS; + return __pgprot(prot); } #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index f46060eb91b5..4f56772da016 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -697,6 +697,10 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR [ilog2(VM_UFFD_MINOR)] = "ui", #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ +#ifdef CONFIG_ARM64_MORELLO + [ilog2(VM_READ_CAPS)] = "rc", + [ilog2(VM_WRITE_CAPS)] = "wc", +#endif }; size_t i;
diff --git a/include/linux/mm.h b/include/linux/mm.h index 9b7b730db4e9..2634ce86ba78 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -368,6 +368,15 @@ extern unsigned int kobjsize(const void *objp); # define VM_UFFD_MINOR VM_NONE #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+#ifdef CONFIG_ARM64_MORELLO +# define VM_READ_CAPS VM_HIGH_ARCH_2 /* Permit capability loads */ +# define VM_WRITE_CAPS VM_HIGH_ARCH_3 /* Permit capability stores */ +#else +# define VM_READ_CAPS VM_NONE +# define VM_WRITE_CAPS VM_NONE +#endif +#define VM_RW_CAPS (VM_READ_CAPS | VM_WRITE_CAPS) + /* Bits set in the VMA until the stack is in its final location */ #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ)
On 02/11/2022 12:08, Tudor Cretu wrote:
Some systems (e.g. io_uring) need to load/store capabilities on buffers shared with the userspace. Shared mappings don't have load/store capabilities permissions by default, so add two new VM flags that would allow to set up such mappings.
Note: this wouldn't allow userspace to make arbitrary shared mappings with
Nit: no need for conditional ("would") here, the patch makes this change so there is no uncertainty :)
tag access as the flags are not exposed; the new VM flags would be for internal use only.
Signed-off-by: Tudor Cretu tudor.cretu@arm.com
arch/arm64/Kconfig | 1 + arch/arm64/include/asm/mman.h | 6 ++++++ fs/proc/task_mmu.c | 4 ++++ include/linux/mm.h | 9 +++++++++ 4 files changed, 20 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index c784d8664a40..e8e6b0f21a91 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1971,6 +1971,7 @@ config ARM64_MORELLO depends on CC_HAS_MORELLO select ARCH_NO_SWAP select ARCH_HAS_CHERI_CAPABILITIES
- select ARCH_USES_HIGH_VMA_FLAGS help The Morello architecture is an experimental extension to Armv8.2-A, which extends the AArch64 state with the principles proposed in
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h index e3e28f7daf62..eb0b862121a2 100644 --- a/arch/arm64/include/asm/mman.h +++ b/arch/arm64/include/asm/mman.h @@ -55,6 +55,12 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags) if (vm_flags & VM_MTE) prot |= PTE_ATTRINDX(MT_NORMAL_TAGGED);
- if (vm_flags & VM_READ_CAPS)
prot |= PTE_LOAD_CAPS;
- if (vm_flags & VM_WRITE_CAPS)
prot |= PTE_STORE_CAPS;
- return __pgprot(prot);
} #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index f46060eb91b5..4f56772da016 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -697,6 +697,10 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR [ilog2(VM_UFFD_MINOR)] = "ui", #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ +#ifdef CONFIG_ARM64_MORELLO
[ilog2(VM_READ_CAPS)] = "rc",
[ilog2(VM_WRITE_CAPS)] = "wc",
As mentioned in a comment above the array, the documentation should be updated accordingly (Documentation/filesystems/proc.rst).
+#endif }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index 9b7b730db4e9..2634ce86ba78 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -368,6 +368,15 @@ extern unsigned int kobjsize(const void *objp); # define VM_UFFD_MINOR VM_NONE #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ +#ifdef CONFIG_ARM64_MORELLO +# define VM_READ_CAPS VM_HIGH_ARCH_2 /* Permit capability loads */ +# define VM_WRITE_CAPS VM_HIGH_ARCH_3 /* Permit capability stores */ +#else +# define VM_READ_CAPS VM_NONE +# define VM_WRITE_CAPS VM_NONE +#endif
Nit: I think it would make a bit more sense to have this after the MTE block (though really there isn't that much logic in the order of the existing blocks).
+#define VM_RW_CAPS (VM_READ_CAPS | VM_WRITE_CAPS)
I think we can do without this one. It's used just once in the next patch, and you already explicitly use VM_READ_CAPS | VM_WRITE_CAPS in another place.
Kevin
/* Bits set in the VMA until the stack is in its final location */ #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ)
Don't enable PTE_*_CAPS to default user mappings in pgtable-prot.h. Instead, add VM_{READ,WRITE}_CAPS to all private user mappings. This ensures that the rc/wc smaps flags are set. The instances where the VM_*_CAPS flags are added by default are: - the stack and brk area by updating the VM_DATA_DEFAULT_FLAGS flags - private user mappings created by userspace through mmap()
Signed-off-by: Tudor Cretu tudor.cretu@arm.com --- arch/arm64/include/asm/mman.h | 13 +++++++++++-- arch/arm64/include/asm/page.h | 2 +- arch/arm64/include/asm/pgtable-prot.h | 12 +++++------- 3 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h index eb0b862121a2..88b0be497231 100644 --- a/arch/arm64/include/asm/mman.h +++ b/arch/arm64/include/asm/mman.h @@ -23,15 +23,24 @@ static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags) { + unsigned long ret = 0; + /* * Only allow MTE on anonymous mappings as these are guaranteed to be * backed by tags-capable memory. The vm_flags may be overridden by a * filesystem supporting MTE (RAM-based). */ if (system_supports_mte() && (flags & MAP_ANONYMOUS)) - return VM_MTE_ALLOWED; + ret |= VM_MTE_ALLOWED; + + /* + * Allow capability tag access to private mappings as they don't pose + * the risk of leaking capabilities outside their original address-space. + */ + if (system_supports_morello() && ((flags & MAP_TYPE) == 0x02 /* MAP_PRIVATE */)) + ret |= VM_READ_CAPS | VM_WRITE_CAPS;
- return 0; + return ret; } #define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags)
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h index 993a27ea6f54..1c3ed7017d86 100644 --- a/arch/arm64/include/asm/page.h +++ b/arch/arm64/include/asm/page.h @@ -47,7 +47,7 @@ int pfn_is_map_memory(unsigned long pfn);
#endif /* !__ASSEMBLY__ */
-#define VM_DATA_DEFAULT_FLAGS (VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED) +#define VM_DATA_DEFAULT_FLAGS (VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED | VM_RW_CAPS)
#include <asm-generic/getorder.h>
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h index 5463f9bc0602..ab746df170e3 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -90,11 +90,9 @@ extern bool arm64_use_ng_mappings; #define PAGE_NONE __pgprot(((_USER_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_UXN) /* shared+writable pages are clean by default, hence PTE_RDONLY|PTE_WRITE */ #define PAGE_SHARED __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_UXN | PTE_WRITE) -#define PAGE_SHARED_RO __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_UXN) -#define PAGE_SHARED_RO_EXEC __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY) #define PAGE_SHARED_EXEC __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_WRITE) -#define PAGE_READONLY __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_MAYBE_LS_CAPS | PTE_UXN) -#define PAGE_READONLY_EXEC __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_MAYBE_LS_CAPS) +#define PAGE_READONLY __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_UXN) +#define PAGE_READONLY_EXEC __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY) #define PAGE_EXECONLY __pgprot(_PAGE_DEFAULT | PTE_RDONLY | PTE_NG | PTE_PXN)
#define __P000 PAGE_NONE @@ -107,11 +105,11 @@ extern bool arm64_use_ng_mappings; #define __P111 PAGE_READONLY_EXEC
#define __S000 PAGE_NONE -#define __S001 PAGE_SHARED_RO +#define __S001 PAGE_READONLY #define __S010 PAGE_SHARED #define __S011 PAGE_SHARED -#define __S100 PAGE_SHARED_RO_EXEC /* PAGE_EXECONLY if Enhanced PAN */ -#define __S101 PAGE_SHARED_RO_EXEC +#define __S100 PAGE_READONLY_EXEC /* PAGE_EXECONLY if Enhanced PAN */ +#define __S101 PAGE_READONLY_EXEC #define __S110 PAGE_SHARED_EXEC #define __S111 PAGE_SHARED_EXEC
Commit title: this is quite long (80 char). The ideal target is 50 char, though it's often hard to be that short. Keeping it under 75 char is highly recommended though.
I also forgot to mention on the other patches, it's not a hard rule but in general the arm64 patches that are clearly specific to an extension are tagged as such, so all patches in that series could be "arm64: morello:" (this is the case for all the original Morello enablement patches).
Putting that together, the title for this one could be for instance "arm64: morello: Explicitly add VM_*_CAPS to private user mappings" (the fact that it replaces the hardcoding of PTE_*_CAPS in pgtable-prot.h is probably too much for the commit title).
On 02/11/2022 12:08, Tudor Cretu wrote:
Don't enable PTE_*_CAPS to default user mappings in pgtable-prot.h.
s/enable/add/?
Instead, add VM_{READ,WRITE}_CAPS to all private user mappings. This ensures that the rc/wc smaps flags are set. The instances where the VM_*_CAPS flags are added by default are:
- the stack and brk area by updating the VM_DATA_DEFAULT_FLAGS flags
- private user mappings created by userspace through mmap()
Ah, but what about mappings created by the kernel itself during execve()? I think this is already taken care of in fact as calc_vm_flag_bits() is used by do_mmap(), which is the implementation both of the syscall and vm_mmap() used for instance by binfmt_elf.c. Still, worth clarifying that this also works for standard mappings created by the kernel.
Signed-off-by: Tudor Cretu tudor.cretu@arm.com
arch/arm64/include/asm/mman.h | 13 +++++++++++-- arch/arm64/include/asm/page.h | 2 +- arch/arm64/include/asm/pgtable-prot.h | 12 +++++------- 3 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h index eb0b862121a2..88b0be497231 100644 --- a/arch/arm64/include/asm/mman.h +++ b/arch/arm64/include/asm/mman.h @@ -23,15 +23,24 @@ static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot, static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags) {
- unsigned long ret = 0;
- /*
*/ if (system_supports_mte() && (flags & MAP_ANONYMOUS))
- Only allow MTE on anonymous mappings as these are guaranteed to be
- backed by tags-capable memory. The vm_flags may be overridden by a
- filesystem supporting MTE (RAM-based).
return VM_MTE_ALLOWED;
ret |= VM_MTE_ALLOWED;
- /*
* Allow capability tag access to private mappings as they don't pose
s/to/for/
Since we are now doing the right thing by adding VM flags to enable tag access, could you add a TODO regarding file-backed mappings? What the MTE comment says above applies pretty much exactly to Morello too, but we really need to have tag access in certain private file-backed mappings so we took a shortcut. Shared file-backed mappings are obviously a problem (what if the backing storage can't hold tags?), but private ones can also be an issue, notably because of DAX (see Documentation/filesystems/dax.rst). In other words, we shouldn't blindly add the VM flags to any file-backed mapping, it should only be done when supported, which might required an ALLOWED flag like for MTE. Anyway, a short TODO here would be nice so that people don't think we missed that :)
* the risk of leaking capabilities outside their original address-space.
*/
- if (system_supports_morello() && ((flags & MAP_TYPE) == 0x02 /* MAP_PRIVATE */))
Could we potentially include <uapi/linux/mman.h> instead of <uapi/asm/mman.h> to get MAP_PRIVATE, or is the header soup too bad for that?
ret |= VM_READ_CAPS | VM_WRITE_CAPS;
- return 0;
- return ret;
} #define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags) diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h index 993a27ea6f54..1c3ed7017d86 100644 --- a/arch/arm64/include/asm/page.h +++ b/arch/arm64/include/asm/page.h @@ -47,7 +47,7 @@ int pfn_is_map_memory(unsigned long pfn); #endif /* !__ASSEMBLY__ */ -#define VM_DATA_DEFAULT_FLAGS (VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED) +#define VM_DATA_DEFAULT_FLAGS (VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED | VM_RW_CAPS) #include <asm-generic/getorder.h> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h index 5463f9bc0602..ab746df170e3 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -90,11 +90,9 @@ extern bool arm64_use_ng_mappings; #define PAGE_NONE __pgprot(((_USER_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_UXN) /* shared+writable pages are clean by default, hence PTE_RDONLY|PTE_WRITE */ #define PAGE_SHARED __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_UXN | PTE_WRITE) -#define PAGE_SHARED_RO __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_UXN) -#define PAGE_SHARED_RO_EXEC __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY) #define PAGE_SHARED_EXEC __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_WRITE) -#define PAGE_READONLY __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_MAYBE_LS_CAPS | PTE_UXN) -#define PAGE_READONLY_EXEC __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_MAYBE_LS_CAPS) +#define PAGE_READONLY __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_UXN) +#define PAGE_READONLY_EXEC __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY) #define PAGE_EXECONLY __pgprot(_PAGE_DEFAULT | PTE_RDONLY | PTE_NG | PTE_PXN) #define __P000 PAGE_NONE @@ -107,11 +105,11 @@ extern bool arm64_use_ng_mappings; #define __P111 PAGE_READONLY_EXEC #define __S000 PAGE_NONE -#define __S001 PAGE_SHARED_RO +#define __S001 PAGE_READONLY #define __S010 PAGE_SHARED #define __S011 PAGE_SHARED -#define __S100 PAGE_SHARED_RO_EXEC /* PAGE_EXECONLY if Enhanced PAN */ -#define __S101 PAGE_SHARED_RO_EXEC +#define __S100 PAGE_READONLY_EXEC /* PAGE_EXECONLY if Enhanced PAN */ +#define __S101 PAGE_READONLY_EXEC #define __S110 PAGE_SHARED_EXEC #define __S111 PAGE_SHARED_EXEC
I think it would be nice to mention in the commit message that this is effectively a partial revert of "arm64: morello: Enable access to capabilities in memory" (reverting the addition of the PTE flags to the private user mappings).
Kevin
On 11-11-2022 12:45, Kevin Brodsky wrote:
Commit title: this is quite long (80 char). The ideal target is 50 char, though it's often hard to be that short. Keeping it under 75 char is highly recommended though.
I also forgot to mention on the other patches, it's not a hard rule but in general the arm64 patches that are clearly specific to an extension are tagged as such, so all patches in that series could be "arm64: morello:" (this is the case for all the original Morello enablement patches).
Putting that together, the title for this one could be for instance "arm64: morello: Explicitly add VM_*_CAPS to private user mappings" (the fact that it replaces the hardcoding of PTE_*_CAPS in pgtable-prot.h is probably too much for the commit title).
On 02/11/2022 12:08, Tudor Cretu wrote:
Don't enable PTE_*_CAPS to default user mappings in pgtable-prot.h.
s/enable/add/?
Ack.
Instead, add VM_{READ,WRITE}_CAPS to all private user mappings. This ensures that the rc/wc smaps flags are set. The instances where the VM_*_CAPS flags are added by default are:
- the stack and brk area by updating the VM_DATA_DEFAULT_FLAGS flags
- private user mappings created by userspace through mmap()
Ah, but what about mappings created by the kernel itself during execve()? I think this is already taken care of in fact as calc_vm_flag_bits() is used by do_mmap(), which is the implementation both of the syscall and vm_mmap() used for instance by binfmt_elf.c. Still, worth clarifying that this also works for standard mappings created by the kernel.
Ack.
Signed-off-by: Tudor Cretu tudor.cretu@arm.com
arch/arm64/include/asm/mman.h | 13 +++++++++++-- arch/arm64/include/asm/page.h | 2 +- arch/arm64/include/asm/pgtable-prot.h | 12 +++++------- 3 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h index eb0b862121a2..88b0be497231 100644 --- a/arch/arm64/include/asm/mman.h +++ b/arch/arm64/include/asm/mman.h @@ -23,15 +23,24 @@ static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot, static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags) {
- unsigned long ret = 0;
- /*
*/ if (system_supports_mte() && (flags & MAP_ANONYMOUS))
- Only allow MTE on anonymous mappings as these are guaranteed to be
- backed by tags-capable memory. The vm_flags may be overridden by a
- filesystem supporting MTE (RAM-based).
return VM_MTE_ALLOWED;
ret |= VM_MTE_ALLOWED;
- /*
* Allow capability tag access to private mappings as they don't pose
s/to/for/
Ack.
Since we are now doing the right thing by adding VM flags to enable tag access, could you add a TODO regarding file-backed mappings? What the MTE comment says above applies pretty much exactly to Morello too, but we really need to have tag access in certain private file-backed mappings so we took a shortcut. Shared file-backed mappings are obviously a problem (what if the backing storage can't hold tags?), but private ones can also be an issue, notably because of DAX (see Documentation/filesystems/dax.rst). In other words, we shouldn't blindly add the VM flags to any file-backed mapping, it should only be done when supported, which might required an ALLOWED flag like for MTE. Anyway, a short TODO here would be nice so that people don't think we missed that :)
Ack.
* the risk of leaking capabilities outside their original address-space.
*/
- if (system_supports_morello() && ((flags & MAP_TYPE) == 0x02 /* MAP_PRIVATE */))
Could we potentially include <uapi/linux/mman.h> instead of <uapi/asm/mman.h> to get MAP_PRIVATE, or is the header soup too bad for that?
uapi/linux/mman.h includes asm/mman.h (i.e. this file). Also, uapi/asm/mman.h is needed for BTI and MTE. I think I could move the implementations out of the headers and include everything properly, but they wouldn't be inline anymore...
ret |= VM_READ_CAPS | VM_WRITE_CAPS;
- return 0;
- return ret; } #define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags)
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h index 993a27ea6f54..1c3ed7017d86 100644 --- a/arch/arm64/include/asm/page.h +++ b/arch/arm64/include/asm/page.h @@ -47,7 +47,7 @@ int pfn_is_map_memory(unsigned long pfn); #endif /* !__ASSEMBLY__ */ -#define VM_DATA_DEFAULT_FLAGS (VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED) +#define VM_DATA_DEFAULT_FLAGS (VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED | VM_RW_CAPS) #include <asm-generic/getorder.h> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h index 5463f9bc0602..ab746df170e3 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -90,11 +90,9 @@ extern bool arm64_use_ng_mappings; #define PAGE_NONE __pgprot(((_USER_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_UXN) /* shared+writable pages are clean by default, hence PTE_RDONLY|PTE_WRITE */ #define PAGE_SHARED __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_UXN | PTE_WRITE) -#define PAGE_SHARED_RO __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_UXN) -#define PAGE_SHARED_RO_EXEC __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY) #define PAGE_SHARED_EXEC __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_WRITE) -#define PAGE_READONLY __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_MAYBE_LS_CAPS | PTE_UXN) -#define PAGE_READONLY_EXEC __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_MAYBE_LS_CAPS) +#define PAGE_READONLY __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY | PTE_UXN) +#define PAGE_READONLY_EXEC __pgprot(_USER_PAGE_DEFAULT | PTE_RDONLY) #define PAGE_EXECONLY __pgprot(_PAGE_DEFAULT | PTE_RDONLY | PTE_NG | PTE_PXN) #define __P000 PAGE_NONE @@ -107,11 +105,11 @@ extern bool arm64_use_ng_mappings; #define __P111 PAGE_READONLY_EXEC #define __S000 PAGE_NONE -#define __S001 PAGE_SHARED_RO +#define __S001 PAGE_READONLY #define __S010 PAGE_SHARED #define __S011 PAGE_SHARED -#define __S100 PAGE_SHARED_RO_EXEC /* PAGE_EXECONLY if Enhanced PAN */ -#define __S101 PAGE_SHARED_RO_EXEC +#define __S100 PAGE_READONLY_EXEC /* PAGE_EXECONLY if Enhanced PAN */ +#define __S101 PAGE_READONLY_EXEC #define __S110 PAGE_SHARED_EXEC #define __S111 PAGE_SHARED_EXEC
I think it would be nice to mention in the commit message that this is effectively a partial revert of "arm64: morello: Enable access to capabilities in memory" (reverting the addition of the PTE flags to the private user mappings).
Ack.
Thanks, Tudor
Kevin
On 18/11/2022 13:20, Tudor Cretu wrote:
+ * the risk of leaking capabilities outside their original address-space. + */ + if (system_supports_morello() && ((flags & MAP_TYPE) == 0x02 /* MAP_PRIVATE */))
Could we potentially include <uapi/linux/mman.h> instead of <uapi/asm/mman.h> to get MAP_PRIVATE, or is the header soup too bad for that?
uapi/linux/mman.h includes asm/mman.h (i.e. this file). Also, uapi/asm/mman.h is needed for BTI and MTE. I think I could move the implementations out of the headers and include everything properly, but they wouldn't be inline anymore...
Wow, I had never realised that uapi headers can end up including non-uapi headers when building the kernel... Then yes indeed there's too much header soup, let's leave it like that.
Kevin
On 02/11/2022 12:08, Tudor Cretu wrote:
Hi,
This patch series addresses the VM_READ_CAPS/VM_WRITE_CAPS flags issue: https://git.morello-project.org/morello/kernel/linux/-/issues/36
io_uring system uses buffers shared with userspace to read the io events and report their results. The structs that populate the submission and completion queues can contain capabilities. Shared mappings don't have the Load/Store capabilities permission to avoid leaking capabilities outside their original address space, so add two new VM flags that would allow the kernel to set up such mappings.
While at it, also fix pte_modify to allow setting PTE_*_CAPS flags, add new the new rc/wc smaps flags, and remove the automatic addition of PTE_*_CAPS to user mappings.
I was just looking at morello.c and realised that access_remote_cap() hardcodes the fact that private mappings (and only those) are tagged. Would be a good improvement to change that to check VM_WRITE_CAPS instead in v2 :)
Kevin
To note: this wouldn't allow userspace to make arbitrary shared mappings with tag access, the new VM flags would be for internal use only for the time being.
Review branch: https://git.morello-project.org/tudcre01/linux/-/commits/vm_rw_caps_v1/
Thanks, Tudor
Tudor Cretu (3): arm64: pgtable: Allow PTE_*_CAPS in the pte_modify mask arm64: mm: Add VM_READ_CAPS and VM_WRITE_CAPS flags arm64: mm: Use VM_*_CAPS instead of enabling PTE_*_CAPS to default user mappings
arch/arm64/Kconfig | 1 + arch/arm64/include/asm/mman.h | 19 +++++++++++++++++-- arch/arm64/include/asm/page.h | 2 +- arch/arm64/include/asm/pgtable-prot.h | 12 +++++------- arch/arm64/include/asm/pgtable.h | 2 +- fs/proc/task_mmu.c | 4 ++++ include/linux/mm.h | 9 +++++++++ 7 files changed, 38 insertions(+), 11 deletions(-)
linux-morello@op-lists.linaro.org