On 17/10/2022 19:09, Tudor Cretu wrote:
Some systems (e.g. io_uring) need to load/store capabilities on buffers shared with userspace. Shared mappings don't have the load/store capability permissions by default, so add two new VM flags that allow such mappings to be set up.
Note: this wouldn't allow userspace to make arbitrary shared mappings with tag access as the flags are not exposed; the new VM flags would be for internal use only.
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
 arch/arm64/Kconfig            |  1 +
 arch/arm64/include/asm/mman.h |  6 ++++++
 fs/proc/task_mmu.c            |  4 ++++
 include/linux/mm.h            | 10 ++++++++++
 4 files changed, 21 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c784d8664a40..e8e6b0f21a91 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1971,6 +1971,7 @@ config ARM64_MORELLO
 	depends on CC_HAS_MORELLO
 	select ARCH_NO_SWAP
 	select ARCH_HAS_CHERI_CAPABILITIES
+	select ARCH_USES_HIGH_VMA_FLAGS
 	help
 	  The Morello architecture is an experimental extension to Armv8.2-A,
 	  which extends the AArch64 state with the principles proposed in
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index e3e28f7daf62..eb0b862121a2 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -55,6 +55,12 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
 	if (vm_flags & VM_MTE)
 		prot |= PTE_ATTRINDX(MT_NORMAL_TAGGED);
 
+	if (vm_flags & VM_READ_CAPS)
+		prot |= PTE_LOAD_CAPS;
+
+	if (vm_flags & VM_WRITE_CAPS)
+		prot |= PTE_STORE_CAPS;
+
 	return __pgprot(prot);
 }
 #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f46060eb91b5..4f56772da016 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -697,6 +697,10 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 		[ilog2(VM_UFFD_MINOR)]	= "ui",
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+#ifdef CONFIG_ARM64_MORELLO
+		[ilog2(VM_READ_CAPS)]	= "rc",
+		[ilog2(VM_WRITE_CAPS)]	= "wc",
+#endif
This is nice, people have been asking for exactly this before :)
 	};
 	size_t i;

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9b7b730db4e9..67247211aefa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -368,6 +368,16 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_UFFD_MINOR		VM_NONE
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
 
+#ifdef CONFIG_ARM64_MORELLO
+# define VM_READ_CAPS_BIT	38
+# define VM_WRITE_CAPS_BIT	39
These flags are arch-specific (at least in their current form), so you should use VM_HIGH_ARCH_* for them; in fact, I see you have already selected ARCH_USES_HIGH_VMA_FLAGS above. MTE uses VM_HIGH_ARCH_0/1, but 2/3 still seem to be available (and in any case ARM64_MTE and ARM64_MORELLO are mutually exclusive options).
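A minimal user-space sketch of what this suggestion could look like. The VM_HIGH_ARCH_* bit numbers are assumed to mirror include/linux/mm.h (bit 32 upwards), and the choice of slots 2/3 for the capability flags is only an illustration of the idea, not something checked against the tree:

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins for kernel helpers, so the sketch compiles alone. */
#define BIT(nr)			(UINT64_C(1) << (nr))
#define VM_NONE			UINT64_C(0)

/* Assumed to mirror include/linux/mm.h: arch-specific high VMA flag bits. */
#define VM_HIGH_ARCH_BIT_0	32
#define VM_HIGH_ARCH_BIT_1	33
#define VM_HIGH_ARCH_BIT_2	34
#define VM_HIGH_ARCH_BIT_3	35
#define VM_HIGH_ARCH_0		BIT(VM_HIGH_ARCH_BIT_0)
#define VM_HIGH_ARCH_1		BIT(VM_HIGH_ARCH_BIT_1)
#define VM_HIGH_ARCH_2		BIT(VM_HIGH_ARCH_BIT_2)
#define VM_HIGH_ARCH_3		BIT(VM_HIGH_ARCH_BIT_3)

/* arm64 MTE already owns the first two arch slots. */
#define VM_MTE			VM_HIGH_ARCH_0
#define VM_MTE_ALLOWED		VM_HIGH_ARCH_1

/* Assumption: pretend CONFIG_ARM64_MORELLO is enabled for this sketch. */
#define CONFIG_ARM64_MORELLO	1

#ifdef CONFIG_ARM64_MORELLO
# define VM_READ_CAPS		VM_HIGH_ARCH_2
# define VM_WRITE_CAPS		VM_HIGH_ARCH_3
#else
# define VM_READ_CAPS		VM_NONE
# define VM_WRITE_CAPS		VM_NONE
#endif

/* The capability flags must not collide with the MTE flags or each other. */
static_assert(((VM_READ_CAPS | VM_WRITE_CAPS) & (VM_MTE | VM_MTE_ALLOWED)) == 0,
	      "capability flags overlap the MTE flags");
static_assert((VM_READ_CAPS & VM_WRITE_CAPS) == 0,
	      "capability flags overlap each other");
```

With this layout the separate VM_*_CAPS_BIT definitions are no longer needed, and the smaps table can keep using ilog2(VM_READ_CAPS) / ilog2(VM_WRITE_CAPS) unchanged.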
+# define VM_READ_CAPS		BIT(VM_READ_CAPS_BIT)
+# define VM_WRITE_CAPS		BIT(VM_WRITE_CAPS_BIT)
+#else
+# define VM_READ_CAPS		VM_NONE
+# define VM_WRITE_CAPS		VM_NONE
+#endif
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
This patch is a good start, but I would rather we didn't stop halfway. With just this patch, VM_{READ,WRITE}_CAPS are only set when explicitly requested at mapping creation. This means that private mappings don't have them, even though they all have capability access enabled by virtue of d054e88a4994 ("arm64: morello: Enable access to capabilities in memory").

It would be much nicer if we also removed this magic addition of PTE_*_CAPS to user mappings from pgtable-prot.h and instead made sure that VM_{READ,WRITE}_CAPS are automatically added to all private user mappings. This is not only cleaner; more importantly, the new rc/wc smaps flags you're adding won't be set otherwise.

This clearly belongs in at least one new commit. I'm not sure exactly how many places need to be changed: this needs to apply to all user mappings, whether they are created by userspace through mmap() or directly by the kernel itself, typically during execve().
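As a rough illustration of the rule described above (private mappings get both capability flags, shared mappings only get them when explicitly requested), here is a user-space sketch. The helper name vm_flags_add_private_caps() is hypothetical, the capability bit positions are placeholders, and where such a helper would be called from in the mmap() and execve() paths is left open:

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for the kernel's vm_flags_t. */
typedef uint64_t vm_flags_t;

#define BIT(nr)		(UINT64_C(1) << (nr))

/* VM_READ/VM_WRITE/VM_SHARED use the values from include/linux/mm.h. */
#define VM_READ		UINT64_C(0x1)
#define VM_WRITE	UINT64_C(0x2)
#define VM_SHARED	UINT64_C(0x8)

/* Placeholder bit positions, chosen only for this sketch. */
#define VM_READ_CAPS	BIT(34)
#define VM_WRITE_CAPS	BIT(35)

/*
 * Hypothetical helper: add the capability flags to every private mapping,
 * and leave shared mappings as the caller set them up (so kernel-internal
 * users can still opt in explicitly).
 */
static inline vm_flags_t vm_flags_add_private_caps(vm_flags_t vm_flags)
{
	if (!(vm_flags & VM_SHARED))
		vm_flags |= VM_READ_CAPS | VM_WRITE_CAPS;

	return vm_flags;
}
```

If something along these lines were applied wherever user VMAs are created, arch_vm_get_page_prot() would become the only place translating the flags into PTE_LOAD_CAPS / PTE_STORE_CAPS, and the rc/wc smaps flags would show up for private mappings too.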
Depending on the amount of changes required, it might make sense to split all this into a new series. If not, make sure to move all the commits to the beginning of the series as they are fundamental groundwork for the rest of the series.
Kevin