Hi All,
This patch series introduces the mm reservation interface to manage the owning capabilities of allocated addresses. The series adds reservation details to the VMA structure and introduces the various capability/reservation constraint checks. Feedback on API names, directory structure, etc. is welcome.
Details about the rules implemented can be found in the PCuABI specification [1].
This series is based on the tree at [2].
Changes in v2 compared with v1 (based on suggestions from Kevin):
1) Separated the user pointer related helpers from the reservation helpers and added them in lib/user_ptr.c.
2) Added new helpers user_ptr_is_valid() and user_ptr_set_addr() to reduce CONFIG_CHERI_PURECAP_UABI ifdefs.
3) Fixed max gap issues in unmapped_area_topdown().
4) Dropped the patch "mm,fs: Use address as user_uintptr_t in generic get_unmapped_area()". As a consequence, get_unmapped_area() cannot be called for MAP_FIXED requests carrying valid capabilities; a dedicated sanity-check function, vm_area_range_within_limit(), is introduced for those cases.
5) Several fixes to how reservation details are added during VMA merging/expansion.
6) A new patch "fs/binfmt_elf: Add PCuABI reservation constraints" is added to this series to demonstrate the use of the API reserv_range_set_reserv() and the kernel mapping functions vm_mmap() and vm_munmap().
7) Some code fixes and cleanups as suggested by Kevin.
Future work:
1) Modify the users of vm_mmap()/vm_munmap() (filesystems, vdso, exec stack) to preserve capability addresses.
2) Cover the remaining memory addressing syscalls.
Testing:
1) All tests by Chaitanya in the v3 selftests [3] pass.
2) Purecap/compat Busybox boot passes after adding the [WIP] patches present in [4].
The whole series can be found here [4].
[1]: https://git.morello-project.org/morello/kernel/linux/-/wikis/Morello-pure-ca...
[2]: https://git.morello-project.org/morello/kernel/linux morello/next
[3]: https://git.morello-project.org/chaitanya_prakash/linux.git review/purecap_mmap_testcases_v8
[4]: https://git.morello-project.org/amitdaniel/linux.git review/purecap_mm_reservation_v2
Thanks,
Amit Daniel
Amit Daniel Kachhap (22):
  uapi: errno.h: Introduce PCuABI memory reservation error
  linux/sched/coredump.h: Add MMF_PCUABI_RESERV mm flag
  mm/cap_addr_mgmt: Add capability reservation interfaces in VMA
  linux/user_ptr.h: Add two helpers to operate on user pointers
  lib/user_ptr: Add helpers to be used by mm syscalls
  mm/(mmap,mremap): Modify unmapped address space management code
  mm: Add and use PCuABI reservation during VMA operation
  mm/mmap: Add reservation constraints in mmap/munmap parameters
  mm/mremap: Add reservation constraints in mremap syscall
  mm/mprotect: Add the PCuABI reservation constraints
  mm/madvise: Add the PCuABI reservation constraints
  mm/mlock: Add the PCuABI reservation constraints
  mm/msync: Add the PCuABI reservation constraints
  mm/mmap: Disable MAP_GROWSDOWN mapping flag for PCuABI
  uapi: mman-common.h: Macros for maximum capability permissions
  lib/user_ptr: Add user pointer permission helpers for PCuABI
  arm64: user_ptr: Implement morello capability permission helpers
  mm/mmap: Add capability permission constraints for PCuABI
  mm/mremap: Add capability permission constraints for PCuABI
  mm/mprotect: Add capability permission constraints for PCuABI
  mm/mincore: Add PCuABI reservation/capability constraints
  fs/binfmt_elf: Add PCuABI reservation constraints
 Documentation/core-api/user_ptr.rst    |  28 ++++
 arch/Kconfig                           |   3 +
 arch/arm64/Kconfig                     |   1 +
 arch/arm64/include/asm/user_ptr.h      |  33 ++++
 fs/binfmt_elf.c                        | 100 ++++++++----
 include/linux/cap_addr_mgmt.h          | 217 +++++++++++++++++++++++++
 include/linux/mm.h                     |  19 ++-
 include/linux/mm_types.h               |   9 +
 include/linux/sched/coredump.h         |   2 +
 include/linux/user_ptr.h               | 101 ++++++++++++
 include/uapi/asm-generic/errno.h       |   2 +
 include/uapi/asm-generic/mman-common.h |   6 +
 io_uring/advise.c                      |   2 +-
 kernel/fork.c                          |   3 +
 lib/user_ptr.c                         |  93 +++++++++++
 mm/Makefile                            |   2 +-
 mm/cap_addr_mgmt.c                     | 152 +++++++++++++++++
 mm/damon/vaddr.c                       |   2 +-
 mm/madvise.c                           |  26 ++-
 mm/mincore.c                           |  46 +++++-
 mm/mlock.c                             |  36 +++-
 mm/mmap.c                              | 207 +++++++++++++++++++----
 mm/mprotect.c                          |  26 ++-
 mm/mremap.c                            | 122 +++++++++++---
 mm/msync.c                             |  13 +-
 mm/util.c                              |  16 +-
 26 files changed, 1137 insertions(+), 130 deletions(-)
 create mode 100644 arch/arm64/include/asm/user_ptr.h
 create mode 100644 include/linux/cap_addr_mgmt.h
 create mode 100644 mm/cap_addr_mgmt.c
The PCuABI specification introduces this error code, which is used to denote any error encountered while managing memory reservations.
Signed-off-by: Amit Daniel Kachhap <amitdaniel.kachhap@arm.com>
---
 include/uapi/asm-generic/errno.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
index cf9c51ac49f9..4589a3165fe1 100644
--- a/include/uapi/asm-generic/errno.h
+++ b/include/uapi/asm-generic/errno.h
@@ -120,4 +120,6 @@
 #define EHWPOISON	133	/* Memory page has hardware error */
 
+#define ERESERVATION	192	/* PCuABI memory reservation error */
+
 #endif
The PCuABI specification introduces memory reservations, so add a flag, MMF_PCUABI_RESERV, to represent such memory mappings. As memory reservations are mm specific, this flag helps to differentiate between purecap and compat process memory mappings.
Signed-off-by: Amit Daniel Kachhap <amitdaniel.kachhap@arm.com>
---
 include/linux/sched/coredump.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 02f5090ffea2..4d42106593a6 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -92,6 +92,8 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_VM_MERGE_ANY	30
 #define MMF_VM_MERGE_ANY_MASK	(1 << MMF_VM_MERGE_ANY)
 
+#define MMF_PCUABI_RESERV	30	/* PCuABI memory reservation feature */
+
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
 				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
 				 MMF_VM_MERGE_ANY_MASK)
On 11/03/2024 10:28, Amit Daniel Kachhap wrote:
PCuABI specification introduces memory reservation, so add a flag MMF_PCUABI_RESERV to represent such memory mappings. As memory reservations are mm specific, this flag will help to differentiate between purecap and compat process memory mappings.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
  include/linux/sched/coredump.h | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 02f5090ffea2..4d42106593a6 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -92,6 +92,8 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_VM_MERGE_ANY	30
 #define MMF_VM_MERGE_ANY_MASK	(1 << MMF_VM_MERGE_ANY)
 
+#define MMF_PCUABI_RESERV	30	/* PCuABI memory reservation feature */
The value clashes with the new MMF_VM_MERGE_ANY flag (above).
Kevin
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
 				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
 				 MMF_VM_MERGE_ANY_MASK)
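A minimal way to resolve the clash might be to simply pick the next bit (sketch only; this assumes bit 31 is otherwise unused in this tree):

/* MMF_VM_MERGE_ANY already uses bit 30, so take the next free bit. */
#define MMF_PCUABI_RESERV	31	/* PCuABI memory reservation feature */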
PCuABI needs an address space reservation interface to manage the owning capabilities of allocated addresses. This interface prevents two unrelated owning capabilities created by the kernel from overlapping.
The reservation interface stores the ranges of the different virtual addresses as reservation entries, matching the bounds of the capabilities provided by the kernel to userspace. It also stores the owning capability permissions, so that future syscall requests to update permissions can be managed.
The reservation interfaces follow a few basic rules:
- Reservations can only be created or destroyed, never expanded or shrunk. A reservation is created when a new memory mapping is made outside an existing reservation.
- A single reservation can have many mappings. However, unused regions of the reservation cannot be reused again.
- The reservation start address is aligned to the CHERI representable base.
- The reservation length is aligned to the CHERI representable length.
More rules about the address space reservation interface can be found in the PCuABI specification.
This commit introduces the APIs reserv_vma_set_reserv(), reserv_range_set_reserv(), reserv_vmi_range_mapped(), reserv_vmi_cap_within_reserv(), reserv_vma_cap_within_reserv(), reserv_vma_range_within_reserv(), reserv_is_supported() and reserv_fork(). All of them except reserv_range_set_reserv() operate on a single VMA. These interfaces will be used in the different memory management syscalls in subsequent patches.
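For illustration only (not part of this patch; the function name is invented and error handling is elided), a path that creates a brand-new mapping might record its reservation roughly as follows:

#include <linux/cap_addr_mgmt.h>

/* Illustrative sketch: record a reservation for a freshly created VMA. */
static int example_record_new_reservation(struct vm_area_struct *vma,
					   ptraddr_t addr, size_t len,
					   user_ptr_perms_t perms)
{
	if (!reserv_is_supported(vma->vm_mm))
		return 0;

	/*
	 * The interface itself rounds the start/length to a CHERI
	 * representable base/length; the caller passes page-aligned values.
	 */
	return reserv_vma_set_reserv(vma, addr, len, perms);
}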
Signed-off-by: Amit Daniel Kachhap <amitdaniel.kachhap@arm.com>
---
 include/linux/cap_addr_mgmt.h | 217 ++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h      |   9 ++
 include/linux/user_ptr.h      |   5 +
 mm/Makefile                   |   2 +-
 mm/cap_addr_mgmt.c            | 152 ++++++++++++++++++++++++
 5 files changed, 384 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/cap_addr_mgmt.h
 create mode 100644 mm/cap_addr_mgmt.c
diff --git a/include/linux/cap_addr_mgmt.h b/include/linux/cap_addr_mgmt.h new file mode 100644 index 000000000000..3cb45e41f36c --- /dev/null +++ b/include/linux/cap_addr_mgmt.h @@ -0,0 +1,217 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _LINUX_CAP_ADDR_MGMT_H +#define _LINUX_CAP_ADDR_MGMT_H + +#include <linux/cheri.h> +#include <linux/init.h> +#include <linux/list.h> +#include <linux/mm_types.h> +#include <linux/sched/coredump.h> +#include <linux/types.h> +#include <linux/user_ptr.h> + +#ifdef CONFIG_CHERI_PURECAP_UABI +#define reserv_representable_alignment(len) \ + (test_bit(MMF_PCUABI_RESERV, ¤t->mm->flags) \ + ? ~cheri_representable_alignment_mask(len) : 0) + +#define reserv_representable_base(base, len) \ + (test_bit(MMF_PCUABI_RESERV, ¤t->mm->flags) \ + ? base & cheri_representable_alignment_mask(len) : base) + +#define reserv_representable_length(len) \ + (test_bit(MMF_PCUABI_RESERV, ¤t->mm->flags) \ + ? cheri_representable_length(len) : len) + +#define reserv_vma_reserv_start(vma) \ + (test_bit(MMF_PCUABI_RESERV, &vma->vm_mm->flags) \ + ? vma->reserv_data.reserv_start : 0) + +#define reserv_vma_reserv_len(vma) \ + (test_bit(MMF_PCUABI_RESERV, &vma->vm_mm->flags) \ + ? vma->reserv_data.reserv_len : 0) + +#define reserv_vma_reserv_perm(vma) \ + (test_bit(MMF_PCUABI_RESERV, &vma->vm_mm->flags) \ + ? vma->reserv_data.reserv_perm : 0) + +/** + * reserv_vma_set_reserv() - Sets the reservation details in the VMA for the + * virtual address range from start to (start + len) with perm permission as + * the entry. The start address are stored as CHERI representable base and the + * length as CHERI representable length. They are expected to not interfere + * with the successive VMA. This function should be called with mmap_lock + * held. + * @vma: The VMA pointer to insert the reservation entry. + * @start: Reservation start value. + * @len: Reservation length. + * @perm: Capability permission for the reserved range. + * + * Return: 0 if reservation entry added successfully or negative errorcode + * otherwise. + */ +int reserv_vma_set_reserv(struct vm_area_struct *vma, ptraddr_t start, + size_t len, user_ptr_perms_t perm); + +/** + * reserv_range_set_reserv() - Sets the reservation details across the VMA's + * for the virtual address range from start to (start + len) with the perm + * permission as the entry. The start address is expected to be CHERI + * representable base and the length to be CHERI representable length. + * This function internally uses mmap_lock to synchronize the VMA updates + * if mmap_lock is not already held. + * @start: Reservation start value. + * @len: Reservation length. + * @perm: Capability permission for the reserved range. + * @locked: Flag to indicate if mmap_lock is already held. + * + * Return: valid capability with bounded range and requested permission or + * negative error code otherwise. + */ +user_uintptr_t reserv_range_set_reserv(ptraddr_t start, size_t len, + user_ptr_perms_t perm, bool locked); + +/** + * reserv_vmi_range_mapped() - Searches the reservation interface for + * the virtual address range from start to (start + len). This is useful to + * find if the requested range maps completely and there is no fragmentation. + * This function internally uses mmap_lock to synchronize the VMA updates + * if mmap_lock is not already held. + * @vmi: The VMA iterator pointing at the VMA. + * @start: Virtual address start value. + * @len: Virtual address length. + * @locked: Flag to indicate if mmap_lock is already held. 
+ * + * Return: 0 if the VMA mapping matches fully with the given range or negative + * error code otherwise. + */ +int reserv_vmi_range_mapped(struct vma_iterator *vmi, ptraddr_t start, + size_t len, bool locked); + +/** + * reserv_vmi_cap_within_reserv() - Searches and matches the input VMI for the + * for the capability bound values falling within the reserved virtual address + * range. This function internally uses mmap_lock to synchronize the VMA updates + * if mmap_lock is not already held. + * @vmi: The VMA iterator pointing at the VMA. + * @cap: Reservation capability value. + * @locked: Flag to indicate if mmap_lock is already held. + * + * Return: True if the input capability bound values within the reserved virtual + * address range or false otherwise. + */ +bool reserv_vmi_cap_within_reserv(struct vma_iterator *vmi, user_uintptr_t cap, + bool locked); + +/** + * reserv_vma_cap_within_reserv() - Searches and matches the input VMA for the + * capability bound values falling within the reserved virtual address range. + * This function should be called with mmap_lock held. + * @vma: The VMA pointer. + * @cap: Reservation capability value. + * + * Return: True if the input capability bound values within the reserved virtual + * address range or false otherwise. + */ +bool reserv_vma_cap_within_reserv(struct vm_area_struct *vma, user_uintptr_t cap); + +/** + * reserv_vma_range_within_reserv() - Searches and matches the input VMA for the input + * address range falling within the reserved virtual address range. This function + * should be called with mmap_lock held. + * @vma: The VMA pointer. + * @start: Virtual address start value. + * @len: Virtual address length. + * + * Return: True if the input address range within the reserved virtual address + * range or false otherwise. + */ +bool reserv_vma_range_within_reserv(struct vm_area_struct *vma, ptraddr_t start, size_t len); + +/** + * reserv_is_supported() - Checks if the reservation property exists for the mm. + * @mm: The mm pointer. + * + * Return: True if mm has the reservation property set or false otherwise. + */ +static inline bool reserv_is_supported(struct mm_struct *mm) +{ + if (mm && test_bit(MMF_PCUABI_RESERV, &mm->flags)) + return true; + + return false; +} + +/** + * reserv_fork() - Checks and copies the MMF_PCUABI_RESERV bit in the new mm during fork. + * @mm: New mm pointer. + * @oldmm: Old mm pointer. + * + * Return: None. 
+ */ +static inline void reserv_fork(struct mm_struct *mm, struct mm_struct *oldmm) +{ + if (test_bit(MMF_PCUABI_RESERV, &oldmm->flags)) + set_bit(MMF_PCUABI_RESERV, &mm->flags); +} + +#else /* CONFIG_CHERI_PURECAP_UABI */ + +#define reserv_representable_alignment(len) 0 + +#define reserv_representable_base(base, len) base + +#define reserv_representable_length(len) len + +#define reserv_vma_reserv_start(vma) 0 + +#define reserv_vma_reserv_len(vma) 0 + +#define reserv_vma_reserv_perm(vma) 0 + +static inline int reserv_vma_set_reserv(struct vm_area_struct *vma, ptraddr_t start, + size_t len, user_ptr_perms_t perm) +{ + return 0; +} + +static inline user_uintptr_t reserv_range_set_reserv(ptraddr_t start, size_t len, + user_ptr_perms_t perm, bool locked) +{ + return (user_uintptr_t)start; +} + +static inline int reserv_vmi_range_mapped(struct vma_iterator *vmi, ptraddr_t start, + size_t len, bool locked) +{ + return 0; +} + +static inline bool reserv_vmi_cap_within_reserv(struct vma_iterator *vmi, user_uintptr_t cap, + bool locked) +{ + return true; +} + +static inline bool reserv_vma_cap_within_reserv(struct vm_area_struct *vma, user_uintptr_t cap) +{ + return true; +} + +static inline bool reserv_vma_range_within_reserv(struct vm_area_struct *vma, ptraddr_t start, + size_t len) +{ + return true; +} + +static inline bool reserv_is_supported(struct mm_struct *mm) +{ + return false; +} + +static inline void reserv_fork(struct mm_struct *mm, struct mm_struct *oldmm) {} + +#endif /* CONFIG_CHERI_PURECAP_UABI */ + +#endif /* _LINUX_CAP_ADDR_MGMT_H */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 774bd7d6ad60..5182848f4228 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -607,6 +607,12 @@ struct vma_numab_state { int prev_scan_seq; };
+struct reserv_struct { + ptraddr_t reserv_start; + size_t reserv_len; + user_ptr_perms_t reserv_perm; +}; + /* * This struct describes a virtual memory area. There is one of these * per VM-area/task. A VM area is any part of the process virtual memory @@ -711,6 +717,9 @@ struct vm_area_struct { struct vma_numab_state *numab_state; /* NUMA Balancing state */ #endif struct vm_userfaultfd_ctx vm_userfaultfd_ctx; +#ifdef CONFIG_CHERI_PURECAP_UABI + struct reserv_struct reserv_data; +#endif } __randomize_layout;
#ifdef CONFIG_NUMA diff --git a/include/linux/user_ptr.h b/include/linux/user_ptr.h index 685586bc0d89..d663c6105d54 100644 --- a/include/linux/user_ptr.h +++ b/include/linux/user_ptr.h @@ -2,6 +2,7 @@ #ifndef _LINUX_USER_PTR_H #define _LINUX_USER_PTR_H
+#include <linux/cheri.h> #include <linux/limits.h> #include <linux/typecheck.h>
@@ -27,6 +28,8 @@
#ifdef CONFIG_CHERI_PURECAP_UABI
+#define user_ptr_perms_t cheri_perms_t + /** * uaddr_to_user_ptr() - Convert a user-provided address to a user pointer. * @addr: The address to set the pointer to. @@ -109,6 +112,8 @@ bool check_user_ptr_rw(void __user *ptr, size_t len);
#else /* CONFIG_CHERI_PURECAP_UABI */
+#define user_ptr_perms_t int + static inline void __user *uaddr_to_user_ptr(ptraddr_t addr) { return as_user_ptr(addr); diff --git a/mm/Makefile b/mm/Makefile index 33873c8aedb3..6f994a1664e4 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -39,7 +39,7 @@ mmu-y := nommu.o mmu-$(CONFIG_MMU) := highmem.o memory.o mincore.o \ mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \ msync.o page_vma_mapped.o pagewalk.o \ - pgtable-generic.o rmap.o vmalloc.o + pgtable-generic.o rmap.o vmalloc.o cap_addr_mgmt.o
ifdef CONFIG_CROSS_MEMORY_ATTACH diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c new file mode 100644 index 000000000000..5586fde34d0a --- /dev/null +++ b/mm/cap_addr_mgmt.c @@ -0,0 +1,152 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/bug.h> +#include <linux/cap_addr_mgmt.h> +#include <linux/cheri.h> +#include <linux/mm.h> +#include <linux/slab.h> + +#ifdef CONFIG_CHERI_PURECAP_UABI + +int reserv_vma_set_reserv(struct vm_area_struct *vma, ptraddr_t start, + size_t len, user_ptr_perms_t perm) +{ + if (!reserv_is_supported(vma->vm_mm)) + return 0; + if (start + len < start) + return -EINVAL; + /* Reservation base/length is expected as page aligned */ + VM_BUG_ON(start & ~PAGE_MASK || len % PAGE_SIZE); + + vma->reserv_data.reserv_start = start & cheri_representable_alignment_mask(len); + vma->reserv_data.reserv_len = cheri_representable_length(len); + if (perm) + vma->reserv_data.reserv_perm = perm; + + return 0; +} + +user_uintptr_t reserv_range_set_reserv(ptraddr_t start, size_t len, user_ptr_perms_t perm, + bool locked) +{ + struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; + ptraddr_t end = start + len; + user_uintptr_t ret = 0; + VMA_ITERATOR(vmi, mm, start); + + if (!reserv_is_supported(mm)) + return start; + if (end < start) + return -EINVAL; + + /* Check if the reservation range is representable and throw error if not */ + if (start & ~cheri_representable_alignment_mask(len) || + len != cheri_representable_length(len) || + start & ~PAGE_MASK || len % PAGE_SIZE) { + printk(KERN_WARNING "Reservation range (0x%lx)-(0x%lx) is not representable\n", + start, start + len - 1); + return -ERESERVATION; + } + if (!locked && mmap_write_lock_killable(mm)) + return -EINTR; + + for_each_vma_range(vmi, vma, end) { + WRITE_ONCE(vma->reserv_data.reserv_start, start); + WRITE_ONCE(vma->reserv_data.reserv_len, len); + WRITE_ONCE(vma->reserv_data.reserv_perm, perm); + } + if (!locked) + mmap_write_unlock(current->mm); + ret = (user_uintptr_t)uaddr_to_user_ptr_safe(start); + + return ret; +} + +int reserv_vmi_range_mapped(struct vma_iterator *vmi, ptraddr_t start, + size_t len, bool locked) +{ + struct vm_area_struct *vma; + struct mm_struct *mm = current->mm; + int ret = -ENOMEM; + + if (!reserv_is_supported(mm)) + return 0; + if (!locked && mmap_read_lock_killable(mm)) + return -EINTR; + + start = round_down(start, PAGE_SIZE); + len = round_up(len, PAGE_SIZE); + mas_set_range(&vmi->mas, start, start); + /* Try walking the given range */ + vma = mas_find(&vmi->mas, start + len - 1); + if (!vma) + goto out; + + /* If the range is fully mapped then no gap exists */ + if (mas_empty_area(&vmi->mas, start, start + len - 1, 1)) + goto out; + ret = 0; +out: + if (!locked) + mmap_read_unlock(mm); + return ret; +} + +bool reserv_vmi_cap_within_reserv(struct vma_iterator *vmi, user_uintptr_t cap, bool locked) +{ + struct vm_area_struct *vma; + struct mm_struct *mm = current->mm; + ptraddr_t cap_start = cheri_base_get(cap); + ptraddr_t cap_end = cap_start + cheri_length_get(cap); + bool ret = false; + + if (!reserv_is_supported(mm)) + return true; + + if (!locked && mmap_read_lock_killable(mm)) + return false; + + /* Check if there is match with the existing reservations */ + vma = mas_find(&vmi->mas, cap_end); + if (!vma) + goto out; + + if (vma->reserv_data.reserv_start <= cap_start && + vma->reserv_data.reserv_len >= cheri_length_get(cap)) + ret = true; +out: + if (!locked) + mmap_read_unlock(mm); + + return ret; +} + +bool reserv_vma_cap_within_reserv(struct vm_area_struct 
*vma, user_uintptr_t cap) +{ + if (!reserv_is_supported(vma->vm_mm)) + return true; + + /* Check if there is match with the existing reservations */ + if (vma->reserv_data.reserv_start <= cheri_base_get(cap) && + vma->reserv_data.reserv_len >= cheri_length_get(cap)) + return true; + + return false; +} + +bool reserv_vma_range_within_reserv(struct vm_area_struct *vma, ptraddr_t start, size_t len) +{ + if (!reserv_is_supported(vma->vm_mm)) + return true; + + start = untagged_addr(start); + + /* Check if there is match with the existing reservations */ + if (vma->reserv_data.reserv_start <= start && vma->reserv_data.reserv_len >= len) + return true; + + return false; +} + +#endif /* CONFIG_CHERI_PURECAP_UABI */
On 11/03/2024 10:28, Amit Daniel Kachhap wrote:
[...]
+#ifdef CONFIG_CHERI_PURECAP_UABI
+#define reserv_representable_alignment(len) \
+	(test_bit(MMF_PCUABI_RESERV, &current->mm->flags) \
+	 ? ~cheri_representable_alignment_mask(len) : 0)
The value that info.align_mask is set to always seems to be ANDed with PAGE_MASK (see for instance hugetlb_get_unmapped_area_bottomup()), presumably to avoid adding PAGE_SIZE - 1 in the calculations in unmapped_area(). I guess we should do the same here?
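For illustration, masking the representable alignment with PAGE_MASK as suggested might look like this (sketch only):

#define reserv_representable_alignment(len) \
	(test_bit(MMF_PCUABI_RESERV, &current->mm->flags) \
	 ? (~cheri_representable_alignment_mask(len) & PAGE_MASK) : 0)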
+#define reserv_representable_base(base, len) \
+	(test_bit(MMF_PCUABI_RESERV, &current->mm->flags) \
+	 ? base & cheri_representable_alignment_mask(len) : base)
Macro arguments should be enclosed in parentheses when they're referenced (also applies to the !PCuABI definitions).
+#define reserv_representable_length(len) \
+	(test_bit(MMF_PCUABI_RESERV, &current->mm->flags) \
+	 ? cheri_representable_length(len) : len)
+
+#define reserv_vma_reserv_start(vma) \
+	(test_bit(MMF_PCUABI_RESERV, &vma->vm_mm->flags) \
+	 ? vma->reserv_data.reserv_start : 0)
+
+#define reserv_vma_reserv_len(vma) \
+	(test_bit(MMF_PCUABI_RESERV, &vma->vm_mm->flags) \
+	 ? vma->reserv_data.reserv_len : 0)
+
+#define reserv_vma_reserv_perm(vma) \
Sorry I didn't mention this earlier: "perms" (plural) is generally better throughout the code, a capability almost always has multiple permissions.
+	(test_bit(MMF_PCUABI_RESERV, &vma->vm_mm->flags) \
+	 ? vma->reserv_data.reserv_perm : 0)
[...]
+/**
+ * reserv_is_supported() - Checks if the reservation property exists for the mm.
+ * @mm: The mm pointer.
+ *
+ * Return: True if mm has the reservation property set or false otherwise.
+ */
+static inline bool reserv_is_supported(struct mm_struct *mm)
+{
+	if (mm && test_bit(MMF_PCUABI_RESERV, &mm->flags))
Is the null check really warranted for mm? Is there any situation where current->mm or vma->vm_mm (what this function is passed in practice) is null?
+		return true;
+
+	return false;
Or simply:
return mm && test_bit(MMF_PCUABI_RESERV, &mm->flags);
Same comment for a few functions in cap_addr_mgmt.c.
+}
[...]

+struct reserv_struct {
+	ptraddr_t reserv_start;
+	size_t reserv_len;
+	user_ptr_perms_t reserv_perm;
Having moved all the fields to their own struct, I guess the reserv_ prefix becomes redundant: simply start, len, perms?
+};
 /*
  * This struct describes a virtual memory area. There is one of these
  * per VM-area/task. A VM area is any part of the process virtual memory
@@ -711,6 +717,9 @@ struct vm_area_struct { struct vma_numab_state *numab_state; /* NUMA Balancing state */ #endif struct vm_userfaultfd_ctx vm_userfaultfd_ctx; +#ifdef CONFIG_CHERI_PURECAP_UABI
+	struct reserv_struct reserv_data;
+#endif } __randomize_layout; #ifdef CONFIG_NUMA diff --git a/include/linux/user_ptr.h b/include/linux/user_ptr.h index 685586bc0d89..d663c6105d54 100644 --- a/include/linux/user_ptr.h +++ b/include/linux/user_ptr.h @@ -2,6 +2,7 @@ #ifndef _LINUX_USER_PTR_H #define _LINUX_USER_PTR_H +#include <linux/cheri.h> #include <linux/limits.h> #include <linux/typecheck.h> @@ -27,6 +28,8 @@ #ifdef CONFIG_CHERI_PURECAP_UABI +#define user_ptr_perms_t cheri_perms_t
/**
- uaddr_to_user_ptr() - Convert a user-provided address to a user pointer.
- @addr: The address to set the pointer to.
@@ -109,6 +112,8 @@ bool check_user_ptr_rw(void __user *ptr, size_t len); #else /* CONFIG_CHERI_PURECAP_UABI */ +#define user_ptr_perms_t int
static inline void __user *uaddr_to_user_ptr(ptraddr_t addr) { return as_user_ptr(addr); diff --git a/mm/Makefile b/mm/Makefile index 33873c8aedb3..6f994a1664e4 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -39,7 +39,7 @@ mmu-y := nommu.o mmu-$(CONFIG_MMU) := highmem.o memory.o mincore.o \ mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \ msync.o page_vma_mapped.o pagewalk.o \
-			pgtable-generic.o rmap.o vmalloc.o
+			pgtable-generic.o rmap.o vmalloc.o cap_addr_mgmt.o
We could add the file to mmu-$(CONFIG_CHERI_PURECAP_UABI) instead, this way we don't need the #ifdef in the .c.
[...]
+bool reserv_vma_range_within_reserv(struct vm_area_struct *vma, ptraddr_t start, size_t len)
+{
+	if (!reserv_is_supported(vma->vm_mm))
+		return true;
+
+	start = untagged_addr(start);
This is strange as we're not doing that in any other function. AFAICT reserv_vma_range_within_reserv() is never passed an address that could be tagged (i.e. one coming directly from the user, without untagged_addr() having been called already).
Kevin
+	/* Check if there is match with the existing reservations */
+	if (vma->reserv_data.reserv_start <= start && vma->reserv_data.reserv_len >= len)
+		return true;
+
+	return false;
+}
+#endif /* CONFIG_CHERI_PURECAP_UABI */
On 11/03/2024 10:28, Amit Daniel Kachhap wrote:
diff --git a/include/linux/user_ptr.h b/include/linux/user_ptr.h index 685586bc0d89..d663c6105d54 100644 --- a/include/linux/user_ptr.h +++ b/include/linux/user_ptr.h @@ -2,6 +2,7 @@ #ifndef _LINUX_USER_PTR_H #define _LINUX_USER_PTR_H +#include <linux/cheri.h> #include <linux/limits.h> #include <linux/typecheck.h> @@ -27,6 +28,8 @@ #ifdef CONFIG_CHERI_PURECAP_UABI +#define user_ptr_perms_t cheri_perms_t
This should really be a typedef. I suppose this should be a separate patch as well.
Note in passing: I was trying quite hard not to include linux/cheri.h in linux/user_ptr.h, because the latter is itself included in linux/kernel.h and therefore everywhere. That's why the builtins are used directly instead of the cheriintrin.h wrappers. However, I can't see an easy way to avoid the include if we're going to use cheri_perms_t. I suppose we can live with cheri.h being included everywhere.
Kevin
/**
- uaddr_to_user_ptr() - Convert a user-provided address to a user pointer.
- @addr: The address to set the pointer to.
@@ -109,6 +112,8 @@ bool check_user_ptr_rw(void __user *ptr, size_t len); #else /* CONFIG_CHERI_PURECAP_UABI */ +#define user_ptr_perms_t int
static inline void __user *uaddr_to_user_ptr(ptraddr_t addr) { return as_user_ptr(addr);
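For reference, the typedef suggested in the review above might look roughly like this (sketch only):

#ifdef CONFIG_CHERI_PURECAP_UABI
typedef cheri_perms_t user_ptr_perms_t;
#else
typedef int user_ptr_perms_t;
#endif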
Add the user_ptr_is_valid() and user_ptr_set_addr() helpers, which will be used to operate on user pointers in various situations in subsequent commits.
* user_ptr_is_valid() validates the user pointer by fetching the tag.
* user_ptr_set_addr() sets the address field of the user pointer.
Both of the above helpers use CHERI compiler builtins in the PCuABI case.
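As a usage illustration (not taken from the series; the function name is invented), the two helpers combine naturally when an address must be updated without dropping capability metadata:

/* Sketch: move a user pointer to a new address, keeping bounds/permissions
 * when it carries a valid capability, falling back to a plain pointer otherwise. */
static void __user *example_relocate_user_ptr(void __user *ptr, ptraddr_t new_addr)
{
	if (!user_ptr_is_valid(ptr))
		return uaddr_to_user_ptr(new_addr);

	return user_ptr_set_addr(ptr, new_addr);
}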
Signed-off-by: Amit Daniel Kachhap <amitdaniel.kachhap@arm.com>
---
 Documentation/core-api/user_ptr.rst | 12 +++++++++++
 include/linux/user_ptr.h            | 33 +++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)
diff --git a/Documentation/core-api/user_ptr.rst b/Documentation/core-api/user_ptr.rst index 9db5e9271578..1427c4701af8 100644 --- a/Documentation/core-api/user_ptr.rst +++ b/Documentation/core-api/user_ptr.rst @@ -222,6 +222,18 @@ equal without being identical. To check whether two user pointers are truly identical, ``user_ptr_is_same(p1, p2)`` (``<linux/user_ptr.h>``) should be used.
+Validity +---------- + +To check whether a user pointer is valid, +``user_ptr_is_valid(p)`` (``<linux/user_ptr.h>``) should be used. + +Setting the address +------------------- + +To set the address field of the user pointers, +``user_ptr_set_addr(p)`` (``<linux/user_ptr.h>``) should be used. + Alignment ---------
diff --git a/include/linux/user_ptr.h b/include/linux/user_ptr.h index d663c6105d54..8cf69280bfcc 100644 --- a/include/linux/user_ptr.h +++ b/include/linux/user_ptr.h @@ -226,4 +226,37 @@ static inline bool user_ptr_is_same(const void __user *p1, const void __user *p2 #endif }
+/** + * user_ptr_is_valid() - Checks if the user pointer is valid. + * @ptr: The user pointer to check. + * + * Return: true if @ptr is valid. + * + * This function returns the tag of user pointer @ptr. + */ +static inline bool user_ptr_is_valid(const void __user *ptr) +{ +#ifdef CONFIG_CHERI_PURECAP_UABI + return __builtin_cheri_tag_get(ptr); +#else + return 0; +#endif +} + +/** + * user_ptr_set_addr() - Sets the address of the user pointer. + * @ptr: The user pointer to set address. + * @addr: The address to set the pointer to. + * + * Return: A user pointer with its address set to @addr. + */ +static inline void __user *user_ptr_set_addr(void __user *ptr, ptraddr_t addr) +{ +#ifdef CONFIG_CHERI_PURECAP_UABI + return __builtin_cheri_address_set(ptr, addr); +#else + return as_user_ptr(addr); +#endif +} + #endif /* _LINUX_USER_PTR_H */
The helper functions check_user_ptr_owning(), make_user_ptr_owning() and user_ptr_perms_from_prot() are added to manage owning capability constraints as per the PCuABI specification. These helpers will mostly be used by memory management syscalls to apply the various capability constraints.
* check_user_ptr_owning() checks whether the capability owns the input range. The input range is page aligned first, and the check is skipped (always passes) in the non-PCuABI case.
* make_user_ptr_owning() creates the relevant capability from the input range and permissions. The input range is first page aligned and then CHERI representable aligned.
Both of these functions are implemented on top of the cheri_* helpers in linux/cheri.h.
* user_ptr_perms_from_prot() converts memory mapping protections to capability permissions.
Note: These helper functions currently check only capability bounds and not capability permission constraints; full support will be added incrementally in subsequent commits.
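As an illustration (not part of the patch; the function name is invented and most error handling is omitted), a MAP_FIXED-style path could combine the three helpers as follows:

static user_uintptr_t example_map_fixed(user_uintptr_t user_ptr, ptraddr_t addr,
					size_t len, int prot)
{
	user_ptr_perms_t perms;

	/* The incoming capability must own the (page-aligned) target range. */
	if (!check_user_ptr_owning(user_ptr, addr, len))
		return -EINVAL;

	/* ... establish the mapping at addr/len here ... */

	/* Hand back an owning capability covering the representable range. */
	perms = user_ptr_perms_from_prot(prot, true);
	return make_user_ptr_owning(addr, len, perms);
}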
Signed-off-by: Amit Daniel Kachhap <amitdaniel.kachhap@arm.com>
---
 Documentation/core-api/user_ptr.rst | 15 +++++++++
 include/linux/user_ptr.h            | 47 +++++++++++++++++++++++++++++
 lib/user_ptr.c                      | 37 +++++++++++++++++++++++
 mm/cap_addr_mgmt.c                  |  2 +-
 4 files changed, 100 insertions(+), 1 deletion(-)
diff --git a/Documentation/core-api/user_ptr.rst b/Documentation/core-api/user_ptr.rst index 1427c4701af8..627bcea2a07e 100644 --- a/Documentation/core-api/user_ptr.rst +++ b/Documentation/core-api/user_ptr.rst @@ -345,3 +345,18 @@ accidentally providing capabilities to userspace in PCuABI. | routines suffixed with ``with_captags``. See ``<linux/uaccess.h>`` | | for details. | +-----------------------------------------------------------------------+ + +Managing user pointers by mm subsystem +-------------------------------------- + +User pointers and memory length managed in Linux mm subsystem are usually +page aligned or sometimes CHERI representable aligned. Below, APIs consider +those requirements while creating and checking user pointers. They also +check the mm flag MMF_PCUABI_RESERV to skip the operation for non-PCuABI +implementation, such as compat64 mode. + +* ``check_user_ptr_owning(ptr, addr, n)`` +* ``make_user_ptr_owning(addr, n, perm)`` +* ``user_ptr_perms_from_prot(prot, tag_perm)`` + +See ``<linux/user_ptr.h>`` for details on how to use them. diff --git a/include/linux/user_ptr.h b/include/linux/user_ptr.h index 8cf69280bfcc..1eb59442b06e 100644 --- a/include/linux/user_ptr.h +++ b/include/linux/user_ptr.h @@ -110,6 +110,38 @@ bool check_user_ptr_read(const void __user *ptr, size_t len); bool check_user_ptr_write(void __user *ptr, size_t len); bool check_user_ptr_rw(void __user *ptr, size_t len);
+/** + * check_user_ptr_owning() - Check if the address range is within the valid + * user pointer capability bound. + * @user_ptr: User pointer. + * @addr: Address start value. + * @len: Address length. + * + * Return: True if address within the capability bound or false otherwise. + */ +bool check_user_ptr_owning(user_uintptr_t user_ptr, ptraddr_t addr, size_t len); + +/** + * make_user_ptr_owning() - Creates a userspace capability from the + * requested base address, length and memory permission flags. + * @addr: Requested capability address. + * @len: Requested capability length. + * @perm: Requested capability permission flags. + * + * Return: A new capability derived from cheri_user_root_cap. + */ +user_uintptr_t make_user_ptr_owning(ptraddr_t addr, size_t len, user_ptr_perms_t perm); + +/** + * user_ptr_perms_from_prot() - Converts memory mapping protection flags to + * capability permission flags. + * @prot: Memory protection flags. + * @has_tag_access: Capability permissions to have tag check flags. + * + * Return: Capability permission flags + */ +user_ptr_perms_t user_ptr_perms_from_prot(int prot, bool has_tag_access); + #else /* CONFIG_CHERI_PURECAP_UABI */
#define user_ptr_perms_t int @@ -150,6 +182,21 @@ static inline bool check_user_ptr_rw(void __user *ptr, size_t len) return true; }
+static inline bool check_user_ptr_owning(user_uintptr_t user_ptr, ptraddr_t addr, size_t len) +{ + return true; +} + +static inline user_uintptr_t make_user_ptr_owning(ptraddr_t addr, size_t len, user_ptr_perms_t perm) +{ + return addr; +} + +static inline user_ptr_perms_t user_ptr_perms_from_prot(int prot, bool has_tag_access) +{ + return 0; +} + #endif /* CONFIG_CHERI_PURECAP_UABI */
/** diff --git a/lib/user_ptr.c b/lib/user_ptr.c index 115efc9fe678..f597f73191bb 100644 --- a/lib/user_ptr.c +++ b/lib/user_ptr.c @@ -1,6 +1,8 @@ /* SPDX-License-Identifier: GPL-2.0-only */ #include <linux/bug.h> +#include <linux/cap_addr_mgmt.h> #include <linux/cheri.h> +#include <linux/sched.h> #include <linux/user_ptr.h>
void __user *uaddr_to_user_ptr(ptraddr_t addr) @@ -70,3 +72,38 @@ bool check_user_ptr_rw(void __user *ptr, size_t len) { return cheri_check_cap(ptr, len, CHERI_PERM_LOAD | CHERI_PERM_STORE); } + +bool check_user_ptr_owning(user_uintptr_t user_ptr, ptraddr_t addr, size_t len) +{ + if (!reserv_is_supported(current->mm)) + return true; + + addr = round_down(addr, PAGE_SIZE); + len = round_up(len, PAGE_SIZE); + + return cheri_check_cap((const void * __capability)cheri_address_set(user_ptr, addr), + len, CHERI_PERM_GLOBAL | CHERI_PERM_SW_VMEM); +} + +user_uintptr_t make_user_ptr_owning(ptraddr_t addr, size_t len, user_ptr_perms_t perm) +{ + ptraddr_t align_addr; + user_uintptr_t user_ptr; + + if (!reserv_is_supported(current->mm)) + return (user_uintptr_t)addr; + + align_addr = reserv_representable_base(round_down(addr, PAGE_SIZE), len); + len = cheri_representable_length(round_up(len, PAGE_SIZE)); + user_ptr = (user_uintptr_t)cheri_build_user_cap(align_addr, len, perm); + + return cheri_address_set(user_ptr, addr); +} + +user_ptr_perms_t user_ptr_perms_from_prot(int prot __maybe_unused, + bool has_tag_access __maybe_unused) +{ + /* TODO [PCuABI] - capability permission conversion from memory permission */ + return (CHERI_PERMS_READ | CHERI_PERMS_WRITE | + CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP); +} diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c index 5586fde34d0a..890101eec187 100644 --- a/mm/cap_addr_mgmt.c +++ b/mm/cap_addr_mgmt.c @@ -58,7 +58,7 @@ user_uintptr_t reserv_range_set_reserv(ptraddr_t start, size_t len, user_ptr_per } if (!locked) mmap_write_unlock(current->mm); - ret = (user_uintptr_t)uaddr_to_user_ptr_safe(start); + ret = make_user_ptr_owning(start, len, perm);
return ret; }
On 11/03/2024 10:28, Amit Daniel Kachhap wrote:
[...]
diff --git a/lib/user_ptr.c b/lib/user_ptr.c index 115efc9fe678..f597f73191bb 100644 --- a/lib/user_ptr.c +++ b/lib/user_ptr.c @@ -1,6 +1,8 @@ /* SPDX-License-Identifier: GPL-2.0-only */ #include <linux/bug.h> +#include <linux/cap_addr_mgmt.h> #include <linux/cheri.h> +#include <linux/sched.h> #include <linux/user_ptr.h> void __user *uaddr_to_user_ptr(ptraddr_t addr) @@ -70,3 +72,38 @@ bool check_user_ptr_rw(void __user *ptr, size_t len) { return cheri_check_cap(ptr, len, CHERI_PERM_LOAD | CHERI_PERM_STORE); }
+bool check_user_ptr_owning(user_uintptr_t user_ptr, ptraddr_t addr, size_t len)
+{
+	if (!reserv_is_supported(current->mm))
+		return true;
I would strongly prefer these checks to be done in the caller. In fact AFAICT this is already done in most cases, so this check is redundant. The helpers in this file are generic, they should do what they're asked to do regardless of the nature of current->mm.
+	addr = round_down(addr, PAGE_SIZE);
+	len = round_up(len, PAGE_SIZE);
+
+	return cheri_check_cap((const void * __capability)cheri_address_set(user_ptr, addr),
+			       len, CHERI_PERM_GLOBAL | CHERI_PERM_SW_VMEM);
+}
+user_uintptr_t make_user_ptr_owning(ptraddr_t addr, size_t len, user_ptr_perms_t perm)
+{
+	ptraddr_t align_addr;
+	user_uintptr_t user_ptr;
+
+	if (!reserv_is_supported(current->mm))
+		return (user_uintptr_t)addr;
+
+	align_addr = reserv_representable_base(round_down(addr, PAGE_SIZE), len);
+	len = cheri_representable_length(round_up(len, PAGE_SIZE));
+	user_ptr = (user_uintptr_t)cheri_build_user_cap(align_addr, len, perm);
+
+	return cheri_address_set(user_ptr, addr);
+}
+user_ptr_perms_t user_ptr_perms_from_prot(int prot __maybe_unused,
+					   bool has_tag_access __maybe_unused)
Nit: not sure why we would need __maybe_unused on function parameters. The kernel is built with -Wno-unused-parameter even with W=1 (see scripts/Makefile.extrawarn). That's a pretty strong assumption in the kernel, even trivial empty implementations like those added above would warn otherwise.
Kevin
+{
- /* TODO [PCuABI] - capability permission conversion from memory permission */
- return (CHERI_PERMS_READ | CHERI_PERMS_WRITE |
CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP);
+} diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c index 5586fde34d0a..890101eec187 100644 --- a/mm/cap_addr_mgmt.c +++ b/mm/cap_addr_mgmt.c @@ -58,7 +58,7 @@ user_uintptr_t reserv_range_set_reserv(ptraddr_t start, size_t len, user_ptr_per } if (!locked) mmap_write_unlock(current->mm);
- ret = (user_uintptr_t)uaddr_to_user_ptr_safe(start);
- ret = make_user_ptr_owning(start, len, perm);
return ret; }
On 11/03/2024 10:28, Amit Daniel Kachhap wrote:
+user_ptr_perms_t user_ptr_perms_from_prot(int prot __maybe_unused,
I would add "owning" to the function name, to highlight this is to be used only for owning capabilities.
Kevin
bool has_tag_access __maybe_unused)
+{
- /* TODO [PCuABI] - capability permission conversion from memory permission */
- return (CHERI_PERMS_READ | CHERI_PERMS_WRITE |
CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP);
+}
In the CHERI architecture, not all address ranges can be represented by a capability, so add the necessary CHERI base and length alignment checks when generating a free unmapped virtual address or evaluating a fixed input address.
The PCuABI reservation interface keeps the unusable alignment gaps at the start and end of a reservation. These gaps must be accounted for when searching for free unmapped address space.
For fixed addresses carrying a valid capability, the requested address range must be fully covered by the reservation range. Fixed addresses carrying a null capability are verified not to overlap any existing reservation range.
Due to the above requirements, get_unmapped_area() should not be used for limit checks on fixed valid-capability addresses, or on already mapped addresses as done in vma_expandable(). A new function, vm_area_range_within_limit(), is created for sanity checks in those cases.
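Condensed into a sketch (illustrative only; the function name is invented), the address selection described above behaves like this:

/* Sketch: a fixed request carrying a valid owning capability bypasses
 * get_unmapped_area() and is only checked against the address-space limits. */
static unsigned long example_pick_addr(struct file *file, user_uintptr_t user_ptr,
				       unsigned long len, unsigned long pgoff,
				       unsigned long flags)
{
	unsigned long addr = (ptraddr_t)user_ptr;

	if (reserv_is_supported(current->mm) && (flags & MAP_FIXED) &&
	    user_ptr_is_valid((const void __user *)user_ptr)) {
		int err = vm_area_range_within_limit(addr, len, flags);

		return err ? err : addr;
	}

	/* Otherwise search/verify a gap, with CHERI representability applied. */
	return get_unmapped_area(file, addr, len, pgoff, flags);
}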
Signed-off-by: Amit Daniel Kachhap <amitdaniel.kachhap@arm.com>
---
 include/linux/mm.h |  8 +++++
 mm/mmap.c          | 78 +++++++++++++++++++++++++++++++++++++---------
 mm/mremap.c        | 17 +++++++---
 3 files changed, 85 insertions(+), 18 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index ce2501062292..f7f09fe0684e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -30,6 +30,7 @@ #include <linux/kasan.h> #include <linux/memremap.h> #include <linux/slab.h> +#include <linux/cap_addr_mgmt.h>
struct mempolicy; struct anon_vma; @@ -3409,6 +3410,8 @@ struct vm_unmapped_area_info { };
extern unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info); +extern int vm_area_range_within_limit(unsigned long addr, unsigned long len, + unsigned long flags);
/* truncate.c */ extern void truncate_inode_pages(struct address_space *, loff_t); @@ -3472,9 +3475,12 @@ static inline unsigned long vm_start_gap(struct vm_area_struct *vma) unsigned long gap = stack_guard_start_gap(vma); unsigned long vm_start = vma->vm_start;
+ if (reserv_is_supported(vma->vm_mm)) + vm_start = reserv_vma_reserv_start(vma); vm_start -= gap; if (vm_start > vma->vm_start) vm_start = 0; + return vm_start; }
@@ -3482,6 +3488,8 @@ static inline unsigned long vm_end_gap(struct vm_area_struct *vma) { unsigned long vm_end = vma->vm_end;
+ if (reserv_is_supported(vma->vm_mm)) + vm_end = reserv_vma_reserv_start(vma) + reserv_vma_reserv_len(vma); if (vma->vm_flags & VM_GROWSUP) { vm_end += stack_guard_gap; if (vm_end < vma->vm_end) diff --git a/mm/mmap.c b/mm/mmap.c index bec26ad4fdb0..305c90332424 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -48,6 +48,8 @@ #include <linux/sched/mm.h> #include <linux/ksm.h>
+#include <linux/cap_addr_mgmt.h> +#include <linux/cheri.h> #include <linux/uaccess.h> #include <asm/cacheflush.h> #include <asm/tlb.h> @@ -1656,6 +1658,8 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) tmp = mas_prev(&mas, 0); if (tmp && vm_end_gap(tmp) > gap) { high_limit = tmp->vm_start; + if (reserv_is_supported(tmp->vm_mm)) + high_limit = reserv_vma_reserv_start(tmp); mas_reset(&mas); goto retry; } @@ -1686,6 +1690,19 @@ unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info) return addr; }
+int vm_area_range_within_limit(unsigned long addr, unsigned long len, + unsigned long flags) +{ + const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags); + unsigned long align_len = reserv_representable_length(len); + + /* requested length too big for entire address space */ + if (align_len > mmap_end - mmap_min_addr) + return -ENOMEM; + + return 0; +} + /* Get an address range which is currently unmapped. * For shmat() with addr=0. * @@ -1706,27 +1723,44 @@ generic_get_unmapped_area(struct file *filp, unsigned long addr, struct vm_area_struct *vma, *prev; struct vm_unmapped_area_info info; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags); + unsigned long align_len; + unsigned long align_addr;
- if (len > mmap_end - mmap_min_addr) + align_len = reserv_representable_length(len); + if (align_len > mmap_end - mmap_min_addr) return -ENOMEM;
- if (flags & MAP_FIXED) + /* + * In case of PCuABI, fixed address without valid capability should + * not overlap with any existing reservation. Let this scenario + * fallthrough below for such checks. + */ + if ((flags & MAP_FIXED) && !reserv_is_supported(mm)) return addr;
if (addr) { addr = PAGE_ALIGN(addr); + /* + * Here CHERI representable address is aligned down as reservation + * layer holds this unusable aligned down gap. + */ + align_addr = reserv_representable_base(addr, len); vma = find_vma_prev(mm, addr, &prev); - if (mmap_end - len >= addr && addr >= mmap_min_addr && - (!vma || addr + len <= vm_start_gap(vma)) && - (!prev || addr >= vm_end_gap(prev))) + if (mmap_end - align_len >= align_addr && align_addr >= mmap_min_addr && + (!vma || align_addr + align_len <= vm_start_gap(vma)) && + (!prev || align_addr >= vm_end_gap(prev))) return addr; + else if (flags & MAP_FIXED) + /* This non-tagged fixed address overlaps with other reservation */ + return -ERESERVATION; }
info.flags = 0; - info.length = len; + info.length = align_len; info.low_limit = mm->mmap_base; info.high_limit = mmap_end; info.align_mask = 0; + info.align_mask = reserv_representable_alignment(len); info.align_offset = 0; return vm_unmapped_area(&info); } @@ -1754,29 +1788,45 @@ generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr, struct mm_struct *mm = current->mm; struct vm_unmapped_area_info info; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags); + unsigned long align_len; + unsigned long align_addr;
+ align_len = reserv_representable_length(len); /* requested length too big for entire address space */ - if (len > mmap_end - mmap_min_addr) + if (align_len > mmap_end - mmap_min_addr) return -ENOMEM; - - if (flags & MAP_FIXED) + /* + * In case of PCuABI, fixed address without valid capability should + * not overlap with any existing reservation. Let this scenario + * fallthrough below for such checks. + */ + if ((flags & MAP_FIXED) && !reserv_is_supported(mm)) return addr;
/* requesting a specific address */ if (addr) { addr = PAGE_ALIGN(addr); - vma = find_vma_prev(mm, addr, &prev); - if (mmap_end - len >= addr && addr >= mmap_min_addr && - (!vma || addr + len <= vm_start_gap(vma)) && - (!prev || addr >= vm_end_gap(prev))) + /* + * Here CHERI representable address is aligned down as reservation + * layer holds this unusable aligned down gap. + */ + align_addr = reserv_representable_base(addr, len); + vma = find_vma_prev(mm, align_addr, &prev); + if (mmap_end - align_len >= align_addr && align_addr >= mmap_min_addr && + (!vma || align_addr + align_len <= vm_start_gap(vma)) && + (!prev || align_addr >= vm_end_gap(prev))) return addr; + else if (flags & MAP_FIXED) + /* This fixed address overlaps with other reservation. */ + return -ERESERVATION; }
info.flags = VM_UNMAPPED_AREA_TOPDOWN; - info.length = len; + info.length = align_len; info.low_limit = PAGE_SIZE; info.high_limit = arch_get_mmap_base(addr, mm->mmap_base); info.align_mask = 0; + info.align_mask = reserv_representable_alignment(len); info.align_offset = 0; addr = vm_unmapped_area(&info);
diff --git a/mm/mremap.c b/mm/mremap.c index 515217a95293..f014ac50d9f1 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -934,7 +934,10 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, if (vma->vm_flags & VM_MAYSHARE) map_flags |= MAP_SHARED;
- ret = get_unmapped_area(vma->vm_file, new_addr, new_len, vma->vm_pgoff + + if (reserv_is_supported(vma->vm_mm) && (map_flags & MAP_FIXED)) + ret = vm_area_range_within_limit(new_addr, new_len, map_flags); + else + ret = get_unmapped_area(vma->vm_file, new_addr, new_len, vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT), map_flags); if (IS_ERR_VALUE(ret)) @@ -959,9 +962,15 @@ static int vma_expandable(struct vm_area_struct *vma, unsigned long delta) return 0; if (find_vma_intersection(vma->vm_mm, vma->vm_end, end)) return 0; - if (get_unmapped_area(NULL, vma->vm_start, end - vma->vm_start, - 0, MAP_FIXED) & ~PAGE_MASK) - return 0; + if (reserv_is_supported(vma->vm_mm)) { + if (vm_area_range_within_limit(vma->vm_start, end - vma->vm_start, + MAP_FIXED)) + return 0; + } else { + if (get_unmapped_area(NULL, vma->vm_start, end - vma->vm_start, + 0, MAP_FIXED) & ~PAGE_MASK) + return 0; + } return 1; }
On 11/03/2024 10:28, Amit Daniel Kachhap wrote:
[...]
@@ -3472,9 +3475,12 @@ static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
 	unsigned long gap = stack_guard_start_gap(vma);
 	unsigned long vm_start = vma->vm_start;
 
+	if (reserv_is_supported(vma->vm_mm))
+		vm_start = reserv_vma_reserv_start(vma);
If we made reserv_vma_reserv_start(vma) return vma->vm_start in the !PCuABI case, then we could use it unconditionally. Similar idea for for reserv_vma_reserv_len(vma).
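Such a definition might look like this (sketch; the !PCuABI stub would then simply expand to (vma)->vm_start):

#define reserv_vma_reserv_start(vma) \
	(test_bit(MMF_PCUABI_RESERV, &(vma)->vm_mm->flags) \
	 ? (vma)->reserv_data.reserv_start : (vma)->vm_start)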
 	vm_start -= gap;
 	if (vm_start > vma->vm_start)
 		vm_start = 0;
+
Nit: spurious change.
 	return vm_start;
 }
 
@@ -3482,6 +3488,8 @@ static inline unsigned long vm_end_gap(struct vm_area_struct *vma)
 {
 	unsigned long vm_end = vma->vm_end;
 
+	if (reserv_is_supported(vma->vm_mm))
+		vm_end = reserv_vma_reserv_start(vma) + reserv_vma_reserv_len(vma);
 	if (vma->vm_flags & VM_GROWSUP) {
 		vm_end += stack_guard_gap;
 		if (vm_end < vma->vm_end)
diff --git a/mm/mmap.c b/mm/mmap.c index bec26ad4fdb0..305c90332424 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -48,6 +48,8 @@ #include <linux/sched/mm.h> #include <linux/ksm.h> +#include <linux/cap_addr_mgmt.h> +#include <linux/cheri.h> #include <linux/uaccess.h> #include <asm/cacheflush.h> #include <asm/tlb.h> @@ -1656,6 +1658,8 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) tmp = mas_prev(&mas, 0); if (tmp && vm_end_gap(tmp) > gap) { high_limit = tmp->vm_start;
+		if (reserv_is_supported(tmp->vm_mm))
+			high_limit = reserv_vma_reserv_start(tmp);
 		mas_reset(&mas);
 		goto retry;
 	}
@@ -1686,6 +1690,19 @@ unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info) return addr; } +int vm_area_range_within_limit(unsigned long addr, unsigned long len,
unsigned long flags)
+{
- const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
- unsigned long align_len = reserv_representable_length(len);
- /* requested length too big for entire address space */
- if (align_len > mmap_end - mmap_min_addr)
return -ENOMEM;
- return 0;
+}
/* Get an address range which is currently unmapped.
- For shmat() with addr=0.
@@ -1706,27 +1723,44 @@ generic_get_unmapped_area(struct file *filp, unsigned long addr, struct vm_area_struct *vma, *prev; struct vm_unmapped_area_info info; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
- unsigned long align_len;
- unsigned long align_addr;
- if (len > mmap_end - mmap_min_addr)
- align_len = reserv_representable_length(len);
- if (align_len > mmap_end - mmap_min_addr) return -ENOMEM;
- if (flags & MAP_FIXED)
- /*
* In case of PCuABI, fixed address without valid capability should
* not overlap with any existing reservation. Let this scenario
* fallthrough below for such checks.
*/
- if ((flags & MAP_FIXED) && !reserv_is_supported(mm)) return addr;
if (addr) { addr = PAGE_ALIGN(addr);
/*
* Here CHERI representable address is aligned down as reservation
* layer holds this unusable aligned down gap.
*/
vma = find_vma_prev(mm, addr, &prev);align_addr = reserv_representable_base(addr, len);
if (mmap_end - len >= addr && addr >= mmap_min_addr &&
(!vma || addr + len <= vm_start_gap(vma)) &&
(!prev || addr >= vm_end_gap(prev)))
if (mmap_end - align_len >= align_addr && align_addr >= mmap_min_addr &&
(!vma || align_addr + align_len <= vm_start_gap(vma)) &&
(!prev || align_addr >= vm_end_gap(prev))) return addr;
else if (flags & MAP_FIXED)
/* This non-tagged fixed address overlaps with other reservation */
return -ERESERVATION;
Expanding slightly on my comments in patch 7 here: what I am proposing is that we allow MAP_FIXED inside a reservation here. This means that instead of unconditionally returning -ERESERVATION, we check that the range is wholly contained within a reservation. If it is, then we just return addr as usual, otherwise (overlap case) we return -ERESERVATION. This might call for a new helper in cap_addr_mgmt.h.
Kevin
} [...]
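The new helper Kevin suggests above might take a shape along these lines (hypothetical sketch; the name and placement are invented):

/* Return true if [start, start + len) lies wholly within the reservation
 * of the VMA found at start; false on partial coverage or no reservation. */
static inline bool reserv_mm_range_within_reserv(struct mm_struct *mm,
						 ptraddr_t start, size_t len)
{
	struct vm_area_struct *vma = find_vma(mm, start);

	return vma && reserv_vma_range_within_reserv(vma, start, len);
}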
PCuABI memory reservation requires adding reservation properties while creating and modifying a VMA. The reserv_vma_set_reserv() interface is used to update those reservation details. Currently, these properties are added only for the mmap/mremap syscalls; later commits will add them for other special VMA mappings.
PCuABI memory reservations also require that VMAs are merged or expanded only within their original reservation. Use the suitable reservation interfaces to check this before performing such operations on a VMA.
The address parameter type is changed from unsigned long to user_uintptr_t in do_mmap() and several functions upstream of it, so that it can be determined whether a new reservation is required.
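Condensed, that decision amounts to something like the following (sketch; the helper name is invented):

/* Whether a do_mmap() request needs a brand-new reservation. */
static bool example_needs_new_reserv(struct mm_struct *mm, user_uintptr_t user_ptr)
{
	if (!reserv_is_supported(mm))
		return true;

	/* A valid (tagged) owning capability maps into its existing reservation. */
	return !user_ptr_is_valid((const void __user *)user_ptr);
}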
Signed-off-by: Amit Daniel Kachhap <amitdaniel.kachhap@arm.com>
---
 include/linux/mm.h |  8 ++---
 kernel/fork.c      |  3 ++
 mm/mmap.c          | 73 +++++++++++++++++++++++++++++++++++++---------
 mm/mremap.c        | 24 ++++++++++++---
 mm/util.c          | 13 ++++-----
 5 files changed, 91 insertions(+), 30 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index f7f09fe0684e..44a55c3e2c06 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3259,7 +3259,7 @@ extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *); extern void unlink_file_vma(struct vm_area_struct *); extern struct vm_area_struct *copy_vma(struct vm_area_struct **, unsigned long addr, unsigned long len, pgoff_t pgoff, - bool *need_rmap_locks); + bool *need_rmap_locks, struct reserv_struct *reserv_info); extern void exit_mmap(struct mm_struct *); struct vm_area_struct *vma_modify(struct vma_iterator *vmi, struct vm_area_struct *prev, @@ -3365,8 +3365,8 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
extern unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, - struct list_head *uf); -extern unsigned long do_mmap(struct file *file, unsigned long addr, + struct list_head *uf, unsigned long prot, bool new_reserv); +extern user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_ptr, unsigned long len, unsigned long prot, unsigned long flags, vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf); @@ -3395,7 +3395,7 @@ static inline void mm_populate(unsigned long addr, unsigned long len) {} /* This takes the mm semaphore itself */ extern int __must_check vm_brk_flags(unsigned long, unsigned long, unsigned long); extern int vm_munmap(unsigned long, size_t); -extern unsigned long __must_check vm_mmap(struct file *, unsigned long, +extern user_uintptr_t __must_check vm_mmap(struct file *, user_uintptr_t, unsigned long, unsigned long, unsigned long, unsigned long);
diff --git a/kernel/fork.c b/kernel/fork.c index a460a65624d7..9ee78c76fd4a 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -99,6 +99,7 @@ #include <linux/stackprotector.h> #include <linux/user_events.h> #include <linux/iommu.h> +#include <linux/cap_addr_mgmt.h>
#include <asm/pgalloc.h> #include <linux/uaccess.h> @@ -678,6 +679,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, goto out; khugepaged_fork(mm, oldmm);
+ reserv_fork(mm, oldmm); + retval = vma_iter_bulk_alloc(&vmi, oldmm->map_count); if (retval) goto out; diff --git a/mm/mmap.c b/mm/mmap.c index 305c90332424..3b072e822f99 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -912,7 +912,8 @@ static struct vm_area_struct /* Can we merge the predecessor? */ if (addr == prev->vm_end && mpol_equal(vma_policy(prev), policy) && can_vma_merge_after(prev, vm_flags, anon_vma, file, - pgoff, vm_userfaultfd_ctx, anon_name)) { + pgoff, vm_userfaultfd_ctx, anon_name) + && reserv_vma_range_within_reserv(prev, addr, end - addr)) { merge_prev = true; vma_prev(vmi); } @@ -921,7 +922,8 @@ static struct vm_area_struct /* Can we merge the successor? */ if (next && mpol_equal(policy, vma_policy(next)) && can_vma_merge_before(next, vm_flags, anon_vma, file, pgoff+pglen, - vm_userfaultfd_ctx, anon_name)) { + vm_userfaultfd_ctx, anon_name) + && reserv_vma_range_within_reserv(next, addr, end - addr)) { merge_next = true; }
@@ -1212,7 +1214,7 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, /* * The caller must write-lock current->mm->mmap_lock. */ -unsigned long do_mmap(struct file *file, unsigned long addr, +user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_ptr, unsigned long len, unsigned long prot, unsigned long flags, vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, @@ -1220,12 +1222,18 @@ unsigned long do_mmap(struct file *file, unsigned long addr, { struct mm_struct *mm = current->mm; int pkey = 0; + unsigned long addr = (ptraddr_t)user_ptr; + bool new_reserv = true;
*populate = 0;
if (!len) return -EINVAL;
+ if (reserv_is_supported(mm)) { + if (user_ptr_is_valid((const void __user *)user_ptr)) + new_reserv = false; + } /* * Does the application expect PROT_READ to imply PROT_EXEC? * @@ -1259,9 +1267,11 @@ unsigned long do_mmap(struct file *file, unsigned long addr, /* Obtain the address to map to. we verify (or select) it and ensure * that it represents a valid section of the address space. */ - addr = get_unmapped_area(file, addr, len, pgoff, flags); - if (IS_ERR_VALUE(addr)) - return addr; + if (new_reserv) { + addr = get_unmapped_area(file, addr, len, pgoff, flags); + if (IS_ERR_VALUE(addr)) + return addr; + }
if (flags & MAP_FIXED_NOREPLACE) { if (find_vma_intersection(mm, addr, addr + len)) @@ -1383,15 +1393,19 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; }
- addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); + if (new_reserv) + user_ptr = addr; + + addr = mmap_region(file, user_ptr, len, vm_flags, pgoff, uf, prot, new_reserv); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len; + return addr; }
-user_uintptr_t ksys_mmap_pgoff(user_uintptr_t addr, unsigned long len, +user_uintptr_t ksys_mmap_pgoff(user_uintptr_t user_ptr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long fd, unsigned long pgoff) { @@ -1429,7 +1443,7 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t addr, unsigned long len, return PTR_ERR(file); }
- retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff); + retval = vm_mmap_pgoff(file, user_ptr, len, prot, flags, pgoff); out_fput: if (file) fput(file); @@ -2801,16 +2815,17 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, - struct list_head *uf) + struct list_head *uf, unsigned long prot, bool new_reserv) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL; - struct vm_area_struct *next, *prev, *merge; + struct vm_area_struct *next, *prev, *merge, *old_vma; pgoff_t pglen = len >> PAGE_SHIFT; unsigned long charged = 0; unsigned long end = addr + len; unsigned long merge_start = addr, merge_end = end; bool writable_file_mapping = false; + struct reserv_struct reserv_info; pgoff_t vm_pgoff; int error; VMA_ITERATOR(vmi, mm, addr); @@ -2830,6 +2845,21 @@ unsigned long mmap_region(struct file *file, unsigned long addr, return -ENOMEM; }
+ if (!new_reserv) { + old_vma = vma_find(&vmi, end); + if (!old_vma) + /* + * This error scenario may not occur as address with valid + * capability should have been verified in the upstream + * syscall functions. + */ + return -ENOMEM; +#ifdef CONFIG_CHERI_PURECAP_UABI + memcpy(&reserv_info, &old_vma->reserv_data, sizeof(reserv_info)); +#endif + vma_iter_set(&vmi, addr); + } + /* Unmap any existing mapping in the area */ if (do_vmi_munmap(&vmi, mm, addr, len, uf, false)) return -ENOMEM; @@ -2856,7 +2886,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, /* Check next */ if (next && next->vm_start == end && !vma_policy(next) && can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen, - NULL_VM_UFFD_CTX, NULL)) { + NULL_VM_UFFD_CTX, NULL) && + reserv_vma_range_within_reserv(next, addr, len)) { merge_end = next->vm_end; vma = next; vm_pgoff = next->vm_pgoff - pglen; @@ -2867,7 +2898,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, (vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file, pgoff, vma->vm_userfaultfd_ctx, NULL) : can_vma_merge_after(prev, vm_flags, NULL, file, pgoff, - NULL_VM_UFFD_CTX, NULL))) { + NULL_VM_UFFD_CTX, NULL)) && + reserv_vma_range_within_reserv(prev, addr, len)) { merge_start = prev->vm_start; vma = prev; vm_pgoff = prev->vm_pgoff; @@ -2903,6 +2935,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vm_flags_init(vma, vm_flags); vma->vm_page_prot = vm_get_page_prot(vm_flags); vma->vm_pgoff = pgoff; + if (new_reserv) + reserv_vma_set_reserv(vma, addr, len, + user_ptr_perms_from_prot(prot, (vm_flags & VM_SHARED) + ? false : true)); + else + reserv_vma_set_reserv(vma, reserv_info.reserv_start, reserv_info.reserv_len, + reserv_info.reserv_perm);
if (file) { vma->vm_file = get_file(file); @@ -3448,7 +3487,7 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma) */ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, unsigned long addr, unsigned long len, pgoff_t pgoff, - bool *need_rmap_locks) + bool *need_rmap_locks, struct reserv_struct *reserv_info) { struct vm_area_struct *vma = *vmap; unsigned long vma_start = vma->vm_start; @@ -3500,6 +3539,12 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, new_vma->vm_start = addr; new_vma->vm_end = addr + len; new_vma->vm_pgoff = pgoff; + if (reserv_info) + reserv_vma_set_reserv(new_vma, reserv_info->reserv_start, + reserv_info->reserv_len, + reserv_info->reserv_perm); + else + reserv_vma_set_reserv(new_vma, addr, len, 0); if (vma_dup_policy(vma, new_vma)) goto out_free_vma; if (anon_vma_clone(new_vma, vma)) diff --git a/mm/mremap.c b/mm/mremap.c index f014ac50d9f1..70f4031df1f4 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -651,7 +651,8 @@ static unsigned long move_vma(struct vm_area_struct *vma, unsigned long old_addr, unsigned long old_len, unsigned long new_len, unsigned long new_addr, bool *locked, unsigned long flags, - struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap) + struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap, + struct reserv_struct *reserv_info) { long to_account = new_len - old_len; struct mm_struct *mm = vma->vm_mm; @@ -705,7 +706,7 @@ static unsigned long move_vma(struct vm_area_struct *vma, vma_start_write(vma); new_pgoff = vma->vm_pgoff + ((old_addr - vma->vm_start) >> PAGE_SHIFT); new_vma = copy_vma(&vma, new_addr, new_len, new_pgoff, - &need_rmap_locks); + &need_rmap_locks, reserv_info); if (!new_vma) { if (vm_flags & VM_ACCOUNT) vm_unacct_memory(to_account >> PAGE_SHIFT); @@ -874,6 +875,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, struct vm_area_struct *vma; unsigned long ret = -EINVAL; unsigned long map_flags = 0; + struct reserv_struct reserv_info, *reserv_ptr = NULL;
if (offset_in_page(new_addr)) goto out; @@ -903,6 +905,20 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, return -ENOMEM;
if (flags & MREMAP_FIXED) { + if (reserv_is_supported(mm)) { + vma = find_vma(mm, new_addr); + if (!vma) + /* + * This error scenario may not occur as address with valid + * capability should have been verified in the upstream + * syscall functions. + */ + return -ENOMEM; +#ifdef CONFIG_CHERI_PURECAP_UABI + memcpy(&reserv_info, &vma->reserv_data, sizeof(reserv_info)); +#endif + reserv_ptr = &reserv_info; + } ret = do_munmap(mm, new_addr, new_len, uf_unmap_early); if (ret) goto out; @@ -948,7 +964,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, new_addr = ret;
ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf, - uf_unmap); + uf_unmap, reserv_ptr);
out: return ret; @@ -1169,7 +1185,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len }
ret = move_vma(vma, addr, old_len, new_len, new_addr, - &locked, flags, &uf, &uf_unmap); + &locked, flags, &uf, &uf_unmap, NULL); } out: if (offset_in_page(ret)) diff --git a/mm/util.c b/mm/util.c index afd40ed9c3c8..077c9a2592a9 100644 --- a/mm/util.c +++ b/mm/util.c @@ -540,7 +540,7 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc) } EXPORT_SYMBOL_GPL(account_locked_vm);
-user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, +user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t usrptr, unsigned long len, unsigned long prot, unsigned long flag, unsigned long pgoff) { @@ -553,11 +553,7 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - /* - * TODO [PCuABI] - might need propagating uintcap further down - * to do_mmap to properly handle capabilities - */ - ret = do_mmap(file, addr, len, prot, flag, 0, pgoff, &populate, + ret = do_mmap(file, usrptr, len, prot, flag, 0, pgoff, &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); @@ -570,7 +566,8 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, return ret; }
-unsigned long vm_mmap(struct file *file, unsigned long addr, +/* TODO [PCuABI] - Update the users of vm_mmap */ +user_uintptr_t vm_mmap(struct file *file, user_uintptr_t usrptr, unsigned long len, unsigned long prot, unsigned long flag, unsigned long offset) { @@ -579,7 +576,7 @@ unsigned long vm_mmap(struct file *file, unsigned long addr, if (unlikely(offset_in_page(offset))) return -EINVAL;
- return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT); + return vm_mmap_pgoff(file, usrptr, len, prot, flag, offset >> PAGE_SHIFT); } EXPORT_SYMBOL(vm_mmap);
On 11/03/2024 10:28, Amit Daniel Kachhap wrote:
[...]
@@ -1259,9 +1267,11 @@ unsigned long do_mmap(struct file *file, unsigned long addr, /* Obtain the address to map to. we verify (or select) it and ensure * that it represents a valid section of the address space. */
- addr = get_unmapped_area(file, addr, len, pgoff, flags);
- if (IS_ERR_VALUE(addr))
return addr;
- if (new_reserv) {
addr = get_unmapped_area(file, addr, len, pgoff, flags);
if (IS_ERR_VALUE(addr))
return addr;
- }
I see the rationale in calling get_unmapped_area() only if we need a new reservation, but to avoid duplicating code with what I am suggesting below (vm_mmap() taking unsigned long), I think it would be better to keep calling it unconditionally. In the MAP_FIXED case, get_unmapped_area() would be expected to check that the range is either wholly contained in an existing reservation, or that a new reservation could be created (see related comment on patch 6).
With this approach we shouldn't need to change do_mmap() at all, as all the reservation handling is done in mmap_region() and get_unmapped_area().
if (flags & MAP_FIXED_NOREPLACE) { if (find_vma_intersection(mm, addr, addr + len)) @@ -1383,15 +1393,19 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; }
- addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
- if (new_reserv)
user_ptr = addr;
- addr = mmap_region(file, user_ptr, len, vm_flags, pgoff, uf, prot, new_reserv); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len;
- return addr;
} -user_uintptr_t ksys_mmap_pgoff(user_uintptr_t addr, unsigned long len, +user_uintptr_t ksys_mmap_pgoff(user_uintptr_t user_ptr, unsigned long len,
I think this is a good time to (re)consider the typing of user pointers in all those syscalls and helpers. Does it really make sense to use user_uintptr_t? After all, we know for sure that this is a capability in PCuABI, unlike in ioctl() for instance. Switching to void __user * would remove the need for a lot of casting and give us stronger typing. We already extract the address explicitly by casting, so it is straightforward to make use of user_ptr_addr() instead. I don't immediately see any disadvantage with making that switch (the few extra changes are largely compensated by being able to remove most casts), but I may be missing something.
unsigned long prot, unsigned long flags, unsigned long fd, unsigned long pgoff)
{ @@ -1429,7 +1443,7 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t addr, unsigned long len, return PTR_ERR(file); }
- retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
- retval = vm_mmap_pgoff(file, user_ptr, len, prot, flags, pgoff);
out_fput: if (file) fput(file); @@ -2801,16 +2815,17 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
struct list_head *uf)
struct list_head *uf, unsigned long prot, bool new_reserv)
{ struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL;
- struct vm_area_struct *next, *prev, *merge;
- struct vm_area_struct *next, *prev, *merge, *old_vma; pgoff_t pglen = len >> PAGE_SHIFT; unsigned long charged = 0; unsigned long end = addr + len; unsigned long merge_start = addr, merge_end = end; bool writable_file_mapping = false;
- struct reserv_struct reserv_info; pgoff_t vm_pgoff; int error; VMA_ITERATOR(vmi, mm, addr);
@@ -2830,6 +2845,21 @@ unsigned long mmap_region(struct file *file, unsigned long addr, return -ENOMEM; }
- if (!new_reserv) {
With my proposed approach, we need to find out whether we are creating a reservation or not here. Since get_unmapped_area() has already checked that the new mapping is either fully inside a reservation or a new one can be created, we don't need to actually check if that's possible. We just need to find out which situation we are in here by looking up the vmas before *and* after addr, and checking if either reservation contains addr. If it does, then we're not creating a reservation, otherwise we are. (We need to check after addr as well, because it is possible that the reservation the next vma is in has a start address that is below vma->start).
Worth noting that we're not adding much overhead by doing it this way, since we need to find the existing reservation data anyway to copy it into the new vma.
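A rough sketch of that lookup, reusing reserv_vma_range_within_reserv() from this series and the existing find_vma_prev() helper (the function name and its exact placement in mmap_region() are only illustrative):

/*
 * Illustrative sketch only: decide whether [addr, addr + len) falls inside
 * an existing reservation by inspecting the vma before and the vma after
 * addr. The caller must hold the mmap lock.
 */
static struct vm_area_struct *find_reserv_vma(struct mm_struct *mm,
					       unsigned long addr,
					       unsigned long len)
{
	struct vm_area_struct *next, *prev;

	/* next is the vma containing or following addr, prev the one before */
	next = find_vma_prev(mm, addr, &prev);

	if (prev && reserv_vma_range_within_reserv(prev, addr, len))
		return prev;
	if (next && reserv_vma_range_within_reserv(next, addr, len))
		return next;

	/* No enclosing reservation: mmap_region() sets up a new one */
	return NULL;
}

mmap_region() would then copy reserv_data from the returned vma when it is non-NULL, and otherwise create a fresh reservation.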
old_vma = vma_find(&vmi, end);
if (!old_vma)
/*
* This error scenario may not occur as address with valid
* capability should have been verified in the upstream
* syscall functions.
*/
return -ENOMEM;
+#ifdef CONFIG_CHERI_PURECAP_UABI
memcpy(&reserv_info, &old_vma->reserv_data, sizeof(reserv_info));
Nit: the struct can be copied with a regular assignment (reserv_info = old_vma->reserv_data).
+#endif
vma_iter_set(&vmi, addr);
- }
[...]
@@ -903,6 +905,20 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, return -ENOMEM; if (flags & MREMAP_FIXED) {
if (reserv_is_supported(mm)) {
vma = find_vma(mm, new_addr);
if (!vma)
The same logic as in mmap_region() is necessary here, otherwise we are assuming that MREMAP_FIXED targets an existing reservation, but like mmap() that's not necessarily the case (new_addr can be null-derived).
/*
* This error scenario may not occur as address with valid
* capability should have been verified in the upstream
* syscall functions.
*/
return -ENOMEM;
+#ifdef CONFIG_CHERI_PURECAP_UABI
memcpy(&reserv_info, &vma->reserv_data, sizeof(reserv_info));
+#endif
reserv_ptr = &reserv_info;
} ret = do_munmap(mm, new_addr, new_len, uf_unmap_early); if (ret) goto out;
@@ -948,7 +964,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, new_addr = ret; ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
uf_unmap);
uf_unmap, reserv_ptr);
out: return ret; @@ -1169,7 +1185,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len } ret = move_vma(vma, addr, old_len, new_len, new_addr,
&locked, flags, &uf, &uf_unmap);
&locked, flags, &uf, &uf_unmap, NULL); }
out: if (offset_in_page(ret)) diff --git a/mm/util.c b/mm/util.c index afd40ed9c3c8..077c9a2592a9 100644 --- a/mm/util.c +++ b/mm/util.c @@ -540,7 +540,7 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc) } EXPORT_SYMBOL_GPL(account_locked_vm); -user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, +user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t usrptr, unsigned long len, unsigned long prot, unsigned long flag, unsigned long pgoff) { @@ -553,11 +553,7 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR;
/*
* TODO [PCuABI] - might need propagating uintcap further down
* to do_mmap to properly handle capabilities
*/
ret = do_mmap(file, addr, len, prot, flag, 0, pgoff, &populate,
ret = do_mmap(file, usrptr, len, prot, flag, 0, pgoff, &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf);
@@ -570,7 +566,8 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, return ret; } -unsigned long vm_mmap(struct file *file, unsigned long addr, +/* TODO [PCuABI] - Update the users of vm_mmap */ +user_uintptr_t vm_mmap(struct file *file, user_uintptr_t usrptr,
As discussed offline, I still think we're better off with all helpers (including vm_mmap_pgoff()) taking and returning unsigned long, with the exception of ksys_mmap_pgoff() as that's the syscall handler. It already does the capability checking, so I think we could make it do all the capability-related operations, including creating a new capability if necessary.
With that approach, we do have the problem of deciding whether we are creating a new reservation or not. I think we can do this in mmap_region() by inspecting existing reservations, see comment above. In this way, we are completely separating the capability handling (ksys_mmap_pgoff()) from the reservation management (mmap_region(), get_unmapped_area()).
Kevin
unsigned long len, unsigned long prot, unsigned long flag, unsigned long offset) { @@ -579,7 +576,7 @@ unsigned long vm_mmap(struct file *file, unsigned long addr, if (unlikely(offset_in_page(offset))) return -EINVAL;
- return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT);
- return vm_mmap_pgoff(file, usrptr, len, prot, flag, offset >> PAGE_SHIFT);
} EXPORT_SYMBOL(vm_mmap);
Use the recently introduced PCuABI reservation interfaces to add different parameter constraints for the mmap/munmap syscalls. The capability returned by the mmap syscall is now bounded, and its bounds are the same as the reservation range. These reservation checks do not affect the compat64 code path.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mmap.c | 41 ++++++++++++++++++++++++++++++++++++++--- mm/util.c | 3 --- 2 files changed, 38 insertions(+), 6 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c index 3b072e822f99..fd25fa7c9cda 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1401,6 +1401,13 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_ptr, ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len; + if (!IS_ERR_VALUE(addr)) { + if (new_reserv && reserv_is_supported(mm)) + user_ptr = make_user_ptr_owning(addr, len, + user_ptr_perms_from_prot(prot, + (flags & MAP_SHARED) ? false : true)); + return user_ptr; + }
return addr; } @@ -1410,7 +1417,9 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t user_ptr, unsigned long len, unsigned long fd, unsigned long pgoff) { struct file *file = NULL; - user_uintptr_t retval; + user_uintptr_t retval = -EINVAL; + ptraddr_t addr = (ptraddr_t)user_ptr; + VMA_ITERATOR(vmi, current->mm, addr);
if (!(flags & MAP_ANONYMOUS)) { audit_mmap_fd(fd, flags); @@ -1442,6 +1451,26 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t user_ptr, unsigned long len, if (IS_ERR(file)) return PTR_ERR(file); } + if (!reserv_is_supported(current->mm)) + goto skip_pcuabi_checks; + + if (user_ptr_is_valid((const void __user *)user_ptr)) { + if (!(flags & MAP_FIXED) || !check_user_ptr_owning(user_ptr, addr, len)) + goto out_fput; + if (!reserv_vmi_cap_within_reserv(&vmi, user_ptr, false)) { + retval = -ERESERVATION; + goto out_fput; + } + if (!reserv_vmi_range_mapped(&vmi, addr, len, false)) { + retval = -ENOMEM; + goto out_fput; + } + } else { + if (!user_ptr_is_same((const void __user *)user_ptr, + (const void __user *)(user_uintptr_t)addr)) + goto out_fput; + } +skip_pcuabi_checks:
retval = vm_mmap_pgoff(file, user_ptr, len, prot, flags, pgoff); out_fput: @@ -3120,9 +3149,15 @@ int vm_munmap(unsigned long start, size_t len) } EXPORT_SYMBOL(vm_munmap);
-SYSCALL_DEFINE2(munmap, user_uintptr_t, addr, size_t, len) +SYSCALL_DEFINE2(munmap, user_uintptr_t, user_ptr, size_t, len) { - addr = untagged_addr(addr); + ptraddr_t addr = untagged_addr((ptraddr_t)user_ptr); + VMA_ITERATOR(vmi, current->mm, addr); + + if (!check_user_ptr_owning(user_ptr, addr, len)) + return -EINVAL; + if (!reserv_vmi_cap_within_reserv(&vmi, user_ptr, false)) + return -ERESERVATION; return __vm_munmap(addr, len, true); }
diff --git a/mm/util.c b/mm/util.c index 077c9a2592a9..b3e8175fefc2 100644 --- a/mm/util.c +++ b/mm/util.c @@ -559,9 +559,6 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t usrptr, userfaultfd_unmap_complete(mm, &uf); if (populate) mm_populate(ret, populate); - /* TODO [PCuABI] - derive proper capability */ - if (!IS_ERR_VALUE(ret)) - ret = (user_uintptr_t)uaddr_to_user_ptr_safe((ptraddr_t)ret); } return ret; }
On 11/03/2024 10:28, Amit Daniel Kachhap wrote:
Use the recently introduced PCuABI reservation interfaces to add different parameter constraints for the mmap/munmap syscalls. The capability returned by the mmap syscall is now bounded, and its bounds are the same as the reservation range. These reservation checks do not affect the compat64 code path.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
mm/mmap.c | 41 ++++++++++++++++++++++++++++++++++++++--- mm/util.c | 3 --- 2 files changed, 38 insertions(+), 6 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c index 3b072e822f99..fd25fa7c9cda 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1401,6 +1401,13 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_ptr, ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len;
- if (!IS_ERR_VALUE(addr)) {
if (new_reserv && reserv_is_supported(mm))
user_ptr = make_user_ptr_owning(addr, len,
user_ptr_perms_from_prot(prot,
(flags & MAP_SHARED) ? false : true));
return user_ptr;
- }
return addr; } @@ -1410,7 +1417,9 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t user_ptr, unsigned long len, unsigned long fd, unsigned long pgoff) { struct file *file = NULL;
- user_uintptr_t retval;
- user_uintptr_t retval = -EINVAL;
- ptraddr_t addr = (ptraddr_t)user_ptr;
- VMA_ITERATOR(vmi, current->mm, addr);
if (!(flags & MAP_ANONYMOUS)) { audit_mmap_fd(fd, flags); @@ -1442,6 +1451,26 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t user_ptr, unsigned long len, if (IS_ERR(file)) return PTR_ERR(file); }
- if (!reserv_is_supported(current->mm))
goto skip_pcuabi_checks;
- if (user_ptr_is_valid((const void __user *)user_ptr)) {
We don't really need a separate tag check. We can directly use check_user_ptr_owning() as a "selector" (if it returns false, then we check that the capability is null-derived, like we do now in the else). This works because the error code is always -EINVAL when we don't like the capability metadata.
if (!(flags & MAP_FIXED) || !check_user_ptr_owning(user_ptr, addr, len))
goto out_fput;
if (!reserv_vmi_cap_within_reserv(&vmi, user_ptr, false)) {
retval = -ERESERVATION;
goto out_fput;
}
if (!reserv_vmi_range_mapped(&vmi, addr, len, false)) {
retval = -ENOMEM;
goto out_fput;
}
- } else {
Nit: could be an else if.
if (!user_ptr_is_same((const void __user *)user_ptr,
(const void __user *)(user_uintptr_t)addr))
goto out_fput;
- }
+skip_pcuabi_checks:
Maybe all this could be a separate function to avoid the extra goto? In fact, I suspect that function could be used by mremap too, as mremap is functionally equivalent to mmap in terms of those capability/reservation checks.
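For illustration, folding the checks into such a helper might look roughly like this (the name, signature and the locked flag are assumptions; it also applies the check_user_ptr_owning()-as-selector and else-if suggestions above):

/*
 * Illustrative sketch only: common PCuABI capability/reservation checks
 * for mmap-like syscalls, returning 0 on success or a negative error.
 */
static int check_pcuabi_map_ptr(user_uintptr_t user_ptr, ptraddr_t addr,
				size_t len, bool fixed, bool locked)
{
	VMA_ITERATOR(vmi, current->mm, addr);

	if (!reserv_is_supported(current->mm))
		return 0;

	if (check_user_ptr_owning(user_ptr, addr, len)) {
		/* Owning capability: only meaningful for fixed mappings */
		if (!fixed)
			return -EINVAL;
		if (!reserv_vmi_cap_within_reserv(&vmi, user_ptr, locked))
			return -ERESERVATION;
		if (!reserv_vmi_range_mapped(&vmi, addr, len, locked))
			return -ENOMEM;
	} else if (!user_ptr_is_same((const void __user *)user_ptr,
				     (const void __user *)(user_uintptr_t)addr)) {
		/* Not an owning capability, so it must be null-derived */
		return -EINVAL;
	}

	return 0;
}

ksys_mmap_pgoff() would call it with fixed = flags & MAP_FIXED and locked = false, and the MREMAP_FIXED path in mremap() could reuse it for new_user_ptr with locked = true.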
Kevin
retval = vm_mmap_pgoff(file, user_ptr, len, prot, flags, pgoff); out_fput: @@ -3120,9 +3149,15 @@ int vm_munmap(unsigned long start, size_t len) } EXPORT_SYMBOL(vm_munmap); -SYSCALL_DEFINE2(munmap, user_uintptr_t, addr, size_t, len) +SYSCALL_DEFINE2(munmap, user_uintptr_t, user_ptr, size_t, len) {
- addr = untagged_addr(addr);
- ptraddr_t addr = untagged_addr((ptraddr_t)user_ptr);
- VMA_ITERATOR(vmi, current->mm, addr);
- if (!check_user_ptr_owning(user_ptr, addr, len))
return -EINVAL;
- if (!reserv_vmi_cap_within_reserv(&vmi, user_ptr, false))
return -ERESERVATION; return __vm_munmap(addr, len, true);
} diff --git a/mm/util.c b/mm/util.c index 077c9a2592a9..b3e8175fefc2 100644 --- a/mm/util.c +++ b/mm/util.c @@ -559,9 +559,6 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t usrptr, userfaultfd_unmap_complete(mm, &uf); if (populate) mm_populate(ret, populate);
/* TODO [PCuABI] - derive proper capability */
if (!IS_ERR_VALUE(ret))
ret = (user_uintptr_t)uaddr_to_user_ptr_safe((ptraddr_t)ret); } return ret;
}
Use the recently introduced PCuABI reservation interfaces and add the relevant capability/reservation constraint checks on the different mremap syscall parameters.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mremap.c | 78 +++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 64 insertions(+), 14 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c index 70f4031df1f4..fb648147b5d4 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -25,6 +25,7 @@ #include <linux/uaccess.h> #include <linux/userfaultfd_k.h> #include <linux/mempolicy.h> +#include <linux/cap_addr_mgmt.h>
#include <asm/cacheflush.h> #include <asm/tlb.h> @@ -865,17 +866,20 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr, return vma; }
-static unsigned long mremap_to(unsigned long addr, unsigned long old_len, - unsigned long new_addr, unsigned long new_len, bool *locked, +static user_uintptr_t mremap_to(user_uintptr_t user_ptr, unsigned long old_len, + user_uintptr_t new_user_ptr, unsigned long new_len, bool *locked, unsigned long flags, struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap_early, struct list_head *uf_unmap) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - unsigned long ret = -EINVAL; + user_uintptr_t ret = -EINVAL; unsigned long map_flags = 0; struct reserv_struct reserv_info, *reserv_ptr = NULL; + ptraddr_t addr = (ptraddr_t)user_ptr; + ptraddr_t new_addr = (ptraddr_t)new_user_ptr; + unsigned long old_perm = 0;
if (offset_in_page(new_addr)) goto out; @@ -963,9 +967,18 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, if (!(flags & MREMAP_FIXED)) new_addr = ret;
+ old_perm = reserv_vma_reserv_perm(vma); ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf, uf_unmap, reserv_ptr);
+ if (!IS_ERR_VALUE(ret)) { + if (reserv_is_supported(mm)) { + if (!(flags & MREMAP_FIXED)) + ret = make_user_ptr_owning(new_addr, new_len, old_perm); + else + ret = new_user_ptr; + } + } out: return ret; } @@ -987,6 +1000,9 @@ static int vma_expandable(struct vm_area_struct *vma, unsigned long delta) 0, MAP_FIXED) & ~PAGE_MASK) return 0; } + if (!reserv_vma_range_within_reserv(vma, vma->vm_start, end - vma->vm_start)) + return 0; + return 1; }
@@ -997,19 +1013,22 @@ static int vma_expandable(struct vm_area_struct *vma, unsigned long delta) * MREMAP_FIXED option added 5-Dec-1999 by Benjamin LaHaise * This option implies MREMAP_MAYMOVE. */ -SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len, +SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, user_ptr, unsigned long, old_len, unsigned long, new_len, unsigned long, flags, - user_uintptr_t, new_addr) + user_uintptr_t, new_user_ptr) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; user_uintptr_t ret = -EINVAL; bool locked = false; struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX; + ptraddr_t addr = (ptraddr_t)user_ptr; + ptraddr_t new_addr = (ptraddr_t)new_user_ptr; + unsigned long old_perm = 0; LIST_HEAD(uf_unmap_early); LIST_HEAD(uf_unmap); + VMA_ITERATOR(vmi, current->mm, new_addr);
- /* @TODO [PCuABI] - capability validation */ /* * There is a deliberate asymmetry here: we strip the pointer tag * from the old address but leave the new address alone. This is @@ -1022,6 +1041,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len * information. */ addr = untagged_addr(addr); + user_ptr = (user_uintptr_t)user_ptr_set_addr((void __user *)user_ptr, addr);
if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP)) return ret; @@ -1060,6 +1080,33 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
+ if (!reserv_is_supported(current->mm)) + goto skip_pcuabi_checks; + if (!check_user_ptr_owning(user_ptr, addr, old_len ? old_len : new_len)) + goto out; + if (!reserv_vma_cap_within_reserv(vma, user_ptr)) { + ret = -ERESERVATION; + goto out; + } + if (flags & MREMAP_FIXED) { + if (!check_user_ptr_owning(new_user_ptr, new_addr, new_len)) + goto out; + if (!reserv_vmi_cap_within_reserv(&vmi, new_user_ptr, true)) { + ret = -ERESERVATION; + goto out; + } + if (!reserv_vmi_range_mapped(&vmi, new_addr, new_len, true)) { + ret = -ENOMEM; + goto out; + } + } else { + if (!user_ptr_is_same((const void __user *)new_user_ptr, + (const void __user *)(user_uintptr_t)new_addr)) + goto out; + } + old_perm = reserv_vma_reserv_perm(vma); +skip_pcuabi_checks: + if (is_vm_hugetlb_page(vma)) { struct hstate *h __maybe_unused = hstate_vma(vma);
@@ -1081,7 +1128,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len }
if (flags & (MREMAP_FIXED | MREMAP_DONTUNMAP)) { - ret = mremap_to(addr, old_len, new_addr, new_len, + ret = mremap_to(user_ptr, old_len, new_user_ptr, new_len, &locked, flags, &uf, &uf_unmap_early, &uf_unmap); goto out; @@ -1106,7 +1153,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len if (ret) goto out;
- ret = addr; + ret = user_ptr; goto out_unlocked; }
@@ -1160,7 +1207,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len locked = true; new_addr = addr; } - ret = addr; + ret = user_ptr; goto out; } } @@ -1184,8 +1231,14 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
- ret = move_vma(vma, addr, old_len, new_len, new_addr, + ret = move_vma(vma, user_ptr, old_len, new_len, new_addr, &locked, flags, &uf, &uf_unmap, NULL); + if (!IS_ERR_VALUE(ret)) { + if (reserv_is_supported(mm)) + ret = make_user_ptr_owning(new_addr, new_len, old_perm); + else + ret = (user_uintptr_t)new_addr; + } } out: if (offset_in_page(ret)) @@ -1197,8 +1250,5 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len userfaultfd_unmap_complete(mm, &uf_unmap_early); mremap_userfaultfd_complete(&uf, addr, ret, old_len); userfaultfd_unmap_complete(mm, &uf_unmap); - /* TODO [PCuABI] - derive proper capability */ - return IS_ERR_VALUE(ret) ? - ret : - (user_intptr_t)uaddr_to_user_ptr_safe((ptraddr_t)ret); + return ret; }
Use the recently introduced PCuABI reservation interfaces and add the relevant capability/reservation constraint checks on the mprotect syscall parameters.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mprotect.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c index 4dffb34f62fd..c99f795b51b8 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -32,6 +32,7 @@ #include <linux/sched/sysctl.h> #include <linux/userfaultfd_k.h> #include <linux/memory-tiers.h> +#include <linux/cap_addr_mgmt.h> #include <asm/cacheflush.h> #include <asm/mmu_context.h> #include <asm/tlbflush.h> @@ -677,7 +678,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb, /* * pkey==-1 when doing a legacy mprotect() */ -static int do_mprotect_pkey(user_uintptr_t start, size_t len, +static int do_mprotect_pkey(user_uintptr_t user_ptr, size_t len, unsigned long prot, int pkey) { unsigned long nstart, end, tmp, reqprot; @@ -688,9 +689,7 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, (prot & PROT_READ); struct mmu_gather tlb; struct vma_iterator vmi; - - /* TODO [PCuABI] - capability checks for uaccess */ - start = untagged_addr(start); + unsigned long start = untagged_addr(user_ptr);
prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP); if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */ @@ -704,6 +703,9 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, end = start + len; if (end <= start) return -ENOMEM; + + if (!check_user_ptr_owning(user_ptr, start, len)) + return -EINVAL; if (!arch_validate_prot(prot, start)) return -EINVAL;
@@ -761,6 +763,12 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, break; }
+ /* Check if the capability range is valid with mmap lock. */ + if (!reserv_vma_cap_within_reserv(vma, user_ptr)) { + error = -ERESERVATION; + break; + } + /* Does the application expect PROT_READ to imply PROT_EXEC */ if (rier && (vma->vm_flags & VM_MAYEXEC)) prot |= PROT_EXEC; @@ -825,18 +833,18 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, return error; }
-SYSCALL_DEFINE3(mprotect, user_uintptr_t, start, size_t, len, +SYSCALL_DEFINE3(mprotect, user_uintptr_t, user_ptr, size_t, len, unsigned long, prot) { - return do_mprotect_pkey(start, len, prot, -1); + return do_mprotect_pkey(user_ptr, len, prot, -1); }
#ifdef CONFIG_ARCH_HAS_PKEYS
-SYSCALL_DEFINE4(pkey_mprotect, user_uintptr_t, start, size_t, len, +SYSCALL_DEFINE4(pkey_mprotect, user_uintptr_t, user_ptr, size_t, len, unsigned long, prot, int, pkey) { - return do_mprotect_pkey(start, len, prot, pkey); + return do_mprotect_pkey(user_ptr, len, prot, pkey); }
SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
Use the recently introduced PCuABI reservation interfaces to verify the address range for the madvise syscall.
The do_madvise() function is also used by the virtual address monitoring daemon (DAMON), whose ranges may not satisfy the reservation criteria, so add a parameter to skip the reservation checks.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- include/linux/mm.h | 3 ++- io_uring/advise.c | 2 +- mm/damon/vaddr.c | 2 +- mm/madvise.c | 26 +++++++++++++++++++++----- 4 files changed, 25 insertions(+), 8 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index 44a55c3e2c06..f1c70f416eff 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3375,7 +3375,8 @@ extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, bool unlock); extern int do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf); -extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior); +extern int do_madvise(struct mm_struct *mm, user_uintptr_t user_ptr, size_t len_in, + int behavior, bool reserv_ignore);
#ifdef CONFIG_MMU extern int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, diff --git a/io_uring/advise.c b/io_uring/advise.c index 952d9289a311..2e43142cf4df 100644 --- a/io_uring/advise.c +++ b/io_uring/advise.c @@ -55,7 +55,7 @@ int io_madvise(struct io_kiocb *req, unsigned int issue_flags) WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
/* TODO [PCuABI] - capability checks for uaccess */ - ret = do_madvise(current->mm, user_ptr_addr(ma->addr), ma->len, ma->advice); + ret = do_madvise(current->mm, (user_uintptr_t)ma->addr, ma->len, ma->advice, false); io_req_set_res(req, ret, 0); return IOU_OK; #else diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index a4d1f63c5b23..3138da113117 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -643,7 +643,7 @@ static unsigned long damos_madvise(struct damon_target *target, if (!mm) return 0;
- applied = do_madvise(mm, start, len, behavior) ? 0 : len; + applied = do_madvise(mm, start, len, behavior, true) ? 0 : len; mmput(mm);
return applied; diff --git a/mm/madvise.c b/mm/madvise.c index d0c8e854636e..3bbb353f5f0b 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -31,6 +31,7 @@ #include <linux/swapops.h> #include <linux/shmem_fs.h> #include <linux/mmu_notifier.h> +#include <linux/cap_addr_mgmt.h>
#include <asm/tlb.h>
@@ -1394,13 +1395,16 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, * -EBADF - map exists, but area maps something that isn't a file. * -EAGAIN - a kernel resource was temporarily unavailable. */ -int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior) +int do_madvise(struct mm_struct *mm, user_uintptr_t user_ptr, size_t len_in, + int behavior, bool reserv_ignore) { unsigned long end; int error; int write; size_t len; struct blk_plug plug; + unsigned long start = (ptraddr_t)user_ptr; + struct vma_iterator vmi;
if (!madvise_behavior_valid(behavior)) return -EINVAL; @@ -1433,14 +1437,26 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh mmap_read_lock(mm); }
- /* TODO [PCuABI] - capability checks for uaccess */ start = untagged_addr_remote(mm, start); end = start + len;
+ if (!reserv_ignore) { + vma_iter_init(&vmi, current->mm, start); + if (!check_user_ptr_owning(user_ptr, start, len)) { + error = -EINVAL; + goto out; + } + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_vmi_cap_within_reserv(&vmi, user_ptr, true)) { + error = -ERESERVATION; + goto out; + } + } blk_start_plug(&plug); error = madvise_walk_vmas(mm, start, end, behavior, madvise_vma_behavior); blk_finish_plug(&plug); +out: if (write) mmap_write_unlock(mm); else @@ -1449,9 +1465,9 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh return error; }
-SYSCALL_DEFINE3(madvise, user_uintptr_t, start, size_t, len_in, int, behavior) +SYSCALL_DEFINE3(madvise, user_uintptr_t, user_ptr, size_t, len_in, int, behavior) { - return do_madvise(current->mm, start, len_in, behavior); + return do_madvise(current->mm, user_ptr, len_in, behavior, false); }
SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, @@ -1506,7 +1522,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
while (iov_iter_count(&iter)) { ret = do_madvise(mm, user_ptr_addr(iter_iov_addr(&iter)), - iter_iov_len(&iter), behavior); + iter_iov_len(&iter), behavior, false); if (ret < 0) break; iov_iter_advance(&iter, iter_iov_len(&iter));
Use the recently introduced PCuABI reservation interfaces to verify the address range for the mlock, mlock2, and munlock syscalls.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mlock.c | 36 +++++++++++++++++++++++++++--------- 1 file changed, 27 insertions(+), 9 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c index 086546ac5766..ecc36a698843 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -25,6 +25,7 @@ #include <linux/memcontrol.h> #include <linux/mm_inline.h> #include <linux/secretmem.h> +#include <linux/cap_addr_mgmt.h>
#include "internal.h"
@@ -621,14 +622,16 @@ static int __mlock_posix_error_return(long retval) return retval; }
-static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t flags) +static __must_check int do_mlock(user_uintptr_t user_ptr, size_t len, vm_flags_t flags) { unsigned long locked; unsigned long lock_limit; int error = -ENOMEM; + unsigned long start = untagged_addr(user_ptr); + struct vma_iterator vmi;
- start = untagged_addr(start); - + if (!check_user_ptr_owning(user_ptr, start, len)) + return -EINVAL; if (!can_do_mlock()) return -EPERM;
@@ -642,6 +645,12 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla if (mmap_write_lock_killable(current->mm)) return -EINTR;
+ vma_iter_init(&vmi, current->mm, start); + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_vmi_cap_within_reserv(&vmi, user_ptr, true)) { + mmap_write_unlock(current->mm); + return -ERESERVATION; + } locked += current->mm->locked_vm; if ((locked > lock_limit) && (!capable(CAP_IPC_LOCK))) { /* @@ -668,12 +677,12 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla return 0; }
-SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len) +SYSCALL_DEFINE2(mlock, user_uintptr_t, user_ptr, size_t, len) { - return do_mlock(start, len, VM_LOCKED); + return do_mlock(user_ptr, len, VM_LOCKED); }
-SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) +SYSCALL_DEFINE3(mlock2, user_uintptr_t, user_ptr, size_t, len, int, flags) { vm_flags_t vm_flags = VM_LOCKED;
@@ -683,20 +692,29 @@ SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) if (flags & MLOCK_ONFAULT) vm_flags |= VM_LOCKONFAULT;
- return do_mlock(start, len, vm_flags); + return do_mlock(user_ptr, len, vm_flags); }
-SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len) +SYSCALL_DEFINE2(munlock, user_uintptr_t, user_ptr, size_t, len) { int ret; + unsigned long start = untagged_addr(user_ptr); + struct vma_iterator vmi;
- start = untagged_addr(start); + if (!check_user_ptr_owning(user_ptr, start, len)) + return -EINVAL;
len = PAGE_ALIGN(len + (offset_in_page(start))); start &= PAGE_MASK;
if (mmap_write_lock_killable(current->mm)) return -EINTR; + vma_iter_init(&vmi, current->mm, start); + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_vmi_cap_within_reserv(&vmi, user_ptr, true)) { + mmap_write_unlock(current->mm); + return -ERESERVATION; + } ret = apply_vma_lock_flags(start, len, 0); mmap_write_unlock(current->mm);
Use the recently introduced PCuABI reservation interfaces to verify the address range for the msync syscall.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/msync.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/mm/msync.c b/mm/msync.c index ac4c9bfea2e7..14b1dfa06dde 100644 --- a/mm/msync.c +++ b/mm/msync.c @@ -14,6 +14,7 @@ #include <linux/file.h> #include <linux/syscalls.h> #include <linux/sched.h> +#include <linux/cap_addr_mgmt.h>
/* * MS_SYNC syncs the entire file - including mappings. @@ -29,16 +30,17 @@ * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to * applications. */ -SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) +SYSCALL_DEFINE3(msync, user_uintptr_t, user_ptr, size_t, len, int, flags) { unsigned long end; struct mm_struct *mm = current->mm; struct vm_area_struct *vma; int unmapped_error = 0; int error = -EINVAL; + unsigned long start = untagged_addr(user_ptr);
- start = untagged_addr(start); - + if (!check_user_ptr_owning(user_ptr, start, len)) + return -EINVAL; if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC)) goto out; if (offset_in_page(start)) @@ -61,6 +63,11 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) */ mmap_read_lock(mm); vma = find_vma(mm, start); + /* Check if the range exists within the reservation with mmap lock. */ + if (vma && !reserv_vma_cap_within_reserv(vma, user_ptr)) { + error = -ERESERVATION; + goto out_unlock; + } for (;;) { struct file *file; loff_t fstart, fend;
The MAP_GROWSDOWN flag is not supported by the PCuABI specification, so reject such requests with -EOPNOTSUPP.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mmap.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c index fd25fa7c9cda..c5ffa129ff72 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1453,6 +1453,15 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t user_ptr, unsigned long len, } if (!reserv_is_supported(current->mm)) goto skip_pcuabi_checks; + /* + * Introduce checks for PCuABI: + * - MAP_GROWSDOWN flag has no fixed bounds and hence is not supported + * in the PCuABI reservation model. + */ + if (flags & MAP_GROWSDOWN) { + retval = -EOPNOTSUPP; + goto out_fput; + }
if (user_ptr_is_valid((const void __user *)user_ptr)) { if (!(flags & MAP_FIXED) || !check_user_ptr_owning(user_ptr, addr, len))
The PCuABI specification restricts expanding capability permissions through the mprotect() system call. This requires capabilities to be created initially with the maximum permissions that the memory mappings may possess during their lifetime.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- include/uapi/asm-generic/mman-common.h | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 6ce1f1ceb432..e7ba511c2bad 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -17,6 +17,12 @@ #define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */ #define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */
+/* PCuABI mapping and capability permissions */ +#define _PROT_MAX_SHIFT 16 +#define PROT_MAX(prot) ((prot) << _PROT_MAX_SHIFT) +#define PROT_EXTRACT(prot) ((prot) & (PROT_READ | PROT_WRITE | PROT_EXEC)) +#define PROT_MAX_EXTRACT(prot) (((prot) >> _PROT_MAX_SHIFT) & (PROT_READ | PROT_WRITE | PROT_EXEC)) + /* 0x01 - 0x03 are defined in linux/mman.h */ #define MAP_TYPE 0x0f /* Mask for type of mapping */ #define MAP_FIXED 0x10 /* Interpret addr exactly */
The helper functions user_ptr_may_set_prot() and user_ptr_perms_from_prot() are added/modified to manage capability permissions in memory management syscalls as per the PCuABI specification.
Also, introduce the arch-specific hook arch_user_ptr_perms_from_prot() to convert arch-specific mapping protection flags to capability permissions.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- Documentation/core-api/user_ptr.rst | 1 + arch/Kconfig | 3 ++ include/linux/user_ptr.h | 16 +++++++ lib/user_ptr.c | 66 ++++++++++++++++++++++++++--- 4 files changed, 81 insertions(+), 5 deletions(-)
diff --git a/Documentation/core-api/user_ptr.rst b/Documentation/core-api/user_ptr.rst index 627bcea2a07e..a85112a5ba7b 100644 --- a/Documentation/core-api/user_ptr.rst +++ b/Documentation/core-api/user_ptr.rst @@ -358,5 +358,6 @@ implementation, such as compat64 mode. * ``check_user_ptr_owning(ptr, addr, n)`` * ``make_user_ptr_owning(addr, n, perm)`` * ``user_ptr_perms_from_prot(prot, tag_perm)`` +* ``user_ptr_may_set_prot(ptr, prot)``
See ``<linux/user_ptr.h>`` for details on how to use them. diff --git a/arch/Kconfig b/arch/Kconfig index 19f7bbb20a41..161f7002b0ab 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1502,6 +1502,9 @@ config CHERI_PURECAP_UABI availability of CHERI capabilities at compile-time; the resulting kernel image will not boot on incompatible hardware.
+config HAVE_ARCH_USER_PTR_H + bool + source "kernel/gcov/Kconfig"
source "scripts/gcc-plugins/Kconfig" diff --git a/include/linux/user_ptr.h b/include/linux/user_ptr.h index 1eb59442b06e..993635f2eda1 100644 --- a/include/linux/user_ptr.h +++ b/include/linux/user_ptr.h @@ -142,6 +142,17 @@ user_uintptr_t make_user_ptr_owning(ptraddr_t addr, size_t len, user_ptr_perms_t */ user_ptr_perms_t user_ptr_perms_from_prot(int prot, bool has_tag_access);
+/** + * user_ptr_may_set_prot() - Verify that the mapping protection flags conform + * to the capability permission flags. + * @user_ptr: User pointer. + * @prot: Memory protection flag. + * + * Return: True if the capability permissions include the protection flags, + * false otherwise. + */ +bool user_ptr_may_set_prot(user_uintptr_t user_ptr, int prot); + #else /* CONFIG_CHERI_PURECAP_UABI */
#define user_ptr_perms_t int @@ -197,6 +208,11 @@ static inline user_ptr_perms_t user_ptr_perms_from_prot(int prot, bool has_tag_a return 0; }
+static inline bool user_ptr_may_set_prot(user_uintptr_t user_ptr, int prot) +{ + return true; +} + #endif /* CONFIG_CHERI_PURECAP_UABI */
/** diff --git a/lib/user_ptr.c b/lib/user_ptr.c index f597f73191bb..6cab8f8864d8 100644 --- a/lib/user_ptr.c +++ b/lib/user_ptr.c @@ -2,9 +2,14 @@ #include <linux/bug.h> #include <linux/cap_addr_mgmt.h> #include <linux/cheri.h> +#include <linux/mman.h> #include <linux/sched.h> #include <linux/user_ptr.h>
+#ifdef CONFIG_HAVE_ARCH_USER_PTR_H +#include <asm/user_ptr.h> +#endif + void __user *uaddr_to_user_ptr(ptraddr_t addr) { /* @@ -100,10 +105,61 @@ user_uintptr_t make_user_ptr_owning(ptraddr_t addr, size_t len, user_ptr_perms_t return cheri_address_set(user_ptr, addr); }
-user_ptr_perms_t user_ptr_perms_from_prot(int prot __maybe_unused, - bool has_tag_access __maybe_unused) +static bool mapping_may_have_prot_flag(int prot, int map_val) +{ + int prot_max = PROT_MAX_EXTRACT(prot); + + if (prot_max) + return !!(prot_max & map_val); + else + return !!(prot & map_val); +} + +#ifndef arch_user_ptr_perms_from_prot +static __always_inline user_ptr_perms_t arch_user_ptr_perms_from_prot(int prot, bool has_tag_access) { - /* TODO [PCuABI] - capability permission conversion from memory permission */ - return (CHERI_PERMS_READ | CHERI_PERMS_WRITE | - CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP); + return 0; +} +#define arch_user_ptr_perms_from_prot arch_user_ptr_perms_from_prot +#endif /* arch_user_ptr_perms_from_prot */ + +user_ptr_perms_t user_ptr_perms_from_prot(int prot, bool has_tag_access) +{ + user_ptr_perms_t perms = 0; + + if (!reserv_is_supported(current->mm)) + return perms; + if (mapping_may_have_prot_flag(prot, PROT_READ)) { + perms |= CHERI_PERM_LOAD; + if (has_tag_access) + perms |= CHERI_PERM_LOAD_CAP; + } + if (mapping_may_have_prot_flag(prot, PROT_WRITE)) { + perms |= CHERI_PERM_STORE; + if (has_tag_access) + perms |= (CHERI_PERM_STORE_CAP | CHERI_PERM_STORE_LOCAL_CAP); + } + if (mapping_may_have_prot_flag(prot, PROT_EXEC)) + perms |= CHERI_PERM_EXECUTE; + + /* Fetch any extra architecture specific permissions */ + perms |= arch_user_ptr_perms_from_prot(PROT_MAX_EXTRACT(prot) ? + PROT_MAX_EXTRACT(prot) : prot, has_tag_access); + perms |= CHERI_PERMS_ROOTCAP; + + return perms; +} + +bool user_ptr_may_set_prot(user_uintptr_t user_ptr, int prot) +{ + user_ptr_perms_t perms = cheri_perms_get(user_ptr); + + if (!reserv_is_supported(current->mm)) + return true; + if (((prot & PROT_READ) && !(perms & CHERI_PERM_LOAD)) || + ((prot & PROT_WRITE) && !(perms & CHERI_PERM_STORE)) || + ((prot & PROT_EXEC) && !(perms & CHERI_PERM_EXECUTE))) + return false; + + return true; }
On 11/03/2024 10:28, Amit Daniel Kachhap wrote:
+user_ptr_perms_t user_ptr_perms_from_prot(int prot, bool has_tag_access) +{
- user_ptr_perms_t perms = 0;
- if (!reserv_is_supported(current->mm))
return perms;
- if (mapping_may_have_prot_flag(prot, PROT_READ)) {
perms |= CHERI_PERM_LOAD;
if (has_tag_access)
perms |= CHERI_PERM_LOAD_CAP;
- }
- if (mapping_may_have_prot_flag(prot, PROT_WRITE)) {
perms |= CHERI_PERM_STORE;
if (has_tag_access)
perms |= (CHERI_PERM_STORE_CAP | CHERI_PERM_STORE_LOCAL_CAP);
- }
- if (mapping_may_have_prot_flag(prot, PROT_EXEC))
perms |= CHERI_PERM_EXECUTE;
- /* Fetch any extra architecture specific permissions */
- perms |= arch_user_ptr_perms_from_prot(PROT_MAX_EXTRACT(prot) ?
PROT_MAX_EXTRACT(prot) : prot, has_tag_access);
I got confused for a moment looking at arch_user_ptr_perms_from_prot(), I thought PROT_MAX() was not being taken into account. I think doing it this way is fine, but then we might as well do the same in this function: calculate a local prot value first and then directly test PROT_{READ,WRITE,EXEC} in it, without the mapping_may_have_prot_flag() helper. This way it will look more similar to the arch_ helper and there is less risk of confusion.
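Something along these lines, perhaps (a sketch of that suggestion, reusing the helpers from this patch):

user_ptr_perms_t user_ptr_perms_from_prot(int prot, bool has_tag_access)
{
	user_ptr_perms_t perms = 0;
	/* Resolve the effective protection once, as the arch_ helper does */
	int used_prot = PROT_MAX_EXTRACT(prot) ? PROT_MAX_EXTRACT(prot) : prot;

	if (!reserv_is_supported(current->mm))
		return perms;

	if (used_prot & PROT_READ) {
		perms |= CHERI_PERM_LOAD;
		if (has_tag_access)
			perms |= CHERI_PERM_LOAD_CAP;
	}
	if (used_prot & PROT_WRITE) {
		perms |= CHERI_PERM_STORE;
		if (has_tag_access)
			perms |= CHERI_PERM_STORE_CAP | CHERI_PERM_STORE_LOCAL_CAP;
	}
	if (used_prot & PROT_EXEC)
		perms |= CHERI_PERM_EXECUTE;

	/* Fetch any extra architecture-specific permissions */
	perms |= arch_user_ptr_perms_from_prot(used_prot, has_tag_access);
	perms |= CHERI_PERMS_ROOTCAP;

	return perms;
}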
Kevin
- perms |= CHERI_PERMS_ROOTCAP;
- return perms;
+}
Add the arm64 Morello-specific implementation of the arch_user_ptr_perms_from_prot() hook to convert arch-specific memory mapping permissions to capability permissions.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/user_ptr.h | 33 +++++++++++++++++++++++++++++++ 2 files changed, 34 insertions(+) create mode 100644 arch/arm64/include/asm/user_ptr.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 83a5817afa7d..fbf4ed6c6b5b 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -31,6 +31,7 @@ config ARM64 select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_KCOV select HAVE_ARCH_CHERI_H + select HAVE_ARCH_USER_PTR_H select ARCH_HAS_KEEPINITRD select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS diff --git a/arch/arm64/include/asm/user_ptr.h b/arch/arm64/include/asm/user_ptr.h new file mode 100644 index 000000000000..dbe34885f1da --- /dev/null +++ b/arch/arm64/include/asm/user_ptr.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __ASM_USER_PTR_H +#define __ASM_USER_PTR_H + +#include <linux/cheri.h> +#include <linux/mman.h> +#include <linux/sched/task_stack.h> +#include <asm/processor.h> + +#ifdef CONFIG_CHERI_PURECAP_UABI + +static __always_inline cheri_perms_t arch_user_ptr_perms_from_prot(int prot, bool has_tag_access) +{ + struct pt_regs *regs = task_pt_regs(current); + cheri_perms_t perms = 0; + + if ((prot & PROT_READ) && has_tag_access) + perms |= ARM_CAP_PERMISSION_MUTABLE_LOAD; + + if (prot & PROT_EXEC) { + if (cheri_perms_get(regs->pcc) & CHERI_PERM_SYSTEM_REGS) + perms |= CHERI_PERM_SYSTEM_REGS; + if (cheri_perms_get(regs->pcc) & ARM_CAP_PERMISSION_EXECUTIVE) + perms |= ARM_CAP_PERMISSION_EXECUTIVE; + } + + return perms; +} +#define arch_user_ptr_perms_from_prot arch_user_ptr_perms_from_prot + +#endif /* CONFIG_CHERI_PURECAP_UABI */ + +#endif /* __ASM_USER_PTR_H */
Add a check that the requested protection bits do not exceed the maximum protection bits.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mmap.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c index c5ffa129ff72..464fa2d7b748 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1457,11 +1457,17 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t user_ptr, unsigned long len, * Introduce checks for PCuABI: * - MAP_GROWSDOWN flag has no fixed bounds and hence is not supported * in the PCuABI reservation model. + * - PCuABI reservation model introduces the concept of maximum + * protection the mappings can have. Add a check to make sure the + * requested protection does not exceed the maximum protection. */ if (flags & MAP_GROWSDOWN) { retval = -EOPNOTSUPP; goto out_fput; } + if ((PROT_MAX_EXTRACT(prot) != 0) && + ((PROT_EXTRACT(prot) & PROT_MAX_EXTRACT(prot)) != PROT_EXTRACT(prot))) + goto out_fput;
if (user_ptr_is_valid((const void __user *)user_ptr)) { if (!(flags & MAP_FIXED) || !check_user_ptr_owning(user_ptr, addr, len))
Check that the permissions of the new user address do not exceed the permissions of the old user address for the mremap syscall.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mremap.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/mm/mremap.c b/mm/mremap.c index fb648147b5d4..e3b67f55f2d8 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -1091,6 +1091,9 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, user_ptr, unsigned long, old if (flags & MREMAP_FIXED) { if (!check_user_ptr_owning(new_user_ptr, new_addr, new_len)) goto out; + if ((cheri_perms_get(user_ptr) | cheri_perms_get(new_user_ptr)) + != cheri_perms_get(user_ptr)) + goto out; if (!reserv_vmi_cap_within_reserv(&vmi, new_user_ptr, true)) { ret = -ERESERVATION; goto out;
On 11/03/2024 10:28, Amit Daniel Kachhap wrote:
Check that the permissions of the new user address do not exceed the permissions of the old user address for the mremap syscall.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
mm/mremap.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/mm/mremap.c b/mm/mremap.c index fb648147b5d4..e3b67f55f2d8 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -1091,6 +1091,9 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, user_ptr, unsigned long, old if (flags & MREMAP_FIXED) { if (!check_user_ptr_owning(new_user_ptr, new_addr, new_len)) goto out;
if ((cheri_perms_get(user_ptr) | cheri_perms_get(new_user_ptr))
!= cheri_perms_get(user_ptr))
That won't build in !CHERI. This test is pretty specific to mremap so it probably doesn't make sense to add a generic helper. We could just add a helper in this file with the appropriate #ifdef'ing (to avoid adding #ifdef's in the main function body).
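A possible shape for that helper (the name and placement are illustrative):

#ifdef CONFIG_CHERI_PURECAP_UABI
/* True if new_user_ptr's permissions are a subset of user_ptr's */
static bool mremap_perms_not_widened(user_uintptr_t user_ptr,
				     user_uintptr_t new_user_ptr)
{
	return (cheri_perms_get(user_ptr) | cheri_perms_get(new_user_ptr)) ==
	       cheri_perms_get(user_ptr);
}
#else
static bool mremap_perms_not_widened(user_uintptr_t user_ptr,
				     user_uintptr_t new_user_ptr)
{
	return true;
}
#endif

The MREMAP_FIXED branch would then simply do: if (!mremap_perms_not_widened(user_ptr, new_user_ptr)) goto out;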
Kevin
goto out; if (!reserv_vmi_cap_within_reserv(&vmi, new_user_ptr, true)) { ret = -ERESERVATION; goto out;
Check that the requested permissions match the constraints of the input user capability for the mprotect syscall.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mprotect.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/mm/mprotect.c b/mm/mprotect.c index c99f795b51b8..1e9a6b5442e8 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -706,6 +706,8 @@ static int do_mprotect_pkey(user_uintptr_t user_ptr, size_t len,
if (!check_user_ptr_owning(user_ptr, start, len)) return -EINVAL; + if (!user_ptr_may_set_prot(user_ptr, prot)) + return -EINVAL; if (!arch_validate_prot(prot, start)) return -EINVAL;
Different capability permission and bounds constraints are added as per the PCuABI specification for the mincore() syscall. mincore() does not require the VMem permission but does require at least one of the RWX memory permissions, so the standard check_user_ptr_owning() interface is not used and the permissions are verified explicitly.
Also, since mincore() allows the address range to not span whole pages, checking only a single byte at the page intersection is sufficient.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mincore.c | 46 +++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 41 insertions(+), 5 deletions(-)
diff --git a/mm/mincore.c b/mm/mincore.c
index dd164cb84ba8..23156caa01f2 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -19,6 +19,7 @@
 #include <linux/hugetlb.h>
 #include <linux/pgtable.h>

+#include <linux/cap_addr_mgmt.h>
 #include <linux/uaccess.h>

 #include "swap.h"

@@ -184,15 +185,19 @@ static const struct mm_walk_ops mincore_walk_ops = {
  * all the arguments, we hold the mmap semaphore: we should
  * just return the amount of info we're asked for.
  */
-static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *vec)
+static long do_mincore(user_uintptr_t user_ptr, unsigned long pages, unsigned char *vec)
 {
 	struct vm_area_struct *vma;
 	unsigned long end;
+	unsigned long addr = (ptraddr_t)user_ptr;
 	int err;

 	vma = vma_lookup(current->mm, addr);
 	if (!vma)
 		return -ENOMEM;
+	/* Check if the capability range is valid with mmap lock. */
+	if (!reserv_vma_cap_within_reserv(vma, user_ptr))
+		return -ERESERVATION;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
 	if (!can_do_mincore(vma)) {
 		unsigned long pages = DIV_ROUND_UP(end - addr, PAGE_SIZE);
@@ -229,14 +234,16 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
  *		mapped
  *  -EAGAIN - A kernel resource was temporarily unavailable.
  */
-SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
+SYSCALL_DEFINE3(mincore, user_uintptr_t, user_ptr, size_t, len,
 		unsigned char __user *, vec)
 {
 	long retval;
 	unsigned long pages;
 	unsigned char *tmp;
-
-	start = untagged_addr(start);
+	unsigned long start = untagged_addr((ptraddr_t)user_ptr);
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	unsigned long cap_start, cap_len;
+#endif

 	/* Check the start address: needs to be page-aligned.. */
 	if (start & ~PAGE_MASK)
@@ -253,6 +260,35 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
 	if (!access_ok(vec, pages))
 		return -EFAULT;

+#ifdef CONFIG_CHERI_PURECAP_UABI
+	if (!reserv_is_supported(current->mm))
+		goto skip_pcuabi_checks;
+	/*
+	 * mincore syscall does not need VMem permission so as to allow ordinary pages.
+	 * Also at least one of the standard memory permissions RWX will help to reject
+	 * non memory capabilities.
+	 */
+	user_ptr = cheri_address_set(user_ptr, start);
+	if (cheri_is_invalid(user_ptr) || cheri_is_sealed(user_ptr) ||
+	    !(CHERI_PERM_GLOBAL & cheri_perms_get(user_ptr)) ||
+	    !((CHERI_PERM_LOAD | CHERI_PERM_STORE | CHERI_PERM_EXECUTE)
+	      & cheri_perms_get(user_ptr)))
+		return -EINVAL;
+	/*
+	 * mincore syscall can be invoked as:
+	 * mincore(align_down(p, PAGE_SIZE), sz + (p.addr % PAGE_SIZE), vec)
+	 * Hence, the capability might not consider the increased range due to
+	 * alignment. In this scenario, check only the single byte at the page
+	 * intersection.
+	 */
+	cap_start = cheri_base_get(user_ptr);
+	cap_len = cheri_length_get(user_ptr);
+	if ((start + PAGE_SIZE <= cap_start) ||
+	    (cap_start + cap_len < start + len - offset_in_page(len)))
+		return -EINVAL;
+skip_pcuabi_checks:
+#endif
+
 	tmp = (void *) __get_free_page(GFP_USER);
 	if (!tmp)
 		return -EAGAIN;
@@ -264,7 +300,7 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
 	 * the temporary buffer size.
 	 */
 	mmap_read_lock(current->mm);
-	retval = do_mincore(start, min(pages, PAGE_SIZE), tmp);
+	retval = do_mincore(user_ptr, min(pages, PAGE_SIZE), tmp);
 	mmap_read_unlock(current->mm);
if (retval <= 0)
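For context, the calling pattern that the single-byte page-intersection check accommodates looks roughly like this from purecap userspace (illustration only, not part of the patch; the capability's bounds may not cover the page-aligned-down address):

#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>

int query_residency(void *p, size_t len, unsigned char *vec)
{
	size_t off = (uintptr_t)p & ((size_t)sysconf(_SC_PAGESIZE) - 1);

	/* The aligned-down address may lie below the capability's base;
	 * the kernel only requires the capability to cover the byte at
	 * the page intersection. */
	return mincore((char *)p - off, len + off, vec);
}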
Use the recently introduced PCuABI reservation interfaces to create the appropriate bounded capability for executable/interpreter load segments.
Signed-off-by: Amit Daniel Kachhap <amitdaniel.kachhap@arm.com>
---
 fs/binfmt_elf.c | 100 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 72 insertions(+), 28 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index c10ba610be50..1adf5789668a 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -48,6 +48,7 @@
 #include <linux/uaccess.h>
 #include <linux/rseq.h>
 #include <linux/cheri.h>
+#include <linux/cap_addr_mgmt.h>
 #include <asm/param.h>
 #include <asm/page.h>

@@ -119,15 +120,14 @@ static struct linux_binfmt elf_format = {
  * p_filesz when it ends before the page ends (e.g. bss), otherwise this
  * memory will contain the junk from the file that should not be present.
  */
-static int padzero(unsigned long address)
+static int padzero(user_uintptr_t user_ptr)
 {
 	unsigned long nbyte;

-	nbyte = ELF_PAGEOFFSET(address);
+	nbyte = ELF_PAGEOFFSET((ptraddr_t)user_ptr);
 	if (nbyte) {
 		nbyte = ELF_MIN_ALIGN - nbyte;
-		if (clear_user(make_user_ptr_for_write_uaccess(address, nbyte),
-			       nbyte))
+		if (clear_user((void __user *)user_ptr, nbyte))
 			return -EFAULT;
 	}
 	return 0;
@@ -163,6 +163,7 @@ struct elf_load_info {
 	unsigned long end_elf_rx;
 	unsigned long start_elf_rw;
 	unsigned long end_elf_rw;
+	user_uintptr_t user_ptr_elf;
 };
 static int

@@ -298,22 +299,23 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
 	NEW_AUX_ENT(AT_RSEQ_ALIGN, __alignof__(struct rseq));
 #endif
 #if defined(CONFIG_CHERI_PURECAP_UABI) && (ELF_COMPAT == 0)
-	/*
-	 * TODO [PCuABI] - Restrict bounds/perms for AT_CHERI_* entries
-	 */
 	NEW_AUX_ENT(AT_CHERI_EXEC_RW_CAP,
 		    (exec_load_info->start_elf_rw != ~0UL ?
-		     elf_uaddr_to_user_ptr(exec_load_info->start_elf_rw) :
+		     (void __user *)cheri_address_set(exec_load_info->user_ptr_elf,
+						      exec_load_info->start_elf_rw) :
 		     NULL));
 	NEW_AUX_ENT(AT_CHERI_EXEC_RX_CAP,
-		    elf_uaddr_to_user_ptr(exec_load_info->start_elf_rx));
+		    (void __user *)cheri_address_set(exec_load_info->user_ptr_elf,
+						     exec_load_info->start_elf_rx));
 	NEW_AUX_ENT(AT_CHERI_INTERP_RW_CAP,
 		    ((interp_load_addr && interp_load_info->start_elf_rw != ~0UL) ?
-		     elf_uaddr_to_user_ptr(interp_load_info->start_elf_rw) :
+		     (void __user *)cheri_address_set(interp_load_info->user_ptr_elf,
+						      interp_load_info->start_elf_rw) :
 		     NULL));
 	NEW_AUX_ENT(AT_CHERI_INTERP_RX_CAP,
 		    (interp_load_addr ?
-		     elf_uaddr_to_user_ptr(interp_load_info->start_elf_rx) :
+		     (void __user *)cheri_address_set(interp_load_info->user_ptr_elf,
+						      interp_load_info->start_elf_rx) :
 		     NULL));
 	NEW_AUX_ENT(AT_CHERI_STACK_CAP, elf_uaddr_to_user_ptr(0));
 	NEW_AUX_ENT(AT_CHERI_SEAL_CAP, cheri_user_root_seal_cap);
@@ -420,14 +422,14 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
  * into memory at "addr". (Note that p_filesz is rounded up to the
  * next page, so any extra bytes from the file must be wiped.)
  */
-static unsigned long elf_map(struct file *filep, unsigned long addr,
+static unsigned long elf_map(struct file *filep, user_uintptr_t user_ptr,
 		const struct elf_phdr *eppnt, int prot, int type,
 		unsigned long total_size)
 {
 	unsigned long map_addr;
 	unsigned long size = eppnt->p_filesz + ELF_PAGEOFFSET(eppnt->p_vaddr);
 	unsigned long off = eppnt->p_offset - ELF_PAGEOFFSET(eppnt->p_vaddr);
-	addr = ELF_PAGESTART(addr);
+	unsigned long addr = ELF_PAGESTART((ptraddr_t)user_ptr);
 	size = ELF_PAGEALIGN(size);

 	/* mmap() will return -EINVAL if given a zero size, but a
@@ -435,6 +437,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	if (!size)
 		return addr;

+	if (reserv_is_supported(current->mm))
+		user_ptr = (user_uintptr_t)user_ptr_set_addr((void __user *)user_ptr, addr);
+	else
+		user_ptr = addr;
 	/*
 	 * total_size is the size of the ELF (interpreter) image.
 	 * The _first_ mmap needs to know the full size, otherwise
@@ -445,11 +451,11 @@
 	 */
 	if (total_size) {
 		total_size = ELF_PAGEALIGN(total_size);
-		map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
-		if (!BAD_ADDR(map_addr))
+		map_addr = (ptraddr_t)vm_mmap(filep, user_ptr, total_size, prot, type, off);
+		if (!reserv_is_supported(current->mm) && !BAD_ADDR(map_addr))
 			vm_munmap(map_addr+size, total_size-size);
 	} else
-		map_addr = vm_mmap(filep, addr, size, prot, type, off);
+		map_addr = (ptraddr_t)vm_mmap(filep, user_ptr, size, prot, type, off);
 	if ((type & MAP_FIXED_NOREPLACE) &&
 	    PTR_ERR((void *)map_addr) == -EEXIST)

@@ -464,28 +470,44 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
  * into memory at "addr". Memory from "p_filesz" through "p_memsz"
  * rounded up to the next page is zeroed.
  */
-static unsigned long elf_load(struct file *filep, unsigned long addr,
-		const struct elf_phdr *eppnt, int prot, int type,
-		unsigned long total_size)
+static unsigned long elf_load(struct elf_load_info *load_info, struct file *filep,
+		unsigned long addr, const struct elf_phdr *eppnt,
+		int prot, int type, unsigned long total_size)
 {
 	unsigned long zero_start, zero_end;
 	unsigned long map_addr;
+	user_uintptr_t map_user_ptr;

+	if (reserv_is_supported(current->mm) && !total_size)
+		map_user_ptr = (user_uintptr_t)user_ptr_set_addr((void __user *)load_info->user_ptr_elf, addr);
+	else
+		map_user_ptr = addr;
 	if (eppnt->p_filesz) {
-		map_addr = elf_map(filep, addr, eppnt, prot, type, total_size);
+		map_addr = elf_map(filep, map_user_ptr, eppnt, prot, type, total_size);
 		if (BAD_ADDR(map_addr))
 			return map_addr;
+		if (reserv_is_supported(current->mm) && total_size) {
+			load_info->user_ptr_elf =
+				reserv_range_set_reserv(map_addr, ELF_PAGEALIGN(total_size),
+							user_ptr_perms_from_prot(PROT_READ | PROT_WRITE | PROT_EXEC,
+										 true), false);
+			if (IS_ERR_VALUE(load_info->user_ptr_elf))
+				return (long)load_info->user_ptr_elf;
+		}
 		if (eppnt->p_memsz > eppnt->p_filesz) {
 			zero_start = map_addr + ELF_PAGEOFFSET(eppnt->p_vaddr) +
 				eppnt->p_filesz;
 			zero_end = map_addr + ELF_PAGEOFFSET(eppnt->p_vaddr) +
 				eppnt->p_memsz;
-
+			map_user_ptr = zero_start;
+			if (reserv_is_supported(current->mm))
+				map_user_ptr = (user_uintptr_t)user_ptr_set_addr((void __user *)load_info->user_ptr_elf,
+										 zero_start);
 			/*
 			 * Zero the end of the last mapped page but ignore
 			 * any errors if the segment isn't writable.
 			 */
-			if (padzero(zero_start) && (prot & PROT_WRITE))
+			if (padzero(map_user_ptr) && (prot & PROT_WRITE))
 				return -EFAULT;
 		}
 	} else {
@@ -499,15 +521,24 @@ static unsigned long elf_load(struct file *filep, unsigned long addr,
 		 * If the header is requesting these pages to be
 		 * executable, honour that (ppc32 needs this).
 		 */
-		int error;
 		zero_start = ELF_PAGEALIGN(zero_start);
 		zero_end = ELF_PAGEALIGN(zero_end);

-		error = vm_brk_flags(zero_start, zero_end - zero_start,
+		if (!reserv_is_supported(current->mm))
+			return vm_brk_flags(zero_start, zero_end - zero_start,
 				prot & PROT_EXEC ? VM_EXEC : 0);
-		if (error)
-			map_addr = error;
+
+		if (zero_end <= zero_start)
+			return map_addr;
+		map_user_ptr = (user_uintptr_t)user_ptr_set_addr((void __user *)load_info->user_ptr_elf,
+								 zero_start);
+		map_addr = vm_mmap(0, map_user_ptr, zero_end - zero_start, prot,
+				   MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, 0);
+		if (BAD_ADDR(map_addr))
+			return (int)map_addr;
+		if (padzero(map_user_ptr))
+			map_addr = -EFAULT;
 	}
 	return map_addr;
 }
@@ -745,7 +776,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 	else if (no_base && interp_elf_ex->e_type == ET_DYN)
 		load_addr = -vaddr;

-		map_addr = elf_load(interpreter, load_addr + vaddr,
+		map_addr = elf_load(load_info, interpreter, load_addr + vaddr,
 				eppnt, elf_prot, elf_type, total_size);
 		total_size = 0;
 		error = map_addr;
@@ -1090,6 +1121,11 @@ static int load_elf_binary(struct linux_binprm *bprm)
setup_new_exec(bprm);
+#if defined(CONFIG_CHERI_PURECAP_UABI) && (ELF_COMPAT == 0)
+	set_bit(MMF_PCUABI_RESERV, &current->mm->flags);
+#else
+	clear_bit(MMF_PCUABI_RESERV, &current->mm->flags);
+#endif
 	/* Do this so that we can load the interpreter, if need be. We will
 	   change some of these later */
 	retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
@@ -1217,7 +1253,15 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		}
 	}

-		error = elf_load(bprm->file, load_bias + vaddr, elf_ppnt,
+		if (reserv_is_supported(current->mm) && first_pt_load && !total_size) {
+			total_size = total_mapping_size(elf_phdata, elf_ex->e_phnum);
+			if (!total_size) {
+				retval = -EINVAL;
+				goto out_free_dentry;
+			}
+		}
+
+		error = elf_load(&exec_load_info, bprm->file, load_bias + vaddr, elf_ppnt,
 				elf_prot, elf_flags, total_size);
 		if (BAD_ADDR(error)) {
 			retval = IS_ERR_VALUE(error) ?
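For readers of these hunks: reserv_is_supported() comes from mm/cap_addr_mgmt and is assumed here to reduce to a test of the MMF_PCUABI_RESERV flag set in load_elf_binary() above, roughly along these lines (a sketch of that assumption, not the actual helper):

/* Reservations only apply to purecap PCuABI processes. */
static inline bool reserv_is_supported(struct mm_struct *mm)
{
	return IS_ENABLED(CONFIG_CHERI_PURECAP_UABI) &&
	       test_bit(MMF_PCUABI_RESERV, &mm->flags);
}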