Hi All,
This patch series introduces the mm reservation interface to manage the owning capability of allocated addresses. It adds reservation details to the VMA structure along with various capability/reservation constraint checks. Looking for feedback regarding API names, directory structure, etc.
Details about the rules implemented can be found in the PCuABI specification [1].
Changes in v1 compared with RFC v2:
1) Added an mm-specific purecap flag instead of a vma-specific one.
2) Enabled vma merge/expansion with reservation limit checks.
3) Modified get_unmapped_area() to add overlap checks for fixed addresses.
4) Some code fixes and cleanups as suggested by Kevin.
Future work:
1) Users of vm_mmap()/vm_munmap(), i.e. filesystems, loaders, vdso and the exec stack, to be modified to preserve capability addresses. Some of this work in progress is hosted in [3] for reference.
2) Cover the remaining memory addressing syscalls.
Testing:
1) All tests by Chaitanya in the v3 selftests [2] pass.
2) Purecap/compat Busybox boot passes after adding the [WIP] patches present in [3].
The whole series can be found here [3].
[1]: https://git.morello-project.org/morello/kernel/linux/-/wikis/Morello-pure-ca...
[2]: https://git.morello-project.org/chaitanya_prakash/linux.git review/purecap_mmap_testcases_v3
[3]: https://git.morello-project.org/amitdaniel/linux.git review/purecap_mm_reservation_v1
Thanks, Amit Daniel
Amit Daniel Kachhap (21):
  uapi: errno.h: Introduce PCuABI memory reservation error
  mm: Add MMF_PCUABI_RESERVE mm flag
  mm: Add capability reservation interfaces in vma for PCuABI
  mm/cap_addr_mgmt: Add capability constraint helpers for PCuABI
  mm,fs: Use address as user_uintptr_t in generic get_unmapped_area()
  mm/mmap: Modify unmapped address space management code for PCuABI
  mm: Add PCuABI reservation details during vma operation
  mm/mmap: Add reservation constraints in mmap/munmap parameters
  mm/mremap: Add reservation constraints in mremap syscall
  mm/mprotect: Add the PCuABI reservation constraints
  mm/madvise: Add the PCuABI reservation constraints
  mm/mlock: Add the PCuABI reservation constraints
  mm/msync: Add the PCuABI reservation constraints
  uapi: mman-common.h: Macros for maximum capability permissions
  mm/cap_addr_mgmt: Add capability permission helpers for PCuABI
  mm/cap_addr_mgmt: Reduce the maximum protection check impact
  mm/mmap: Disable MAP_GROWSDOWN mapping flag for PCuABI
  mm/mmap: Add capability permission constraints for PCuABI
  mm/mremap: Add capability permission constraints for PCuABI
  mm/mprotect: Add capability permission constraints for PCuABI
  mm/mincore: Add PCuABI reservation/capability constraints
 arch/arm64/include/asm/cap_addr_mgmt.h |  22 ++
 drivers/char/mem.c                     |   2 +-
 fs/hugetlbfs/inode.c                   |   2 +-
 fs/proc/inode.c                        |   4 +-
 fs/ramfs/file-mmu.c                    |   2 +-
 include/linux/cap_addr_mgmt.h          | 268 +++++++++++++++++++++++++
 include/linux/cheri.h                  |   3 +
 include/linux/fs.h                     |   2 +-
 include/linux/huge_mm.h                |   2 +-
 include/linux/mm.h                     |  39 +++-
 include/linux/mm_types.h               |   7 +-
 include/linux/proc_fs.h                |   2 +-
 include/linux/sched/coredump.h         |   2 +
 include/linux/sched/mm.h               |   8 +-
 include/linux/shmem_fs.h               |   2 +-
 include/uapi/asm-generic/errno.h       |   2 +
 include/uapi/asm-generic/mman-common.h |   6 +
 io_uring/advise.c                      |   2 +-
 io_uring/io_uring.c                    |   2 +-
 ipc/shm.c                              |   2 +-
 kernel/fork.c                          |   3 +
 mm/Makefile                            |   2 +-
 mm/cap_addr_mgmt.c                     | 261 ++++++++++++++++++++++++
 mm/damon/vaddr.c                       |   2 +-
 mm/huge_memory.c                       |   2 +-
 mm/madvise.c                           |  29 ++-
 mm/mincore.c                           |  46 ++++-
 mm/mlock.c                             |  38 +++-
 mm/mmap.c                              | 250 ++++++++++++++++++++---
 mm/mprotect.c                          |  29 ++-
 mm/mremap.c                            |  96 +++++++--
 mm/msync.c                             |  16 +-
 mm/shmem.c                             |   4 +-
 mm/util.c                              |  12 +-
 34 files changed, 1056 insertions(+), 115 deletions(-)
 create mode 100644 arch/arm64/include/asm/cap_addr_mgmt.h
 create mode 100644 include/linux/cap_addr_mgmt.h
 create mode 100644 mm/cap_addr_mgmt.c
The PCuABI specification introduces this error code, which is used to denote any error encountered while managing memory reservations.
Signed-off-by: Amit Daniel Kachhap <amitdaniel.kachhap@arm.com>
---
 include/uapi/asm-generic/errno.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
index cf9c51ac49f9..4589a3165fe1 100644
--- a/include/uapi/asm-generic/errno.h
+++ b/include/uapi/asm-generic/errno.h
@@ -120,4 +120,6 @@
#define EHWPOISON 133 /* Memory page has hardware error */
+#define ERESERVATION	192	/* PCuABI memory reservation error */
+
 #endif
The PCuABI specification introduces memory reservations, so add a flag MMF_PCUABI_RESERVE to represent such memory mappings. As memory reservations are mm specific, this flag helps to differentiate between purecap and compat process memory mappings.
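For illustration, code that needs to distinguish purecap from compat tasks can simply test this flag on the mm. A minimal sketch (the helper name is only illustrative, not part of this patch):

	static inline bool is_pcuabi_reserve_mm(struct mm_struct *mm)
	{
		/* purecap tasks have the reservation flag set when the mm is set up */
		return test_bit(MMF_PCUABI_RESERVE, &mm->flags);
	}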
Signed-off-by: Amit Daniel Kachhap <amitdaniel.kachhap@arm.com>
---
 include/linux/sched/coredump.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 0ee96ea7a0e9..fa5b5ae5f67c 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -91,4 +91,6 @@ static inline int get_dumpable(struct mm_struct *mm)
 				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
 #define MMF_VM_MERGE_ANY	29
+
+#define MMF_PCUABI_RESERVE	30	/* PCuABI memory reservation feature */
 #endif /* _LINUX_SCHED_COREDUMP_H */
On 08/01/2024 11:23, Amit Daniel Kachhap wrote:
PCuABI specification introduces memory reservation so add a flag MMF_PCUABI_RESERVE to represent such memory mappings. As memory reservations are mm specific, this flag will help to differentiate between purecap and compat process memory mappings.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
include/linux/sched/coredump.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h index 0ee96ea7a0e9..fa5b5ae5f67c 100644 --- a/include/linux/sched/coredump.h +++ b/include/linux/sched/coredump.h @@ -91,4 +91,6 @@ static inline int get_dumpable(struct mm_struct *mm) MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK) #define MMF_VM_MERGE_ANY 29
+#define MMF_PCUABI_RESERVE 30 /* PCuABI memory reservation feature */
Reservations are abbreviated as "reserv" everywhere else AFAICT, so probably MMF_PCUABI_RESERV without the E?
Kevin
#endif /* _LINUX_SCHED_COREDUMP_H */
PCuABI needs the address space reservation interfaces to manage the owning capability of the allocated addresses. This interface prevents two unrelated owning capabilities created by the kernel from overlapping.
The reservation interface stores the ranges of different virtual addresses as reservation entries, which are the same as the bounds of the capability provided by the kernel to userspace. It also stores the owning capability permissions to manage future syscall requests for updating permissions.
A few basic rules are followed by the reservation interfaces:
- Reservations can only be created or destroyed; they are never expanded or shrunk. Reservations are created when a new memory mapping is made outside of an existing reservation.
- A single reservation can have many mappings. However, an unused region of a reservation cannot be re-used again.
- A reservation's start address is aligned to the CHERI representable base (see the sketch below).
- A reservation's length is aligned to the CHERI representable length.
More rules about the address space reservation interface can be found in the PCuABI specification.
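As a rough illustration of the last two rules, the reservation bounds for a page-aligned request [start, start + len) are derived along these lines (sketch only; the exact helpers are introduced in this patch):

	/* the reservation may be wider than the mapping itself */
	ptraddr_t reserv_start = start & cheri_representable_alignment_mask(len);
	size_t reserv_len = cheri_representable_length(len);

The unused padding this creates at either end belongs to the reservation and is never handed out to an unrelated mapping.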
This commit introduces the APIs reserv_vma_set_reserv(), reserv_range_set_reserv(), reserv_vmi_range_mapped(), reserv_vmi_valid_capability(), reserv_vma_valid_capability(), reserv_vma_valid_address(), reserv_vma_same_reserv() and reserv_vma_is_supported(). Except for reserv_range_set_reserv(), all of them operate on a single vma. All of the above interfaces will be used in different memory management syscalls in subsequent patches.
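The expected usage in a syscall path looks roughly like this (simplified sketch, error handling omitted; the real call sites come in later patches):

	/* when a new mapping creates a reservation (e.g. on the mmap() path) */
	reserv_vma_set_reserv(vma, addr, len, perms);

	/* when a syscall receives an owning capability for an existing mapping */
	if (!reserv_vma_valid_capability(vma, user_ptr))
		return -ERESERVATION;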
Signed-off-by: Amit Daniel Kachhap <amitdaniel.kachhap@arm.com>
---
 include/linux/cap_addr_mgmt.h | 196 ++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h      |   5 +
 mm/Makefile                   |   2 +-
 mm/cap_addr_mgmt.c            | 176 ++++++++++++++++++++++++++++++
 4 files changed, 378 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/cap_addr_mgmt.h
 create mode 100644 mm/cap_addr_mgmt.c
diff --git a/include/linux/cap_addr_mgmt.h b/include/linux/cap_addr_mgmt.h new file mode 100644 index 000000000000..2f296f02c3ff --- /dev/null +++ b/include/linux/cap_addr_mgmt.h @@ -0,0 +1,196 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _LINUX_CAP_ADDR_MGMT_H +#define _LINUX_CAP_ADDR_MGMT_H + +#include <linux/cheri.h> +#include <linux/init.h> +#include <linux/list.h> +#include <linux/mm_types.h> +#include <linux/sched/coredump.h> +#include <linux/types.h> + +#ifdef CONFIG_CHERI_PURECAP_UABI +#define CHERI_REPRESENTABLE_ALIGNMENT(len) \ + (~cheri_representable_alignment_mask(len) + 1) + +#define CHERI_REPRESENTABLE_BASE(base, len) \ + ((base) & cheri_representable_alignment_mask(len)) + +/** + * reserv_vma_set_reserv() - Sets the reservation details in the VMA for the + * virtual address range from start to (start + len) with perm permission as + * the entry. The start address are stored as CHERI representable base and the + * length as CHERI representable length. They are expected to not interfere + * with the successive vma. This function should be called with mmap_lock + * held. + * @vma: VMA pointer to insert the reservation entry. + * @start: Reservation start value. + * @len: Reservation length. + * @perm: Capability permission for the reserved range. + * + * Return: 0 if reservation entry added successfully or -ERESERVATION + * otherwise. + */ +int reserv_vma_set_reserv(struct vm_area_struct *vma, ptraddr_t start, + size_t len, cheri_perms_t perm); + +/** + * reserv_range_set_reserv() - Sets the reservation details across the VMA's + * for the virtual address range from start to (start + len) with the perm + * permission as the entry. The start address is expected to be CHERI + * representable base and the length to be CHERI representable length. + * This function internally uses mmap_lock to synchronize the vma updates + * if mmap_lock not already held. + * @start: Reservation start value. + * @len: Reservation length. + * @perm: Capability permission for the reserved range. + * @locked: Flag to indicate if mmap_lock is already held. + * + * Return: valid capability with bounded range and requested permission or + * negative errorcode otherwise. + */ +user_uintptr_t reserv_range_set_reserv(ptraddr_t start, size_t len, + cheri_perms_t perm, bool locked); + +/** + * reserv_vmi_range_mapped_unlocked() - Searches the reservation interface for + * the virtual address range from start to (start + len). This is useful to + * find if the requested range maps completely and there is no fragmentation. + * This function internally uses mmap_lock to synchronize the vma updates + * if mmap_lock not already held. + * @vmi: The vma iterator pointing at the vma. + * @start: Virtual address start value. + * @len: Virtual address length. + * @locked: Flag to indicate if mmap_lock is already held. + * + * Return: 0 if the vma mapping matches fully with the given range or negative + * errorcode otherwise. + */ +int reserv_vmi_range_mapped(struct vma_iterator *vmi, ptraddr_t start, + size_t len, bool locked); + +/** + * reserv_vmi_valid_capability() - Search and matches the reservation interface + * for the capability bound values falling within the reserved virtual address + * range. This function internally uses mmap_lock to synchronize the vma + * updates if mmap_lock not already held. + * @vmi: The vma iterator pointing at the vma. + * @cap: Reservation capability value. + * @locked: Flag to indicate if mmap_lock is already held. 
+ * + * Return: True if the input capability bound values within the reserved virtual + * address range or false otherwise. + */ +bool reserv_vmi_valid_capability(struct vma_iterator *vmi, user_uintptr_t cap, + bool locked); + +/** + * reserv_vma_valid_capability() - Search and matches the input vma for the + * capability bound values falling within the reserved virtual address range. + * This function should be called with mmap_lock held. + * @vma: The vma pointer. + * @cap: Reservation capability value. + * + * Return: True if the input capability bound values within the reserved virtual + * address range or false otherwise. + */ +bool reserv_vma_valid_capability(struct vm_area_struct *vma, user_uintptr_t cap); + +/** + * reserv_vma_valid_address() - Search and matches the input vma for the input + * address range falling within the reserved virtual address range. This + * function should be called with mmap_lock held. + * @vma: The vma pointer. + * @start: Virtual address start value. + * @len: Virtual address length. + * + * Return: True if the input address range within the reserved virtual address + * range or false otherwise. + */ +bool reserv_vma_valid_address(struct vm_area_struct *vma, ptraddr_t start, size_t len); + +/** + * reserv_vma_same_reserv() - Compares the reservation properties between + * two vma's. This function should be called with mmap_lock held. + * @vma1: The first vma pointer. + * @vma2: The second vma pointer. + * + * Return: True if two vma's have the same reservation range or false otherwise. + */ +bool reserv_vma_same_reserv(struct vm_area_struct *vma1, struct vm_area_struct *vma2); + +/** + * reserv_vma_is_supported() - Checks if the reservation properties exists + * for the vma. This function should be called with mmap_lock held. + * @vma: The vma pointer. + * + * Return: True if vma has the reservation property set or false otherwise. + */ +bool reserv_vma_is_supported(struct vm_area_struct *vma); + +/** + * reserv_fork() - Checks and copies the MMF_PCUABI_RESERVE bit in the new + * mm during fork. + * @mm: New mm pointer. + * @oldmm: Old mm pointer. + * + * Return: None. 
+ */ +static inline void reserv_fork(struct mm_struct *mm, struct mm_struct *oldmm) +{ + if (test_bit(MMF_PCUABI_RESERVE, &oldmm->flags)) + set_bit(MMF_PCUABI_RESERVE, &mm->flags); +} + +#else /* CONFIG_CHERI_PURECAP_UABI */ + +static inline int reserv_vma_set_reserv(struct vm_area_struct *vma, ptraddr_t start, + size_t len, cheri_perms_t perm) +{ + return 0; +} + +static inline user_uintptr_t reserv_range_set_reserv(ptraddr_t start, size_t len, + cheri_perms_t perm, bool locked) +{ + return (user_uintptr_t)start; +} + +static inline int reserv_vmi_range_mapped(struct vma_iterator *vmi, ptraddr_t start, + size_t len, bool locked) +{ + return 0; +} + +static inline bool reserv_vmi_valid_capability(struct vma_iterator *vmi, user_uintptr_t cap, + bool locked) +{ + return true; +} + +static inline bool reserv_vma_valid_capability(struct vm_area_struct *vma, user_uintptr_t cap) +{ + return true; +} + +static inline bool reserv_vma_valid_address(struct vm_area_struct *vma, ptraddr_t addr, size_t len) +{ + return true; +} + +static inline bool reserv_vma_same_reserv(struct vm_area_struct *vma1, struct vm_area_struct *vma2) +{ + return true; +} + +static inline bool reserv_vma_is_supported(struct vm_area_struct *vma) +{ + return false; +} + +static inline void reserv_fork(struct mm_struct *mm, struct mm_struct *oldmm) {} + +#endif /* CONFIG_CHERI_PURECAP_UABI */ + +#endif /* _LINUX_CAP_ADDR_MGMT_H */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 12e87f83287d..38bad6b201ca 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -571,6 +571,11 @@ struct vm_area_struct { struct vma_numab_state *numab_state; /* NUMA Balancing state */ #endif struct vm_userfaultfd_ctx vm_userfaultfd_ctx; +#ifdef CONFIG_CHERI_PURECAP_UABI + unsigned long reserv_start; + unsigned long reserv_len; + unsigned long reserv_perm; +#endif } __randomize_layout;
#ifdef CONFIG_SCHED_MM_CID diff --git a/mm/Makefile b/mm/Makefile index e29afc890cde..c2ca31fe5798 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -39,7 +39,7 @@ mmu-y := nommu.o mmu-$(CONFIG_MMU) := highmem.o memory.o mincore.o \ mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \ msync.o page_vma_mapped.o pagewalk.o \ - pgtable-generic.o rmap.o vmalloc.o + pgtable-generic.o rmap.o vmalloc.o cap_addr_mgmt.o
ifdef CONFIG_CROSS_MEMORY_ATTACH diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c new file mode 100644 index 000000000000..8b5f98d0d01c --- /dev/null +++ b/mm/cap_addr_mgmt.c @@ -0,0 +1,176 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/bug.h> +#include <linux/cap_addr_mgmt.h> +#include <linux/cheri.h> +#include <linux/mm.h> +#include <linux/slab.h> + +#ifdef CONFIG_CHERI_PURECAP_UABI + +int reserv_vma_set_reserv(struct vm_area_struct *vma, ptraddr_t start, + size_t len, cheri_perms_t perm) +{ + if (!reserv_vma_is_supported(vma)) + return 0; + /* Reservation base/length is expected as page aligned */ + VM_BUG_ON(start & ~PAGE_MASK); + VM_BUG_ON(len & ~PAGE_MASK); + + vma->reserv_start = CHERI_REPRESENTABLE_BASE(start, len); + vma->reserv_len = cheri_representable_length(len); + if (perm) + vma->reserv_perm = perm; + + return 0; +} + +user_uintptr_t reserv_range_set_reserv(ptraddr_t start, size_t len, cheri_perms_t perm, bool locked) +{ + struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; + ptraddr_t end = start + len; + user_uintptr_t ret = 0; + VMA_ITERATOR(vmi, mm, start); + + if (!test_bit(MMF_PCUABI_RESERVE, &mm->flags)) + return start; + + if (end < start) + return -EINVAL; + if (end == start) + return 0; + + /* Check if the reservation range is representable and throw error if not */ + if (start & ~cheri_representable_alignment_mask(len) || + len != cheri_representable_length(len) || + start & ~PAGE_MASK || len % PAGE_SIZE) { + printk(KERN_WARNING "Reservation range (0x%lx)-(0x%lx) is not representable\n", + start, start + len - 1); + return -ERESERVATION; + } + if (!locked && mmap_write_lock_killable(mm)) + return -EINTR; + + for_each_vma_range(vmi, vma, end) { + WRITE_ONCE(vma->reserv_start, start); + WRITE_ONCE(vma->reserv_len, len); + WRITE_ONCE(vma->reserv_perm, perm); + } + if (!locked) + mmap_write_unlock(current->mm); + ret = (user_uintptr_t)uaddr_to_user_ptr_safe(start); + + return ret; +} + +int reserv_vmi_range_mapped(struct vma_iterator *vmi, ptraddr_t start, + size_t len, bool locked) +{ + struct vm_area_struct *vma; + struct mm_struct *mm = current->mm; + int ret = -ENOMEM; + + if (!test_bit(MMF_PCUABI_RESERVE, &mm->flags)) + return 0; + + if (!locked && mmap_read_lock_killable(mm)) + return -EINTR; + + start = round_down(start, PAGE_SIZE); + len = round_up(len, PAGE_SIZE); + mas_set_range(&vmi->mas, start, start); + /* Try walking the given range */ + vma = mas_find(&vmi->mas, start + len - 1); + if (!vma) + goto out; + + /* If the range is fully mapped then no gap exists */ + if (mas_empty_area(&vmi->mas, start, start + len - 1, 1)) + goto out; +out: + if (!locked) + mmap_read_unlock(mm); + return ret; +} + +bool reserv_vmi_valid_capability(struct vma_iterator *vmi, user_uintptr_t cap, bool locked) +{ + struct vm_area_struct *vma; + struct mm_struct *mm = current->mm; + ptraddr_t cap_start = cheri_base_get(cap); + ptraddr_t cap_end = cap_start + cheri_length_get(cap); + bool ret = false; + + if (!test_bit(MMF_PCUABI_RESERVE, &mm->flags)) + return true; + + if (!locked && mmap_read_lock_killable(mm)) + return false; + + /* Check if there is match with the existing reservations */ + vma = mas_find(&vmi->mas, cap_end); + if (!vma) { + ret = true; + goto out; + } + + if (vma->reserv_start <= cap_start && + vma->reserv_len >= cheri_length_get(cap)) + ret = true; +out: + if (!locked) + mmap_read_unlock(mm); + + return ret; +} + +bool reserv_vma_valid_capability(struct vm_area_struct *vma, user_uintptr_t cap) +{ + if 
(!reserv_vma_is_supported(vma)) + return true; + + /* Check if there is match with the existing reservations */ + if (vma->reserv_start <= cheri_base_get(cap) && + vma->reserv_len >= cheri_length_get(cap)) + return true; + + return false; +} + +bool reserv_vma_valid_address(struct vm_area_struct *vma, ptraddr_t start, size_t len) +{ + if (!reserv_vma_is_supported(vma)) + return true; + + /* Check if there is match with the existing reservations */ + if (vma->reserv_start <= start && vma->reserv_len >= len) + return true; + + return false; +} + +bool reserv_vma_same_reserv(struct vm_area_struct *vma1, struct vm_area_struct *vma2) +{ + if (!vma1 || !vma2 || !reserv_vma_is_supported(vma1) || !reserv_vma_is_supported(vma1)) + return true; + + /* Match reservation properties betwen 2 vma's */ + if (reserv_vma_is_supported(vma1) && reserv_vma_is_supported(vma1) + && vma1->reserv_start == vma2->reserv_start + && vma1->reserv_len == vma2->reserv_len) + return true; + + return false; +} + +bool reserv_vma_is_supported(struct vm_area_struct *vma) +{ + if (vma && vma->vm_mm && test_bit(MMF_PCUABI_RESERVE, + &vma->vm_mm->flags)) + return true; + + return false; +} + +#endif /* CONFIG_CHERI_PURECAP_UABI */
On 08/01/2024 11:23, Amit Daniel Kachhap wrote:
[...]
+#ifdef CONFIG_CHERI_PURECAP_UABI +#define CHERI_REPRESENTABLE_ALIGNMENT(len) \
- (~cheri_representable_alignment_mask(len) + 1)
+#define CHERI_REPRESENTABLE_BASE(base, len) \
- ((base) & cheri_representable_alignment_mask(len))
To be generic, we could rename those to reserv_representable_* and add one for the length too, with trivial !PCuABI implementations.
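For illustration, the generic variants could look something like this (untested sketch):

	#ifdef CONFIG_CHERI_PURECAP_UABI
	#define reserv_representable_alignment(len) \
		(~cheri_representable_alignment_mask(len) + 1)
	#define reserv_representable_base(base, len) \
		((base) & cheri_representable_alignment_mask(len))
	#define reserv_representable_length(len) \
		cheri_representable_length(len)
	#else
	#define reserv_representable_alignment(len)	0
	#define reserv_representable_base(base, len)	(base)
	#define reserv_representable_length(len)	(len)
	#endif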
[...]
+/**
- reserv_vmi_valid_capability() - Search and matches the reservation interface
- for the capability bound values falling within the reserved virtual address
- range. This function internally uses mmap_lock to synchronize the vma
- updates if mmap_lock not already held.
- @vmi: The vma iterator pointing at the vma.
- @cap: Reservation capability value.
- @locked: Flag to indicate if mmap_lock is already held.
- Return: True if the input capability bound values within the reserved virtual
address range or false otherwise.
- */
+bool reserv_vmi_valid_capability(struct vma_iterator *vmi, user_uintptr_t cap,
bool locked);
+/**
- reserv_vma_valid_capability() - Search and matches the input vma for the
- capability bound values falling within the reserved virtual address range.
- This function should be called with mmap_lock held.
- @vma: The vma pointer.
- @cap: Reservation capability value.
- Return: True if the input capability bound values within the reserved virtual
address range or false otherwise.
- */
+bool reserv_vma_valid_capability(struct vm_area_struct *vma, user_uintptr_t cap);
+/**
- reserv_vma_valid_address() - Search and matches the input vma for the input
- address range falling within the reserved virtual address range. This
- function should be called with mmap_lock held.
- @vma: The vma pointer.
- @start: Virtual address start value.
- @len: Virtual address length.
- Return: True if the input address range within the reserved virtual address
range or false otherwise.
- */
+bool reserv_vma_valid_address(struct vm_area_struct *vma, ptraddr_t start, size_t len);
It's pretty unclear what "valid" means in these three functions. We could maybe say "reserv_*_cap_within_reserv" and "reserv_vma_range_within_reserv", since they only check that the capability bounds / range are within some reservation.
[...]
@@ -571,6 +571,11 @@ struct vm_area_struct { struct vma_numab_state *numab_state; /* NUMA Balancing state */ #endif struct vm_userfaultfd_ctx vm_userfaultfd_ctx; +#ifdef CONFIG_CHERI_PURECAP_UABI
- unsigned long reserv_start;
Should be ptraddr_t too.
- unsigned long reserv_len;
size_t is a bit nicer for lengths, though it's not used very consistently in the mm code.
- unsigned long reserv_perm;
cheri_perms_t I suppose? Shouldn't be a problem as this is #ifdef'd.
+#endif } __randomize_layout; [...]
+bool reserv_vmi_valid_capability(struct vma_iterator *vmi, user_uintptr_t cap, bool locked) +{
- struct vm_area_struct *vma;
- struct mm_struct *mm = current->mm;
- ptraddr_t cap_start = cheri_base_get(cap);
- ptraddr_t cap_end = cap_start + cheri_length_get(cap);
- bool ret = false;
- if (!test_bit(MMF_PCUABI_RESERVE, &mm->flags))
return true;
- if (!locked && mmap_read_lock_killable(mm))
return false;
- /* Check if there is match with the existing reservations */
- vma = mas_find(&vmi->mas, cap_end);
- if (!vma) {
ret = true;
Why do we return true if there is no reservation in this range?
goto out;
- }
- if (vma->reserv_start <= cap_start &&
vma->reserv_len >= cheri_length_get(cap))
ret = true;
+out:
- if (!locked)
mmap_read_unlock(mm);
- return ret;
+}
+bool reserv_vma_valid_capability(struct vm_area_struct *vma, user_uintptr_t cap) +{
- if (!reserv_vma_is_supported(vma))
return true;
- /* Check if there is match with the existing reservations */
- if (vma->reserv_start <= cheri_base_get(cap) &&
vma->reserv_len >= cheri_length_get(cap))
return true;
- return false;
+}
+bool reserv_vma_valid_address(struct vm_area_struct *vma, ptraddr_t start, size_t len) +{
- if (!reserv_vma_is_supported(vma))
return true;
- /* Check if there is match with the existing reservations */
- if (vma->reserv_start <= start && vma->reserv_len >= len)
return true;
- return false;
+}
+bool reserv_vma_same_reserv(struct vm_area_struct *vma1, struct vm_area_struct *vma2) +{
- if (!vma1 || !vma2 || !reserv_vma_is_supported(vma1) || !reserv_vma_is_supported(vma1))
return true;
- /* Match reservation properties betwen 2 vma's */
- if (reserv_vma_is_supported(vma1) && reserv_vma_is_supported(vma1)
&& vma1->reserv_start == vma2->reserv_start
&& vma1->reserv_len == vma2->reserv_len)
return true;
- return false;
+}
+bool reserv_vma_is_supported(struct vm_area_struct *vma)
I don't think this should be part of the API, as reservations are not enabled per vma. In fact, considering that half the functions (e.g. reserv_range_set_reserv) simply check current->mm->flags, we might as well just do that for all of them, in which case we'd just replace this function with one that takes a struct mm_struct * as argument, as I suggested in patch 6.
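In other words, something like this (name purely illustrative):

	#ifdef CONFIG_CHERI_PURECAP_UABI
	static inline bool reserv_is_supported(struct mm_struct *mm)
	{
		return test_bit(MMF_PCUABI_RESERVE, &mm->flags);
	}
	#else
	static inline bool reserv_is_supported(struct mm_struct *mm)
	{
		return false;
	}
	#endif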
A wider question that applies to the whole series is how we want to approach checking mm->flags. Right now we're often testing it both in the helper and in the caller, which is unnecessary. I can see pros and cons for both approaches, and I think we could do it either way, but we should choose one of them. Doing the checking in the helpers probably implies adding more helpers to avoid conditional code in syscall handlers, but that could be quite elegant.
Kevin
+{
- if (vma && vma->vm_mm && test_bit(MMF_PCUABI_RESERVE,
&vma->vm_mm->flags))
return true;
- return false;
+}
+#endif /* CONFIG_CHERI_PURECAP_UABI */
Hi,
On 1/15/24 14:05, Kevin Brodsky wrote:
On 08/01/2024 11:23, Amit Daniel Kachhap wrote:
[...]
+#ifdef CONFIG_CHERI_PURECAP_UABI +#define CHERI_REPRESENTABLE_ALIGNMENT(len) \
- (~cheri_representable_alignment_mask(len) + 1)
+#define CHERI_REPRESENTABLE_BASE(base, len) \
- ((base) & cheri_representable_alignment_mask(len))
To be generic, we could rename those to reserv_representable_* and add one for the length too, with trivial !PCuABI implementations.
Yes makes sense.
[...]
+/**
- reserv_vmi_valid_capability() - Search and matches the reservation interface
- for the capability bound values falling within the reserved virtual address
- range. This function internally uses mmap_lock to synchronize the vma
- updates if mmap_lock not already held.
- @vmi: The vma iterator pointing at the vma.
- @cap: Reservation capability value.
- @locked: Flag to indicate if mmap_lock is already held.
- Return: True if the input capability bound values within the reserved virtual
address range or false otherwise.
- */
+bool reserv_vmi_valid_capability(struct vma_iterator *vmi, user_uintptr_t cap,
bool locked);
+/**
- reserv_vma_valid_capability() - Search and matches the input vma for the
- capability bound values falling within the reserved virtual address range.
- This function should be called with mmap_lock held.
- @vma: The vma pointer.
- @cap: Reservation capability value.
- Return: True if the input capability bound values within the reserved virtual
address range or false otherwise.
- */
+bool reserv_vma_valid_capability(struct vm_area_struct *vma, user_uintptr_t cap);
+/**
- reserv_vma_valid_address() - Search and matches the input vma for the input
- address range falling within the reserved virtual address range. This
- function should be called with mmap_lock held.
- @vma: The vma pointer.
- @start: Virtual address start value.
- @len: Virtual address length.
- Return: True if the input address range within the reserved virtual address
range or false otherwise.
- */
+bool reserv_vma_valid_address(struct vm_area_struct *vma, ptraddr_t start, size_t len);
It's pretty unclear what "valid" means in these three functions. We could maybe say "reserv_*_cap_within_reserv" and "reserv_vma_range_within_reserv", since they only check that the capability bounds / range are within some reservation.
yes *_within_reserv() sounds better.
[...]
@@ -571,6 +571,11 @@ struct vm_area_struct { struct vma_numab_state *numab_state; /* NUMA Balancing state */ #endif struct vm_userfaultfd_ctx vm_userfaultfd_ctx; +#ifdef CONFIG_CHERI_PURECAP_UABI
- unsigned long reserv_start;
Should be ptraddr_t too.
- unsigned long reserv_len;
size_t is a bit nicer for lengths, though it's not used very consistently in the mm code.
- unsigned long reserv_perm;
cheri_perms_t I suppose? Shouldn't be a problem as this is #ifdef'd.
+#endif } __randomize_layout;
Ok. I will update all 3 reserv members.
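Presumably ending up as something like (sketch):

	#ifdef CONFIG_CHERI_PURECAP_UABI
	ptraddr_t reserv_start;
	size_t reserv_len;
	cheri_perms_t reserv_perm;
	#endif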
[...]
+bool reserv_vmi_valid_capability(struct vma_iterator *vmi, user_uintptr_t cap, bool locked) +{
- struct vm_area_struct *vma;
- struct mm_struct *mm = current->mm;
- ptraddr_t cap_start = cheri_base_get(cap);
- ptraddr_t cap_end = cap_start + cheri_length_get(cap);
- bool ret = false;
- if (!test_bit(MMF_PCUABI_RESERVE, &mm->flags))
return true;
- if (!locked && mmap_read_lock_killable(mm))
return false;
- /* Check if there is match with the existing reservations */
- vma = mas_find(&vmi->mas, cap_end);
- if (!vma) {
ret = true;
Why do we return true if there is no reservation in this range?
Well, this is somewhat inspired by munmap() behaviour. If munmap() is called for a non-existing vma then it does not return an error. So I tried to have the same behaviour and not return the -ERESERVATION error.
Besides, I need some clarification regarding this interface. Currently this interface only checks the reservation for the first vma and not all vmas in the given range. As the reservation fields remain constant, I left it like this for the performance benefit. Let me know your opinion on this.
goto out;
- }
- if (vma->reserv_start <= cap_start &&
vma->reserv_len >= cheri_length_get(cap))
ret = true;
+out:
- if (!locked)
mmap_read_unlock(mm);
- return ret;
+}
+bool reserv_vma_valid_capability(struct vm_area_struct *vma, user_uintptr_t cap) +{
- if (!reserv_vma_is_supported(vma))
return true;
- /* Check if there is match with the existing reservations */
- if (vma->reserv_start <= cheri_base_get(cap) &&
vma->reserv_len >= cheri_length_get(cap))
return true;
- return false;
+}
+bool reserv_vma_valid_address(struct vm_area_struct *vma, ptraddr_t start, size_t len) +{
- if (!reserv_vma_is_supported(vma))
return true;
- /* Check if there is match with the existing reservations */
- if (vma->reserv_start <= start && vma->reserv_len >= len)
return true;
- return false;
+}
+bool reserv_vma_same_reserv(struct vm_area_struct *vma1, struct vm_area_struct *vma2) +{
- if (!vma1 || !vma2 || !reserv_vma_is_supported(vma1) || !reserv_vma_is_supported(vma1))
return true;
- /* Match reservation properties betwen 2 vma's */
- if (reserv_vma_is_supported(vma1) && reserv_vma_is_supported(vma1)
&& vma1->reserv_start == vma2->reserv_start
&& vma1->reserv_len == vma2->reserv_len)
return true;
- return false;
+}
+bool reserv_vma_is_supported(struct vm_area_struct *vma)
I don't think this should be part of the API, as reservations are not enabled per vma. In fact, considering that half the functions (e.g. reserv_range_set_reserv) simply check current->mm->flags, we might as well just do that for all of them, in which case we'd just replace this function with one that takes a struct mm_struct * as argument, as I suggested in patch 6.
I agree with your suggestion that this helper should take mm as input.
A wider question that applies to the whole series is how we want to approach checking mm->flags. Right now we're often testing it both in the helper and in the caller, which is unnecessary. I can see pros and cons for both approaches, and I think we could do it either way, but we should choose one of them. Doing the checking in the helpers probably implies adding more helpers to avoid conditional code in syscall handlers, but that could be quite elegant.
Yes, there are some repeated mm->flags checks. They can be moved to the helpers only. However, get_unmapped_area{*} is the place where mm->flags checks will still be required. I will look more into it.
Thanks, Amit
Kevin
+{
- if (vma && vma->vm_mm && test_bit(MMF_PCUABI_RESERVE,
&vma->vm_mm->flags))
return true;
- return false;
+}
+#endif /* CONFIG_CHERI_PURECAP_UABI */
On 22/01/2024 10:00, Amit Daniel Kachhap wrote:
+bool reserv_vmi_valid_capability(struct vma_iterator *vmi, user_uintptr_t cap, bool locked)
+{
+	struct vm_area_struct *vma;
+	struct mm_struct *mm = current->mm;
+	ptraddr_t cap_start = cheri_base_get(cap);
+	ptraddr_t cap_end = cap_start + cheri_length_get(cap);
+	bool ret = false;
+
+	if (!test_bit(MMF_PCUABI_RESERVE, &mm->flags))
+		return true;
+
+	if (!locked && mmap_read_lock_killable(mm))
+		return false;
+
+	/* Check if there is match with the existing reservations */
+	vma = mas_find(&vmi->mas, cap_end);
+	if (!vma) {
+		ret = true;
Why do we return true if there is no reservation in this range?
Well this is somewhat inspired from munmap() behaviour. If munmap() is called for non-existing vma then it does not return error. So I tried to have the same behavior and not return the -ERESERVATION error.
Ah, I see what you're saying. The relevant rule in the spec is:
Its bounds are checked against existing reservations. If the bounds of the capability are not contained within the bounds of any existing reservation, the call fails with -ERESERVATION.
The reason there is no exception for munmap() is that passing an owning capability corresponding to a reservation that no longer exists is a form of use-after-free: such capabilities should not be used in any way after the reservation is gone. With an appropriate revocation scheme, they would be untagged before this can happen, so the syscall would fail anyway. Having this explicit check catches at least some of the cases that revocation would.
Besides I need some clarification regarding this interface. Currently this interface only checks reservation for the first vma and not all vma in the given range. As reservation fields remains constant so I left it like this for performance benefit. Let me know your opinion on this.
I don't think this is needed, because reservations cannot overlap. If the capability bounds are within the reservation according to the first vma we find, then we're good. If we do find a vma but the capability bounds exceed the reservation bounds, then it means that this capability owns a (larger) reservation that has been destroyed and replaced later by a smaller one, so we should error out with -ERESERVATION in any case.
Kevin
+		goto out;
+	}
+
+	if (vma->reserv_start <= cap_start &&
+	    vma->reserv_len >= cheri_length_get(cap))
+		ret = true;
+out:
+	if (!locked)
+		mmap_read_unlock(mm);
+
+	return ret;
+}
Helper functions such as capability_owns_range(), build_owning_capability() and mapping_to_capability_perm() are added to manage capability constraints as per the PCuABI specification.
Note: These helper functions do not check for capability permission constraints; full support will be added in subsequent commits.
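A simplified sketch of the intended usage in a syscall handler (the exact call sites and error codes appear in the later patches of this series):

	/* check that the capability passed in by userspace covers the range */
	if (!capability_owns_range(user_ptr, addr, len))
		return -EINVAL;

	/* on success, return an owning capability for the new mapping */
	perms = mapping_to_capability_perm(prot, true);
	return build_owning_capability(addr, len, perms);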
Signed-off-by: Amit Daniel Kachhap <amitdaniel.kachhap@arm.com>
---
 include/linux/cap_addr_mgmt.h | 52 +++++++++++++++++++++++++++++++++++
 mm/cap_addr_mgmt.c            | 36 +++++++++++++++++++++++-
 2 files changed, 87 insertions(+), 1 deletion(-)
diff --git a/include/linux/cap_addr_mgmt.h b/include/linux/cap_addr_mgmt.h index 2f296f02c3ff..6a42e714ecd5 100644 --- a/include/linux/cap_addr_mgmt.h +++ b/include/linux/cap_addr_mgmt.h @@ -143,6 +143,38 @@ static inline void reserv_fork(struct mm_struct *mm, struct mm_struct *oldmm) set_bit(MMF_PCUABI_RESERVE, &mm->flags); }
+/** + * capability_owns_range() - Check if the address range is within the valid + * capability bound. + * @cap: A Capability value. + * @addr: Address start value. + * @len: Address length. + * + * Return: True if address within the capability bound or false otherwise. + */ +bool capability_owns_range(user_uintptr_t cap, ptraddr_t addr, size_t len); + +/** + * build_owning_capability() - Creates a userspace capability from the + * requested base address, length and memory permission flags. + * @addr: Requested capability address. + * @len: Requested capability length. + * @perm: Requested capability permission flags. + * + * Return: A new capability derived from cheri_user_root_cap. + */ +user_uintptr_t build_owning_capability(ptraddr_t addr, size_t len, cheri_perms_t perm); + +/** + * mapping_to_capability_perm() - Converts memory mapping protection flags to + * capability permission flags. + * @prot: Memory protection flags. + * @has_tag_access: Capability permissions to have tag check flags. + * + * Return: Capability permission flags + */ +cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access); + #else /* CONFIG_CHERI_PURECAP_UABI */
static inline int reserv_vma_set_reserv(struct vm_area_struct *vma, ptraddr_t start, @@ -191,6 +223,26 @@ static inline bool reserv_vma_is_supported(struct vm_area_struct *vma)
static inline void reserv_fork(struct mm_struct *mm, struct mm_struct *oldmm) {}
+static inline bool capability_owns_range(user_uintptr_t cap, ptraddr_t addr, size_t len) +{ + return true; +} + +static inline user_uintptr_t build_owning_capability(ptraddr_t addr, size_t len, cheri_perms_t perm) +{ + return addr; +} + +static inline bool capability_may_set_prot(user_uintptr_t cap, int prot) +{ + return true; +} + +static inline cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access) +{ + return 0; +} + #endif /* CONFIG_CHERI_PURECAP_UABI */
#endif /* _LINUX_CAP_ADDR_MGMT_H */ diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c index 8b5f98d0d01c..0112b2136755 100644 --- a/mm/cap_addr_mgmt.c +++ b/mm/cap_addr_mgmt.c @@ -59,7 +59,7 @@ user_uintptr_t reserv_range_set_reserv(ptraddr_t start, size_t len, cheri_perms_ } if (!locked) mmap_write_unlock(current->mm); - ret = (user_uintptr_t)uaddr_to_user_ptr_safe(start); + ret = build_owning_capability(start, len, perm);
return ret; } @@ -173,4 +173,38 @@ bool reserv_vma_is_supported(struct vm_area_struct *vma) return false; }
+bool capability_owns_range(user_uintptr_t cap, ptraddr_t addr, size_t len) +{ + ptraddr_t align_addr = round_down(addr, PAGE_SIZE); + size_t align_len = round_up(len, PAGE_SIZE); + + if (!test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags)) + return true; + + return cheri_check_cap((const void * __capability)cheri_address_set(cap, align_addr), + align_len, CHERI_PERM_GLOBAL | CHERI_PERM_SW_VMEM); +} + +user_uintptr_t build_owning_capability(ptraddr_t addr, size_t len, cheri_perms_t perm) +{ + ptraddr_t align_start = CHERI_REPRESENTABLE_BASE(round_down(addr, PAGE_SIZE), len); + size_t align_len = cheri_representable_length(round_up(len, PAGE_SIZE)); + user_uintptr_t ret; + + if (!test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags)) + return (user_uintptr_t)addr; + + ret = (user_uintptr_t)cheri_build_user_cap(align_start, align_len, perm); + + return cheri_address_set(ret, addr); +} + +cheri_perms_t mapping_to_capability_perm(int prot __maybe_unused, + bool has_tag_access __maybe_unused) +{ + /* TODO [PCuABI] - capability permission conversion from memory permission */ + return (CHERI_PERMS_READ | CHERI_PERMS_WRITE | + CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP); +} + #endif /* CONFIG_CHERI_PURECAP_UABI */
On 08/01/2024 11:23, Amit Daniel Kachhap wrote:
Helper functions such as capability_owns_range(), build_owning_capability() and mapping_to_capability_perm() are added to manage capability constraints as per PCuABI specifications.
Note: These helper functions do not check for capability permission constraints and full support will be added in subsequent commits.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
include/linux/cap_addr_mgmt.h | 52 +++++++++++++++++++++++++++++++++++ mm/cap_addr_mgmt.c | 36 +++++++++++++++++++++++- 2 files changed, 87 insertions(+), 1 deletion(-)
diff --git a/include/linux/cap_addr_mgmt.h b/include/linux/cap_addr_mgmt.h index 2f296f02c3ff..6a42e714ecd5 100644 --- a/include/linux/cap_addr_mgmt.h +++ b/include/linux/cap_addr_mgmt.h @@ -143,6 +143,38 @@ static inline void reserv_fork(struct mm_struct *mm, struct mm_struct *oldmm) set_bit(MMF_PCUABI_RESERVE, &mm->flags); } +/**
- capability_owns_range() - Check if the address range is within the valid
- capability bound.
- @cap: A Capability value.
- @addr: Address start value.
- @len: Address length.
- Return: True if address within the capability bound or false otherwise.
- */
+bool capability_owns_range(user_uintptr_t cap, ptraddr_t addr, size_t len);
+/**
- build_owning_capability() - Creates a userspace capability from the
- requested base address, length and memory permission flags.
- @addr: Requested capability address.
- @len: Requested capability length.
- @perm: Requested capability permission flags.
- Return: A new capability derived from cheri_user_root_cap.
- */
+user_uintptr_t build_owning_capability(ptraddr_t addr, size_t len, cheri_perms_t perm);
Since these functions will be used directly in generic code and operate on user_uintptr_t, we shouldn't use "capability" in their name. We could actually have them in linux/user_ptr.h and follow a similar naming, for instance check_user_ptr_owning() and make_user_ptr_owning().
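That is, keeping the current signatures but renamed, roughly (sketch only):

	/* linux/user_ptr.h */
	bool check_user_ptr_owning(user_uintptr_t user_ptr, ptraddr_t addr, size_t len);
	user_uintptr_t make_user_ptr_owning(ptraddr_t addr, size_t len, cheri_perms_t perms);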
+/**
- mapping_to_capability_perm() - Converts memory mapping protection flags to
- capability permission flags.
- @prot: Memory protection flags.
- @has_tag_access: Capability permissions to have tag check flags.
- Return: Capability permission flags
- */
+cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access);
I hadn't realised this in patch 3, but we can't use cheri_perms_t unconditionally - this shouldn't build in !MORELLO. We could introduce user_ptr_perms_t in user_ptr.h, typedef to cheri_perms_t in PCuABI and maybe int otherwise. Maybe that function could live in user_ptr.h as well, called something like user_ptr_perms_from_prot().
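For instance, something along these lines (sketch only):

	/* linux/user_ptr.h */
	#ifdef CONFIG_CHERI_PURECAP_UABI
	typedef cheri_perms_t user_ptr_perms_t;
	#else
	typedef int user_ptr_perms_t;
	#endif

	user_ptr_perms_t user_ptr_perms_from_prot(int prot, bool has_tag_access);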
Kevin
[...]
On 1/15/24 14:06, Kevin Brodsky wrote:
On 08/01/2024 11:23, Amit Daniel Kachhap wrote:
Helper functions such as capability_owns_range(), build_owning_capability() and mapping_to_capability_perm() are added to manage capability constraints as per PCuABI specifications.
Note: These helper functions do not check for capability permission constraints and full support will be added in subsequent commits.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
include/linux/cap_addr_mgmt.h | 52 +++++++++++++++++++++++++++++++++++ mm/cap_addr_mgmt.c | 36 +++++++++++++++++++++++- 2 files changed, 87 insertions(+), 1 deletion(-)
diff --git a/include/linux/cap_addr_mgmt.h b/include/linux/cap_addr_mgmt.h index 2f296f02c3ff..6a42e714ecd5 100644 --- a/include/linux/cap_addr_mgmt.h +++ b/include/linux/cap_addr_mgmt.h @@ -143,6 +143,38 @@ static inline void reserv_fork(struct mm_struct *mm, struct mm_struct *oldmm) set_bit(MMF_PCUABI_RESERVE, &mm->flags); } +/**
- capability_owns_range() - Check if the address range is within the valid
- capability bound.
- @cap: A Capability value.
- @addr: Address start value.
- @len: Address length.
- Return: True if address within the capability bound or false otherwise.
- */
+bool capability_owns_range(user_uintptr_t cap, ptraddr_t addr, size_t len);
+/**
- build_owning_capability() - Creates a userspace capability from the
- requested base address, length and memory permission flags.
- @addr: Requested capability address.
- @len: Requested capability length.
- @perm: Requested capability permission flags.
- Return: A new capability derived from cheri_user_root_cap.
- */
+user_uintptr_t build_owning_capability(ptraddr_t addr, size_t len, cheri_perms_t perm);
Since these functions will be used directly in generic code and operate on user_uintptr_t, we shouldn't use "capability" in their name. We could actually have them in linux/user_ptr.h and follow a similar naming, for instance check_user_ptr_owning() and make_user_ptr_owning().
Ok, the above names look more appropriate.
+/**
- mapping_to_capability_perm() - Converts memory mapping protection flags to
- capability permission flags.
- @prot: Memory protection flags.
- @has_tag_access: Capability permissions to have tag check flags.
- Return: Capability permission flags
- */
+cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access);
I hadn't realised this in patch 3, but we can't use cheri_perms_t unconditionally - this shouldn't build in !MORELLO. We could introduce user_ptr_perms_t in user_ptr.h, typedef to cheri_perms_t in PCuABI and maybe int otherwise. Maybe that function could live in user_ptr.h as well, called something like user_ptr_perms_from_prot().
ok.
Thanks, Amit
Kevin
[...]
The address argument of the struct file_operations.get_unmapped_area() hook is unsigned long. However, PCuABI requires the full capability details to be passed down to the generic get_unmapped_area() function. This helps to implement the different PCuABI constraints in the low-level generic_get_unmapped_area* handlers.
Note that this is a non-functional change for non-purecap cases, which still pass the argument as unsigned long. Also, this commit is not exhaustive at the moment.
Signed-off-by: Amit Daniel Kachhap <amitdaniel.kachhap@arm.com>
---
 drivers/char/mem.c       |  2 +-
 fs/hugetlbfs/inode.c     |  2 +-
 fs/proc/inode.c          |  4 ++--
 fs/ramfs/file-mmu.c      |  2 +-
 include/linux/fs.h       |  2 +-
 include/linux/huge_mm.h  |  2 +-
 include/linux/mm.h       |  2 +-
 include/linux/mm_types.h |  2 +-
 include/linux/proc_fs.h  |  2 +-
 include/linux/sched/mm.h |  8 ++++----
 include/linux/shmem_fs.h |  2 +-
 io_uring/io_uring.c      |  2 +-
 ipc/shm.c                |  2 +-
 mm/huge_memory.c         |  2 +-
 mm/mmap.c                | 17 ++++++++++-------
 mm/shmem.c               |  4 ++--
 16 files changed, 30 insertions(+), 27 deletions(-)
diff --git a/drivers/char/mem.c b/drivers/char/mem.c index f494d31f2b98..332a1276b9fc 100644 --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -548,7 +548,7 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma) }
static unsigned long get_unmapped_area_zero(struct file *file, - unsigned long addr, unsigned long len, + user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags) { #ifdef CONFIG_MMU diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index ecfdfb2529a3..63457f1f9b60 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -274,7 +274,7 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
#ifndef HAVE_ARCH_HUGETLB_UNMAPPED_AREA static unsigned long -hugetlb_get_unmapped_area(struct file *file, unsigned long addr, +hugetlb_get_unmapped_area(struct file *file, user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags) { diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 8bc77efd6244..abe54ba79759 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -453,7 +453,7 @@ static int proc_reg_mmap(struct file *file, struct vm_area_struct *vma) }
static unsigned long -pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned long orig_addr, +pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, user_uintptr_t orig_addr, unsigned long len, unsigned long pgoff, unsigned long flags) { @@ -470,7 +470,7 @@ pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned lo }
static unsigned long -proc_reg_get_unmapped_area(struct file *file, unsigned long orig_addr, +proc_reg_get_unmapped_area(struct file *file, user_uintptr_t orig_addr, unsigned long len, unsigned long pgoff, unsigned long flags) { diff --git a/fs/ramfs/file-mmu.c b/fs/ramfs/file-mmu.c index 12af0490322f..deec4fac8a13 100644 --- a/fs/ramfs/file-mmu.c +++ b/fs/ramfs/file-mmu.c @@ -32,7 +32,7 @@ #include "internal.h"
static unsigned long ramfs_mmu_get_unmapped_area(struct file *file, - unsigned long addr, unsigned long len, unsigned long pgoff, + user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags) { return current->mm->get_unmapped_area(file, addr, len, pgoff, flags); diff --git a/include/linux/fs.h b/include/linux/fs.h index 8f2952e298a8..affd05163686 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1794,7 +1794,7 @@ struct file_operations { int (*fasync) (int, struct file *, int); int (*lock) (struct file *, int, struct file_lock *); ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int); - unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); + unsigned long (*get_unmapped_area)(struct file *, user_uintptr_t, unsigned long, unsigned long, unsigned long); int (*check_flags)(int); int (*flock) (struct file *, int, struct file_lock *); ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int); diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 20284387b841..a322bf196ac7 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -140,7 +140,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, (transparent_hugepage_flags & \ (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
-unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, +unsigned long thp_get_unmapped_area(struct file *filp, user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags);
void prep_transhuge_page(struct page *page); diff --git a/include/linux/mm.h b/include/linux/mm.h index c1f4996a957f..a4b7381b4977 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3135,7 +3135,7 @@ extern int install_special_mapping(struct mm_struct *mm, unsigned long randomize_stack_top(unsigned long stack_top); unsigned long randomize_page(unsigned long start, unsigned long range);
-extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); +extern unsigned long get_unmapped_area(struct file *, user_uintptr_t, unsigned long, unsigned long, unsigned long);
extern unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 38bad6b201ca..487389d5a33e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -591,7 +591,7 @@ struct mm_struct { struct maple_tree mm_mt; #ifdef CONFIG_MMU unsigned long (*get_unmapped_area) (struct file *filp, - unsigned long addr, unsigned long len, + user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags); #endif unsigned long mmap_base; /* base of mmap area */ diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index 6a2cabec118d..cbe40e5ad8d0 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -41,7 +41,7 @@ struct proc_ops { long (*proc_compat_ioctl)(struct file *, unsigned int, unsigned long); #endif int (*proc_mmap)(struct file *, struct vm_area_struct *); - unsigned long (*proc_get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); + unsigned long (*proc_get_unmapped_area)(struct file *, user_uintptr_t, unsigned long, unsigned long, unsigned long); } __randomize_layout;
/* definitions for hide_pid field */ diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index 8d89c8c4fac1..3385ac9cd298 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -179,19 +179,19 @@ static inline void mm_update_next_owner(struct mm_struct *mm) extern void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack); extern unsigned long -arch_get_unmapped_area(struct file *, unsigned long, unsigned long, +arch_get_unmapped_area(struct file *, user_uintptr_t, unsigned long, unsigned long, unsigned long); extern unsigned long -arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr, +arch_get_unmapped_area_topdown(struct file *filp, user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags);
unsigned long -generic_get_unmapped_area(struct file *filp, unsigned long addr, +generic_get_unmapped_area(struct file *filp, user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags); unsigned long -generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr, +generic_get_unmapped_area_topdown(struct file *filp, user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags); #else diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 9029abd29b1c..233617830711 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -73,7 +73,7 @@ extern struct file *shmem_kernel_file_setup(const char *name, loff_t size, extern struct file *shmem_file_setup_with_mnt(struct vfsmount *mnt, const char *name, loff_t size, unsigned long flags); extern int shmem_zero_setup(struct vm_area_struct *); -extern unsigned long shmem_get_unmapped_area(struct file *, unsigned long addr, +extern unsigned long shmem_get_unmapped_area(struct file *, user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags); extern int shmem_lock(struct file *file, int lock, struct ucounts *ucounts); #ifdef CONFIG_SHMEM diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 37c3a1a6cc76..5c9c9e768788 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -3468,7 +3468,7 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma) }
static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp, - unsigned long addr, unsigned long len, + user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags) { const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags); diff --git a/ipc/shm.c b/ipc/shm.c index 16b75e8bcda1..447b8dcd95df 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -647,7 +647,7 @@ static long shm_fallocate(struct file *file, int mode, loff_t offset, }
static unsigned long shm_get_unmapped_area(struct file *file, - unsigned long addr, unsigned long len, unsigned long pgoff, + user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags) { struct shm_file_data *sfd = shm_file_data(file); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 624671aaa60d..e67b474efac7 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -634,7 +634,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp, return ret; }
-unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, +unsigned long thp_get_unmapped_area(struct file *filp, user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags) { unsigned long ret; diff --git a/mm/mmap.c b/mm/mmap.c index bc422cc4a14b..7f2246cbc969 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1679,13 +1679,14 @@ unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info) * This function "knows" that -ENOMEM has the bits set. */ unsigned long -generic_get_unmapped_area(struct file *filp, unsigned long addr, +generic_get_unmapped_area(struct file *filp, user_uintptr_t user_ptr, unsigned long len, unsigned long pgoff, unsigned long flags) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma, *prev; struct vm_unmapped_area_info info; + unsigned long addr = (ptraddr_t)user_ptr; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
if (len > mmap_end - mmap_min_addr) @@ -1714,7 +1715,7 @@ generic_get_unmapped_area(struct file *filp, unsigned long addr,
#ifndef HAVE_ARCH_UNMAPPED_AREA unsigned long -arch_get_unmapped_area(struct file *filp, unsigned long addr, +arch_get_unmapped_area(struct file *filp, user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags) { @@ -1727,13 +1728,14 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr, * stack's low limit (the base): */ unsigned long -generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr, +generic_get_unmapped_area_topdown(struct file *filp, user_uintptr_t user_ptr, unsigned long len, unsigned long pgoff, unsigned long flags) { struct vm_area_struct *vma, *prev; struct mm_struct *mm = current->mm; struct vm_unmapped_area_info info; + unsigned long addr = (ptraddr_t)user_ptr; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
/* requested length too big for entire address space */ @@ -1780,7 +1782,7 @@ generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
#ifndef HAVE_ARCH_UNMAPPED_AREA_TOPDOWN unsigned long -arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr, +arch_get_unmapped_area_topdown(struct file *filp, user_uintptr_t addr, unsigned long len, unsigned long pgoff, unsigned long flags) { @@ -1789,12 +1791,13 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr, #endif
unsigned long -get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, +get_unmapped_area(struct file *file, user_uintptr_t user_ptr, unsigned long len, unsigned long pgoff, unsigned long flags) { - unsigned long (*get_area)(struct file *, unsigned long, + unsigned long (*get_area)(struct file *, user_uintptr_t, unsigned long, unsigned long, unsigned long);
+ unsigned long addr = (ptraddr_t)user_ptr; unsigned long error = arch_mmap_check(addr, len, flags); if (error) return error; @@ -1817,7 +1820,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, get_area = shmem_get_unmapped_area; }
- addr = get_area(file, addr, len, pgoff, flags); + addr = get_area(file, user_ptr, len, pgoff, flags); if (IS_ERR_VALUE(addr)) return addr;
diff --git a/mm/shmem.c b/mm/shmem.c index e40a08c5c6d7..36c0b00c1982 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2170,11 +2170,11 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf) }
unsigned long shmem_get_unmapped_area(struct file *file, - unsigned long uaddr, unsigned long len, + user_uintptr_t uaddr, unsigned long len, unsigned long pgoff, unsigned long flags) { unsigned long (*get_area)(struct file *, - unsigned long, unsigned long, unsigned long, unsigned long); + user_uintptr_t, unsigned long, unsigned long, unsigned long); unsigned long addr; unsigned long offset; unsigned long inflated_len;
In the CHERI architecture, not all address ranges can be represented as capability bounds, so add the necessary CHERI base and length alignment checks when generating a free unmapped virtual address or when evaluating a fixed input address.

The PCuABI reservation interface stores the unusable alignment gaps at the start and end of a reservation. These gaps should be taken into account when searching for free unmapped address space.

For fixed addresses carrying a valid capability, the requested address range must lie entirely within the reservation range. Fixed addresses derived from a null capability are verified not to overlap any existing reservation range.
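For illustration, this rule boils down to something like the sketch below, using the helpers this series introduces (the real implementation lives in get_unmapped_area() in the diff that follows):

static int check_fixed_request(struct mm_struct *mm, user_uintptr_t user_ptr,
                               unsigned long len)
{
        unsigned long addr = (ptraddr_t)user_ptr;
        struct vm_area_struct *vma;

        if (cheri_tag_get(user_ptr)) {
                /* Valid capability: range must lie inside its reservation */
                vma = find_vma(mm, addr);
                return (vma && reserv_vma_valid_address(vma, addr, len)) ?
                        0 : -ERESERVATION;
        }
        /*
         * Null-derived address: must not overlap an existing reservation.
         * Shown here with a plain mapping lookup for brevity; the patch
         * uses vm_start_gap()/vm_end_gap(), which account for the
         * reservation bounds.
         */
        return find_vma_intersection(mm, addr, addr + len) ? -ERESERVATION : 0;
}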
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- include/linux/mm.h | 8 ++++ mm/mmap.c | 97 +++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 95 insertions(+), 10 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index a4b7381b4977..5b79120c999a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3239,6 +3239,10 @@ static inline unsigned long vm_start_gap(struct vm_area_struct *vma) { unsigned long vm_start = vma->vm_start;
+#ifdef CONFIG_CHERI_PURECAP_UABI + if (test_bit(MMF_PCUABI_RESERVE, &vma->vm_mm->flags)) + vm_start = vma->reserv_start; +#endif if (vma->vm_flags & VM_GROWSDOWN) { vm_start -= stack_guard_gap; if (vm_start > vma->vm_start) @@ -3251,6 +3255,10 @@ static inline unsigned long vm_end_gap(struct vm_area_struct *vma) { unsigned long vm_end = vma->vm_end;
+#ifdef CONFIG_CHERI_PURECAP_UABI + if (test_bit(MMF_PCUABI_RESERVE, &vma->vm_mm->flags)) + vm_end = vma->reserv_start + vma->reserv_len; +#endif if (vma->vm_flags & VM_GROWSUP) { vm_end += stack_guard_gap; if (vm_end < vma->vm_end) diff --git a/mm/mmap.c b/mm/mmap.c index 7f2246cbc969..6027da2c248b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -48,6 +48,7 @@ #include <linux/sched/mm.h> #include <linux/ksm.h>
+#include <linux/cap_addr_mgmt.h> #include <linux/uaccess.h> #include <asm/cacheflush.h> #include <asm/tlb.h> @@ -1561,6 +1562,11 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
/* Adjust search length to account for worst case alignment overhead */ length = info->length + info->align_mask; +#if defined(CONFIG_CHERI_PURECAP_UABI) + /* Cheri Representable length is sufficient for alignment */ + if (test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags)) + length = cheri_representable_length(info->length); +#endif if (length < info->length) return -ENOMEM;
@@ -1612,6 +1618,11 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) MA_STATE(mas, ¤t->mm->mm_mt, 0, 0); /* Adjust search length to account for worst case alignment overhead */ length = info->length + info->align_mask; +#if defined(CONFIG_CHERI_PURECAP_UABI) + /* Cheri Representable length is sufficient for alignment */ + if (test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags)) + length = cheri_representable_length(info->length); +#endif if (length < info->length) return -ENOMEM;
@@ -1637,6 +1648,10 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) tmp = mas_prev(&mas, 0); if (tmp && vm_end_gap(tmp) > gap) { high_limit = tmp->vm_start; +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags)) + high_limit = tmp->reserv_start; +#endif mas_reset(&mas); goto retry; } @@ -1688,20 +1703,46 @@ generic_get_unmapped_area(struct file *filp, user_uintptr_t user_ptr, struct vm_unmapped_area_info info; unsigned long addr = (ptraddr_t)user_ptr; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags); + unsigned long align_len = len;
- if (len > mmap_end - mmap_min_addr) +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (test_bit(MMF_PCUABI_RESERVE, &mm->flags)) + align_len = cheri_representable_length(len); +#endif + if (align_len > mmap_end - mmap_min_addr) return -ENOMEM;
- if (flags & MAP_FIXED) + if (flags & MAP_FIXED) { +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (test_bit(MMF_PCUABI_RESERVE, &mm->flags) && cheri_tag_get(user_ptr)) { + /* Ensure that this range is within the reservation bound */ + vma = find_vma(mm, addr); + if (!vma || !reserv_vma_valid_address(vma, addr, len)) + return -ERESERVATION; + return addr; + } else if (!test_bit(MMF_PCUABI_RESERVE, &mm->flags)) + return addr; +#else return addr; +#endif + }
if (addr) { addr = PAGE_ALIGN(addr); +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (test_bit(MMF_PCUABI_RESERVE, &mm->flags)) + addr = round_up(addr, CHERI_REPRESENTABLE_ALIGNMENT(len)); +#endif vma = find_vma_prev(mm, addr, &prev); - if (mmap_end - len >= addr && addr >= mmap_min_addr && - (!vma || addr + len <= vm_start_gap(vma)) && + if (mmap_end - align_len >= addr && addr >= mmap_min_addr && + (!vma || addr + align_len <= vm_start_gap(vma)) && (!prev || addr >= vm_end_gap(prev))) return addr; +#if defined(CONFIG_CHERI_PURECAP_UABI) + else if (flags & MAP_FIXED) + /* This non-tagged fixed address overlaps with other reservation */ + return -ERESERVATION; +#endif }
info.flags = 0; @@ -1709,6 +1750,10 @@ generic_get_unmapped_area(struct file *filp, user_uintptr_t user_ptr, info.low_limit = mm->mmap_base; info.high_limit = mmap_end; info.align_mask = 0; +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (test_bit(MMF_PCUABI_RESERVE, &mm->flags)) + info.align_mask = ~(cheri_representable_alignment_mask(len)); +#endif info.align_offset = 0; return vm_unmapped_area(&info); } @@ -1737,22 +1782,50 @@ generic_get_unmapped_area_topdown(struct file *filp, user_uintptr_t user_ptr, struct vm_unmapped_area_info info; unsigned long addr = (ptraddr_t)user_ptr; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags); + unsigned long align_len = len; + unsigned long align_addr;
+#if defined(CONFIG_CHERI_PURECAP_UABI) + if (test_bit(MMF_PCUABI_RESERVE, &mm->flags)) + align_len = cheri_representable_length(len); +#endif /* requested length too big for entire address space */ - if (len > mmap_end - mmap_min_addr) + if (align_len > mmap_end - mmap_min_addr) return -ENOMEM;
- if (flags & MAP_FIXED) + if (flags & MAP_FIXED) { +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (test_bit(MMF_PCUABI_RESERVE, &mm->flags) && cheri_tag_get(user_ptr)) { + /* Ensure that this range is within the reservation bound */ + vma = find_vma(mm, addr); + if (!vma || !reserv_vma_valid_address(vma, addr, len)) + return -ERESERVATION; + return addr; + } else if (!test_bit(MMF_PCUABI_RESERVE, &mm->flags)) + return addr; +#else return addr; +#endif + }
/* requesting a specific address */ if (addr) { addr = PAGE_ALIGN(addr); - vma = find_vma_prev(mm, addr, &prev); - if (mmap_end - len >= addr && addr >= mmap_min_addr && - (!vma || addr + len <= vm_start_gap(vma)) && - (!prev || addr >= vm_end_gap(prev))) + align_addr = addr; +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (test_bit(MMF_PCUABI_RESERVE, &mm->flags)) + align_addr = CHERI_REPRESENTABLE_BASE(addr, len); +#endif + vma = find_vma_prev(mm, align_addr, &prev); + if (mmap_end - align_len >= align_addr && align_addr >= mmap_min_addr && + (!vma || align_addr + align_len <= vm_start_gap(vma)) && + (!prev || align_addr >= vm_end_gap(prev))) return addr; +#if defined(CONFIG_CHERI_PURECAP_UABI) + else if (flags & MAP_FIXED) + /* This non-tagged fixed address overlaps with other reservation */ + return -ERESERVATION; +#endif }
info.flags = VM_UNMAPPED_AREA_TOPDOWN; @@ -1760,6 +1833,10 @@ generic_get_unmapped_area_topdown(struct file *filp, user_uintptr_t user_ptr, info.low_limit = PAGE_SIZE; info.high_limit = arch_get_mmap_base(addr, mm->mmap_base); info.align_mask = 0; +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (test_bit(MMF_PCUABI_RESERVE, &mm->flags)) + info.align_mask = ~(cheri_representable_alignment_mask(len)); +#endif info.align_offset = 0; addr = vm_unmapped_area(&info);
On 08/01/2024 11:23, Amit Daniel Kachhap wrote:
In the CHERI architecture, not all address ranges can be represented as capability bounds, so add the necessary CHERI base and length alignment checks when generating a free unmapped virtual address or when evaluating a fixed input address.

The PCuABI reservation interface stores the unusable alignment gaps at the start and end of a reservation. These gaps should be taken into account when searching for free unmapped address space.

For fixed addresses carrying a valid capability, the requested address range must lie entirely within the reservation range. Fixed addresses derived from a null capability are verified not to overlap any existing reservation range.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
include/linux/mm.h | 8 ++++ mm/mmap.c | 97 +++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 95 insertions(+), 10 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index a4b7381b4977..5b79120c999a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3239,6 +3239,10 @@ static inline unsigned long vm_start_gap(struct vm_area_struct *vma) { unsigned long vm_start = vma->vm_start; +#ifdef CONFIG_CHERI_PURECAP_UABI
It would be nice to get rid of all this #ifdef'ing, this is really a lot of noise. In most cases we already test the mm flag so it's just a matter of ensuring the conditional code can build. This should be quite straightforward if we introduce a few more helpers, for instance to get the reservation start and end (fallback implementation being the mapping start and end).
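Something along these lines, purely as a sketch (the helper names are made up here, and the reserv_* fields only exist under CONFIG_CHERI_PURECAP_UABI):

static inline unsigned long reserv_vma_reserv_start(struct vm_area_struct *vma)
{
#ifdef CONFIG_CHERI_PURECAP_UABI
        if (test_bit(MMF_PCUABI_RESERVE, &vma->vm_mm->flags))
                return vma->reserv_start;
#endif
        /* Fallback: the reservation is just the mapping itself */
        return vma->vm_start;
}

static inline unsigned long reserv_vma_reserv_end(struct vm_area_struct *vma)
{
#ifdef CONFIG_CHERI_PURECAP_UABI
        if (test_bit(MMF_PCUABI_RESERVE, &vma->vm_mm->flags))
                return vma->reserv_start + vma->reserv_len;
#endif
        return vma->vm_end;
}

vm_start_gap()/vm_end_gap() could then call these unconditionally, without any #ifdef at the call site.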
- if (test_bit(MMF_PCUABI_RESERVE, &vma->vm_mm->flags))
This test is done so many times that it probably deserves a helper.
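For instance (again just a sketch, not something the series currently defines):

static inline bool reserv_mm_is_supported(struct mm_struct *mm)
{
        return IS_ENABLED(CONFIG_CHERI_PURECAP_UABI) &&
               test_bit(MMF_PCUABI_RESERVE, &mm->flags);
}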
vm_start = vma->reserv_start;
+#endif if (vma->vm_flags & VM_GROWSDOWN) { vm_start -= stack_guard_gap; if (vm_start > vma->vm_start) @@ -3251,6 +3255,10 @@ static inline unsigned long vm_end_gap(struct vm_area_struct *vma) { unsigned long vm_end = vma->vm_end; +#ifdef CONFIG_CHERI_PURECAP_UABI
- if (test_bit(MMF_PCUABI_RESERVE, &vma->vm_mm->flags))
vm_end = vma->reserv_start + vma->reserv_len;
+#endif if (vma->vm_flags & VM_GROWSUP) { vm_end += stack_guard_gap; if (vm_end < vma->vm_end) diff --git a/mm/mmap.c b/mm/mmap.c index 7f2246cbc969..6027da2c248b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -48,6 +48,7 @@ #include <linux/sched/mm.h> #include <linux/ksm.h> +#include <linux/cap_addr_mgmt.h> #include <linux/uaccess.h> #include <asm/cacheflush.h> #include <asm/tlb.h> @@ -1561,6 +1562,11 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info) /* Adjust search length to account for worst case alignment overhead */ length = info->length + info->align_mask; +#if defined(CONFIG_CHERI_PURECAP_UABI)
- /* Cheri Representable length is sufficient for alignment */
I still don't understand how we know this is the case. The gap we find needs to be of length cheri_representable_length(info->length), *but also* its alignment needs to be at least CHERI_REPRESENTABLE_ALIGNMENT(info->length). There is no guarantee that mas_empty_area() will return a sufficiently aligned gap, and that's why the original code increases the requested length - the following line will ensure that the returned address is indeed aligned (discarding the bottom part of the gap if it's not sufficiently aligned):
gap += (info->align_offset - gap) & info->align_mask;
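In other words, something like the following would be needed, so that both the representable length and the representable alignment are requested from the gap search (sketch only, assuming the cheri_* helpers already used elsewhere in this patch):

static void pcuabi_adjust_unmapped_area_info(struct vm_unmapped_area_info *info,
                                             unsigned long len)
{
        if (!test_bit(MMF_PCUABI_RESERVE, &current->mm->flags))
                return;
        /* Ask for a gap that can hold the padded, representable length */
        info->length = cheri_representable_length(len);
        /*
         * And request the representable alignment, so the fix-up quoted
         * above can only move the base up to a suitably aligned address.
         */
        info->align_mask = ~cheri_representable_alignment_mask(len);
        info->align_offset = 0;
}

The existing "length = info->length + info->align_mask" in unmapped_area() then keeps accounting for the worst-case alignment overhead.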
- if (test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags))
length = cheri_representable_length(info->length);
+#endif if (length < info->length) return -ENOMEM; @@ -1612,6 +1618,11 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) MA_STATE(mas, ¤t->mm->mm_mt, 0, 0); /* Adjust search length to account for worst case alignment overhead */ length = info->length + info->align_mask; +#if defined(CONFIG_CHERI_PURECAP_UABI)
- /* Cheri Representable length is sufficient for alignment */
- if (test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags))
length = cheri_representable_length(info->length);
+#endif if (length < info->length) return -ENOMEM; @@ -1637,6 +1648,10 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) tmp = mas_prev(&mas, 0); if (tmp && vm_end_gap(tmp) > gap) { high_limit = tmp->vm_start; +#if defined(CONFIG_CHERI_PURECAP_UABI)
if (test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags))
high_limit = tmp->reserv_start;
+#endif mas_reset(&mas); goto retry; } @@ -1688,20 +1703,46 @@ generic_get_unmapped_area(struct file *filp, user_uintptr_t user_ptr, struct vm_unmapped_area_info info; unsigned long addr = (ptraddr_t)user_ptr; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
- unsigned long align_len = len;
- if (len > mmap_end - mmap_min_addr)
+#if defined(CONFIG_CHERI_PURECAP_UABI)
- if (test_bit(MMF_PCUABI_RESERVE, &mm->flags))
align_len = cheri_representable_length(len);
+#endif
- if (align_len > mmap_end - mmap_min_addr) return -ENOMEM;
- if (flags & MAP_FIXED)
- if (flags & MAP_FIXED) {
+#if defined(CONFIG_CHERI_PURECAP_UABI)
if (test_bit(MMF_PCUABI_RESERVE, &mm->flags) && cheri_tag_get(user_ptr)) {
AFAICT this cheri_tag_get(user_ptr) is the only reason we need the previous patch, propagating the capability to get_unmapped_area(). This doesn't really seem to be worth it, and more importantly it doesn't work if we don't change in-kernel callers of vm_mmap() to use capabilities. So consistently with what I suggested on patch 3 in RFC v2, I don't think we should consider capabilities at all here. In the MAP_FIXED case, let's check that the range is either entirely inside a reservation, or otherwise that we can create a (representable) reservation for it. It is up to the syscall handler to do the actual capability checking.
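That is, something like this in the MAP_FIXED path (sketch only; reserv_range_is_free() is a hypothetical helper, the other names are from this series):

        if (flags & MAP_FIXED) {
                vma = find_vma(mm, addr);

                /* Entirely inside an existing reservation? */
                if (vma && reserv_vma_valid_address(vma, addr, len))
                        return addr;

                /* Otherwise a new, representable reservation must fit here */
                if (reserv_range_is_free(mm, CHERI_REPRESENTABLE_BASE(addr, len),
                                         cheri_representable_length(len)))
                        return addr;

                return -ERESERVATION;
        }

No capability is inspected at all; the syscall handler remains responsible for checking the capability against the reservation.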
/* Ensure that this range is within the reservation bound */
vma = find_vma(mm, addr);
if (!vma || !reserv_vma_valid_address(vma, addr, len))
return -ERESERVATION;
return addr;
} else if (!test_bit(MMF_PCUABI_RESERVE, &mm->flags))
return addr;
+#else return addr; +#endif
- }
if (addr) { addr = PAGE_ALIGN(addr); +#if defined(CONFIG_CHERI_PURECAP_UABI)
if (test_bit(MMF_PCUABI_RESERVE, &mm->flags))
addr = round_up(addr, CHERI_REPRESENTABLE_ALIGNMENT(len));
round_down() surely?
+#endif vma = find_vma_prev(mm, addr, &prev);
if (mmap_end - len >= addr && addr >= mmap_min_addr &&
(!vma || addr + len <= vm_start_gap(vma)) &&
if (mmap_end - align_len >= addr && addr >= mmap_min_addr &&
(!vma || addr + align_len <= vm_start_gap(vma)) && (!prev || addr >= vm_end_gap(prev))) return addr;
+#if defined(CONFIG_CHERI_PURECAP_UABI)
else if (flags & MAP_FIXED)
/* This non-tagged fixed address overlaps with other reservation */
return -ERESERVATION;
I don't think this is ever hit, considering that we always return above if flags & MAP_FIXED.
Kevin
[...]
On 1/12/24 20:24, Kevin Brodsky wrote:
On 08/01/2024 11:23, Amit Daniel Kachhap wrote:
In the CHERI architecture, not all address ranges can be represented as capability bounds, so add the necessary CHERI base and length alignment checks when generating a free unmapped virtual address or when evaluating a fixed input address.

The PCuABI reservation interface stores the unusable alignment gaps at the start and end of a reservation. These gaps should be taken into account when searching for free unmapped address space.

For fixed addresses carrying a valid capability, the requested address range must lie entirely within the reservation range. Fixed addresses derived from a null capability are verified not to overlap any existing reservation range.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
include/linux/mm.h | 8 ++++ mm/mmap.c | 97 +++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 95 insertions(+), 10 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index a4b7381b4977..5b79120c999a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3239,6 +3239,10 @@ static inline unsigned long vm_start_gap(struct vm_area_struct *vma) { unsigned long vm_start = vma->vm_start; +#ifdef CONFIG_CHERI_PURECAP_UABI
It would be nice to get rid of all this #ifdef'ing, this is really a lot of noise. In most cases we already test the mm flag so it's just a matter of ensuring the conditional code can build. This should be quite straightforward if we introduce a few more helpers, for instance to get the reservation start and end (fallback implementation being the mapping start and end).
ok I will try adding more helpers.
- if (test_bit(MMF_PCUABI_RESERVE, &vma->vm_mm->flags))
This test is done so many times that it probably deserves a helper.
vm_start = vma->reserv_start;
+#endif if (vma->vm_flags & VM_GROWSDOWN) { vm_start -= stack_guard_gap; if (vm_start > vma->vm_start) @@ -3251,6 +3255,10 @@ static inline unsigned long vm_end_gap(struct vm_area_struct *vma) { unsigned long vm_end = vma->vm_end; +#ifdef CONFIG_CHERI_PURECAP_UABI
- if (test_bit(MMF_PCUABI_RESERVE, &vma->vm_mm->flags))
vm_end = vma->reserv_start + vma->reserv_len;
+#endif if (vma->vm_flags & VM_GROWSUP) { vm_end += stack_guard_gap; if (vm_end < vma->vm_end) diff --git a/mm/mmap.c b/mm/mmap.c index 7f2246cbc969..6027da2c248b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -48,6 +48,7 @@ #include <linux/sched/mm.h> #include <linux/ksm.h> +#include <linux/cap_addr_mgmt.h> #include <linux/uaccess.h> #include <asm/cacheflush.h> #include <asm/tlb.h> @@ -1561,6 +1562,11 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info) /* Adjust search length to account for worst case alignment overhead */ length = info->length + info->align_mask; +#if defined(CONFIG_CHERI_PURECAP_UABI)
- /* Cheri Representable length is sufficient for alignment */
I still don't understand how we know this is the case. The gap we find needs to be of length cheri_representable_length(info->length), *but also* its alignment needs to be at least CHERI_REPRESENTABLE_ALIGNMENT(info->length). There is no guarantee that mas_empty_area() will return a sufficiently aligned gap, and that's why the original code increases the requested length - the following line will ensure that the returned address is indeed aligned (discarding the bottom part of the gap if it's not sufficiently aligned):
Yes, you are right here. There is also one bug in the current code: even info->length should be equal to cheri_representable_length(). I will update this in the next iteration.
gap += (info->align_offset - gap) & info->align_mask;
- if (test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags))
length = cheri_representable_length(info->length);
+#endif if (length < info->length) return -ENOMEM; @@ -1612,6 +1618,11 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) MA_STATE(mas, ¤t->mm->mm_mt, 0, 0); /* Adjust search length to account for worst case alignment overhead */ length = info->length + info->align_mask; +#if defined(CONFIG_CHERI_PURECAP_UABI)
- /* Cheri Representable length is sufficient for alignment */
- if (test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags))
length = cheri_representable_length(info->length);
+#endif if (length < info->length) return -ENOMEM; @@ -1637,6 +1648,10 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) tmp = mas_prev(&mas, 0); if (tmp && vm_end_gap(tmp) > gap) { high_limit = tmp->vm_start; +#if defined(CONFIG_CHERI_PURECAP_UABI)
if (test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags))
high_limit = tmp->reserv_start;
+#endif mas_reset(&mas); goto retry; } @@ -1688,20 +1703,46 @@ generic_get_unmapped_area(struct file *filp, user_uintptr_t user_ptr, struct vm_unmapped_area_info info; unsigned long addr = (ptraddr_t)user_ptr; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
- unsigned long align_len = len;
- if (len > mmap_end - mmap_min_addr)
+#if defined(CONFIG_CHERI_PURECAP_UABI)
- if (test_bit(MMF_PCUABI_RESERVE, &mm->flags))
align_len = cheri_representable_length(len);
+#endif
- if (align_len > mmap_end - mmap_min_addr) return -ENOMEM;
- if (flags & MAP_FIXED)
- if (flags & MAP_FIXED) {
+#if defined(CONFIG_CHERI_PURECAP_UABI)
if (test_bit(MMF_PCUABI_RESERVE, &mm->flags) && cheri_tag_get(user_ptr)) {
AFAICT this cheri_tag_get(user_ptr) is the only reason we need the previous patch, propagating the capability to get_unmapped_area(). This doesn't really seem to be worth it, and more importantly it doesn't work if we don't change in-kernel callers of vm_mmap() to use capabilities. So consistently with what I suggested on patch 3 in RFC v2, I don't think we should consider capabilities at all here. In the MAP_FIXED case, let's check that the range is either entirely inside a reservation, or otherwise that we can create a (representable) reservation for it. It is up to the syscall handler to do the actual capability checking.
Yes, the tag determines here whether the address range is within the reservation bound or not. I felt that this approach is generic in nature. But yes, this can certainly be done in the syscall handler itself, and patch 4 can be dropped entirely.
/* Ensure that this range is within the reservation bound */
vma = find_vma(mm, addr);
if (!vma || !reserv_vma_valid_address(vma, addr, len))
return -ERESERVATION;
return addr;
} else if (!test_bit(MMF_PCUABI_RESERVE, &mm->flags))
return addr;
+#else return addr; +#endif
- }
if (addr) { addr = PAGE_ALIGN(addr); +#if defined(CONFIG_CHERI_PURECAP_UABI)
if (test_bit(MMF_PCUABI_RESERVE, &mm->flags))
addr = round_up(addr, CHERI_REPRESENTABLE_ALIGNMENT(len));
round_down() surely?
This function is generic_get_unmapped_area(), where free memory is searched in increasing address order, so round_up() is used. In generic_get_unmapped_area_topdown() it will be the other way around.
+#endif vma = find_vma_prev(mm, addr, &prev);
if (mmap_end - len >= addr && addr >= mmap_min_addr &&
(!vma || addr + len <= vm_start_gap(vma)) &&
if (mmap_end - align_len >= addr && addr >= mmap_min_addr &&
(!vma || addr + align_len <= vm_start_gap(vma)) && (!prev || addr >= vm_end_gap(prev))) return addr;
+#if defined(CONFIG_CHERI_PURECAP_UABI)
else if (flags & MAP_FIXED)
/* This non-tagged fixed address overlaps with other reservation */
return -ERESERVATION;
I don't think this is ever hit, considering that we always return above if flags & MAP_FIXED.
This is a fallback for when a non-tagged fixed address is not free (it overlaps an existing mapping/reservation) and hence cannot be used.
Thanks, Amit Daniel
Kevin
[...]
On 19/01/2024 10:44, Amit Daniel Kachhap wrote:
+           /* Ensure that this range is within the reservation bound */ +           vma = find_vma(mm, addr); +           if (!vma || !reserv_vma_valid_address(vma, addr, len)) +               return -ERESERVATION; +           return addr; +       } else if (!test_bit(MMF_PCUABI_RESERVE, &mm->flags)) +           return addr; +#else          return addr; +#endif +   }       if (addr) {          addr = PAGE_ALIGN(addr); +#if defined(CONFIG_CHERI_PURECAP_UABI) +       if (test_bit(MMF_PCUABI_RESERVE, &mm->flags)) +           addr = round_up(addr, CHERI_REPRESENTABLE_ALIGNMENT(len));
round_down() surely?
This function is generic_get_unmapped_area(), where free memory is searched in increasing address order, so round_up() is used. In generic_get_unmapped_area_topdown() it will be the other way around.
Ah yes I see, I also missed that PAGE_ALIGN() above is also aligning up. Sorry for the confusion.
+#endif          vma = find_vma_prev(mm, addr, &prev); -       if (mmap_end - len >= addr && addr >= mmap_min_addr && -           (!vma || addr + len <= vm_start_gap(vma)) && +       if (mmap_end - align_len >= addr && addr >= mmap_min_addr && +           (!vma || addr + align_len <= vm_start_gap(vma)) &&              (!prev || addr >= vm_end_gap(prev)))              return addr; +#if defined(CONFIG_CHERI_PURECAP_UABI) +       else if (flags & MAP_FIXED) +           /* This non-tagged fixed address overlaps with other reservation */ +           return -ERESERVATION;
I don't think this is ever hit, considering that we always return above if flags & MAP_FIXED.
This is a fallback for when a non-tagged fixed address is not free (it overlaps an existing mapping/reservation) and hence cannot be used.
My point is that we already have an if (flags & MAP_FIXED) about 25 lines above, and we always return if the condition is true. So I don't see how this return statement can ever be reached.
Kevin
PCuABI memory reservation requires adding reservation properties while creating and modifying a vma. reserv_vma_init_reserv() is added to initialize them and reserv_vma_set_reserv() to update those details. For now these properties are added only for the mmap/mremap syscalls; later commits will add them for the other special vma mappings.

PCuABI memory reservation also requires that vmas are merged or expanded only if they belong to the same original reservation. Use suitable reservation interfaces to check those properties before performing such operations on a vma.

The address parameter type is modified from unsigned long to user_uintptr_t in do_mmap() and several related upstream functions in order to determine whether a new reservation is required.
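For illustration, the gist of the new flow (condensed from the diff below, not verbatim):

        bool new_reserv = true;

#ifdef CONFIG_CHERI_PURECAP_UABI
        /* A tagged user pointer means the mapping targets an existing reservation */
        if (test_bit(MMF_PCUABI_RESERVE, &mm->flags) && cheri_tag_get(user_ptr))
                new_reserv = false;
#endif

        /* later, once the new vma has been created in mmap_region(): */
        if (new_reserv)
                reserv_vma_set_reserv(vma, addr, len,
                                      mapping_to_capability_perm(prot,
                                                !(vm_flags & VM_SHARED)));
        else
                reserv_vma_set_reserv(vma, reserv_start, reserv_len, reserv_perm);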
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- include/linux/mm.h | 24 +++++++++++++++--- kernel/fork.c | 3 +++ mm/mmap.c | 63 ++++++++++++++++++++++++++++++++++++++-------- mm/util.c | 5 ++-- 4 files changed, 80 insertions(+), 15 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index 5b79120c999a..06d2aee83aa4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -747,6 +747,21 @@ static inline void vma_mark_detached(struct vm_area_struct *vma,
#endif /* CONFIG_PER_VMA_LOCK */
+#ifdef CONFIG_CHERI_PURECAP_UABI + +static void reserv_vma_init_reserv(struct vm_area_struct *vma) +{ + vma->reserv_start = 0; + vma->reserv_len = 0; + vma->reserv_perm = 0; +} + +#else /* CONFIG_CHERI_PURECAP_UABI */ + +static void reserv_vma_init_reserv(struct vm_area_struct *vma) {} + +#endif /* CONFIG_CHERI_PURECAP_UABI */ + /* * WARNING: vma_init does not initialize vma->vm_lock. * Use vm_area_alloc()/vm_area_free() if vma needs locking. @@ -761,6 +776,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm) INIT_LIST_HEAD(&vma->anon_vma_chain); vma_mark_detached(vma, false); vma_numab_state_init(vma); + reserv_vma_init_reserv(vma); }
/* Use when VMA is not part of the VMA tree and needs no locking */ @@ -3139,10 +3155,12 @@ extern unsigned long get_unmapped_area(struct file *, user_uintptr_t, unsigned l
extern unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, - struct list_head *uf); -extern unsigned long do_mmap(struct file *file, unsigned long addr, + struct list_head *uf, unsigned long prot, bool new_reserv); + +extern user_uintptr_t do_mmap(struct file *file, user_uintptr_t usrptr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf); + extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf, bool downgrade); @@ -3169,7 +3187,7 @@ static inline void mm_populate(unsigned long addr, unsigned long len) {} extern int __must_check vm_brk(unsigned long, unsigned long); extern int __must_check vm_brk_flags(unsigned long, unsigned long, unsigned long); extern int vm_munmap(unsigned long, size_t); -extern unsigned long __must_check vm_mmap(struct file *, unsigned long, +extern user_uintptr_t __must_check vm_mmap(struct file *, user_uintptr_t, unsigned long, unsigned long, unsigned long, unsigned long);
diff --git a/kernel/fork.c b/kernel/fork.c index d6fd09ba8d0a..f051a75e448e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -99,6 +99,7 @@ #include <linux/stackprotector.h> #include <linux/user_events.h> #include <linux/iommu.h> +#include <linux/cap_addr_mgmt.h>
#include <asm/pgalloc.h> #include <linux/uaccess.h> @@ -682,6 +683,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, goto out; khugepaged_fork(mm, oldmm);
+ reserv_fork(mm, oldmm); + retval = vma_iter_bulk_alloc(&vmi, oldmm->map_count); if (retval) goto out; diff --git a/mm/mmap.c b/mm/mmap.c index 6027da2c248b..7fd7d3ac377b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -49,6 +49,7 @@ #include <linux/ksm.h>
#include <linux/cap_addr_mgmt.h> +#include <linux/cheri.h> #include <linux/uaccess.h> #include <asm/cacheflush.h> #include <asm/tlb.h> @@ -953,7 +954,8 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, /* Can we merge the predecessor? */ if (addr == prev->vm_end && mpol_equal(vma_policy(prev), policy) && can_vma_merge_after(prev, vm_flags, anon_vma, file, - pgoff, vm_userfaultfd_ctx, anon_name)) { + pgoff, vm_userfaultfd_ctx, anon_name) + && reserv_vma_same_reserv(prev, curr)) { merge_prev = true; vma_prev(vmi); } @@ -962,7 +964,8 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, /* Can we merge the successor? */ if (next && mpol_equal(policy, vma_policy(next)) && can_vma_merge_before(next, vm_flags, anon_vma, file, pgoff+pglen, - vm_userfaultfd_ctx, anon_name)) { + vm_userfaultfd_ctx, anon_name) + && reserv_vma_same_reserv(next, curr)) { merge_next = true; }
@@ -1225,7 +1228,7 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, /* * The caller must write-lock current->mm->mmap_lock. */ -unsigned long do_mmap(struct file *file, unsigned long addr, +user_uintptr_t do_mmap(struct file *file, user_uintptr_t usrptr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1233,6 +1236,8 @@ unsigned long do_mmap(struct file *file, unsigned long addr, struct mm_struct *mm = current->mm; vm_flags_t vm_flags; int pkey = 0; + unsigned long addr = (ptraddr_t)usrptr; + bool new_caps = true;
validate_mm(mm); *populate = 0; @@ -1240,6 +1245,12 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (!len) return -EINVAL;
+#ifdef CONFIG_CHERI_PURECAP_UABI + if (test_bit(MMF_PCUABI_RESERVE, &mm->flags)) { + if (cheri_tag_get(usrptr)) + new_caps = false; + } +#endif /* * Does the application expect PROT_READ to imply PROT_EXEC? * @@ -1397,20 +1408,25 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; }
- addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); + if (new_caps) + usrptr = addr; + + addr = mmap_region(file, addr, len, vm_flags, pgoff, uf, prot, new_caps); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len; + return addr; }
-user_uintptr_t ksys_mmap_pgoff(user_uintptr_t addr, unsigned long len, +user_uintptr_t ksys_mmap_pgoff(user_uintptr_t usrptr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long fd, unsigned long pgoff) { struct file *file = NULL; user_uintptr_t retval; + ptraddr_t addr = (ptraddr_t)usrptr;
if (!(flags & MAP_ANONYMOUS)) { audit_mmap_fd(fd, flags); @@ -1443,7 +1459,7 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t addr, unsigned long len, return PTR_ERR(file); }
- retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff); + retval = vm_mmap_pgoff(file, usrptr, len, prot, flags, pgoff); out_fput: if (file) fput(file); @@ -2628,15 +2644,18 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, - struct list_head *uf) + struct list_head *uf, unsigned long prot, bool new_reserv) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL; - struct vm_area_struct *next, *prev, *merge; + struct vm_area_struct *next, *prev, *merge, *old_vma; pgoff_t pglen = len >> PAGE_SHIFT; unsigned long charged = 0; unsigned long end = addr + len; unsigned long merge_start = addr, merge_end = end; + unsigned long reserv_start = 0; + unsigned long reserv_len = 0; + unsigned long reserv_perm = 0; pgoff_t vm_pgoff; int error; VMA_ITERATOR(vmi, mm, addr); @@ -2656,6 +2675,20 @@ unsigned long mmap_region(struct file *file, unsigned long addr, return -ENOMEM; }
+ if (!new_reserv) { + old_vma = vma_find(&vmi, end); + if (!old_vma) + return -ENOMEM; +#ifdef CONFIG_CHERI_PURECAP_UABI + if (reserv_vma_is_supported(old_vma)) { + reserv_start = old_vma->reserv_start; + reserv_len = old_vma->reserv_len; + reserv_perm = old_vma->reserv_perm; + } +#endif + vma_iter_set(&vmi, addr); + } + /* Unmap any existing mapping in the area */ if (do_vmi_munmap(&vmi, mm, addr, len, uf, false)) return -ENOMEM; @@ -2679,7 +2712,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, /* Check next */ if (next && next->vm_start == end && !vma_policy(next) && can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen, - NULL_VM_UFFD_CTX, NULL)) { + NULL_VM_UFFD_CTX, NULL) && + reserv_vma_valid_address(next, addr, len)) { merge_end = next->vm_end; vma = next; vm_pgoff = next->vm_pgoff - pglen; @@ -2690,7 +2724,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, (vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file, pgoff, vma->vm_userfaultfd_ctx, NULL) : can_vma_merge_after(prev, vm_flags, NULL, file, pgoff, - NULL_VM_UFFD_CTX, NULL))) { + NULL_VM_UFFD_CTX, NULL)) && + reserv_vma_valid_address(prev, addr, len)) { merge_start = prev->vm_start; vma = prev; vm_pgoff = prev->vm_pgoff; @@ -2722,6 +2757,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vm_flags_init(vma, vm_flags); vma->vm_page_prot = vm_get_page_prot(vm_flags); vma->vm_pgoff = pgoff; + if (new_reserv) + reserv_vma_set_reserv(vma, addr, len, + mapping_to_capability_perm(prot, (vm_flags & VM_SHARED) + ? false : true)); + else + reserv_vma_set_reserv(vma, reserv_start, reserv_len, reserv_perm);
if (file) { if (vm_flags & VM_SHARED) { @@ -3320,6 +3361,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, *vmap = vma = new_vma; } *need_rmap_locks = (new_vma->vm_pgoff <= vma->vm_pgoff); + reserv_vma_set_reserv(new_vma, addr, len, 0); } else { new_vma = vm_area_dup(vma); if (!new_vma) @@ -3327,6 +3369,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, new_vma->vm_start = addr; new_vma->vm_end = addr + len; new_vma->vm_pgoff = pgoff; + reserv_vma_set_reserv(new_vma, addr, len, 0); if (vma_dup_policy(vma, new_vma)) goto out_free_vma; if (anon_vma_clone(new_vma, vma)) diff --git a/mm/util.c b/mm/util.c index 61de3bf7712b..f6fbfa7d014e 100644 --- a/mm/util.c +++ b/mm/util.c @@ -557,7 +557,8 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, return ret; }
-unsigned long vm_mmap(struct file *file, unsigned long addr, +/* TODO [PCuABI] - Update the users of vm_mmap */ +user_uintptr_t vm_mmap(struct file *file, user_uintptr_t usrptr, unsigned long len, unsigned long prot, unsigned long flag, unsigned long offset) { @@ -566,7 +567,7 @@ unsigned long vm_mmap(struct file *file, unsigned long addr, if (unlikely(offset_in_page(offset))) return -EINVAL;
- return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT); + return vm_mmap_pgoff(file, usrptr, len, prot, flag, offset >> PAGE_SHIFT); } EXPORT_SYMBOL(vm_mmap);
On 08/01/2024 11:23, Amit Daniel Kachhap wrote:
PCuABI memory reservation requires adding reservation properties while creating and modifying a vma. reserv_vma_init_reserv() is added to initialize them and reserv_vma_set_reserv() to update those details. For now these properties are added only for the mmap/mremap syscalls; later commits will add them for the other special vma mappings.

PCuABI memory reservation also requires that vmas are merged or expanded only if they belong to the same original reservation. Use suitable reservation interfaces to check those properties before performing such operations on a vma.

The address parameter type is modified from unsigned long to user_uintptr_t in do_mmap() and several related upstream functions in order to determine whether a new reservation is required.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
include/linux/mm.h | 24 +++++++++++++++--- kernel/fork.c | 3 +++ mm/mmap.c | 63 ++++++++++++++++++++++++++++++++++++++-------- mm/util.c | 5 ++-- 4 files changed, 80 insertions(+), 15 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index 5b79120c999a..06d2aee83aa4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -747,6 +747,21 @@ static inline void vma_mark_detached(struct vm_area_struct *vma, #endif /* CONFIG_PER_VMA_LOCK */ +#ifdef CONFIG_CHERI_PURECAP_UABI
+static void reserv_vma_init_reserv(struct vm_area_struct *vma) +{
- vma->reserv_start = 0;
- vma->reserv_len = 0;
- vma->reserv_perm = 0;
vma_init() already zero-inits the vma.
+}
+#else /* CONFIG_CHERI_PURECAP_UABI */
+static void reserv_vma_init_reserv(struct vm_area_struct *vma) {}
+#endif /* CONFIG_CHERI_PURECAP_UABI */ [...]
-unsigned long do_mmap(struct file *file, unsigned long addr, +user_uintptr_t do_mmap(struct file *file, user_uintptr_t usrptr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1233,6 +1236,8 @@ unsigned long do_mmap(struct file *file, unsigned long addr, struct mm_struct *mm = current->mm; vm_flags_t vm_flags; int pkey = 0;
- unsigned long addr = (ptraddr_t)usrptr;
- bool new_caps = true;
It's not really obvious what "caps" means (especially with an "s"). Maybe "new_reserv" like the mmap_region() parameter.
Also I'd suggest s/usrptr/user_ptr/ (since that's the abbreviation we've generally used so far).
validate_mm(mm); *populate = 0; @@ -1240,6 +1245,12 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (!len) return -EINVAL; +#ifdef CONFIG_CHERI_PURECAP_UABI
- if (test_bit(MMF_PCUABI_RESERVE, &mm->flags)) {
if (cheri_tag_get(usrptr))
new_caps = false;
- }
+#endif /* * Does the application expect PROT_READ to imply PROT_EXEC? * @@ -1397,20 +1408,25 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; }
- addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
- if (new_caps)
usrptr = addr;
usrptr isn't actually used in this patch, maybe the changes in this function belong to the next patch. That said these two lines don't seem to have any effect even after patch 8.
- addr = mmap_region(file, addr, len, vm_flags, pgoff, uf, prot, new_caps); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len;
- return addr;
} -user_uintptr_t ksys_mmap_pgoff(user_uintptr_t addr, unsigned long len, +user_uintptr_t ksys_mmap_pgoff(user_uintptr_t usrptr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long fd, unsigned long pgoff) { struct file *file = NULL; user_uintptr_t retval;
- ptraddr_t addr = (ptraddr_t)usrptr;
addr isn't used in this patch. Those changes probably belong to the next patch too.
if (!(flags & MAP_ANONYMOUS)) { audit_mmap_fd(fd, flags); @@ -1443,7 +1459,7 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t addr, unsigned long len, return PTR_ERR(file); }
- retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
- retval = vm_mmap_pgoff(file, usrptr, len, prot, flags, pgoff);
out_fput: if (file) fput(file); @@ -2628,15 +2644,18 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
struct list_head *uf)
struct list_head *uf, unsigned long prot, bool new_reserv)
{ struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL;
- struct vm_area_struct *next, *prev, *merge;
- struct vm_area_struct *next, *prev, *merge, *old_vma; pgoff_t pglen = len >> PAGE_SHIFT; unsigned long charged = 0; unsigned long end = addr + len; unsigned long merge_start = addr, merge_end = end;
- unsigned long reserv_start = 0;
- unsigned long reserv_len = 0;
- unsigned long reserv_perm = 0; pgoff_t vm_pgoff; int error; VMA_ITERATOR(vmi, mm, addr);
@@ -2656,6 +2675,20 @@ unsigned long mmap_region(struct file *file, unsigned long addr, return -ENOMEM; }
- if (!new_reserv) {
old_vma = vma_find(&vmi, end);
if (!old_vma)
I initially thought the check was insufficient, but then realised that the actual check happens by calling reserv_vmi_range_mapped() in ksys_mmap_pgoff(). So I think looking up the reservation information this way makes sense, but we definitely want a comment saying that the actual check has already happened (in principle this condition can never be true).
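Something like the following, just above the vma_find() call (wording illustrative):

        /*
         * The caller (ksys_mmap_pgoff()) has already verified via
         * reserv_vmi_range_mapped() that the whole range is mapped, so in
         * principle this lookup can never fail.
         */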
return -ENOMEM;
+#ifdef CONFIG_CHERI_PURECAP_UABI
if (reserv_vma_is_supported(old_vma)) {
reserv_start = old_vma->reserv_start;
reserv_len = old_vma->reserv_len;
reserv_perm = old_vma->reserv_perm;
}
+#endif
vma_iter_set(&vmi, addr);
- }
- /* Unmap any existing mapping in the area */ if (do_vmi_munmap(&vmi, mm, addr, len, uf, false)) return -ENOMEM;
@@ -2679,7 +2712,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, /* Check next */ if (next && next->vm_start == end && !vma_policy(next) && can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen,
NULL_VM_UFFD_CTX, NULL)) {
NULL_VM_UFFD_CTX, NULL) &&
reserv_vma_valid_address(next, addr, len)) { merge_end = next->vm_end; vma = next; vm_pgoff = next->vm_pgoff - pglen;
@@ -2690,7 +2724,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, (vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file, pgoff, vma->vm_userfaultfd_ctx, NULL) : can_vma_merge_after(prev, vm_flags, NULL, file, pgoff,
NULL_VM_UFFD_CTX, NULL))) {
NULL_VM_UFFD_CTX, NULL)) &&
reserv_vma_valid_address(prev, addr, len)) { merge_start = prev->vm_start; vma = prev; vm_pgoff = prev->vm_pgoff;
@@ -2722,6 +2757,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vm_flags_init(vma, vm_flags); vma->vm_page_prot = vm_get_page_prot(vm_flags); vma->vm_pgoff = pgoff;
- if (new_reserv)
reserv_vma_set_reserv(vma, addr, len,
mapping_to_capability_perm(prot, (vm_flags & VM_SHARED)
? false : true));
- else
reserv_vma_set_reserv(vma, reserv_start, reserv_len, reserv_perm);
if (file) { if (vm_flags & VM_SHARED) { @@ -3320,6 +3361,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, *vmap = vma = new_vma; } *need_rmap_locks = (new_vma->vm_pgoff <= vma->vm_pgoff);
reserv_vma_set_reserv(new_vma, addr, len, 0);
AFAICT we should do nothing in this case. An existing vma has been expanded when duplicating the target vma, and the reservation it is in remains unchanged.
} else { new_vma = vm_area_dup(vma); if (!new_vma) @@ -3327,6 +3369,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, new_vma->vm_start = addr; new_vma->vm_end = addr + len; new_vma->vm_pgoff = pgoff;
reserv_vma_set_reserv(new_vma, addr, len, 0);
We cannot use the (addr, len) range as reservation bounds unconditionally. This should be correct if we are actually creating a new reservation, but if MREMAP_FIXED was passed we might actually be moving the mapping into an existing reservation, in which case we should copy reserv_* from another vma in that reservation.
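Roughly something like this (sketch only; how new_reserv would reach copy_vma() is left out here):

        if (new_reserv) {
                /* Moving into a brand new reservation covering (addr, len) */
                reserv_vma_set_reserv(new_vma, addr, len, 0);
        } else {
                /* MREMAP_FIXED into an existing reservation: inherit its details */
                struct vm_area_struct *owner = find_vma(vma->vm_mm, addr);

                if (owner)
                        reserv_vma_set_reserv(new_vma, owner->reserv_start,
                                              owner->reserv_len,
                                              owner->reserv_perm);
        }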
if (vma_dup_policy(vma, new_vma)) goto out_free_vma; if (anon_vma_clone(new_vma, vma))
diff --git a/mm/util.c b/mm/util.c index 61de3bf7712b..f6fbfa7d014e 100644 --- a/mm/util.c +++ b/mm/util.c @@ -557,7 +557,8 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, return ret; } -unsigned long vm_mmap(struct file *file, unsigned long addr, +/* TODO [PCuABI] - Update the users of vm_mmap */ +user_uintptr_t vm_mmap(struct file *file, user_uintptr_t usrptr,
Provided that in-kernel users do not manipulate mappings with capabilities, do we need to change the prototypes of vm_* at all?
Kevin
unsigned long len, unsigned long prot, unsigned long flag, unsigned long offset) { @@ -566,7 +567,7 @@ unsigned long vm_mmap(struct file *file, unsigned long addr, if (unlikely(offset_in_page(offset))) return -EINVAL;
- return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT);
- return vm_mmap_pgoff(file, usrptr, len, prot, flag, offset >> PAGE_SHIFT);
} EXPORT_SYMBOL(vm_mmap);
On 1/12/24 20:26, Kevin Brodsky wrote:
On 08/01/2024 11:23, Amit Daniel Kachhap wrote:
PCuABI memory reservation requires adding reservation properties while creating and modifying a vma. reserv_vma_init_reserv() is added to initialize them and reserv_vma_set_reserv() to update those details. For now these properties are added only for the mmap/mremap syscalls; later commits will add them for the other special vma mappings.

PCuABI memory reservation also requires that vmas are merged or expanded only if they belong to the same original reservation. Use suitable reservation interfaces to check those properties before performing such operations on a vma.

The address parameter type is modified from unsigned long to user_uintptr_t in do_mmap() and several related upstream functions in order to determine whether a new reservation is required.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
include/linux/mm.h | 24 +++++++++++++++--- kernel/fork.c | 3 +++ mm/mmap.c | 63 ++++++++++++++++++++++++++++++++++++++-------- mm/util.c | 5 ++-- 4 files changed, 80 insertions(+), 15 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index 5b79120c999a..06d2aee83aa4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -747,6 +747,21 @@ static inline void vma_mark_detached(struct vm_area_struct *vma, #endif /* CONFIG_PER_VMA_LOCK */ +#ifdef CONFIG_CHERI_PURECAP_UABI
+static void reserv_vma_init_reserv(struct vm_area_struct *vma) +{
- vma->reserv_start = 0;
- vma->reserv_len = 0;
- vma->reserv_perm = 0;
vma_init() already zero-inits the vma.
Yes good catch.
+}
+#else /* CONFIG_CHERI_PURECAP_UABI */
+static void reserv_vma_init_reserv(struct vm_area_struct *vma) {}
+#endif /* CONFIG_CHERI_PURECAP_UABI */ [...]
-unsigned long do_mmap(struct file *file, unsigned long addr, +user_uintptr_t do_mmap(struct file *file, user_uintptr_t usrptr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1233,6 +1236,8 @@ unsigned long do_mmap(struct file *file, unsigned long addr, struct mm_struct *mm = current->mm; vm_flags_t vm_flags; int pkey = 0;
- unsigned long addr = (ptraddr_t)usrptr;
- bool new_caps = true;
It's not really obvious what "caps" means (especially with an "s"). Maybe "new_reserv" like the mmap_region() parameter.
Also I'd suggest s/usrptr/user_ptr/ (since that's the abbreviation we've generally used so far).
ok
validate_mm(mm); *populate = 0; @@ -1240,6 +1245,12 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (!len) return -EINVAL; +#ifdef CONFIG_CHERI_PURECAP_UABI
- if (test_bit(MMF_PCUABI_RESERVE, &mm->flags)) {
if (cheri_tag_get(usrptr))
new_caps = false;
- }
+#endif /* * Does the application expect PROT_READ to imply PROT_EXEC? * @@ -1397,20 +1408,25 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; }
- addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
- if (new_caps)
usrptr = addr;
usrptr isn't actually used in this patch, maybe the changes in this function belong to the next patch. That said these two lines don't seem to have any effect even after patch 8.
Looks like some code got missed during patch preparation.
- addr = mmap_region(file, addr, len, vm_flags, pgoff, uf, prot, new_caps); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len;
- return addr; }
-user_uintptr_t ksys_mmap_pgoff(user_uintptr_t addr, unsigned long len, +user_uintptr_t ksys_mmap_pgoff(user_uintptr_t usrptr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long fd, unsigned long pgoff) { struct file *file = NULL; user_uintptr_t retval;
- ptraddr_t addr = (ptraddr_t)usrptr;
addr isn't used in this patch. Those changes probably belong to the next patch too.
if (!(flags & MAP_ANONYMOUS)) { audit_mmap_fd(fd, flags); @@ -1443,7 +1459,7 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t addr, unsigned long len, return PTR_ERR(file); }
- retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
- retval = vm_mmap_pgoff(file, usrptr, len, prot, flags, pgoff); out_fput: if (file) fput(file);
@@ -2628,15 +2644,18 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
struct list_head *uf)
struct list_head *uf, unsigned long prot, bool new_reserv) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL;
- struct vm_area_struct *next, *prev, *merge;
- struct vm_area_struct *next, *prev, *merge, *old_vma; pgoff_t pglen = len >> PAGE_SHIFT; unsigned long charged = 0; unsigned long end = addr + len; unsigned long merge_start = addr, merge_end = end;
- unsigned long reserv_start = 0;
- unsigned long reserv_len = 0;
- unsigned long reserv_perm = 0; pgoff_t vm_pgoff; int error; VMA_ITERATOR(vmi, mm, addr);
@@ -2656,6 +2675,20 @@ unsigned long mmap_region(struct file *file, unsigned long addr, return -ENOMEM; }
- if (!new_reserv) {
old_vma = vma_find(&vmi, end);
if (!old_vma)
I initially thought the check was insufficient, but then realised that the actual check happens by calling reserv_vmi_range_mapped() in ksys_mmap_pgoff(). So I think looking up the reservation information this way makes sense, but we definitely want a comment saying that the actual check has already happened (in principle this condition can never be true).
ok
return -ENOMEM;
+#ifdef CONFIG_CHERI_PURECAP_UABI
if (reserv_vma_is_supported(old_vma)) {
reserv_start = old_vma->reserv_start;
reserv_len = old_vma->reserv_len;
reserv_perm = old_vma->reserv_perm;
}
+#endif
vma_iter_set(&vmi, addr);
- }
- /* Unmap any existing mapping in the area */ if (do_vmi_munmap(&vmi, mm, addr, len, uf, false)) return -ENOMEM;
@@ -2679,7 +2712,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, /* Check next */ if (next && next->vm_start == end && !vma_policy(next) && can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen,
NULL_VM_UFFD_CTX, NULL)) {
NULL_VM_UFFD_CTX, NULL) &&
reserv_vma_valid_address(next, addr, len)) { merge_end = next->vm_end; vma = next; vm_pgoff = next->vm_pgoff - pglen;
@@ -2690,7 +2724,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, (vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file, pgoff, vma->vm_userfaultfd_ctx, NULL) : can_vma_merge_after(prev, vm_flags, NULL, file, pgoff,
NULL_VM_UFFD_CTX, NULL))) {
NULL_VM_UFFD_CTX, NULL)) &&
reserv_vma_valid_address(prev, addr, len)) { merge_start = prev->vm_start; vma = prev; vm_pgoff = prev->vm_pgoff;
@@ -2722,6 +2757,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vm_flags_init(vma, vm_flags); vma->vm_page_prot = vm_get_page_prot(vm_flags); vma->vm_pgoff = pgoff;
- if (new_reserv)
reserv_vma_set_reserv(vma, addr, len,
mapping_to_capability_perm(prot, (vm_flags & VM_SHARED)
? false : true));
- else
reserv_vma_set_reserv(vma, reserv_start, reserv_len, reserv_perm);
if (file) { if (vm_flags & VM_SHARED) { @@ -3320,6 +3361,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, *vmap = vma = new_vma; } *need_rmap_locks = (new_vma->vm_pgoff <= vma->vm_pgoff);
reserv_vma_set_reserv(new_vma, addr, len, 0);
AFAICT we should do nothing in this case. An existing vma has been expanded when duplicating the target vma, and the reservation it is in remains unchanged.
} else { new_vma = vm_area_dup(vma); if (!new_vma) @@ -3327,6 +3369,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, new_vma->vm_start = addr; new_vma->vm_end = addr + len; new_vma->vm_pgoff = pgoff;
reserv_vma_set_reserv(new_vma, addr, len, 0);
We cannot use the (addr, len) range as reservation bounds unconditionally. This should be correct if we are actually creating a new reservation, but if MREMAP_FIXED was passed we might actually be moving the mapping into an existing reservation, in which case we should copy reserv_* from another vma in that reservation.
Ok I will check this.
if (vma_dup_policy(vma, new_vma)) goto out_free_vma; if (anon_vma_clone(new_vma, vma))
diff --git a/mm/util.c b/mm/util.c index 61de3bf7712b..f6fbfa7d014e 100644 --- a/mm/util.c +++ b/mm/util.c @@ -557,7 +557,8 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, return ret; } -unsigned long vm_mmap(struct file *file, unsigned long addr, +/* TODO [PCuABI] - Update the users of vm_mmap */ +user_uintptr_t vm_mmap(struct file *file, user_uintptr_t usrptr,
Provided that in-kernel users do not manipulate mappings with capabilities, do we need to change the prototypes of vm_* at all?
As discussed earlier, elf_map() in binfmt_elf.c will now use MAP_FIXED to change the mapping of individual ELF segments, so the full capability has to be supplied. In the case of vm_munmap() the full capability may not be required, but for consistency's sake a capability could be used there too.
Thanks, Amit
Kevin
unsigned long len, unsigned long prot, unsigned long flag, unsigned long offset) { @@ -566,7 +567,7 @@ unsigned long vm_mmap(struct file *file, unsigned long addr, if (unlikely(offset_in_page(offset))) return -EINVAL;
- return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT);
- return vm_mmap_pgoff(file, usrptr, len, prot, flag, offset >> PAGE_SHIFT); } EXPORT_SYMBOL(vm_mmap);
On 19/01/2024 10:45, Amit Daniel Kachhap wrote:
-unsigned long vm_mmap(struct file *file, unsigned long addr, +/* TODO [PCuABI] - Update the users of vm_mmap */ +user_uintptr_t vm_mmap(struct file *file, user_uintptr_t usrptr,
Provided that in-kernel users do not manipulate mappings with capabilities, do we need to change the prototypes of vm_* at all?
As discussed earlier, elf_map() in binfmt_elf.c will now use MAP_FIXED to change the mapping of individual ELF segments, so the full capability has to be supplied. In the case of vm_munmap() the full capability may not be required, but for consistency's sake a capability could be used there too.
My proposal in RFCv2 [1] is precisely not to use capabilities at all when calling vm_* directly from kernel subsystems, which is also why I am suggesting dropping patch 5 (no capability manipulation in get_unmapped_area). It seems to me that this makes things quite a bit more straightforward, but let's discuss if this approach is problematic in some ways.
Kevin
[1] https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/...
Hi,
On 1/22/24 18:26, Kevin Brodsky wrote:
On 19/01/2024 10:45, Amit Daniel Kachhap wrote:
-unsigned long vm_mmap(struct file *file, unsigned long addr, +/* TODO [PCuABI] - Update the users of vm_mmap */ +user_uintptr_t vm_mmap(struct file *file, user_uintptr_t usrptr,
Provided that in-kernel users do not manipulate mappings with capabilities, do we need to change the prototypes of vm_* at all?
As discussed earlier, elf_map() in binfmt_elf.c will now use MAP_FIXED to change the mapping of individual ELF segments, so the full capability has to be supplied. In the case of vm_munmap() the full capability may not be required, but for consistency's sake a capability could be used there too.
My proposal in RFCv2 [1] is precisely not to use capabilities at all when calling vm_* directly from kernel subsystems, which is also why I am suggesting dropping patch 5 (no capability manipulation in get_unmapped_area). It seems to me that this makes things quite a bit more straightforward, but let's discuss if this approach is problematic in some ways.
Ok, I see your perspective now. I took the liberty of not using the RFCv2 [1] proposal as-is, but with a slight change. Let me explain.
Your suggestion:
1. Map the first segment [unchanged]
2. Call reserv_vma_insert_entry() on that new vma, setting the reservation bounds to cover all the segments
3. Map the following segments with MAP_FIXED, within that reservation [unchanged]
4. Set AT_CHERI_*_{RW,RX}_CAP based on the bounds of the executable/interpreter reservation
Here step 1 internally first maps the full segment length and then munmaps (full segment length - segment length) to make sure a sufficiently large mapping is possible.
Here step 3 was suggested to skip the reservation check for MAP_FIXED in the vm_mmap() kernel call, which is a deviation from the mmap syscall.
My changes to steps 1, 2 and 3 are:
a) Skip the munmap of (full segment length - segment length) for PCuABI and let the full-length mapping exist. Here mmap uses the maximum protection (protmax) permissions, and hence the reservation bounds cover all the segments.
b) Step 2 is not needed.
c) Map the rest of the individual segments with a full capability and the MAP_FIXED flag (same as the mmap syscall).
I feel this approach is simpler. Let me know your concerns.
Thanks, Amit
Kevin
[1] https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/...
On 24/01/2024 10:36, Amit Daniel Kachhap wrote:
Hi,
On 1/22/24 18:26, Kevin Brodsky wrote:
On 19/01/2024 10:45, Amit Daniel Kachhap wrote:
-unsigned long vm_mmap(struct file *file, unsigned long addr, +/* TODO [PCuABI] - Update the users of vm_mmap */ +user_uintptr_t vm_mmap(struct file *file, user_uintptr_t usrptr,
Provided that in-kernel users do not manipulate mappings with capabilities, do we need to change the prototypes of vm_* at all?
As discussed earlier, elf_map() in binfmt_elf.c will now use MAP_FIXED to change the mapping of individual ELF segments, so a full capability has to be supplied. For vm_munmap() a full capability may not be required, but for consistency's sake a capability could be used there too.
My proposal in RFCv2 [1] is precisely not to use capabilities at all when calling vm_* directly from kernel subsystems, which is also why I am suggesting dropping patch 5 (no capability manipulation in get_unmapped_area). It seems to me that this makes things quite a bit more straightforward, but let's discuss if this approach is problematic in some ways.
Ok, I see your perspective now. I took the liberty of not using the RFCv2 [1] proposal as-is, but with a slight change. Let me explain.
Your suggestion:
1. Map the first segment [unchanged]
2. Call reserv_vma_insert_entry() on that new vma, setting the reservation bounds to cover all the segments
3. Map the following segments with MAP_FIXED, within that reservation [unchanged]
4. Set AT_CHERI_*_{RW,RX}_CAP based on the bounds of the executable/interpreter reservation
Here step 1 internally first maps the full segment length and then munmaps (full segment length - segment length) to make sure a sufficiently large mapping is possible.
Apologies, I had completely misunderstood how (upstream) binfmt_elf maps the executable and interpreter. Since we do indeed start with a vm_mmap() encompassing *all the segments*, a reservation with appropriate bounds gets created at that point and it just works.
Here step 3 was suggested to skip the reservation check for MAP_FIXED in the vm_mmap() kernel call, which is a deviation from the mmap syscall.
My changes to steps 1, 2 and 3 are:
a) Skip the munmap of (full segment length - segment length) for PCuABI and let the full-length mapping exist. Here mmap uses the maximum protection (protmax) permissions, and hence the reservation bounds cover all the segments.
b) Step 2 is not needed.
c) Map the rest of the individual segments with a full capability and the MAP_FIXED flag (same as the mmap syscall).
I feel this approach is simpler. Let me know your concerns.
This would be a convenient solution if we could rely on segments being contiguous; unfortunately, we cannot. There can be (unmapped) gaps between segments. We could potentially detect these gaps and unmap them in a loop, but it's not particularly elegant.
AFAICT, in this series (patch 8), the "no-reuse" check is done by calling reserv_vmi_range_mapped() in ksys_mmap_pgoff(). As a result, there is no such check when calling vm_mmap() (which is indeed what I suggested). So it seems to me that keeping the way binfmt_elf maps segments unchanged (map the whole range, unmap the range beyond the first segment, then mmap(MAP_FIXED) each remaining segment) should work just fine.
With that said, my original comment in this thread was about whether vm_mmap() and friends manipulate capabilities or not. To me this topic is orthogonal to how we handle the initial reservations in binfmt_elf. The one potential issue is that without a capability, we don't know if we are meant to operate on an existing reservation or not in the MAP_FIXED case. I don't think this is especially problematic though - we can simply choose based on whether the specified range is actually within a reservation or not. All capability checks are done at the syscall handler level, which is what patch 8 already does. I may be missing something though, let me know.
Kevin
Use the recently introduced PCuABI reservation interfaces to add different parameter constraints for the mmap/munmap syscalls. The capability returned by the mmap syscall is now bounded, and its bounds are the same as the reservation range. These reservation checks do not affect the compat64 code path.
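As an illustration of the semantics described above, here is a minimal, hypothetical purecap userspace sketch (not part of the patch): the capability returned by mmap() is expected to be bounded to the reservation range, and that owning capability is what gets passed back to munmap().

#include <stdio.h>
#include <sys/mman.h>
#include <cheriintrin.h>

int main(void)
{
	size_t len = 2 * 4096;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	/* The bounds of the returned capability match the reservation. */
	printf("base 0x%lx length 0x%zx\n",
	       (unsigned long)cheri_base_get(p), cheri_length_get(p));

	/* munmap() now takes the owning capability, not a bare address. */
	return munmap(p, len);
}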
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- include/linux/cheri.h | 3 +++ include/linux/mm.h | 2 +- mm/mmap.c | 58 ++++++++++++++++++++++++++++++++++++++----- mm/util.c | 7 ------ 4 files changed, 56 insertions(+), 14 deletions(-)
diff --git a/include/linux/cheri.h b/include/linux/cheri.h index e5f588b056ad..02ef0e911e63 100644 --- a/include/linux/cheri.h +++ b/include/linux/cheri.h @@ -37,6 +37,9 @@ (CHERI_PERM_GLOBAL | CHERI_PERM_SW_VMEM) #endif
+#define cheri_is_null_derived(cap) \ + cheri_is_equal_exact((uintcap_t)cheri_address_get(cap), cap) + /** * cheri_build_user_cap() - Create a userspace capability. * @addr: Requested capability address. diff --git a/include/linux/mm.h b/include/linux/mm.h index 06d2aee83aa4..8e6541a8d4f9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3186,7 +3186,7 @@ static inline void mm_populate(unsigned long addr, unsigned long len) {} /* These take the mm semaphore themselves */ extern int __must_check vm_brk(unsigned long, unsigned long); extern int __must_check vm_brk_flags(unsigned long, unsigned long, unsigned long); -extern int vm_munmap(unsigned long, size_t); +extern int vm_munmap(user_uintptr_t, size_t); extern user_uintptr_t __must_check vm_mmap(struct file *, user_uintptr_t, unsigned long, unsigned long, unsigned long, unsigned long); diff --git a/mm/mmap.c b/mm/mmap.c index 7fd7d3ac377b..95d16e306559 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1238,6 +1238,7 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t usrptr, int pkey = 0; unsigned long addr = (ptraddr_t)usrptr; bool new_caps = true; + bool ignore_reserv = true;
validate_mm(mm); *populate = 0; @@ -1247,6 +1248,7 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t usrptr,
#ifdef CONFIG_CHERI_PURECAP_UABI if (test_bit(MMF_PCUABI_RESERVE, &mm->flags)) { + ignore_reserv = false; if (cheri_tag_get(usrptr)) new_caps = false; } @@ -1284,7 +1286,7 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t usrptr, /* Obtain the address to map to. we verify (or select) it and ensure * that it represents a valid section of the address space. */ - addr = get_unmapped_area(file, addr, len, pgoff, flags); + addr = get_unmapped_area(file, usrptr, len, pgoff, flags); if (IS_ERR_VALUE(addr)) return addr;
@@ -1416,6 +1418,13 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t usrptr, ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len; + if (!IS_ERR_VALUE(addr)) { + if (new_caps && !ignore_reserv) + usrptr = build_owning_capability(addr, len, + mapping_to_capability_perm(prot, + (flags & MAP_SHARED) ? false : true)); + return usrptr; + }
return addr; } @@ -1425,8 +1434,11 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t usrptr, unsigned long len, unsigned long fd, unsigned long pgoff) { struct file *file = NULL; - user_uintptr_t retval; + user_uintptr_t retval = -EINVAL; ptraddr_t addr = (ptraddr_t)usrptr; +#ifdef CONFIG_CHERI_PURECAP_UABI + VMA_ITERATOR(vmi, current->mm, addr); +#endif
if (!(flags & MAP_ANONYMOUS)) { audit_mmap_fd(fd, flags); @@ -1458,6 +1470,26 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t usrptr, unsigned long len, if (IS_ERR(file)) return PTR_ERR(file); } +#ifdef CONFIG_CHERI_PURECAP_UABI + if (!test_bit(MMF_PCUABI_RESERVE, &current->mm->flags)) + goto skip_pcuabi_checks; + if (cheri_tag_get(usrptr)) { + if (!(flags & MAP_FIXED) || !capability_owns_range(usrptr, addr, len)) + goto out_fput; + if (!reserv_vmi_valid_capability(&vmi, usrptr, false)) { + retval = -ERESERVATION; + goto out_fput; + } + if (!reserv_vmi_range_mapped(&vmi, addr, len, false)) { + retval = -ENOMEM; + goto out_fput; + } + } else { + if (!cheri_is_null_derived(usrptr)) + goto out_fput; + } +skip_pcuabi_checks: +#endif
retval = vm_mmap_pgoff(file, usrptr, len, prot, flags, pgoff); out_fput: @@ -2941,15 +2973,29 @@ static int __vm_munmap(unsigned long start, size_t len, bool downgrade) return ret; }
-int vm_munmap(unsigned long start, size_t len) +/* TODO [PCuABI] - Update the users of vm_munmap */ +int vm_munmap(user_uintptr_t usrptr, size_t len) { - return __vm_munmap(start, len, false); + return __vm_munmap((ptraddr_t)usrptr, len, false); } EXPORT_SYMBOL(vm_munmap);
-SYSCALL_DEFINE2(munmap, user_uintptr_t, addr, size_t, len) +SYSCALL_DEFINE2(munmap, user_uintptr_t, usrptr, size_t, len) { - addr = untagged_addr(addr); + ptraddr_t addr = untagged_addr((ptraddr_t)usrptr); + VMA_ITERATOR(vmi, current->mm, addr); + +#ifdef CONFIG_CHERI_PURECAP_UABI + usrptr = cheri_address_set(usrptr, addr); +#else + usrptr = addr; +#endif + if (test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags)) { + if (!capability_owns_range(usrptr, addr, len)) + return -EINVAL; + if (!reserv_vmi_valid_capability(&vmi, usrptr, false)) + return -ERESERVATION; + } return __vm_munmap(addr, len, true); }
diff --git a/mm/util.c b/mm/util.c index f6fbfa7d014e..d0f937c334f3 100644 --- a/mm/util.c +++ b/mm/util.c @@ -540,19 +540,12 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - /* - * TODO [PCuABI] - might need propagating uintcap further down - * to do_mmap to properly handle capabilities - */ ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); if (populate) mm_populate(ret, populate); - /* TODO [PCuABI] - derive proper capability */ - if (!IS_ERR_VALUE(ret)) - ret = (user_uintptr_t)uaddr_to_user_ptr_safe((ptraddr_t)ret); } return ret; }
On 08/01/2024 11:23, Amit Daniel Kachhap wrote:
diff --git a/mm/mmap.c b/mm/mmap.c index 7fd7d3ac377b..95d16e306559 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1238,6 +1238,7 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t usrptr, int pkey = 0; unsigned long addr = (ptraddr_t)usrptr; bool new_caps = true;
+ bool ignore_reserv = true;
I think we could do without that, defaulting new_caps to false. This means that mmap_region() should do nothing regardless of the value of new_reserv in !PCuABI, which is easily achieved by only doing the vma lookup if we are using reservations (mm_flags has MMF_PCUABI_RESERV).
[...]
-SYSCALL_DEFINE2(munmap, user_uintptr_t, addr, size_t, len) +SYSCALL_DEFINE2(munmap, user_uintptr_t, usrptr, size_t, len) {
-	addr = untagged_addr(addr);
+	ptraddr_t addr = untagged_addr((ptraddr_t)usrptr);
+	VMA_ITERATOR(vmi, current->mm, addr);
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	usrptr = cheri_address_set(usrptr, addr);
This is pretty cumbersome. It would be preferable to ensure that we always untag the address of a capability before comparing it. In fact, cheri_check_cap() already does that, and so does capability_owns_range() as a result. Note that the base and limit of a capability are never tagged, so reserv_vmi_valid_capability() should also cope with a tagged address.
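As a rough sketch of that idea (a hypothetical helper for illustration, not part of this series), the check can simply untag the query address before the comparison, since the base and length derived from the capability never carry a tag:

static bool range_within_cap(user_uintptr_t cap, ptraddr_t addr, size_t len)
{
	ptraddr_t base = cheri_base_get(cap);
	size_t length = cheri_length_get(cap);

	/* Strip the tag from the address; base/length are never tagged. */
	addr = untagged_addr(addr);

	return addr >= base && len <= length && addr - base <= length - len;
}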
Kevin
+#else
+	usrptr = addr;
+#endif
+	if (test_bit(MMF_PCUABI_RESERVE, &current->mm->flags)) {
+		if (!capability_owns_range(usrptr, addr, len))
+			return -EINVAL;
+		if (!reserv_vmi_valid_capability(&vmi, usrptr, false))
+			return -ERESERVATION;
+	}
 	return __vm_munmap(addr, len, true);
}
[...]
Use the recently introduced PCuABI reservation interfaces and add the relevant capability/reservation constraint checks on the different mremap syscall parameters.
Modify vma_expandable() to accept the input address as a capability pointer, since get_unmapped_area() now requires it for MAP_FIXED type mappings.
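A minimal purecap userspace sketch of the intended behaviour (my reading of the patch, not taken from it): the old address is passed as the owning capability, and a moved or grown mapping comes back as a new owning capability bounded to the new reservation.

#define _GNU_SOURCE
#include <sys/mman.h>

int main(void)
{
	size_t pg = 4096;
	void *old = mmap(NULL, pg, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (old == MAP_FAILED)
		return 1;

	/* 'old' must own the range; the result owns the new reservation. */
	void *new = mremap(old, pg, 2 * pg, MREMAP_MAYMOVE);
	if (new == MAP_FAILED)
		return 1;

	return munmap(new, 2 * pg);
}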
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mremap.c | 93 +++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 76 insertions(+), 17 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c index b52592303e8b..2e2955417730 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -25,6 +25,7 @@ #include <linux/uaccess.h> #include <linux/userfaultfd_k.h> #include <linux/mempolicy.h> +#include <linux/cap_addr_mgmt.h>
#include <asm/cacheflush.h> #include <asm/tlb.h> @@ -785,16 +786,19 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr, return vma; }
-static unsigned long mremap_to(unsigned long addr, unsigned long old_len, - unsigned long new_addr, unsigned long new_len, bool *locked, +static user_uintptr_t mremap_to(user_uintptr_t usrptr, unsigned long old_len, + user_uintptr_t new_usrptr, unsigned long new_len, bool *locked, unsigned long flags, struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap_early, struct list_head *uf_unmap) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - unsigned long ret = -EINVAL; + user_uintptr_t ret = -EINVAL; unsigned long map_flags = 0; + ptraddr_t addr = (ptraddr_t)usrptr; + ptraddr_t new_addr = (ptraddr_t)new_usrptr; + unsigned long old_perm = 0;
if (offset_in_page(new_addr)) goto out; @@ -865,22 +869,38 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, if (!(flags & MREMAP_FIXED)) new_addr = ret;
+#ifdef CONFIG_CHERI_PURECAP_UABI + old_perm = vma->reserv_perm; +#endif ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf, uf_unmap);
+ if (!IS_ERR_VALUE(ret)) { + if (test_bit(MMF_PCUABI_RESERVE, &mm->flags)) { + if (!(flags & MREMAP_FIXED)) + ret = build_owning_capability(new_addr, new_len, old_perm); + else + ret = new_usrptr; + } + } out: return ret; }
-static int vma_expandable(struct vm_area_struct *vma, unsigned long delta) +static int vma_expandable(user_uintptr_t usrptr, struct vm_area_struct *vma, unsigned long delta) { unsigned long end = vma->vm_end + delta;
+#ifdef CONFIG_CHERI_PURECAP_UABI + usrptr = cheri_address_set(usrptr, vma->vm_start); +#else + usrptr = vma->vm_start; +#endif if (end < vma->vm_end) /* overflow */ return 0; if (find_vma_intersection(vma->vm_mm, vma->vm_end, end)) return 0; - if (get_unmapped_area(NULL, vma->vm_start, end - vma->vm_start, + if (get_unmapped_area(NULL, usrptr, end - vma->vm_start, 0, MAP_FIXED) & ~PAGE_MASK) return 0; return 1; @@ -893,9 +913,9 @@ static int vma_expandable(struct vm_area_struct *vma, unsigned long delta) * MREMAP_FIXED option added 5-Dec-1999 by Benjamin LaHaise * This option implies MREMAP_MAYMOVE. */ -SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len, +SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, usrptr, unsigned long, old_len, unsigned long, new_len, unsigned long, flags, - user_uintptr_t, new_addr) + user_uintptr_t, new_usrptr) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; @@ -903,10 +923,15 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len bool locked = false; bool downgraded = false; struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX; + ptraddr_t addr = (ptraddr_t)usrptr; + ptraddr_t new_addr = (ptraddr_t)new_usrptr; + unsigned long old_perm = 0; LIST_HEAD(uf_unmap_early); LIST_HEAD(uf_unmap); +#ifdef CONFIG_CHERI_PURECAP_UABI + VMA_ITERATOR(vmi, current->mm, new_addr); +#endif
- /* @TODO [PCuABI] - capability validation */ /* * There is a deliberate asymmetry here: we strip the pointer tag * from the old address but leave the new address alone. This is @@ -918,6 +943,9 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len * See Documentation/arm64/tagged-address-abi.rst for more information. */ addr = untagged_addr(addr); +#ifdef CONFIG_CHERI_PURECAP_UABI + usrptr = cheri_address_set(usrptr, addr); +#endif
if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP)) return ret; @@ -956,6 +984,34 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
+#ifdef CONFIG_CHERI_PURECAP_UABI + if (!test_bit(MMF_PCUABI_RESERVE, &mm->flags)) + goto skip_pcuabi_checks; + if (!capability_owns_range(usrptr, addr, old_len ? old_len : new_len)) + goto out; + if (!reserv_vma_valid_capability(vma, usrptr)) { + ret = -ERESERVATION; + goto out; + } + if ((flags & MREMAP_FIXED)) { + if (!capability_owns_range(new_usrptr, new_addr, new_len)) + goto out; + if (!reserv_vmi_valid_capability(&vmi, new_usrptr, true)) { + ret = -ERESERVATION; + goto out; + } + if (!reserv_vmi_range_mapped(&vmi, new_addr, new_len, true)) { + ret = -ENOMEM; + goto out; + } + } else { + if (!cheri_is_null_derived(new_usrptr)) + goto out; + } + old_perm = vma->reserv_perm; +skip_pcuabi_checks: +#endif + if (is_vm_hugetlb_page(vma)) { struct hstate *h __maybe_unused = hstate_vma(vma);
@@ -977,7 +1033,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len }
if (flags & (MREMAP_FIXED | MREMAP_DONTUNMAP)) { - ret = mremap_to(addr, old_len, new_addr, new_len, + ret = mremap_to(usrptr, old_len, new_usrptr, new_len, &locked, flags, &uf, &uf_unmap_early, &uf_unmap); goto out; @@ -1003,7 +1059,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
- ret = addr; + ret = usrptr; goto out; }
@@ -1020,7 +1076,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len */ if (old_len == vma->vm_end - addr) { /* can we just expand the current mapping? */ - if (vma_expandable(vma, new_len - old_len)) { + if (vma_expandable(usrptr, vma, new_len - old_len)) { long pages = (new_len - old_len) >> PAGE_SHIFT; unsigned long extension_start = addr + old_len; unsigned long extension_end = addr + new_len; @@ -1059,7 +1115,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len locked = true; new_addr = addr; } - ret = addr; + ret = usrptr; goto out; } } @@ -1083,8 +1139,14 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
- ret = move_vma(vma, addr, old_len, new_len, new_addr, + ret = move_vma(vma, usrptr, old_len, new_len, new_addr, &locked, flags, &uf, &uf_unmap); + if (!IS_ERR_VALUE(ret)) { + if (test_bit(MMF_PCUABI_RESERVE, &mm->flags)) + ret = build_owning_capability(new_addr, new_len, old_perm); + else + ret = (user_uintptr_t)new_addr; + } } out: if (offset_in_page(ret)) @@ -1098,8 +1160,5 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len userfaultfd_unmap_complete(mm, &uf_unmap_early); mremap_userfaultfd_complete(&uf, addr, ret, old_len); userfaultfd_unmap_complete(mm, &uf_unmap); - /* TODO [PCuABI] - derive proper capability */ - return IS_ERR_VALUE(ret) ? - ret : - (user_intptr_t)uaddr_to_user_ptr_safe((ptraddr_t)ret); + return ret; }
Use the recently introduced PCuABI reservation interfaces and add the relevant capability/reservation constraint checks on the mprotect syscall parameters.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mprotect.c | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c index c9188e2cb2a6..6b034fb7b5b1 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -32,6 +32,7 @@ #include <linux/sched/sysctl.h> #include <linux/userfaultfd_k.h> #include <linux/memory-tiers.h> +#include <linux/cap_addr_mgmt.h> #include <asm/cacheflush.h> #include <asm/mmu_context.h> #include <asm/tlbflush.h> @@ -728,7 +729,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb, /* * pkey==-1 when doing a legacy mprotect() */ -static int do_mprotect_pkey(user_uintptr_t start, size_t len, +static int do_mprotect_pkey(user_uintptr_t usrptr, size_t len, unsigned long prot, int pkey) { unsigned long nstart, end, tmp, reqprot; @@ -739,9 +740,7 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, (prot & PROT_READ); struct mmu_gather tlb; struct vma_iterator vmi; - - /* TODO [PCuABI] - capability checks for uaccess */ - start = untagged_addr(start); + unsigned long start = untagged_addr((ptraddr_t)usrptr);
prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP); if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */ @@ -755,6 +754,12 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, end = start + len; if (end <= start) return -ENOMEM; + +#ifdef CONFIG_CHERI_PURECAP_UABI + usrptr = cheri_address_set(usrptr, start); +#endif + if (!capability_owns_range(usrptr, start, len)) + return -EINVAL; if (!arch_validate_prot(prot, start)) return -EINVAL;
@@ -812,6 +817,12 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, break; }
+ /* Check if the capability range is valid with mmap lock. */ + if (!reserv_vma_valid_capability(vma, usrptr)) { + error = -ERESERVATION; + break; + } + /* Does the application expect PROT_READ to imply PROT_EXEC */ if (rier && (vma->vm_flags & VM_MAYEXEC)) prot |= PROT_EXEC; @@ -876,18 +887,18 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, return error; }
-SYSCALL_DEFINE3(mprotect, user_uintptr_t, start, size_t, len, +SYSCALL_DEFINE3(mprotect, user_uintptr_t, usrptr, size_t, len, unsigned long, prot) { - return do_mprotect_pkey(start, len, prot, -1); + return do_mprotect_pkey(usrptr, len, prot, -1); }
#ifdef CONFIG_ARCH_HAS_PKEYS
-SYSCALL_DEFINE4(pkey_mprotect, user_uintptr_t, start, size_t, len, +SYSCALL_DEFINE4(pkey_mprotect, user_uintptr_t, usrptr, size_t, len, unsigned long, prot, int, pkey) { - return do_mprotect_pkey(start, len, prot, pkey); + return do_mprotect_pkey(usrptr, len, prot, pkey); }
SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
Use the recently introduced PCuABI reservation interfaces to verify the address range for madvise syscall.
The do_madvise() function is also used by the virtual address monitoring component DAMON, which may not satisfy the reservation range criteria, so add a parameter to skip the reservation checks.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- include/linux/mm.h | 3 ++- io_uring/advise.c | 2 +- mm/damon/vaddr.c | 2 +- mm/madvise.c | 29 ++++++++++++++++++++++++----- 4 files changed, 28 insertions(+), 8 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index 8e6541a8d4f9..4269820bb6a5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3166,7 +3166,8 @@ extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, bool downgrade); extern int do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf); -extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior); +extern int do_madvise(struct mm_struct *mm, user_uintptr_t usrptr, size_t len_in, + int behavior, bool reserv_ignore);
#ifdef CONFIG_MMU extern int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, diff --git a/io_uring/advise.c b/io_uring/advise.c index 952d9289a311..2e43142cf4df 100644 --- a/io_uring/advise.c +++ b/io_uring/advise.c @@ -55,7 +55,7 @@ int io_madvise(struct io_kiocb *req, unsigned int issue_flags) WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
/* TODO [PCuABI] - capability checks for uaccess */ - ret = do_madvise(current->mm, user_ptr_addr(ma->addr), ma->len, ma->advice); + ret = do_madvise(current->mm, (user_uintptr_t)ma->addr, ma->len, ma->advice, false); io_req_set_res(req, ret, 0); return IOU_OK; #else diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 1fec16d7263e..fcdd3f4f608f 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -623,7 +623,7 @@ static unsigned long damos_madvise(struct damon_target *target, if (!mm) return 0;
- applied = do_madvise(mm, start, len, behavior) ? 0 : len; + applied = do_madvise(mm, start, len, behavior, true) ? 0 : len; mmput(mm);
return applied; diff --git a/mm/madvise.c b/mm/madvise.c index ad59e1e07ec9..ea602a2881ef 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -31,6 +31,7 @@ #include <linux/swapops.h> #include <linux/shmem_fs.h> #include <linux/mmu_notifier.h> +#include <linux/cap_addr_mgmt.h>
#include <asm/tlb.h>
@@ -1382,13 +1383,16 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, * -EBADF - map exists, but area maps something that isn't a file. * -EAGAIN - a kernel resource was temporarily unavailable. */ -int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior) +int do_madvise(struct mm_struct *mm, user_uintptr_t usrptr, size_t len_in, + int behavior, bool reserv_ignore) { unsigned long end; int error; int write; size_t len; struct blk_plug plug; + unsigned long start = (ptraddr_t)usrptr; + struct vma_iterator vmi;
if (!madvise_behavior_valid(behavior)) return -EINVAL; @@ -1421,14 +1425,29 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh mmap_read_lock(mm); }
- /* TODO [PCuABI] - capability checks for uaccess */ start = untagged_addr_remote(mm, start); end = start + len;
+#ifdef CONFIG_CHERI_PURECAP_UABI + usrptr = cheri_address_set(usrptr, start); +#endif + if (!reserv_ignore) { + vma_iter_init(&vmi, current->mm, start); + if (!capability_owns_range(usrptr, start, len)) { + error = -EINVAL; + goto out; + } + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_vmi_valid_capability(&vmi, usrptr, true)) { + error = -ERESERVATION; + goto out; + } + } blk_start_plug(&plug); error = madvise_walk_vmas(mm, start, end, behavior, madvise_vma_behavior); blk_finish_plug(&plug); +out: if (write) mmap_write_unlock(mm); else @@ -1437,9 +1456,9 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh return error; }
-SYSCALL_DEFINE3(madvise, user_uintptr_t, start, size_t, len_in, int, behavior) +SYSCALL_DEFINE3(madvise, user_uintptr_t, usrptr, size_t, len_in, int, behavior) { - return do_madvise(current->mm, start, len_in, behavior); + return do_madvise(current->mm, usrptr, len_in, behavior, false); }
SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, @@ -1494,7 +1513,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
while (iov_iter_count(&iter)) { ret = do_madvise(mm, user_ptr_addr(iter_iov_addr(&iter)), - iter_iov_len(&iter), behavior); + iter_iov_len(&iter), behavior, false); if (ret < 0) break; iov_iter_advance(&iter, iter_iov_len(&iter));
Use the recently introduced PCuABI reservation interfaces to verify the address range for mlock, mlock2 and munlock syscalls.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mlock.c | 38 +++++++++++++++++++++++++++++++------- 1 file changed, 31 insertions(+), 7 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c index 40b43f8740df..31402d17075a 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -25,6 +25,7 @@ #include <linux/memcontrol.h> #include <linux/mm_inline.h> #include <linux/secretmem.h> +#include <linux/cap_addr_mgmt.h>
#include "internal.h"
@@ -563,14 +564,19 @@ static int __mlock_posix_error_return(long retval) return retval; }
-static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t flags) +static __must_check int do_mlock(user_uintptr_t usrptr, size_t len, vm_flags_t flags) { unsigned long locked; unsigned long lock_limit; int error = -ENOMEM; + unsigned long start = untagged_addr((ptraddr_t)usrptr); + struct vma_iterator vmi;
- start = untagged_addr(start); - +#ifdef CONFIG_CHERI_PURECAP_UABI + usrptr = cheri_address_set(usrptr, start); +#endif + if (!capability_owns_range(usrptr, start, len)) + return -EINVAL; if (!can_do_mlock()) return -EPERM;
@@ -584,6 +590,12 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla if (mmap_write_lock_killable(current->mm)) return -EINTR;
+ vma_iter_init(&vmi, current->mm, start); + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_vmi_valid_capability(&vmi, usrptr, true)) { + mmap_write_unlock(current->mm); + return -ERESERVATION; + } locked += current->mm->locked_vm; if ((locked > lock_limit) && (!capable(CAP_IPC_LOCK))) { /* @@ -610,12 +622,12 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla return 0; }
-SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len) +SYSCALL_DEFINE2(mlock, user_uintptr_t, start, size_t, len) { return do_mlock(start, len, VM_LOCKED); }
-SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) +SYSCALL_DEFINE3(mlock2, user_uintptr_t, start, size_t, len, int, flags) { vm_flags_t vm_flags = VM_LOCKED;
@@ -628,17 +640,29 @@ SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) return do_mlock(start, len, vm_flags); }
-SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len) +SYSCALL_DEFINE2(munlock, user_uintptr_t, usrptr, size_t, len) { int ret; + unsigned long start = untagged_addr((ptraddr_t)usrptr); + struct vma_iterator vmi;
- start = untagged_addr(start); +#ifdef CONFIG_CHERI_PURECAP_UABI + usrptr = cheri_address_set(usrptr, start); +#endif + if (!capability_owns_range(usrptr, start, len)) + return -EINVAL;
len = PAGE_ALIGN(len + (offset_in_page(start))); start &= PAGE_MASK;
if (mmap_write_lock_killable(current->mm)) return -EINTR; + vma_iter_init(&vmi, current->mm, start); + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_vmi_valid_capability(&vmi, usrptr, true)) { + mmap_write_unlock(current->mm); + return -ERESERVATION; + } ret = apply_vma_lock_flags(start, len, 0); mmap_write_unlock(current->mm);
Use the recently introduced PCuABI reservation interfaces to verify the address range for msync syscall.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/msync.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/mm/msync.c b/mm/msync.c index ac4c9bfea2e7..1c8700c6c98a 100644 --- a/mm/msync.c +++ b/mm/msync.c @@ -14,6 +14,7 @@ #include <linux/file.h> #include <linux/syscalls.h> #include <linux/sched.h> +#include <linux/cap_addr_mgmt.h>
/* * MS_SYNC syncs the entire file - including mappings. @@ -29,16 +30,20 @@ * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to * applications. */ -SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) +SYSCALL_DEFINE3(msync, user_uintptr_t, usrptr, size_t, len, int, flags) { unsigned long end; struct mm_struct *mm = current->mm; struct vm_area_struct *vma; int unmapped_error = 0; int error = -EINVAL; + unsigned long start = untagged_addr((ptraddr_t)usrptr);
- start = untagged_addr(start); - +#ifdef CONFIG_CHERI_PURECAP_UABI + usrptr = cheri_address_set(usrptr, start); +#endif + if (!capability_owns_range(usrptr, start, len)) + return -EINVAL; if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC)) goto out; if (offset_in_page(start)) @@ -61,6 +66,11 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) */ mmap_read_lock(mm); vma = find_vma(mm, start); + /* Check if the range exists within the reservation with mmap lock. */ + if (vma && !reserv_vma_valid_capability(vma, usrptr)) { + error = -ERESERVATION; + goto out_unlock; + } for (;;) { struct file *file; loff_t fstart, fend;
The PCuABI specification limits expanding capability permissions through the mprotect() system call. This requires capabilities to be created initially with the maximum permissions the memory mappings may possess during their lifetime.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- include/uapi/asm-generic/mman-common.h | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 6ce1f1ceb432..e7ba511c2bad 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -17,6 +17,12 @@ #define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */ #define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */
+/* PCuABI mapping and capability permissions */ +#define _PROT_MAX_SHIFT 16 +#define PROT_MAX(prot) ((prot) << _PROT_MAX_SHIFT) +#define PROT_EXTRACT(prot) ((prot) & (PROT_READ | PROT_WRITE | PROT_EXEC)) +#define PROT_MAX_EXTRACT(prot) (((prot) >> _PROT_MAX_SHIFT) & (PROT_READ | PROT_WRITE | PROT_EXEC)) + /* 0x01 - 0x03 are defined in linux/mman.h */ #define MAP_TYPE 0x0f /* Mask for type of mapping */ #define MAP_FIXED 0x10 /* Interpret addr exactly */
Helper functions capability_may_set_prot(), mapping_to_capability_perm() and build_owning_capability() are added/modified to manage capability permissions in address space management syscalls as per PCuABI specifications.
Also, an arch-specific hook, arch_map_to_cap_perm(), is added to manage architecture-specific capability permissions.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- arch/arm64/include/asm/cap_addr_mgmt.h | 22 +++++++++++ include/linux/cap_addr_mgmt.h | 20 ++++++++++ mm/cap_addr_mgmt.c | 54 +++++++++++++++++++++++--- 3 files changed, 91 insertions(+), 5 deletions(-) create mode 100644 arch/arm64/include/asm/cap_addr_mgmt.h
diff --git a/arch/arm64/include/asm/cap_addr_mgmt.h b/arch/arm64/include/asm/cap_addr_mgmt.h new file mode 100644 index 000000000000..aadb4768d2fd --- /dev/null +++ b/arch/arm64/include/asm/cap_addr_mgmt.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __ASM_CAP_ADDR_MGMT_H +#define __ASM_CAP_ADDR_MGMT_H + +#include <linux/cheri.h> +#include <linux/mman.h> + +static __always_inline cheri_perms_t arch_map_to_cap_perm(int prot, bool has_tag_access) +{ + cheri_perms_t perms = 0; + + if ((prot & PROT_READ) && has_tag_access) + perms |= ARM_CAP_PERMISSION_MUTABLE_LOAD; + + if ((prot & PROT_EXEC) && + (cheri_perms_get(cheri_pcc_get()) & ARM_CAP_PERMISSION_EXECUTIVE)) + perms |= ARM_CAP_PERMISSION_EXECUTIVE; + + return perms; +} +#define arch_map_to_cap_perm arch_map_to_cap_perm +#endif /* __ASM_CAP_ADDR_MGMT_H */ diff --git a/include/linux/cap_addr_mgmt.h b/include/linux/cap_addr_mgmt.h index 6a42e714ecd5..81125cfe74a8 100644 --- a/include/linux/cap_addr_mgmt.h +++ b/include/linux/cap_addr_mgmt.h @@ -9,6 +9,7 @@ #include <linux/mm_types.h> #include <linux/sched/coredump.h> #include <linux/types.h> +#include <asm/cap_addr_mgmt.h>
#ifdef CONFIG_CHERI_PURECAP_UABI #define CHERI_REPRESENTABLE_ALIGNMENT(len) \ @@ -165,6 +166,17 @@ bool capability_owns_range(user_uintptr_t cap, ptraddr_t addr, size_t len); */ user_uintptr_t build_owning_capability(ptraddr_t addr, size_t len, cheri_perms_t perm);
+/** + * capability_may_set_prot() - Verify if the mapping protection flags confirms + * with the capability permission flags. + * @cap: Capability value. + * @prot: Memory protection flags. + * + * Return: True if the capability permissions includes the protection flags + * or false otherwise. + */ +bool capability_may_set_prot(user_uintptr_t cap, int prot); + /** * mapping_to_capability_perm() - Converts memory mapping protection flags to * capability permission flags. @@ -245,4 +257,12 @@ static inline cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_ac
#endif /* CONFIG_CHERI_PURECAP_UABI */
+#ifndef arch_map_to_cap_perm +static __always_inline cheri_perms_t arch_map_to_cap_perm(int prot, + bool has_tag_access) +{ + return 0; +} +#endif /* arch_map_to_cap_perm */ + #endif /* _LINUX_CAP_ADDR_MGMT_H */ diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c index 0112b2136755..5ff4fdf26d28 100644 --- a/mm/cap_addr_mgmt.c +++ b/mm/cap_addr_mgmt.c @@ -199,12 +199,56 @@ user_uintptr_t build_owning_capability(ptraddr_t addr, size_t len, cheri_perms_t return cheri_address_set(ret, addr); }
-cheri_perms_t mapping_to_capability_perm(int prot __maybe_unused, - bool has_tag_access __maybe_unused) +static bool mapping_may_have_prot_flag(int prot, int map_val) { - /* TODO [PCuABI] - capability permission conversion from memory permission */ - return (CHERI_PERMS_READ | CHERI_PERMS_WRITE | - CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP); + int prot_max = PROT_MAX_EXTRACT(prot); + + if (prot_max) + return !!(prot_max & map_val); + else + return !!(prot & map_val); +} + +bool capability_may_set_prot(user_uintptr_t cap, int prot) +{ + cheri_perms_t perms = cheri_perms_get(cap); + + if (!test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags)) + return true; + + if (((prot & PROT_READ) && !(perms & CHERI_PERM_LOAD)) || + ((prot & PROT_WRITE) && !(perms & CHERI_PERM_STORE)) || + ((prot & PROT_EXEC) && !(perms & CHERI_PERM_EXECUTE))) + return false; + + return true; +} + +cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access) +{ + cheri_perms_t perms = 0; + + if (mapping_may_have_prot_flag(prot, PROT_READ)) { + perms |= CHERI_PERM_LOAD; + if (has_tag_access) + perms |= CHERI_PERM_LOAD_CAP; + } + if (mapping_may_have_prot_flag(prot, PROT_WRITE)) { + perms |= CHERI_PERM_STORE; + if (has_tag_access) + perms |= (CHERI_PERM_STORE_CAP | CHERI_PERM_STORE_LOCAL_CAP); + } + if (mapping_may_have_prot_flag(prot, PROT_EXEC)) { + perms |= CHERI_PERM_EXECUTE; + if (cheri_perms_get(cheri_pcc_get()) & CHERI_PERM_SYSTEM_REGS) + perms |= CHERI_PERM_SYSTEM_REGS; + } + /* Fetch any extra architecture specific permissions */ + perms |= arch_map_to_cap_perm(PROT_MAX_EXTRACT(prot) ? PROT_MAX_EXTRACT(prot) : prot, + has_tag_access); + perms |= CHERI_PERMS_ROOTCAP; + + return perms; }
#endif /* CONFIG_CHERI_PURECAP_UABI */
On 08/01/2024 11:23, Amit Daniel Kachhap wrote:
[...]
+cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access)
+{
+	cheri_perms_t perms = 0;
+
+	if (mapping_may_have_prot_flag(prot, PROT_READ)) {
+		perms |= CHERI_PERM_LOAD;
+		if (has_tag_access)
+			perms |= CHERI_PERM_LOAD_CAP;
+	}
+	if (mapping_may_have_prot_flag(prot, PROT_WRITE)) {
+		perms |= CHERI_PERM_STORE;
+		if (has_tag_access)
+			perms |= (CHERI_PERM_STORE_CAP | CHERI_PERM_STORE_LOCAL_CAP);
+	}
+	if (mapping_may_have_prot_flag(prot, PROT_EXEC)) {
+		perms |= CHERI_PERM_EXECUTE;
+		if (cheri_perms_get(cheri_pcc_get()) & CHERI_PERM_SYSTEM_REGS)
cheri_pcc_get() gives you the current PCC, that is the kernel one. We don't have a generic way to access saved (user) capability registers at the moment (i.e. for PCC an equivalent of instruction_pointer()). Since we're already introducing an arch helper, we might as well add this permission there.
Kevin
+			perms |= CHERI_PERM_SYSTEM_REGS;
+	}
+	/* Fetch any extra architecture specific permissions */
+	perms |= arch_map_to_cap_perm(PROT_MAX_EXTRACT(prot) ? PROT_MAX_EXTRACT(prot) : prot,
+				      has_tag_access);
+	perms |= CHERI_PERMS_ROOTCAP;
+
+	return perms;
+}
 #endif /* CONFIG_CHERI_PURECAP_UABI */
Existing userspace may not use the maximum protection bits introduced by PCuABI in the protection flags, and hence such applications may end up with memory protection flags, as updated via the mprotect() syscall, that are inconsistent with the capability permission bits.
Reduce the impact of such failures by giving the capability maximum permissions when no maximum protection bits are detected.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/cap_addr_mgmt.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c index 5ff4fdf26d28..68139abd004a 100644 --- a/mm/cap_addr_mgmt.c +++ b/mm/cap_addr_mgmt.c @@ -228,6 +228,12 @@ cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access) { cheri_perms_t perms = 0;
+ if (!PROT_MAX_EXTRACT(prot)) { + perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE | + CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP; + goto skip_calc_perm; + } + if (mapping_may_have_prot_flag(prot, PROT_READ)) { perms |= CHERI_PERM_LOAD; if (has_tag_access) @@ -247,6 +253,7 @@ cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access) perms |= arch_map_to_cap_perm(PROT_MAX_EXTRACT(prot) ? PROT_MAX_EXTRACT(prot) : prot, has_tag_access); perms |= CHERI_PERMS_ROOTCAP; +skip_calc_perm:
return perms; }
On 08/01/2024 11:23, Amit Daniel Kachhap wrote:
Existing userspace may not use the maximum protection bits introduced by PCuABI in the protection flags, and hence such applications may end up with memory protection flags, as updated via the mprotect() syscall, that are inconsistent with the capability permission bits.
Reduce the impact of such failures by giving the capability maximum permissions when no maximum protection bits are detected.
We do want to detect applications that require PROT_MAX() but do not use it. These should be rare cases (e.g. a JIT that creates an RW mapping initially and then makes it RX, in which case PROT_MAX(RWX) is required). With this patch, tests that expect PCuABI to be implemented as specified should fail.
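For reference, a sketch of the JIT pattern mentioned above (assuming PROT_MAX() is honoured as specified): without PROT_MAX(RWX) at mmap() time, the later mprotect() to RX would have to be rejected.

#include <string.h>
#include <sys/mman.h>

void *emit_code(const void *code, size_t len)
{
	/* RW mapping now, declaring that RX will be needed later. */
	void *buf = mmap(NULL, len,
			 PROT_READ | PROT_WRITE |
			 PROT_MAX(PROT_READ | PROT_WRITE | PROT_EXEC),
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;

	memcpy(buf, code, len);

	/* Allowed because RX stays within the declared maximum protection. */
	if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
		munmap(buf, len);
		return NULL;
	}
	return buf;
}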
Kevin
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
mm/cap_addr_mgmt.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c index 5ff4fdf26d28..68139abd004a 100644 --- a/mm/cap_addr_mgmt.c +++ b/mm/cap_addr_mgmt.c @@ -228,6 +228,12 @@ cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access) { cheri_perms_t perms = 0;
+	if (!PROT_MAX_EXTRACT(prot)) {
+		perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE |
+			CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP;
+		goto skip_calc_perm;
+	}
- if (mapping_may_have_prot_flag(prot, PROT_READ)) { perms |= CHERI_PERM_LOAD; if (has_tag_access)
@@ -247,6 +253,7 @@ cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access) perms |= arch_map_to_cap_perm(PROT_MAX_EXTRACT(prot) ? PROT_MAX_EXTRACT(prot) : prot, has_tag_access); perms |= CHERI_PERMS_ROOTCAP; +skip_calc_perm: return perms; }
The MAP_GROWSDOWN flag is not supported by the PCuABI specification. Hence, reject such requests with the -EOPNOTSUPP error.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mmap.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c index 95d16e306559..6b66e94f0dfb 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1473,6 +1473,15 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t usrptr, unsigned long len, #ifdef CONFIG_CHERI_PURECAP_UABI if (!test_bit(MMF_PCUABI_RESERVE, ¤t->mm->flags)) goto skip_pcuabi_checks; + /* + * Introduce checks for PCuABI: + * - MAP_GROWSDOWN flag do not have fixed bounds and hence not + * supported in PCuABI reservation model. + */ + if (flags & MAP_GROWSDOWN) { + retval = -EOPNOTSUPP; + goto out_fput; + } if (cheri_tag_get(usrptr)) { if (!(flags & MAP_FIXED) || !capability_owns_range(usrptr, addr, len)) goto out_fput;
Add a check that the requested protection bits do not exceed the maximum protection bits.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mmap.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c index 6b66e94f0dfb..a1f85c22321a 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1477,11 +1477,17 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t usrptr, unsigned long len, * Introduce checks for PCuABI: * - MAP_GROWSDOWN flag do not have fixed bounds and hence not * supported in PCuABI reservation model. + * - PCuABI reservation model introduces the concept of maximum + * protection the mappings can have. Add a check to make sure the + * requested protection does not exceed the maximum protection. */ if (flags & MAP_GROWSDOWN) { retval = -EOPNOTSUPP; goto out_fput; } + if ((PROT_MAX_EXTRACT(prot) != 0) && + ((PROT_EXTRACT(prot) & PROT_MAX_EXTRACT(prot)) != PROT_EXTRACT(prot))) + goto out_fput; if (cheri_tag_get(usrptr)) { if (!(flags & MAP_FIXED) || !capability_owns_range(usrptr, addr, len)) goto out_fput;
Check that the permissions of the new user address do not exceed the permissions of the old user address for the mremap syscall.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mremap.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/mm/mremap.c b/mm/mremap.c index 2e2955417730..d36ed780bca8 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -996,6 +996,9 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, usrptr, unsigned long, old_l if ((flags & MREMAP_FIXED)) { if (!capability_owns_range(new_usrptr, new_addr, new_len)) goto out; + if ((cheri_perms_get(usrptr) | cheri_perms_get(new_usrptr)) + != cheri_perms_get(usrptr)) + goto out; if (!reserv_vmi_valid_capability(&vmi, new_usrptr, true)) { ret = -ERESERVATION; goto out;
Check that the requested permissions match the constraints of the input user capability address for the mprotect syscall.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mprotect.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/mm/mprotect.c b/mm/mprotect.c index 6b034fb7b5b1..bad4b5da5b38 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -760,6 +760,8 @@ static int do_mprotect_pkey(user_uintptr_t usrptr, size_t len, #endif if (!capability_owns_range(usrptr, start, len)) return -EINVAL; + if (!capability_may_set_prot(usrptr, prot)) + return -EINVAL; if (!arch_validate_prot(prot, start)) return -EINVAL;
Different capability permission and bounds constraints are added for the mincore() syscall, as per the PCuABI specification. mincore() does not require the VMem permission, only at least one of the RWX memory permissions, so the standard capability_owns_range() interface is not used and permissions are verified explicitly.
Also, as mincore() allows the address range to not span whole pages, checking only a single byte at the page intersection is sufficient.
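A hypothetical userspace call illustrating that case: the pointer's capability covers only [p, p + sz), while the syscall arguments are page-aligned, so the kernel only needs the capability to reach into the first and last pages touched.

#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* 'vec' must have one byte per page covered by the aligned range. */
int resident(void *p, size_t sz, unsigned char *vec)
{
	long page = sysconf(_SC_PAGESIZE);
	uintptr_t addr = (uintptr_t)p;
	size_t off = addr % page;

	/* Page-align the start and extend the length by the leading offset. */
	return mincore((void *)(addr - off), sz + off, vec);
}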
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mincore.c | 46 +++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 41 insertions(+), 5 deletions(-)
diff --git a/mm/mincore.c b/mm/mincore.c index 3a307bfa91c4..7bc6e0a7a7f3 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -19,6 +19,7 @@ #include <linux/hugetlb.h> #include <linux/pgtable.h>
+#include <linux/cap_addr_mgmt.h> #include <linux/uaccess.h> #include "swap.h"
@@ -184,15 +185,19 @@ static const struct mm_walk_ops mincore_walk_ops = { * all the arguments, we hold the mmap semaphore: we should * just return the amount of info we're asked for. */ -static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *vec) +static long do_mincore(user_uintptr_t usrptr, unsigned long pages, unsigned char *vec) { struct vm_area_struct *vma; unsigned long end; + unsigned long addr = (ptraddr_t)usrptr; int err;
vma = vma_lookup(current->mm, addr); if (!vma) return -ENOMEM; + /* Check if the capability range is valid with mmap lock. */ + if (!reserv_vma_valid_capability(vma, usrptr)) + return -ERESERVATION; end = min(vma->vm_end, addr + (pages << PAGE_SHIFT)); if (!can_do_mincore(vma)) { unsigned long pages = DIV_ROUND_UP(end - addr, PAGE_SIZE); @@ -229,14 +234,16 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v * mapped * -EAGAIN - A kernel resource was temporarily unavailable. */ -SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len, +SYSCALL_DEFINE3(mincore, user_uintptr_t, usrptr, size_t, len, unsigned char __user *, vec) { long retval; unsigned long pages; unsigned char *tmp; - - start = untagged_addr(start); + unsigned long start = untagged_addr((ptraddr_t)usrptr); +#ifdef CONFIG_CHERI_PURECAP_UABI + unsigned long cap_start, cap_len; +#endif
/* Check the start address: needs to be page-aligned.. */ if (start & ~PAGE_MASK) @@ -253,6 +260,35 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len, if (!access_ok(vec, pages)) return -EFAULT;
+#ifdef CONFIG_CHERI_PURECAP_UABI + if (!test_bit(MMF_PCUABI_RESERVE, &current->mm->flags)) + goto skip_pcuabi_checks; + /* + * mincore syscall does not need VMem permission so as to allow ordinary pages. + * Also at least one of the standard memory permissions RWX will help to reject + * non memory capabilities. + */ + usrptr = cheri_address_set(usrptr, start); + if (cheri_is_invalid(usrptr) || cheri_is_sealed(usrptr) || + !(CHERI_PERM_GLOBAL & cheri_perms_get(usrptr)) || + !((CHERI_PERM_LOAD | CHERI_PERM_STORE | CHERI_PERM_EXECUTE) + & cheri_perms_get(usrptr))) + return -EINVAL; + /* + * mincore syscall can be invoked as: + * mincore(align_down(p, PAGE_SIZE), sz + (p.addr % PAGE_SIZE), vec) + * Hence, the capability might not consider the increased range due to + * alignment. In this scenario, check only the single byte at the page + * intersection. + */ + cap_start = cheri_base_get(usrptr); + cap_len = cheri_length_get(usrptr); + if ((start + PAGE_SIZE <= cap_start) || + (cap_start + cap_len < start + len - offset_in_page(len))) + return -EINVAL; +skip_pcuabi_checks: +#endif + tmp = (void *) __get_free_page(GFP_USER); if (!tmp) return -EAGAIN; @@ -264,7 +300,7 @@ SYSCALL_DEFINE3(mincore, user_uintptr_t, usrptr, size_t, len, * the temporary buffer size. */ mmap_read_lock(current->mm); - retval = do_mincore(start, min(pages, PAGE_SIZE), tmp); + retval = do_mincore(usrptr, min(pages, PAGE_SIZE), tmp); mmap_read_unlock(current->mm);
if (retval <= 0)