Hi All,
This patch series introduces the mm reservation interface to manage the owning capability of allocated addresses. Compared with RFC v1, this series stores the reservation details in the VMA structure. Feedback is welcome on the interface names, interface directory structure, etc. Patches 1-11 manage capability bounds via the reservation interface. Patches 12-19 add support for managing capability permissions. Details about the rules implemented here can be found in the PCuABI specification [1].
Changes in RFC v2:
1) Removed the separate maple tree structures used by the reservation interface and added the reservation details to the VMA structure instead. As most mmap/munmap operations happen per VMA, this reduces the code churn. However, this approach increases the time complexity of syscalls that operate across VMAs, such as mlock, madvise, etc. get_unmapped_area(), which generates free unmapped virtual addresses, may now need more iterations.
2) Added CHERI base and length representability handling. get_unmapped_area() now generates CHERI-representable addresses.
3) Added new PCuABI changes for mincore() syscall.
4) Added changes for compat64.
Future work:
1) Modify the users of vm_mmap()/vm_munmap() (filesystems, loaders, kexec, etc.) to preserve capability addresses.
2) Cover the remaining memory-addressing syscalls.
Testing:
1) All tests by Chaitanya in the v4 selftests [2] pass.
2) Purecap/compat Busybox boot passes after adding the [WIP] patches present in [3].
The whole series can be found here [3].
[1]: https://git.morello-project.org/morello/kernel/linux/-/wikis/Morello-pure-ca...
[2]: https://git.morello-project.org/chaitanya_prakash/linux.git review/purecap_mmap_testcases
[3]: https://git.morello-project.org/amitdaniel/linux.git review/purecap_mm_reservation_v2
Thanks, Amit Daniel
Amit Daniel Kachhap (19):
  uapi: errno.h: Introduce PCuABI memory reservation error
  arm64: morello: Add VM_PCUABI_RESERVE flags
  mm: Add capability reservation interfaces in vma for PCuABI
  mm/cap_addr_mgmt: Add capability bound helpers for PCuABI
  mm/mmap: Modify unmapped address space management code for PCuABI
  mm/mmap: Use the PCuABI reservations in mmap/munmap
  mm/mremap: Add the PCuABI reservation interfaces
  mm/mprotect: Add the PCuABI reservation interfaces
  mm/madvise: Add the PCuABI reservation interfaces
  mm/mlock: Add the PCuABI reservation interfaces
  mm/msync: Add the PCuABI reservation interfaces
  uapi: mman-common.h: Macros for maximum capability permissions
  mm/cap_addr_mgmt: Add capability permission helpers for PCuABI
  mm/cap_addr_mgmt: Reduce the maximum protection check impact
  mm/mmap: Disable MAP_GROWSDOWN mapping flag for PCuABI
  mm/mmap: Add capability permission constraints for PCuABI
  mm/mremap: Add capability permission constraints for PCuABI
  mm/mprotect: Add capability permission constraints for PCuABI
  mm/mincore: Add capability constraints for PCuABI
 arch/arm64/include/asm/cap_addr_mgmt.h |  22 ++
 include/linux/cap_addr_mgmt.h          | 168 +++++++++++++
 include/linux/cheri.h                  |   3 +
 include/linux/mm.h                     |  29 ++-
 include/linux/mm_types.h               |   5 +
 include/uapi/asm-generic/errno.h       |   2 +
 include/uapi/asm-generic/mman-common.h |   6 +
 io_uring/advise.c                      |   2 +-
 mm/Makefile                            |   2 +-
 mm/cap_addr_mgmt.c                     | 314 +++++++++++++++++++++++++
 mm/damon/vaddr.c                       |   2 +-
 mm/madvise.c                           |  27 ++-
 mm/mincore.c                           |  46 +++-
 mm/mlock.c                             |  38 ++-
 mm/mmap.c                              | 182 ++++++++++++--
 mm/mprotect.c                          |  21 +-
 mm/mremap.c                            | 109 +++++++--
 mm/msync.c                             |  15 +-
 mm/util.c                              |  10 +-
 19 files changed, 919 insertions(+), 84 deletions(-)
 create mode 100644 arch/arm64/include/asm/cap_addr_mgmt.h
 create mode 100644 include/linux/cap_addr_mgmt.h
 create mode 100644 mm/cap_addr_mgmt.c
The PCuABI specification introduces this error code; it denotes any error that occurs while managing memory reservations.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 include/uapi/asm-generic/errno.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
index cf9c51ac49f9..4589a3165fe1 100644
--- a/include/uapi/asm-generic/errno.h
+++ b/include/uapi/asm-generic/errno.h
@@ -120,4 +120,6 @@
#define EHWPOISON 133 /* Memory page has hardware error */
+#define ERESERVATION	192	/* PCuABI memory reservation error */
+
 #endif
The PCuABI specification introduces memory reservations, so add a flag, VM_PCUABI_RESERVE, to mark such memory mappings.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 include/linux/mm.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c1f4996a957f..913b79b204be 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -369,9 +369,11 @@ extern unsigned int kobjsize(const void *objp);
 #ifdef CONFIG_ARM64_MORELLO
 # define VM_READ_CAPS		VM_HIGH_ARCH_2	/* Permit capability tag loads */
 # define VM_WRITE_CAPS		VM_HIGH_ARCH_3	/* Permit capability tag stores */
+# define VM_PCUABI_RESERVE	VM_HIGH_ARCH_4	/* Permit PCuABI memory reservation */
 #else
 # define VM_READ_CAPS		VM_NONE
 # define VM_WRITE_CAPS		VM_NONE
+# define VM_PCUABI_RESERVE	VM_NONE
 #endif
#ifndef VM_GROWSUP
PCuABI needs address space reservation interfaces to manage the owning capability of allocated addresses. This interface prevents two unrelated owning capabilities created by the kernel from overlapping.
The reservation interface stores the range of virtual addresses as a reservation entry, which is the same as the bounds of the capability provided by the kernel to userspace. It also stores the owning capability permissions, in order to manage future syscall requests that update permissions.
A few basic rules are followed by the reservation interface:
- Reservations can only be created or destroyed; they are never expanded or shrunk. A reservation is created when a new memory mapping is made outside of an existing reservation.
- A single reservation can contain many mappings. However, unused regions of a reservation cannot be re-used.
- A reservation's start address is aligned to the CHERI representable base.
- A reservation's length is aligned to the CHERI representable length.
More rules about the address space reservation interface can be found in the PCuABI specification.
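To illustrate the alignment rules above, the reservation bounds for a requested mapping are derived roughly as follows (sketch only; reserv_bounds_sketch() is not part of the interface, but cheri_representable_length() and cheri_representable_alignment_mask() are the helpers this series relies on):

/* Sketch: derive reservation bounds for a requested [addr, addr + len) mapping */
static inline void reserv_bounds_sketch(unsigned long addr, unsigned long len,
					unsigned long *reserv_start,
					unsigned long *reserv_len)
{
	/* Pad the page-aligned length to a CHERI representable length... */
	unsigned long rlen = cheri_representable_length(PAGE_ALIGN(len));

	/* ...and align the base down to a CHERI representable base */
	*reserv_start = addr & cheri_representable_alignment_mask(len);
	*reserv_len = rlen;
}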
This commit introduces the APIs reserv_vma_insert_entry(), reserv_range_insert_entry(), reserv_vmi_range_fully_mapped(), reserv_vma_range_fully_mapped(), reserv_vmi_match_capability(), reserv_vma_match_capability() and reserv_vma_match_properties(). Apart from reserv_range_insert_entry(), all of them operate on a single vma. These interfaces are used by the different memory management syscalls in subsequent patches.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 include/linux/cap_addr_mgmt.h | 116 ++++++++++++++
 include/linux/mm_types.h      |   5 +
 mm/Makefile                   |   2 +-
 mm/cap_addr_mgmt.c            | 222 ++++++++++++++++++++++
 4 files changed, 344 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/cap_addr_mgmt.h
 create mode 100644 mm/cap_addr_mgmt.c
diff --git a/include/linux/cap_addr_mgmt.h b/include/linux/cap_addr_mgmt.h new file mode 100644 index 000000000000..00eda91c2fc0 --- /dev/null +++ b/include/linux/cap_addr_mgmt.h @@ -0,0 +1,116 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _LINUX_CAP_ADDR_MGMT_H +#define _LINUX_CAP_ADDR_MGMT_H + +#include <linux/init.h> +#include <linux/list.h> +#include <linux/mm_types.h> +#include <linux/types.h> + +#ifdef CONFIG_CHERI_PURECAP_UABI +#define CHERI_REPRESENTABLE_ALIGNMENT(len) \ + (~cheri_representable_alignment_mask(len) + 1) + +#define CHERI_REPRESENTABLE_BASE(base, len) \ + ((base) & cheri_representable_alignment_mask(len)) +#endif + +/** + * reserv_vma_insert_entry() - Adds the reservation details in the VMA for the + * virtual address range from start to (start + len) with perm permission as + * the entry. The start address is expected to be CHERI representable base + * and the length may expand to CHERI representable length without interfering + * with the successive vma. This function should be called with mmap_lock + * held. + * @vma: VMA pointer to insert the reservation entry. + * @start: Reservation start value. + * @len: Reservation length. + * @perm: Memory mapping permission for the reserved range. + * + * Return: 0 if reservation entry added successfully or -ERESERVATION + * otherwise. + */ +int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start, + unsigned long len, unsigned long perm); + +/** + * reserv_range_insert_entry() - Adds the reservation details across the VMA's + * for the virtual address range from start to (start + len) with the perm + * permission as the entry. The start address is expected to be CHERI + * representable base and the length to be CHERI representable length. + * This function internally uses mmap_lock to synchronize the vma updates. + * @start: Reservation start value. + * @len: Reservation length. + * @perm: Memory mapping permission for the reserved range. + * + * Return: valid capability with bounded range and requested permission or + * negative errorcode otherwise. + */ +user_uintptr_t reserv_range_insert_entry(unsigned long start, unsigned long len, + unsigned long perm); + +/** + * reserv_vmi_range_fully_mapped() - Searches the reservation interface for the + * virtual address range from start to (start + len). This is useful to find + * if the requested range maps exactly with the reserved range and there is no + * fragmentation. This function should be called with mmap_lock held. + * @vmi: The vma iterator pointing at the vma. + * @start: Virtual address start value. + * @len: Virtual address length. + * + * Return: True if the vma mapping matches fully with the given range or false + * otherwise. + */ +bool reserv_vmi_range_fully_mapped(struct vma_iterator *vmi, unsigned long start, + unsigned long len); + +/** + * reserv_vma_range_fully_mapped() - Searches the reservation interface for the + * virtual address range from start to (start + len). This is useful to find + * if the requested range maps exactly with the reserved range and there is no + * fragmentation. This function should be called with mmap_lock held. + * @vma: The vma pointer. + * @start: Virtual address start value. + * @len: Virtual address length. + * + * Return: True if the vma mapping matches fully with the given range or false + * otherwise. 
+ */ +bool reserv_vma_range_fully_mapped(struct vm_area_struct *vma, unsigned long start, + unsigned long len); + +/** + * reserv_vmi_match_capability() - Search and matches the reservation interface + * for the virtual address range derived from the capability bound values. + * This function should be called with mmap_lock held. + * @vmi: The vma iterator pointing at the vma. + * @start: Reservation capability value. + * + * Return: True if reservation entry found with the exact capability bound or + * false otherwise. + */ +bool reserv_vmi_match_capability(struct vma_iterator *vmi, user_uintptr_t start); + +/** + * reserv_vma_match_capability() - Search and matches the reservation interface + * for the virtual address range derived from the capability bound values. + * This function should be called with mmap_lock held. + * @vma: The vma pointer. + * @start: Reservation capability value. + * + * Return: True if reservation entry found with the exact capability bound or + * false otherwise. + */ +bool reserv_vma_match_capability(struct vm_area_struct *vma, user_uintptr_t start); + +/** + * reserv_vma_match_properties() - Compares the reservation properties between + * two vma's. This function should be called with mmap_lock held. + * @vma1: The first vma pointer. + * @vma2: The second vma pointer. + * + * Return: True if two vma's have the same reservation range or false otherwise. + */ +bool reserv_vma_match_properties(struct vm_area_struct *vma1, struct vm_area_struct *vma2); +#endif /* _LINUX_CAP_ADDR_MGMT_H */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 12e87f83287d..38bad6b201ca 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -571,6 +571,11 @@ struct vm_area_struct { struct vma_numab_state *numab_state; /* NUMA Balancing state */ #endif struct vm_userfaultfd_ctx vm_userfaultfd_ctx; +#ifdef CONFIG_CHERI_PURECAP_UABI + unsigned long reserv_start; + unsigned long reserv_len; + unsigned long reserv_perm; +#endif } __randomize_layout;
#ifdef CONFIG_SCHED_MM_CID diff --git a/mm/Makefile b/mm/Makefile index e29afc890cde..c2ca31fe5798 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -39,7 +39,7 @@ mmu-y := nommu.o mmu-$(CONFIG_MMU) := highmem.o memory.o mincore.o \ mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \ msync.o page_vma_mapped.o pagewalk.o \ - pgtable-generic.o rmap.o vmalloc.o + pgtable-generic.o rmap.o vmalloc.o cap_addr_mgmt.o
ifdef CONFIG_CROSS_MEMORY_ATTACH diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c new file mode 100644 index 000000000000..fe318c92dc7a --- /dev/null +++ b/mm/cap_addr_mgmt.c @@ -0,0 +1,222 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/bug.h> +#include <linux/cap_addr_mgmt.h> +#include <linux/cheri.h> +#include <linux/mm.h> +#include <linux/slab.h> + +#ifdef CONFIG_CHERI_PURECAP_UABI + +int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start, + unsigned long len, unsigned long perm) +{ + /* TODO [PCuABI] - capability permission conversion from memory permission */ + cheri_perms_t cheri_perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE | + CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP; + + if (is_compat_task() || !(vma->vm_flags & VM_PCUABI_RESERVE)) + return 0; + + /* Check if the reservation base is representable */ + if (start & ~cheri_representable_alignment_mask(len) || start & ~PAGE_MASK) { + printk(KERN_WARNING "Reservation base address(0x%lx) is not representable\n", + start); + return -ERESERVATION; + } + + /* Check if the reservation length is page aligned */ + if (len % PAGE_SIZE) { + printk(KERN_WARNING "Reservation base address(0x%lx) is not representable\n", + start); + return -ERESERVATION; + } + vma->reserv_start = start; + vma->reserv_len = cheri_representable_length(len); + if (perm) + vma->reserv_perm = cheri_perms; + + return 0; +} + +user_uintptr_t reserv_range_insert_entry(unsigned long start, unsigned long len, + unsigned long perm __maybe_unused) +{ + /* TODO [PCuABI] - capability permission conversion from memory permission */ + cheri_perms_t cheri_perm = CHERI_PERMS_READ | CHERI_PERMS_WRITE | + CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP; + struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; + unsigned long end = start + len; + user_uintptr_t ret = 0; + VMA_ITERATOR(vmi, mm, start); + + if (is_compat_task()) + return start; + + if (end < start) + return -EINVAL; + if (end == start) + return 0; + + /* Check if the reservation range is representable and throw error if not */ + if (start & ~cheri_representable_alignment_mask(len) || + len != cheri_representable_length(len) || + start & ~PAGE_MASK || len % PAGE_SIZE) { + printk(KERN_WARNING "Reservation range (0x%lx)-(0x%lx) is not representable\n", + start, start + len - 1); + return -ERESERVATION; + } + if (mmap_write_lock_killable(mm)) + return -EINTR; + + for_each_vma_range(vmi, vma, end) { + vm_flags_set(vma, VM_PCUABI_RESERVE); + WRITE_ONCE(vma->reserv_start, start); + WRITE_ONCE(vma->reserv_len, len); + WRITE_ONCE(vma->reserv_perm, cheri_perm); + } + mmap_write_unlock(current->mm); + ret = (user_uintptr_t)uaddr_to_user_ptr_safe(start); + + return ret; +} + +bool reserv_vmi_range_fully_mapped(struct vma_iterator *vmi, unsigned long start, + unsigned long len) +{ + unsigned long align_start, align_len; + struct vm_area_struct *vma; + + if (is_compat_task()) + return true; + + mas_set_range(&vmi->mas, start, start + len); + + /* Try finding the given range */ + vma = mas_find(&vmi->mas, start + len); + if (!vma || !(vma->vm_flags & VM_PCUABI_RESERVE)) + return true; + + align_start = round_down(vma->vm_start, max(PAGE_SIZE, CHERI_REPRESENTABLE_ALIGNMENT(len))); + align_len = cheri_representable_length(round_up((vma->vm_end - vma->vm_start), PAGE_SIZE)); + /* Check if the range fully mapped */ + if (start != vma->vm_start || (start + len) != vma->vm_end || + align_start != vma->reserv_start || + align_len != vma->reserv_len) + return false; + + return true; +} + +bool 
reserv_vma_range_fully_mapped(struct vm_area_struct *vma, unsigned long start, + unsigned long len) +{ + unsigned long align_start, align_len; + + if (is_compat_task() || !vma || !(vma->vm_flags & VM_PCUABI_RESERVE)) + return true; + + align_start = round_down(vma->vm_start, max(PAGE_SIZE, CHERI_REPRESENTABLE_ALIGNMENT(len))); + align_len = cheri_representable_length(round_up((vma->vm_end - vma->vm_start), PAGE_SIZE)); + + /* Check if the range fully mapped */ + if (start != vma->vm_start || (start + len) != vma->vm_end || + align_start != vma->reserv_start || + align_len != vma->reserv_len) + return false; + + return true; +} + +bool reserv_vmi_match_capability(struct vma_iterator *vmi, user_uintptr_t start) +{ + struct vm_area_struct *vma; + unsigned long reserv_start = cheri_base_get(start); + unsigned long reserv_end = reserv_start + cheri_length_get(start); + + if (is_compat_task()) + return true; + + /* Check if there is match with the existing reservations */ + vma = mas_find(&vmi->mas, reserv_end); + if (!vma || !(vma->vm_flags & VM_PCUABI_RESERVE)) + return true; + + if (vma->reserv_start == reserv_start && + vma->reserv_len == cheri_length_get(start)) + return true; + + return false; +} + +bool reserv_vma_match_capability(struct vm_area_struct *vma, user_uintptr_t start) +{ + if (is_compat_task() || !vma || !(vma->vm_flags & VM_PCUABI_RESERVE)) + return true; + + /* Check if there is match with the existing reservations */ + if (vma->reserv_start == cheri_base_get(start) && + vma->reserv_len == cheri_length_get(start)) + return true; + + return false; +} + +bool reserv_vma_match_properties(struct vm_area_struct *vma1, struct vm_area_struct *vma2) +{ + if (is_compat_task()) + return true; + + /* Match reservation properties betwen 2 vma's */ + if (vma1 && (vma1->vm_flags & VM_PCUABI_RESERVE) && + vma2 && (vma2->vm_flags & VM_PCUABI_RESERVE) && + vma1->reserv_start == vma2->reserv_start && + vma1->reserv_len == vma2->reserv_len) + return true; + + return false; +} + +#else + +int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start, + unsigned long len, unsigned long perm) +{ + return 0; +} + +unsigned long reserv_range_insert_entry(unsigned long start, unsigned long len, + unsigned long perm) +{ + return start; +} + +bool reserv_vmi_range_fully_mapped(struct vma_iterator *vmi, unsigned long start, + unsigned long len) +{ + return true; +} + +bool reserv_vma_range_fully_mapped(struct vm_area_struct *vma, unsigned long start, + unsigned long len) +{ + return true; +} + +bool reserv_vmi_match_capability(struct vma_iterator *vmi, user_uintptr_t start) +{ + return true; +} + +bool reserv_vma_match_capability(struct vm_area_struct *vma, user_uintptr_t start) +{ + return true; +} + +bool reserv_vma_match_properties(struct vm_area_struct *vma1, struct vm_area_struct *vma2) +{ + return true; +} + +#endif /* CONFIG_CHERI_PURECAP_UABI */
On 25/10/2023 07:36, Amit Daniel Kachhap wrote:
[...]
+/**
- reserv_vma_insert_entry() - Adds the reservation details in the VMA for the
- virtual address range from start to (start + len) with perm permission as
- the entry. The start address is expected to be CHERI representable base
- and the length may expand to CHERI representable length without interfering
- with the successive vma. This function should be called with mmap_lock
- held.
- @vma: VMA pointer to insert the reservation entry.
- @start: Reservation start value.
- @len: Reservation length.
- @perm: Memory mapping permission for the reserved range.
- Return: 0 if reservation entry added successfully or -ERESERVATION
otherwise.
- */
+int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start,
unsigned long len, unsigned long perm);
The name of this function doesn't really correspond to what it does any more. Since we store the reservation information directly in the VMA, there is nothing to "insert". Maybe something like reserv_vma_set_reserv()? Same comment regarding reserv_range_insert_entry().
Also, as a general comment: for new code, it would be preferable to use ptraddr_t instead of unsigned long. Of course it's not always clear-cut, as we should try to be consistent with existing code too, but this being a whole new header I think using ptraddr_t throughout makes sense.
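For instance, something along these lines (illustrative only, the exact parameter types and final name are of course up for discussion):

int reserv_vma_set_reserv(struct vm_area_struct *vma, ptraddr_t start,
			  size_t len, unsigned long perm);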
[...]
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 12e87f83287d..38bad6b201ca 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -571,6 +571,11 @@ struct vm_area_struct { struct vma_numab_state *numab_state; /* NUMA Balancing state */ #endif struct vm_userfaultfd_ctx vm_userfaultfd_ctx; +#ifdef CONFIG_CHERI_PURECAP_UABI
- unsigned long reserv_start;
- unsigned long reserv_len;
- unsigned long reserv_perm;
I've been wondering, are these fields automatically preserved when splitting or merging VMAs?
By the way, the overall approach to manage reservations makes a lot more sense to me now, I think it's going in the right direction.
+#endif } __randomize_layout; #ifdef CONFIG_SCHED_MM_CID diff --git a/mm/Makefile b/mm/Makefile index e29afc890cde..c2ca31fe5798 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -39,7 +39,7 @@ mmu-y := nommu.o mmu-$(CONFIG_MMU) := highmem.o memory.o mincore.o \ mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \ msync.o page_vma_mapped.o pagewalk.o \
pgtable-generic.o rmap.o vmalloc.o
pgtable-generic.o rmap.o vmalloc.o cap_addr_mgmt.o
ifdef CONFIG_CROSS_MEMORY_ATTACH diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c new file mode 100644 index 000000000000..fe318c92dc7a --- /dev/null +++ b/mm/cap_addr_mgmt.c @@ -0,0 +1,222 @@ +// SPDX-License-Identifier: GPL-2.0
+#include <linux/bug.h> +#include <linux/cap_addr_mgmt.h> +#include <linux/cheri.h> +#include <linux/mm.h> +#include <linux/slab.h>
+#ifdef CONFIG_CHERI_PURECAP_UABI
+int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start,
unsigned long len, unsigned long perm)
+{
- /* TODO [PCuABI] - capability permission conversion from memory permission */
- cheri_perms_t cheri_perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE |
CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP;
- if (is_compat_task() || !(vma->vm_flags & VM_PCUABI_RESERVE))
I've been pondering about VM_PCUABI_RESERVE. I think it's quite clear it can't be a permanent solution - in PCuABI, reservation management is required for all mappings, with no exception. I do however see the issue with vm_mmap() and friends. My feeling is that it is not worth trying to move to capability-based address space management in kernel subsystems, like binfmt_elf. The benefits are small and it seems that the hassle is quite high. On the other hand, reservation management can never be skipped. At the moment, it seems to me that the following approach would be the most practical:
* Moving all capability operations to the top-level syscall handler (or corresponding utility function, i.e. ksys_mmap_pgoff() for mmap). This way lower level helpers that can be called from the kernel too, such as vm_mmap(), keep operating purely on addresses and we don't have to worry about their callers. That should also mean fewer changes overall.
* The reservation management remains next to the mapping management, like in these patches. It is done unconditionally in PCuABI.
* No new vma->vm_flags. However, we could reduce the #ifdef'ing + checks for compat by introducing a new mm->flags. Reservation management is a property of the address space, so it would be sensible to introduce a per-mm flag.
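To illustrate that last point, the sort of thing I have in mind is roughly the following (the bit number and helper name are invented for the sketch; the actual bit would need to be allocated alongside the existing MMF_* definitions):

/* Hypothetical per-mm flag marking an address space that uses reservations */
#define MMF_PCUABI_RESERV	31	/* placeholder bit, not allocated */

static inline bool reserv_is_supported(struct mm_struct *mm)
{
	return test_bit(MMF_PCUABI_RESERV, &mm->flags);
}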
Does that sound sensible?
[...]
+bool reserv_vmi_range_fully_mapped(struct vma_iterator *vmi, unsigned long start,
unsigned long len)
+{
- unsigned long align_start, align_len;
- struct vm_area_struct *vma;
- if (is_compat_task())
return true;
- mas_set_range(&vmi->mas, start, start + len);
- /* Try finding the given range */
- vma = mas_find(&vmi->mas, start + len);
- if (!vma || !(vma->vm_flags & VM_PCUABI_RESERVE))
return true;
Surely if no VMA is found (!vma) then the range is fully *unmapped* and the answer is false?
- align_start = round_down(vma->vm_start, max(PAGE_SIZE, CHERI_REPRESENTABLE_ALIGNMENT(len)));
- align_len = cheri_representable_length(round_up((vma->vm_end - vma->vm_start), PAGE_SIZE));
- /* Check if the range fully mapped */
- if (start != vma->vm_start || (start + len) != vma->vm_end ||
align_start != vma->reserv_start ||
align_len != vma->reserv_len)
This checks that the found VMA corresponds exactly to the specified range. What we actually want though is to check that the range is fully mapped, i.e. there exists *any number* of mappings spanning every single page.
Assuming this function is always used for enforcing rules in the spec such as:
If any part of AlignedRange(addr.address, length) is not currently
mapped, then the call fails with -ENOMEM.
Then reservation bounds are irrelevant - this is purely about mappings. Also note that AlignedRange() is about page alignment, not capability representation.
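For the record, the kind of check I have in mind is essentially the following (rough sketch with an invented helper name, assuming mmap_lock is held): walk the vmas overlapping the range and fail on any gap.

static bool range_fully_mapped(struct mm_struct *mm, unsigned long start,
			       unsigned long len)
{
	unsigned long end = start + len;
	unsigned long next = start;
	struct vm_area_struct *vma;
	VMA_ITERATOR(vmi, mm, start);

	for_each_vma_range(vmi, vma, end) {
		if (vma->vm_start > next)
			return false;	/* gap before this vma */
		next = vma->vm_end;
		if (next >= end)
			return true;	/* every page up to end is covered */
	}
	return false;
}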
Kevin
Hi Kevin,
Thanks for the detailed review.
On 10/27/23 19:33, Kevin Brodsky wrote:
On 25/10/2023 07:36, Amit Daniel Kachhap wrote:
[...]
+/**
- reserv_vma_insert_entry() - Adds the reservation details in the VMA for the
- virtual address range from start to (start + len) with perm permission as
- the entry. The start address is expected to be CHERI representable base
- and the length may expand to CHERI representable length without interfering
- with the successive vma. This function should be called with mmap_lock
- held.
- @vma: VMA pointer to insert the reservation entry.
- @start: Reservation start value.
- @len: Reservation length.
- @perm: Memory mapping permission for the reserved range.
- Return: 0 if reservation entry added successfully or -ERESERVATION
otherwise.
- */
+int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start,
unsigned long len, unsigned long perm);
The name of this function doesn't really correspond to what it does any more. Since we store the reservation information directly in the VMA, there is nothing to "insert". Maybe something like reserv_vma_set_reserv()? Same comment regarding
Yes, agreed. The above name was more appropriate for v1, not v2.
reserv_range_insert_entry().
Also, as a general comment: for new code, it would be preferable to use ptraddr_t instead of unsigned long. Of course it's not always clear-cut, as we should try to be consistent with existing code too, but this being a whole new header I think using ptraddr_t throughout makes sense.
ok
[...]
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 12e87f83287d..38bad6b201ca 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -571,6 +571,11 @@ struct vm_area_struct { struct vma_numab_state *numab_state; /* NUMA Balancing state */ #endif struct vm_userfaultfd_ctx vm_userfaultfd_ctx; +#ifdef CONFIG_CHERI_PURECAP_UABI
- unsigned long reserv_start;
- unsigned long reserv_len;
- unsigned long reserv_perm;
I've been wondering, are these fields automatically preserved when splitting or merging VMAs?
Yes, these fields are preserved, and vmas are merged only if their reservation properties are the same.
By the way, the overall approach to manage reservations makes a lot more sense to me now, I think it's going in the right direction.
+#endif } __randomize_layout; #ifdef CONFIG_SCHED_MM_CID diff --git a/mm/Makefile b/mm/Makefile index e29afc890cde..c2ca31fe5798 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -39,7 +39,7 @@ mmu-y := nommu.o mmu-$(CONFIG_MMU) := highmem.o memory.o mincore.o \ mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \ msync.o page_vma_mapped.o pagewalk.o \
pgtable-generic.o rmap.o vmalloc.o
pgtable-generic.o rmap.o vmalloc.o cap_addr_mgmt.o
ifdef CONFIG_CROSS_MEMORY_ATTACH diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c new file mode 100644 index 000000000000..fe318c92dc7a --- /dev/null +++ b/mm/cap_addr_mgmt.c @@ -0,0 +1,222 @@ +// SPDX-License-Identifier: GPL-2.0
+#include <linux/bug.h> +#include <linux/cap_addr_mgmt.h> +#include <linux/cheri.h> +#include <linux/mm.h> +#include <linux/slab.h>
+#ifdef CONFIG_CHERI_PURECAP_UABI
+int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start,
unsigned long len, unsigned long perm)
+{
- /* TODO [PCuABI] - capability permission conversion from memory permission */
- cheri_perms_t cheri_perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE |
CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP;
- if (is_compat_task() || !(vma->vm_flags & VM_PCUABI_RESERVE))
I've been pondering about VM_PCUABI_RESERVE. I think it's quite clear it can't be a permanent solution - in PCuABI, reservation management is required for all mappings, with no exception. I do however see the issue with vm_mmap() and friends. My feeling is that it is not worth trying to move to capability-based address space management in kernel subsystems, like binfmt_elf. The benefits are small and it seems that the hassle is quite high. On the other hand, reservation management can never be skipped. At the moment, it seems to me that the following approach would be the most practical:
- Moving all capability operations to the top-level syscall handler (or
corresponding utility function, i.e. ksys_mmap_pgoff() for mmap). This way lower level helpers that can be called from the kernel too, such as vm_mmap(), keep operating purely on addresses and we don't have to worry about their callers. That should also mean fewer changes overall.
Yes, I agree with skipping capability-based address space management in the kernel subsystems. In the next iteration I will try moving the capability checks into ksys_mmap_pgoff().
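Something roughly along these lines (sketch only: the fd-to-file handling is omitted, and the exact return handling, in particular for MAP_FIXED, still needs to be worked out):

user_uintptr_t ksys_mmap_pgoff(user_uintptr_t user_ptr, unsigned long len,
			       unsigned long prot, unsigned long flags,
			       unsigned long fd, unsigned long pgoff)
{
	struct file *file = NULL;	/* fd -> file lookup omitted */
	ptraddr_t addr = (ptraddr_t)user_ptr;
	unsigned long ret;

	/* Capability-level checks stay at the syscall boundary */
	if ((flags & MAP_FIXED) && !capability_owns_range(user_ptr, addr, len))
		return -EINVAL;

	/* Lower layers such as vm_mmap()/do_mmap() keep working on addresses */
	ret = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
	if (IS_ERR_VALUE(ret))
		return ret;

	/*
	 * For a new mapping, return an owning capability for the new
	 * reservation; the MAP_FIXED case would instead derive from the
	 * capability passed in by userspace.
	 */
	return build_owning_capability(ret, len, prot);
}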
- The reservation management remains next to the mapping management,
like in these patches. It is done unconditionally in PCuABI.
binfmt_elf calls vm_mmap() once per ELF segment, so the approach I am thinking of is to defer the reservation until all the ELF segments are mapped. At the end, call reserv_range_insert_entry() with a range covering all the ELF segments. This API also returns a capability whose bounds match the reservation range.
- No new vma->vm_flags. However, we could reduce the #ifdef'ing + checks
for compat by introducing a new mm->flags. Reservation management is a property of the address space, so it would be sensible to introduce a per-mm flag.
Any specific reason to avoid vm_flags (fewer free bits)? vm_flags makes the vma operations simple to implement. That said, the flag is somewhat redundant, as a vma using a reservation can also be identified by its initialized reservation fields.
Does that sound sensible?
Yes most of it.
[...]
+bool reserv_vmi_range_fully_mapped(struct vma_iterator *vmi, unsigned long start,
unsigned long len)
+{
- unsigned long align_start, align_len;
- struct vm_area_struct *vma;
- if (is_compat_task())
return true;
- mas_set_range(&vmi->mas, start, start + len);
- /* Try finding the given range */
- vma = mas_find(&vmi->mas, start + len);
- if (!vma || !(vma->vm_flags & VM_PCUABI_RESERVE))
return true;
Surely if no VMA is found (!vma) then the range is fully *unmapped* and the answer is false?
Agreed, a NULL vma is not handled appropriately here. I will change this interface.
- align_start = round_down(vma->vm_start, max(PAGE_SIZE, CHERI_REPRESENTABLE_ALIGNMENT(len)));
- align_len = cheri_representable_length(round_up((vma->vm_end - vma->vm_start), PAGE_SIZE));
- /* Check if the range fully mapped */
- if (start != vma->vm_start || (start + len) != vma->vm_end ||
align_start != vma->reserv_start ||
align_len != vma->reserv_len)
This checks that the found VMA corresponds exactly to the specified range. What we actually want though is to check that the range is fully mapped, i.e. there exists *any number* of mappings spanning every single page.
Assuming this function is always used for enforcing rules in the spec such as:
If any part of AlignedRange(addr.address, length) is not currently
mapped, then the call fails with -ENOMEM.
Then reservation bounds are irrelevant - this is purely about mappings. Also note that AlignedRange() is about page alignment, not capability representation.
I will change the implementation as the current approach is too restrictive.
//Amit
Kevin
On 03/11/2023 07:56, Amit Daniel Kachhap wrote:
- The reservation management remains next to the mapping management,
like in these patches. It is done unconditionally in PCuABI.
The way binfmt_elf calls vm_mmap() is for each elf segment so the approach I am thinking is to defer the reservation till all the elf segments are mapped. In the end, call the api reserv_range_insert_entry() with the range including the complete elf segments. This api will also return a capability with the bound same as reservation range.
Right I see that binfmt_elf is a special case, as the reservation we need to create is larger than the first vm_mmap() call (for the first segment). Ideally we wouldn't need to add a different path in vm_mmap() just for this situation though. Intuitively the reservation should be created before any mapping it contains in this case, but clearly we cannot do this when representing reservations as vma metadata. Having considered this further, couldn't we simply enlarge the reservation created by the first vm_mmap()? That is:
1. Map the first segment [unchanged]
2. Call reserv_vma_insert_entry() on that new vma, setting the reservation bounds to cover all the segments
3. Map the following segments with MAP_FIXED, within that reservation [unchanged]
4. Set AT_CHERI_*_{RW,RX}_CAP based on the bounds of the executable/interpreter reservation
At step 2, there is an assumption that there is sufficient space to create that reservation. However I believe this is essentially the same assumption that exists today, as the kernel has to be able to map all the remaining segments at a fixed offset from the first one.
The calls to vm_mmap() at step 3 almost observe the standard reservation rules. Almost, because they are creating mappings in unmapped ranges within the reservation, which is normally forbidden. I suspect this rule isn't really necessary for in-kernel users anyway (it's meant to prevent userspace from unmapping and remapping the same range, instead of doing that atomically by overwriting the range with MAP_FIXED), so we could avoid this problem by having this check in the syscall handler too, not down in vm_mmap().
I believe that with this approach, there is no need to special-case vm_mmap() for the binfmt_elf usage, but I may well be missing something.
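Expressed as code, the idea is roughly the following (very much a sketch: names like total_size are invented, error handling and the real elf_map() plumbing are omitted):

static unsigned long elf_map_first_segment(struct file *file, unsigned long addr,
					   unsigned long first_size,
					   unsigned long total_size,
					   unsigned long prot, unsigned long flags)
{
	struct vm_area_struct *vma;
	unsigned long map_addr;

	/* 1. Map the first segment as today */
	map_addr = vm_mmap(file, addr, first_size, prot, flags, 0);
	if (IS_ERR_VALUE(map_addr))
		return map_addr;

	/* 2. Widen the reservation on that vma to cover all the segments */
	mmap_write_lock(current->mm);
	vma = find_vma(current->mm, map_addr);
	if (vma)
		reserv_vma_insert_entry(vma, map_addr, total_size, prot);
	mmap_write_unlock(current->mm);

	/*
	 * 3. The remaining segments are then mapped with MAP_FIXED inside
	 *    that reservation, and 4. AT_CHERI_*_{RW,RX}_CAP is derived
	 *    from the reservation bounds.
	 */
	return map_addr;
}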
- No new vma->vm_flags. However, we could reduce the #ifdef'ing + checks
for compat by introducing a new mm->flags. Reservation management is a property of the address space, so it would be sensible to introduce a per-mm flag.
Any specific reason to avoid vm_flags? (Less free bits). vm_flags makes the vma operation simple to implement. Although vm_flags is still duplicate as the vma using reservation can be determined if it has the reservation fields initialized.
All mappings should have the reservation fields set in PCuABI, with no exception. That's why I don't think it makes sense to have a new vm_flags - whether reservations are used or not is a per-process property, not a per-vma property. AFAICT mm->flags is directly accessible in most functions, in the worst case current->mm can be used.
Kevin
Hi,
On 11/9/23 14:44, Kevin Brodsky wrote:
On 03/11/2023 07:56, Amit Daniel Kachhap wrote:
- The reservation management remains next to the mapping management,
like in these patches. It is done unconditionally in PCuABI.
The way binfmt_elf calls vm_mmap() is for each elf segment so the approach I am thinking is to defer the reservation till all the elf segments are mapped. In the end, call the api reserv_range_insert_entry() with the range including the complete elf segments. This api will also return a capability with the bound same as reservation range.
Right I see that binfmt_elf is a special case, as the reservation we need to create is larger than the first vm_mmap() call (for the first segment). Ideally we wouldn't need to add a different path in vm_mmap() just for this situation though. Intuitively the reservation should be created before any mapping it contains in this case, but clearly we cannot do this when representing reservations as vma metadata. Having considered this further, couldn't we simply enlarge the reservation created by the first vm_mmap()? That is:
- Map the first segment [unchanged]
- Call reserv_vma_insert_entry() on that new vma, setting the
reservation bounds to cover all the segments 3. Map the following segments with MAP_FIXED, within that reservation [unchanged] 4. Set AT_CHERI_*_{RW,RX}_CAP based on the bounds of the executable/interpreter reservation
At step 2, there is an assumption that there is sufficient space to create that reservation. However I believe this is essentially the same assumption that exists today, as the kernel has to be able to map all the remaining segments at a fixed offset from the first one.
The calls to vm_mmap() at step 3 almost observe the standard reservation rules. Almost, because they are creating mappings in unmapped ranges within the reservation, which is normally forbidden. I suspect this rule isn't really necessary for in-kernel users anyway (it's meant to prevent userspace from unmapping and remapping the same range, instead of doing that atomically by overwriting the range with MAP_FIXED), so we could avoid this problem by having this check in the syscall handler too, not down in vm_mmap().
I believe that with this approach, there is no need to special-case vm_mmap() for the binfmt_elf usage, but I may well be missing something.
Your suggestion makes sense. How about the steps below?
1) Call vm_mmap() with the total segment size and maximum permissions; the reservation will be created for the full size.
2) Call vm_mmap() with a MAP_FIXED address and each segment's size.
The disadvantage of this approach is that there will be an extra vma for each of the holes.
The approach you outlined is also doable, but it does not look clean with respect to the spec.
- No new vma->vm_flags. However, we could reduce the #ifdef'ing + checks
for compat by introducing a new mm->flags. Reservation management is a property of the address space, so it would be sensible to introduce a per-mm flag.
Any specific reason to avoid vm_flags? (Less free bits). vm_flags makes the vma operation simple to implement. Although vm_flags is still duplicate as the vma using reservation can be determined if it has the reservation fields initialized.
All mappings should have the reservation fields set in PCuABI, with no exception. That's why I don't think it makes sense to have a new vm_flags - whether reservations are used or not is a per-process property, not a per-vma property. AFAICT mm->flags is directly accessible in most functions, in the worst case current->mm can be used.
ok agreed.
//Amit
Kevin
On 09/11/2023 11:50, Amit Daniel Kachhap wrote:
On 11/9/23 14:44, Kevin Brodsky wrote:
On 03/11/2023 07:56, Amit Daniel Kachhap wrote:
- The reservation management remains next to the mapping management,
like in these patches. It is done unconditionally in PCuABI.
The way binfmt_elf calls vm_mmap() is for each elf segment so the approach I am thinking is to defer the reservation till all the elf segments are mapped. In the end, call the api reserv_range_insert_entry() with the range including the complete elf segments. This api will also return a capability with the bound same as reservation range.
Right I see that binfmt_elf is a special case, as the reservation we need to create is larger than the first vm_mmap() call (for the first segment). Ideally we wouldn't need to add a different path in vm_mmap() just for this situation though. Intuitively the reservation should be created before any mapping it contains in this case, but clearly we cannot do this when representing reservations as vma metadata. Having considered this further, couldn't we simply enlarge the reservation created by the first vm_mmap()? That is:
- Map the first segment [unchanged]
- Call reserv_vma_insert_entry() on that new vma, setting the
reservation bounds to cover all the segments 3. Map the following segments with MAP_FIXED, within that reservation [unchanged] 4. Set AT_CHERI_*_{RW,RX}_CAP based on the bounds of the executable/interpreter reservation
At step 2, there is an assumption that there is sufficient space to create that reservation. However I believe this is essentially the same assumption that exists today, as the kernel has to be able to map all the remaining segments at a fixed offset from the first one.
The calls to vm_mmap() at step 3 almost observe the standard reservation rules. Almost, because they are creating mappings in unmapped ranges within the reservation, which is normally forbidden. I suspect this rule isn't really necessary for in-kernel users anyway (it's meant to prevent userspace from unmapping and remapping the same range, instead of doing that atomically by overwriting the range with MAP_FIXED), so we could avoid this problem by having this check in the syscall handler too, not down in vm_mmap().
I believe that with this approach, there is no need to special-case vm_mmap() for the binfmt_elf usage, but I may well be missing something.
Your suggestion makes sense. How about the below steps?.
- Call vm_mmap with total segment size and maximum permission.
Reservation will be created for full size. 2). call vm_mmap with MAP_FIXED address and each segment size.
The disadvantage of this approach is that there will be extra vma's for each of the holes.
I considered this, but I would really like to avoid any user-visible side effect, in this case additional (PROT_NONE) vmas. We could manually unmap those, but the whole process becomes quite cumbersome, along the lines of:
1. Map the whole reservation (anonymous, PROT_NONE)
2. Map each segment (MAP_FIXED)
3. Unmap the leftovers of the PROT_NONE mapping from step 1.
The above approach is also doable but does not look clean as per spec
The spec is really about the kernel-user ABI - as long as what the user sees is consistent with the spec, we are fine. Do you have any particular concern with the approach I outlined above?
Kevin
Hi,
On 11/10/23 16:59, Kevin Brodsky wrote:
On 09/11/2023 11:50, Amit Daniel Kachhap wrote:
On 11/9/23 14:44, Kevin Brodsky wrote:
On 03/11/2023 07:56, Amit Daniel Kachhap wrote:
- The reservation management remains next to the mapping management,
like in these patches. It is done unconditionally in PCuABI.
The way binfmt_elf calls vm_mmap() is for each elf segment so the approach I am thinking is to defer the reservation till all the elf segments are mapped. In the end, call the api reserv_range_insert_entry() with the range including the complete elf segments. This api will also return a capability with the bound same as reservation range.
Right I see that binfmt_elf is a special case, as the reservation we need to create is larger than the first vm_mmap() call (for the first segment). Ideally we wouldn't need to add a different path in vm_mmap() just for this situation though. Intuitively the reservation should be created before any mapping it contains in this case, but clearly we cannot do this when representing reservations as vma metadata. Having considered this further, couldn't we simply enlarge the reservation created by the first vm_mmap()? That is:
- Map the first segment [unchanged]
- Call reserv_vma_insert_entry() on that new vma, setting the
reservation bounds to cover all the segments 3. Map the following segments with MAP_FIXED, within that reservation [unchanged] 4. Set AT_CHERI_*_{RW,RX}_CAP based on the bounds of the executable/interpreter reservation
At step 2, there is an assumption that there is sufficient space to create that reservation. However I believe this is essentially the same assumption that exists today, as the kernel has to be able to map all the remaining segments at a fixed offset from the first one.
The calls to vm_mmap() at step 3 almost observe the standard reservation rules. Almost, because they are creating mappings in unmapped ranges within the reservation, which is normally forbidden. I suspect this rule isn't really necessary for in-kernel users anyway (it's meant to prevent userspace from unmapping and remapping the same range, instead of doing that atomically by overwriting the range with MAP_FIXED), so we could avoid this problem by having this check in the syscall handler too, not down in vm_mmap().
I believe that with this approach, there is no need to special-case vm_mmap() for the binfmt_elf usage, but I may well be missing something.
Your suggestion makes sense. How about the below steps?.
- Call vm_mmap with total segment size and maximum permission.
Reservation will be created for full size. 2). call vm_mmap with MAP_FIXED address and each segment size.
The disadvantage of this approach is that there will be extra vma's for each of the holes.
I considered this, but I would really like to avoid any user-visible side-effect, in this case additional (PROT_NONE) vmas. We could manually unmap those, but the whole process because quite cumbersome, along the lines of:
- Map the whole reservation (anonymous, PROT_NONE)
- Map each segment (MAP_FIXED)
- Unmap leftovers of the PROT_NONE mapping from 1.
Yes.
The above approach is also doable but does not look clean as per spec
The spec is really about the kernel-user ABI - as long as what the user sees is consistent with the spec, we are fine. Do you have any particular concern with the approach I outlined above?
No concern as such, other than that the reservation is enlarged explicitly. But as you mentioned, the spec is about the kernel-user ABI, so that should be OK.
//Amit
Kevin
Helper functions such as capability_owns_range() and build_owning_capability() are added as per the PCuABI specification.
These may be helpful in adding different PCuABI reservation constraints.
Note: These helper functions do not yet check capability permission constraints; full support will be added in subsequent commits.
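For illustration, a typical call site added by later patches looks roughly like this (sketch only; the exact error codes and call sites vary):

/* Sketch: validate an incoming capability before operating on the range,
 * and build an owning capability when returning a new mapping. */
static user_uintptr_t example_usage(user_uintptr_t user_cap, unsigned long addr,
				    unsigned long len, unsigned long prot)
{
	if (!capability_owns_range(user_cap, addr, len))
		return -EINVAL;

	/* ... perform the address-based operation ... */

	return build_owning_capability(addr, len, prot);
}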
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 include/linux/cap_addr_mgmt.h | 22 +++++++++++++++++++++
 mm/cap_addr_mgmt.c            | 37 ++++++++++++++++++++++++++++++++++-
 2 files changed, 58 insertions(+), 1 deletion(-)
diff --git a/include/linux/cap_addr_mgmt.h b/include/linux/cap_addr_mgmt.h index 00eda91c2fc0..60af5633ef6f 100644 --- a/include/linux/cap_addr_mgmt.h +++ b/include/linux/cap_addr_mgmt.h @@ -113,4 +113,26 @@ bool reserv_vma_match_capability(struct vm_area_struct *vma, user_uintptr_t star * Return: True if two vma's have the same reservation range or false otherwise. */ bool reserv_vma_match_properties(struct vm_area_struct *vma1, struct vm_area_struct *vma2); + +/** + * capability_owns_range() - Check if the address range is within the valid + * capability bound. + * @cap: A Capability value. + * @addr: Address start value. + * @len: Address length. + * + * Return: True if address within the capability bound or false otherwise. + */ +bool capability_owns_range(user_uintptr_t cap, unsigned long addr, unsigned long len); + +/** + * build_owning_capability() - Creates a userspace capability from the + * requested base address, length and memory protection flags. + * @addr: Requested capability address. + * @len: Requested capability length. + * @perm: Requested memory mapping permission flags. + * + * Return: A new capability derived from cheri_user_root_cap. + */ +user_uintptr_t build_owning_capability(unsigned long addr, unsigned long len, unsigned long perm); #endif /* _LINUX_CAP_ADDR_MGMT_H */ diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c index fe318c92dc7a..a4a85b37d59d 100644 --- a/mm/cap_addr_mgmt.c +++ b/mm/cap_addr_mgmt.c @@ -77,7 +77,7 @@ user_uintptr_t reserv_range_insert_entry(unsigned long start, unsigned long len, WRITE_ONCE(vma->reserv_perm, cheri_perm); } mmap_write_unlock(current->mm); - ret = (user_uintptr_t)uaddr_to_user_ptr_safe(start); + ret = build_owning_capability(start, len, perm);
return ret; } @@ -178,6 +178,31 @@ bool reserv_vma_match_properties(struct vm_area_struct *vma1, struct vm_area_str return false; }
+bool capability_owns_range(user_uintptr_t cap, unsigned long addr, unsigned long len) +{ + unsigned long align_addr = round_down(addr, PAGE_SIZE); + unsigned long align_len = cheri_representable_length(round_up(len, PAGE_SIZE)); + + if (is_compat_task()) + return true; + + return cheri_check_cap((const void * __capability)cheri_address_set(cap, align_addr), + align_len, CHERI_PERM_GLOBAL | CHERI_PERM_SW_VMEM); +} + +user_uintptr_t build_owning_capability(unsigned long addr, unsigned long len, + unsigned long perm __maybe_unused) +{ + unsigned long align_start = round_down(addr, PAGE_SIZE); + unsigned long align_len = cheri_representable_length(round_up(len, PAGE_SIZE)); + + /* TODO [PCuABI] - capability permission conversion from memory permission */ + cheri_perms_t perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE | + CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP; + + return (user_uintptr_t)cheri_build_user_cap(align_start, align_len, perms); +} + #else
int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start, @@ -219,4 +244,14 @@ bool reserv_vma_match_properties(struct vm_area_struct *vma1, struct vm_area_str return true; }
+bool capability_owns_range(user_uintptr_t cap, unsigned long addr, unsigned long len) +{ + return true; +} + +user_uintptr_t build_owning_capability(unsigned long addr, unsigned long len, unsigned long perm) +{ + return addr; +} + #endif /* CONFIG_CHERI_PURECAP_UABI */
In the CHERI architecture, not all address ranges can be represented as capability bounds, so add the necessary CHERI base and length alignment code when generating a free unmapped virtual address.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 include/linux/mm.h |  8 ++++++
 mm/mmap.c          | 67 ++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 67 insertions(+), 8 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index 913b79b204be..00cb9fd3a5ee 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3241,6 +3241,10 @@ static inline unsigned long vm_start_gap(struct vm_area_struct *vma) { unsigned long vm_start = vma->vm_start;
+#ifdef CONFIG_CHERI_PURECAP_UABI + if (vma->vm_flags & VM_PCUABI_RESERVE) + vm_start = vma->reserv_start; +#endif if (vma->vm_flags & VM_GROWSDOWN) { vm_start -= stack_guard_gap; if (vm_start > vma->vm_start) @@ -3253,6 +3257,10 @@ static inline unsigned long vm_end_gap(struct vm_area_struct *vma) { unsigned long vm_end = vma->vm_end;
+#ifdef CONFIG_CHERI_PURECAP_UABI + if (vma->vm_flags & VM_PCUABI_RESERVE) + vm_end = vma->reserv_start + vma->reserv_len; +#endif if (vma->vm_flags & VM_GROWSUP) { vm_end += stack_guard_gap; if (vm_end < vma->vm_end) diff --git a/mm/mmap.c b/mm/mmap.c index bc422cc4a14b..74e52dc512fa 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -56,6 +56,7 @@ #define CREATE_TRACE_POINTS #include <trace/events/mmap.h>
+#include "cap_addr_mgmt.h" #include "internal.h"
#ifndef arch_mmap_check @@ -1561,6 +1562,11 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
/* Adjust search length to account for worst case alignment overhead */ length = info->length + info->align_mask; +#if defined(CONFIG_CHERI_PURECAP_UABI) + /* Cheri Representable length is sufficient for alignment */ + if (!is_compat_task()) + length = cheri_representable_length(info->length); +#endif if (length < info->length) return -ENOMEM;
@@ -1612,6 +1618,11 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) MA_STATE(mas, ¤t->mm->mm_mt, 0, 0); /* Adjust search length to account for worst case alignment overhead */ length = info->length + info->align_mask; +#if defined(CONFIG_CHERI_PURECAP_UABI) + /* Cheri Representable length is sufficient for alignment */ + if (!is_compat_task()) + length = cheri_representable_length(info->length); +#endif if (length < info->length) return -ENOMEM;
@@ -1637,6 +1648,10 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) tmp = mas_prev(&mas, 0); if (tmp && vm_end_gap(tmp) > gap) { high_limit = tmp->vm_start; +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (tmp->vm_flags & VM_PCUABI_RESERVE) + high_limit = tmp->reserv_start; +#endif mas_reset(&mas); goto retry; } @@ -1687,18 +1702,32 @@ generic_get_unmapped_area(struct file *filp, unsigned long addr, struct vm_area_struct *vma, *prev; struct vm_unmapped_area_info info; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags); + unsigned long align_len = len;
- if (len > mmap_end - mmap_min_addr) +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (!is_compat_task()) + align_len = cheri_representable_length(len); +#endif + if (align_len > mmap_end - mmap_min_addr) return -ENOMEM;
- if (flags & MAP_FIXED) + if (flags & MAP_FIXED) { +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (!is_compat_task() && (addr & ~cheri_representable_alignment_mask(len))) + return -ERESERVATION; +#endif return addr; + }
if (addr) { addr = PAGE_ALIGN(addr); +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (!is_compat_task()) + addr = round_up(addr, CHERI_REPRESENTABLE_ALIGNMENT(len)); +#endif vma = find_vma_prev(mm, addr, &prev); - if (mmap_end - len >= addr && addr >= mmap_min_addr && - (!vma || addr + len <= vm_start_gap(vma)) && + if (mmap_end - align_len >= addr && addr >= mmap_min_addr && + (!vma || addr + align_len <= vm_start_gap(vma)) && (!prev || addr >= vm_end_gap(prev))) return addr; } @@ -1708,6 +1737,10 @@ generic_get_unmapped_area(struct file *filp, unsigned long addr, info.low_limit = mm->mmap_base; info.high_limit = mmap_end; info.align_mask = 0; +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (!is_compat_task()) + info.align_mask = ~(cheri_representable_alignment_mask(len)); +#endif info.align_offset = 0; return vm_unmapped_area(&info); } @@ -1735,20 +1768,34 @@ generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr, struct mm_struct *mm = current->mm; struct vm_unmapped_area_info info; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags); + unsigned long align_len = len;
+#if defined(CONFIG_CHERI_PURECAP_UABI) + if (!is_compat_task()) + align_len = cheri_representable_length(len); +#endif /* requested length too big for entire address space */ - if (len > mmap_end - mmap_min_addr) + if (align_len > mmap_end - mmap_min_addr) return -ENOMEM;
- if (flags & MAP_FIXED) + if (flags & MAP_FIXED) { +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (!is_compat_task() && (addr & ~cheri_representable_alignment_mask(len))) + return -ERESERVATION; +#endif return addr; + }
/* requesting a specific address */ if (addr) { addr = PAGE_ALIGN(addr); +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (!is_compat_task()) + addr = CHERI_REPRESENTABLE_BASE(addr, len); +#endif vma = find_vma_prev(mm, addr, &prev); - if (mmap_end - len >= addr && addr >= mmap_min_addr && - (!vma || addr + len <= vm_start_gap(vma)) && + if (mmap_end - align_len >= addr && addr >= mmap_min_addr && + (!vma || addr + align_len <= vm_start_gap(vma)) && (!prev || addr >= vm_end_gap(prev))) return addr; } @@ -1758,6 +1805,10 @@ generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr, info.low_limit = PAGE_SIZE; info.high_limit = arch_get_mmap_base(addr, mm->mmap_base); info.align_mask = 0; +#if defined(CONFIG_CHERI_PURECAP_UABI) + if (!is_compat_task()) + info.align_mask = ~(cheri_representable_alignment_mask(len)); +#endif info.align_offset = 0; addr = vm_unmapped_area(&info);
On 25/10/2023 07:36, Amit Daniel Kachhap wrote:
[...]
diff --git a/mm/mmap.c b/mm/mmap.c index bc422cc4a14b..74e52dc512fa 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -56,6 +56,7 @@ #define CREATE_TRACE_POINTS #include <trace/events/mmap.h> +#include "cap_addr_mgmt.h"
Looks like an invalid #include that is fixed in the next patch.
#include "internal.h" #ifndef arch_mmap_check @@ -1561,6 +1562,11 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info) /* Adjust search length to account for worst case alignment overhead */ length = info->length + info->align_mask; +#if defined(CONFIG_CHERI_PURECAP_UABI)
- /* Cheri Representable length is sufficient for alignment */
- if (!is_compat_task())
length = cheri_representable_length(info->length);
Not sure I understand this. Why can we skip adding align_mask in this case? And also why isn't info->length adjusted in generic_get_unmapped_area()?
+#endif if (length < info->length) return -ENOMEM; @@ -1612,6 +1618,11 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) MA_STATE(mas, ¤t->mm->mm_mt, 0, 0); /* Adjust search length to account for worst case alignment overhead */ length = info->length + info->align_mask; +#if defined(CONFIG_CHERI_PURECAP_UABI)
- /* Cheri Representable length is sufficient for alignment */
- if (!is_compat_task())
length = cheri_representable_length(info->length);
+#endif if (length < info->length) return -ENOMEM; @@ -1637,6 +1648,10 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) tmp = mas_prev(&mas, 0); if (tmp && vm_end_gap(tmp) > gap) { high_limit = tmp->vm_start; +#if defined(CONFIG_CHERI_PURECAP_UABI)
if (tmp->vm_flags & VM_PCUABI_RESERVE)
high_limit = tmp->reserv_start;
+#endif mas_reset(&mas); goto retry; } @@ -1687,18 +1702,32 @@ generic_get_unmapped_area(struct file *filp, unsigned long addr, struct vm_area_struct *vma, *prev; struct vm_unmapped_area_info info; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
- unsigned long align_len = len;
- if (len > mmap_end - mmap_min_addr)
+#if defined(CONFIG_CHERI_PURECAP_UABI)
- if (!is_compat_task())
align_len = cheri_representable_length(len);
+#endif
- if (align_len > mmap_end - mmap_min_addr) return -ENOMEM;
- if (flags & MAP_FIXED)
- if (flags & MAP_FIXED) {
+#if defined(CONFIG_CHERI_PURECAP_UABI)
if (!is_compat_task() && (addr & ~cheri_representable_alignment_mask(len)))
There is no requirement for the specified address to be aligned. Rather, the requirement is for the new reservation, implied from the specified address and length, not to overlap with any other (as usual). In other words, no reservation should already exist within the aligned range.
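To illustrate the idea, the MAP_FIXED path could look roughly like this (reserv_range_overlaps_existing() is a made-up helper standing in for whatever the reservation layer ends up providing; untested sketch):

	if (flags & MAP_FIXED) {
#if defined(CONFIG_CHERI_PURECAP_UABI)
		if (!is_compat_task()) {
			unsigned long rbase = CHERI_REPRESENTABLE_BASE(addr, len);
			unsigned long rlen = cheri_representable_length(len);

			/* Fail only if another reservation already overlaps
			 * the representable range implied by (addr, len). */
			if (reserv_range_overlaps_existing(current->mm, rbase, rlen))
				return -ERESERVATION;
		}
#endif
		return addr;
	}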
Kevin
Hi,
On 10/27/23 19:33, Kevin Brodsky wrote:
On 25/10/2023 07:36, Amit Daniel Kachhap wrote:
[...]
diff --git a/mm/mmap.c b/mm/mmap.c index bc422cc4a14b..74e52dc512fa 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -56,6 +56,7 @@ #define CREATE_TRACE_POINTS #include <trace/events/mmap.h> +#include "cap_addr_mgmt.h"
Looks like an invalid #include that is fixed in the next patch.
Yes, that's a typo.
#include "internal.h" #ifndef arch_mmap_check @@ -1561,6 +1562,11 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info) /* Adjust search length to account for worst case alignment overhead */ length = info->length + info->align_mask; +#if defined(CONFIG_CHERI_PURECAP_UABI)
- /* Cheri Representable length is sufficient for alignment */
- if (!is_compat_task())
length = cheri_representable_length(info->length);
Not sure I understand this. Why can we skip adding align_mask in this case? And also why isn't info->length adjusted in generic_get_unmapped_area()?
(len + align_mask) is a larger value than cheri_representable_length(len), so the gap requirement was bigger than necessary. (Keeping the align_mask adjustment would also work, but it is less optimal.) I will add some more documentation here to explain this.
info->length is not adjusted because the function still needs the original length for the AND operation with align_mask.
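To spell out the first point, assuming the usual identity that cheri_representable_length() rounds the length up to the representable alignment:

	A          = required alignment, i.e. ~cheri_representable_alignment_mask(len) + 1
	align_mask = A - 1
	cheri_representable_length(len) = round_up(len, A) <= len + (A - 1) = len + align_mask

So a gap of the representable length is never larger than len + align_mask, and it is still sufficient once the start of the gap is aligned to A.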
+#endif if (length < info->length) return -ENOMEM; @@ -1612,6 +1618,11 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) MA_STATE(mas, ¤t->mm->mm_mt, 0, 0); /* Adjust search length to account for worst case alignment overhead */ length = info->length + info->align_mask; +#if defined(CONFIG_CHERI_PURECAP_UABI)
- /* Cheri Representable length is sufficient for alignment */
- if (!is_compat_task())
length = cheri_representable_length(info->length);
+#endif if (length < info->length) return -ENOMEM; @@ -1637,6 +1648,10 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) tmp = mas_prev(&mas, 0); if (tmp && vm_end_gap(tmp) > gap) { high_limit = tmp->vm_start; +#if defined(CONFIG_CHERI_PURECAP_UABI)
if (tmp->vm_flags & VM_PCUABI_RESERVE)
high_limit = tmp->reserv_start;
+#endif mas_reset(&mas); goto retry; } @@ -1687,18 +1702,32 @@ generic_get_unmapped_area(struct file *filp, unsigned long addr, struct vm_area_struct *vma, *prev; struct vm_unmapped_area_info info; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
- unsigned long align_len = len;
- if (len > mmap_end - mmap_min_addr)
+#if defined(CONFIG_CHERI_PURECAP_UABI)
- if (!is_compat_task())
align_len = cheri_representable_length(len);
+#endif
- if (align_len > mmap_end - mmap_min_addr) return -ENOMEM;
- if (flags & MAP_FIXED)
- if (flags & MAP_FIXED) {
+#if defined(CONFIG_CHERI_PURECAP_UABI)
if (!is_compat_task() && (addr & ~cheri_representable_alignment_mask(len)))
There is no requirement for the specified address to be aligned. Rather, the requirement is for the new reservation, implied from the specified address and length, not to overlap with any other (as usual). In other words, no reservation should already exist within the aligned range.
Since mmap() is now being modified for MAP_FIXED, I will look into removing this restriction.
//Amit
Kevin
Use the recently introduced PCuABI reservation interfaces for the mmap/munmap syscalls. The capability returned by mmap is now bounded, and the range is recorded in the reservation layer so that range constraint checks can be performed in subsequent syscalls.
The kernel-internal mapping functions vm_mmap() and vm_munmap() are also modified to accept user pointers. However, the callers of vm_mmap()/vm_munmap() are not updated yet, so the reservation interface may not be fully functional for them.
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- include/linux/cheri.h | 3 ++ include/linux/mm.h | 16 ++++---- mm/mmap.c | 90 ++++++++++++++++++++++++++++++++++++------- mm/util.c | 10 +---- 4 files changed, 91 insertions(+), 28 deletions(-)
diff --git a/include/linux/cheri.h b/include/linux/cheri.h index e5f588b056ad..02ef0e911e63 100644 --- a/include/linux/cheri.h +++ b/include/linux/cheri.h @@ -37,6 +37,9 @@ (CHERI_PERM_GLOBAL | CHERI_PERM_SW_VMEM) #endif
+#define cheri_is_null_derived(cap) \ + cheri_is_equal_exact((uintcap_t)cheri_address_get(cap), cap) + /** * cheri_build_user_cap() - Create a userspace capability. * @addr: Requested capability address. diff --git a/include/linux/mm.h b/include/linux/mm.h index 00cb9fd3a5ee..77e3374c65e0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3139,16 +3139,18 @@ unsigned long randomize_page(unsigned long start, unsigned long range);
extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
-extern unsigned long mmap_region(struct file *file, unsigned long addr, +extern unsigned long mmap_region(struct file *file, user_uintptr_t addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, - struct list_head *uf); -extern unsigned long do_mmap(struct file *file, unsigned long addr, + struct list_head *uf, unsigned long prot); + +extern user_uintptr_t do_mmap(struct file *file, user_uintptr_t addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf); + extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, - unsigned long start, size_t len, struct list_head *uf, + user_uintptr_t start, size_t len, struct list_head *uf, bool downgrade); -extern int do_munmap(struct mm_struct *, unsigned long, size_t, +extern int do_munmap(struct mm_struct *, user_uintptr_t, size_t, struct list_head *uf); extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior);
@@ -3170,8 +3172,8 @@ static inline void mm_populate(unsigned long addr, unsigned long len) {} /* These take the mm semaphore themselves */ extern int __must_check vm_brk(unsigned long, unsigned long); extern int __must_check vm_brk_flags(unsigned long, unsigned long, unsigned long); -extern int vm_munmap(unsigned long, size_t); -extern unsigned long __must_check vm_mmap(struct file *, unsigned long, +extern int vm_munmap(user_uintptr_t, size_t); +extern user_uintptr_t __must_check vm_mmap(struct file *, user_uintptr_t, unsigned long, unsigned long, unsigned long, unsigned long);
diff --git a/mm/mmap.c b/mm/mmap.c index 74e52dc512fa..173fb75f7122 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -47,7 +47,9 @@ #include <linux/oom.h> #include <linux/sched/mm.h> #include <linux/ksm.h> +#include <linux/cap_addr_mgmt.h>
+#include <linux/cheri.h> #include <linux/uaccess.h> #include <asm/cacheflush.h> #include <asm/tlb.h> @@ -56,7 +58,6 @@ #define CREATE_TRACE_POINTS #include <trace/events/mmap.h>
-#include "cap_addr_mgmt.h" #include "internal.h"
#ifndef arch_mmap_check @@ -1225,7 +1226,7 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, /* * The caller must write-lock current->mm->mmap_lock. */ -unsigned long do_mmap(struct file *file, unsigned long addr, +user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1233,6 +1234,14 @@ unsigned long do_mmap(struct file *file, unsigned long addr, struct mm_struct *mm = current->mm; vm_flags_t vm_flags; int pkey = 0; + unsigned long addr = (ptraddr_t)user_addr; + bool new_caps = true; +#ifdef CONFIG_CHERI_PURECAP_UABI + bool ignore_reserv = false; + VMA_ITERATOR(vmi, mm, addr); +#else + bool ignore_reserv = true; +#endif
validate_mm(mm); *populate = 0; @@ -1240,6 +1249,23 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (!len) return -EINVAL;
+#ifdef CONFIG_CHERI_PURECAP_UABI + if (is_compat_task()) { + ignore_reserv = true; + goto skip_pcuabi_checks; + } + if (cheri_tag_get(user_addr)) { + if (!capability_owns_range(user_addr, addr, len) || !(flags & MAP_FIXED)) + return -EINVAL; + if (!reserv_vmi_range_fully_mapped(&vmi, addr, len)) + return -ERESERVATION; + new_caps = false; + } else { + if (user_addr && !cheri_is_null_derived(user_addr)) + return -EINVAL; + } +skip_pcuabi_checks: +#endif /* * Does the application expect PROT_READ to imply PROT_EXEC? * @@ -1397,11 +1423,26 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; }
- addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); + if (!ignore_reserv) + vm_flags |= VM_PCUABI_RESERVE; + if (new_caps) + user_addr = addr; + + addr = mmap_region(file, user_addr, len, vm_flags, pgoff, uf, prot); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len; + if (!IS_ERR_VALUE(addr)) { + if (!ignore_reserv) { + if (new_caps) + user_addr = build_owning_capability(addr, len, prot); + } else { + user_addr = (user_uintptr_t)uaddr_to_user_ptr_safe(addr); + } + return user_addr; + } + return addr; }
@@ -2559,11 +2600,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, * Returns: -EINVAL on failure, 1 on success and unlock, 0 otherwise. */ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, - unsigned long start, size_t len, struct list_head *uf, + user_uintptr_t user_start, size_t len, struct list_head *uf, bool downgrade) { unsigned long end; struct vm_area_struct *vma; + int ret; + unsigned long start = (ptraddr_t)user_start;
if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start) return -EINVAL; @@ -2580,7 +2623,16 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, if (!vma) return 0;
- return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade); + if (vma->vm_flags & VM_PCUABI_RESERVE) { + if (!capability_owns_range(user_start, start, len)) + return -EINVAL; + if (!reserv_vma_match_capability(vma, user_start)) + return -ERESERVATION; + } + + ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade); + + return ret; }
/* do_munmap() - Wrapper function for non-maple tree aware do_munmap() calls. @@ -2589,7 +2641,7 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, * @len: The length to be munmapped. * @uf: The userfaultfd list_head */ -int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, +int do_munmap(struct mm_struct *mm, user_uintptr_t start, size_t len, struct list_head *uf) { VMA_ITERATOR(vmi, mm, start); @@ -2597,15 +2649,16 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, return do_vmi_munmap(&vmi, mm, start, len, uf, false); }
-unsigned long mmap_region(struct file *file, unsigned long addr, +unsigned long mmap_region(struct file *file, user_uintptr_t user_addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, - struct list_head *uf) + struct list_head *uf, unsigned long prot) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL; struct vm_area_struct *next, *prev, *merge; pgoff_t pglen = len >> PAGE_SHIFT; unsigned long charged = 0; + unsigned long addr = (ptraddr_t)user_addr; unsigned long end = addr + len; unsigned long merge_start = addr, merge_end = end; pgoff_t vm_pgoff; @@ -2628,7 +2681,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, }
/* Unmap any existing mapping in the area */ - if (do_vmi_munmap(&vmi, mm, addr, len, uf, false)) + if (do_vmi_munmap(&vmi, mm, user_addr, len, uf, false)) return -ENOMEM;
/* @@ -2643,7 +2696,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
next = vma_next(&vmi); prev = vma_prev(&vmi); - if (vm_flags & VM_SPECIAL) + if (vm_flags & (VM_SPECIAL | VM_PCUABI_RESERVE)) goto cannot_expand;
/* Attempt to expand an old mapping */ @@ -2693,6 +2746,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vm_flags_init(vma, vm_flags); vma->vm_page_prot = vm_get_page_prot(vm_flags); vma->vm_pgoff = pgoff; + if (vm_flags & VM_PCUABI_RESERVE) { + error = reserv_vma_insert_entry(vma, addr, len, prot); + if (error) + goto free_vma; + }
if (file) { if (vm_flags & VM_SHARED) { @@ -2845,17 +2903,18 @@ unsigned long mmap_region(struct file *file, unsigned long addr, return error; }
-static int __vm_munmap(unsigned long start, size_t len, bool downgrade) +static int __vm_munmap(user_uintptr_t user_start, size_t len, bool downgrade) { int ret; struct mm_struct *mm = current->mm; + unsigned long start = (ptraddr_t)user_start; LIST_HEAD(uf); VMA_ITERATOR(vmi, mm, start);
if (mmap_write_lock_killable(mm)) return -EINTR;
- ret = do_vmi_munmap(&vmi, mm, start, len, &uf, downgrade); + ret = do_vmi_munmap(&vmi, mm, user_start, len, &uf, downgrade); /* * Returning 1 indicates mmap_lock is downgraded. * But 1 is not legal return value of vm_munmap() and munmap(), reset @@ -2871,7 +2930,8 @@ static int __vm_munmap(unsigned long start, size_t len, bool downgrade) return ret; }
-int vm_munmap(unsigned long start, size_t len) +/* TODO [PCuABI] - Update the users of vm_munmap */ +int vm_munmap(user_uintptr_t start, size_t len) { return __vm_munmap(start, len, false); } @@ -2879,7 +2939,11 @@ EXPORT_SYMBOL(vm_munmap);
SYSCALL_DEFINE2(munmap, user_uintptr_t, addr, size_t, len) { +#ifdef CONFIG_CHERI_PURECAP_UABI + addr = cheri_address_set(addr, untagged_addr(cheri_address_get(addr))); +#else addr = untagged_addr(addr); +#endif return __vm_munmap(addr, len, true); }
diff --git a/mm/util.c b/mm/util.c index 61de3bf7712b..6f5d9d864643 100644 --- a/mm/util.c +++ b/mm/util.c @@ -540,24 +540,18 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - /* - * TODO [PCuABI] - might need propagating uintcap further down - * to do_mmap to properly handle capabilities - */ ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); if (populate) mm_populate(ret, populate); - /* TODO [PCuABI] - derive proper capability */ - if (!IS_ERR_VALUE(ret)) - ret = (user_uintptr_t)uaddr_to_user_ptr_safe((ptraddr_t)ret); } return ret; }
-unsigned long vm_mmap(struct file *file, unsigned long addr, +/* TODO [PCuABI] - Update the users of vm_mmap */ +user_uintptr_t vm_mmap(struct file *file, user_uintptr_t addr, unsigned long len, unsigned long prot, unsigned long flag, unsigned long offset) {
On 25/10/2023 07:36, Amit Daniel Kachhap wrote:
[...]
@@ -1225,7 +1226,7 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, /*
- The caller must write-lock current->mm->mmap_lock.
*/ -unsigned long do_mmap(struct file *file, unsigned long addr, +user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_addr,
If we're going to rename parameters, let's fix the address/pointer terminology at the same time, maybe user_ptr?
unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf)
@@ -1233,6 +1234,14 @@ unsigned long do_mmap(struct file *file, unsigned long addr, struct mm_struct *mm = current->mm; vm_flags_t vm_flags; int pkey = 0;
- unsigned long addr = (ptraddr_t)user_addr;
- bool new_caps = true;
+#ifdef CONFIG_CHERI_PURECAP_UABI
- bool ignore_reserv = false;
- VMA_ITERATOR(vmi, mm, addr);
+#else
- bool ignore_reserv = true;
+#endif validate_mm(mm); *populate = 0; @@ -1240,6 +1249,23 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (!len) return -EINVAL; +#ifdef CONFIG_CHERI_PURECAP_UABI
- if (is_compat_task()) {
ignore_reserv = true;
goto skip_pcuabi_checks;
- }
- if (cheri_tag_get(user_addr)) {
if (!capability_owns_range(user_addr, addr, len) || !(flags & MAP_FIXED))
return -EINVAL;
if (!reserv_vmi_range_fully_mapped(&vmi, addr, len))
return -ERESERVATION;
The specified error is ENOMEM in this case:
If any part of AlignedRange(addr.address, length) is not currently
mapped, then the call fails with -ENOMEM.
new_caps = false;
- } else {
if (user_addr && !cheri_is_null_derived(user_addr))
If the tag is not set, then the capability must always be null-derived, whether its address is 0 or not.
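i.e. the untagged branch could simply drop the address test, along the lines of (untested):

	} else {
		/* An untagged user_addr must be null-derived, whether its
		 * address is 0 or not. */
		if (!cheri_is_null_derived(user_addr))
			return -EINVAL;
	}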
return -EINVAL;
- }
+skip_pcuabi_checks: +#endif /* * Does the application expect PROT_READ to imply PROT_EXEC? * @@ -1397,11 +1423,26 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; }
- addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
- if (!ignore_reserv)
vm_flags |= VM_PCUABI_RESERVE;
- if (new_caps)
user_addr = addr;
- addr = mmap_region(file, user_addr, len, vm_flags, pgoff, uf, prot); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len;
- if (!IS_ERR_VALUE(addr)) {
if (!ignore_reserv) {
if (new_caps)
user_addr = build_owning_capability(addr, len, prot);
} else {
user_addr = (user_uintptr_t)uaddr_to_user_ptr_safe(addr);
If ignore_reserv is true, then we are not in PCuABI, so we can simply return the address.
}
return user_addr;
- }
- return addr;
} @@ -2559,11 +2600,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
- Returns: -EINVAL on failure, 1 on success and unlock, 0 otherwise.
*/ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
unsigned long start, size_t len, struct list_head *uf,
user_uintptr_t user_start, size_t len, struct list_head *uf, bool downgrade)
{ unsigned long end; struct vm_area_struct *vma;
- int ret;
- unsigned long start = (ptraddr_t)user_start;
if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start) return -EINVAL; @@ -2580,7 +2623,16 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, if (!vma) return 0;
- return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade);
- if (vma->vm_flags & VM_PCUABI_RESERVE) {
if (!capability_owns_range(user_start, start, len))
return -EINVAL;
if (!reserv_vma_match_capability(vma, user_start))
IIUC this checks that the bounds of the capability match that of the reservation that contains the mapping. This is not required, what is required is that the capability bounds are a subset of the reservation:
Its bounds are checked against existing reservations. If the bounds of
the capability are not contained within the bounds of any existing reservation, the call fails with -ERESERVATION.
That also applies to the other syscalls.
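The containment check could then take a shape roughly like the following; reserv_len is an assumed field name for the reservation length, so treat this purely as a sketch:

	/* The capability bounds must be a subset of the reservation bounds. */
	if (cheri_base_get(user_start) < vma->reserv_start ||
	    cheri_base_get(user_start) + cheri_length_get(user_start) >
	    vma->reserv_start + vma->reserv_len)
		return -ERESERVATION;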
return -ERESERVATION;
- }
- ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade);
- return ret;
} /* do_munmap() - Wrapper function for non-maple tree aware do_munmap() calls. @@ -2589,7 +2641,7 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
- @len: The length to be munmapped.
- @uf: The userfaultfd list_head
*/ -int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, +int do_munmap(struct mm_struct *mm, user_uintptr_t start, size_t len, struct list_head *uf) { VMA_ITERATOR(vmi, mm, start); @@ -2597,15 +2649,16 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, return do_vmi_munmap(&vmi, mm, start, len, uf, false); } -unsigned long mmap_region(struct file *file, unsigned long addr, +unsigned long mmap_region(struct file *file, user_uintptr_t user_addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
struct list_head *uf)
struct list_head *uf, unsigned long prot)
{ struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL; struct vm_area_struct *next, *prev, *merge; pgoff_t pglen = len >> PAGE_SHIFT; unsigned long charged = 0;
- unsigned long addr = (ptraddr_t)user_addr; unsigned long end = addr + len; unsigned long merge_start = addr, merge_end = end; pgoff_t vm_pgoff;
@@ -2628,7 +2681,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, } /* Unmap any existing mapping in the area */
- if (do_vmi_munmap(&vmi, mm, addr, len, uf, false))
- if (do_vmi_munmap(&vmi, mm, user_addr, len, uf, false)) return -ENOMEM;
/* @@ -2643,7 +2696,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, next = vma_next(&vmi); prev = vma_prev(&vmi);
- if (vm_flags & VM_SPECIAL)
- if (vm_flags & (VM_SPECIAL | VM_PCUABI_RESERVE))
This effectively prevents merging mappings completely in PCuABI. This is too coarse: we only want to prevent merging mappings that belong to different reservations.
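In the spirit of the reserv_vma_match_properties() check that a later patch adds to vma_merge(), the condition could be narrowed to something like the sketch below (reserv_vma_covers_range() is a made-up helper, and this is still somewhat conservative):

	if (vm_flags & VM_SPECIAL)
		goto cannot_expand;

	/* Refuse expansion only when a neighbour lies in a different
	 * reservation than the range being mapped. */
	if ((vm_flags & VM_PCUABI_RESERVE) &&
	    ((prev && !reserv_vma_covers_range(prev, addr, len)) ||
	     (next && !reserv_vma_covers_range(next, addr, len))))
		goto cannot_expand;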
Kevin
goto cannot_expand;
/* Attempt to expand an old mapping */ [...]
Use the recently introduced PCuABI reservation interfaces for the mremap syscall. The PCuABI specification for mremap does not allow expanding existing mappings, so they are moved instead when the MREMAP_MAYMOVE flag is present. Apply the relevant capability constraint checks to the input user memory addresses.
Modify vma_merge() to check for similar reservation properties before merging the vma.
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- mm/mmap.c | 10 ++++- mm/mremap.c | 106 +++++++++++++++++++++++++++++++++++++++++----------- 2 files changed, 93 insertions(+), 23 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c index 173fb75f7122..803b18c7d746 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -954,7 +954,9 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, /* Can we merge the predecessor? */ if (addr == prev->vm_end && mpol_equal(vma_policy(prev), policy) && can_vma_merge_after(prev, vm_flags, anon_vma, file, - pgoff, vm_userfaultfd_ctx, anon_name)) { + pgoff, vm_userfaultfd_ctx, anon_name) + && ((vm_flags & VM_PCUABI_RESERVE) ? + reserv_vma_match_properties(prev, curr) : 1)) { merge_prev = true; vma_prev(vmi); } @@ -963,7 +965,9 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, /* Can we merge the successor? */ if (next && mpol_equal(policy, vma_policy(next)) && can_vma_merge_before(next, vm_flags, anon_vma, file, pgoff+pglen, - vm_userfaultfd_ctx, anon_name)) { + vm_userfaultfd_ctx, anon_name) + && ((vm_flags & VM_PCUABI_RESERVE) ? + reserv_vma_match_properties(prev, curr) : 1)) { merge_next = true; }
@@ -3362,6 +3366,8 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, new_vma->vm_start = addr; new_vma->vm_end = addr + len; new_vma->vm_pgoff = pgoff; + if (vma->vm_flags & VM_PCUABI_RESERVE) + reserv_vma_insert_entry(new_vma, addr, len, 0); if (vma_dup_policy(vma, new_vma)) goto out_free_vma; if (anon_vma_clone(new_vma, vma)) diff --git a/mm/mremap.c b/mm/mremap.c index b52592303e8b..6d92b233f0e6 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -25,6 +25,7 @@ #include <linux/uaccess.h> #include <linux/userfaultfd_k.h> #include <linux/mempolicy.h> +#include <linux/cap_addr_mgmt.h>
#include <asm/cacheflush.h> #include <asm/tlb.h> @@ -569,7 +570,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma, }
static unsigned long move_vma(struct vm_area_struct *vma, - unsigned long old_addr, unsigned long old_len, + user_uintptr_t user_old_addr, unsigned long old_len, unsigned long new_len, unsigned long new_addr, bool *locked, unsigned long flags, struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap) @@ -586,6 +587,7 @@ static unsigned long move_vma(struct vm_area_struct *vma, int err = 0; bool need_rmap_locks; struct vma_iterator vmi; + unsigned long old_addr = (ptraddr_t)user_old_addr;
/* * We'd prefer to avoid failure later on in do_munmap: @@ -703,7 +705,7 @@ static unsigned long move_vma(struct vm_area_struct *vma, }
vma_iter_init(&vmi, mm, old_addr); - if (do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false) < 0) { + if (do_vmi_munmap(&vmi, mm, user_old_addr, old_len, uf_unmap, false) < 0) { /* OOM: unable to split vma, just get accounts right */ if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) vm_acct_memory(old_len >> PAGE_SHIFT); @@ -785,16 +787,19 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr, return vma; }
-static unsigned long mremap_to(unsigned long addr, unsigned long old_len, - unsigned long new_addr, unsigned long new_len, bool *locked, +static user_uintptr_t mremap_to(user_uintptr_t user_addr, unsigned long old_len, + user_uintptr_t user_new_addr, unsigned long new_len, bool *locked, unsigned long flags, struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap_early, struct list_head *uf_unmap) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - unsigned long ret = -EINVAL; + user_uintptr_t ret = -EINVAL; unsigned long map_flags = 0; + unsigned long addr = (ptraddr_t)user_addr; + unsigned long new_addr = (ptraddr_t)user_new_addr; + unsigned long old_perm = 0;
if (offset_in_page(new_addr)) goto out; @@ -824,13 +829,13 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, return -ENOMEM;
if (flags & MREMAP_FIXED) { - ret = do_munmap(mm, new_addr, new_len, uf_unmap_early); + ret = do_munmap(mm, user_new_addr, new_len, uf_unmap_early); if (ret) goto out; }
if (old_len > new_len) { - ret = do_munmap(mm, addr+new_len, old_len - new_len, uf_unmap); + ret = do_munmap(mm, user_addr + new_len, old_len - new_len, uf_unmap); if (ret) goto out; old_len = new_len; @@ -865,9 +870,18 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, if (!(flags & MREMAP_FIXED)) new_addr = ret;
- ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf, +#ifdef CONFIG_CHERI_PURECAP_UABI + old_perm = vma->reserv_perm; +#endif + ret = move_vma(vma, user_addr, old_len, new_len, new_addr, locked, flags, uf, uf_unmap);
+ if (!IS_ERR_VALUE(ret)) { + if (vma->vm_flags & VM_PCUABI_RESERVE) + ret = build_owning_capability(new_addr, new_len, old_perm); + else + ret = (user_uintptr_t)uaddr_to_user_ptr_safe(new_addr); + } out: return ret; } @@ -893,9 +907,9 @@ static int vma_expandable(struct vm_area_struct *vma, unsigned long delta) * MREMAP_FIXED option added 5-Dec-1999 by Benjamin LaHaise * This option implies MREMAP_MAYMOVE. */ -SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len, +SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, user_addr, unsigned long, old_len, unsigned long, new_len, unsigned long, flags, - user_uintptr_t, new_addr) + user_uintptr_t, user_new_addr) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; @@ -903,10 +917,15 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len bool locked = false; bool downgraded = false; struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX; + unsigned long addr = (ptraddr_t)user_addr; + unsigned long new_addr = (ptraddr_t)user_new_addr; + unsigned long old_perm = 0; LIST_HEAD(uf_unmap_early); LIST_HEAD(uf_unmap); +#ifdef CONFIG_CHERI_PURECAP_UABI + struct vm_area_struct *vma_new; +#endif
- /* @TODO [PCuABI] - capability validation */ /* * There is a deliberate asymmetry here: we strip the pointer tag * from the old address but leave the new address alone. This is @@ -918,6 +937,9 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len * See Documentation/arm64/tagged-address-abi.rst for more information. */ addr = untagged_addr(addr); +#ifdef CONFIG_CHERI_PURECAP_UABI + user_addr = cheri_address_set(user_addr, addr); +#endif
if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP)) return ret; @@ -956,6 +978,40 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
+#ifdef CONFIG_CHERI_PURECAP_UABI + if (!(vma->vm_flags & VM_PCUABI_RESERVE)) + goto skip_pcuabi_checks; + if (!capability_owns_range(user_addr, addr, old_len ? old_len : new_len)) + goto out; + if (!reserv_vma_range_fully_mapped(vma, addr, old_len ? old_len : new_len)) { + ret = -ERESERVATION; + goto out; + } + if (cheri_tag_get(user_new_addr)) { + if (!capability_owns_range(user_new_addr, new_addr, new_len) || + !(flags & MREMAP_FIXED)) + goto out; + vma_new = vma_lookup(mm, new_addr); + if (!reserv_vma_range_fully_mapped(vma_new, new_addr, new_len)) { + ret = -ERESERVATION; + goto out; + } + } else { + if (!cheri_is_null_derived(user_new_addr)) + goto out; + } + /* + * If new_len > old_len and flags does not contain MREMAP_MAYMOVE + * then this fails as PCuABI does not allow increasing reservation. + */ + if (new_len > old_len && !(flags & MREMAP_MAYMOVE)) { + ret = -ERESERVATION; + goto out; + } + old_perm = vma->reserv_perm; +skip_pcuabi_checks: +#endif + if (is_vm_hugetlb_page(vma)) { struct hstate *h __maybe_unused = hstate_vma(vma);
@@ -977,7 +1033,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len }
if (flags & (MREMAP_FIXED | MREMAP_DONTUNMAP)) { - ret = mremap_to(addr, old_len, new_addr, new_len, + ret = mremap_to(user_addr, old_len, user_new_addr, new_len, &locked, flags, &uf, &uf_unmap_early, &uf_unmap); goto out; @@ -993,7 +1049,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len int retval; VMA_ITERATOR(vmi, mm, addr + new_len);
- retval = do_vmi_munmap(&vmi, mm, addr + new_len, + retval = do_vmi_munmap(&vmi, mm, user_addr + new_len, old_len - new_len, &uf_unmap, true); /* Returning 1 indicates mmap_lock is downgraded to read. */ if (retval == 1) { @@ -1003,7 +1059,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
- ret = addr; + ret = user_addr; goto out; }
@@ -1019,8 +1075,13 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len /* old_len exactly to the end of the area.. */ if (old_len == vma->vm_end - addr) { - /* can we just expand the current mapping? */ - if (vma_expandable(vma, new_len - old_len)) { + /* + * can we just expand the current mapping? + * PCuABI specification does not allow increasing reservation + * size so just skip this path. + */ + if (!(vma->vm_flags & VM_PCUABI_RESERVE) && + vma_expandable(vma, new_len - old_len)) { long pages = (new_len - old_len) >> PAGE_SHIFT; unsigned long extension_start = addr + old_len; unsigned long extension_end = addr + new_len; @@ -1083,8 +1144,14 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
- ret = move_vma(vma, addr, old_len, new_len, new_addr, + ret = move_vma(vma, user_addr, old_len, new_len, new_addr, &locked, flags, &uf, &uf_unmap); + if (!IS_ERR_VALUE(ret)) { + if (vma->vm_flags & VM_PCUABI_RESERVE) + ret = build_owning_capability(new_addr, new_len, old_perm); + else + ret = (user_uintptr_t)uaddr_to_user_ptr_safe(new_addr); + } } out: if (offset_in_page(ret)) @@ -1098,8 +1165,5 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len userfaultfd_unmap_complete(mm, &uf_unmap_early); mremap_userfaultfd_complete(&uf, addr, ret, old_len); userfaultfd_unmap_complete(mm, &uf_unmap); - /* TODO [PCuABI] - derive proper capability */ - return IS_ERR_VALUE(ret) ? - ret : - (user_intptr_t)uaddr_to_user_ptr_safe((ptraddr_t)ret); + return ret; }
On 25/10/2023 07:36, Amit Daniel Kachhap wrote:
[...]
@@ -3362,6 +3366,8 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, new_vma->vm_start = addr; new_vma->vm_end = addr + len; new_vma->vm_pgoff = pgoff;
if (vma->vm_flags & VM_PCUABI_RESERVE)
reserv_vma_insert_entry(new_vma, addr, len, 0); if (vma_dup_policy(vma, new_vma)) goto out_free_vma; if (anon_vma_clone(new_vma, vma))
I don't think these changes are particularly related to mremap(); mappings can be split or merged for other reasons. It would make sense to have a separate patch for everything concerning mapping/reservation interactions that isn't related to a specific syscall.
[...]
@@ -956,6 +978,40 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; } +#ifdef CONFIG_CHERI_PURECAP_UABI
- if (!(vma->vm_flags & VM_PCUABI_RESERVE))
goto skip_pcuabi_checks;
- if (!capability_owns_range(user_addr, addr, old_len ? old_len : new_len))
goto out;
- if (!reserv_vma_range_fully_mapped(vma, addr, old_len ? old_len : new_len)) {
I'm not sure why we need this check; we are not creating a new mapping on the old_addr side.
ret = -ERESERVATION;
goto out;
- }
- if (cheri_tag_get(user_new_addr)) {
if (!capability_owns_range(user_new_addr, new_addr, new_len) ||
!(flags & MREMAP_FIXED))
The spec clarifies that new_addr is only considered if MREMAP_FIXED is specified (because it is an optional argument). We should skip those checks otherwise.
goto out;
vma_new = vma_lookup(mm, new_addr);
if (!reserv_vma_range_fully_mapped(vma_new, new_addr, new_len)) {
ret = -ERESERVATION;
goto out;
}
- } else {
if (!cheri_is_null_derived(user_new_addr))
goto out;
- }
- /*
* If new_len > old_len and flags does not contain MREMAP_MAYMOVE
* then this fails as PCuABI does not allow increasing reservation.
*/
- if (new_len > old_len && !(flags & MREMAP_MAYMOVE)) {
ret = -ERESERVATION;
Given the rule in the spec, that is:
The semantics of MREMAP_MAYMOVE is modified so that it is considered
that a mapping must be moved if the range (old_address.address, new_size) does not fit in the reservation to which the old mapping belongs, in addition to existing conditions (i.e. not clashing with existing mappings).
I don't think we need any explicit check: if the mapping cannot be grown in place, and MREMAP_MAYMOVE is not passed, then we will fail with -ENOMEM as usual, which is exactly what is specified.
goto out;
- }
- old_perm = vma->reserv_perm;
+skip_pcuabi_checks: +#endif
- if (is_vm_hugetlb_page(vma)) { struct hstate *h __maybe_unused = hstate_vma(vma);
@@ -977,7 +1033,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len } if (flags & (MREMAP_FIXED | MREMAP_DONTUNMAP)) {
ret = mremap_to(addr, old_len, new_addr, new_len,
ret = mremap_to(user_addr, old_len, user_new_addr, new_len, &locked, flags, &uf, &uf_unmap_early, &uf_unmap); goto out;
@@ -993,7 +1049,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len int retval; VMA_ITERATOR(vmi, mm, addr + new_len);
retval = do_vmi_munmap(&vmi, mm, addr + new_len,
retval = do_vmi_munmap(&vmi, mm, user_addr + new_len, old_len - new_len, &uf_unmap, true); /* Returning 1 indicates mmap_lock is downgraded to read. */ if (retval == 1) {
@@ -1003,7 +1059,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
ret = addr;
ret = user_addr; goto out; }
@@ -1019,8 +1075,13 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len /* old_len exactly to the end of the area.. */ if (old_len == vma->vm_end - addr) {
/* can we just expand the current mapping? */
if (vma_expandable(vma, new_len - old_len)) {
/*
* can we just expand the current mapping?
* PCuABI specification does not allow increasing reservation
* size so just skip this path.
Reservations cannot be expanded, but mappings still can. Sure, in most cases the mapping ends where the reservation does, but we shouldn't assume it. We should rather check whether there is enough free space after the mapping, within the reservation - I suppose this means modifying vma_expandable().
I'm realising there is some ambiguity on this point in the spec, as the Reservations section claims that unmapped space within a reservation cannot be reused. That's true when it comes to creating new mappings (e.g. mmap), but expanding mappings is still allowed. I will amend the spec to clarify this.
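As a sketch of that direction (reserv_len is an assumed field name for however the reservation limit ends up being exposed; untested):

	/* Like vma_expandable(), but additionally require the grown mapping
	 * to stay within the reservation the VMA belongs to. */
	static bool vma_expandable_pcuabi(struct vm_area_struct *vma,
					  unsigned long delta)
	{
		unsigned long end = vma->vm_end + delta;

		if (end < vma->vm_end)	/* overflow */
			return false;
		if ((vma->vm_flags & VM_PCUABI_RESERVE) &&
		    end > vma->reserv_start + vma->reserv_len)
			return false;
		return vma_expandable(vma, delta);
	}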
Kevin
*/
if (!(vma->vm_flags & VM_PCUABI_RESERVE) &&
vma_expandable(vma, new_len - old_len)) { long pages = (new_len - old_len) >> PAGE_SHIFT; unsigned long extension_start = addr + old_len; unsigned long extension_end = addr + new_len;
@@ -1083,8 +1144,14 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
ret = move_vma(vma, addr, old_len, new_len, new_addr,
ret = move_vma(vma, user_addr, old_len, new_len, new_addr, &locked, flags, &uf, &uf_unmap);
if (!IS_ERR_VALUE(ret)) {
if (vma->vm_flags & VM_PCUABI_RESERVE)
ret = build_owning_capability(new_addr, new_len, old_perm);
else
ret = (user_uintptr_t)uaddr_to_user_ptr_safe(new_addr);
}}
out: if (offset_in_page(ret)) @@ -1098,8 +1165,5 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len userfaultfd_unmap_complete(mm, &uf_unmap_early); mremap_userfaultfd_complete(&uf, addr, ret, old_len); userfaultfd_unmap_complete(mm, &uf_unmap);
- /* TODO [PCuABI] - derive proper capability */
- return IS_ERR_VALUE(ret) ?
ret :
(user_intptr_t)uaddr_to_user_ptr_safe((ptraddr_t)ret);
- return ret;
}
Hi,
On 10/27/23 19:36, Kevin Brodsky wrote:
On 25/10/2023 07:36, Amit Daniel Kachhap wrote:
[...]
@@ -3362,6 +3366,8 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, new_vma->vm_start = addr; new_vma->vm_end = addr + len; new_vma->vm_pgoff = pgoff;
if (vma->vm_flags & VM_PCUABI_RESERVE)
reserv_vma_insert_entry(new_vma, addr, len, 0); if (vma_dup_policy(vma, new_vma)) goto out_free_vma; if (anon_vma_clone(new_vma, vma))
I don't think these changes are particularly related to mremap(), mapping can be split or merged for other reasons. It would make sense to have a separate patch for everything concerning mapping / reservation interactions that isn't related to a specific syscall.
mremap() is the only user of copy_vma(). Anyway, I can try putting the vma operations (copy_vma(), vma_merge()) into a single patch.
[...]
@@ -956,6 +978,40 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; } +#ifdef CONFIG_CHERI_PURECAP_UABI
- if (!(vma->vm_flags & VM_PCUABI_RESERVE))
goto skip_pcuabi_checks;
- if (!capability_owns_range(user_addr, addr, old_len ? old_len : new_len))
goto out;
- if (!reserv_vma_range_fully_mapped(vma, addr, old_len ? old_len : new_len)) {
I'm not sure why we need this check, we are not creating a new mapping on the old_addr side.
ok.
ret = -ERESERVATION;
goto out;
- }
- if (cheri_tag_get(user_new_addr)) {
if (!capability_owns_range(user_new_addr, new_addr, new_len) ||
!(flags & MREMAP_FIXED))
The spec clarifies that new_addr is only considered if MREMAP_FIXED is specified (because it is an optional argument). We should skip those checks otherwise.
ok.
goto out;
vma_new = vma_lookup(mm, new_addr);
if (!reserv_vma_range_fully_mapped(vma_new, new_addr, new_len)) {
ret = -ERESERVATION;
goto out;
}
- } else {
if (!cheri_is_null_derived(user_new_addr))
goto out;
- }
- /*
* If new_len > old_len and flags does not contain MREMAP_MAYMOVE
* then this fails as PCuABI does not allow increasing reservation.
*/
- if (new_len > old_len && !(flags & MREMAP_MAYMOVE)) {
ret = -ERESERVATION;
Given the rule in the spec, that is:
The semantics of MREMAP_MAYMOVE is modified so that it is considered
that a mapping must be moved if the range (old_address.address, new_size) does not fit in the reservation to which the old mapping belongs, in addition to existing conditions (i.e. not clashing with existing mappings).
I don't think we need any explicit check: if the mapping cannot be grown in place, and MREMAP_MAYMOVE is not passed, then we will fail with -ENOMEM as usual, which is exactly what is specified.
ok.
goto out;
- }
- old_perm = vma->reserv_perm;
+skip_pcuabi_checks: +#endif
- if (is_vm_hugetlb_page(vma)) { struct hstate *h __maybe_unused = hstate_vma(vma);
@@ -977,7 +1033,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len } if (flags & (MREMAP_FIXED | MREMAP_DONTUNMAP)) {
ret = mremap_to(addr, old_len, new_addr, new_len,
ret = mremap_to(user_addr, old_len, user_new_addr, new_len, &locked, flags, &uf, &uf_unmap_early, &uf_unmap); goto out;
@@ -993,7 +1049,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len int retval; VMA_ITERATOR(vmi, mm, addr + new_len);
retval = do_vmi_munmap(&vmi, mm, addr + new_len,
retval = do_vmi_munmap(&vmi, mm, user_addr + new_len, old_len - new_len, &uf_unmap, true); /* Returning 1 indicates mmap_lock is downgraded to read. */ if (retval == 1) {
@@ -1003,7 +1059,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
ret = addr;
ret = user_addr; goto out; }
@@ -1019,8 +1075,13 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len /* old_len exactly to the end of the area.. */ if (old_len == vma->vm_end - addr) {
/* can we just expand the current mapping? */
if (vma_expandable(vma, new_len - old_len)) {
/*
* can we just expand the current mapping?
* PCuABI specification does not allow increasing reservation
* size so just skip this path.
Reservations cannot be expanded, but mappings still can. Sure, in most cases the mapping ends where the reservation does, but we shouldn't assume it. We should rather check whether there is enough free space after the mapping, within the reservation - I suppose this means modifying vma_expandable().
ok
I'm realising there is some ambiguity on this point in the spec, as the Reservations section claims that unmapped space within a reservation cannot be reused. That's true when it comes to creating new mappings (e.g. mmap), but expanding mappings is still allowed. I will amend the spec to clarify this.
Yes, this will make it clearer.
//Amit
Kevin
*/
if (!(vma->vm_flags & VM_PCUABI_RESERVE) &&
vma_expandable(vma, new_len - old_len)) { long pages = (new_len - old_len) >> PAGE_SHIFT; unsigned long extension_start = addr + old_len; unsigned long extension_end = addr + new_len;
@@ -1083,8 +1144,14 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
ret = move_vma(vma, addr, old_len, new_len, new_addr,
ret = move_vma(vma, user_addr, old_len, new_len, new_addr, &locked, flags, &uf, &uf_unmap);
if (!IS_ERR_VALUE(ret)) {
if (vma->vm_flags & VM_PCUABI_RESERVE)
ret = build_owning_capability(new_addr, new_len, old_perm);
else
ret = (user_uintptr_t)uaddr_to_user_ptr_safe(new_addr);
} } out: if (offset_in_page(ret))
@@ -1098,8 +1165,5 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len userfaultfd_unmap_complete(mm, &uf_unmap_early); mremap_userfaultfd_complete(&uf, addr, ret, old_len); userfaultfd_unmap_complete(mm, &uf_unmap);
- /* TODO [PCuABI] - derive proper capability */
- return IS_ERR_VALUE(ret) ?
ret :
(user_intptr_t)uaddr_to_user_ptr_safe((ptraddr_t)ret);
- return ret; }
On 03/11/2023 08:53, Amit Daniel Kachhap wrote:
On 10/27/23 19:36, Kevin Brodsky wrote:
On 25/10/2023 07:36, Amit Daniel Kachhap wrote:
[...]
@@ -3362,6 +3366,8 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, new_vma->vm_start = addr; new_vma->vm_end = addr + len; new_vma->vm_pgoff = pgoff; + if (vma->vm_flags & VM_PCUABI_RESERVE) + reserv_vma_insert_entry(new_vma, addr, len, 0); if (vma_dup_policy(vma, new_vma)) goto out_free_vma; if (anon_vma_clone(new_vma, vma))
I don't think these changes are particularly related to mremap(), mapping can be split or merged for other reasons. It would make sense to have a separate patch for everything concerning mapping / reservation interactions that isn't related to a specific syscall.
mremap is the only user of copy_vma(). Anyways I can try adding vma operations (copy_vma, merge_vma) in a single patch.
Right, sorry, I didn't mean this hunk, as it is indeed only relevant to mremap(), but rather the changes to vma_merge(), which impact most m* syscalls.
Kevin
On 27/10/2023 16:06, Kevin Brodsky wrote:
@@ -1019,8 +1075,13 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len /* old_len exactly to the end of the area.. */ if (old_len == vma->vm_end - addr) {
/* can we just expand the current mapping? */
if (vma_expandable(vma, new_len - old_len)) {
/*
* can we just expand the current mapping?
* PCuABI specification does not allow increasing reservation
* size so just skip this path.
Reservations cannot be expanded, but mappings still can. Sure, in most cases the mapping ends where the reservation does, but we shouldn't assume it. We should rather check whether there is enough free space after the mapping, within the reservation - I suppose this means modifying vma_expandable().
I'm realising there is some ambiguity on this point in the spec, as the Reservations section claims that unmapped space within a reservation cannot be reused. That's true when it comes to creating new mappings (e.g. mmap), but expanding mappings is still allowed. I will amend the spec to clarify this.
The spec is now amended to prevent only the creation of new mappings in unmapped space, allowing mremap() to grow mappings in-place [1].
Kevin
[1] https://git.morello-project.org/morello/kernel/linux/-/wikis/Morello-pure-ca...
Use the recently introduced PCuABI reservation interfaces to verify the address range for the mprotect() syscall.
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- mm/mprotect.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c index c9188e2cb2a6..5cefc65b5dbd 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -32,6 +32,7 @@ #include <linux/sched/sysctl.h> #include <linux/userfaultfd_k.h> #include <linux/memory-tiers.h> +#include <linux/cap_addr_mgmt.h> #include <asm/cacheflush.h> #include <asm/mmu_context.h> #include <asm/tlbflush.h> @@ -728,7 +729,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb, /* * pkey==-1 when doing a legacy mprotect() */ -static int do_mprotect_pkey(user_uintptr_t start, size_t len, +static int do_mprotect_pkey(user_uintptr_t user_start, size_t len, unsigned long prot, int pkey) { unsigned long nstart, end, tmp, reqprot; @@ -739,9 +740,7 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, (prot & PROT_READ); struct mmu_gather tlb; struct vma_iterator vmi; - - /* TODO [PCuABI] - capability checks for uaccess */ - start = untagged_addr(start); + unsigned long start = untagged_addr((ptraddr_t)user_start);
prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP); if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */ @@ -755,6 +754,12 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, end = start + len; if (end <= start) return -ENOMEM; + +#ifdef CONFIG_CHERI_PURECAP_UABI + user_start = cheri_address_set(user_start, start); +#endif + if (!capability_owns_range(user_start, start, len)) + return -EINVAL; if (!arch_validate_prot(prot, start)) return -EINVAL;
@@ -812,6 +817,12 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, break; }
+ /* Check if the capability range is valid with mmap lock. */ + if (!reserv_vma_match_capability(vma, user_start)) { + error = -ERESERVATION; + break; + } + /* Does the application expect PROT_READ to imply PROT_EXEC */ if (rier && (vma->vm_flags & VM_MAYEXEC)) prot |= PROT_EXEC;
Use the recently introduced PCuABI reservation interfaces to verify the address range for the madvise() syscall.
The do_madvise() function is also used by the virtual address monitoring component DAMON, which may not satisfy the reservation range criteria, so add a parameter to skip the reservation checks.
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- include/linux/mm.h | 3 ++- io_uring/advise.c | 2 +- mm/damon/vaddr.c | 2 +- mm/madvise.c | 27 +++++++++++++++++++++++---- 4 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index 77e3374c65e0..c75c12d54fc7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3152,7 +3152,8 @@ extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, bool downgrade); extern int do_munmap(struct mm_struct *, user_uintptr_t, size_t, struct list_head *uf); -extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior); +extern int do_madvise(struct mm_struct *mm, user_uintptr_t start, size_t len_in, + int behavior, bool reserv_ignore);
#ifdef CONFIG_MMU extern int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, diff --git a/io_uring/advise.c b/io_uring/advise.c index 952d9289a311..2e43142cf4df 100644 --- a/io_uring/advise.c +++ b/io_uring/advise.c @@ -55,7 +55,7 @@ int io_madvise(struct io_kiocb *req, unsigned int issue_flags) WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
/* TODO [PCuABI] - capability checks for uaccess */ - ret = do_madvise(current->mm, user_ptr_addr(ma->addr), ma->len, ma->advice); + ret = do_madvise(current->mm, (user_uintptr_t)ma->addr, ma->len, ma->advice, false); io_req_set_res(req, ret, 0); return IOU_OK; #else diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 1fec16d7263e..fcdd3f4f608f 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -623,7 +623,7 @@ static unsigned long damos_madvise(struct damon_target *target, if (!mm) return 0;
- applied = do_madvise(mm, start, len, behavior) ? 0 : len; + applied = do_madvise(mm, start, len, behavior, true) ? 0 : len; mmput(mm);
return applied; diff --git a/mm/madvise.c b/mm/madvise.c index ad59e1e07ec9..d815f5647678 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -31,6 +31,7 @@ #include <linux/swapops.h> #include <linux/shmem_fs.h> #include <linux/mmu_notifier.h> +#include <linux/cap_addr_mgmt.h>
#include <asm/tlb.h>
@@ -1382,13 +1383,16 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, * -EBADF - map exists, but area maps something that isn't a file. * -EAGAIN - a kernel resource was temporarily unavailable. */ -int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior) +int do_madvise(struct mm_struct *mm, user_uintptr_t user_start, size_t len_in, + int behavior, bool reserv_ignore) { unsigned long end; int error; int write; size_t len; struct blk_plug plug; + unsigned long start = (ptraddr_t)user_start; + struct vma_iterator vmi;
if (!madvise_behavior_valid(behavior)) return -EINVAL; @@ -1421,14 +1425,29 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh mmap_read_lock(mm); }
- /* TODO [PCuABI] - capability checks for uaccess */ start = untagged_addr_remote(mm, start); end = start + len;
+#ifdef CONFIG_CHERI_PURECAP_UABI + user_start = cheri_address_set(user_start, start); +#endif + if (!reserv_ignore) { + vma_iter_init(&vmi, current->mm, start); + if (!capability_owns_range(user_start, start, len)) { + error = -EINVAL; + goto out; + } + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_vmi_match_capability(&vmi, user_start)) { + error = -ERESERVATION; + goto out; + } + } blk_start_plug(&plug); error = madvise_walk_vmas(mm, start, end, behavior, madvise_vma_behavior); blk_finish_plug(&plug); +out: if (write) mmap_write_unlock(mm); else @@ -1439,7 +1458,7 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
SYSCALL_DEFINE3(madvise, user_uintptr_t, start, size_t, len_in, int, behavior) { - return do_madvise(current->mm, start, len_in, behavior); + return do_madvise(current->mm, start, len_in, behavior, false); }
SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, @@ -1494,7 +1513,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
while (iov_iter_count(&iter)) { ret = do_madvise(mm, user_ptr_addr(iter_iov_addr(&iter)), - iter_iov_len(&iter), behavior); + iter_iov_len(&iter), behavior, false); if (ret < 0) break; iov_iter_advance(&iter, iter_iov_len(&iter));
Use the recently introduced PCuABI reservation interfaces to verify the address range for the mlock(), mlock2() and munlock() syscalls.
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- mm/mlock.c | 38 +++++++++++++++++++++++++++++++------- 1 file changed, 31 insertions(+), 7 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c index 40b43f8740df..65bce6a234a4 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -25,6 +25,7 @@ #include <linux/memcontrol.h> #include <linux/mm_inline.h> #include <linux/secretmem.h> +#include <linux/cap_addr_mgmt.h>
#include "internal.h"
@@ -563,14 +564,19 @@ static int __mlock_posix_error_return(long retval) return retval; }
-static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t flags) +static __must_check int do_mlock(user_uintptr_t user_start, size_t len, vm_flags_t flags) { unsigned long locked; unsigned long lock_limit; int error = -ENOMEM; + unsigned long start = untagged_addr((ptraddr_t)user_start); + struct vma_iterator vmi;
- start = untagged_addr(start); - +#ifdef CONFIG_CHERI_PURECAP_UABI + user_start = cheri_address_set(user_start, start); + if (!capability_owns_range(user_start, start, len)) + return -EINVAL; +#endif if (!can_do_mlock()) return -EPERM;
@@ -584,6 +590,12 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla if (mmap_write_lock_killable(current->mm)) return -EINTR;
+ vma_iter_init(&vmi, current->mm, start); + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_vmi_match_capability(&vmi, user_start)) { + mmap_write_unlock(current->mm); + return -ERESERVATION; + } locked += current->mm->locked_vm; if ((locked > lock_limit) && (!capable(CAP_IPC_LOCK))) { /* @@ -610,12 +622,12 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla return 0; }
-SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len) +SYSCALL_DEFINE2(mlock, user_uintptr_t, start, size_t, len) { return do_mlock(start, len, VM_LOCKED); }
-SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) +SYSCALL_DEFINE3(mlock2, user_uintptr_t, start, size_t, len, int, flags) { vm_flags_t vm_flags = VM_LOCKED;
@@ -628,17 +640,29 @@ SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) return do_mlock(start, len, vm_flags); }
-SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len) +SYSCALL_DEFINE2(munlock, user_uintptr_t, user_start, size_t, len) { int ret; + unsigned long start = untagged_addr((ptraddr_t)user_start); + struct vma_iterator vmi;
- start = untagged_addr(start); +#ifdef CONFIG_CHERI_PURECAP_UABI + user_start = cheri_address_set(user_start, start); + if (!capability_owns_range(user_start, start, len)) + return -EINVAL; +#endif
len = PAGE_ALIGN(len + (offset_in_page(start))); start &= PAGE_MASK;
if (mmap_write_lock_killable(current->mm)) return -EINTR; + vma_iter_init(&vmi, current->mm, start); + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_vmi_match_capability(&vmi, user_start)) { + mmap_write_unlock(current->mm); + return -ERESERVATION; + } ret = apply_vma_lock_flags(start, len, 0); mmap_write_unlock(current->mm);
Use the recently introduced PCuABI reservation interfaces to verify the address range for the msync() syscall.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 mm/msync.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/msync.c b/mm/msync.c
index ac4c9bfea2e7..d0181c07524f 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -14,6 +14,7 @@
 #include <linux/file.h>
 #include <linux/syscalls.h>
 #include <linux/sched.h>
+#include <linux/cap_addr_mgmt.h>

 /*
  * MS_SYNC syncs the entire file - including mappings.
@@ -29,15 +30,20 @@
  * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to
  * applications.
  */
-SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
+SYSCALL_DEFINE3(msync, user_uintptr_t, user_start, size_t, len, int, flags)
 {
 	unsigned long end;
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	int unmapped_error = 0;
 	int error = -EINVAL;
+	unsigned long start = untagged_addr((ptraddr_t)user_start);

-	start = untagged_addr(start);
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	user_start = cheri_address_set(user_start, start);
+	if (!capability_owns_range(user_start, start, len))
+		return -EINVAL;
+#endif

 	if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC))
 		goto out;
@@ -61,6 +67,11 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
 	 */
 	mmap_read_lock(mm);
 	vma = find_vma(mm, start);
+	/* Check if the range exists within the reservation with mmap lock. */
+	if (vma && !reserv_vma_match_capability(vma, user_start)) {
+		error = -ERESERVATION;
+		goto out_unlock;
+	}
 	for (;;) {
 		struct file *file;
 		loff_t fstart, fend;
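As a usage illustration (a sketch, not part of the patch), a purecap caller simply passes the owning capability returned by mmap() back to msync(); a capability that does not match the VMA's reservation is expected to fail with -ERESERVATION:

#include <stddef.h>
#include <sys/mman.h>

/* Sketch: flush a shared file-backed mapping via its owning capability. */
static int msync_example(int fd, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	if (p == MAP_FAILED)
		return -1;

	/* p is the owning capability for the whole mapping. */
	return msync(p, len, MS_SYNC);
}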
The PCuABI specification restricts expanding capability permissions through the mprotect() system call. This requires capabilities to be created initially with the maximum permissions that their memory mappings may possess during their lifetime. Add the PROT_MAX() and related macros so that this maximum can be expressed in the mapping protection flags.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 include/uapi/asm-generic/mman-common.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 6ce1f1ceb432..e7ba511c2bad 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -17,6 +17,12 @@
 #define PROT_GROWSDOWN	0x01000000	/* mprotect flag: extend change to start of growsdown vma */
 #define PROT_GROWSUP	0x02000000	/* mprotect flag: extend change to end of growsup vma */

+/* PCuABI mapping and capability permissions */
+#define _PROT_MAX_SHIFT 16
+#define PROT_MAX(prot) ((prot) << _PROT_MAX_SHIFT)
+#define PROT_EXTRACT(prot) ((prot) & (PROT_READ | PROT_WRITE | PROT_EXEC))
+#define PROT_MAX_EXTRACT(prot) (((prot) >> _PROT_MAX_SHIFT) & (PROT_READ | PROT_WRITE | PROT_EXEC))
+
 /* 0x01 - 0x03 are defined in linux/mman.h */
 #define MAP_TYPE	0x0f		/* Mask for type of mapping */
 #define MAP_FIXED	0x10		/* Interpret addr exactly */
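As an illustration of the intended usage (a sketch only, assuming the updated UAPI header is visible to userspace), an application that wants to start read-only but later upgrade to read-write can request the wider maximum up front with PROT_MAX():

#include <stddef.h>
#include <sys/mman.h>

/* Sketch: map read-only now, but keep PROT_WRITE within the maximum. */
static void *map_upgradable(size_t len)
{
	void *p = mmap(NULL, len,
		       PROT_READ | PROT_MAX(PROT_READ | PROT_WRITE),
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	/* Allowed: PROT_WRITE does not exceed the maximum requested above. */
	if (mprotect(p, len, PROT_READ | PROT_WRITE)) {
		munmap(p, len);
		return NULL;
	}

	return p;
}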
Helper functions capability_may_set_prot(), mapping_to_capability_perm() and build_owning_capability() are added or modified to manage capability permissions in the address space management syscalls, as per the PCuABI specification.

Also add an arch-specific hook, arch_map_to_cap_perm(), to manage architecture-specific capability permissions.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 arch/arm64/include/asm/cap_addr_mgmt.h | 22 +++++++
 include/linux/cap_addr_mgmt.h          | 42 ++++++++++--
 mm/cap_addr_mgmt.c                     | 88 ++++++++++++++++++++------
 mm/mmap.c                              |  8 ++-
 4 files changed, 133 insertions(+), 27 deletions(-)
 create mode 100644 arch/arm64/include/asm/cap_addr_mgmt.h

diff --git a/arch/arm64/include/asm/cap_addr_mgmt.h b/arch/arm64/include/asm/cap_addr_mgmt.h
new file mode 100644
index 000000000000..aadb4768d2fd
--- /dev/null
+++ b/arch/arm64/include/asm/cap_addr_mgmt.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_CAP_ADDR_MGMT_H
+#define __ASM_CAP_ADDR_MGMT_H
+
+#include <linux/cheri.h>
+#include <linux/mman.h>
+
+static __always_inline cheri_perms_t arch_map_to_cap_perm(int prot, bool has_tag_access)
+{
+	cheri_perms_t perms = 0;
+
+	if ((prot & PROT_READ) && has_tag_access)
+		perms |= ARM_CAP_PERMISSION_MUTABLE_LOAD;
+
+	if ((prot & PROT_EXEC) &&
+	    (cheri_perms_get(cheri_pcc_get()) & ARM_CAP_PERMISSION_EXECUTIVE))
+		perms |= ARM_CAP_PERMISSION_EXECUTIVE;
+
+	return perms;
+}
+#define arch_map_to_cap_perm arch_map_to_cap_perm
+#endif /* __ASM_CAP_ADDR_MGMT_H */
diff --git a/include/linux/cap_addr_mgmt.h b/include/linux/cap_addr_mgmt.h
index 60af5633ef6f..717f51008bc0 100644
--- a/include/linux/cap_addr_mgmt.h
+++ b/include/linux/cap_addr_mgmt.h
@@ -7,6 +7,7 @@
 #include <linux/list.h>
 #include <linux/mm_types.h>
 #include <linux/types.h>
+#include <asm/cap_addr_mgmt.h>

 #ifdef CONFIG_CHERI_PURECAP_UABI
 #define CHERI_REPRESENTABLE_ALIGNMENT(len) \
@@ -26,13 +27,13 @@
  * @vma: VMA pointer to insert the reservation entry.
  * @start: Reservation start value.
  * @len: Reservation length.
- * @perm: Memory mapping permission for the reserved range.
+ * @perm: Capability permission for the reserved range.
  *
  * Return: 0 if reservation entry added successfully or -ERESERVATION
  * otherwise.
  */
 int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start,
-			    unsigned long len, unsigned long perm);
+			    unsigned long len, cheri_perms_t perm);

 /**
  * reserv_range_insert_entry() - Adds the reservation details across the VMA's
@@ -42,13 +43,13 @@ int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start,
  * This function internally uses mmap_lock to synchronize the vma updates.
  * @start: Reservation start value.
  * @len: Reservation length.
- * @perm: Memory mapping permission for the reserved range.
+ * @perm: Capability permission for the reserved range.
  *
  * Return: valid capability with bounded range and requested permission or
  * negative errorcode otherwise.
  */
 user_uintptr_t reserv_range_insert_entry(unsigned long start, unsigned long len,
-					 unsigned long perm);
+					 cheri_perms_t perm);

 /**
  * reserv_vmi_range_fully_mapped() - Searches the reservation interface for the
@@ -130,9 +131,38 @@ bool capability_owns_range(user_uintptr_t cap, unsigned long addr, unsigned long
  * requested base address, length and memory protection flags.
  * @addr: Requested capability address.
  * @len: Requested capability length.
- * @perm: Requested memory mapping permission flags.
+ * @perm: Requested capability permission flags.
  *
 * Return: A new capability derived from cheri_user_root_cap.
 */
-user_uintptr_t build_owning_capability(unsigned long addr, unsigned long len, unsigned long perm);
+user_uintptr_t build_owning_capability(unsigned long addr, unsigned long len, cheri_perms_t perm);
+
+/**
+ * capability_may_set_prot() - Verify if the mapping protection flags confirms
+ * with the capability permission flags.
+ * @cap: Capability value.
+ * @prot: Memory protection flags.
+ *
+ * Return: True if the capability permissions includes the protection flags
+ * or false otherwise.
+ */
+bool capability_may_set_prot(user_uintptr_t cap, int prot);
+
+/**
+ * mapping_to_capability_perm() - Converts memory mapping protection flags to
+ * capability permission flags.
+ * @prot: Memory protection flags.
+ * @has_tag_access: Capability permissions to have tag check flags.
+ *
+ * Return: Capability permission flags
+ */
+cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access);
+
+#ifndef arch_map_to_cap_perm
+static __always_inline cheri_perms_t arch_map_to_cap_perm(int prot,
+							   bool has_tag_access)
+{
+	return 0;
+}
+#endif /* arch_map_to_cap_perm */
 #endif /* _LINUX_CAP_ADDR_MGMT_H */
diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c
index a4a85b37d59d..b451fa279a48 100644
--- a/mm/cap_addr_mgmt.c
+++ b/mm/cap_addr_mgmt.c
@@ -9,12 +9,8 @@
 #ifdef CONFIG_CHERI_PURECAP_UABI

 int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start,
-			    unsigned long len, unsigned long perm)
+			    unsigned long len, cheri_perms_t perm)
 {
-	/* TODO [PCuABI] - capability permission conversion from memory permission */
-	cheri_perms_t cheri_perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE |
-		CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP;
-
 	if (is_compat_task() || !(vma->vm_flags & VM_PCUABI_RESERVE))
 		return 0;

@@ -34,17 +30,13 @@ int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start,
 	vma->reserv_start = start;
 	vma->reserv_len = cheri_representable_length(len);
 	if (perm)
-		vma->reserv_perm = cheri_perms;
+		vma->reserv_perm = perm;

 	return 0;
 }

-user_uintptr_t reserv_range_insert_entry(unsigned long start, unsigned long len,
-					 unsigned long perm __maybe_unused)
+user_uintptr_t reserv_range_insert_entry(unsigned long start, unsigned long len, cheri_perms_t perm)
 {
-	/* TODO [PCuABI] - capability permission conversion from memory permission */
-	cheri_perms_t cheri_perm = CHERI_PERMS_READ | CHERI_PERMS_WRITE |
-		CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP;
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned long end = start + len;
@@ -74,7 +66,7 @@ user_uintptr_t reserv_range_insert_entry(unsigned long start, unsigned long len,
 		vm_flags_set(vma, VM_PCUABI_RESERVE);
 		WRITE_ONCE(vma->reserv_start, start);
 		WRITE_ONCE(vma->reserv_len, len);
-		WRITE_ONCE(vma->reserv_perm, cheri_perm);
+		WRITE_ONCE(vma->reserv_perm, perm);
 	}
 	mmap_write_unlock(current->mm);
 	ret = build_owning_capability(start, len, perm);
@@ -190,19 +182,67 @@ bool capability_owns_range(user_uintptr_t cap, unsigned long addr, unsigned long
 				      align_len, CHERI_PERM_GLOBAL | CHERI_PERM_SW_VMEM);
 }

-user_uintptr_t build_owning_capability(unsigned long addr, unsigned long len,
-				       unsigned long perm __maybe_unused)
+user_uintptr_t build_owning_capability(unsigned long addr, unsigned long len, cheri_perms_t perm)
 {
 	unsigned long align_start = round_down(addr, PAGE_SIZE);
 	unsigned long align_len = cheri_representable_length(round_up(len, PAGE_SIZE));

-	/* TODO [PCuABI] - capability permission conversion from memory permission */
-	cheri_perms_t perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE |
-		CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP;
+	return (user_uintptr_t)cheri_build_user_cap(align_start, align_len, perm);
+}
+
+static bool mapping_may_have_prot_flag(int prot, int map_val)
+{
+	int prot_max = PROT_MAX_EXTRACT(prot);
+
+	if (prot_max)
+		return !!(prot_max & map_val);
+	else
+		return !!(prot & map_val);
+}
+
+bool capability_may_set_prot(user_uintptr_t cap, int prot)
+{
+	cheri_perms_t perms = cheri_perms_get(cap);
+
+	if (is_compat_task())
+		return true;
+
+	if (((prot & PROT_READ) && !(perms & CHERI_PERM_LOAD)) ||
+	    ((prot & PROT_WRITE) && !(perms & CHERI_PERM_STORE)) ||
+	    ((prot & PROT_EXEC) && !(perms & CHERI_PERM_EXECUTE)))
+		return false;

-	return (user_uintptr_t)cheri_build_user_cap(align_start, align_len, perms);
+	return true;
 }

+cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access)
+{
+	cheri_perms_t perms = 0;
+
+	if (mapping_may_have_prot_flag(prot, PROT_READ)) {
+		perms |= CHERI_PERM_LOAD;
+		if (has_tag_access)
+			perms |= CHERI_PERM_LOAD_CAP;
+	}
+	if (mapping_may_have_prot_flag(prot, PROT_WRITE)) {
+		perms |= CHERI_PERM_STORE;
+		if (has_tag_access)
+			perms |= (CHERI_PERM_STORE_CAP | CHERI_PERM_STORE_LOCAL_CAP);
+	}
+	if (mapping_may_have_prot_flag(prot, PROT_EXEC)) {
+		perms |= CHERI_PERM_EXECUTE;
+		if (cheri_perms_get(cheri_pcc_get()) & CHERI_PERM_SYSTEM_REGS)
+			perms |= CHERI_PERM_SYSTEM_REGS;
+	}
+	/* Fetch any extra architecture specific permissions */
+	perms |= arch_map_to_cap_perm(PROT_MAX_EXTRACT(prot) ? PROT_MAX_EXTRACT(prot) : prot,
+				      has_tag_access);
+	perms |= CHERI_PERMS_ROOTCAP;
+
+	return perms;
+}
+
+
 #else

 int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start,
@@ -249,9 +289,19 @@ bool capability_owns_range(user_uintptr_t cap, unsigned long addr, unsigned long
 	return true;
 }

-user_uintptr_t build_owning_capability(unsigned long addr, unsigned long len, unsigned long perm)
+user_uintptr_t build_owning_capability(unsigned long addr, unsigned long len, cheri_perms_t perm)
 {
 	return addr;
 }

+bool capability_may_set_prot(user_uintptr_t cap, int prot)
+{
+	return true;
+}
+
+cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access)
+{
+	return 0;
+}
+
 #endif /* CONFIG_CHERI_PURECAP_UABI */
diff --git a/mm/mmap.c b/mm/mmap.c
index 803b18c7d746..cb7a4b71ad82 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1440,7 +1440,9 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_addr,
 	if (!IS_ERR_VALUE(addr)) {
 		if (!ignore_reserv) {
 			if (new_caps)
-				user_addr = build_owning_capability(addr, len, prot);
+				user_addr = build_owning_capability(addr, len,
+						mapping_to_capability_perm(prot,
+						(flags & MAP_SHARED) ? false : true));
 		} else {
 			user_addr = (user_uintptr_t)uaddr_to_user_ptr_safe(addr);
 		}
@@ -2751,7 +2753,9 @@ unsigned long mmap_region(struct file *file, user_uintptr_t user_addr,
 	vma->vm_page_prot = vm_get_page_prot(vm_flags);
 	vma->vm_pgoff = pgoff;
 	if (vm_flags & VM_PCUABI_RESERVE) {
-		error = reserv_vma_insert_entry(vma, addr, len, prot);
+		error = reserv_vma_insert_entry(vma, addr, len,
+				mapping_to_capability_perm(prot,
+				(vm_flags & (VM_SHARED | VM_MAYSHARE)) ? false : true));
 		if (error)
 			goto free_vma;
 	}
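To make the conversion concrete, here is an illustrative sketch (kernel context, not part of the patch) of what mapping_to_capability_perm() is expected to return for a private versus a shared read-write mapping; the shared case drops the tag-access (*_CAP) permissions because has_tag_access is false:

/* Sketch: expected results of mapping_to_capability_perm(). */
static void perm_conversion_example(void)
{
	int prot = PROT_READ | PROT_MAX(PROT_READ | PROT_WRITE);
	cheri_perms_t priv, shared;

	/* Private mapping: tag access granted. */
	priv = mapping_to_capability_perm(prot, true);
	/*
	 * priv = CHERI_PERM_LOAD | CHERI_PERM_LOAD_CAP | CHERI_PERM_STORE |
	 *        CHERI_PERM_STORE_CAP | CHERI_PERM_STORE_LOCAL_CAP |
	 *        CHERI_PERMS_ROOTCAP, plus arch additions such as
	 *        ARM_CAP_PERMISSION_MUTABLE_LOAD on arm64.
	 */

	/* Shared mapping: no tag access, so the *_CAP permissions are dropped. */
	shared = mapping_to_capability_perm(prot, false);
	/* shared = CHERI_PERM_LOAD | CHERI_PERM_STORE | CHERI_PERMS_ROOTCAP */

	(void)priv;
	(void)shared;
}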
Existing userspace may not use the maximum protection bits that PCuABI introduces in the protection flags, so in such applications the memory protection flags updated via the mprotect() syscall may become inconsistent with the capability permission bits.

Reduce the impact of such failures by creating the capability with maximum permissions when no maximum protection bits are detected.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 mm/cap_addr_mgmt.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c
index b451fa279a48..3471406006dc 100644
--- a/mm/cap_addr_mgmt.c
+++ b/mm/cap_addr_mgmt.c
@@ -219,6 +219,12 @@ cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access)
 {
 	cheri_perms_t perms = 0;

+	if (!PROT_MAX_EXTRACT(prot)) {
+		perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE |
+			CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP;
+		goto skip_calc_perm;
+	}
+
 	if (mapping_may_have_prot_flag(prot, PROT_READ)) {
 		perms |= CHERI_PERM_LOAD;
 		if (has_tag_access)
@@ -238,6 +244,7 @@ cheri_perms_t mapping_to_capability_perm(int prot, bool has_tag_access)
 	perms |= arch_map_to_cap_perm(PROT_MAX_EXTRACT(prot) ? PROT_MAX_EXTRACT(prot) : prot,
 				      has_tag_access);
 	perms |= CHERI_PERMS_ROOTCAP;
+skip_calc_perm:

 	return perms;
 }
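For example (an illustrative userspace sketch, not part of the patch), with this change a legacy caller that never sets any PROT_MAX bits is still expected to receive a maximally-permissioned capability and can keep upgrading its mapping as before:

#include <stddef.h>
#include <sys/mman.h>

/* Sketch: legacy application that passes no PROT_MAX bits. */
static int legacy_upgrade(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;

	/*
	 * No PROT_MAX bits were given, so the owning capability was created
	 * with maximum permissions and this upgrade is not rejected by
	 * capability_may_set_prot().
	 */
	return mprotect(p, len, PROT_READ | PROT_WRITE);
}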
The MAP_GROWSDOWN mapping flag is not supported by the PCuABI specification, so reject such requests with -EOPNOTSUPP.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 mm/mmap.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index cb7a4b71ad82..e8e9190f26cb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1258,6 +1258,13 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_addr,
 		ignore_reserv = true;
 		goto skip_pcuabi_checks;
 	}
+	/*
+	 * Introduce checks for PCuABI:
+	 * - MAP_GROWSDOWN flag do not have fixed bounds and hence not
+	 *   supported in PCuABI reservation model.
+	 */
+	if (flags & MAP_GROWSDOWN)
+		return -EOPNOTSUPP;
 	if (cheri_tag_get(user_addr)) {
 		if (!capability_owns_range(user_addr, addr, len) || !(flags & MAP_FIXED))
 			return -EINVAL;
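For illustration (a hypothetical userspace sketch), a purecap caller requesting MAP_GROWSDOWN is now expected to see the mapping fail with EOPNOTSUPP:

#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Sketch: MAP_GROWSDOWN is rejected under the PCuABI reservation model. */
static int growsdown_is_rejected(void)
{
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);

	return (p == MAP_FAILED && errno == EOPNOTSUPP) ? 0 : -1;
}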
Add a check that the requested protection bits do not exceed the maximum protection bits.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 mm/mmap.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index e8e9190f26cb..3bd230e0b565 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1262,9 +1262,15 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_addr,
 	 * Introduce checks for PCuABI:
 	 * - MAP_GROWSDOWN flag do not have fixed bounds and hence not
 	 *   supported in PCuABI reservation model.
+	 * - PCuABI reservation model introduces the concept of maximum
+	 *   protection the mappings can have. Add a check to make sure the
+	 *   requested protection does not exceed the maximum protection.
 	 */
 	if (flags & MAP_GROWSDOWN)
 		return -EOPNOTSUPP;
+	if ((PROT_MAX_EXTRACT(prot) != 0) &&
+	    ((PROT_EXTRACT(prot) & PROT_MAX_EXTRACT(prot)) != PROT_EXTRACT(prot)))
+		return -EINVAL;
 	if (cheri_tag_get(user_addr)) {
 		if (!capability_owns_range(user_addr, addr, len) || !(flags & MAP_FIXED))
 			return -EINVAL;
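For illustration (a hypothetical userspace sketch), a request whose immediate protection is wider than its stated maximum is now expected to fail up front with EINVAL:

#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Sketch: PROT_WRITE is requested but only PROT_READ is allowed as maximum. */
static int prot_above_max_is_rejected(void)
{
	void *p = mmap(NULL, 4096,
		       PROT_READ | PROT_WRITE | PROT_MAX(PROT_READ),
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return (p == MAP_FAILED && errno == EINVAL) ? 0 : -1;
}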
Check that the permissions of the new user address do not exceed the permissions of the old user address for the mremap() syscall.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 mm/mremap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/mremap.c b/mm/mremap.c
index 6d92b233f0e6..fc0e5fb2508d 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -991,6 +991,9 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, user_addr, unsigned long, ol
 		if (!capability_owns_range(user_new_addr, new_addr, new_len) ||
 		    !(flags & MREMAP_FIXED))
 			goto out;
+		if ((cheri_perms_get(user_addr) | cheri_perms_get(user_new_addr))
+		     != cheri_perms_get(user_addr))
+			goto out;
 		vma_new = vma_lookup(mm, new_addr);
 		if (!reserv_vma_range_fully_mapped(vma_new, new_addr, new_len)) {
 			ret = -ERESERVATION;
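The added test is the usual bitmask subset identity: the new capability must not carry any permission that the old capability lacks. A minimal standalone sketch of the same check, using a plain bitmask type for illustration:

#include <stdbool.h>

/* Sketch: new_perms must be a subset of old_perms, as checked in the patch. */
static bool perms_subset(unsigned long old_perms, unsigned long new_perms)
{
	return (old_perms | new_perms) == old_perms;
}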
Check that the requested permissions match the constraints of the input user capability for the mprotect() syscall.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 mm/mprotect.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 5cefc65b5dbd..f2c06967451d 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -760,6 +760,8 @@ static int do_mprotect_pkey(user_uintptr_t user_start, size_t len,
 #endif
 	if (!capability_owns_range(user_start, start, len))
 		return -EINVAL;
+	if (!capability_may_set_prot(user_start, prot))
+		return -EINVAL;
 	if (!arch_validate_prot(prot, start))
 		return -EINVAL;
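For illustration (a hypothetical userspace sketch), upgrading a mapping beyond what its owning capability permits is now expected to be rejected:

#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Sketch: the maximum permission was restricted to PROT_READ at mmap() time. */
static int mprotect_above_cap_is_rejected(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_MAX(PROT_READ),
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;

	/*
	 * The owning capability lacks the store permission, so
	 * capability_may_set_prot() fails and mprotect() returns EINVAL.
	 */
	return (mprotect(p, len, PROT_WRITE) == -1 && errno == EINVAL) ? 0 : -1;
}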
Different capability permission and bounds constraints are added as per the PCuABI specification. The mincore() syscall does not need the VMem permission and requires only one of the RWX memory permissions, so the standard capability_owns_range() interface is not used here.

Also, as mincore() allows the address range to not span whole pages, checking only a single byte at the page intersection is sufficient.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 mm/mincore.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 5 deletions(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index 3a307bfa91c4..dfa6b5b9c3d3 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -19,6 +19,7 @@
 #include <linux/hugetlb.h>
 #include <linux/pgtable.h>

+#include <linux/cap_addr_mgmt.h>
 #include <linux/uaccess.h>
 #include "swap.h"

@@ -184,15 +185,19 @@ static const struct mm_walk_ops mincore_walk_ops = {
  * all the arguments, we hold the mmap semaphore: we should
  * just return the amount of info we're asked for.
  */
-static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *vec)
+static long do_mincore(user_uintptr_t user_addr, unsigned long pages, unsigned char *vec)
 {
 	struct vm_area_struct *vma;
 	unsigned long end;
+	unsigned long addr = (ptraddr_t)user_addr;
 	int err;

 	vma = vma_lookup(current->mm, addr);
 	if (!vma)
 		return -ENOMEM;
+	/* Check if the capability range is valid with mmap lock. */
+	if (!reserv_vma_match_capability(vma, user_addr))
+		return -ERESERVATION;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
 	if (!can_do_mincore(vma)) {
 		unsigned long pages = DIV_ROUND_UP(end - addr, PAGE_SIZE);
@@ -229,14 +234,16 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
  *		mapped
  * -EAGAIN - A kernel resource was temporarily unavailable.
  */
-SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
+SYSCALL_DEFINE3(mincore, user_uintptr_t, user_start, size_t, len,
 		unsigned char __user *, vec)
 {
 	long retval;
 	unsigned long pages;
 	unsigned char *tmp;
-
-	start = untagged_addr(start);
+	unsigned long start = untagged_addr((ptraddr_t)user_start);
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	unsigned long cap_start, cap_len;
+#endif

 	/* Check the start address: needs to be page-aligned.. */
 	if (start & ~PAGE_MASK)
@@ -253,6 +260,35 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
 	if (!access_ok(vec, pages))
 		return -EFAULT;

+#ifdef CONFIG_CHERI_PURECAP_UABI
+	if (is_compat_task())
+		goto skip_pcuabi_checks;
+	/*
+	 * mincore syscall does not need VMem permission so as to allow ordinary pages.
+	 * Also at least one of the standard memory permissions RWX will help to reject
+	 * non memory capabilities.
+	 */
+	user_start = cheri_address_set(user_start, start);
+	if (cheri_is_invalid(user_start) || cheri_is_sealed(user_start) ||
+	    !(CHERI_PERM_GLOBAL & cheri_perms_get(user_start)) ||
+	    !((CHERI_PERM_LOAD | CHERI_PERM_STORE | CHERI_PERM_EXECUTE)
+	      & cheri_perms_get(user_start)))
+		return -EINVAL;
+	/*
+	 * mincore syscall can be invoked as:
+	 * mincore(align_down(p, PAGE_SIZE), sz + (p.addr % PAGE_SIZE), vec)
+	 * Hence, the capability might not consider the increased range due to
+	 * alignment. In this scenario, check only the single byte at the page
+	 * intersection.
+	 */
+	cap_start = cheri_base_get(user_start);
+	cap_len = cheri_length_get(user_start);
+	if ((start + PAGE_SIZE <= cap_start) ||
+	    (cap_start + cap_len < start + len - offset_in_page(len)))
+		return -EINVAL;
+skip_pcuabi_checks:
+#endif
+
 	tmp = (void *) __get_free_page(GFP_USER);
 	if (!tmp)
 		return -EAGAIN;
@@ -264,7 +300,7 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
 	 * the temporary buffer size.
 	 */
 	mmap_read_lock(current->mm);
-	retval = do_mincore(start, min(pages, PAGE_SIZE), tmp);
+	retval = do_mincore(user_start, min(pages, PAGE_SIZE), tmp);
 	mmap_read_unlock(current->mm);

 	if (retval <= 0)
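For illustration (a hypothetical userspace sketch), the page-alignment pattern mentioned in the comment above looks like this; the capability p need not cover the bytes gained by aligning down, which is why only a single byte at the page intersection is checked:

#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>

/* Sketch: query residency for a buffer that is not page-aligned. */
static int query_residency(char *p, size_t sz, unsigned char *vec)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t off = (size_t)(uintptr_t)p % page;

	/* Align down to a page boundary and extend the length accordingly. */
	return mincore(p - off, sz + off, vec);
}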