Hi All,
This patch series introduces the mm reservation interface to manage the owning capability of the allocated addresses. I am looking for feedback regarding interface names, interface directory structure, keeping the reservation layer outside the VMA layer (current approach) vs. inside the VMA layer, etc.
Below are the implemented features in brief:
1) Reservation interface to implement the different PCuABI reservation rules. This reservation layer sits outside the VMA layer and can be used before and after VMA updates. Currently all interfaces support only the mmap_lock locked version.

2) The reservation interfaces and owning capability helpers are created as a library so that they can be used by different components (i.e. mm, ELF loaders etc.).

3) munmap() allows shrinking the mappings, but the reservation range remains fixed, so the freed range cannot be mapped again until the last mapping in the reservation range is unmapped (see the sketch after this list).

4) mremap() with a new size smaller than the old size behaves the same as munmap(). mremap() with a new size larger than the old size and with the MREMAP_MAYMOVE flag also moves the reservation, provided the mapped range is the same as the reservation range.

5) Reservation bound constraint checks added for the mprotect, madvise, mlock, mincore and msync syscalls.

6) Helpers added to validate the capability address permission constraints.

7) Capability permission constraint checks added for the mmap, mremap and mprotect syscalls.

8) Details about the implemented rules can be found in the PCuABI spec [1].
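For illustration, the userspace-visible behaviour of rules 3) and 4) above could look roughly like the purecap sketch below (hypothetical program, not part of the series; the exact error codes may differ):

#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	/* A fresh (null-derived hint) mapping creates a 4-page reservation
	 * and returns a capability bounded to it. */
	char *p = mmap(NULL, 4 * page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	/* Unmapping part of it shrinks the mapping, not the reservation. */
	munmap(p + 2 * page, 2 * page);

	/* The freed portion cannot be mapped again while other mappings
	 * remain inside the reservation. */
	if (mmap(p + 2 * page, page, PROT_READ,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		printf("remap inside live reservation rejected (errno %d)\n", errno);

	/* Unmapping the last mapping destroys the whole reservation. */
	munmap(p, 2 * page);
	return 0;
}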
Limitations / unimplemented work:
1) Users of vm_mmap()/vm_munmap(), i.e. filesystems, loaders etc., are not modified to preserve capability addresses, so patch 6 "mm/(mmap, munmap): Limit reservation for only syscalls" is added to limit reservations to syscalls.

2) Patch 15 "lib/cap_addr_mgmt: Reduce the maximum protection check impact" is added to allow booting the busybox environment.

3) The remaining memory addressing syscalls are not yet covered.
Testing:
1) Chaitanya's v2 selftests [2].

2) Busybox boot.
The whole series can be found here [3].
[1]: https://git.morello-project.org/morello/kernel/linux/-/wikis/Morello-pure-ca...
[2]: https://git.morello-project.org/chaitanya_prakash/linux.git review/mmap_testcase
[3]: https://git.morello-project.org/amitdaniel/linux.git review/purecap_mm_reservation_v1
Thanks,
Amit Daniel
Amit Daniel Kachhap (19):
  uapi: errno.h: Introduce PCuABI memory reservation error
  mm: Add capability reservation interfaces for PCuABI
  lib/cap_addr_mgmt: Add capability bound helpers for PCuABI
  mm/(mmap, mremap): Add flags to ignore reservation in unmap functions
  mm/mmap: Use the PCuABI reservations in mmap/munmap
  mm/(mmap, munmap): Limit reservation for only syscalls
  mm/mremap: Add the PCuABI reservation interfaces
  mm/mprotect: Add the PCuABI reservation interfaces
  mm/madvise: Add the PCuABI reservation interfaces
  mm/mlock: Add the PCuABI reservation interfaces
  mm/mincore: Add the PCuABI reservation interfaces
  mm/msync: Add the PCuABI reservation interfaces
  uapi: mman-common.h: Helpers for maximum capability permissions
  lib/cap_addr_mgmt: Add capability permission helpers for PCuABI
  lib/cap_addr_mgmt: Reduce the maximum protection check impact
  mm/mmap: Disable MAP_GROWSDOWN mapping flag for PCuABI
  mm/mmap: Add capability permission constraints for PCuABI
  mm/mremap: Add capability permission constraints for PCuABI
  mm/mprotect: Add capability permission constraints for PCuABI
 arch/arm64/include/asm/cap_addr_mgmt.h |  22 +++
 fs/aio.c                               |   2 +-
 include/linux/cap_addr_mgmt.h          | 167 +++++++++++++++++
 include/linux/cheri.h                  |   3 +
 include/linux/mm.h                     |  20 +-
 include/linux/mm_types.h               |   3 +
 include/uapi/asm-generic/errno.h       |   2 +
 include/uapi/asm-generic/mman-common.h |   6 +
 io_uring/advise.c                      |   2 +-
 ipc/shm.c                              |   2 +-
 kernel/fork.c                          |   8 +
 lib/Makefile                           |   1 +
 lib/cap_addr_mgmt.c                    | 250 +++++++++++++++++++++++++
 mm/damon/vaddr.c                       |   2 +-
 mm/internal.h                          |   4 +-
 mm/madvise.c                           |  27 ++-
 mm/mincore.c                           |  18 +-
 mm/mlock.c                             |  37 +++-
 mm/mmap.c                              | 134 +++++++++++--
 mm/mprotect.c                          |  22 ++-
 mm/mremap.c                            | 117 ++++++++++--
 mm/msync.c                             |  17 +-
 mm/nommu.c                             |   2 +-
 mm/util.c                              |  16 +-
 24 files changed, 808 insertions(+), 76 deletions(-)
 create mode 100644 arch/arm64/include/asm/cap_addr_mgmt.h
 create mode 100644 include/linux/cap_addr_mgmt.h
 create mode 100644 lib/cap_addr_mgmt.c
The PCuABI specification introduces this error code; it is used to denote any error that occurs while managing memory reservations.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 include/uapi/asm-generic/errno.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
index cf9c51ac49f9..4589a3165fe1 100644
--- a/include/uapi/asm-generic/errno.h
+++ b/include/uapi/asm-generic/errno.h
@@ -120,4 +120,6 @@

 #define EHWPOISON	133	/* Memory page has hardware error */

+#define ERESERVATION	192	/* PCuABI memory reservation error */
+
 #endif
PCuABI needs address space reservation interfaces to manage the owning capability of the allocated addresses. This interface prevents two unrelated owning capabilities created by the kernel from overlapping.
The reservation interface stores the ranges of the different virtual address mappings, each tied to a reservation whose bounds are the same as the bounds of the capability provided by the kernel to userspace. It also stores the owning capability permissions, to manage syscall requests that update permissions.
A few basic rules are followed by the reservation interfaces:

- Reservations can only be created or destroyed; they are never expanded or shrunk. A reservation is created when a new memory mapping is made outside of an existing reservation.

- A single reservation can have many mappings. However, an unused region of the reservation cannot be re-used.

- Reservation start and end addresses are aligned to the page size.

- The reservation length is aligned to the CHERI representable length.
More rules about the address space reservation interface can be found in the PCuABI specification.
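To make the alignment rules concrete, a minimal kernel-side sketch (illustration only, not part of the patch; it assumes PAGE_SIZE rounding helpers and cheri_representable_length() from <linux/cheri.h>, which this patch also uses) is:

	/* Deriving page-aligned, representable reservation bounds from a
	 * requested (start, len). */
	unsigned long reserv_start = round_down(start, PAGE_SIZE);
	unsigned long reserv_len   = round_up(len, PAGE_SIZE);
	/* Capability bound comparisons use the CHERI representable length: */
	unsigned long cap_len      = cheri_representable_length(reserv_len);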
Here, maple tree library functions are used to create and use the reservation interface. The interface supports four operations (insert/delete/move/check), similar to the VMA update operations (create/delete/expand/shrink/move). These interfaces can be used before and after VMA updates to implement the different reservation rules.
The different reservation APIs must be called with the mm's mmap_lock held (read or write).
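As a rough usage sketch (illustrative only, error handling trimmed; the real call sites appear in later patches), a caller holding mmap_lock for writing would do something like:

int map_with_reservation(struct mm_struct *mm, unsigned long addr,
			 unsigned long len, unsigned long perm)
{
	/* Refuse to create a reservation overlapping an existing one. */
	if (reserv_mt_range_valid(&mm->reserv_mt, addr, len))
		return -ERESERVATION;

	/* ... create the VMA(s) covering [addr, addr + len) here ... */

	/* Record the new reservation once the mapping has succeeded. */
	return reserv_mt_insert_entry(&mm->reserv_mt, addr, len, perm);
}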
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 include/linux/cap_addr_mgmt.h | 112 +++++++++++++++++++++
 include/linux/mm_types.h      |   3 +
 kernel/fork.c                 |   8 ++
 lib/Makefile                  |   1 +
 lib/cap_addr_mgmt.c           | 181 ++++++++++++++++++++++++++++++
 5 files changed, 305 insertions(+)
 create mode 100644 include/linux/cap_addr_mgmt.h
 create mode 100644 lib/cap_addr_mgmt.c
diff --git a/include/linux/cap_addr_mgmt.h b/include/linux/cap_addr_mgmt.h new file mode 100644 index 000000000000..fd67e9b21ecd --- /dev/null +++ b/include/linux/cap_addr_mgmt.h @@ -0,0 +1,112 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _LINUX_CAP_ADDR_MGMT_H +#define _LINUX_CAP_ADDR_MGMT_H + +#include <linux/init.h> +#include <linux/maple_tree.h> +#include <linux/types.h> + +#ifdef CONFIG_CHERI_PURECAP_UABI + +struct reserv_mt_entry { + unsigned long reserv_start; + unsigned long reserv_len; + unsigned long reserv_perm; +}; + +/** + * reserv_mt_insert_entry() - Add the reservation for the virtual address + * range from start to (start + len) with perm permission as the entry. + * @rv_mt: Maple tree pointer to insert the reservation entry. + * @start: Reservation start value. + * @len: Reservation length. + * @perm: Capability permission for the reserved range. + * + * Return: 0 if reservation entry added successfully or -ERESERVATION/-ENOMEM + * otherwise. + */ +int reserv_mt_insert_entry(struct maple_tree *rv_mt, unsigned long start, + unsigned long len, unsigned long perm); + +/** + * reserv_mt_capability_bound_valid() - Search and matches the reservation + * interface for the virtual address range derived from the capability bound + * values. + * @rv_mt: Maple tree pointer to search the reservation entry. + * @start: Reservation capability value. + * + * Return: True if reservation entry found with the exact capability bound or + * false otherwise. + */ +bool reserv_mt_capability_bound_valid(struct maple_tree *rv_mt, uintcap_t start); + +/** + * reserv_mt_range_valid() - Searches the reservation interface for the virtual + * address range from start to (start + len). This is useful to find any + * overlaps with the existing mappngs. + * @rv_mt: Maple tree pointer to search the reservation entry. + * @start: Virtual address start value. + * @len: Virtual address length. + * + * Return: True if the maple tree has any overlap with the given range or + * false otherwise. + */ +bool reserv_mt_range_valid(struct maple_tree *rv_mt, unsigned long start, + unsigned long len); + +/** + * reserv_mt_range_fully_mapped() - Searches the reservation interface for the + * virtual address range from start to (start + len). This is useful to find + * if the requested range maps exactly with the reserved range. + * @rv_mt: Maple tree pointer to search the reservation entry. + * @start: Virtual address start value. + * @len: Virtual address length. + * + * Return: True if the maple tree mapping matches fully with the given range or + * false otherwise. + */ +bool reserv_mt_range_fully_mapped(struct maple_tree *rv_mt, unsigned long start, + unsigned long len); +/** + * reserv_mt_move_entry() - Remove the old reservation for the virtual address range + * from old_start to (old_start + old_len) and add a new reservation with range + * new_start to (new_start + new_len) with the same perm permission as the entry. + * @rv_mt: Maple tree pointer to search/insert the reservation entry. + * @old_start: Reservation old start value. + * @old_len: Reservation old length. + * @new_start: Reservation new start value. + * @new_len: Reservation new length. + * @perm: Capability permission for the reserved range (out parameter). + * + * Return: 0 if reservation entry moved successfully or -ERESERVATION otherwise. 
+ */ +int reserv_mt_move_entry(struct maple_tree *rv_mt, unsigned long old_start, + unsigned long old_len, unsigned long new_start, + unsigned long new_len, unsigned long *perm); + +/** + * reserv_mt_delete_range() - Deletes the maple tree entry for the virtual + * address range from start to (start + len). If the requested range does + * not match completely and falls in the start, end or in between then the + * entry is shrunk appropriately. + * @rv_mt: Maple tree pointer to search the reservation entry. + * @start: Virtual address start value. + * @len: Virtual address length. + * + * Return: 0 if virtual address range deleted successfully or -ERESERVATION + * otherwise. + */ +int reserv_mt_delete_range(struct maple_tree *rv_mt, unsigned long start, + unsigned long len); + +/** + * reserv_mt_init() - Initialises the reservation interfaces. + * + * Return: None. + */ +void __init reserv_mt_init(void); + +#endif /* CONFIG_CHERI_PURECAP_UABI */ + +#endif /* _LINUX_CAP_ADDR_MGMT_H */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 12e87f83287d..81e8f80d5bd6 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -584,6 +584,9 @@ struct kioctx_table; struct mm_struct { struct { struct maple_tree mm_mt; +#ifdef CONFIG_CHERI_PURECAP_UABI + struct maple_tree reserv_mt; /* Tree to hold reserved address ranges */ +#endif #ifdef CONFIG_MMU unsigned long (*get_unmapped_area) (struct file *filp, unsigned long addr, unsigned long len, diff --git a/kernel/fork.c b/kernel/fork.c index d6fd09ba8d0a..45083d3e92ab 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -99,6 +99,7 @@ #include <linux/stackprotector.h> #include <linux/user_events.h> #include <linux/iommu.h> +#include <linux/cap_addr_mgmt.h>
#include <asm/pgalloc.h> #include <linux/uaccess.h> @@ -1081,6 +1082,9 @@ void __init fork_init(void)
lockdep_init_task(&init_task); uprobes_init(); +#ifdef CONFIG_CHERI_PURECAP_UABI + reserv_mt_init(); +#endif }
int __weak arch_dup_task_struct(struct task_struct *dst, @@ -1259,6 +1263,10 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mt_init_flags(&mm->mm_mt, MM_MT_FLAGS); mt_set_external_lock(&mm->mm_mt, &mm->mmap_lock); +#ifdef CONFIG_CHERI_PURECAP_UABI + mt_init_flags(&mm->reserv_mt, MM_MT_FLAGS); + mt_set_external_lock(&mm->reserv_mt, &mm->mmap_lock); +#endif atomic_set(&mm->mm_users, 1); atomic_set(&mm->mm_count, 1); seqcount_init(&mm->write_protect_seq); diff --git a/lib/Makefile b/lib/Makefile index 3072f6caa337..0c3f6b57ca63 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -274,6 +274,7 @@ obj-$(CONFIG_POLYNOMIAL) += polynomial.o
obj-y += cheri.o obj-$(CONFIG_CHERI_PURECAP_UABI) += user_ptr.o +obj-$(CONFIG_CHERI_PURECAP_UABI) += cap_addr_mgmt.o
# stackdepot.c should not be instrumented or call instrumented functions. # Prevent the compiler from calling builtins like memcmp() or bcmp() from this diff --git a/lib/cap_addr_mgmt.c b/lib/cap_addr_mgmt.c new file mode 100644 index 000000000000..e22868506e70 --- /dev/null +++ b/lib/cap_addr_mgmt.c @@ -0,0 +1,181 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/bug.h> +#include <linux/cap_addr_mgmt.h> +#include <linux/cheri.h> +#include <linux/slab.h> + +/* SLAB cache for reserv_mt_entry structures */ +static struct kmem_cache *reserv_mt_entry_cachep; + +static struct reserv_mt_entry *reserv_mt_alloc(void) +{ + struct reserv_mt_entry *rv_entry; + + rv_entry = kmem_cache_alloc(reserv_mt_entry_cachep, GFP_KERNEL); + + return rv_entry; +} + +int reserv_mt_insert_entry(struct maple_tree *rv_mt, unsigned long start, + unsigned long len, unsigned long perm) +{ + struct reserv_mt_entry *rv_entry; + unsigned long align_start = round_down(start, PAGE_SIZE); + unsigned long align_end = align_start + round_up(len, PAGE_SIZE) - 1; + MA_STATE(mas, rv_mt, align_start, align_end); + + rv_entry = reserv_mt_alloc(); + if (!rv_entry) + return -ENOMEM; + rv_entry->reserv_perm = perm; + rv_entry->reserv_start = align_start; + rv_entry->reserv_len = round_up(len, PAGE_SIZE); + + if (mas_store_gfp(&mas, rv_entry, GFP_KERNEL)) + return -ERESERVATION; + + return 0; +} + +bool reserv_mt_capability_bound_valid(struct maple_tree *rv_mt, uintcap_t start) +{ + struct reserv_mt_entry *rv_entry; + unsigned long align_start = cheri_base_get(start); + unsigned long align_end = align_start + cheri_length_get(start) - 1; + MA_STATE(mas, rv_mt, align_start, align_end); + + /* Check if there is match with the existing reservations */ + do { + rv_entry = mas_find(&mas, align_end); + if (!rv_entry) + return false; + if (rv_entry->reserv_start == align_start && + (rv_entry->reserv_start + cheri_representable_length(rv_entry->reserv_len) - 1) == align_end) + return true; + } while (1); + + return false; +} + +bool reserv_mt_range_valid(struct maple_tree *rv_mt, unsigned long start, + unsigned long len) +{ + unsigned long align_start = round_down(start, PAGE_SIZE); + unsigned long align_end = align_start + round_up(len, PAGE_SIZE) - 1; + MA_STATE(mas, rv_mt, align_start, align_end); + + /* Check if there is overlap with the existing mappings */ + if (mas_find(&mas, align_end)) + return true; + + return false; +} + +bool reserv_mt_range_fully_mapped(struct maple_tree *rv_mt, unsigned long start, + unsigned long len) +{ + unsigned long align_start = round_down(start, PAGE_SIZE); + unsigned long align_end = align_start + round_up(len, PAGE_SIZE) - 1; + struct reserv_mt_entry *rv_entry; + MA_STATE(mas, rv_mt, align_start, align_end); + + /* Try finding the given range */ + rv_entry = mas_find(&mas, align_end); + if (!rv_entry) + return false; + + /* Check if the range fully mapped */ + if (align_start != mas.index || align_end != mas.last || + mas.index != rv_entry->reserv_start || + mas.last != (rv_entry->reserv_start + rv_entry->reserv_len - 1)) + return false; + + return true; +} + +int reserv_mt_move_entry(struct maple_tree *rv_mt, unsigned long old_start, + unsigned long old_len, unsigned long new_start, + unsigned long new_len, unsigned long *perm) +{ + struct reserv_mt_entry *rv_entry; + unsigned long align_start = round_down(old_start, PAGE_SIZE); + unsigned long align_end = align_start + round_up(old_len, PAGE_SIZE) - 1; + MA_STATE(mas, rv_mt, align_start, align_end); + + /* Try finding the old range */ + rv_entry 
= mas_find(&mas, align_end); + if (!rv_entry) + return -ERESERVATION; + + if (align_start != mas.index || align_end != mas.last || + mas.index != rv_entry->reserv_start || + mas.last != (rv_entry->reserv_start + rv_entry->reserv_len - 1)) + return -ERESERVATION; /* Only full mapped range can be moved */ + + /* Try removing the old reservation */ + rv_entry = mas_erase(&mas); + if (!rv_entry) + return -ERESERVATION; + + align_start = round_down(new_start, PAGE_SIZE); + align_end = align_start + round_up(new_len, PAGE_SIZE) - 1; + mas_set_range(&mas, align_start, align_end); + rv_entry->reserv_start = align_start; + rv_entry->reserv_len = round_up(new_len, PAGE_SIZE); + if (mas_store_gfp(&mas, rv_entry, GFP_KERNEL)) + return -ERESERVATION; + *perm = rv_entry->reserv_perm; + + return 0; +} + +int reserv_mt_delete_range(struct maple_tree *rv_mt, unsigned long start, + unsigned long len) +{ + struct reserv_mt_entry *rv_entry, *rv_new; + unsigned long align_start = round_down(start, PAGE_SIZE); + unsigned long align_end = align_start + round_up(len, PAGE_SIZE) - 1; + unsigned long deleted_start, deleted_end; + MA_STATE(mas, rv_mt, align_start, align_end); + + rv_entry = mas_find(&mas, align_end); + if (!rv_entry) + return -ERESERVATION; + + /* mas_erase() used below does not retain the index so store it */ + deleted_start = mas.index; + deleted_end = mas.last; + + mas_erase(&mas); + /* Return if the deleted range matches with the requested range */ + if (align_start == deleted_start && align_end == deleted_end) + return 0; + mas.index = deleted_start; + mas.last = deleted_end; + /* Process if the deleted range falls in between, start or end */ + if (align_start > deleted_start && align_end < deleted_end) { + rv_new = reserv_mt_alloc(); + if (!rv_new) + return -ENOMEM; + memcpy(rv_new, rv_entry, sizeof(struct reserv_mt_entry)); + mas.last = deleted_start - 1; + if (mas_store_gfp(&mas, rv_new, GFP_KERNEL)) + return -ERESERVATION; + mas.index = align_end + 1; + mas.last = deleted_end; + } else if (align_start > deleted_start) { + mas.last = align_start - 1; + } else if (align_end < deleted_end) { + mas.index = align_end + 1; + } + if (mas_store_gfp(&mas, rv_entry, GFP_KERNEL)) + return -ERESERVATION; + + return 0; +} + +void __init reserv_mt_init(void) +{ + reserv_mt_entry_cachep = KMEM_CACHE(reserv_mt_entry, SLAB_PANIC|SLAB_ACCOUNT); +}
Helper functions such as capability_owns_range() and build_owning_capability() are added as per the PCuABI specification.
These may be helpful in adding different PCuABI reservation constraints.
Note: These helper functions do not yet check capability permission constraints; full support will be added later.
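For illustration, a simplified mmap-style path could use these helpers as below (sketch only; the actual hunks are added in a later patch):

user_uintptr_t map_return_cap(user_uintptr_t user_addr, unsigned long addr,
			      unsigned long len, int prot)
{
	if (cheri_tag_get(user_addr)) {
		/* Existing owning capability: it must cover the range. */
		if (!capability_owns_range(user_addr, addr, len))
			return -EINVAL;
		return cheri_address_set(user_addr, addr);
	}
	/* Null-derived hint: hand out a new bounded owning capability. */
	return build_owning_capability(addr, len, prot);
}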
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 include/linux/cap_addr_mgmt.h | 22 ++++++++++++++++++++++
 lib/cap_addr_mgmt.c           | 21 +++++++++++++++++++++
 2 files changed, 43 insertions(+)
diff --git a/include/linux/cap_addr_mgmt.h b/include/linux/cap_addr_mgmt.h index fd67e9b21ecd..c8040871316b 100644 --- a/include/linux/cap_addr_mgmt.h +++ b/include/linux/cap_addr_mgmt.h @@ -107,6 +107,28 @@ int reserv_mt_delete_range(struct maple_tree *rv_mt, unsigned long start, */ void __init reserv_mt_init(void);
+/** + * capability_owns_range() - Check if the address range is within the valid + * capability bound. + * @cap: A Capability value. + * @addr: Address start value. + * @len: Address length. + * + * Return: True if address within the capability bound or false otherwise. + */ +bool capability_owns_range(uintcap_t cap, unsigned long addr, unsigned long len); + +/** + * build_owning_capability() - Creates a userspace capability after converting + * protection flags to the relevant capability permission fl. + * @addr: Requested capability address. + * @len: Requested capability length. + * @prot: Requested protection flags. + * + * Return: A new capability derived from cheri_user_root_cap. + */ +uintcap_t build_owning_capability(unsigned long addr, unsigned long len, int prot); + #endif /* CONFIG_CHERI_PURECAP_UABI */
#endif /* _LINUX_CAP_ADDR_MGMT_H */ diff --git a/lib/cap_addr_mgmt.c b/lib/cap_addr_mgmt.c index e22868506e70..f6007a4e9c4e 100644 --- a/lib/cap_addr_mgmt.c +++ b/lib/cap_addr_mgmt.c @@ -179,3 +179,24 @@ void __init reserv_mt_init(void) { reserv_mt_entry_cachep = KMEM_CACHE(reserv_mt_entry, SLAB_PANIC|SLAB_ACCOUNT); } + +bool capability_owns_range(uintcap_t cap, unsigned long addr, unsigned long len) +{ + unsigned long align_addr = round_down(addr, PAGE_SIZE); + unsigned long align_len = cheri_representable_length(round_up(len, PAGE_SIZE)); + + return cheri_check_cap((const void * __capability)cheri_address_set(cap, align_addr), + align_len, CHERI_PERM_GLOBAL | CHERI_PERM_SW_VMEM); +} + +uintcap_t build_owning_capability(unsigned long start, unsigned long len, int prot __maybe_unused) +{ + unsigned long align_start = round_down(start, PAGE_SIZE); + unsigned long align_len = cheri_representable_length(round_up(len, PAGE_SIZE)); + + /* TODO [PCuABI] - capability permission conversion from memory permission */ + cheri_perms_t perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE | + CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP; + + return (uintcap_t)cheri_build_user_cap(align_start, align_len, perms); +}
do_vmi_munmap()/do_munmap() are used in several places to unmap memory mappings. However, once the PCuABI memory reservation interface is introduced, internal callers need to be able to ignore reservations while unmapping memory, shrinking mappings or merging fragmented VMAs.
Both functions are modified to take a flag that ignores the PCuABI reservation and carries on with the usual unmapping activity. As do_munmap() is used in several external places, an equivalent unmapping function do_munmap_use_reserv() is created instead of adding a reservation parameter.
These changes keep the existing functionality intact and will help integrate the reservation interfaces in different scenarios in subsequent commits.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 include/linux/mm.h |  5 ++++-
 mm/mmap.c          | 28 +++++++++++++++++++++++-----
 mm/mremap.c        |  4 ++--
 3 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index c1f4996a957f..1b32c2b81464 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3143,11 +3143,14 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr, extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf); + extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf, - bool downgrade); + bool downgrade, bool reserve_ignore); extern int do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf); +extern int do_munmap_use_reserv(struct mm_struct *mm, unsigned long start, size_t len, + struct list_head *uf); extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior);
#ifdef CONFIG_MMU diff --git a/mm/mmap.c b/mm/mmap.c index bc422cc4a14b..f4a9099365bf 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2500,6 +2500,8 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, * @uf: The userfaultfd list_head * @downgrade: set to true if the user wants to attempt to write_downgrade the * mmap_lock + * @reserve_ignore: set to true if the user wants to ignore reservation + * completely or false if the user wants to strictly use reservation. * * This function takes a @mas that is either pointing to the previous VMA or set * to MA_START and sets it up to remove the mapping(s). The @len will be @@ -2509,7 +2511,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, */ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf, - bool downgrade) + bool downgrade, bool reserve_ignore) { unsigned long end; struct vm_area_struct *vma; @@ -2543,9 +2545,25 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, { VMA_ITERATOR(vmi, mm, start);
- return do_vmi_munmap(&vmi, mm, start, len, uf, false); + return do_vmi_munmap(&vmi, mm, start, len, uf, false, true); }
+/* do_munmap_use_reserv() - Wrapper function for non-maple tree aware do_munmap() + * calls used in cases where PCuABI memory reservation is used. + * @mm: The mm_struct + * @start: The start address to munmap + * @len: The length to be munmapped. + * @uf: The userfaultfd list_head + */ +int do_munmap_use_reserv(struct mm_struct *mm, unsigned long start, size_t len, + struct list_head *uf) +{ + VMA_ITERATOR(vmi, mm, start); + + return do_vmi_munmap(&vmi, mm, start, len, uf, false, false); +} + + unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, struct list_head *uf) @@ -2577,7 +2595,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, }
/* Unmap any existing mapping in the area */ - if (do_vmi_munmap(&vmi, mm, addr, len, uf, false)) + if (do_vmi_munmap(&vmi, mm, addr, len, uf, false, true)) return -ENOMEM;
/* @@ -2804,7 +2822,7 @@ static int __vm_munmap(unsigned long start, size_t len, bool downgrade) if (mmap_write_lock_killable(mm)) return -EINTR;
- ret = do_vmi_munmap(&vmi, mm, start, len, &uf, downgrade); + ret = do_vmi_munmap(&vmi, mm, start, len, &uf, downgrade, true); /* * Returning 1 indicates mmap_lock is downgraded. * But 1 is not legal return value of vm_munmap() and munmap(), reset @@ -3057,7 +3075,7 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags) if (ret) goto limits_failed;
- ret = do_vmi_munmap(&vmi, mm, addr, len, &uf, 0); + ret = do_vmi_munmap(&vmi, mm, addr, len, &uf, 0, true); if (ret) goto munmap_failed;
diff --git a/mm/mremap.c b/mm/mremap.c index b52592303e8b..305e7bcf06f9 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -703,7 +703,7 @@ static unsigned long move_vma(struct vm_area_struct *vma, }
vma_iter_init(&vmi, mm, old_addr); - if (do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false) < 0) { + if (do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false, true) < 0) { /* OOM: unable to split vma, just get accounts right */ if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) vm_acct_memory(old_len >> PAGE_SHIFT); @@ -994,7 +994,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len VMA_ITERATOR(vmi, mm, addr + new_len);
retval = do_vmi_munmap(&vmi, mm, addr + new_len, - old_len - new_len, &uf_unmap, true); + old_len - new_len, &uf_unmap, true, true); /* Returning 1 indicates mmap_lock is downgraded to read. */ if (retval == 1) { downgraded = true;
Use the recently introduced PCuABI reservation interfaces for the mmap/munmap syscalls. The capability returned by the mmap syscall is now bounded, and the range is recorded in the reservation layer to perform the different range constraint checks.
The kernel internal mapping functions vm_mmap and vm_munmap are also modified to accept user pointers. However, the users of vm_mmap/vm_munmap are not modified and hence the reservation interface might not be completely functional.
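For illustration, the effect visible from purecap userspace is roughly the following (hypothetical test, not part of the series):

#include <cheriintrin.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;
	/* With this patch the returned capability is bounded to the
	 * page-aligned, representable reservation range instead of being
	 * derived from the full root capability. */
	printf("base=%#lx length=%#lx\n",
	       (unsigned long)cheri_base_get(p),
	       (unsigned long)cheri_length_get(p));
	return munmap(p, 3 * page);
}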
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 include/linux/cheri.h |  3 ++
 include/linux/mm.h    | 10 +++---
 mm/mmap.c             | 81 +++++++++++++++++++++++++++++++++++++++----
 mm/util.c             | 10 ++----
 4 files changed, 84 insertions(+), 20 deletions(-)
diff --git a/include/linux/cheri.h b/include/linux/cheri.h index e5f588b056ad..02ef0e911e63 100644 --- a/include/linux/cheri.h +++ b/include/linux/cheri.h @@ -37,6 +37,9 @@ (CHERI_PERM_GLOBAL | CHERI_PERM_SW_VMEM) #endif
+#define cheri_is_null_derived(cap) \ + cheri_is_equal_exact((uintcap_t)cheri_address_get(cap), cap) + /** * cheri_build_user_cap() - Create a userspace capability. * @addr: Requested capability address. diff --git a/include/linux/mm.h b/include/linux/mm.h index 1b32c2b81464..4206b761d777 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3140,16 +3140,16 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo extern unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, struct list_head *uf); -extern unsigned long do_mmap(struct file *file, unsigned long addr, +extern user_uintptr_t do_mmap(struct file *file, user_uintptr_t addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf);
extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, - unsigned long start, size_t len, struct list_head *uf, + user_uintptr_t start, size_t len, struct list_head *uf, bool downgrade, bool reserve_ignore); extern int do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf); -extern int do_munmap_use_reserv(struct mm_struct *mm, unsigned long start, size_t len, +extern int do_munmap_use_reserv(struct mm_struct *mm, user_uintptr_t start, size_t len, struct list_head *uf); extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior);
@@ -3171,8 +3171,8 @@ static inline void mm_populate(unsigned long addr, unsigned long len) {} /* These take the mm semaphore themselves */ extern int __must_check vm_brk(unsigned long, unsigned long); extern int __must_check vm_brk_flags(unsigned long, unsigned long, unsigned long); -extern int vm_munmap(unsigned long, size_t); -extern unsigned long __must_check vm_mmap(struct file *, unsigned long, +extern int vm_munmap(user_uintptr_t, size_t); +extern user_uintptr_t __must_check vm_mmap(struct file *, user_uintptr_t, unsigned long, unsigned long, unsigned long, unsigned long);
diff --git a/mm/mmap.c b/mm/mmap.c index f4a9099365bf..8f9c3d8686ab 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -48,6 +48,8 @@ #include <linux/sched/mm.h> #include <linux/ksm.h>
+#include <linux/cap_addr_mgmt.h> +#include <linux/cheri.h> #include <linux/uaccess.h> #include <asm/cacheflush.h> #include <asm/tlb.h> @@ -1224,7 +1226,7 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, /* * The caller must write-lock current->mm->mmap_lock. */ -unsigned long do_mmap(struct file *file, unsigned long addr, +user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1232,6 +1234,11 @@ unsigned long do_mmap(struct file *file, unsigned long addr, struct mm_struct *mm = current->mm; vm_flags_t vm_flags; int pkey = 0; + unsigned long addr = (ptraddr_t)user_addr; +#ifdef CONFIG_CHERI_PURECAP_UABI + bool is_reservation = false; + int ret; +#endif
validate_mm(mm); *populate = 0; @@ -1239,6 +1246,20 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (!len) return -EINVAL;
+#ifdef CONFIG_CHERI_PURECAP_UABI + if (cheri_tag_get(user_addr)) { + if (!capability_owns_range(user_addr, addr, len)) + return -EINVAL; + if (!reserv_mt_range_fully_mapped(&mm->reserv_mt, addr, len) || + !reserv_mt_capability_bound_valid(&mm->reserv_mt, user_addr)) + return -ERESERVATION; + } else { + if (!cheri_is_null_derived(user_addr)) + return -EINVAL; + is_reservation = true; + } +#endif /* CONFIG_CHERI_PURECAP_UABI */ + /* * Does the application expect PROT_READ to imply PROT_EXEC? * @@ -1396,11 +1417,35 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; }
+#ifdef CONFIG_CHERI_PURECAP_UABI + if (is_reservation) { + /* + * Check if there is any overlap with the existing reservation. + * This may help in filtering out any reservation error before + * the actual memory mapping. + */ + if (reserv_mt_range_valid(&mm->reserv_mt, addr, len)) + return -ERESERVATION; + } +#endif addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len; +#ifdef CONFIG_CHERI_PURECAP_UABI + if (!IS_ERR_VALUE(addr)) { + if (is_reservation) { + ret = reserv_mt_insert_entry(&mm->reserv_mt, addr, len, prot); + if (ret) + return ret; + user_addr = build_owning_capability(addr, len, prot); + } else { + user_addr = cheri_address_set(user_addr, addr); + } + return user_addr; + } +#endif return addr; }
@@ -2510,11 +2555,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, * Returns: -EINVAL on failure, 1 on success and unlock, 0 otherwise. */ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, - unsigned long start, size_t len, struct list_head *uf, + user_uintptr_t user_start, size_t len, struct list_head *uf, bool downgrade, bool reserve_ignore) { unsigned long end; struct vm_area_struct *vma; + int ret; + unsigned long start = (ptraddr_t)user_start;
if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start) return -EINVAL; @@ -2531,7 +2578,21 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, if (!vma) return 0;
- return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade); +#ifdef CONFIG_CHERI_PURECAP_UABI + if (!reserve_ignore) { + if (!capability_owns_range(user_start, start, len)) + return -EINVAL; + if (!reserv_mt_capability_bound_valid(&mm->reserv_mt, user_start)) + return -ERESERVATION; + } +#endif + + ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade); +#ifdef CONFIG_CHERI_PURECAP_UABI + if (!reserve_ignore && ret >= 0) + reserv_mt_delete_range(&mm->reserv_mt, start, len); +#endif + return ret; }
/* do_munmap() - Wrapper function for non-maple tree aware do_munmap() calls. @@ -2555,7 +2616,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, * @len: The length to be munmapped. * @uf: The userfaultfd list_head */ -int do_munmap_use_reserv(struct mm_struct *mm, unsigned long start, size_t len, +int do_munmap_use_reserv(struct mm_struct *mm, user_uintptr_t start, size_t len, struct list_head *uf) { VMA_ITERATOR(vmi, mm, start); @@ -2812,17 +2873,18 @@ unsigned long mmap_region(struct file *file, unsigned long addr, return error; }
-static int __vm_munmap(unsigned long start, size_t len, bool downgrade) +static int __vm_munmap(user_uintptr_t user_start, size_t len, bool downgrade) { int ret; struct mm_struct *mm = current->mm; + unsigned long start = (ptraddr_t)user_start; LIST_HEAD(uf); VMA_ITERATOR(vmi, mm, start);
if (mmap_write_lock_killable(mm)) return -EINTR;
- ret = do_vmi_munmap(&vmi, mm, start, len, &uf, downgrade, true); + ret = do_vmi_munmap(&vmi, mm, user_start, len, &uf, downgrade, false); /* * Returning 1 indicates mmap_lock is downgraded. * But 1 is not legal return value of vm_munmap() and munmap(), reset @@ -2838,7 +2900,8 @@ static int __vm_munmap(unsigned long start, size_t len, bool downgrade) return ret; }
-int vm_munmap(unsigned long start, size_t len) +/* TODO [PCuABI] - Update the users of vm_munmap */ +int vm_munmap(user_uintptr_t start, size_t len) { return __vm_munmap(start, len, false); } @@ -2846,7 +2909,11 @@ EXPORT_SYMBOL(vm_munmap);
SYSCALL_DEFINE2(munmap, user_uintptr_t, addr, size_t, len) { +#ifdef CONFIG_CHERI_PURECAP_UABI + addr = cheri_address_set(addr, untagged_addr(cheri_address_get(addr))); +#else addr = untagged_addr(addr); +#endif return __vm_munmap(addr, len, true); }
diff --git a/mm/util.c b/mm/util.c index 61de3bf7712b..6f5d9d864643 100644 --- a/mm/util.c +++ b/mm/util.c @@ -540,24 +540,18 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - /* - * TODO [PCuABI] - might need propagating uintcap further down - * to do_mmap to properly handle capabilities - */ ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); if (populate) mm_populate(ret, populate); - /* TODO [PCuABI] - derive proper capability */ - if (!IS_ERR_VALUE(ret)) - ret = (user_uintptr_t)uaddr_to_user_ptr_safe((ptraddr_t)ret); } return ret; }
-unsigned long vm_mmap(struct file *file, unsigned long addr, +/* TODO [PCuABI] - Update the users of vm_mmap */ +user_uintptr_t vm_mmap(struct file *file, user_uintptr_t addr, unsigned long len, unsigned long prot, unsigned long flag, unsigned long offset) {
On 10/08/2023 10:03, Amit Daniel Kachhap wrote:
@@ -1396,11 +1417,35 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; } +#ifdef CONFIG_CHERI_PURECAP_UABI
- if (is_reservation) {
/*
* Check if there is any overlap with the existing reservation.
* This may help in filtering out any reservation error before
* the actual memory mapping.
*/
if (reserv_mt_range_valid(&mm->reserv_mt, addr, len))
This is fundamentally incompatible with the way reservations are defined. If a new reservation is to be created (because addr is null-derived), then it is the kernel's job to find an appropriate range for that reservation. *This cannot fail*, unless we have actually run out of address space.
Concretely, this means that get_unmapped_area() needs to provide us with an address at which we can create the reservation (in other words, where the range can be represented as capability bounds). Unfortunately that may require some invasive changes...
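For example, something along these lines (purely illustrative, invented helper name) would be needed when picking a candidate address:

/* Bump a candidate base up so that [base, base + len) can be exactly
 * represented as capability bounds. */
static unsigned long align_for_reservation(unsigned long addr, unsigned long len)
{
	unsigned long rlen = cheri_representable_length(len);
	unsigned long mask = cheri_representable_alignment_mask(rlen);

	return (addr + ~mask) & mask;	/* align up to the required alignment */
}

The gap search would of course also need to size the hole using rlen rather than len.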
return -ERESERVATION;
- }
+#endif addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len; +#ifdef CONFIG_CHERI_PURECAP_UABI
- if (!IS_ERR_VALUE(addr)) {
if (is_reservation) {
ret = reserv_mt_insert_entry(&mm->reserv_mt, addr, len, prot);
if (ret)
return ret;
user_addr = build_owning_capability(addr, len, prot);
The range that is associated with a reservation must be representable as capability bounds. That's precisely because we set the bounds of the returned capability exactly to that range. At the moment there doesn't seem to be any handling of representability, which requires both that the base is sufficiently aligned (as per cheri_representable_alignment_mask()) and that the length is representable (as per cheri_representable_length()).
} else {
user_addr = cheri_address_set(user_addr, addr);
}
We need to be careful to leave compat64 unchanged. Most of these changes will go unnoticed in compat64, but really it does not make sense to manipulate reservations at all in that case (and changes in further patches introduce undesirable changes in semantics). We should probably try to replace all these direct additions with hooks, which would do nothing in compat64. Having all the hooks for all the mm syscalls in the same file feels quite attractive, and would drastically reduce the need for #ifdef'ing in the core mm codebase. The reservation management probably needs to be separated from the capability handling too (see next comment) - I suspect we could get away with having the reservation interface be a no-op in !PCuABI to avoid #ifdefs (without using hooks as such in that case).
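Purely as an illustration of the hook idea (all names here are invented), the call sites could reduce to something like:

#ifdef CONFIG_CHERI_PURECAP_UABI
static inline int reserv_mmap_hook(struct mm_struct *mm, unsigned long addr,
				   unsigned long len, unsigned long prot)
{
	if (is_compat_task())
		return 0;	/* leave compat64 semantics unchanged */
	return reserv_mt_insert_entry(&mm->reserv_mt, addr, len, prot);
}
#else
static inline int reserv_mmap_hook(struct mm_struct *mm, unsigned long addr,
				   unsigned long len, unsigned long prot)
{
	return 0;		/* no reservations in !PCuABI */
}
#endif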
return user_addr;
- }
+#endif return addr; } @@ -2510,11 +2555,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
- Returns: -EINVAL on failure, 1 on success and unlock, 0 otherwise.
*/ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
unsigned long start, size_t len, struct list_head *uf,
user_uintptr_t user_start, size_t len, struct list_head *uf,
bool downgrade, bool reserve_ignore)
{ unsigned long end; struct vm_area_struct *vma;
- int ret;
- unsigned long start = (ptraddr_t)user_start;
if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start) return -EINVAL; @@ -2531,7 +2578,21 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, if (!vma) return 0;
- return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade);
+#ifdef CONFIG_CHERI_PURECAP_UABI
- if (!reserve_ignore) {
Reservations should never be bypassed entirely. For our model to work, every single mapping must be contained within a reservation (any mapping outside of a reservation could be overwritten using MAP_FIXED, which is something we want to prevent in PCuABI). Regardless of the caller (kernel or user), mmap() must create a new reservation if needed, and munmap() destroy it when removing the last mapping inside a reservation.
It seems that these conditionals are conflating reservation management and capability handling. As mentioned the former is always required. I do see the issue with the latter though: the kernel is not manipulating mappings through capabilities, and we cannot realistically change this in a hybrid kernel. Still, I find using an extra argument to skip the capability handling rather unpleasant. I can think of two ways to avoid it:
1. Move all the capability handling outside of the common function (used by both the kernel and user). That would be ideal if possible.
2. Get the kernel to use some wrapper that creates an appropriate capability to pass to the handler. This could make sense if that particular function is rarely called by the kernel, but it is otherwise a bit dubious as the capability manipulation is unnecessary.
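As a sketch of option 2 (illustrative only, reusing build_owning_capability() from patch 3), the kernel-facing wrapper could look like:

/* Kernel callers keep passing plain addresses; the wrapper synthesises an
 * owning capability before reaching the common, capability-aware path. */
int vm_munmap(user_uintptr_t start, size_t len)
{
	user_uintptr_t cap = build_owning_capability((ptraddr_t)start, len,
						     PROT_READ | PROT_WRITE);

	return __vm_munmap(cap, len, false);
}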
if (!capability_owns_range(user_start, start, len))
return -EINVAL;
if (!reserv_mt_capability_bound_valid(&mm->reserv_mt, user_start))
return -ERESERVATION;
- }
+#endif
- ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade);
+#ifdef CONFIG_CHERI_PURECAP_UABI
- if (!reserve_ignore && ret >= 0)
reserv_mt_delete_range(&mm->reserv_mt, start, len);
Reservations are immutable - they are never expanded or shrunk, only destroyed. The complete set of rules can be found in the "Reservations" section of the PCuABI spec [1].
Kevin
[1] https://git.morello-project.org/morello/kernel/linux/-/wikis/Morello-pure-ca...
Hi Kevin,

On 8/16/23 19:18, Kevin Brodsky wrote:
On 10/08/2023 10:03, Amit Daniel Kachhap wrote:
@@ -1396,11 +1417,35 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; } +#ifdef CONFIG_CHERI_PURECAP_UABI
- if (is_reservation) {
/*
* Check if there is any overlap with the existing reservation.
* This may help in filtering out any reservation error before
* the actual memory mapping.
*/
if (reserv_mt_range_valid(&mm->reserv_mt, addr, len))
This is fundamentally incompatible with the way reservations are defined. If a new reservation is to be created (because addr is null-derived), then it is the kernel's job to find an appropriate range for that reservation. *This cannot fail*, unless we have actually run out of address space.
Concretely, this means that get_unmapped_area() needs to provide us with an address at which we can create the reservation (in other words, where the range can be represented as capability bounds). Unfortunately that may require some invasive changes...
Yes, I agree with your suggestion that get_unmapped_area() should not fail.
return -ERESERVATION;
- }
+#endif addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len; +#ifdef CONFIG_CHERI_PURECAP_UABI
- if (!IS_ERR_VALUE(addr)) {
if (is_reservation) {
ret = reserv_mt_insert_entry(&mm->reserv_mt, addr, len, prot);
if (ret)
return ret;
user_addr = build_owning_capability(addr, len, prot);
The range that is associated with a reservation must be representable as capability bounds. That's precisely because we set the bounds of the returned capability exactly to that range. At the moment there doesn't seem to be any handling of representability, which requires both that the base is sufficiently aligned (as per cheri_representable_alignment_mask()) and that the length is representable (as per cheri_representable_length()).
I am using cheri_representable_length() when comparing the reservation length in reserv_mt_capability_bound_valid(). However, the maple tree entry stores the plain length. I think the actual length needs to be stored because the reverse of cheri_representable_length() is not possible.
There was some issue in using cheri_representable_alignment_mask() as it was masking a lot of bits. I will share more details on this, as I did not analyze the root cause earlier.
} else {
user_addr = cheri_address_set(user_addr, addr);
}
We need to be careful to leave compat64 unchanged. Most of these changes will go unnoticed in compat64, but really it does not make sense to manipulate reservations at all in that case (and changes in further patches introduce undesirable changes in semantics). We should probably try to replace all these direct additions with hooks, which would do nothing in compat64. Having all the hooks for all the mm syscalls in the same file feels quite attractive, and would drastically reduce the need for #ifdef'ing in the core mm codebase. The reservation management probably needs to be separated from the capability handling too (see next comment) - I suspect we could get away with having the reservation interface be a no-op in !PCuABI to avoid #ifdefs (without using hooks as such in that case).
I will try to incorporate the above suggestion in the next version.
return user_addr;
- }
+#endif return addr; } @@ -2510,11 +2555,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
- Returns: -EINVAL on failure, 1 on success and unlock, 0 otherwise.
*/ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
unsigned long start, size_t len, struct list_head *uf,
user_uintptr_t user_start, size_t len, struct list_head *uf,
bool downgrade, bool reserve_ignore)
{
	unsigned long end;
	struct vm_area_struct *vma;
- int ret;
- unsigned long start = (ptraddr_t)user_start;
if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start) return -EINVAL; @@ -2531,7 +2578,21 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, if (!vma) return 0;
- return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade);
+#ifdef CONFIG_CHERI_PURECAP_UABI
- if (!reserve_ignore) {
Reservations should never be bypassed entirely. For our model to work, every single mapping must be contained within a reservation (any mapping outside of a reservation could be overwritten using MAP_FIXED, which is something we want to prevent in PCuABI). Regardless of the caller (kernel or user), mmap() must create a new reservation if needed, and munmap() destroy it when removing the last mapping inside a reservation.
do_vmi_munmap() called from the munmap() syscall doesn't ignore the reservation. But do_vmi_munmap() called indirectly from the mmap() or mremap() syscalls sometimes needs to ignore the reservation, as the reservation handling is already done there. Maybe some better code restructuring will make this more explicit.
It seems that these conditionals are conflating reservation management and capability handling. As mentioned the former is always required. I do see the issue with the latter though: the kernel is not manipulating mappings through capabilities, and we cannot realistically change this in a hybrid kernel. Still, I find using an extra argument to skip the capability handling rather unpleasant. I can think of two ways to avoid it:
1. Move all the capability handling outside of the common function (used by both the kernel and user). That would be ideal if possible.

2. Get the kernel to use some wrapper that creates an appropriate capability to pass to the handler. This could make sense if that particular function is rarely called by the kernel, but it is otherwise a bit dubious as the capability manipulation is unnecessary.
I will try to incorporate the above suggestion in the next version.
if (!capability_owns_range(user_start, start, len))
return -EINVAL;
if (!reserv_mt_capability_bound_valid(&mm->reserv_mt, user_start))
return -ERESERVATION;
- }
+#endif
- ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade);
+#ifdef CONFIG_CHERI_PURECAP_UABI
- if (!reserve_ignore && ret >= 0)
reserv_mt_delete_range(&mm->reserv_mt, start, len);
Reservations are immutable - they are never expanded or shrunk, only destroyed. The complete set of rules can be found in the "Reservations" section of the PCuABI spec [1].
In this RFC version, the reservation layer stores each mapping along with its root reservation and also does a VMA-type split operation here if required. Hopefully I can simplify these things and merge the reservation interface inside the VMA.
Amit
Kevin
[1] https://git.morello-project.org/morello/kernel/linux/-/wikis/Morello-pure-ca...
On 17/08/2023 11:21, Amit Daniel Kachhap wrote:
+	ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade);
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	if (!reserve_ignore && ret >= 0)
+		reserv_mt_delete_range(&mm->reserv_mt, start, len);
Reservations are immutable - they are never expanded or shrunk, only destroyed. The complete set of rules can be found in the "Reservations" section of the PCuABI spec [1].
In this RFC version, the reservation layer stores each mapping along with its root reservation and also does a VMA-type split operation here if required. Hopefully I can simplify these things and merge the reservation interface inside the VMA.
Fair enough, I have to admit I hadn't looked at patch 2 closely. I now understand the logic better. However, I think there is another fundamental issue with that design. When creating a new reservation, we need to find an appropriate location. I mentioned earlier that this range needs to have representable bounds. There is however an even more important requirement: this range must not overlap with any other reservation. get_unmapped_area() therefore needs to consider existing *reservations* to find a hole, and mappings become irrelevant. (Similarly, MAP_FIXED with a null-derived capability must fail if the new reservation would overlap with an existing one.)
With the current data structure, finding a "reservation hole" is rather awkward, since we cannot directly iterate over reservations. If we move to a model where we directly tag each VMA with its reservation, I think the issue remains the same: we do not have direct access to the reservations.
This is the reason why I initially assumed that the data structure represented *reservation* ranges, not mapping ranges, because the former is what really matters to us. Each reservation could then have a list of mappings associated to it, but that may not even be required - when unmapping, we could simply query the VMAs for the corresponding reservation range and see if there's any left. Maybe I'm missing something though?
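For instance, checking whether a reservation still has any mapping left could be as simple as (illustrative, invented helper):

static bool reserv_has_mappings(struct mm_struct *mm, unsigned long rstart,
				unsigned long rend)
{
	VMA_ITERATOR(vmi, mm, rstart);

	/* Any VMA left anywhere inside the reservation range? */
	return vma_find(&vmi, rend) != NULL;
}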
Kevin
Hi Kevin,
On 8/28/23 13:25, Kevin Brodsky wrote:
On 17/08/2023 11:21, Amit Daniel Kachhap wrote:
+	ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade);
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	if (!reserve_ignore && ret >= 0)
+		reserv_mt_delete_range(&mm->reserv_mt, start, len);
Reservations are immutable - they are never expanded or shrunk, only destroyed. The complete set of rules can be found in the "Reservations" section of the PCuABI spec [1].
In this RFC version, the reservation layer stores each mapping along with its root reservation and also does a VMA-type split operation here if required. Hopefully I can simplify these things and merge the reservation interface inside the VMA.
Fair enough, I have to admit I hadn't looked at patch 2 closely. I now understand the logic better. However, I think there is another fundamental issue with that design. When creating a new reservation, we need to find an appropriate location. I mentioned earlier that this range needs to have representable bounds. There is however an even more important requirement: this range must not overlap with any other reservation. get_unmapped_area() therefore needs to consider existing *reservations* to find a hole, and mappings become irrelevant.
Agree.
(Similarly, MAP_FIXED with a null-derived capability must fail if the new reservation would overlap with an existing one.)
Ok.
With the current data structure, finding a "reservation hole" is rather awkward, since we cannot directly iterate over reservations. If we move to a model where we directly tag each VMA with its reservation, I think the issue remains the same: we do not have direct access to the reservations.
I had a look at the unmapped_area_topdown() implementation. The good thing is that a retry implementation already exists there. We can use the reservation start/end limits directly from the vma structure in the vm_start_gap()/vm_end_gap() functions.
This is the reason why I initially assumed that the data structure represented *reservation* ranges, not mapping ranges, because the former is what really matters to us. Each reservation could then have a list of mappings associated to it, but that may not even be required - when unmapping, we could simply query the VMAs for the corresponding reservation range and see if there's any left. Maybe I'm missing something though?
I agree that searching based on the reservation range is the optimal method here, but managing the multiple mapping ranges within a reservation would again need a new maple tree or something similar. I think searching by VMA range does the job most of the time. We can revisit this later if required.
Amit Daniel
Kevin
vm_mmap()/vm_munmap()/do_mmap() are used in several places, such as filesystems, loaders and drivers, to create memory mappings in the kernel.

At this point they have not been updated to handle user capabilities, so limit the PCuABI reservation handling to syscalls only.

Note: This commit may be temporary until full PCuABI support is added to the kernel.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
---
 fs/aio.c           |  2 +-
 include/linux/mm.h |  4 +++-
 ipc/shm.c          |  2 +-
 mm/internal.h      |  4 ++--
 mm/mmap.c          | 42 +++++++++++++++++++++++-------------------
 mm/nommu.c         |  2 +-
 mm/util.c          |  6 +++---
 7 files changed, 34 insertions(+), 28 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c index 83ab611483ba..eadd9173b269 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -580,7 +580,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size, PROT_READ | PROT_WRITE, - MAP_SHARED, 0, &unused, NULL); + MAP_SHARED, 0, &unused, NULL, true); mmap_write_unlock(mm); if (IS_ERR((void *)ctx->mmap_base)) { ctx->mmap_size = 0; diff --git a/include/linux/mm.h b/include/linux/mm.h index 4206b761d777..67fe51267564 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3140,9 +3140,11 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo extern unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, struct list_head *uf); + extern user_uintptr_t do_mmap(struct file *file, user_uintptr_t addr, unsigned long len, unsigned long prot, unsigned long flags, - unsigned long pgoff, unsigned long *populate, struct list_head *uf); + unsigned long pgoff, unsigned long *populate, struct list_head *uf, + bool reserv_ignore);
extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, user_uintptr_t start, size_t len, struct list_head *uf, diff --git a/ipc/shm.c b/ipc/shm.c index 16b75e8bcda1..87cd93788c5a 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1671,7 +1671,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, goto invalid; }
- addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL); + addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL, true); *raddr = addr; err = 0; if (IS_ERR_VALUE(addr)) diff --git a/mm/internal.h b/mm/internal.h index 4c26360de5a7..b5fdb2666577 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -779,8 +779,8 @@ extern u64 hwpoison_filter_memcg; extern u32 hwpoison_filter_enable;
extern user_uintptr_t __must_check vm_mmap_pgoff(struct file *, user_uintptr_t, - unsigned long, unsigned long, - unsigned long, unsigned long); + unsigned long, unsigned long, + unsigned long, unsigned long, bool reserve_ignore);
extern void set_pageblock_order(void); unsigned long reclaim_pages(struct list_head *folio_list); diff --git a/mm/mmap.c b/mm/mmap.c index 8f9c3d8686ab..34880a7c3c30 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1229,7 +1229,8 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff, - unsigned long *populate, struct list_head *uf) + unsigned long *populate, struct list_head *uf, + bool reserve_ignore) { struct mm_struct *mm = current->mm; vm_flags_t vm_flags; @@ -1247,16 +1248,18 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_addr, return -EINVAL;
#ifdef CONFIG_CHERI_PURECAP_UABI - if (cheri_tag_get(user_addr)) { - if (!capability_owns_range(user_addr, addr, len)) - return -EINVAL; - if (!reserv_mt_range_fully_mapped(&mm->reserv_mt, addr, len) || - !reserv_mt_capability_bound_valid(&mm->reserv_mt, user_addr)) - return -ERESERVATION; - } else { - if (!cheri_is_null_derived(user_addr)) - return -EINVAL; - is_reservation = true; + if (!reserve_ignore) { + if (cheri_tag_get(user_addr)) { + if (!capability_owns_range(user_addr, addr, len)) + return -EINVAL; + if (!reserv_mt_range_fully_mapped(&mm->reserv_mt, addr, len) || + !reserv_mt_capability_bound_valid(&mm->reserv_mt, user_addr)) + return -ERESERVATION; + } else { + if (!cheri_is_null_derived(user_addr)) + return -EINVAL; + is_reservation = true; + } } #endif /* CONFIG_CHERI_PURECAP_UABI */
@@ -1418,7 +1421,7 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_addr, }
#ifdef CONFIG_CHERI_PURECAP_UABI - if (is_reservation) { + if (!reserve_ignore && is_reservation) { /* * Check if there is any overlap with the existing reservation. * This may help in filtering out any reservation error before @@ -1434,7 +1437,7 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_addr, (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) *populate = len; #ifdef CONFIG_CHERI_PURECAP_UABI - if (!IS_ERR_VALUE(addr)) { + if (!reserve_ignore && !IS_ERR_VALUE(addr)) { if (is_reservation) { ret = reserv_mt_insert_entry(&mm->reserv_mt, addr, len, prot); if (ret) @@ -1487,7 +1490,7 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t addr, unsigned long len, return PTR_ERR(file); }
- retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff); + retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff, false); out_fput: if (file) fput(file); @@ -2873,7 +2876,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, return error; }
-static int __vm_munmap(user_uintptr_t user_start, size_t len, bool downgrade) +static int __vm_munmap(user_uintptr_t user_start, size_t len, bool downgrade, + bool reserve_ignore) { int ret; struct mm_struct *mm = current->mm; @@ -2884,7 +2888,7 @@ static int __vm_munmap(user_uintptr_t user_start, size_t len, bool downgrade) if (mmap_write_lock_killable(mm)) return -EINTR;
- ret = do_vmi_munmap(&vmi, mm, user_start, len, &uf, downgrade, false); + ret = do_vmi_munmap(&vmi, mm, user_start, len, &uf, downgrade, reserve_ignore); /* * Returning 1 indicates mmap_lock is downgraded. * But 1 is not legal return value of vm_munmap() and munmap(), reset @@ -2903,7 +2907,7 @@ static int __vm_munmap(user_uintptr_t user_start, size_t len, bool downgrade) /* TODO [PCuABI] - Update the users of vm_munmap */ int vm_munmap(user_uintptr_t start, size_t len) { - return __vm_munmap(start, len, false); + return __vm_munmap(start, len, false, true); } EXPORT_SYMBOL(vm_munmap);
@@ -2914,7 +2918,7 @@ SYSCALL_DEFINE2(munmap, user_uintptr_t, addr, size_t, len) #else addr = untagged_addr(addr); #endif - return __vm_munmap(addr, len, true); + return __vm_munmap(addr, len, true, false); }
@@ -2990,7 +2994,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
file = get_file(vma->vm_file); ret = do_mmap(vma->vm_file, start, size, - prot, flags, pgoff, &populate, NULL); + prot, flags, pgoff, &populate, NULL, true); fput(file); out: mmap_write_unlock(mm); diff --git a/mm/nommu.c b/mm/nommu.c index f670d9979a26..b7d6a9180ca8 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1255,7 +1255,7 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len, goto out; }
- retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff); + retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff, true);
if (file) fput(file); diff --git a/mm/util.c b/mm/util.c index 6f5d9d864643..528d42d5211f 100644 --- a/mm/util.c +++ b/mm/util.c @@ -529,7 +529,7 @@ EXPORT_SYMBOL_GPL(account_locked_vm);
user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, unsigned long len, unsigned long prot, - unsigned long flag, unsigned long pgoff) + unsigned long flag, unsigned long pgoff, bool reserve_ignore) { user_uintptr_t ret; struct mm_struct *mm = current->mm; @@ -541,7 +541,7 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, if (mmap_write_lock_killable(mm)) return -EINTR; ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, - &uf); + &uf, reserve_ignore); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); if (populate) @@ -560,7 +560,7 @@ user_uintptr_t vm_mmap(struct file *file, user_uintptr_t addr, if (unlikely(offset_in_page(offset))) return -EINVAL;
- return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT); + return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT, true); } EXPORT_SYMBOL(vm_mmap);
Use the recently introduced PCuABI reservation interfaces for the mremap syscall. The PCuABI specification does not allow expanding an existing reservation, so mappings are moved instead when the MREMAP_MAYMOVE flag is present. Apply the relevant capability constraint checks to the input user memory addresses.
Here the do_munmap_use_reserv() variant of the unmap function is used to remove or shrink the memory mapping.
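A minimal userspace sketch of the resulting behaviour, assuming a purecap environment where mmap() returns an owning capability; the helper name is illustrative only:

  #define _GNU_SOURCE
  #include <stddef.h>
  #include <sys/mman.h>

  int example_mremap(void)
  {
          size_t len = 4 * 4096;
          void *cap = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (cap == MAP_FAILED)
                  return -1;

          /*
           * Growing in place is rejected: a PCuABI reservation cannot be
           * enlarged, so without MREMAP_MAYMOVE this is expected to fail.
           */
          if (mremap(cap, len, 2 * len, 0) != MAP_FAILED)
                  return -1;

          /*
           * With MREMAP_MAYMOVE the mapping is moved together with its
           * reservation and an owning capability for the new range is
           * returned.
           */
          void *moved = mremap(cap, len, 2 * len, MREMAP_MAYMOVE);
          if (moved == MAP_FAILED)
                  return -1;

          /* Shrinking behaves like munmap(); the reservation range stays fixed. */
          return mremap(moved, 2 * len, len, 0) == MAP_FAILED ? -1 : 0;
  }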
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- mm/mremap.c | 110 ++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 93 insertions(+), 17 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c index 305e7bcf06f9..2ed627c4c25d 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -25,6 +25,7 @@ #include <linux/uaccess.h> #include <linux/userfaultfd_k.h> #include <linux/mempolicy.h> +#include <linux/cap_addr_mgmt.h>
#include <asm/cacheflush.h> #include <asm/tlb.h> @@ -785,16 +786,21 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr, return vma; }
-static unsigned long mremap_to(unsigned long addr, unsigned long old_len, - unsigned long new_addr, unsigned long new_len, bool *locked, +static user_uintptr_t mremap_to(user_uintptr_t user_addr, unsigned long old_len, + user_uintptr_t user_new_addr, unsigned long new_len, bool *locked, unsigned long flags, struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap_early, struct list_head *uf_unmap) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - unsigned long ret = -EINVAL; + user_uintptr_t ret = -EINVAL; unsigned long map_flags = 0; + unsigned long addr = (ptraddr_t)user_addr; + unsigned long new_addr = (ptraddr_t)user_new_addr; +#ifdef CONFIG_CHERI_PURECAP_UABI + unsigned long old_perm; +#endif
if (offset_in_page(new_addr)) goto out; @@ -824,13 +830,13 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, return -ENOMEM;
if (flags & MREMAP_FIXED) { - ret = do_munmap(mm, new_addr, new_len, uf_unmap_early); + ret = do_munmap_use_reserv(mm, user_new_addr, new_len, uf_unmap_early); if (ret) goto out; }
if (old_len > new_len) { - ret = do_munmap(mm, addr+new_len, old_len - new_len, uf_unmap); + ret = do_munmap_use_reserv(mm, user_addr + new_len, old_len - new_len, uf_unmap); if (ret) goto out; old_len = new_len; @@ -865,9 +871,27 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, if (!(flags & MREMAP_FIXED)) new_addr = ret;
+#ifdef CONFIG_CHERI_PURECAP_UABI + /* + * Remove the old length reservation and create a new length + * reservation. If move_vma() fails then restore the old reservation. + * Reservation is moved before move_vma() as any failure in moving + * reservation will return and will not proceed to move_vma(). + */ + ret = reserv_mt_move_entry(&mm->reserv_mt, addr, old_len, new_addr, new_len, &old_perm); + if (ret) + goto out; +#endif ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf, uf_unmap);
+#ifdef CONFIG_CHERI_PURECAP_UABI + /* If any error in move_vma() then revert the reservations. */ + if (IS_ERR_VALUE(ret)) + reserv_mt_move_entry(&mm->reserv_mt, new_addr, new_len, addr, old_len, &old_perm); + else + ret = build_owning_capability(new_addr, new_len, old_perm); +#endif out: return ret; } @@ -893,9 +917,9 @@ static int vma_expandable(struct vm_area_struct *vma, unsigned long delta) * MREMAP_FIXED option added 5-Dec-1999 by Benjamin LaHaise * This option implies MREMAP_MAYMOVE. */ -SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len, +SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, user_addr, unsigned long, old_len, unsigned long, new_len, unsigned long, flags, - user_uintptr_t, new_addr) + user_uintptr_t, user_new_addr) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; @@ -903,10 +927,14 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len bool locked = false; bool downgraded = false; struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX; + unsigned long addr = (ptraddr_t)user_addr; + unsigned long new_addr = (ptraddr_t)user_new_addr; +#ifdef CONFIG_CHERI_PURECAP_UABI + unsigned long old_perm; +#endif LIST_HEAD(uf_unmap_early); LIST_HEAD(uf_unmap);
- /* @TODO [PCuABI] - capability validation */ /* * There is a deliberate asymmetry here: we strip the pointer tag * from the old address but leave the new address alone. This is @@ -918,6 +946,9 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len * See Documentation/arm64/tagged-address-abi.rst for more information. */ addr = untagged_addr(addr); +#ifdef CONFIG_CHERI_PURECAP_UABI + user_addr = cheri_address_set(user_addr, addr); +#endif
if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP)) return ret; @@ -948,6 +979,29 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len if (!new_len) return ret;
+#ifdef CONFIG_CHERI_PURECAP_UABI + if (!capability_owns_range(user_addr, addr, old_len ? old_len : new_len)) + return ret; + if (!reserv_mt_capability_bound_valid(&mm->reserv_mt, user_addr)) + return -ERESERVATION; + if (flags & MREMAP_FIXED) { + if (cheri_tag_get(user_new_addr)) { + if (!capability_owns_range(user_new_addr, new_addr, new_len)) + return ret; + } else { + if (!cheri_is_null_derived(user_new_addr)) + return ret; + } + } + /* + * If new_len > old_len and flags does not contain MREMAP_MAYMOVE + * then this fails as PCuABI does not allow increasing reservation. + */ + if (new_len > old_len && !(flags & MREMAP_MAYMOVE)) + return -ERESERVATION; + +#endif + if (mmap_write_lock_killable(current->mm)) return -EINTR; vma = vma_lookup(mm, addr); @@ -993,8 +1047,9 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len int retval; VMA_ITERATOR(vmi, mm, addr + new_len);
- retval = do_vmi_munmap(&vmi, mm, addr + new_len, - old_len - new_len, &uf_unmap, true, true); + /* Call the unmap considering reservation here */ + retval = do_vmi_munmap(&vmi, mm, user_addr + new_len, + old_len - new_len, &uf_unmap, true, false); /* Returning 1 indicates mmap_lock is downgraded to read. */ if (retval == 1) { downgraded = true; @@ -1003,7 +1058,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
- ret = addr; + ret = user_addr; goto out; }
@@ -1019,8 +1074,13 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len /* old_len exactly to the end of the area.. */ if (old_len == vma->vm_end - addr) { - /* can we just expand the current mapping? */ - if (vma_expandable(vma, new_len - old_len)) { + /* + * can we just expand the current mapping? + * PCuABI specification does not allow increasing reservation + * size so just skip this path. + */ + if (!IS_ENABLED(CONFIG_CHERI_PURECAP_UABI) && + vma_expandable(vma, new_len - old_len)) { long pages = (new_len - old_len) >> PAGE_SHIFT; unsigned long extension_start = addr + old_len; unsigned long extension_end = addr + new_len; @@ -1083,8 +1143,27 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len goto out; }
+#ifdef CONFIG_CHERI_PURECAP_UABI + /* + * Remove old length reservation and create new length reservation. + * If move_vma() fails then restore back the old reservation. + * Reservation is moved before move_vma() as any failure in moving + * reservation will return and will not proceed to move_vma(). + */ + ret = reserv_mt_move_entry(&mm->reserv_mt, addr, old_len, + new_addr, new_len, &old_perm); + if (ret) + goto out; +#endif ret = move_vma(vma, addr, old_len, new_len, new_addr, &locked, flags, &uf, &uf_unmap); +#ifdef CONFIG_CHERI_PURECAP_UABI + if (IS_ERR_VALUE(ret)) + reserv_mt_move_entry(&mm->reserv_mt, new_addr, new_len, + addr, old_len, &old_perm); + else + ret = build_owning_capability(new_addr, new_len, old_perm); +#endif } out: if (offset_in_page(ret)) @@ -1098,8 +1177,5 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len userfaultfd_unmap_complete(mm, &uf_unmap_early); mremap_userfaultfd_complete(&uf, addr, ret, old_len); userfaultfd_unmap_complete(mm, &uf_unmap); - /* TODO [PCuABI] - derive proper capability */ - return IS_ERR_VALUE(ret) ? - ret : - (user_intptr_t)uaddr_to_user_ptr_safe((ptraddr_t)ret); + return ret; }
Use the recently introduced PCuABI reservation interfaces to verify the address range for the mprotect syscall.
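For callers, the checks boil down to passing the owning capability returned by mmap(); a small sketch, assuming a purecap environment (ERESERVATION is the new error code introduced earlier in this series, not a standard errno value):

  #include <sys/mman.h>

  int example_mprotect(void)
  {
          void *cap = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (cap == MAP_FAILED)
                  return -1;

          /* The owning capability covers [addr, addr + len): allowed. */
          if (mprotect(cap, 4096, PROT_READ))
                  return -1;

          /*
           * A capability whose bounds do not cover the requested range fails
           * the capability_owns_range() check with -EINVAL, and a range that
           * lies outside any reservation fails with -ERESERVATION.
           */
          return 0;
  }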
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- mm/mprotect.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c index c9188e2cb2a6..68b190cbc493 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -32,6 +32,7 @@ #include <linux/sched/sysctl.h> #include <linux/userfaultfd_k.h> #include <linux/memory-tiers.h> +#include <linux/cap_addr_mgmt.h> #include <asm/cacheflush.h> #include <asm/mmu_context.h> #include <asm/tlbflush.h> @@ -728,7 +729,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb, /* * pkey==-1 when doing a legacy mprotect() */ -static int do_mprotect_pkey(user_uintptr_t start, size_t len, +static int do_mprotect_pkey(user_uintptr_t user_start, size_t len, unsigned long prot, int pkey) { unsigned long nstart, end, tmp, reqprot; @@ -739,9 +740,7 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, (prot & PROT_READ); struct mmu_gather tlb; struct vma_iterator vmi; - - /* TODO [PCuABI] - capability checks for uaccess */ - start = untagged_addr(start); + unsigned long start = untagged_addr((ptraddr_t)user_start);
prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP); if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */ @@ -755,6 +754,12 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, end = start + len; if (end <= start) return -ENOMEM; + +#ifdef CONFIG_CHERI_PURECAP_UABI + user_start = cheri_address_set(user_start, start); + if (!capability_owns_range(user_start, start, len)) + return -EINVAL; +#endif if (!arch_validate_prot(prot, start)) return -EINVAL;
@@ -777,6 +782,13 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, if (!vma) goto out;
+#ifdef CONFIG_CHERI_PURECAP_UABI + /* Check if the capability range is valid with mmap lock. */ + if (!reserv_mt_capability_bound_valid(¤t->mm->reserv_mt, user_start)) { + error = -ERESERVATION; + goto out; + } +#endif if (unlikely(grows & PROT_GROWSDOWN)) { if (vma->vm_start >= end) goto out;
Use the recently introduced PCuABI reservation interfaces to verify the address range for the madvise syscall.
The do_madvise() function is also used by DAMON for virtual address monitoring, and those ranges may not satisfy the reservation criteria, so add a parameter to skip the reservation checks.
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- include/linux/mm.h | 3 ++- io_uring/advise.c | 2 +- mm/damon/vaddr.c | 2 +- mm/madvise.c | 27 +++++++++++++++++++++++---- 4 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index 67fe51267564..db2573e8eb15 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3153,7 +3153,8 @@ extern int do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf); extern int do_munmap_use_reserv(struct mm_struct *mm, user_uintptr_t start, size_t len, struct list_head *uf); -extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior); +extern int do_madvise(struct mm_struct *mm, user_uintptr_t start, size_t len_in, + int behavior, bool reserv_ignore);
#ifdef CONFIG_MMU extern int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, diff --git a/io_uring/advise.c b/io_uring/advise.c index 952d9289a311..2e43142cf4df 100644 --- a/io_uring/advise.c +++ b/io_uring/advise.c @@ -55,7 +55,7 @@ int io_madvise(struct io_kiocb *req, unsigned int issue_flags) WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
/* TODO [PCuABI] - capability checks for uaccess */ - ret = do_madvise(current->mm, user_ptr_addr(ma->addr), ma->len, ma->advice); + ret = do_madvise(current->mm, (user_uintptr_t)ma->addr, ma->len, ma->advice, false); io_req_set_res(req, ret, 0); return IOU_OK; #else diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 1fec16d7263e..fcdd3f4f608f 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -623,7 +623,7 @@ static unsigned long damos_madvise(struct damon_target *target, if (!mm) return 0;
- applied = do_madvise(mm, start, len, behavior) ? 0 : len; + applied = do_madvise(mm, start, len, behavior, true) ? 0 : len; mmput(mm);
return applied; diff --git a/mm/madvise.c b/mm/madvise.c index ad59e1e07ec9..5e4ef37cbfd4 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -31,6 +31,7 @@ #include <linux/swapops.h> #include <linux/shmem_fs.h> #include <linux/mmu_notifier.h> +#include <linux/cap_addr_mgmt.h>
#include <asm/tlb.h>
@@ -1382,13 +1383,15 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, * -EBADF - map exists, but area maps something that isn't a file. * -EAGAIN - a kernel resource was temporarily unavailable. */ -int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior) +int do_madvise(struct mm_struct *mm, user_uintptr_t user_start, size_t len_in, + int behavior, bool reserv_ignore) { unsigned long end; int error; int write; size_t len; struct blk_plug plug; + unsigned long start = (ptraddr_t)user_start;
if (!madvise_behavior_valid(behavior)) return -EINVAL; @@ -1421,14 +1424,30 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh mmap_read_lock(mm); }
- /* TODO [PCuABI] - capability checks for uaccess */ start = untagged_addr_remote(mm, start); end = start + len;
+#ifdef CONFIG_CHERI_PURECAP_UABI + user_start = cheri_address_set(user_start, start); + if (!reserv_ignore) { + if (!capability_owns_range(user_start, start, len)) { + error = -EINVAL; + goto out; + } + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_mt_capability_bound_valid(&mm->reserv_mt, user_start)) { + error = -ERESERVATION; + goto out; + } + } +#endif blk_start_plug(&plug); error = madvise_walk_vmas(mm, start, end, behavior, madvise_vma_behavior); blk_finish_plug(&plug); +#ifdef CONFIG_CHERI_PURECAP_UABI +out: +#endif if (write) mmap_write_unlock(mm); else @@ -1439,7 +1458,7 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
SYSCALL_DEFINE3(madvise, user_uintptr_t, start, size_t, len_in, int, behavior) { - return do_madvise(current->mm, start, len_in, behavior); + return do_madvise(current->mm, start, len_in, behavior, false); }
SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, @@ -1494,7 +1513,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
while (iov_iter_count(&iter)) { ret = do_madvise(mm, user_ptr_addr(iter_iov_addr(&iter)), - iter_iov_len(&iter), behavior); + iter_iov_len(&iter), behavior, false); if (ret < 0) break; iov_iter_advance(&iter, iter_iov_len(&iter));
Use the recently introduced PCuABI reservation interfaces to verify the address range for the mlock, mlock2 and munlock syscalls.
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- mm/mlock.c | 37 +++++++++++++++++++++++++++++++------ 1 file changed, 31 insertions(+), 6 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c index 40b43f8740df..43b6b8a8a0ff 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -25,6 +25,7 @@ #include <linux/memcontrol.h> #include <linux/mm_inline.h> #include <linux/secretmem.h> +#include <linux/cap_addr_mgmt.h>
#include "internal.h"
@@ -563,13 +564,18 @@ static int __mlock_posix_error_return(long retval) return retval; }
-static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t flags) +static __must_check int do_mlock(user_uintptr_t user_start, size_t len, vm_flags_t flags) { unsigned long locked; unsigned long lock_limit; int error = -ENOMEM; + unsigned long start = untagged_addr((ptraddr_t)user_start);
- start = untagged_addr(start); +#ifdef CONFIG_CHERI_PURECAP_UABI + user_start = cheri_address_set(user_start, start); + if (!capability_owns_range(user_start, start, len)) + return -EINVAL; +#endif
if (!can_do_mlock()) return -EPERM; @@ -584,6 +590,13 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla if (mmap_write_lock_killable(current->mm)) return -EINTR;
+#ifdef CONFIG_CHERI_PURECAP_UABI + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_mt_capability_bound_valid(¤t->mm->reserv_mt, user_start)) { + mmap_write_unlock(current->mm); + return -ERESERVATION; + } +#endif locked += current->mm->locked_vm; if ((locked > lock_limit) && (!capable(CAP_IPC_LOCK))) { /* @@ -610,12 +623,12 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla return 0; }
-SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len) +SYSCALL_DEFINE2(mlock, user_uintptr_t, start, size_t, len) { return do_mlock(start, len, VM_LOCKED); }
-SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) +SYSCALL_DEFINE3(mlock2, user_uintptr_t, start, size_t, len, int, flags) { vm_flags_t vm_flags = VM_LOCKED;
@@ -628,17 +641,29 @@ SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) return do_mlock(start, len, vm_flags); }
-SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len) +SYSCALL_DEFINE2(munlock, user_uintptr_t, user_start, size_t, len) { int ret; + unsigned long start = untagged_addr((ptraddr_t)user_start);
- start = untagged_addr(start); +#ifdef CONFIG_CHERI_PURECAP_UABI + user_start = cheri_address_set(user_start, start); + if (!capability_owns_range(user_start, start, len)) + return -EINVAL; +#endif
len = PAGE_ALIGN(len + (offset_in_page(start))); start &= PAGE_MASK;
if (mmap_write_lock_killable(current->mm)) return -EINTR; +#ifdef CONFIG_CHERI_PURECAP_UABI + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_mt_capability_bound_valid(¤t->mm->reserv_mt, user_start)) { + mmap_write_unlock(current->mm); + return -ERESERVATION; + } +#endif ret = apply_vma_lock_flags(start, len, 0); mmap_write_unlock(current->mm);
Use the recently introduced PCuABI reservation interfaces to verify the address range for the mincore syscall.
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- mm/mincore.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/mm/mincore.c b/mm/mincore.c index 3a307bfa91c4..70dc9af2cfb7 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -18,6 +18,7 @@ #include <linux/shmem_fs.h> #include <linux/hugetlb.h> #include <linux/pgtable.h> +#include <linux/cap_addr_mgmt.h>
#include <linux/uaccess.h> #include "swap.h" @@ -229,14 +230,19 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v * mapped * -EAGAIN - A kernel resource was temporarily unavailable. */ -SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len, +SYSCALL_DEFINE3(mincore, user_uintptr_t, user_start, size_t, len, unsigned char __user *, vec) { long retval; unsigned long pages; unsigned char *tmp; + unsigned long start = untagged_addr((ptraddr_t)user_start);
- start = untagged_addr(start); +#ifdef CONFIG_CHERI_PURECAP_UABI + user_start = cheri_address_set(user_start, start); + if (!capability_owns_range(user_start, start, len)) + return -EINVAL; +#endif
/* Check the start address: needs to be page-aligned.. */ if (start & ~PAGE_MASK) @@ -253,6 +259,14 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len, if (!access_ok(vec, pages)) return -EFAULT;
+#ifdef CONFIG_CHERI_PURECAP_UABI + mmap_read_lock(current->mm); + /* Check if the range exists within the reservation with mmap lock. */ + retval = reserv_mt_capability_bound_valid(¤t->mm->reserv_mt, user_start); + mmap_read_unlock(current->mm); + if (!retval) + return -ERESERVATION; +#endif tmp = (void *) __get_free_page(GFP_USER); if (!tmp) return -EAGAIN;
Use the recently introduced PCuABI reservation interfaces to verify the address range for the msync syscall.
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- mm/msync.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/mm/msync.c b/mm/msync.c index ac4c9bfea2e7..840869a2d601 100644 --- a/mm/msync.c +++ b/mm/msync.c @@ -14,6 +14,7 @@ #include <linux/file.h> #include <linux/syscalls.h> #include <linux/sched.h> +#include <linux/cap_addr_mgmt.h>
/* * MS_SYNC syncs the entire file - including mappings. @@ -29,15 +30,20 @@ * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to * applications. */ -SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) +SYSCALL_DEFINE3(msync, user_uintptr_t, user_start, size_t, len, int, flags) { unsigned long end; struct mm_struct *mm = current->mm; struct vm_area_struct *vma; int unmapped_error = 0; int error = -EINVAL; + unsigned long start = untagged_addr((ptraddr_t)user_start);
- start = untagged_addr(start); +#ifdef CONFIG_CHERI_PURECAP_UABI + user_start = cheri_address_set(user_start, start); + if (!capability_owns_range(user_start, start, len)) + return -EINVAL; +#endif
if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC)) goto out; @@ -60,6 +66,13 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) * anyway and there is nothing left to do, so return immediately. */ mmap_read_lock(mm); +#ifdef CONFIG_CHERI_PURECAP_UABI + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_mt_capability_bound_valid(&mm->reserv_mt, user_start)) { + error = -ERESERVATION; + goto out_unlock; + } +#endif vma = find_vma(mm, start); for (;;) { struct file *file;
The PCuABI specification restricts expanding capability permissions through the mprotect() system call. This requires capabilities to be created initially with the maximum permissions that the memory mappings may possess in their lifetime.
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- include/uapi/asm-generic/mman-common.h | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 6ce1f1ceb432..e7ba511c2bad 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -17,6 +17,12 @@ #define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */ #define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */
+/* PCuABI mapping and capability permissions */ +#define _PROT_MAX_SHIFT 16 +#define PROT_MAX(prot) ((prot) << _PROT_MAX_SHIFT) +#define PROT_EXTRACT(prot) ((prot) & (PROT_READ | PROT_WRITE | PROT_EXEC)) +#define PROT_MAX_EXTRACT(prot) (((prot) >> _PROT_MAX_SHIFT) & (PROT_READ | PROT_WRITE | PROT_EXEC)) + /* 0x01 - 0x03 are defined in linux/mman.h */ #define MAP_TYPE 0x0f /* Mask for type of mapping */ #define MAP_FIXED 0x10 /* Interpret addr exactly */
Helper functions mapping_may_have_prot_flag() and capability_may_set_prot() are added, and build_owning_capability() is extended, to manage capability permissions in the address space management syscalls as per the PCuABI specification.
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- arch/arm64/include/asm/cap_addr_mgmt.h | 22 ++++++++++++ include/linux/cap_addr_mgmt.h | 35 +++++++++++++++++- lib/cap_addr_mgmt.c | 49 +++++++++++++++++++++++--- mm/mmap.c | 3 +- mm/mremap.c | 6 ++-- 5 files changed, 107 insertions(+), 8 deletions(-) create mode 100644 arch/arm64/include/asm/cap_addr_mgmt.h
diff --git a/arch/arm64/include/asm/cap_addr_mgmt.h b/arch/arm64/include/asm/cap_addr_mgmt.h new file mode 100644 index 000000000000..aadb4768d2fd --- /dev/null +++ b/arch/arm64/include/asm/cap_addr_mgmt.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __ASM_CAP_ADDR_MGMT_H +#define __ASM_CAP_ADDR_MGMT_H + +#include <linux/cheri.h> +#include <linux/mman.h> + +static __always_inline cheri_perms_t arch_map_to_cap_perm(int prot, bool has_tag_access) +{ + cheri_perms_t perms = 0; + + if ((prot & PROT_READ) && has_tag_access) + perms |= ARM_CAP_PERMISSION_MUTABLE_LOAD; + + if ((prot & PROT_EXEC) && + (cheri_perms_get(cheri_pcc_get()) & ARM_CAP_PERMISSION_EXECUTIVE)) + perms |= ARM_CAP_PERMISSION_EXECUTIVE; + + return perms; +} +#define arch_map_to_cap_perm arch_map_to_cap_perm +#endif /* __ASM_CAP_ADDR_MGMT_H */ diff --git a/include/linux/cap_addr_mgmt.h b/include/linux/cap_addr_mgmt.h index c8040871316b..3e2865bb95f7 100644 --- a/include/linux/cap_addr_mgmt.h +++ b/include/linux/cap_addr_mgmt.h @@ -8,6 +8,7 @@ #include <linux/types.h>
#ifdef CONFIG_CHERI_PURECAP_UABI +#include <asm/cap_addr_mgmt.h>
struct reserv_mt_entry { unsigned long reserv_start; @@ -124,10 +125,42 @@ bool capability_owns_range(uintcap_t cap, unsigned long addr, unsigned long len) * @addr: Requested capability address. * @len: Requested capability length. * @prot: Requested protection flags. + * @has_tag_access: capability needs tag related permissions. * * Return: A new capability derived from cheri_user_root_cap. */ -uintcap_t build_owning_capability(unsigned long addr, unsigned long len, int prot); +uintcap_t build_owning_capability(unsigned long addr, unsigned long len, int prot, + bool has_tag_access); + +/** + * mapping_may_have_prot_flag() - Verify if the mapping matches with the maximum + * protection flags if present otherwise with the normal protection flags. + * @prot: Complete(normal + maximum) protection flags. + * @map_val: Mapping flags to verify. + * + * Return: True if mapping flag matches with the protection flags or false + * otherwise. + */ +bool mapping_may_have_prot_flag(int prot, int map_val); + +/** + * capability_may_set_prot() - Verify if the mapping protection flags confirms + * with the capability permission flags. + * @cap: Capability value. + * @prot: Normal protection flags. + * + * Return: True if the capability permissions includes the protection flags + * or false otherwise. + */ +bool capability_may_set_prot(uintcap_t cap, int prot); + +#ifndef arch_map_to_cap_perm +static __always_inline cheri_perms_t arch_map_to_cap_perm(int prot, + bool has_tag_access) +{ + return 0; +} +#endif /* arch_map_to_cap_perm */
#endif /* CONFIG_CHERI_PURECAP_UABI */
diff --git a/lib/cap_addr_mgmt.c b/lib/cap_addr_mgmt.c index f6007a4e9c4e..f2e290696e73 100644 --- a/lib/cap_addr_mgmt.c +++ b/lib/cap_addr_mgmt.c @@ -189,14 +189,55 @@ bool capability_owns_range(uintcap_t cap, unsigned long addr, unsigned long len) align_len, CHERI_PERM_GLOBAL | CHERI_PERM_SW_VMEM); }
-uintcap_t build_owning_capability(unsigned long start, unsigned long len, int prot __maybe_unused) +bool mapping_may_have_prot_flag(int prot, int map_val) +{ + int prot_max = PROT_MAX_EXTRACT(prot); + + if (prot_max) + return !!(prot_max & map_val); + else + return !!(prot & map_val); +} + +bool capability_may_set_prot(uintcap_t cap, int prot) +{ + cheri_perms_t perms = cheri_perms_get(cap); + + if (((prot & PROT_READ) && !(perms & CHERI_PERM_LOAD)) || + ((prot & PROT_WRITE) && !(perms & CHERI_PERM_STORE)) || + ((prot & PROT_EXEC) && !(perms & CHERI_PERM_EXECUTE))) + return false; + + return true; +} + +uintcap_t build_owning_capability(unsigned long start, unsigned long len, int prot, + bool has_tag_access) { unsigned long align_start = round_down(start, PAGE_SIZE); unsigned long align_len = cheri_representable_length(round_up(len, PAGE_SIZE)); + cheri_perms_t perms = 0; + + if (mapping_may_have_prot_flag(prot, PROT_READ)) { + perms |= CHERI_PERM_LOAD; + if (has_tag_access) + perms |= CHERI_PERM_LOAD_CAP; + } + if (mapping_may_have_prot_flag(prot, PROT_WRITE)) { + perms |= CHERI_PERM_STORE; + if (has_tag_access) + perms |= (CHERI_PERM_STORE_CAP | CHERI_PERM_STORE_LOCAL_CAP); + } + if (mapping_may_have_prot_flag(prot, PROT_EXEC)) { + perms |= CHERI_PERM_EXECUTE; + if (cheri_perms_get(cheri_pcc_get()) & CHERI_PERM_SYSTEM_REGS) + perms |= CHERI_PERM_SYSTEM_REGS; + }
- /* TODO [PCuABI] - capability permission conversion from memory permission */ - cheri_perms_t perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE | - CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP; + /* Fetch any extra architecture specific permissions */ + perms |= arch_map_to_cap_perm(PROT_MAX_EXTRACT(prot) ? PROT_MAX_EXTRACT(prot) : prot, + has_tag_access); + perms |= CHERI_PERMS_ROOTCAP;
return (uintcap_t)cheri_build_user_cap(align_start, align_len, perms); } diff --git a/mm/mmap.c b/mm/mmap.c index 34880a7c3c30..771d99f965da 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1442,7 +1442,8 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_addr, ret = reserv_mt_insert_entry(&mm->reserv_mt, addr, len, prot); if (ret) return ret; - user_addr = build_owning_capability(addr, len, prot); + user_addr = build_owning_capability(addr, len, prot, + (flags & MAP_PRIVATE) ? true : false); } else { user_addr = cheri_address_set(user_addr, addr); } diff --git a/mm/mremap.c b/mm/mremap.c index 2ed627c4c25d..57a21cf833a2 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -890,7 +890,8 @@ static user_uintptr_t mremap_to(user_uintptr_t user_addr, unsigned long old_len, if (IS_ERR_VALUE(ret)) reserv_mt_move_entry(&mm->reserv_mt, new_addr, new_len, addr, old_len, &old_perm); else - ret = build_owning_capability(new_addr, new_len, old_perm); + ret = build_owning_capability(new_addr, new_len, old_perm, + (map_flags & MAP_SHARED) ? false : true); #endif out: return ret; @@ -1162,7 +1163,8 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, user_addr, unsigned long, ol reserv_mt_move_entry(&mm->reserv_mt, new_addr, new_len, addr, old_len, &old_perm); else - ret = build_owning_capability(new_addr, new_len, old_perm); + ret = build_owning_capability(new_addr, new_len, old_perm, + (map_flags & MAP_SHARED) ? false : true); #endif } out:
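Based on the helpers above, a worked example of the permissions an owning capability would carry; this mirrors the do_mmap() call site shown in the hunk above, and the exact extra bits depend on arch_map_to_cap_perm():

  #include <linux/cap_addr_mgmt.h>
  #include <linux/mman.h>

  static uintcap_t example_owning_cap(unsigned long addr, unsigned long len)
  {
          /*
           * prot = PROT_READ | PROT_MAX(PROT_READ | PROT_WRITE) and
           * has_tag_access = true (private mapping) give:
           *   PROT_READ  (in max) -> CHERI_PERM_LOAD | CHERI_PERM_LOAD_CAP
           *   PROT_WRITE (in max) -> CHERI_PERM_STORE | CHERI_PERM_STORE_CAP |
           *                          CHERI_PERM_STORE_LOCAL_CAP
           *   PROT_EXEC  (absent) -> no CHERI_PERM_EXECUTE
           *   arm64 extra         -> ARM_CAP_PERMISSION_MUTABLE_LOAD
           *   always              -> CHERI_PERMS_ROOTCAP
           */
          return build_owning_capability(addr, len,
                                         PROT_READ | PROT_MAX(PROT_READ | PROT_WRITE),
                                         true);
  }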
Existing userspace may not use the maximum protection bits that PCuABI introduces in the protection flags, so in such applications the memory protections later requested via the mprotect() syscall may be inconsistent with the capability permission bits.
Reduce the impact of such failures by giving the capability maximum permissions when no maximum protection bits are detected.
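A sketch of the compatibility behaviour this gives legacy callers (relying on the capability permission checks added later in this series):

  #include <sys/mman.h>

  int example_legacy_prot(void)
  {
          /* No PROT_MAX() bits: the owning capability keeps maximum permissions. */
          void *cap = mmap(NULL, 4096, PROT_READ,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (cap == MAP_FAILED)
                  return -1;

          /*
           * An unmodified application can therefore still upgrade the mapping
           * with mprotect() without tripping the capability permission checks.
           */
          return mprotect(cap, 4096, PROT_READ | PROT_WRITE);
  }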
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- lib/cap_addr_mgmt.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/lib/cap_addr_mgmt.c b/lib/cap_addr_mgmt.c index f2e290696e73..fef7791f5941 100644 --- a/lib/cap_addr_mgmt.c +++ b/lib/cap_addr_mgmt.c @@ -218,6 +218,12 @@ uintcap_t build_owning_capability(unsigned long start, unsigned long len, int pr unsigned long align_len = cheri_representable_length(round_up(len, PAGE_SIZE)); cheri_perms_t perms = 0;
+ if (!PROT_MAX_EXTRACT(prot)) { + perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE | + CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP; + goto skip_calc_perm; + } + if (mapping_may_have_prot_flag(prot, PROT_READ)) { perms |= CHERI_PERM_LOAD; if (has_tag_access) @@ -238,6 +244,7 @@ uintcap_t build_owning_capability(unsigned long start, unsigned long len, int pr perms |= arch_map_to_cap_perm(PROT_MAX_EXTRACT(prot) ? PROT_MAX_EXTRACT(prot) : prot, has_tag_access); perms |= CHERI_PERMS_ROOTCAP; +skip_calc_perm:
return (uintcap_t)cheri_build_user_cap(align_start, align_len, perms); }
The MAP_GROWSDOWN mapping flag is not supported by the PCuABI specification, so reject such requests with the -EOPNOTSUPP error.
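For illustration, a minimal userspace call that is now rejected under PCuABI (sketch only):

  #include <errno.h>
  #include <sys/mman.h>

  int example_growsdown(void)
  {
          /* A growsdown mapping has no fixed bounds, so PCuABI rejects it. */
          void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);

          return (p == MAP_FAILED && errno == EOPNOTSUPP) ? 0 : -1;
  }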
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- mm/mmap.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c index 771d99f965da..ad6991e4bb68 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1248,6 +1248,14 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_addr, return -EINVAL;
#ifdef CONFIG_CHERI_PURECAP_UABI + /* + * Introduce checks for PCuABI: + * - MAP_GROWSDOWN flag do not have fixed bounds and hence not + * supported in PCuABI reservation model. + */ + if (flags & MAP_GROWSDOWN) + return -EOPNOTSUPP; + if (!reserve_ignore) { if (cheri_tag_get(user_addr)) { if (!capability_owns_range(user_addr, addr, len))
Add a check that the requested protection bits do not exceed the maximum protection bits.
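An example of the combination this check rejects; PROT_MAX() is the uapi helper added earlier in the series:

  #include <errno.h>
  #include <sys/mman.h>

  int example_prot_exceeds_max(void)
  {
          /* PROT_WRITE is requested but not included in PROT_MAX(): rejected. */
          void *p = mmap(NULL, 4096,
                         PROT_READ | PROT_WRITE | PROT_MAX(PROT_READ),
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          return (p == MAP_FAILED && errno == EINVAL) ? 0 : -1;
  }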
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- mm/mmap.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c index ad6991e4bb68..a081f2d11315 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1252,11 +1252,17 @@ user_uintptr_t do_mmap(struct file *file, user_uintptr_t user_addr, * Introduce checks for PCuABI: * - MAP_GROWSDOWN flag do not have fixed bounds and hence not * supported in PCuABI reservation model. + * - PCuABI reservation model introduces the concept of maximum + * protection the mappings can have. Add a check to make sure the + * requested protection does not exceed the maximum protection. */ if (flags & MAP_GROWSDOWN) return -EOPNOTSUPP;
if (!reserve_ignore) { + if ((PROT_MAX_EXTRACT(prot) != 0) && + ((PROT_EXTRACT(prot) & PROT_MAX_EXTRACT(prot)) != PROT_EXTRACT(prot))) + return -EINVAL; if (cheri_tag_get(user_addr)) { if (!capability_owns_range(user_addr, addr, len)) return -EINVAL;
Check that the permissions of the new user address do not exceed the permissions of the old user address for the mremap syscall.
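A sketch of the case this rejects, assuming a purecap environment and the PROT_MAX() semantics from the earlier patches; the helper name is illustrative only:

  #define _GNU_SOURCE
  #include <stddef.h>
  #include <sys/mman.h>

  int example_mremap_perms(void)
  {
          size_t len = 4096;
          /* Source capability carries read-only permissions. */
          void *src = mmap(NULL, len, PROT_READ | PROT_MAX(PROT_READ),
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          /* Destination capability also carries write permission. */
          void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (src == MAP_FAILED || dst == MAP_FAILED)
                  return -1;

          /*
           * The new address capability has permissions that the old one
           * lacks, so the remap is expected to be rejected.
           */
          void *p = mremap(src, len, len, MREMAP_FIXED | MREMAP_MAYMOVE, dst);
          return p == MAP_FAILED ? 0 : -1;
  }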
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- mm/mremap.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/mm/mremap.c b/mm/mremap.c index 57a21cf833a2..e90d7698a3dd 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -989,6 +989,9 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, user_addr, unsigned long, ol if (cheri_tag_get(user_new_addr)) { if (!capability_owns_range(user_new_addr, new_addr, new_len)) return ret; + if ((cheri_perms_get(user_addr) | cheri_perms_get(user_new_addr)) + != cheri_perms_get(user_addr)) + return ret; } else { if (!cheri_is_null_derived(user_new_addr)) return ret;
Check that the requested permissions satisfy the constraints of the input user capability address for the mprotect syscall.
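A userspace sketch of the constraint, assuming the PROT_MAX() behaviour from the earlier patches:

  #include <errno.h>
  #include <sys/mman.h>

  int example_mprotect_perms(void)
  {
          /* The owning capability only carries load permission. */
          void *cap = mmap(NULL, 4096, PROT_READ | PROT_MAX(PROT_READ),
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (cap == MAP_FAILED)
                  return -1;

          /*
           * Requesting PROT_WRITE needs store permission in the input
           * capability, so capability_may_set_prot() rejects the call.
           */
          return (mprotect(cap, 4096, PROT_READ | PROT_WRITE) == -1 &&
                  errno == EINVAL) ? 0 : -1;
  }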
Signed-off-by: Amit Daniel Kachhap amit.kachhap@arm.com --- mm/mprotect.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/mm/mprotect.c b/mm/mprotect.c index 68b190cbc493..eb9ccc9a1e8c 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -759,6 +759,8 @@ static int do_mprotect_pkey(user_uintptr_t user_start, size_t len, user_start = cheri_address_set(user_start, start); if (!capability_owns_range(user_start, start, len)) return -EINVAL; + if (!capability_may_set_prot(user_start, prot)) + return -EINVAL; #endif if (!arch_validate_prot(prot, start)) return -EINVAL;