Hi,
This series adds reservation management and modifies the behaviour of address space management syscalls as per the PCuABI specification [1]. It also restricts the bounds and permissions of those initial capabilities that my previous series [2] couldn't take care of.
The series is largely based on Amit's v3 series [3] plus this follow-up [4] (squashed in the corresponding patch), with various additions and tweaks. The most important (user-facing) changes are the following:
* Owning capabilities are now always created based on the corresponding reservation's bounds and permissions, ensuring there is no mismatch (and simplifying mmap/mremap a little).
* Capability/reservation permissions are now calculated based on VM_{READ,WRITE}_CAPS instead of PROT_SHARED/VM_SHARED. This fixes the io_uring case, where we do allow capabilities to be stored in a shared mapping.
* A stack reservation is created; its size is controlled by a new cheri.max_stack_size sysctl, as per the spec. The initial stack capabilities (CSP and AT_CHERI_STACK_CAP) are narrowed accordingly.
* PCuABI restrictions are added to shmdt() too.
* PROT_CAP_INVOKE is handled (adding BranchSealedPair).
* mmap/mremap/shmat now ensure that no existing reservation is overwritten if a null-derived pointer is passed with MAP_FIXED: if the new reservation would overlap with any existing one, -ERESERVATION is returned (see the sketch after this list).
* The reservation lookup helper has been fixed to ensure that a reservation is found even if it starts before the targeted range.
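To make the MAP_FIXED point concrete, here is a minimal purecap sketch (illustrative only, not part of the series; it assumes ERESERVATION is visible through <errno.h>, as added in patch 2):

#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 0x10000;

	/* Creates a mapping and, with it, a fresh reservation. */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	/* Casting through an integer yields a null-derived pointer. */
	void *hint = (void *)(size_t)p;

	/* MAP_FIXED over the existing reservation must now fail. */
	if (mmap(hint, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED &&
	    errno == ERESERVATION)
		printf("overlap with existing reservation rejected\n");

	return 0;
}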
Here is a rough breakdown of the patches:
* Patch 1: fixup for kselftests
* Patches 2-8: infrastructure, uapi additions
* Patches 9-14: reservation management
* Patches 15-22: capability handling in address space management syscalls
* Patches 23-33: capability permissions handling
* Patch 34: extra restriction for mmap()
* Patches 35-36: restriction of initial capabilities
With the appropriate fixes made to LTP and Musl, the usual LTP and Musl tests pass, as do the Morello kselftests with Chaitanya's extra tests [5].
Special thanks to Amit for his original work as well as his detailed review of this updated series, and to Chaitanya for writing those extra kselftests, which proved very useful to catch mistakes early.
Review branch:
https://git.morello-project.org/kbrodsky-arm/linux/-/tree/morello/reservatio...
Thanks, Kevin
[1] https://git.morello-project.org/morello/kernel/linux/-/wikis/Morello-pure-ca...
[2] https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/...
[3] https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/...
[4] https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/...
[5] https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/...
Amit Daniel Kachhap (25):
  uapi: errno.h: Introduce PCuABI memory reservation error
  linux/sched/coredump.h: Add MMF_PCUABI_RESERV mm flag
  linux/user_ptr.h: Add a typedef user_ptr_perms_t
  linux/user_ptr.h: Add user_ptr_is_valid, user_ptr_set_addr
  linux/user_ptr.h: Add helpers to manage owning pointers
  mm/reserv: Add address space reservation API
  mm/mmap: Handle reservations in get_unmapped_area
  mm/(mmap,mremap): Handle PCuABI reservations during VMA operations
  fs/binfmt_elf: Create appropriate reservations in PCuABI
  mm/mmap: Add PCuABI capability handling in mmap/munmap
  mm/mremap: Add PCuABI capability handling in mremap
  mm/mprotect: Add PCuABI capability handling in mprotect
  mm/madvise: Add PCuABI capability handling in madvise
  mm/mlock: Add PCuABI capability handling in mlock{,2} and munlock
  mm/msync: Add PCuABI capability handling in msync
  mm/mincore: Add PCuABI capability constraints
  ipc/shm: Add PCuABI capability handling in shmat/shmdt
  uapi: mman-common.h: Macros for maximum capability permissions
  arm64: user_ptr: Implement Morello capability permission helper
  linux/user_ptr.h: Infer capability permissions from prot/vm_flags in PCuABI
  mm/mmap: Add capability permission constraints for PCuABI
  mm/mremap: Add capability permission constraints for PCuABI
  mm/mprotect: Add capability permissions constraints for PCuABI
  mm/mmap: Disable MAP_GROWSDOWN mapping flag for PCuABI
  arm64: vdso: Create appropriate capability
Kevin Brodsky (11):
  kselftests/arm64: morello: Fix expected permissions with MAP_SHARED
  linux/mm_types.h: Introduce reserv_struct
  fs/exec: Create a stack reservation in PCuABI
  arm64: vdso: Create appropriate reservation
  fs/binfmt_elf: Enable reservations
  fs/binfmt_elf: Set appropriate permissions for initial reservations
  arm64: morello: Ensure appropriate permissions for initial reservations
  uapi: mm: Introduce PROT_CAP_INVOKE
  arm64: user_ptr: Handle PROT_CAP_INVOKE
  fs/binfmt_elf: Create mappings with PROT_CAP_INVOKE
  fs/binfmt_elf: Restrict stack capability bounds
 Documentation/core-api/user_ptr.rst          |  28 ++
 arch/Kconfig                                 |   3 +
 arch/arm64/Kconfig                           |   1 +
 arch/arm64/include/asm/elf.h                 |   9 +-
 arch/arm64/include/asm/mmu.h                 |   2 +-
 arch/arm64/include/asm/user_ptr.h            |  37 +++
 arch/arm64/kernel/morello.c                  |  16 +
 arch/arm64/kernel/vdso.c                     |  37 ++-
 fs/binfmt_elf.c                              |  63 ++--
 fs/exec.c                                    |  59 ++++
 include/linux/mm.h                           |  15 +-
 include/linux/mm_reserv.h                    | 302 +++++++++++++++++++
 include/linux/mm_types.h                     |   9 +
 include/linux/sched/coredump.h               |   2 +
 include/linux/shm.h                          |   4 +-
 include/linux/user_ptr.h                     | 114 ++++++-
 include/uapi/asm-generic/errno.h             |   2 +
 include/uapi/asm-generic/mman-common.h       |   8 +
 io_uring/advise.c                            |   3 +-
 ipc/shm.c                                    |  44 ++-
 kernel/fork.c                                |   3 +
 lib/user_ptr.c                               |  73 +++++
 mm/Makefile                                  |   1 +
 mm/damon/vaddr.c                             |   2 +-
 mm/internal.h                                |   2 +-
 mm/madvise.c                                 |  26 +-
 mm/mincore.c                                 |  54 +++-
 mm/mlock.c                                   |  36 ++-
 mm/mmap.c                                    | 182 +++++++++--
 mm/mprotect.c                                |  25 +-
 mm/mremap.c                                  |  96 ++++--
 mm/msync.c                                   |  12 +-
 mm/reserv.c                                  | 181 +++++++++++
 mm/util.c                                    |   9 +-
 tools/testing/selftests/arm64/morello/mmap.c |   2 +-
 35 files changed, 1312 insertions(+), 150 deletions(-)
 create mode 100644 arch/arm64/include/asm/user_ptr.h
 create mode 100644 include/linux/mm_reserv.h
 create mode 100644 mm/reserv.c
mmap(..., MAP_SHARED) creates a mapping without tag access and returns a capability that cannot be used to load or store capabilities. Adjust the expected permissions in the syscall_mmap2 test accordingly.
Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- tools/testing/selftests/arm64/morello/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/morello/mmap.c b/tools/testing/selftests/arm64/morello/mmap.c
index 618eb221d0ae..72f86736512a 100644
--- a/tools/testing/selftests/arm64/morello/mmap.c
+++ b/tools/testing/selftests/arm64/morello/mmap.c
@@ -81,7 +81,7 @@ void syscall_mmap2(void)
 	ASSERT_EQ(retval, (int)msg_len);
 
 	addr = mmap_verified(NULL, MMAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
-			     0, CAP_LOAD_PERMS | CAP_STORE_PERMS);
+			     0, CHERI_PERM_LOAD | CHERI_PERM_STORE);
 
 	EXPECT_NE(addr, NULL)
 		goto clean_up;
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
The PCuABI specification introduces this error code; it is used to indicate errors related to memory reservations.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- include/uapi/asm-generic/errno.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
index cf9c51ac49f9..4589a3165fe1 100644
--- a/include/uapi/asm-generic/errno.h
+++ b/include/uapi/asm-generic/errno.h
@@ -120,4 +120,6 @@
 
 #define EHWPOISON 133 /* Memory page has hardware error */
 
+#define ERESERVATION 192 /* PCuABI memory reservation error */
+
 #endif
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
The PCuABI specification introduces memory reservations. Add a flag, MMF_PCUABI_RESERV, to indicate that memory reservations are in use for a given mm. This helps differentiate PCuABI and compat64 processes.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- include/linux/sched/coredump.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 02f5090ffea2..87b686ae8b0c 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -92,6 +92,8 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_VM_MERGE_ANY 30
 #define MMF_VM_MERGE_ANY_MASK (1 << MMF_VM_MERGE_ANY)
 
+#define MMF_PCUABI_RESERV 31 /* PCuABI memory reservation feature */
+
 #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
		       MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
		       MMF_VM_MERGE_ANY_MASK)
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Add a typedef user_ptr_perms_t, aliasing cheri_perms_t in the PCuABI case and defaulting to int otherwise. This allows permissions to be manipulated unconditionally (without #ifdef).
Note: this change will cause linux/cheri.h to get included everywhere as linux/kernel.h includes linux/user_ptr.h. This is unfortunate, but it seems difficult to avoid in this case.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- include/linux/user_ptr.h | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/include/linux/user_ptr.h b/include/linux/user_ptr.h
index 825c703dd44c..997e6cfa95e2 100644
--- a/include/linux/user_ptr.h
+++ b/include/linux/user_ptr.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_USER_PTR_H
 #define _LINUX_USER_PTR_H
 
+#include <linux/cheri.h>
 #include <linux/limits.h>
 #include <linux/typecheck.h>
 
@@ -27,6 +28,8 @@
 
 #ifdef CONFIG_CHERI_PURECAP_UABI
 
+typedef cheri_perms_t user_ptr_perms_t;
+
 /**
  * uaddr_to_user_ptr() - Convert a user-provided address to a user pointer.
  * @addr: The address to set the pointer to.
@@ -109,6 +112,8 @@ bool check_user_ptr_rw(void __user *ptr, size_t len);
 
 #else /* CONFIG_CHERI_PURECAP_UABI */
 
+typedef int user_ptr_perms_t;
+
 static inline void __user *uaddr_to_user_ptr(ptraddr_t addr)
 {
 	return as_user_ptr(addr);
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Add user_ptr_is_valid() and user_ptr_set_addr() helpers to operate on user pointers; they will be used in various situations in subsequent patches.
* user_ptr_is_valid() returns the tag value in PCuABI.
* user_ptr_set_addr() sets the address field of the user pointer.
Both helpers use CHERI compiler builtins in the PCuABI case.
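For context, here is a hedged sketch of how a caller might combine the two helpers (hypothetical function, not part of this patch):

static inline void __user *rebase_if_valid(void __user *ptr, ptraddr_t new_addr)
{
	/* Only rebase pointers carrying a valid capability (tag set in PCuABI). */
	if (!user_ptr_is_valid(ptr))
		return ptr;

	/* Bounds and permissions are preserved; only the address changes. */
	return user_ptr_set_addr(ptr, new_addr);
}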
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- Documentation/core-api/user_ptr.rst | 12 +++++++++++ include/linux/user_ptr.h | 33 ++++++++++++++++++++++++++++- 2 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/Documentation/core-api/user_ptr.rst b/Documentation/core-api/user_ptr.rst
index 6ac493f7e461..d314bc215c65 100644
--- a/Documentation/core-api/user_ptr.rst
+++ b/Documentation/core-api/user_ptr.rst
@@ -212,6 +212,12 @@ should always be used when error codes are stored in user pointers.
 Operating on user pointers
 ==========================
 
+Validity
+--------
+
+``user_ptr_is_valid(p)`` (``<linux/user_ptr.h>``) can be used to check
+whether a user pointer is valid.
+
 Comparison
 ----------
 
@@ -250,6 +256,12 @@ This can be done using the ``check_user_ptr_*()`` functions, see
 Note that these functions are no-ops (always succeed) when PCuABI is not
 selected, as there is no user pointer metadata to check in that case.
 
+Setting the address
+-------------------
+
+``user_ptr_set_addr(p, a)`` (``<linux/user_ptr.h>``) can be used to set
+the address of a user pointer.
+
 Bounds
 ------
diff --git a/include/linux/user_ptr.h b/include/linux/user_ptr.h
index 997e6cfa95e2..3b0bc117fcfb 100644
--- a/include/linux/user_ptr.h
+++ b/include/linux/user_ptr.h
@@ -207,7 +207,22 @@ static inline ptraddr_t user_ptr_limit(const void __user *ptr)
 }
 
 /**
- * user_ptr_is_same() - Checks where two user pointers are exactly the same.
+ * user_ptr_is_valid() - Check if a user pointer is valid.
+ * @ptr: The user pointer to check.
+ *
+ * Return: true if @ptr is valid (tag set in PCuABI).
+ */
+static inline bool user_ptr_is_valid(const void __user *ptr)
+{
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	return __builtin_cheri_tag_get(ptr);
+#else
+	return 0;
+#endif
+}
+
+/**
+ * user_ptr_is_same() - Check whether two user pointers are exactly the same.
  * @p1: The first user pointer to check.
  * @p2: The second user pointer to check.
  *
@@ -226,6 +241,22 @@ static inline bool user_ptr_is_same(const void __user *p1, const void __user *p2)
 #endif
 }
 
+/**
+ * user_ptr_set_addr() - Set the address of the user pointer.
+ * @ptr: The user pointer to set the address of.
+ * @addr: The address to set the pointer to.
+ *
+ * Return: A user pointer with its address set to @addr.
+ */
+static inline void __user *user_ptr_set_addr(void __user *ptr, ptraddr_t addr)
+{
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	return __builtin_cheri_address_set(ptr, addr);
+#else
+	return as_user_ptr(addr);
+#endif
+}
+
 /**
  * user_ptr_set_bounds() - Set the lower and upper bounds of a user pointer.
  * @ptr: The input user pointer.
Add reservation information to vm_area_struct in PCuABI. This will be used by subsequent patches to manage address space reservations.
Co-developed-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- include/linux/mm_types.h | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 774bd7d6ad60..25cbbe18f5b8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -607,6 +607,12 @@ struct vma_numab_state {
 	int prev_scan_seq;
 };
 
+struct reserv_struct {
+	ptraddr_t start;
+	size_t len;
+	user_ptr_perms_t perms;
+};
+
 /*
  * This struct describes a virtual memory area. There is one of these
  * per VM-area/task. A VM area is any part of the process virtual memory
@@ -711,6 +717,9 @@ struct vm_area_struct {
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	struct reserv_struct reserv_data;
+#endif
 } __randomize_layout;
 
 #ifdef CONFIG_NUMA
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Helper functions check_user_ptr_owning(), make_user_ptr_owning() and user_ptr_owning_perms_from_prot() are added to manage owning capabilities as per the PCuABI specification. These helpers will mostly be used by memory management syscalls to apply the various capability constraints.
* check_user_ptr_owning() checks that the capability owns the input range (after page-aligning it) and has the CHERI_PERM_SW_VMEM permission bit set.
* make_user_ptr_owning() creates the relevant owning capability from the input reservation.
Both of these functions are implemented on top of the cheri_* helpers in linux/cheri.h.
* user_ptr_owning_perms_from_prot() converts memory mapping protection flags and vm_flags to capability permissions.
Note: these helper functions currently only check/derive capability bounds, not capability permission constraints; support for permissions will be added in a subsequent patch.
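As a rough illustration of the intended calling pattern (a hedged sketch with a hypothetical caller, not code from this series):

/*
 * Hypothetical sketch: validate an owning capability received from
 * userspace, then rebuild the pointer to return from the reservation it
 * belongs to, as the mm syscalls will do in later patches.
 */
static user_uintptr_t check_and_rebuild(user_uintptr_t user_ptr, size_t len,
					const struct reserv_struct *reserv)
{
	if (!check_user_ptr_owning(user_ptr, len))
		return -EINVAL;

	/* Bounds and permissions come from @reserv; the address is kept. */
	return make_user_ptr_owning(reserv, (ptraddr_t)user_ptr);
}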
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Co-developed-by: Kevin Brodsky kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- Documentation/core-api/user_ptr.rst | 15 ++++++++ include/linux/user_ptr.h | 60 +++++++++++++++++++++++++++++ lib/user_ptr.c | 33 ++++++++++++++++ 3 files changed, 108 insertions(+)
diff --git a/Documentation/core-api/user_ptr.rst b/Documentation/core-api/user_ptr.rst index d314bc215c65..0632bc9f4e8b 100644 --- a/Documentation/core-api/user_ptr.rst +++ b/Documentation/core-api/user_ptr.rst @@ -348,3 +348,18 @@ accidentally providing capabilities to userspace in PCuABI. | routines suffixed with ``with_captags``. See ``<linux/uaccess.h>`` | | for details. | +-----------------------------------------------------------------------+ + +Managing user pointers by mm subsystem +====================================== + +The user pointers created by the Linux mm subsystem are referred to as +owning capabilities in PCuABI and have the owning permission bit +CHERI_PERM_SW_VMEM set. CHERI bounds representability is also considered for +user pointer bounds. The APIs below consider those requirements while +creating and checking user pointers. + +* ``check_user_ptr_owning(ptr, len)`` +* ``make_user_ptr_owning(reserv, len)`` +* ``user_ptr_owning_perms_from_prot(prot, vm_flags)`` + +See ``<linux/user_ptr.h>`` for details on how to use them. diff --git a/include/linux/user_ptr.h b/include/linux/user_ptr.h index 3b0bc117fcfb..a48a294b434e 100644 --- a/include/linux/user_ptr.h +++ b/include/linux/user_ptr.h @@ -6,6 +6,8 @@ #include <linux/limits.h> #include <linux/typecheck.h>
+struct reserv_struct; + /** * as_user_ptr() - Convert an arbitrary integer value to a user pointer. * @x: The integer value to convert. @@ -110,6 +112,46 @@ bool check_user_ptr_read(const void __user *ptr, size_t len); bool check_user_ptr_write(void __user *ptr, size_t len); bool check_user_ptr_rw(void __user *ptr, size_t len);
+/** + * check_user_ptr_owning() - Check whether a user pointer owns a memory region. + * @user_ptr: The pointer to check. + * @len: The size of the region to check. + * + * Checks whether @ptr owns the memory region starting at the address of @ptr + * and of size @len. + * + * Return: true if @ptr passes the check. + */ +bool check_user_ptr_owning(user_uintptr_t user_ptr, size_t len); + +/** + * make_user_ptr_owning() - Create a user pointer owning the specified + * reservation. + * + * @reserv: Reservation information. + * @addr: Address to set the user pointer to. + * + * Return: The constructed user pointer. + * + * The bounds of the returned user pointer are set (exactly) to the bounds of + * @reserv, and so are its permissions. + */ +user_uintptr_t make_user_ptr_owning(const struct reserv_struct *reserv, + ptraddr_t addr); + +/** + * user_ptr_owning_perms_from_prot() - Calculate capability permissions from + * prot flags and vm_flags. + * @prot: Memory protection flags. + * @vm_flags: vm_flags of the underlying VMA. + * + * Return: Calculated capability permission flags. + * + * Note: unsigned long is used instead of vm_flags_t as linux/mm_types.h cannot + * be included here. + */ +user_ptr_perms_t user_ptr_owning_perms_from_prot(int prot, unsigned long vm_flags); + #else /* CONFIG_CHERI_PURECAP_UABI */
typedef int user_ptr_perms_t; @@ -150,6 +192,24 @@ static inline bool check_user_ptr_rw(void __user *ptr, size_t len) return true; }
+static inline bool check_user_ptr_owning(user_uintptr_t user_ptr, size_t len)
+{
+	return true;
+}
+
+static inline user_uintptr_t make_user_ptr_owning(const struct reserv_struct *reserv,
+						  ptraddr_t addr)
+{
+	return addr;
+}
+
+static inline user_ptr_perms_t user_ptr_owning_perms_from_prot(int prot,
+							       unsigned long vm_flags)
+{
+	return 0;
+}
+
 #endif /* CONFIG_CHERI_PURECAP_UABI */
/** diff --git a/lib/user_ptr.c b/lib/user_ptr.c index 115efc9fe678..6f7fc9111b1d 100644 --- a/lib/user_ptr.c +++ b/lib/user_ptr.c @@ -1,6 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0-only */ #include <linux/bug.h> #include <linux/cheri.h> +#include <linux/mm_types.h> #include <linux/user_ptr.h>
void __user *uaddr_to_user_ptr(ptraddr_t addr) @@ -70,3 +71,35 @@ bool check_user_ptr_rw(void __user *ptr, size_t len) { return cheri_check_cap(ptr, len, CHERI_PERM_LOAD | CHERI_PERM_STORE); } + +bool check_user_ptr_owning(user_uintptr_t user_ptr, size_t len) +{ + ptraddr_t addr; + + addr = round_down((ptraddr_t)user_ptr, PAGE_SIZE); + len = round_up(len, PAGE_SIZE); + user_ptr = cheri_address_set(user_ptr, addr); + + return cheri_check_cap((void * __capability)user_ptr, len, + CHERI_PERMS_ROOTCAP); +} + +user_uintptr_t make_user_ptr_owning(const struct reserv_struct *reserv, + ptraddr_t addr) +{ + user_uintptr_t user_ptr; + + user_ptr = (user_uintptr_t)cheri_build_user_cap(reserv->start, + reserv->len, + reserv->perms); + user_ptr = cheri_address_set(user_ptr, addr); + + return user_ptr; +} + +user_ptr_perms_t user_ptr_owning_perms_from_prot(int prot, unsigned long vm_flags) +{ + /* TODO [PCuABI] - capability permission conversion from memory permission */ + return (CHERI_PERMS_READ | CHERI_PERMS_WRITE | + CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP); +}
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
This patch introduces helpers to manage address space reservations and owning capabilities, as introduced in the PCuABI specification. This interface prevents two unrelated owning capabilities created by the kernel from overlapping.
The reservation interface stores each virtual address range as a reservation entry, matching the bounds of the owning capability that the kernel provides to userspace. It also stores the owning capability's permissions, so that future syscall requests updating permissions can be validated.
The reservation interface follows a few basic rules:
- Reservations can only be created or destroyed, never expanded or
  shrunk. A reservation is created when a new memory mapping is made
  outside of an existing reservation.
- A single reservation can have many mappings. However, unused regions
  of a reservation cannot be reused again.
- The reservation start address is aligned to the CHERI representable
  base.
- The reservation length is aligned to the CHERI representable length.
More rules about the address space reservation interface can be found in the PCuABI specification.
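To make the alignment rules concrete, here is a short sketch (mirroring what the reservation helpers below do) of how a reservation's base and length are derived from a page-aligned request, using the cheri_* helpers from <linux/cheri.h>:

	ptraddr_t reserv_start;
	size_t reserv_len;

	/*
	 * For small lengths both values are unchanged; for large mappings
	 * the capability bounds encoding may force the base down and the
	 * length up, leaving unusable alignment gaps at either end.
	 */
	reserv_start = start & cheri_representable_alignment_mask(len);
	reserv_len = cheri_representable_length(len);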
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Co-developed-by: Kevin Brodsky kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- include/linux/mm_reserv.h | 302 ++++++++++++++++++++++++++++++++++++++ mm/Makefile | 1 + mm/reserv.c | 181 +++++++++++++++++++++++ 3 files changed, 484 insertions(+) create mode 100644 include/linux/mm_reserv.h create mode 100644 mm/reserv.c
diff --git a/include/linux/mm_reserv.h b/include/linux/mm_reserv.h new file mode 100644 index 000000000000..2debbcbf7495 --- /dev/null +++ b/include/linux/mm_reserv.h @@ -0,0 +1,302 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _LINUX_MM_RESERV_H +#define _LINUX_MM_RESERV_H + +#include <linux/cheri.h> +#include <linux/mm_types.h> +#include <linux/sched/coredump.h> +#include <linux/types.h> + +#ifdef CONFIG_CHERI_PURECAP_UABI +#define reserv_representable_alignment(len) \ + (reserv_is_supported(current->mm) \ + ? (PAGE_MASK & ~cheri_representable_alignment_mask(len)) : 0) + +#define reserv_representable_base(base, len) \ + (reserv_is_supported(current->mm) \ + ? ((base) & cheri_representable_alignment_mask(len)) : (base)) + +#define reserv_representable_length(len) \ + (reserv_is_supported(current->mm) \ + ? cheri_representable_length(len) : (len)) + +#define reserv_vma_reserv_start(vma) \ + (reserv_is_supported((vma)->vm_mm) \ + ? (vma)->reserv_data.start : (vma)->vm_start) + +#define reserv_vma_reserv_len(vma) \ + (reserv_is_supported((vma)->vm_mm) \ + ? (vma)->reserv_data.len : ((vma)->vm_end - (vma)->vm_start)) + +/** + * reserv_vma_set_reserv() - Set the reservation information in the VMA. + * @vma: Target VMA. + * @start: Reservation start address. + * @len: Reservation length. + * @prot: prot flags to calculate the reservation permissions. + * + * Return: 0 if reservation information set successfully or negative errorcode + * otherwise. + * + * The start address is stored as CHERI representable base and the length as + * CHERI representable length. They are expected to not overlap with any other + * VMA. This function should be called with mmap_lock held. + */ +int reserv_vma_set_reserv(struct vm_area_struct *vma, ptraddr_t start, + size_t len, int prot); + +/** + * reserv_vma_set_reserv_start_len() - Set the reservation information in the VMA. + * @vma: Target VMA. + * @start: Reservation start address. + * @len: Reservation length. + * + * Return: 0 if reservation information set successfully or negative errorcode + * otherwise. + * + * The start address is stored as CHERI representable base and the length as + * CHERI representable length. They are expected to not overlap with any other + * VMA. The reservation permissions are left unchanged. This function should + * be called with mmap_lock held. + */ +int reserv_vma_set_reserv_start_len(struct vm_area_struct *vma, ptraddr_t start, + size_t len); + +/** + * reserv_vma_set_reserv_data() - Set the reservation information in the VMA. + * @vma: Target VMA. + * @reserv_data: New reservation information + * + * The VMA's reservation information is set to the contents of @reserv_data. + * This function should be called with mmap_lock held. + */ +void reserv_vma_set_reserv_data(struct vm_area_struct *vma, + const struct reserv_struct *reserv_data); + +/** + * reserv_find_reserv_info_range() - Find a reservation spanning at least the + * input address range. + * @start: Region start address. + * @len: Region length. + * @locked: Flag to indicate if mmap_lock is already held. + * @reserv_info: Pointer to a reserv_struct to set if a matching reservation is + * found. + * + * Return: True if a matching reservation is found or false otherwise. + * + * This function internally uses mmap_lock to access VMAs if mmap_lock is not + * already held. 
+ + */ +bool reserv_find_reserv_info_range(ptraddr_t start, size_t len, bool locked, + struct reserv_struct *reserv_info); + +/** + * reserv_vma_range_within_reserv() - Check that the input address range falls + * within @vma's reservation. + * @vma: Target VMA. + * @start: Region start address. + * @len: Region length. + * + * Return: True if the input address range falls within the reserved virtual + * address range or false otherwise. + * + * This function should be called with mmap_lock held. + */ +bool reserv_vma_range_within_reserv(struct vm_area_struct *vma, ptraddr_t start, + size_t len); + +/** + * reserv_cap_within_reserv() - Check that the capability bounds of @cap + * are wholly contained within an existing reservation. + * @cap: Capability to check. + * @locked: Flag to indicate if mmap_lock is already held. + * + * Return: True if the input capability bounds fall within a reservation or + * false otherwise. + * + * This function internally uses mmap_lock to access VMAs if mmap_lock is not + * already held. + */ +bool reserv_cap_within_reserv(user_uintptr_t cap, bool locked); + +/** + * reserv_aligned_range_within_reserv() - Check that the input address range falls + * within any reservation. + * @start: Region start address. + * @len: Region length. + * @locked: Flag to indicate if mmap_lock is already held. + * + * Return: True if the input address range (aligned for representability) falls + * within a reservation or false otherwise. + * + * @start and @len are appropriately aligned down/up so that the range that is + * checked corresponds to that of a new reservation. This function should be + * called with mmap_lock held. + */ +bool reserv_aligned_range_within_reserv(ptraddr_t start, size_t len, + bool locked); + +/** + * reserv_range_mapped() - Check that the input address range is fully mapped. + * @start: Region start address. + * @len: Region length. + * @locked: Flag to indicate if mmap_lock is already held. + * + * Return: 0 if the range is fully mapped or negative errorcode otherwise. + * + * This is useful to find if the requested range is fully mapped without + * fragmentation. This function internally uses mmap_lock to access VMAs if + * mmap_lock is not already held. + */ +int reserv_range_mapped(ptraddr_t start, size_t len, bool locked); + +/** + * reserv_make_user_ptr_owning() - Build an owning user pointer for a given + * reservation. + * @vma_addr: VMA address. + * @locked: Flag to indicate if mmap_lock is already held. + * + * Return: the constructed user pointer. + * + * @vma_addr must be the address of an existing VMA, whose reservation + * information will be used to set the user pointer's bounds and permissions. + * Its address will be set to @vma_addr. This function internally uses + * mmap_lock to access VMAs if mmap_lock is not already held. + */ +user_uintptr_t reserv_make_user_ptr_owning(ptraddr_t vma_addr, bool locked); + +/** + * reserv_vma_make_user_ptr_owning() - Build an owning user pointer for a given + * reservation. + * @vma: Target VMA. + * + * Return: the constructed user pointer. + * + * @vma's reservation information will be used to set the user + * pointer's bounds and permissions. Its address will be set to @vma's start + * address. This function should be called with mmap_lock held. + */ +user_uintptr_t reserv_vma_make_user_ptr_owning(struct vm_area_struct *vma); + +/** + * reserv_is_supported() - Check if reservations are enabled for the given mm. + * + * @mm: The mm pointer. 
+ * + * Return: True if mm has reservations enabled or false otherwise. + */ +static inline bool reserv_is_supported(struct mm_struct *mm) +{ + return test_bit(MMF_PCUABI_RESERV, &mm->flags); +} + +/** + * reserv_mm_set_flag() - Set the MMF_PCUABI_RESERV flag according to @compat. + * + * @mm: mm pointer. + * @compat: Flag indicating if the current task is compat. + */ +static inline void reserv_mm_set_flag(struct mm_struct *mm, bool compat) +{ + if (compat) + clear_bit(MMF_PCUABI_RESERV, &mm->flags); + else + set_bit(MMF_PCUABI_RESERV, &mm->flags); +} + +/** + * reserv_fork() - Copy the MMF_PCUABI_RESERV flag from @oldmm to @mm. + * + * @mm: New mm pointer. + * @oldmm: Old mm pointer. + */ +static inline void reserv_fork(struct mm_struct *mm, struct mm_struct *oldmm) +{ + if (test_bit(MMF_PCUABI_RESERV, &oldmm->flags)) + set_bit(MMF_PCUABI_RESERV, &mm->flags); +} + +#else /* CONFIG_CHERI_PURECAP_UABI */ + +#define reserv_representable_alignment(len) 0 + +#define reserv_representable_base(base, len) base + +#define reserv_representable_length(len) len + +#define reserv_vma_reserv_start(vma) vma->vm_start + +#define reserv_vma_reserv_len(vma) (vma->vm_end - vma->vm_start) + +static inline int reserv_vma_set_reserv(struct vm_area_struct *vma, + ptraddr_t start, size_t len, int prot) +{ + return 0; +} + +static inline int reserv_vma_set_reserv_start_len(struct vm_area_struct *vma, + ptraddr_t start, size_t len) +{ + return 0; +} + +static inline void reserv_vma_set_reserv_data(struct vm_area_struct *vma, + const struct reserv_struct *reserv_data) +{} + +static inline bool reserv_find_reserv_info_range(ptraddr_t start, + size_t len, bool locked, + struct reserv_struct *reserv_info) +{ + return true; +} + +static inline bool reserv_vma_range_within_reserv(struct vm_area_struct *vma, + ptraddr_t start, + size_t len) +{ + return true; +} + +static inline bool reserv_cap_within_reserv(user_uintptr_t cap, bool locked) +{ + return true; +} + +static inline bool reserv_aligned_range_within_reserv(ptraddr_t start, + size_t len, + bool locked) +{ + return true; +} + +static inline int reserv_range_mapped(ptraddr_t start, size_t len, bool locked) +{ + return 0; +} + +static inline user_uintptr_t reserv_make_user_ptr_owning(ptraddr_t vma_addr, + bool locked) +{ + return vma_addr; +} + +static inline user_uintptr_t reserv_vma_make_user_ptr_owning(struct vm_area_struct *vma) +{ + return vma->vm_start; +} + +static inline bool reserv_is_supported(struct mm_struct *mm) +{ + return false; +} + +static inline void reserv_mm_set_flag(struct mm_struct *mm, bool compat) {} + +static inline void reserv_fork(struct mm_struct *mm, struct mm_struct *oldmm) {} + +#endif /* CONFIG_CHERI_PURECAP_UABI */ + +#endif /* _LINUX_MM_RESERV_H */ diff --git a/mm/Makefile b/mm/Makefile index 33873c8aedb3..94a7ab7057f0 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -41,6 +41,7 @@ mmu-$(CONFIG_MMU) := highmem.o memory.o mincore.o \ msync.o page_vma_mapped.o pagewalk.o \ pgtable-generic.o rmap.o vmalloc.o
+mmu-$(CONFIG_CHERI_PURECAP_UABI) += reserv.o
ifdef CONFIG_CROSS_MEMORY_ATTACH mmu-$(CONFIG_MMU) += process_vm_access.o diff --git a/mm/reserv.c b/mm/reserv.c new file mode 100644 index 000000000000..de5a3095a863 --- /dev/null +++ b/mm/reserv.c @@ -0,0 +1,181 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/bug.h> +#include <linux/mm_reserv.h> +#include <linux/mm.h> +#include <linux/slab.h> + +int reserv_vma_set_reserv(struct vm_area_struct *vma, ptraddr_t start, + size_t len, int prot) +{ + if (!reserv_is_supported(vma->vm_mm)) + return 0; + if (start + len < start) + return -EINVAL; + /* Reservation base/length is expected as page aligned */ + VM_BUG_ON(start & ~PAGE_MASK || len % PAGE_SIZE); + + vma->reserv_data.start = start & cheri_representable_alignment_mask(len); + vma->reserv_data.len = cheri_representable_length(len); + vma->reserv_data.perms = user_ptr_owning_perms_from_prot(prot, + vma->vm_flags); + + return 0; +} + +int reserv_vma_set_reserv_start_len(struct vm_area_struct *vma, ptraddr_t start, + size_t len) +{ + if (!reserv_is_supported(vma->vm_mm)) + return 0; + if (start + len < start) + return -EINVAL; + /* Reservation base/length is expected as page aligned */ + VM_BUG_ON(start & ~PAGE_MASK || len % PAGE_SIZE); + + vma->reserv_data.start = start & cheri_representable_alignment_mask(len); + vma->reserv_data.len = cheri_representable_length(len); + + return 0; +} + +void reserv_vma_set_reserv_data(struct vm_area_struct *vma, + const struct reserv_struct *reserv_data) +{ + if (!reserv_is_supported(vma->vm_mm)) + return; + + vma->reserv_data = *reserv_data; +} + +bool reserv_find_reserv_info_range(ptraddr_t start, size_t len, + bool locked, struct reserv_struct *reserv_info) +{ + struct mm_struct *mm = current->mm; + struct vm_area_struct *next, *prev; + struct reserv_struct *info = NULL; + + if (!reserv_is_supported(mm)) + return true; + if (!locked && mmap_read_lock_killable(mm)) + return false; + + next = find_vma_prev(mm, start, &prev); + + if (next && reserv_vma_range_within_reserv(next, start, len)) + info = &next->reserv_data; + else if (prev && reserv_vma_range_within_reserv(prev, start, len)) + info = &prev->reserv_data; + + if (info && reserv_info) + *reserv_info = *info; + + if (!locked) + mmap_read_unlock(mm); + + return !!info; +} + +bool reserv_vma_range_within_reserv(struct vm_area_struct *vma, ptraddr_t start, + size_t len) +{ + if (!reserv_is_supported(vma->vm_mm)) + return true; + + /* Check if there is match with the existing reservations */ + return vma->reserv_data.start <= start && + vma->reserv_data.start + vma->reserv_data.len >= start + len; +} + +bool reserv_cap_within_reserv(user_uintptr_t cap, bool locked) +{ + return reserv_find_reserv_info_range(cheri_base_get(cap), + cheri_length_get(cap), + locked, NULL); +} + +bool reserv_aligned_range_within_reserv(ptraddr_t start, size_t len, + bool locked) +{ + ptraddr_t aligned_start = start & cheri_representable_alignment_mask(len); + size_t aligned_len = cheri_representable_length(len); + + if (start + len < start) + return false; + + return reserv_find_reserv_info_range(aligned_start, aligned_len, + locked, NULL); +} + +int reserv_range_mapped(ptraddr_t start, size_t len, bool locked) +{ + struct vm_area_struct *vma, *last_vma = NULL; + struct mm_struct *mm = current->mm; + ptraddr_t end = start + len - 1; + int ret = -ENOMEM; + VMA_ITERATOR(vmi, mm, 0); + + if (!reserv_is_supported(mm)) + return 0; + if (!locked && mmap_read_lock_killable(mm)) + return -EINTR; + + start = untagged_addr(start); + start = round_down(start, PAGE_SIZE); + 
len = round_up(len, PAGE_SIZE); + vma_iter_set(&vmi, start); + /* Try walking the given range */ + do { + vma = mas_find(&vmi.mas, end); + if (vma) { + /* The new and old vma should be continuous */ + if (last_vma && last_vma->vm_end != vma->vm_start) + goto out; + /* End range is within the vma so return success */ + if (end < vma->vm_end) { + ret = 0; + goto out; + } + last_vma = vma; + } + } while (vma); +out: + if (!locked) + mmap_read_unlock(mm); + return ret; +} + +user_uintptr_t reserv_make_user_ptr_owning(ptraddr_t vma_addr, bool locked) +{ + struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; + struct reserv_struct reserv; + + if (!reserv_is_supported(mm)) + return vma_addr; + if (!locked && mmap_read_lock_killable(mm)) + return vma_addr; + + vma = find_vma(mm, vma_addr); + + if (WARN_ON(!vma || vma->vm_start != vma_addr)) { + if (!locked) + mmap_read_unlock(mm); + return vma_addr; + } + + reserv = vma->reserv_data; + + if (!locked) + mmap_read_unlock(mm); + + return make_user_ptr_owning(&reserv, vma_addr); +} + +user_uintptr_t reserv_vma_make_user_ptr_owning(struct vm_area_struct *vma) +{ + if (!reserv_is_supported(vma->vm_mm)) + return vma->vm_start; + + return make_user_ptr_owning(&vma->reserv_data, vma->vm_start); +}
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
In CHERI architectures, not all address ranges can be represented as capability bounds, so add the necessary CHERI base and length alignment checks when generating free unmapped virtual address ranges or evaluating a fixed range.
The PCuABI reservation interface keeps the unusable alignment gaps at the start and end of a reservation. These gaps must be taken into account when searching for free unmapped address space.
In the MAP_FIXED case, the requested address range must either reside entirely within an existing reservation or not overlap with any reservation at all; a condensed sketch of this decision follows.
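Condensed, the MAP_FIXED policy this patch implements in generic_get_unmapped_area() looks roughly as follows (pseudo-code, not a verbatim excerpt; fits_in_free_space() and within_existing_reserv() are made-up stand-ins for the vm_start_gap()/vm_end_gap() and reserv_vma_range_within_reserv() checks):

	if (fits_in_free_space(aligned_addr, aligned_len))
		return addr;		/* outside any reservation: a new one is created */
	if (within_existing_reserv(aligned_addr, aligned_len))
		return addr;		/* fully inside an existing reservation */
	return -ERESERVATION;		/* partial overlap is refused */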
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Co-developed-by: Kevin Brodsky kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- include/linux/mm.h | 5 ++-- mm/mmap.c | 72 ++++++++++++++++++++++++++++++++++------------ 2 files changed, 56 insertions(+), 21 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index ce2501062292..efc17977a31e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -30,6 +30,7 @@ #include <linux/kasan.h> #include <linux/memremap.h> #include <linux/slab.h> +#include <linux/mm_reserv.h>
struct mempolicy; struct anon_vma; @@ -3470,7 +3471,7 @@ static inline unsigned long stack_guard_start_gap(struct vm_area_struct *vma) static inline unsigned long vm_start_gap(struct vm_area_struct *vma) { unsigned long gap = stack_guard_start_gap(vma); - unsigned long vm_start = vma->vm_start; + unsigned long vm_start = reserv_vma_reserv_start(vma);
vm_start -= gap; if (vm_start > vma->vm_start) @@ -3480,7 +3481,7 @@ static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
static inline unsigned long vm_end_gap(struct vm_area_struct *vma) { - unsigned long vm_end = vma->vm_end; + unsigned long vm_end = reserv_vma_reserv_start(vma) + reserv_vma_reserv_len(vma);
if (vma->vm_flags & VM_GROWSUP) { vm_end += stack_guard_gap; diff --git a/mm/mmap.c b/mm/mmap.c index bec26ad4fdb0..6ae675961785 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -48,6 +48,7 @@ #include <linux/sched/mm.h> #include <linux/ksm.h>
+#include <linux/mm_reserv.h> #include <linux/uaccess.h> #include <asm/cacheflush.h> #include <asm/tlb.h> @@ -1655,7 +1656,7 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) } else { tmp = mas_prev(&mas, 0); if (tmp && vm_end_gap(tmp) > gap) { - high_limit = tmp->vm_start; + high_limit = reserv_vma_reserv_start(tmp); mas_reset(&mas); goto retry; } @@ -1706,27 +1707,43 @@ generic_get_unmapped_area(struct file *filp, unsigned long addr, struct vm_area_struct *vma, *prev; struct vm_unmapped_area_info info; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags); + unsigned long aligned_len = reserv_representable_length(len);
- if (len > mmap_end - mmap_min_addr) + if (aligned_len > mmap_end - mmap_min_addr) return -ENOMEM;
- if (flags & MAP_FIXED) + /* + * If MAP_FIXED is passed in the reservation case, the aligned range + * should be either completely contained inside an existing + * reservation, or completely outside (new reservation). + * Let this scenario fallthrough for the corresponding checks below. + */ + if ((flags & MAP_FIXED) && !reserv_is_supported(mm)) return addr;
- if (addr) { + if (addr || (flags & MAP_FIXED)) { + unsigned long aligned_addr; + addr = PAGE_ALIGN(addr); - vma = find_vma_prev(mm, addr, &prev); - if (mmap_end - len >= addr && addr >= mmap_min_addr && - (!vma || addr + len <= vm_start_gap(vma)) && - (!prev || addr >= vm_end_gap(prev))) + aligned_addr = reserv_representable_base(addr, len); + vma = find_vma_prev(mm, aligned_addr, &prev); + if (mmap_end - aligned_len >= aligned_addr && aligned_addr >= mmap_min_addr && + (!vma || aligned_addr + aligned_len <= vm_start_gap(vma)) && + (!prev || aligned_addr >= vm_end_gap(prev))) return addr; + else if (flags & MAP_FIXED) { + if ((vma && reserv_vma_range_within_reserv(vma, aligned_addr, aligned_len)) || + (prev && reserv_vma_range_within_reserv(prev, aligned_addr, aligned_len))) + return addr; + return -ERESERVATION; + } }
info.flags = 0; - info.length = len; + info.length = aligned_len; info.low_limit = mm->mmap_base; info.high_limit = mmap_end; - info.align_mask = 0; + info.align_mask = reserv_representable_alignment(len); info.align_offset = 0; return vm_unmapped_area(&info); } @@ -1754,29 +1771,46 @@ generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr, struct mm_struct *mm = current->mm; struct vm_unmapped_area_info info; const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags); + unsigned long aligned_len = reserv_representable_length(len);
/* requested length too big for entire address space */ - if (len > mmap_end - mmap_min_addr) + if (aligned_len > mmap_end - mmap_min_addr) return -ENOMEM;
- if (flags & MAP_FIXED) + /* + * If MAP_FIXED is passed in the reservation case, the aligned range + * should be either completely contained inside an existing + * reservation, or completely outside (new reservation). + * Let this scenario fallthrough for the corresponding checks below. + */ + if ((flags & MAP_FIXED) && !reserv_is_supported(mm)) return addr;
/* requesting a specific address */ - if (addr) { + if (addr || (flags & MAP_FIXED)) { + unsigned long aligned_addr; + addr = PAGE_ALIGN(addr); - vma = find_vma_prev(mm, addr, &prev); - if (mmap_end - len >= addr && addr >= mmap_min_addr && - (!vma || addr + len <= vm_start_gap(vma)) && - (!prev || addr >= vm_end_gap(prev))) + aligned_addr = reserv_representable_base(addr, len); + vma = find_vma_prev(mm, aligned_addr, &prev); + if (mmap_end - aligned_len >= aligned_addr && aligned_addr >= mmap_min_addr && + (!vma || aligned_addr + aligned_len <= vm_start_gap(vma)) && + (!prev || aligned_addr >= vm_end_gap(prev))) return addr; + else if (flags & MAP_FIXED) { + if ((vma && reserv_vma_range_within_reserv(vma, aligned_addr, aligned_len)) || + (prev && reserv_vma_range_within_reserv(prev, aligned_addr, aligned_len))) { + return addr; + } + return -ERESERVATION; + } }
info.flags = VM_UNMAPPED_AREA_TOPDOWN; - info.length = len; + info.length = aligned_len; info.low_limit = PAGE_SIZE; info.high_limit = arch_get_mmap_base(addr, mm->mmap_base); - info.align_mask = 0; + info.align_mask = reserv_representable_alignment(len); info.align_offset = 0; addr = vm_unmapped_area(&info);
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
PCuABI memory reservations require reservation properties to be attached as VMAs are created and modified; reserv_vma_set_reserv() and its variants are used to update those reservation details. Currently, these properties are only set for mmap/mremap; later commits will set them for other syscalls (shmat) and special VMA mappings.
PCuABI memory reservations also prevent merging or expanding VMAs that do not belong to the same reservation. Use suitable reservation interfaces to check those properties before performing such operations on the VMA.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Co-developed-by: Kevin Brodsky kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- include/linux/mm.h | 4 ++-- kernel/fork.c | 3 +++ mm/mmap.c | 32 +++++++++++++++++++++++++------- mm/mremap.c | 13 +++++++++---- 4 files changed, 39 insertions(+), 13 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index efc17977a31e..f9b5ad66a938 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3259,7 +3259,7 @@ extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *); extern void unlink_file_vma(struct vm_area_struct *); extern struct vm_area_struct *copy_vma(struct vm_area_struct **, unsigned long addr, unsigned long len, pgoff_t pgoff, - bool *need_rmap_locks); + bool *need_rmap_locks, struct reserv_struct *reserv_info); extern void exit_mmap(struct mm_struct *); struct vm_area_struct *vma_modify(struct vma_iterator *vmi, struct vm_area_struct *prev, @@ -3365,7 +3365,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
extern unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, - struct list_head *uf); + struct list_head *uf, unsigned long prot); extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, diff --git a/kernel/fork.c b/kernel/fork.c index a460a65624d7..ccbfc0c520ae 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -99,6 +99,7 @@ #include <linux/stackprotector.h> #include <linux/user_events.h> #include <linux/iommu.h> +#include <linux/mm_reserv.h>
#include <asm/pgalloc.h> #include <linux/uaccess.h> @@ -678,6 +679,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, goto out; khugepaged_fork(mm, oldmm);
+ reserv_fork(mm, oldmm); + retval = vma_iter_bulk_alloc(&vmi, oldmm->map_count); if (retval) goto out; diff --git a/mm/mmap.c b/mm/mmap.c index 6ae675961785..40d64fa163a2 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -911,7 +911,8 @@ static struct vm_area_struct /* Can we merge the predecessor? */ if (addr == prev->vm_end && mpol_equal(vma_policy(prev), policy) && can_vma_merge_after(prev, vm_flags, anon_vma, file, - pgoff, vm_userfaultfd_ctx, anon_name)) { + pgoff, vm_userfaultfd_ctx, anon_name) + && reserv_vma_range_within_reserv(prev, addr, end - addr)) { merge_prev = true; vma_prev(vmi); } @@ -920,7 +921,8 @@ static struct vm_area_struct /* Can we merge the successor? */ if (next && mpol_equal(policy, vma_policy(next)) && can_vma_merge_before(next, vm_flags, anon_vma, file, pgoff+pglen, - vm_userfaultfd_ctx, anon_name)) { + vm_userfaultfd_ctx, anon_name) && + reserv_vma_range_within_reserv(next, addr, end - addr)) { merge_next = true; }
@@ -1382,7 +1384,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; }
- addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); + addr = mmap_region(file, addr, len, vm_flags, pgoff, uf, prot); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) @@ -2785,7 +2787,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, - struct list_head *uf) + struct list_head *uf, unsigned long prot) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL; @@ -2797,6 +2799,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, bool writable_file_mapping = false; pgoff_t vm_pgoff; int error; + struct reserv_struct reserv_info; + bool new_reserv; VMA_ITERATOR(vmi, mm, addr);
/* Check against address space limit. */ @@ -2814,6 +2818,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, return -ENOMEM; }
+ new_reserv = !reserv_find_reserv_info_range(addr, len, true, &reserv_info); + /* Unmap any existing mapping in the area */ if (do_vmi_munmap(&vmi, mm, addr, len, uf, false)) return -ENOMEM; @@ -2840,7 +2846,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, /* Check next */ if (next && next->vm_start == end && !vma_policy(next) && can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen, - NULL_VM_UFFD_CTX, NULL)) { + NULL_VM_UFFD_CTX, NULL) && + reserv_vma_range_within_reserv(next, addr, len)) { merge_end = next->vm_end; vma = next; vm_pgoff = next->vm_pgoff - pglen; @@ -2851,7 +2858,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, (vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file, pgoff, vma->vm_userfaultfd_ctx, NULL) : can_vma_merge_after(prev, vm_flags, NULL, file, pgoff, - NULL_VM_UFFD_CTX, NULL))) { + NULL_VM_UFFD_CTX, NULL)) && + reserv_vma_range_within_reserv(prev, addr, len)) { merge_start = prev->vm_start; vma = prev; vm_pgoff = prev->vm_pgoff; @@ -2959,6 +2967,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr, if (vma_iter_prealloc(&vmi, vma)) goto close_and_free_vma;
+ if (new_reserv) { + reserv_vma_set_reserv(vma, addr, len, prot); + } else { + reserv_vma_set_reserv_data(vma, &reserv_info); + } + /* Lock the VMA since it is modified after insertion into VMA tree */ vma_start_write(vma); vma_iter_store(&vmi, vma); @@ -3432,7 +3446,7 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma) */ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, unsigned long addr, unsigned long len, pgoff_t pgoff, - bool *need_rmap_locks) + bool *need_rmap_locks, struct reserv_struct *reserv_info) { struct vm_area_struct *vma = *vmap; unsigned long vma_start = vma->vm_start; @@ -3484,6 +3498,10 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, new_vma->vm_start = addr; new_vma->vm_end = addr + len; new_vma->vm_pgoff = pgoff; + if (reserv_info) + reserv_vma_set_reserv_data(new_vma, reserv_info); + else + reserv_vma_set_reserv_start_len(new_vma, addr, len); if (vma_dup_policy(vma, new_vma)) goto out_free_vma; if (anon_vma_clone(new_vma, vma)) diff --git a/mm/mremap.c b/mm/mremap.c index 515217a95293..6a9fb59df4a6 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -651,7 +651,8 @@ static unsigned long move_vma(struct vm_area_struct *vma, unsigned long old_addr, unsigned long old_len, unsigned long new_len, unsigned long new_addr, bool *locked, unsigned long flags, - struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap) + struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap, + struct reserv_struct *reserv_info) { long to_account = new_len - old_len; struct mm_struct *mm = vma->vm_mm; @@ -705,7 +706,7 @@ static unsigned long move_vma(struct vm_area_struct *vma, vma_start_write(vma); new_pgoff = vma->vm_pgoff + ((old_addr - vma->vm_start) >> PAGE_SHIFT); new_vma = copy_vma(&vma, new_addr, new_len, new_pgoff, - &need_rmap_locks); + &need_rmap_locks, reserv_info); if (!new_vma) { if (vm_flags & VM_ACCOUNT) vm_unacct_memory(to_account >> PAGE_SHIFT); @@ -874,6 +875,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, struct vm_area_struct *vma; unsigned long ret = -EINVAL; unsigned long map_flags = 0; + struct reserv_struct reserv_info, *reserv_ptr = NULL;
if (offset_in_page(new_addr)) goto out; @@ -902,6 +904,9 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, if ((mm->map_count + 2) >= sysctl_max_map_count - 3) return -ENOMEM;
+ if (reserv_find_reserv_info_range(new_addr, new_len, true, &reserv_info)) + reserv_ptr = &reserv_info; + if (flags & MREMAP_FIXED) { ret = do_munmap(mm, new_addr, new_len, uf_unmap_early); if (ret) @@ -945,7 +950,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, new_addr = ret;
ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf, - uf_unmap); + uf_unmap, reserv_ptr);
out: return ret; @@ -1160,7 +1165,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len }
ret = move_vma(vma, addr, old_len, new_len, new_addr, - &locked, flags, &uf, &uf_unmap); + &locked, flags, &uf, &uf_unmap, NULL); } out: if (offset_in_page(ret))
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
binfmt_elf already creates appropriate initial reservations, thanks to elf_map() creating an initial mapping with the total size of the executable/interpreter; vm_mmap() then automatically creates a reservation of the right size. However, total_size is currently only computed for dynamic executables, so for reservations to be correct for static executables too, compute it in the static case as well.
Additionally, vm_brk_flags() is not aware of reservations, so we shouldn't use it. An equivalent vm_mmap() call is made instead.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Co-developed-by: Kevin Brodsky kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- fs/binfmt_elf.c | 31 +++++++++++++++++++++++++------ 1 file changed, 25 insertions(+), 6 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index cf96d703b333..e330dec431be 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -48,6 +48,7 @@ #include <linux/uaccess.h> #include <linux/rseq.h> #include <linux/cheri.h> +#include <linux/mm_reserv.h> #include <asm/param.h> #include <asm/page.h>
@@ -689,15 +690,24 @@ static unsigned long elf_load(struct file *filep, unsigned long addr, * If the header is requesting these pages to be * executable, honour that (ppc32 needs this). */ - int error; - zero_start = ELF_PAGEALIGN(zero_start); zero_end = ELF_PAGEALIGN(zero_end);
- error = vm_brk_flags(zero_start, zero_end - zero_start, - prot & PROT_EXEC ? VM_EXEC : 0); - if (error) - map_addr = error; + if (!reserv_is_supported(current->mm)) { + int error; + + error = vm_brk_flags(zero_start, zero_end - zero_start, + prot & PROT_EXEC ? VM_EXEC : 0); + if (error) + map_addr = error; + } else if (zero_end > zero_start) { + unsigned long addr; + + addr = vm_mmap(0, zero_start, zero_end - zero_start, prot, + MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, 0); + if (BAD_ADDR(addr)) + map_addr = addr; + } } return map_addr; } @@ -1332,6 +1342,15 @@ static int load_elf_binary(struct linux_binprm *bprm) * is needed. */ elf_flags |= MAP_FIXED_NOREPLACE; + + if (reserv_is_supported(current->mm)) { + total_size = total_mapping_size(elf_phdata, + elf_ex->e_phnum); + if (!total_size) { + retval = -EINVAL; + goto out_free_dentry; + } + } } else if (elf_ex->e_type == ET_DYN) { /* * This logic is run once for the first LOAD Program
The stack needs special handling in PCuABI to ensure that its underlying reservation is appropriately sized. A new sysctl, cheri.max_stack_size, is introduced to that effect, as per the PCuABI specification; it specifies the reservation size (before alignment), unless the RLIMIT_STACK hard limit (unlimited by default) is lower.
The default value for cheri.max_stack_size (128 MB) should be plenty for any application. The minimum value (256 KB) was chosen to leave a safe amount of space over the initial stack mapping expansion (128 KB, see setup_arg_pages()). The maximum value is fairly arbitrary, but has the advantage of fitting in an int.
acct_stack_growth() is also modified to prevent a mapping with VM_GROWSDOWN from growing beyond its reservation (for this reason the reservation details are set before calling expand_stack_locked()). This is only used for the main stack, as PROT_GROWSDOWN is not supported in PCuABI.
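For reference, the new knob ends up exposed through procfs (via register_sysctl_init("cheri", ...) below); an illustrative userspace read, expecting the 128 MB default:

/* Illustrative only: query the stack reservation cap from userspace. */
#include <stdio.h>

int main(void)
{
	long v;
	FILE *f = fopen("/proc/sys/cheri/max_stack_size", "r");

	if (!f)
		return 1;
	if (fscanf(f, "%ld", &v) != 1) {
		fclose(f);
		return 1;
	}
	fclose(f);
	printf("max stack reservation size: %ld bytes\n", v);	/* 134217728 by default */
	return 0;
}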
Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- fs/exec.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ mm/mmap.c | 3 +++ 2 files changed, 62 insertions(+)
diff --git a/fs/exec.c b/fs/exec.c index 48b29402f838..cabb0560877e 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -66,6 +66,9 @@ #include <linux/coredump.h> #include <linux/time_namespace.h> #include <linux/user_events.h> +#include <linux/mm_reserv.h> +#include <linux/cheri.h> +#include <linux/mman.h>
#include <linux/uaccess.h> #include <asm/mmu_context.h> @@ -113,6 +116,56 @@ bool path_noexec(const struct path *path) (path->mnt->mnt_sb->s_iflags & SB_I_NOEXEC); }
+#ifdef CONFIG_CHERI_PURECAP_UABI +static int cheri_max_stack_size = 128 * 1024 * 1024; +static int __cheri_max_stack_size_min = 256 * 1024; +static int __cheri_max_stack_size_max = 1024 * 1024 * 1024; + +static struct ctl_table pcuabi_sysctls[] = { + { + .procname = "max_stack_size", + .mode = 0644, + .data = &cheri_max_stack_size, + .maxlen = sizeof(int), + .proc_handler = proc_dointvec_minmax, + .extra1 = &__cheri_max_stack_size_min, + .extra2 = &__cheri_max_stack_size_max, + }, + { } +}; + +static void register_pcuabi_sysctls(void) +{ + register_sysctl_init("cheri", pcuabi_sysctls); +} + +static int set_stack_reserv(struct vm_area_struct *vma, + struct linux_binprm *bprm) +{ + size_t rlim_stack_max = bprm->rlim_stack.rlim_max; + ptraddr_t start; + size_t len; + + len = min(rlim_stack_max, (size_t)cheri_max_stack_size); + + start = (vma->vm_end - len) & cheri_representable_alignment_mask(len); + /* + * vma->vm_end must remain unchanged, so after aligning down the start + * address, we need to recalculate the length. + */ + len = (vma->vm_end - start); + return reserv_vma_set_reserv(vma, start, len, PROT_READ | PROT_WRITE); +} +#else +static inline void register_pcuabi_sysctls(void) {} + +static inline int set_stack_reserv(struct vm_area_struct *vma, + struct linux_binprm *bprm) +{ + return 0; +} +#endif + #ifdef CONFIG_USELIB /* * Note that a shared library must be both readable and executable due to @@ -862,6 +915,11 @@ int setup_arg_pages(struct linux_binprm *bprm, stack_base = vma->vm_end - stack_expand; #endif current->mm->start_stack = bprm->p; + + ret = set_stack_reserv(vma, bprm); + if (ret) + goto out_unlock; + ret = expand_stack_locked(vma, stack_base); if (ret) ret = -EFAULT; @@ -2195,6 +2253,7 @@ static struct ctl_table fs_exec_sysctls[] = { static int __init init_fs_exec_sysctls(void) { register_sysctl_init("fs", fs_exec_sysctls); + register_pcuabi_sysctls(); return 0; }
diff --git a/mm/mmap.c b/mm/mmap.c index 40d64fa163a2..886a729fb13c 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1973,6 +1973,9 @@ static int acct_stack_growth(struct vm_area_struct *vma, if (size > rlimit(RLIMIT_STACK)) return -ENOMEM;
+ if (reserv_is_supported(mm) && size > reserv_vma_reserv_len(vma)) + return -ERESERVATION; + /* mlock limit tests */ if (!mlock_future_ok(mm, vma->vm_flags, grow << PAGE_SHIFT)) return -ENOMEM;
Create an appropriate reservation for the vDSO, spanning both the vvar and vDSO text mappings.
Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- arch/arm64/kernel/vdso.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index 3121e70f598a..f9059577581f 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -21,6 +21,7 @@ #include <linux/time_namespace.h> #include <linux/timekeeper_internal.h> #include <linux/vmalloc.h> +#include <linux/mman.h> #include <vdso/datapage.h> #include <vdso/helpers.h> #include <vdso/vsyscall.h> @@ -202,9 +203,9 @@ static int __setup_additional_pages(enum vdso_abi abi, struct linux_binprm *bprm, int uses_interp) { - unsigned long vdso_base, vdso_text_len, vdso_mapping_len; + unsigned long vdso_base, vdso_text_base, vdso_text_len, vdso_mapping_len; unsigned long gp_flags = 0; - void *ret; + struct vm_area_struct *ret;
BUILD_BUG_ON(VVAR_NR_PAGES != __VVAR_PAGES);
@@ -224,18 +225,26 @@ static int __setup_additional_pages(enum vdso_abi abi, if (IS_ERR(ret)) goto up_fail;
+ if (reserv_vma_set_reserv(ret, vdso_base, vdso_mapping_len, + PROT_READ | PROT_EXEC)) + goto up_fail; + if (system_supports_bti_kernel()) gp_flags = VM_ARM64_BTI;
- vdso_base += VVAR_NR_PAGES * PAGE_SIZE; - mm->context.vdso = (void *)vdso_base; - ret = _install_special_mapping(mm, vdso_base, vdso_text_len, + vdso_text_base = vdso_base + VVAR_NR_PAGES * PAGE_SIZE; + mm->context.vdso = (void *)vdso_text_base; + ret = _install_special_mapping(mm, vdso_text_base, vdso_text_len, VM_READ|VM_EXEC|gp_flags| VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, vdso_info[abi].cm); if (IS_ERR(ret)) goto up_fail;
+ if (reserv_vma_set_reserv(ret, vdso_base, vdso_mapping_len, + PROT_READ | PROT_EXEC)) + goto up_fail; + return 0;
up_fail:
Enable reservations early (before any mapping is created) if the new process is in PCuABI, and otherwise disable them explicitly, since the corresponding mm flag is inherited from the parent and may be set either way.
Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- fs/binfmt_elf.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index e330dec431be..8617b5328197 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1287,6 +1287,8 @@ static int load_elf_binary(struct linux_binprm *bprm) if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space) current->flags |= PF_RANDOMIZE;
+ reserv_mm_set_flag(current->mm, ELF_COMPAT); + setup_new_exec(bprm);
/* Do this so that we can load the interpreter, if need be. We will
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Use the recently introduced PCuABI reservation interfaces to add the relevant parameter constraints to the mmap/munmap syscalls. The capability returned by the mmap syscall is now bounded to the reservation range. The in-kernel memory mapping function vm_mmap() does not check the constraints on parameters. These reservation checks do not affect other ABIs or compat.
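As an illustration of the resulting semantics (a sketch, not part of this patch; assumes a PCuABI process and a 4K page size):

#include <sys/mman.h>

int main(void)
{
	/* Null-derived hint: the kernel chooses the address and returns
	 * a new owning capability bounded to the reservation. */
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	p[0] = 42;

	/* munmap() must be given the owning capability; a pointer that
	 * does not own [p, p + 4096) fails with -EINVAL, and one whose
	 * range escapes its reservation with -ERESERVATION. */
	return munmap(p, 4096);
}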
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Co-developed-by: Kevin Brodsky kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- include/linux/mm.h | 3 +++ mm/internal.h | 2 +- mm/mmap.c | 58 ++++++++++++++++++++++++++++++++++++++++++---- mm/util.c | 9 +------ 4 files changed, 59 insertions(+), 13 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index f9b5ad66a938..3d1e867bd903 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3411,6 +3411,9 @@ struct vm_unmapped_area_info {
extern unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info);
+int check_pcuabi_map_ptr_arg(user_uintptr_t user_ptr, unsigned long len, + bool map_fixed, bool locked); + /* truncate.c */ extern void truncate_inode_pages(struct address_space *, loff_t); extern void truncate_inode_pages_range(struct address_space *, diff --git a/mm/internal.h b/mm/internal.h index 58df037c3824..3a88f1e2ffee 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -861,7 +861,7 @@ extern u64 hwpoison_filter_flags_value; extern u64 hwpoison_filter_memcg; extern u32 hwpoison_filter_enable;
-extern user_uintptr_t __must_check vm_mmap_pgoff(struct file *, user_uintptr_t, +extern unsigned long __must_check vm_mmap_pgoff(struct file *, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
diff --git a/mm/mmap.c b/mm/mmap.c index 886a729fb13c..059a9d21ec54 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1392,12 +1392,45 @@ unsigned long do_mmap(struct file *file, unsigned long addr, return addr; }
-user_uintptr_t ksys_mmap_pgoff(user_uintptr_t addr, unsigned long len, +int check_pcuabi_map_ptr_arg(user_uintptr_t user_ptr, unsigned long len, + bool map_fixed, bool locked) +{ + ptraddr_t addr = (ptraddr_t)user_ptr; + + if (!reserv_is_supported(current->mm)) + return 0; + + if (!check_user_ptr_owning(user_ptr, len)) { + if (!user_ptr_is_same((void __user *)user_ptr, as_user_ptr(addr))) + return -EINVAL; + /* + * Checking that the aligned range is wholly contained inside a + * reservation is sufficient. If the range is only partially + * contained within a reservation, get_unmapped_area will + * ensure that -ERESERVATION is returned. + */ + if (reserv_aligned_range_within_reserv(addr, len, locked)) + return -ERESERVATION; + return 0; + } + + if (!map_fixed) + return -EINVAL; + if (!reserv_cap_within_reserv(user_ptr, locked)) + return -ERESERVATION; + if (reserv_range_mapped(addr, len, locked)) + return -ENOMEM; + + return 0; +} + +user_uintptr_t ksys_mmap_pgoff(user_uintptr_t user_ptr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long fd, unsigned long pgoff) { struct file *file = NULL; - user_uintptr_t retval; + user_uintptr_t retval = -EINVAL; + ptraddr_t addr = (ptraddr_t)user_ptr;
if (!(flags & MAP_ANONYMOUS)) { audit_mmap_fd(fd, flags); @@ -1430,7 +1463,18 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t addr, unsigned long len, return PTR_ERR(file); }
+ retval = check_pcuabi_map_ptr_arg(user_ptr, len, flags & MAP_FIXED, false); + if (retval) + goto out_fput; + retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff); + if (!IS_ERR_VALUE(retval) && reserv_is_supported(current->mm)) { + if (user_ptr_is_valid((const void __user *)user_ptr)) + retval = user_ptr; + else + retval = reserv_make_user_ptr_owning((ptraddr_t)retval, + false); + } out_fput: if (file) fput(file); @@ -3082,9 +3126,15 @@ int vm_munmap(unsigned long start, size_t len) } EXPORT_SYMBOL(vm_munmap);
-SYSCALL_DEFINE2(munmap, user_uintptr_t, addr, size_t, len) +SYSCALL_DEFINE2(munmap, user_uintptr_t, user_ptr, size_t, len) { - addr = untagged_addr(addr); + ptraddr_t addr = untagged_addr((ptraddr_t)user_ptr); + + if (reserv_is_supported(current->mm) && !check_user_ptr_owning(user_ptr, len)) + return -EINVAL; + if (!reserv_cap_within_reserv(user_ptr, false)) + return -ERESERVATION; + return __vm_munmap(addr, len, true); }
diff --git a/mm/util.c b/mm/util.c index afd40ed9c3c8..bd69a417c6a9 100644 --- a/mm/util.c +++ b/mm/util.c @@ -540,7 +540,7 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc) } EXPORT_SYMBOL_GPL(account_locked_vm);
-user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, +unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flag, unsigned long pgoff) { @@ -553,19 +553,12 @@ user_uintptr_t vm_mmap_pgoff(struct file *file, user_uintptr_t addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - /* - * TODO [PCuABI] - might need propagating uintcap further down - * to do_mmap to properly handle capabilities - */ ret = do_mmap(file, addr, len, prot, flag, 0, pgoff, &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); if (populate) mm_populate(ret, populate); - /* TODO [PCuABI] - derive proper capability */ - if (!IS_ERR_VALUE(ret)) - ret = (user_uintptr_t)uaddr_to_user_ptr_safe((ptraddr_t)ret); } return ret; }
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Use the recently introduced PCuABI reservation interfaces to add the relevant parameter constraints to the mremap syscall. The capability returned by the syscall is either the same as the input capability or, if a new reservation is created, a new capability with the same permissions as the pointer to the original mapping, as per the PCuABI specification.
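A sketch of the intended userspace behaviour (assuming a PCuABI process):

#define _GNU_SOURCE
#include <sys/mman.h>

int main(void)
{
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	void *q;

	if (p == MAP_FAILED)
		return 1;

	/* p must own the old range; the returned capability owns the
	 * (possibly moved) mapping and carries p's permissions. */
	q = mremap(p, 4096, 2 * 4096, MREMAP_MAYMOVE);
	if (q == MAP_FAILED)
		return 1;
	return munmap(q, 2 * 4096);
}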
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Co-developed-by: Kevin Brodsky kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- mm/mremap.c | 64 ++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 51 insertions(+), 13 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c index 6a9fb59df4a6..78858a167242 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -25,6 +25,7 @@ #include <linux/uaccess.h> #include <linux/userfaultfd_k.h> #include <linux/mempolicy.h> +#include <linux/mm_reserv.h>
#include <asm/cacheflush.h> #include <asm/tlb.h> @@ -865,17 +866,35 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr, return vma; }
-static unsigned long mremap_to(unsigned long addr, unsigned long old_len, - unsigned long new_addr, unsigned long new_len, bool *locked, +static user_uintptr_t make_new_user_ptr_owning(ptraddr_t new_vma_addr, + user_uintptr_t old_user_ptr) +{ + user_uintptr_t ret; + + ret = reserv_make_user_ptr_owning((ptraddr_t)new_vma_addr, true); + if (IS_ERR_VALUE(ret)) + return ret; + +#ifdef CONFIG_CHERI_PURECAP_UABI + if (reserv_is_supported(current->mm)) + ret = cheri_perms_and(ret, cheri_perms_get(old_user_ptr)); +#endif + return ret; +} + +static user_uintptr_t mremap_to(user_uintptr_t user_ptr, unsigned long old_len, + user_uintptr_t new_user_ptr, unsigned long new_len, bool *locked, unsigned long flags, struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap_early, struct list_head *uf_unmap) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - unsigned long ret = -EINVAL; + user_uintptr_t ret = -EINVAL; unsigned long map_flags = 0; struct reserv_struct reserv_info, *reserv_ptr = NULL; + ptraddr_t addr = untagged_addr((ptraddr_t)user_ptr); + ptraddr_t new_addr = (ptraddr_t)new_user_ptr;
if (offset_in_page(new_addr)) goto out; @@ -952,6 +971,13 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf, uf_unmap, reserv_ptr);
+ if (!IS_ERR_VALUE(ret) && reserv_is_supported(mm)) { + if ((flags & MREMAP_FIXED) && + user_ptr_is_valid((const void __user *)new_user_ptr)) + ret = new_user_ptr; + else + ret = make_new_user_ptr_owning((ptraddr_t)ret, user_ptr); + } out: return ret; } @@ -977,19 +1003,20 @@ static int vma_expandable(struct vm_area_struct *vma, unsigned long delta) * MREMAP_FIXED option added 5-Dec-1999 by Benjamin LaHaise * This option implies MREMAP_MAYMOVE. */ -SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len, +SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, user_ptr, unsigned long, old_len, unsigned long, new_len, unsigned long, flags, - user_uintptr_t, new_addr) + user_uintptr_t, new_user_ptr) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; user_uintptr_t ret = -EINVAL; bool locked = false; struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX; + ptraddr_t addr = (ptraddr_t)user_ptr; + ptraddr_t new_addr = (ptraddr_t)new_user_ptr; LIST_HEAD(uf_unmap_early); LIST_HEAD(uf_unmap);
- /* @TODO [PCuABI] - capability validation */ /* * There is a deliberate asymmetry here: we strip the pointer tag * from the old address but leave the new address alone. This is @@ -1039,7 +1066,18 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len ret = -EFAULT; goto out; } + if (reserv_is_supported(mm) && + !check_user_ptr_owning(user_ptr, old_len ? old_len : new_len)) + goto out; + if (!reserv_cap_within_reserv(user_ptr, true)) { + ret = -ERESERVATION; + goto out; + } + ret = check_pcuabi_map_ptr_arg(new_user_ptr, new_len, flags & MREMAP_FIXED, true); + if (ret) + goto out;
+ ret = -EINVAL; if (is_vm_hugetlb_page(vma)) { struct hstate *h __maybe_unused = hstate_vma(vma);
@@ -1061,7 +1099,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len }
if (flags & (MREMAP_FIXED | MREMAP_DONTUNMAP)) { - ret = mremap_to(addr, old_len, new_addr, new_len, + ret = mremap_to(user_ptr, old_len, new_user_ptr, new_len, &locked, flags, &uf, &uf_unmap_early, &uf_unmap); goto out; @@ -1086,7 +1124,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len if (ret) goto out;
- ret = addr; + ret = user_ptr; goto out_unlocked; }
@@ -1140,7 +1178,7 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len locked = true; new_addr = addr; } - ret = addr; + ret = user_ptr; goto out; } } @@ -1166,6 +1204,9 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len
ret = move_vma(vma, addr, old_len, new_len, new_addr, &locked, flags, &uf, &uf_unmap, NULL); + + if (!IS_ERR_VALUE(ret) && reserv_is_supported(mm)) + ret = make_new_user_ptr_owning((ptraddr_t)ret, user_ptr); } out: if (offset_in_page(ret)) @@ -1177,8 +1218,5 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, addr, unsigned long, old_len userfaultfd_unmap_complete(mm, &uf_unmap_early); mremap_userfaultfd_complete(&uf, addr, ret, old_len); userfaultfd_unmap_complete(mm, &uf_unmap); - /* TODO [PCuABI] - derive proper capability */ - return IS_ERR_VALUE(ret) ? - ret : - (user_intptr_t)uaddr_to_user_ptr_safe((ptraddr_t)ret); + return ret; }
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Use the recently introduced PCuABI reservation interfaces and add the relevant capability/reservation constraint checks on the mprotect syscall parameters.
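For illustration, in a PCuABI process the intended usage looks like this sketch (not part of the patch):

#include <sys/mman.h>

int main(void)
{
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	/* p is the owning capability returned by mmap(), so the new
	 * checks pass; a non-owning pointer would get -EINVAL, and one
	 * outside its reservation -ERESERVATION. */
	return mprotect(p, 4096, PROT_READ);
}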
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Co-developed-by: Kevin Brodsky kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- mm/mprotect.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c index 4dffb34f62fd..7bf46faa7fd6 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -32,6 +32,7 @@ #include <linux/sched/sysctl.h> #include <linux/userfaultfd_k.h> #include <linux/memory-tiers.h> +#include <linux/mm_reserv.h> #include <asm/cacheflush.h> #include <asm/mmu_context.h> #include <asm/tlbflush.h> @@ -677,7 +678,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb, /* * pkey==-1 when doing a legacy mprotect() */ -static int do_mprotect_pkey(user_uintptr_t start, size_t len, +static int do_mprotect_pkey(user_uintptr_t user_ptr, size_t len, unsigned long prot, int pkey) { unsigned long nstart, end, tmp, reqprot; @@ -688,9 +689,7 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, (prot & PROT_READ); struct mmu_gather tlb; struct vma_iterator vmi; - - /* TODO [PCuABI] - capability checks for uaccess */ - start = untagged_addr(start); + unsigned long start = untagged_addr(user_ptr);
prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP); if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */ @@ -704,6 +703,11 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, end = start + len; if (end <= start) return -ENOMEM; + + if (reserv_is_supported(current->mm) && + !(check_user_ptr_owning(user_ptr, len))) + return -EINVAL; + if (!arch_validate_prot(prot, start)) return -EINVAL;
@@ -712,6 +716,10 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, if (mmap_write_lock_killable(current->mm)) return -EINTR;
+ error = -ERESERVATION; + if (!reserv_cap_within_reserv(user_ptr, true)) + goto out; + /* * If userspace did not allocate the pkey, do not let * them use it here. @@ -825,18 +833,18 @@ static int do_mprotect_pkey(user_uintptr_t start, size_t len, return error; }
-SYSCALL_DEFINE3(mprotect, user_uintptr_t, start, size_t, len, +SYSCALL_DEFINE3(mprotect, user_uintptr_t, user_ptr, size_t, len, unsigned long, prot) { - return do_mprotect_pkey(start, len, prot, -1); + return do_mprotect_pkey(user_ptr, len, prot, -1); }
#ifdef CONFIG_ARCH_HAS_PKEYS
-SYSCALL_DEFINE4(pkey_mprotect, user_uintptr_t, start, size_t, len, +SYSCALL_DEFINE4(pkey_mprotect, user_uintptr_t, user_ptr, size_t, len, unsigned long, prot, int, pkey) { - return do_mprotect_pkey(start, len, prot, pkey); + return do_mprotect_pkey(user_ptr, len, prot, pkey); }
SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Use the recently introduced PCuABI reservation interfaces to verify the address range passed to the madvise syscall, as well as to the corresponding io_uring command.
do_madvise() has some internal kernel users that manipulate raw addresses, so add a parameter to skip the reservation checks.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- include/linux/mm.h | 3 ++- io_uring/advise.c | 3 +-- mm/damon/vaddr.c | 2 +- mm/madvise.c | 26 +++++++++++++++++++++----- 4 files changed, 25 insertions(+), 9 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index 3d1e867bd903..691d624ad167 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3375,7 +3375,8 @@ extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, bool unlock); extern int do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf); -extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior); +extern int do_madvise(struct mm_struct *mm, user_uintptr_t user_ptr, size_t len_in, + int behavior, bool cap_check_skip);
#ifdef CONFIG_MMU extern int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, diff --git a/io_uring/advise.c b/io_uring/advise.c index 952d9289a311..b711af00719a 100644 --- a/io_uring/advise.c +++ b/io_uring/advise.c @@ -54,8 +54,7 @@ int io_madvise(struct io_kiocb *req, unsigned int issue_flags)
WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
- /* TODO [PCuABI] - capability checks for uaccess */ - ret = do_madvise(current->mm, user_ptr_addr(ma->addr), ma->len, ma->advice); + ret = do_madvise(current->mm, (user_uintptr_t)ma->addr, ma->len, ma->advice, false); io_req_set_res(req, ret, 0); return IOU_OK; #else diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index a4d1f63c5b23..3138da113117 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -643,7 +643,7 @@ static unsigned long damos_madvise(struct damon_target *target, if (!mm) return 0;
- applied = do_madvise(mm, start, len, behavior) ? 0 : len; + applied = do_madvise(mm, start, len, behavior, true) ? 0 : len; mmput(mm);
return applied; diff --git a/mm/madvise.c b/mm/madvise.c index d0c8e854636e..6668d76c8a35 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -31,6 +31,7 @@ #include <linux/swapops.h> #include <linux/shmem_fs.h> #include <linux/mmu_notifier.h> +#include <linux/mm_reserv.h>
#include <asm/tlb.h>
@@ -1394,13 +1395,15 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, * -EBADF - map exists, but area maps something that isn't a file. * -EAGAIN - a kernel resource was temporarily unavailable. */ -int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior) +int do_madvise(struct mm_struct *mm, user_uintptr_t user_ptr, size_t len_in, + int behavior, bool cap_check_skip) { unsigned long end; int error; int write; size_t len; struct blk_plug plug; + unsigned long start = (ptraddr_t)user_ptr;
if (!madvise_behavior_valid(behavior)) return -EINVAL; @@ -1433,14 +1436,27 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh mmap_read_lock(mm); }
- /* TODO [PCuABI] - capability checks for uaccess */ start = untagged_addr_remote(mm, start); end = start + len;
+ if (!cap_check_skip) { + if (reserv_is_supported(current->mm) && + !check_user_ptr_owning(user_ptr, len)) { + error = -EINVAL; + goto out; + } + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_cap_within_reserv(user_ptr, true)) { + error = -ERESERVATION; + goto out; + } + } + blk_start_plug(&plug); error = madvise_walk_vmas(mm, start, end, behavior, madvise_vma_behavior); blk_finish_plug(&plug); +out: if (write) mmap_write_unlock(mm); else @@ -1449,9 +1465,9 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh return error; }
-SYSCALL_DEFINE3(madvise, user_uintptr_t, start, size_t, len_in, int, behavior) +SYSCALL_DEFINE3(madvise, user_uintptr_t, user_ptr, size_t, len_in, int, behavior) { - return do_madvise(current->mm, start, len_in, behavior); + return do_madvise(current->mm, user_ptr, len_in, behavior, false); }
SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, @@ -1506,7 +1522,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
while (iov_iter_count(&iter)) { ret = do_madvise(mm, user_ptr_addr(iter_iov_addr(&iter)), - iter_iov_len(&iter), behavior); + iter_iov_len(&iter), behavior, true); if (ret < 0) break; iov_iter_advance(&iter, iter_iov_len(&iter));
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Use the recently introduced PCuABI reservation interfaces to verify the input pointer for the mlock, mlock2, and munlock syscalls.
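A minimal sketch of the resulting semantics in a PCuABI process (assuming RLIMIT_MEMLOCK allows the call):

#include <sys/mman.h>

int main(void)
{
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	/* Both calls must be given the owning capability for the range;
	 * a non-owning pointer fails with -EINVAL. */
	if (mlock(p, 4096))
		return 1;
	return munlock(p, 4096);
}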
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Co-developed-by: Kevin Brodsky kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- mm/mlock.c | 36 ++++++++++++++++++++++++++---------- 1 file changed, 26 insertions(+), 10 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c index 086546ac5766..d971ac948fad 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -25,6 +25,7 @@ #include <linux/memcontrol.h> #include <linux/mm_inline.h> #include <linux/secretmem.h> +#include <linux/mm_reserv.h>
#include "internal.h"
@@ -621,13 +622,12 @@ static int __mlock_posix_error_return(long retval) return retval; }
-static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t flags) +static __must_check int do_mlock(user_uintptr_t user_ptr, size_t len, vm_flags_t flags) { unsigned long locked; unsigned long lock_limit; int error = -ENOMEM; - - start = untagged_addr(start); + unsigned long start = untagged_addr(user_ptr);
if (!can_do_mlock()) return -EPERM; @@ -635,6 +635,9 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla len = PAGE_ALIGN(len + (offset_in_page(start))); start &= PAGE_MASK;
+ if (reserv_is_supported(current->mm) && !check_user_ptr_owning(user_ptr, len)) + return -EINVAL; + lock_limit = rlimit(RLIMIT_MEMLOCK); lock_limit >>= PAGE_SHIFT; locked = len >> PAGE_SHIFT; @@ -642,6 +645,12 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla if (mmap_write_lock_killable(current->mm)) return -EINTR;
+ /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_cap_within_reserv(user_ptr, true)) { + mmap_write_unlock(current->mm); + return -ERESERVATION; + } + locked += current->mm->locked_vm; if ((locked > lock_limit) && (!capable(CAP_IPC_LOCK))) { /* @@ -668,12 +677,12 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla return 0; }
-SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len) +SYSCALL_DEFINE2(mlock, user_uintptr_t, user_ptr, size_t, len) { - return do_mlock(start, len, VM_LOCKED); + return do_mlock(user_ptr, len, VM_LOCKED); }
-SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) +SYSCALL_DEFINE3(mlock2, user_uintptr_t, user_ptr, size_t, len, int, flags) { vm_flags_t vm_flags = VM_LOCKED;
@@ -683,20 +692,27 @@ SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) if (flags & MLOCK_ONFAULT) vm_flags |= VM_LOCKONFAULT;
- return do_mlock(start, len, vm_flags); + return do_mlock(user_ptr, len, vm_flags); }
-SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len) +SYSCALL_DEFINE2(munlock, user_uintptr_t, user_ptr, size_t, len) { int ret; - - start = untagged_addr(start); + unsigned long start = untagged_addr(user_ptr);
len = PAGE_ALIGN(len + (offset_in_page(start))); start &= PAGE_MASK;
+ if (reserv_is_supported(current->mm) && !check_user_ptr_owning(user_ptr, len)) + return -EINVAL; + if (mmap_write_lock_killable(current->mm)) return -EINTR; + /* Check if the range exists within the reservation with mmap lock. */ + if (!reserv_cap_within_reserv(user_ptr, true)) { + mmap_write_unlock(current->mm); + return -ERESERVATION; + } ret = apply_vma_lock_flags(start, len, 0); mmap_write_unlock(current->mm);
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Use the recently introduced PCuABI reservation interfaces to verify the input pointer for the msync syscall.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/msync.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/mm/msync.c b/mm/msync.c index ac4c9bfea2e7..64278c54c483 100644 --- a/mm/msync.c +++ b/mm/msync.c @@ -14,6 +14,7 @@ #include <linux/file.h> #include <linux/syscalls.h> #include <linux/sched.h> +#include <linux/mm_reserv.h>
/* * MS_SYNC syncs the entire file - including mappings. @@ -29,15 +30,14 @@ * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to * applications. */ -SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) +SYSCALL_DEFINE3(msync, user_uintptr_t, user_ptr, size_t, len, int, flags) { unsigned long end; struct mm_struct *mm = current->mm; struct vm_area_struct *vma; int unmapped_error = 0; int error = -EINVAL; - - start = untagged_addr(start); + unsigned long start = untagged_addr((ptraddr_t)user_ptr);
if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC)) goto out; @@ -45,6 +45,12 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) goto out; if ((flags & MS_ASYNC) && (flags & MS_SYNC)) goto out; + if (reserv_is_supported(mm) && !check_user_ptr_owning(user_ptr, len)) + goto out; + if (!reserv_cap_within_reserv(user_ptr, false)) { + error = -ERESERVATION; + goto out; + } error = -ENOMEM; len = (len + ~PAGE_MASK) & PAGE_MASK; end = start + len;
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Capability permissions and bounds constraints are added for the mincore syscall, as per the PCuABI specification. mincore() does not require the VMem permission and needs just one of the RWX permissions, so the standard check_user_ptr_owning() interface is not used and the permissions are verified explicitly.
Also, as mincore() allows the address range to not span whole pages, the bounds checking is tweaked accordingly.
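The hypothetical helper below sketches that invocation pattern, which the relaxed bounds check is meant to keep working (p need not be page-aligned and its bounds need not cover the whole first or last page):

#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int mincore_for_range(char *p, size_t len, unsigned char *vec)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t off = (uintptr_t)p % page;

	/* Align the start down to a page boundary; the capability carried
	 * by p is preserved by the pointer arithmetic, so its lower bound
	 * may sit above the address actually passed. */
	return mincore(p - off, len + off, vec);
}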
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Co-developed-by: Kevin Brodsky kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- mm/mincore.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 51 insertions(+), 3 deletions(-)
diff --git a/mm/mincore.c b/mm/mincore.c index dd164cb84ba8..e59a877e3e99 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -19,6 +19,7 @@ #include <linux/hugetlb.h> #include <linux/pgtable.h>
+#include <linux/mm_reserv.h> #include <linux/uaccess.h> #include "swap.h"
@@ -172,6 +173,50 @@ static inline bool can_do_mincore(struct vm_area_struct *vma) file_permission(vma->vm_file, MAY_WRITE) == 0; }
+static int check_pcuabi_mincore_ptr_arg(user_uintptr_t user_ptr, + unsigned long start, + unsigned long len) +{ +#ifdef CONFIG_CHERI_PURECAP_UABI + ptraddr_t cap_base, min_upper_bound; + size_t cap_len; + cheri_perms_t perms; + + if (!reserv_is_supported(current->mm)) + return 0; + + cap_base = cheri_base_get(user_ptr); + cap_len = cheri_length_get(user_ptr); + perms = cheri_perms_get(user_ptr); + + /* + * mincore does not require the VMem permission so as to allow ordinary + * pointers (unlike other address space management syscalls requiring an + * owning capability). + * Requiring at least one of the standard memory permissions RWX will + * however help to reject non-memory capabilities. + */ + if (!(cheri_is_valid(user_ptr) && cheri_is_unsealed(user_ptr) && + (perms & CHERI_PERM_GLOBAL) && + (perms & (CHERI_PERM_LOAD | CHERI_PERM_STORE | CHERI_PERM_EXECUTE)))) + return -EINVAL; + + /* + * mincore syscall can be invoked as: + * mincore(align_down(p, PAGE_SIZE), sz + (p.addr % PAGE_SIZE), vec) + * Hence, the capability bounds may not encompass the page-aligned range. + * For that reason, we only require that the capability bounds cover at + * least one byte in the first and last page. + */ + min_upper_bound = start + PAGE_ALIGN_DOWN(len) + + (offset_in_page(len) > 0 ? 1 : 0); + if (!((start + PAGE_SIZE > cap_base) && + (cap_base + cap_len >= min_upper_bound))) + return -EINVAL; +#endif + return 0; +} + static const struct mm_walk_ops mincore_walk_ops = { .pmd_entry = mincore_pte_range, .pte_hole = mincore_unmapped_range, @@ -229,14 +274,13 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v * mapped * -EAGAIN - A kernel resource was temporarily unavailable. */ -SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len, +SYSCALL_DEFINE3(mincore, user_uintptr_t, user_ptr, size_t, len, unsigned char __user *, vec) { long retval; unsigned long pages; unsigned char *tmp; - - start = untagged_addr(start); + unsigned long start = untagged_addr((ptraddr_t)user_ptr);
/* Check the start address: needs to be page-aligned.. */ if (start & ~PAGE_MASK) @@ -253,6 +297,10 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len, if (!access_ok(vec, pages)) return -EFAULT;
+ retval = check_pcuabi_mincore_ptr_arg(user_ptr, start, len); + if (retval) + return retval; + tmp = (void *) __get_free_page(GFP_USER); if (!tmp) return -EAGAIN;
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Use the recently introduced PCuABI reservation interfaces to verify the input pointer for the shmat and shmdt syscalls, and create a capability with appropriate bounds and permissions, if necessary, when shmat succeeds.
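A sketch of the resulting System V shm usage in a PCuABI process (error paths trimmed):

#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
	int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
	void *p;

	if (id < 0)
		return 1;

	/* With a null shmaddr, the kernel picks the address and shmat()
	 * returns a new owning capability, as mmap() now does. */
	p = shmat(id, NULL, 0);
	if (p == (void *)-1)
		return 1;

	/* shmdt() must in turn be given that owning capability. */
	return shmdt(p);
}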
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Co-developed-by: Kevin Brodsky kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- include/linux/shm.h | 4 ++-- ipc/shm.c | 44 +++++++++++++++++++++++++++++++++----------- 2 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/include/linux/shm.h b/include/linux/shm.h index d8e69aed3d32..bf5b2e5cbd0c 100644 --- a/include/linux/shm.h +++ b/include/linux/shm.h @@ -14,7 +14,7 @@ struct sysv_shm { struct list_head shm_clist; };
-long do_shmat(int shmid, char __user *shmaddr, int shmflg, unsigned long *addr, +long do_shmat(int shmid, char __user *shmaddr, int shmflg, user_uintptr_t *user_ptr, unsigned long shmlba); bool is_file_shm_hugepages(struct file *file); void exit_shm(struct task_struct *task); @@ -25,7 +25,7 @@ struct sysv_shm { };
static inline long do_shmat(int shmid, char __user *shmaddr, - int shmflg, unsigned long *addr, + int shmflg, user_uintptr_t *user_ptr, unsigned long shmlba) { return -ENOSYS; diff --git a/ipc/shm.c b/ipc/shm.c index 7bb7c4bbc383..ddf83a1f62ed 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -44,6 +44,7 @@ #include <linux/mount.h> #include <linux/ipc_namespace.h> #include <linux/rhashtable.h> +#include <linux/mm_reserv.h>
#include <linux/uaccess.h>
@@ -1519,14 +1520,13 @@ COMPAT_SYSCALL_DEFINE3(old_shmctl, int, shmid, int, cmd, void __user *, uptr) * Fix shmaddr, allocate descriptor, map shm, add attach descriptor to lists. * * NOTE! Despite the name, this is NOT a direct system call entrypoint. The - * "raddr" thing points to kernel space, and there has to be a wrapper around + * "ruser_ptr" thing points to kernel space, and there has to be a wrapper around * this. */ long do_shmat(int shmid, char __user *shmaddr, int shmflg, - ulong *raddr, unsigned long shmlba) + user_uintptr_t *ruser_ptr, unsigned long shmlba) { struct shmid_kernel *shp; - /* TODO [PCuABI] - capability checks for address space management */ unsigned long addr = user_ptr_addr(shmaddr); unsigned long size; struct file *file, *base; @@ -1538,6 +1538,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, struct shm_file_data *sfd; int f_flags; unsigned long populate = 0; + user_uintptr_t user_ptr = (user_uintptr_t)shmaddr;
err = -EINVAL; if (shmid < 0) @@ -1666,11 +1667,23 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, goto invalid; }
+ user_ptr = (user_uintptr_t)user_ptr_set_addr(shmaddr, addr); + err = check_pcuabi_map_ptr_arg(user_ptr, size, flags & MAP_FIXED, true); + if (err) + goto invalid; + addr = do_mmap(file, addr, size, prot, flags, 0, 0, &populate, NULL); - *raddr = addr; + + if (!IS_ERR_VALUE(addr) && reserv_is_supported(current->mm)) { + if (!user_ptr_is_valid((const void __user *)user_ptr)) + user_ptr = reserv_make_user_ptr_owning(addr, true); + } else { + user_ptr = addr; + } + *ruser_ptr = user_ptr; err = 0; - if (IS_ERR_VALUE(addr)) - err = (long)addr; + if (IS_ERR_VALUE(user_ptr)) + err = (long)user_ptr; invalid: mmap_write_unlock(current->mm); if (populate) @@ -1699,15 +1712,14 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
SYSCALL_DEFINE3(__retptr__(shmat), int, shmid, char __user *, shmaddr, int, shmflg) { - unsigned long ret; + user_uintptr_t ret; long err;
err = do_shmat(shmid, shmaddr, shmflg, &ret, SHMLBA); if (err) return err; force_successful_syscall_return(); - /* TODO [PCuABI] - derive proper capability */ - return (user_uintptr_t)uaddr_to_user_ptr_safe(ret); + return ret; }
#ifdef CONFIG_COMPAT @@ -1718,7 +1730,7 @@ SYSCALL_DEFINE3(__retptr__(shmat), int, shmid, char __user *, shmaddr, int, shmf
COMPAT_SYSCALL_DEFINE3(shmat, int, shmid, compat_uptr_t, shmaddr, int, shmflg) { - unsigned long ret; + user_uintptr_t ret; long err;
err = do_shmat(shmid, compat_ptr(shmaddr), shmflg, &ret, COMPAT_SHMLBA); @@ -1737,7 +1749,6 @@ long ksys_shmdt(char __user *shmaddr) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - /* TODO [PCuABI] - capability checks for address space management */ unsigned long addr = user_ptr_addr(shmaddr); int retval = -EINVAL; #ifdef CONFIG_MMU @@ -1783,6 +1794,7 @@ long ksys_shmdt(char __user *shmaddr) */ if ((vma->vm_ops == &shm_vm_ops) && (vma->vm_start - addr)/PAGE_SIZE == vma->vm_pgoff) { + user_uintptr_t user_ptr = (user_uintptr_t)shmaddr;
/* * Record the file of the shm segment being @@ -1792,6 +1804,15 @@ long ksys_shmdt(char __user *shmaddr) */ file = vma->vm_file; size = i_size_read(file_inode(vma->vm_file)); + + if (reserv_is_supported(mm) && + !check_user_ptr_owning(user_ptr, size)) + goto out_unlock; + if (!reserv_cap_within_reserv(user_ptr, true)) { + retval = -ERESERVATION; + goto out_unlock; + } + do_vma_munmap(&vmi, vma, vma->vm_start, vma->vm_end, NULL, false); /* @@ -1836,6 +1857,7 @@ long ksys_shmdt(char __user *shmaddr)
#endif
+out_unlock: mmap_write_unlock(mm); return retval; }
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
The PCuABI specification limits the expansion of capability permissions, notably through the mprotect() system call. Owning capabilities must therefore be created with the maximum permissions the corresponding memory mapping may possess during its lifetime.
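Under this model, the intended usage looks like the sketch below, relying on the PROT_MAX() macro added in this patch; without it, the owning capability would be derived from PROT_READ alone and the later mprotect() would be refused (see the user_ptr_may_set_prot() check added later in this series):

#include <sys/mman.h>

int main(void)
{
	/* Map read-only for now, but reserve the right to upgrade to
	 * read-write later: the owning capability is derived from the
	 * PROT_MAX() bits rather than the current protection. */
	void *p = mmap(NULL, 4096,
		       PROT_MAX(PROT_READ | PROT_WRITE) | PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	return mprotect(p, 4096, PROT_READ | PROT_WRITE);
}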
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- include/uapi/asm-generic/mman-common.h | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 6ce1f1ceb432..e7ba511c2bad 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -17,6 +17,12 @@ #define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */ #define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */
+/* PCuABI mapping and capability permissions */ +#define _PROT_MAX_SHIFT 16 +#define PROT_MAX(prot) ((prot) << _PROT_MAX_SHIFT) +#define PROT_EXTRACT(prot) ((prot) & (PROT_READ | PROT_WRITE | PROT_EXEC)) +#define PROT_MAX_EXTRACT(prot) (((prot) >> _PROT_MAX_SHIFT) & (PROT_READ | PROT_WRITE | PROT_EXEC)) + /* 0x01 - 0x03 are defined in linux/mman.h */ #define MAP_TYPE 0x0f /* Mask for type of mapping */ #define MAP_FIXED 0x10 /* Interpret addr exactly */
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Implement arch_user_ptr_owning_perms_from_prot() in PCuABI to handle Morello-specific flags and permissions. The hook will be used in a subsequent patch to restrict the permissions of owning capabilities.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/user_ptr.h | 34 +++++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+) create mode 100644 arch/arm64/include/asm/user_ptr.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 83a5817afa7d..fbf4ed6c6b5b 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -31,6 +31,7 @@ config ARM64 select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_KCOV select HAVE_ARCH_CHERI_H + select HAVE_ARCH_USER_PTR_H select ARCH_HAS_KEEPINITRD select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS diff --git a/arch/arm64/include/asm/user_ptr.h b/arch/arm64/include/asm/user_ptr.h new file mode 100644 index 000000000000..8e75f3e8b980 --- /dev/null +++ b/arch/arm64/include/asm/user_ptr.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __ASM_USER_PTR_H +#define __ASM_USER_PTR_H + +#include <linux/cheri.h> +#include <linux/mman.h> +#include <linux/sched/task_stack.h> +#include <asm/processor.h> + +#ifdef CONFIG_CHERI_PURECAP_UABI + +static inline +user_ptr_perms_t arch_user_ptr_owning_perms_from_prot(int prot, unsigned long vm_flags) +{ + struct pt_regs *regs = task_pt_regs(current); + cheri_perms_t perms = 0; + + if ((prot & PROT_READ) && (vm_flags & VM_READ_CAPS)) + perms |= ARM_CAP_PERMISSION_MUTABLE_LOAD; + + if (prot & PROT_EXEC) { + if (cheri_perms_get(regs->pcc) & CHERI_PERM_SYSTEM_REGS) + perms |= CHERI_PERM_SYSTEM_REGS; + if (cheri_perms_get(regs->pcc) & ARM_CAP_PERMISSION_EXECUTIVE) + perms |= ARM_CAP_PERMISSION_EXECUTIVE; + } + + return perms; +} +#define arch_user_ptr_owning_perms_from_prot arch_user_ptr_owning_perms_from_prot + +#endif /* CONFIG_CHERI_PURECAP_UABI */ + +#endif /* __ASM_USER_PTR_H */
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
user_ptr_owning_perms_from_prot() is modified to calculate capability permissions depending on prot flags and whether tag access is enabled (VM_{READ,WRITE}_CAPS), as per the PCuABI specification.
A new helper function user_ptr_may_set_prot() is introduced to check whether the given prot flags may be set, based on the capability's permissions.
Also introduce an arch hook arch_user_ptr_owning_perms_from_prot() to convert flags to arch-specific capability permissions.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Co-developed-by: Kevin Brodsky kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- Documentation/core-api/user_ptr.rst | 1 + arch/Kconfig | 3 ++ include/linux/user_ptr.h | 16 ++++++++++ lib/user_ptr.c | 46 +++++++++++++++++++++++++++-- 4 files changed, 63 insertions(+), 3 deletions(-)
diff --git a/Documentation/core-api/user_ptr.rst b/Documentation/core-api/user_ptr.rst index 0632bc9f4e8b..902686c27895 100644 --- a/Documentation/core-api/user_ptr.rst +++ b/Documentation/core-api/user_ptr.rst @@ -361,5 +361,6 @@ creating and checking user pointers. * ``check_user_ptr_owning(ptr, len)`` * ``make_user_ptr_owning(reserv, len)`` * ``user_ptr_owning_perms_from_prot(prot, vm_flags)`` +* ``user_ptr_may_set_prot(ptr, prot)``
See ``<linux/user_ptr.h>`` for details on how to use them. diff --git a/arch/Kconfig b/arch/Kconfig index 19f7bbb20a41..161f7002b0ab 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1502,6 +1502,9 @@ config CHERI_PURECAP_UABI availability of CHERI capabilities at compile-time; the resulting kernel image will not boot on incompatible hardware.
+config HAVE_ARCH_USER_PTR_H + bool + source "kernel/gcov/Kconfig"
source "scripts/gcc-plugins/Kconfig" diff --git a/include/linux/user_ptr.h b/include/linux/user_ptr.h index a48a294b434e..72b07c6e410a 100644 --- a/include/linux/user_ptr.h +++ b/include/linux/user_ptr.h @@ -152,6 +152,17 @@ user_uintptr_t make_user_ptr_owning(const struct reserv_struct *reserv, */ user_ptr_perms_t user_ptr_owning_perms_from_prot(int prot, unsigned long vm_flags);
+/** + * user_ptr_may_set_prot() - Check if the user pointer allows setting the given + * memory protection flags. + * @user_ptr: User pointer. + * @prot: Memory protection flags. + * + * Return: True if the capability permissions allow setting the protection flags + * or false otherwise. + */ +bool user_ptr_may_set_prot(user_uintptr_t user_ptr, int prot); + #else /* CONFIG_CHERI_PURECAP_UABI */
typedef int user_ptr_perms_t; @@ -210,6 +221,11 @@ static inline user_ptr_perms_t user_ptr_owning_perms_from_prot(int prot, return 0; }
+static inline bool user_ptr_may_set_prot(user_uintptr_t user_ptr, int prot) +{ + return true; +} + #endif /* CONFIG_CHERI_PURECAP_UABI */
/** diff --git a/lib/user_ptr.c b/lib/user_ptr.c index 6f7fc9111b1d..c2c53a3bc698 100644 --- a/lib/user_ptr.c +++ b/lib/user_ptr.c @@ -1,9 +1,23 @@ /* SPDX-License-Identifier: GPL-2.0-only */ #include <linux/bug.h> #include <linux/cheri.h> +#include <linux/mman.h> #include <linux/mm_types.h> #include <linux/user_ptr.h>
+#ifdef CONFIG_HAVE_ARCH_USER_PTR_H +#include <asm/user_ptr.h> +#endif + +#ifndef arch_user_ptr_owning_perms_from_prot +static inline +user_ptr_perms_t arch_user_ptr_owning_perms_from_prot(int prot, unsigned long vm_flags) +{ + return 0; +} +#endif /* arch_user_ptr_owning_perms_from_prot */ + + void __user *uaddr_to_user_ptr(ptraddr_t addr) { /* @@ -99,7 +113,33 @@ user_uintptr_t make_user_ptr_owning(const struct reserv_struct *reserv,
user_ptr_perms_t user_ptr_owning_perms_from_prot(int prot, unsigned long vm_flags) { - /* TODO [PCuABI] - capability permission conversion from memory permission */ - return (CHERI_PERMS_READ | CHERI_PERMS_WRITE | - CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP); + user_ptr_perms_t perms = CHERI_PERMS_ROOTCAP; + int used_prot = PROT_MAX_EXTRACT(prot) ? PROT_MAX_EXTRACT(prot) : prot; + + if (used_prot & PROT_READ) { + perms |= CHERI_PERM_LOAD; + if (vm_flags & VM_READ_CAPS) + perms |= CHERI_PERM_LOAD_CAP; + } + if (used_prot & PROT_WRITE) { + perms |= CHERI_PERM_STORE; + if (vm_flags & VM_WRITE_CAPS) + perms |= (CHERI_PERM_STORE_CAP | CHERI_PERM_STORE_LOCAL_CAP); + } + if (used_prot & PROT_EXEC) + perms |= CHERI_PERM_EXECUTE; + + /* Fetch any extra architecture specific permissions */ + perms |= arch_user_ptr_owning_perms_from_prot(used_prot, vm_flags); + + return perms; +} + +bool user_ptr_may_set_prot(user_uintptr_t user_ptr, int prot) +{ + user_ptr_perms_t perms = cheri_perms_get(user_ptr); + + return !(((prot & PROT_READ) && !(perms & CHERI_PERM_LOAD)) || + ((prot & PROT_WRITE) && !(perms & CHERI_PERM_STORE)) || + ((prot & PROT_EXEC) && !(perms & CHERI_PERM_EXECUTE))); }
As things stand, the permissions of initial reservations will be based on the prot flags of the first segment of the executable/interpreter. Since both RX and RW segments are typically present, the reservation should rather have RWX permissions. Add appropriate PROT_MAX() flags to obtain the right permissions.
Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- fs/binfmt_elf.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 8617b5328197..e52b6adb14b9 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -887,6 +887,8 @@ static inline int make_prot(u32 p_flags, struct arch_elf_state *arch_state, if (p_flags & PF_X) prot |= PROT_EXEC;
+ prot |= PROT_MAX(PROT_READ | PROT_WRITE | PROT_EXEC); + return arch_elf_adjust_prot(prot, arch_state, has_interp, is_interp); }
Initial reservations are created via the standard do_mmap() in binfmt_elf, and as a result their permissions are calculated via the standard helper that calls arch_user_ptr_owning_perms_from_prot(). The latter relies on the current value of PCC to decide whether to return System and Executive, so we need to ensure that this value is well-defined when binfmt_elf creates the mappings.
morello_thread_init_user() is called from arch_setup_new_exec(), which itself is called very early by binfmt_elf. This is therefore a good place to set PCC to a temporary value with all permissions, while waiting for the final value set by morello_thread_start().
Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- arch/arm64/kernel/morello.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/arm64/kernel/morello.c b/arch/arm64/kernel/morello.c index 2dbfa49c23f7..a445d3aca710 100644 --- a/arch/arm64/kernel/morello.c +++ b/arch/arm64/kernel/morello.c @@ -164,6 +164,22 @@ void morello_thread_init_user(void)
write_sysreg(cctlr, cctlr_el0); morello_state->cctlr = cctlr; + + if (is_pure_task()) { + /* + * arch_user_ptr_owning_perms_from_prot() checks the permissions + * of PCC to decide which permissions to return. It ends up + * being called from binfmt_elf before the thread has even + * started, at which point the value of PCC will be that of the + * old process. To avoid this issue, set PCC to a temporary + * value with all permissions, so that initial reservations + * (executable, interpreter, etc.) are assigned appropriate + * permissions (especially Executive). This value is never + * visible to userspace as morello_thread_start() will set the + * final value. + */ + task_pt_regs(current)->pcc = cheri_user_root_cap; + } }
void morello_thread_save_user_state(struct task_struct *tsk)
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Add a check that the requested protection bits do not exceed the maximum protection bits.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mmap.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c index 059a9d21ec54..1b8534de40fb 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1463,6 +1463,17 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t user_ptr, unsigned long len, return PTR_ERR(file); }
+ /* + * Introduce additional checks for PCuABI: + * - The PCuABI reservation model introduces the concept of maximum + * protection mappings can have. Add a check to make sure the + * requested protection does not exceed the maximum protection. + */ + if (reserv_is_supported(current->mm)) { + if ((PROT_MAX_EXTRACT(prot) != 0) && + ((PROT_EXTRACT(prot) & PROT_MAX_EXTRACT(prot)) != PROT_EXTRACT(prot))) + goto out_fput; + } retval = check_pcuabi_map_ptr_arg(user_ptr, len, flags & MAP_FIXED, false); if (retval) goto out_fput;
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Check that the permissions of the new user pointer do not exceed the permissions of the old user pointer for the mremap syscall.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mremap.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/mm/mremap.c b/mm/mremap.c index 78858a167242..6e9598a0d73c 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -996,6 +996,20 @@ static int vma_expandable(struct vm_area_struct *vma, unsigned long delta) return 1; }
+static int check_mremap_user_ptr_perms(user_uintptr_t user_ptr, user_uintptr_t new_user_ptr, + unsigned long flags) +{ +#ifdef CONFIG_CHERI_PURECAP_UABI + if (!reserv_is_supported(current->mm) || !(flags & MREMAP_FIXED)) + return 0; + + if ((cheri_perms_get(user_ptr) | cheri_perms_get(new_user_ptr)) + != cheri_perms_get(user_ptr)) + return -EINVAL; +#endif + return 0; +} + /* * Expand (or shrink) an existing mapping, potentially moving it at the * same time (controlled by the MREMAP_MAYMOVE flag and available VM space) @@ -1074,6 +1088,9 @@ SYSCALL_DEFINE5(__retptr__(mremap), user_uintptr_t, user_ptr, unsigned long, old goto out; } ret = check_pcuabi_map_ptr_arg(new_user_ptr, new_len, flags & MREMAP_FIXED, true); + if (ret) + goto out; + ret = check_mremap_user_ptr_perms(user_ptr, new_user_ptr, flags); if (ret) goto out;
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Check that the requested prot flags can be set based on the input capability's permissions for the mprotect syscall.
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- mm/mprotect.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c index 7bf46faa7fd6..0f84e680ca5a 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -705,7 +705,8 @@ static int do_mprotect_pkey(user_uintptr_t user_ptr, size_t len, return -ENOMEM;
if (reserv_is_supported(current->mm) && - !(check_user_ptr_owning(user_ptr, len))) + !(check_user_ptr_owning(user_ptr, len) && + user_ptr_may_set_prot(user_ptr, prot))) return -EINVAL;
if (!arch_validate_prot(prot, start))
Introduce PROT_CAP_INVOKE as per the PCuABI specification.
Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- include/uapi/asm-generic/mman-common.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index e7ba511c2bad..58c70cde6fe7 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -18,6 +18,8 @@ #define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */
/* PCuABI mapping and capability permissions */ +#define PROT_CAP_INVOKE 0x2000 /* mmap flag: provide CInvoke capability permission */ + #define _PROT_MAX_SHIFT 16 #define PROT_MAX(prot) ((prot) << _PROT_MAX_SHIFT) #define PROT_EXTRACT(prot) ((prot) & (PROT_READ | PROT_WRITE | PROT_EXEC))
Add the BranchSealedPair permission if PROT_CAP_INVOKE is passed, as per the PCuABI specification.
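On Morello, a caller would request it as in the sketch below (PROT_CAP_INVOKE comes from the previous patch; on other architectures and ABIs the flag has no effect):

#include <sys/mman.h>

int main(void)
{
	/* The owning capability for this mapping additionally carries
	 * the BranchSealedPair permission. */
	void *p = mmap(NULL, 4096,
		       PROT_READ | PROT_WRITE | PROT_CAP_INVOKE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED;
}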
Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- arch/arm64/include/asm/user_ptr.h | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/arch/arm64/include/asm/user_ptr.h b/arch/arm64/include/asm/user_ptr.h index 8e75f3e8b980..47e1eab2a420 100644 --- a/arch/arm64/include/asm/user_ptr.h +++ b/arch/arm64/include/asm/user_ptr.h @@ -25,6 +25,9 @@ user_ptr_perms_t arch_user_ptr_owning_perms_from_prot(int prot, unsigned long vm perms |= ARM_CAP_PERMISSION_EXECUTIVE; }
+ if (prot & PROT_CAP_INVOKE) + perms |= ARM_CAP_PERMISSION_BRANCH_SEALED_PAIR; + return perms; } #define arch_user_ptr_owning_perms_from_prot arch_user_ptr_owning_perms_from_prot
In PCuABI, specifically on Morello, initial executable/interpreter reservations and corresponding capabilities should have the BranchSealedPair permission. Pass PROT_CAP_INVOKE for that purpose.
This is a no-op on other ABIs / architectures.
Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- fs/binfmt_elf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index e52b6adb14b9..c7490ece875a 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -887,7 +887,7 @@ static inline int make_prot(u32 p_flags, struct arch_elf_state *arch_state, if (p_flags & PF_X) prot |= PROT_EXEC;
- prot |= PROT_MAX(PROT_READ | PROT_WRITE | PROT_EXEC); + prot |= PROT_MAX(PROT_READ | PROT_WRITE | PROT_EXEC) | PROT_CAP_INVOKE;
return arch_elf_adjust_prot(prot, arch_state, has_interp, is_interp); }
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
The MAP_GROWSDOWN flag is not supported by the PCuABI specification. Hence, reject such requests with -EOPNOTSUPP.
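In other words (a sketch, assuming a PCuABI process):

#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>

int main(void)
{
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);

	/* Fails under PCuABI; other ABIs are unaffected. */
	return (p == MAP_FAILED && errno == EOPNOTSUPP) ? 0 : 1;
}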
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com --- mm/mmap.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c index 1b8534de40fb..c1da5b13ec3c 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1468,11 +1468,17 @@ user_uintptr_t ksys_mmap_pgoff(user_uintptr_t user_ptr, unsigned long len, * - The PCuABI reservation model introduces the concept of maximum * protection mappings can have. Add a check to make sure the * requested protection does not exceed the maximum protection. + * - MAP_GROWSDOWN mappings have no fixed bounds and hence are not + * supported in the PCuABI reservation model. */ if (reserv_is_supported(current->mm)) { if ((PROT_MAX_EXTRACT(prot) != 0) && ((PROT_EXTRACT(prot) & PROT_MAX_EXTRACT(prot)) != PROT_EXTRACT(prot))) goto out_fput; + if (flags & MAP_GROWSDOWN) { + retval = -EOPNOTSUPP; + goto out_fput; + } } retval = check_pcuabi_map_ptr_arg(user_ptr, len, flags & MAP_FIXED, false); if (retval)
Now that the stack has an appropriate reservation, we can use its bounds to set those of CSP and AT_CHERI_STACK_CAP.
Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- fs/binfmt_elf.c | 28 ++++++++-------------------- 1 file changed, 8 insertions(+), 20 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index c7490ece875a..1439d1064197 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -200,16 +200,13 @@ struct elf_load_info { };
#if defined(CONFIG_CHERI_PURECAP_UABI) && (ELF_COMPAT == 0) -static void __user *make_user_sp(unsigned long p) +static void __user *make_user_sp(struct linux_binprm *bprm, unsigned long p) { - /* - * TODO [PCuABI] - derive a capability with bounds matching the stack - * reservation. - */ - void __user *sp = uaddr_to_user_ptr_safe(p); + void __user *sp = (void __user *)reserv_vma_make_user_ptr_owning(bprm->vma);
sp = cheri_perms_and(sp, CHERI_PERM_GLOBAL | CHERI_PERMS_READ | CHERI_PERMS_WRITE); + sp = cheri_address_set(sp, p);
return sp; } @@ -301,18 +298,9 @@ static void __user *make_elf_rx_cap(const struct elf_load_info *load_info) len, perms); }
-static void __user *make_root_stack_cap(void) +static void __user *make_root_stack_cap(struct linux_binprm *bprm) { - /* - * TODO [PCuABI] - derive a capability with bounds matching the stack - * reservation. - */ - void __user *cap = uaddr_to_user_ptr_safe(0); - - cap = cheri_perms_and(cap, CHERI_PERMS_ROOTCAP | - CHERI_PERMS_READ | CHERI_PERMS_WRITE); - - return cap; + return (void __user *)reserv_vma_make_user_ptr_owning(bprm->vma); }
static void set_bprm_stack_caps(struct linux_binprm *bprm, void __user *sp, @@ -343,7 +331,7 @@ static void set_bprm_stack_caps(struct linux_binprm *bprm, void __user *sp, bprm->pcuabi.auxv = cheri_bounds_set(p, len); } #else /* CONFIG_CHERI_PURECAP_UABI && ELF_COMPAT == 0 */ -static void __user *make_user_sp(unsigned long p) +static void __user *make_user_sp(struct linux_binprm *bprm, unsigned long p) { return uaddr_to_user_ptr_safe(p); } @@ -401,7 +389,7 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, */
p = arch_align_stack(p); - sp = make_user_sp(p); + sp = make_user_sp(bprm, p);
/* * If this architecture has a platform capability string, copy it @@ -506,7 +494,7 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, bprm->pcuabi.pcc = make_user_pcc(exec_load_info); }
- NEW_AUX_ENT(AT_CHERI_STACK_CAP, make_root_stack_cap()); + NEW_AUX_ENT(AT_CHERI_STACK_CAP, make_root_stack_cap(bprm)); NEW_AUX_ENT(AT_CHERI_SEAL_CAP, cheri_user_root_seal_cap); NEW_AUX_ENT(AT_CHERI_CID_CAP, cheri_user_root_cid_cap);
From: Amit Daniel Kachhap amitdaniel.kachhap@arm.com
Save the capability corresponding to the vDSO reservation in mm_context_t so that we can directly use it in ARCH_DLINFO, restricting the bounds and permissions of AT_SYSINFO_EHDR. We explicitly remove the VMem permission as per the PCuABI specification (we do not provide an owning capability for the vDSO).
Signed-off-by: Amit Daniel Kachhap amitdaniel.kachhap@arm.com Co-developed-by: Kevin Brodsky kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky kevin.brodsky@arm.com --- arch/arm64/include/asm/elf.h | 9 ++------- arch/arm64/include/asm/mmu.h | 2 +- arch/arm64/kernel/vdso.c | 20 +++++++++++++++++--- 3 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index 7d253fc3961c..be91c27ba24b 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -174,12 +174,7 @@ extern int aarch64_setup_additional_pages(struct linux_binprm *bprm,
 extern int purecap_setup_additional_pages(struct linux_binprm *bprm,
 					  int uses_interp);
 #define arch_setup_additional_pages purecap_setup_additional_pages
-/*
- * TODO [PCuABI]: Look into restricting the bounds of this capability to just
- * the vDSO pages, as currently the bounds are of the root user capability.
- */
-#define ARCH_DLINFO SETUP_DLINFO(uaddr_to_user_ptr_safe( \
-		(elf_addr_t)current->mm->context.vdso))
+#define ARCH_DLINFO SETUP_DLINFO(current->mm->context.vdso)
 #else /* !CONFIG_CHERI_PURECAP_UABI */
 #define arch_setup_additional_pages aarch64_setup_additional_pages
 #define ARCH_DLINFO SETUP_DLINFO((elf_addr_t)current->mm->context.vdso)
@@ -219,7 +214,7 @@ typedef compat_elf_greg_t compat_elf_gregset_t[COMPAT_ELF_NGREG];
 	SET_PERSONALITY_AARCH64();					\
 })
 
-#define COMPAT_ARCH_DLINFO SETUP_DLINFO((elf_addr_t)current->mm->context.vdso)
+#define COMPAT_ARCH_DLINFO SETUP_DLINFO(current->mm->context.vdso)
 
 #define compat_arch_setup_additional_pages aarch64_setup_additional_pages
 
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 12a7889f0a37..5b3e3e8c0fe1 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -23,7 +23,7 @@ typedef struct {
 	void *sigpage;
 #endif
 	refcount_t pinned;
-	void *vdso;
+	user_uintptr_t vdso;
 	unsigned long flags;
 } mm_context_t;
 
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index f9059577581f..9cda023bda85 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -84,10 +84,23 @@ static union {
 } vdso_data_store __page_aligned_data;
 struct vdso_data *vdso_data = vdso_data_store.data;
 
+static user_uintptr_t make_vdso_ptr(struct vm_area_struct *vdso_text_vma)
+{
+	user_uintptr_t ret;
+
+	ret = reserv_vma_make_user_ptr_owning(vdso_text_vma);
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	if (!is_compat_task())
+		ret = cheri_perms_clear(ret, CHERI_PERM_SW_VMEM);
+#endif
+
+	return ret;
+}
+
 static int vdso_mremap(const struct vm_special_mapping *sm,
 		       struct vm_area_struct *new_vma)
 {
-	current->mm->context.vdso = (void *)new_vma->vm_start;
+	current->mm->context.vdso = make_vdso_ptr(new_vma);
 
 	return 0;
 }
@@ -233,7 +246,6 @@ static int __setup_additional_pages(enum vdso_abi abi,
 		gp_flags = VM_ARM64_BTI;
 
 	vdso_text_base = vdso_base + VVAR_NR_PAGES * PAGE_SIZE;
-	mm->context.vdso = (void *)vdso_text_base;
 	ret = _install_special_mapping(mm, vdso_text_base, vdso_text_len,
 				       VM_READ|VM_EXEC|gp_flags|
 				       VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
@@ -245,10 +257,12 @@ static int __setup_additional_pages(enum vdso_abi abi,
 			     PROT_READ | PROT_EXEC))
 		goto up_fail;
 
+	mm->context.vdso = make_vdso_ptr(ret);
+
 	return 0;
 
 up_fail:
-	mm->context.vdso = NULL;
+	mm->context.vdso = 0;
 	return PTR_ERR(ret);
 }
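For completeness, the effect on AT_SYSINFO_EHDR can be checked with a similar hypothetical snippet. It assumes a purecap Morello toolchain (cheriintrin.h) and a PCuABI-aware libc whose getauxval() preserves the capability; whether it does is a property of the libc, not something this patch guarantees:

/*
 * Hypothetical check, not part of this series: the capability behind
 * AT_SYSINFO_EHDR should now be bounded to the vDSO pages, with the
 * VMem permission removed, rather than derived from the root user
 * capability.
 */
#include <cheriintrin.h>
#include <sys/auxv.h>
#include <stdio.h>

int main(void)
{
	void *vdso = (void *)getauxval(AT_SYSINFO_EHDR);

	printf("vDSO cap base:   %#lx\n", (unsigned long)cheri_base_get(vdso));
	printf("vDSO cap length: %#lx\n", (unsigned long)cheri_length_get(vdso));
	printf("vDSO cap perms:  %#lx\n", (unsigned long)cheri_perms_get(vdso));

	return 0;
}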
On 24/04/2024 16:51, Kevin Brodsky wrote:
> [...]
Applied on next, with a small fix in patch 16 (one case of ret = addr wasn't amended). Thanks to Amit for the extra review and testing!
Kevin