On 25/10/2023 07:36, Amit Daniel Kachhap wrote:
[...]
> +/**
> + * reserv_vma_insert_entry() - Adds the reservation details in the VMA for the
> + * virtual address range from start to (start + len) with perm permission as
> + * the entry. The start address is expected to be CHERI representable base
> + * and the length may expand to CHERI representable length without interfering
> + * with the successive vma. This function should be called with mmap_lock
> + * held.
> + * @vma: VMA pointer to insert the reservation entry.
> + * @start: Reservation start value.
> + * @len: Reservation length.
> + * @perm: Memory mapping permission for the reserved range.
> + *
> + * Return: 0 if reservation entry added successfully or -ERESERVATION otherwise.
> + */
> +int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start,
> +			    unsigned long len, unsigned long perm);
The name of this function doesn't really correspond to what it does any more. Since we store the reservation information directly in the VMA, there is nothing to "insert". Maybe something like reserv_vma_set_reserv()? Same comment regarding reserv_range_insert_entry().
Also, as a general comment: for new code, it would be preferable to use ptraddr_t instead of unsigned long. Of course it's not always clear-cut, as we should try to be consistent with existing code too, but this being a whole new header I think using ptraddr_t throughout makes sense.
[...]
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 12e87f83287d..38bad6b201ca 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -571,6 +571,11 @@ struct vm_area_struct {
>  	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
>  #endif
>  	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
> +#ifdef CONFIG_CHERI_PURECAP_UABI
> +	unsigned long reserv_start;
> +	unsigned long reserv_len;
> +	unsigned long reserv_perm;
I've been wondering, are these fields automatically preserved when splitting or merging VMAs?
By the way, the overall approach to manage reservations makes a lot more sense to me now, I think it's going in the right direction.
> +#endif
>  } __randomize_layout;
>
>  #ifdef CONFIG_SCHED_MM_CID
> diff --git a/mm/Makefile b/mm/Makefile
> index e29afc890cde..c2ca31fe5798 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -39,7 +39,7 @@ mmu-y			:= nommu.o
>  mmu-$(CONFIG_MMU)	:= highmem.o memory.o mincore.o \
>  			   mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \
>  			   msync.o page_vma_mapped.o pagewalk.o \
> -			   pgtable-generic.o rmap.o vmalloc.o
> +			   pgtable-generic.o rmap.o vmalloc.o cap_addr_mgmt.o
>
>  ifdef CONFIG_CROSS_MEMORY_ATTACH
> diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c
> new file mode 100644
> index 000000000000..fe318c92dc7a
> --- /dev/null
> +++ b/mm/cap_addr_mgmt.c
> @@ -0,0 +1,222 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/bug.h>
> +#include <linux/cap_addr_mgmt.h>
> +#include <linux/cheri.h>
> +#include <linux/mm.h>
> +#include <linux/slab.h>
> +
> +#ifdef CONFIG_CHERI_PURECAP_UABI
> +int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start,
> +			    unsigned long len, unsigned long perm)
> +{
> +	/* TODO [PCuABI] - capability permission conversion from memory permission */
> +	cheri_perms_t cheri_perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE |
> +				    CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP;
> +
> +	if (is_compat_task() || !(vma->vm_flags & VM_PCUABI_RESERVE))
I've been pondering VM_PCUABI_RESERVE. I think it's quite clear it can't be a permanent solution: in PCuABI, reservation management is required for all mappings, with no exception. I do however see the issue with vm_mmap() and friends.

My feeling is that it is not worth trying to move to capability-based address space management in kernel subsystems such as binfmt_elf. The benefits are small and the hassle seems quite high. On the other hand, reservation management can never be skipped. At the moment, the following approach seems the most practical to me:
* Moving all capability operations to the top-level syscall handler (or corresponding utility function, i.e. ksys_mmap_pgoff() for mmap). This way lower level helpers that can be called from the kernel too, such as vm_mmap(), keep operating purely on addresses and we don't have to worry about their callers. That should also mean fewer changes overall.
* The reservation management remains next to the mapping management, like in these patches. It is done unconditionally in PCuABI.
* No new vma->vm_flags. However, we could reduce the #ifdef'ing and compat checks by introducing a new flag in mm->flags. Reservation management is a property of the address space, so a per-mm flag would be the sensible granularity.
Does that sound sensible?
[...]
> +bool reserv_vmi_range_fully_mapped(struct vma_iterator *vmi, unsigned long start,
> +				   unsigned long len)
> +{
> +	unsigned long align_start, align_len;
> +	struct vm_area_struct *vma;
> +
> +	if (is_compat_task())
> +		return true;
> +
> +	mas_set_range(&vmi->mas, start, start + len);
> +	/* Try finding the given range */
> +	vma = mas_find(&vmi->mas, start + len);
> +	if (!vma || !(vma->vm_flags & VM_PCUABI_RESERVE))
> +		return true;
Surely if no VMA is found (!vma) then the range is fully *unmapped* and the answer is false?
> +	align_start = round_down(vma->vm_start, max(PAGE_SIZE, CHERI_REPRESENTABLE_ALIGNMENT(len)));
> +	align_len = cheri_representable_length(round_up((vma->vm_end - vma->vm_start), PAGE_SIZE));
> +
> +	/* Check if the range fully mapped */
> +	if (start != vma->vm_start || (start + len) != vma->vm_end ||
> +	    align_start != vma->reserv_start ||
> +	    align_len != vma->reserv_len)
This checks that the found VMA corresponds exactly to the specified range. What we actually want, though, is to check that the range is fully mapped, i.e. that *some number* of mappings (one or more) together span every single page in the range.
Assuming this function is always used for enforcing rules in the spec such as:
    If any part of AlignedRange(addr.address, length) is not currently
    mapped, then the call fails with -ENOMEM.
Then reservation bounds are irrelevant - this is purely about mappings. Also note that AlignedRange() is about page alignment, not capability representation.
Kevin