Hi Kevin,

On 8/16/23 19:18, Kevin Brodsky wrote:
> On 10/08/2023 10:03, Amit Daniel Kachhap wrote:
>> @@ -1396,11 +1417,35 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>>  		vm_flags |= VM_NORESERVE;
>>  	}
>>  
>> +#ifdef CONFIG_CHERI_PURECAP_UABI
>> +	if (is_reservation) {
>> +		/*
>> +		 * Check if there is any overlap with the existing reservation.
>> +		 * This may help in filtering out any reservation error before
>> +		 * the actual memory mapping.
>> +		 */
>> +		if (reserv_mt_range_valid(&mm->reserv_mt, addr, len))
> This is fundamentally incompatible with the way reservations are
> defined. If a new reservation is to be created (because addr is
> null-derived), then it is the kernel's job to find an appropriate range
> for that reservation. *This cannot fail*, unless we have actually run
> out of address space.
>
> Concretely, this means that get_unmapped_area() needs to provide us
> with an address at which we can create the reservation (in other words,
> where the range can be represented as capability bounds). Unfortunately
> that may require some invasive changes...
Yes, I agree with your suggestion that get_unmapped_area() should not fail.
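The requirement on get_unmapped_area() can be illustrated with a toy model. All names and the alignment rule below are made up for illustration; the real constraint comes from the CHERI Concentrate bounds encoding, not this sketch. The idea is that the allocator aligns the gap it found up to the representable alignment for the requested length, so that creating the owning capability afterwards cannot fail:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy stand-in for cheri_representable_alignment_mask(): pretend bounds
 * are byte-exact up to 4 KiB and otherwise keep ~10 bits of mantissa,
 * so larger lengths force a coarser base alignment. The real rule is
 * the CHERI Concentrate encoding, not this.
 */
static uint64_t toy_repr_alignment_mask(uint64_t len)
{
	if (len <= 0x1000)
		return ~UINT64_C(0);		/* byte-exact bounds */
	int msb = 63 - __builtin_clzll(len);	/* index of top set bit */
	return ~((UINT64_C(1) << (msb - 9)) - 1);
}

/*
 * What a representability-aware get_unmapped_area() would conceptually
 * do with the gap it found: bump the start up so that [addr, addr + len)
 * can later be installed as exact capability bounds.
 */
static uint64_t toy_align_for_reservation(uint64_t gap_start, uint64_t len)
{
	uint64_t mask = toy_repr_alignment_mask(len);

	return (gap_start + ~mask) & mask;
}
```

With this shape, the only failure mode left for reservation creation is genuinely running out of address space, matching the invariant described above.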
>> +			return -ERESERVATION;
>> +	}
>> +#endif
>>  	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
>>  	if (!IS_ERR_VALUE(addr) &&
>>  	    ((vm_flags & VM_LOCKED) ||
>>  	     (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
>>  		*populate = len;
>> +#ifdef CONFIG_CHERI_PURECAP_UABI
>> +	if (!IS_ERR_VALUE(addr)) {
>> +		if (is_reservation) {
>> +			ret = reserv_mt_insert_entry(&mm->reserv_mt, addr, len, prot);
>> +			if (ret)
>> +				return ret;
>> +			user_addr = build_owning_capability(addr, len, prot);
> The range that is associated with a reservation must be representable
> as capability bounds. That's precisely because we set the bounds of the
> returned capability exactly to that range. At the moment there doesn't
> seem to be any handling of representability, which requires both that
> the base is sufficiently aligned (as per
> cheri_representable_alignment_mask()) and that the length is
> representable (as per cheri_representable_length()).
I am considering cheri_representable_length() while comparing the
reservation length in reserv_mt_capability_bound_valid(). However, the
maple tree entry stores them as plain lengths. I think there was a
requirement to keep the actual length, as cheri_representable_length()
cannot be reversed.

There was some issue with using cheri_representable_alignment_mask(),
as it was masking a lot of bits. I will share more details on this, as
I haven't analysed the root cause yet.
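One reason the exact requested length has to be kept alongside the reservation is that rounding to a representable length is many-to-one, so it cannot be undone. A toy model makes this visible (again, a simplified illustration with hypothetical names, not the actual CHERI Concentrate encoding):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy stand-in for cheri_representable_length(): keep ~10 bits of
 * precision and round the length up to the implied granule. This is a
 * simplified illustration, not the real CHERI bounds encoding.
 */
static uint64_t toy_representable_length(uint64_t len)
{
	if (len <= 0x1000)
		return len;			/* small lengths are exact */
	int msb = 63 - __builtin_clzll(len);	/* index of top set bit */
	uint64_t granule = UINT64_C(1) << (msb - 9);

	return (len + granule - 1) & ~(granule - 1);
}
```

Because distinct requested lengths can round up to the same representable length, the rounding has no inverse, which matches the observation that the stored entry needs to keep the original length rather than recompute it.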
>> +		} else {
>> +			user_addr = cheri_address_set(user_addr, addr);
>> +		}
> We need to be careful to leave compat64 unchanged. Most of these
> changes will go unnoticed in compat64, but really it does not make
> sense to manipulate reservations at all in that case (and changes in
> further patches introduce undesirable changes in semantics).
>
> We should probably try to replace all these direct additions with
> hooks, which would do nothing in compat64. Having all the hooks for all
> the mm syscalls in the same file feels quite attractive, and would
> drastically reduce the need for #ifdef'ing in the core mm codebase.
>
> The reservation management probably needs to be separated from the
> capability handling too (see next comment) - I suspect we could get
> away with having the reservation interface be a no-op in !PCuABI to
> avoid #ifdefs (without using hooks as such in that case).
I will try to incorporate the above suggestion in the next version.
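The no-op-interface idea could look something like the sketch below (a userspace-compilable toy; all names are hypothetical). The core code calls the reservation hook unconditionally, and a !PCuABI build gets a static inline no-op, so no #ifdef is needed at the call site:

```c
#include <assert.h>

struct toy_reserv {
	unsigned long start, len;
	int active;
};

#ifdef TOY_PURECAP_UABI
/* PCuABI build: the hook actually records the reservation. */
static int toy_reserv_insert(struct toy_reserv *r, unsigned long start,
			     unsigned long len)
{
	r->start = start;
	r->len = len;
	r->active = 1;
	return 0;
}
#else
/* compat64 / !PCuABI build: the hook compiles away to a no-op. */
static inline int toy_reserv_insert(struct toy_reserv *r, unsigned long start,
				    unsigned long len)
{
	(void)r; (void)start; (void)len;
	return 0;
}
#endif

/* Core-mm-style caller: no #ifdef needed here. */
static int toy_do_mmap(struct toy_reserv *r, unsigned long addr,
		       unsigned long len)
{
	return toy_reserv_insert(r, addr, len);
}
```

In a real tree the PCuABI variant would live in its own file and the stub in a header, keeping compat64 semantics byte-for-byte unchanged.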
>> +		return user_addr;
>> +	}
>> +#endif
>>  	return addr;
>>  }
>>  
>> @@ -2510,11 +2555,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
>>   * Returns: -EINVAL on failure, 1 on success and unlock, 0 otherwise.
>>   */
>>  int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
>> -		  unsigned long start, size_t len, struct list_head *uf,
>> -		  bool downgrade)
>> +		  user_uintptr_t user_start, size_t len, struct list_head *uf,
>> +		  bool downgrade, bool reserve_ignore)
>>  {
>>  	unsigned long end;
>>  	struct vm_area_struct *vma;
>> +	int ret;
>> +	unsigned long start = (ptraddr_t)user_start;
>>  
>>  	if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start)
>>  		return -EINVAL;
>> @@ -2531,7 +2578,21 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
>>  	if (!vma)
>>  		return 0;
>> -	return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade);
>> +#ifdef CONFIG_CHERI_PURECAP_UABI
>> +	if (!reserve_ignore) {
> Reservations should never be bypassed entirely. For our model to work,
> every single mapping must be contained within a reservation (any
> mapping outside of a reservation could be overwritten using MAP_FIXED,
> which is something we want to prevent in PCuABI). Regardless of the
> caller (kernel or user), mmap() must create a new reservation if
> needed, and munmap() destroy it when removing the last mapping inside a
> reservation.
do_vmi_munmap() called from the munmap() syscall doesn't ignore the
reservation. But do_vmi_munmap() called indirectly from the mmap() or
mremap() syscalls sometimes needs to ignore the reservation, as the
reservation handling is already done there. Maybe some better code
restructuring will make this more explicit.
> It seems that these conditionals are conflating reservation management
> and capability handling. As mentioned the former is always required. I
> do see the issue with the latter though: the kernel is not manipulating
> mappings through capabilities, and we cannot realistically change this
> in a hybrid kernel. Still, I find using an extra argument to skip the
> capability handling rather unpleasant. I can think of two ways to avoid
> it:
>
> - Move all the capability handling outside of the common function (used
>   by both the kernel and user). That would be ideal if possible.
> - Get the kernel to use some wrapper that creates an appropriate
>   capability to pass to the handler. This could make sense if that
>   particular function is rarely called by the kernel, but it is
>   otherwise a bit dubious as the capability manipulation is unnecessary.
I will try to incorporate these suggestions in the next version.
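The second option can be sketched in toy, userspace form (hypothetical names; a real capability would be a hardware type, not a struct). The kernel-internal wrapper materialises a capability that exactly owns the range it wants to unmap, so the common path keeps its checks and the reserve_ignore flag disappears:

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a capability: an address plus its owning bounds. */
struct toy_cap {
	uint64_t addr;
	uint64_t base;
	uint64_t len;
};

static int toy_cap_owns_range(struct toy_cap c, uint64_t start, uint64_t len)
{
	return start >= c.base && start + len <= c.base + c.len;
}

/* Common unmap path, always capability-checked (no reserve_ignore). */
static int toy_do_munmap(struct toy_cap user_start, uint64_t len)
{
	if (!toy_cap_owns_range(user_start, user_start.addr, len))
		return -22;	/* -EINVAL */
	/* ... unmapping and reservation teardown would happen here ... */
	return 0;
}

/*
 * Kernel-internal wrapper: build a capability that exactly owns the
 * range, then call the common function like any user caller would.
 */
static int toy_kernel_munmap(uint64_t start, uint64_t len)
{
	struct toy_cap c = { .addr = start, .base = start, .len = len };

	return toy_do_munmap(c, len);
}
```

The cost Kevin notes is visible here too: the wrapper fabricates a capability only so the common function can immediately validate it, which is wasted work if the kernel calls this path often.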
>> +		if (!capability_owns_range(user_start, start, len))
>> +			return -EINVAL;
>> +		if (!reserv_mt_capability_bound_valid(&mm->reserv_mt, user_start))
>> +			return -ERESERVATION;
>> +	}
>> +#endif
>> +	ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade);
>> +#ifdef CONFIG_CHERI_PURECAP_UABI
>> +	if (!reserve_ignore && ret >= 0)
>> +		reserv_mt_delete_range(&mm->reserv_mt, start, len);
> Reservations are immutable - they are never expanded or shrunk, only
> destroyed. The complete set of rules can be found in the "Reservations"
> section of the PCuABI spec [1].
In this RFC version, the reservation layer stores each mapping along
with its root reservation, and also performs VMA-style split operations
here if required. Hopefully I can simplify these things and merge the
reservation interface into the VMA code.
Amit

> Kevin
>
> [1] https://git.morello-project.org/morello/kernel/linux/-/wikis/Morello-pure-ca...