On 24/01/2024 10:36, Amit Daniel Kachhap wrote:
Hi,
On 1/22/24 18:26, Kevin Brodsky wrote:
On 19/01/2024 10:45, Amit Daniel Kachhap wrote:
-unsigned long vm_mmap(struct file *file, unsigned long addr,
+/* TODO [PCuABI] - Update the users of vm_mmap */
+user_uintptr_t vm_mmap(struct file *file, user_uintptr_t usrptr,
Provided that in-kernel users do not manipulate mappings with capabilities, do we need to change the prototypes of vm_* at all?
As discussed earlier, elf_map() in binfmt_elf.c will now use MAP_FIXED to change the mapping of individual ELF segments, so a full capability has to be supplied. In the case of vm_munmap() a full capability may not be required, but for consistency's sake a capability could be used there too.
My proposal in RFCv2 [1] is precisely not to use capabilities at all when calling vm_* directly from kernel subsystems, which is also why I am suggesting dropping patch 5 (no capability manipulation in get_unmapped_area). It seems to me that this makes things quite a bit more straightforward, but let's discuss if this approach is problematic in some ways.
OK, I get your perspective now. I actually took the liberty of not using the RFCv2 [1] proposal as-is, but with a slight change. Let me explain.
Your suggestion:
1. Map the first segment [unchanged]
2. Call reserv_vma_insert_entry() on that new vma, setting the reservation bounds to cover all the segments
3. Map the following segments with MAP_FIXED, within that reservation [unchanged]
4. Set AT_CHERI_*_{RW,RX}_CAP based on the bounds of the executable/interpreter reservation
Here step 1 internally first maps the full length of all segments and then munmaps (total length of all segments - first segment length) to make sure a sufficient memory mapping is possible.
Apologies, I had completely misunderstood how (upstream) binfmt_elf maps the executable and interpreter. Since we do indeed start with a vm_mmap() encompassing *all the segments*, then a reservation with appropriate bounds gets created at this point and it just works.
Here step 3 was suggested so as to skip the reservation check for MAP_FIXED in the vm_mmap() kernel call, which is a deviation from the mmap syscall.
My changes to steps 1, 2 and 3 are: a) Skip the munmap of (total length of all segments - first segment length) for PCuABI and let the full-length mapping exist. Here, for mmap, the maximum protmax permission is used and hence the reservation bounds cover all the segments.
b) Step 2 is not needed.
c) Map the rest of the individual segments with a full capability and the MAP_FIXED flag (same as the mmap syscall).
I feel this approach is simpler. Let me know your concerns.
This would be a convenient solution if we could rely on segments being contiguous, unfortunately we cannot. There can be (unmapped) gaps between segments. We could potentially detect these gaps and unmap them in a loop, but it's not particularly elegant.
AFAICT, in this series (patch 8), the "no-reuse" check is done by calling reserv_vmi_range_mapped() in ksys_mmap_pgoff(). As a result, there is no such check when calling vm_mmap() (which is indeed what I suggested). So it seems to me that keeping the way binfmt_elf maps segments unchanged (map the whole range, unmap the range beyond the first segment, then mmap(MAP_FIXED) each remaining segment) should work just fine.
With that said, my original comment in this thread was about whether vm_mmap() and friends manipulate capabilities or not. To me this topic is orthogonal to how we handle the initial reservations in binfmt_elf. The one potential issue is that without a capability, we don't know if we are meant to operate on an existing reservation or not in the MAP_FIXED case. I don't think this is especially problematic though - we can simply choose based on whether the specified range is actually within a reservation or not. All capability checks are done at the syscall handler level, which is what patch 8 already does. I may be missing something though, let me know.
Kevin