Hi,
On 11/9/23 14:44, Kevin Brodsky wrote:
On 03/11/2023 07:56, Amit Daniel Kachhap wrote:
- The reservation management remains next to the mapping management,
like in these patches. It is done unconditionally in PCuABI.
The way binfmt_elf calls vm_mmap() is for each elf segment so the approach I am thinking is to defer the reservation till all the elf segments are mapped. In the end, call the api reserv_range_insert_entry() with the range including the complete elf segments. This api will also return a capability with the bound same as reservation range.
Right I see that binfmt_elf is a special case, as the reservation we need to create is larger than the first vm_mmap() call (for the first segment). Ideally we wouldn't need to add a different path in vm_mmap() just for this situation though. Intuitively the reservation should be created before any mapping it contains in this case, but clearly we cannot do this when representing reservations as vma metadata. Having considered this further, couldn't we simply enlarge the reservation created by the first vm_mmap()? That is:
- Map the first segment [unchanged]
- Call reserv_vma_insert_entry() on that new vma, setting the
reservation bounds to cover all the segments 3. Map the following segments with MAP_FIXED, within that reservation [unchanged] 4. Set AT_CHERI_*_{RW,RX}_CAP based on the bounds of the executable/interpreter reservation
At step 2, there is an assumption that there is sufficient space to create that reservation. However I believe this is essentially the same assumption that exists today, as the kernel has to be able to map all the remaining segments at a fixed offset from the first one.
The calls to vm_mmap() at step 3 almost observe the standard reservation rules. Almost, because they are creating mappings in unmapped ranges within the reservation, which is normally forbidden. I suspect this rule isn't really necessary for in-kernel users anyway (it's meant to prevent userspace from unmapping and remapping the same range, instead of doing that atomically by overwriting the range with MAP_FIXED), so we could avoid this problem by having this check in the syscall handler too, not down in vm_mmap().
I believe that with this approach, there is no need to special-case vm_mmap() for the binfmt_elf usage, but I may well be missing something.
Your suggestion makes sense. How about the below steps?.
1) Call vm_mmap with total segment size and maximum permission. Reservation will be created for full size. 2). call vm_mmap with MAP_FIXED address and each segment size.
The disadvantage of this approach is that there will be extra vma's for each of the holes.
The above approach is also doable but does not look clean as per spec.
- No new vma->vm_flags. However, we could reduce the #ifdef'ing + checks
for compat by introducing a new mm->flags. Reservation management is a property of the address space, so it would be sensible to introduce a per-mm flag.
Any specific reason to avoid vm_flags? (Less free bits). vm_flags makes the vma operation simple to implement. Although vm_flags is still duplicate as the vma using reservation can be determined if it has the reservation fields initialized.
All mappings should have the reservation fields set in PCuABI, with no exception. That's why I don't think it makes sense to have a new vm_flags - whether reservations are used or not is a per-process property, not a per-vma property. AFAICT mm->flags is directly accessible in most functions, in the worst case current->mm can be used.
ok agreed.
//Amit
Kevin