With this patch, the capabilities provided to userspace to access argv and envp are properly bounded, with the permissions defined in the PCuABI spec.
This change is split between two files, as they handle two parts of the process. This cover letter tries to explain the process surrounding the argument strings and the logic implemented by the patch, hopefully to facilitate reviewing the patch but also to check my understanding. Hopefully I can find some time to refine it after reviews, but in case I cannot I hope the details will be useful.
# General patch comments
One thing that might be an issue is that I don't see how to update fs/exec.c:bprm_stack_limits to handle this new process short of reproducing it almost entirely inside the function.
There are warnings because elf_stack_put_user_cap expects uintcap_t and cheri_build_user_cap builds a capability, but I didn't really know how to handle that. The function itself might be entirely unnecessary.
checkpatch complains about elf_stack_put_user_cap, although it follows the style of the other functions, and about missing blank lines which appear to be there, so I'm not too sure what to do about it.
# Giving argument strings to userspace
Calling a new executable is done through execve, where userspace passes the argv and envp pointer arrays to the kernel. In fs/exec.c:do_execveat_common, the kernel creates a struct linux_binprm which is used throughout the process of loading a new binary. Among other things, it keeps track of the stack for the new executable, both the memory space and its current top, which it allocates here.
The arguments are copied to this new stack in fs/exec.c:copy_strings. It does so from the top of the stack, starting with the last envp element and ending with the first of argv: opposite to how they will be read later. The function allocates pages as it goes, thus splitting each string into chunks of at most a page, and handles offsets where the page was already partially used or the remainder of the string will not fill it.
This only puts the /strings/ themselves on the stack. Later, in fs/binfmt_elf.c:create_elf_tables, the kernel puts the auxv, argv and envp arrays themselves – thus the pointers to the strings put on the stack earlier – on the stack. It does so in the order expected by userspace: from bottom to top of the strings, so from the first of argv to the last of envp.
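As a rough illustration of the two phases above (not code from the patch), here is a small userspace sketch. It assumes a flat byte buffer standing in for the stack and a single array holding argv followed by envp; copy_strings_down and collect_pointers_up are hypothetical names, and the page-by-page handling of the real copy_strings is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Phase 1 (what copy_strings does, simplified): copy the strings from
 * last to first, growing downwards from offset `top`.
 * Returns the offset of the lowest byte written.
 */
static size_t copy_strings_down(char *stack, size_t top,
				const char *const *strs, int n)
{
	for (int i = n - 1; i >= 0; i--) {
		size_t len = strlen(strs[i]) + 1;	/* include the NUL */

		assert(top >= len);
		top -= len;
		memcpy(stack + top, strs[i], len);
	}
	return top;
}

/*
 * Phase 2 (what create_elf_tables does, simplified): walk upwards from
 * the bottom of the strings and record where each one starts.
 */
static void collect_pointers_up(const char *stack, size_t bottom, int n,
				const char **out)
{
	const char *p = stack + bottom;

	for (int i = 0; i < n; i++) {
		out[i] = p;
		p += strlen(p) + 1;	/* only the length links both phases */
	}
}
```

Nothing but the bytes on the stack is shared between the two functions, which is the constraint the rest of this letter works around.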
# Making it CHERI
We thus have a potential issue: the moment where the strings themselves are allocated is different from the place where the capabilities are created, with limited ways to pass information in between.
However, one piece of information is available in both cases: the length of the null-terminated string. This way, we can get the same representable length from the length of the string, which remains unchanged, while allocating the string *and* when creating the capability. Assuming we can properly align the strings in copy_strings, we don't need more information to properly create a capability with exact bounds in create_elf_tables.
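To make the "same length in, same representable length out" argument concrete, here is a toy model that runs on any machine. It is an assumption-laden stand-in: TOY_MANTISSA_BITS and the toy_* helpers are hypothetical and do not reproduce Morello's real bounds encoding. On a CHERI toolchain the equivalent values would come from the compiler helpers cheri_representable_length() and cheri_representable_alignment_mask() in cheriintrin.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the capability encoding: NOT Morello's real format. */
#define TOY_MANTISSA_BITS 12

/* Smallest shift s such that len, rounded up to 2^s, fits the mantissa. */
static unsigned toy_shift(uint64_t len)
{
	unsigned s = 0;

	assert(len < (1ULL << 62));	/* keep the rounding from overflowing */
	while (((len + ((1ULL << s) - 1)) >> s) > (1ULL << TOY_MANTISSA_BITS))
		s++;
	return s;
}

/* Mask a base address must satisfy for exact bounds of `len` bytes. */
static uint64_t toy_repr_mask(uint64_t len)
{
	return ~((1ULL << toy_shift(len)) - 1);
}

/* Smallest length >= len that the toy encoding can express exactly. */
static uint64_t toy_repr_len(uint64_t len)
{
	uint64_t mask = toy_repr_mask(len);

	return (len + ~mask) & mask;
}

/*
 * Both copy_strings and create_elf_tables can evaluate this on the same
 * bytes, so they agree on the result without passing anything along.
 */
static uint64_t string_repr_len(const char *s)
{
	return toy_repr_len(strlen(s) + 1);	/* include the NUL */
}
```

In this toy model, short strings come back unchanged, and only lengths above 2^TOY_MANTISSA_BITS get rounded up.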
Thankfully, we can re-use most of the machinery in copy_strings to do so. If the representable length of a string differs from its real length, we need to get the mask needed for aligning it for exact bounds. Given this new length, we can compute the position where it would end up without alignment and ALIGN_DOWN() it, as we are going down the stack. This gives the position where the string *must* start for exact capability bounds. Thus, all the padding will be after the string. However, as we are going backwards, we need to put the padding first and the string second.
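The placement arithmetic described above can be sketched as follows. This is an illustration rather than the patch's code: place_below and struct placement are hypothetical names, addresses are plain integers, and the representable length and alignment mask are taken as inputs however they are obtained.

```c
#include <assert.h>
#include <stdint.h>

struct placement {
	uint64_t start;		/* where the string must begin */
	uint64_t padding;	/* zero bytes between string end and p */
};

/*
 * p:        current lowest used address on the stack
 * len:      real string length, including the NUL
 * repr_len: representable length for len (>= len)
 * mask:     alignment mask required for exact bounds of repr_len
 */
static struct placement place_below(uint64_t p, uint64_t len,
				    uint64_t repr_len, uint64_t mask)
{
	struct placement out;

	assert(repr_len >= len && p >= repr_len);
	/* Unaligned position first, then align down since we grow down. */
	out.start = (p - repr_len) & mask;
	/* Everything between the end of the string and p is padding. */
	out.padding = p - out.start - len;
	return out;
}
```

For example, with p = 65533, len = 20001, repr_len = 20008 and a mask of ~7, the string starts at 45520 and is followed by 12 bytes of padding, so the bounds [45520, 65528) stay below p.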
The code already goes through the string chunk by chunk, handling smaller chunks at page and argument boundaries. We can re-use this by first going through the length of padding, and once this is done go through the string as normal. This guarantees the string will start at the properly aligned address with the appropriate amount of padding behind the previous allocations.
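A minimal sketch of that loop, assuming a flat buffer for the stack and a hypothetical helper name (copy_padded_down): the padding is consumed first, then the string from its end towards its start, and no chunk crosses a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write `pad` zero bytes and then the `len` bytes of `str` (NUL included)
 * below offset p, in chunks that never cross a `page` boundary.
 * Returns the new lowest offset, which is where the string starts.
 */
static size_t copy_padded_down(char *stack, size_t p, const char *str,
			       size_t len, size_t pad, size_t page)
{
	size_t remaining_pad = pad;
	size_t remaining_str = len;

	assert(p >= pad + len);
	while (remaining_pad + remaining_str > 0) {
		/* Bytes available below p in the current page. */
		size_t room = (p % page) ? (p % page) : page;
		size_t n;

		if (remaining_pad > 0) {
			/* Going backwards: the padding comes first. */
			n = remaining_pad < room ? remaining_pad : room;
			p -= n;
			memset(stack + p, 0, n);
			remaining_pad -= n;
		} else {
			/* Then the string, from its tail to its head. */
			n = remaining_str < room ? remaining_str : room;
			p -= n;
			remaining_str -= n;
			memcpy(stack + p, str + remaining_str, n);
		}
	}
	return p;
}
```

If pad was computed so that p - pad - len is suitably aligned, the returned offset is the aligned start of the string.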
We do need to handle this padding and to detect when strings might need proper handling for exact bounds. As there isn't padding normally, we can simply check if there is some. If there is, get the representable length of the string and use it for the exact capability bounds. As all the padding is after the strings, if padding was needed for alignment, the next argument pointer will start in the padding. This is slow, but we can just go through the zeroes until we find the real start of the argument and go from there.
This is not really properly done in this revision: it is a remnant of the first draft and as such will improperly flag arguments as needing adjustment after one that needed it. This is known but should be simply fixable by moving this looping through the zeroes after allocating the capability.
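A sketch of that scan, under the assumption that the skip over the zeroes happens after the current argument has been handled. scan_arg and struct arg_info are hypothetical names, and `end` marks the end of the string area. One limitation of this sketch: it cannot tell an empty argument (a lone NUL) from padding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct arg_info {
	size_t len;		/* string length, including the NUL */
	int padded;		/* non-zero if zero padding follows */
	const char *next;	/* real start of the next argument */
};

/* Inspect the argument starting at p, within a string area ending at end. */
static struct arg_info scan_arg(const char *p, const char *end)
{
	struct arg_info info;
	const char *nul = memchr(p, 0, (size_t)(end - p));

	assert(nul != NULL);	/* every argument is NUL-terminated */
	info.len = (size_t)(nul - p) + 1;
	info.next = nul + 1;
	/* Normally the next string follows directly: a zero means padding. */
	info.padded = info.next < end && *info.next == '\0';
	/* Slow but simple: walk the zeroes up to the next real start. */
	while (info.next < end && *info.next == '\0')
		info.next++;
	return info;
}
```

When `padded` is set, the caller would derive the representable length from `len` and use it for the capability bounds; otherwise `len` itself is already exact.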
In fs/binfmt_elf.c:create_elf_tables argv and envp are handled exactly the same but in different loops. I removed the comments from the envp section for conciseness.
# Testing
To generate a big enough argument, you can use this dirty bash script:

    bigarg=""; count=0;
    while [ $(echo $bigarg | wc -c) -le 20000 ]; do
        bigarg=${bigarg}"==${count}==This_should_be_a_really_big_arg_string"
        count=$(($count + 1))
    done

It is a bit unnecessary and slow but it does allow easily checking that it starts and ends in the proper places.
Without the patch, the kernel log should show a CPU fault if $bigarg is passed as an argument or in the environment. With the patch, it should work fine.
Hopefully this is useful for reviewing, and correct in the first place!
Review branch: https://git.morello-project.org/Teo-CD/linux/-/tree/review-arg-str-resticted...
Commit itself: https://git.morello-project.org/Teo-CD/linux/-/commit/a4d9fe880d098e9f9fe1eb...

Thanks very much in advance!
Téo
Teo Couprie Diaz (1):
  fs: Handle exact bounds for argv and envp

 fs/binfmt_elf.c     | 111 ++++++++++++++++++++++++++++++++++++++++++--
 fs/exec.c           |  66 ++++++++++++++++++++++++++
 include/linux/elf.h |   4 ++
 3 files changed, 178 insertions(+), 3 deletions(-)