On 30/10/2023 15:42, Kevin Brodsky wrote:
Agree with all comments, sorry for forgetting that one from v2. Fixed !

> On 25/10/2023 18:57, Teo Couprie Diaz wrote:
>> argv and envp strings can be large enough so that a capability cannot be
>> created to access them exclusively: their bounds would not be exactly
>> representable and go outside of the strings.
>> During allocation on the stack, align the strings that need an adjusted
>> base and pad if needed for alignment or a representable length.
>> Handle this potential padding in fs/binfmt_elf by detecting unexpected 0s.
> "unexpected" is a bit... unexpected for something we actually expect? :D
> Maybe "extra".
>> This is done irrespective of the calling binary or loaded binary ABIs.
>> Indeed, at the point where strings are put on the stack during `execve`
>> we can only know the ABI of the calling binary, but not that of the
>> loaded binary. Thus the compat path will be slightly slower and the
>> memory used by the strings will be larger, as if to create capabilities
>> with exact bounds. However, this memory impact should stay minimal and
>> in rare cases where argv or envp strings are more than a couple thousand
>> characters.
> Same comment as in v2.
>> Add a a helper function to put already created capabilities on the stack
> s/a a/a/
>> during elf setup.
>> Additionally, align the newly created stack to a multiple of the maximum
>> argument length to make sure the relocated arguments will still be
>> respresentable with their new alignment in a PC-uABI kernel.
> s/respresentable/representable/
>
> Also I would say "in their new location", because the point is that we
> preserve a sufficient alignment.
Thanks for the comments ! v4 is mostly ready, just need a bit of input below.

> A big improvement over v2 overall, well done!
I wasn't sure, updated wherever I put a dash !

>> Signed-off-by: Teo Couprie Diaz <teo.coupriediaz@arm.com>
>> ---
>>  fs/binfmt_elf.c | 101 +++++++++++++++++++++++++++++++++++++++++-------
>>  fs/exec.c       |  58 +++++++++++++++++++++++++--
>>  2 files changed, 142 insertions(+), 17 deletions(-)
>>
>> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>> index 6455313d7f36..198c1ed442a6 100644
>> --- a/fs/binfmt_elf.c
>> +++ b/fs/binfmt_elf.c
>> @@ -151,6 +151,84 @@ static int padzero(unsigned long elf_bss)
>>      }
>>      return 0;
>>  }
>> +
>> +/*
>> + * Put a pointer or capability to a string on the stack, return the effective
>> + * size of the string on the stack or an error.
>> + * When building a PC-uABI kernel, this will take into account padding and
> "PCuABI" without dash (just to make grepping and such more reliable).
Done.

>> + * create an exactly representable capability if in purecap.
>> + */
>> +static long put_str_ptr(char __user *ustr,
>> +                        elf_stack_item_t __user *stack_item)
>> +{
>> +    size_t len;
>> +#if defined(CONFIG_CHERI_PURECAP_UABI)
>> +    char c;
>> +    void __user *str_ptr;
> Nit: char __user *
It was indeed, but removed as per below.

>> +    len = strnlen_user(ustr, MAX_ARG_STRLEN);
>> +    if (!len || len > MAX_ARG_STRLEN)
>> +        return -EINVAL;
>> +
>> +    /*
>> +     * Check if the next arg string starts right after the current one.
>> +     * If not, the len was not exactly representable and there is padding.
>> +     */
> That comment looks misindented.
>> +    if (get_user(c, ustr + len))
>> +        return -EFAULT;
>> +    if (c == '\0')
>> +        /*
>> +         * If the string has been adjusted for exact bounds representation,
>> +         * its start should already be properly aligned. Get the representable
>> +         * length by using the same length that was used during allocation:
>> +         * the length of the original string.
>> +         * This takes into account the padding due to length change after
>> +         * the string, but not that for alignment. Thus we might not end up
>> +         * at the start of the next arg. If not, we will need to take a slow
>> +         * path to go through the padding.
>> +         */
>> +        len = cheri_representable_length(len);
> I'm realising: do we even need to do this conditionally? The representable
> length should always be equal to the length if there is no padding.
> Executing RRLEN unconditionally is definitely faster than get_user() (and of
> course that's simpler too).
No, I guess you are entirely right. I think I wanted to not waste an
instruction, but I obviously didn't think it through: the get_user takes even
more time. Doing it unconditionally now.
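For reference, the start of put_str_ptr would then simply be (rough sketch,
untested):

    len = strnlen_user(ustr, MAX_ARG_STRLEN);
    if (!len || len > MAX_ARG_STRLEN)
        return -EINVAL;
    /*
     * No-op when the string needed no padding: the representable length
     * is then equal to the original length.
     */
    len = cheri_representable_length(len);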
I wasn't sure when preparing v3 as I got a compile time assertion when

>> +
>> +#if (ELF_COMPAT == 0)
>> +    str_ptr = cheri_perms_and(ustr,
>> +        (CHERI_PERM_GLOBAL | CHERI_PERM_STORE | CHERI_PERM_LOAD));
>> +    str_ptr = cheri_bounds_set_exact(str_ptr, len);
>> +    if (put_user_ptr(str_ptr, stack_item++))
>> +        return -EFAULT;
>> +#else
>> +    if (elf_stack_put_user(ustr, stack_item++))
> Nit: to reduce duplication some more and make it even clearer what's
> happening, this line could be common to the two paths, simply setting
> str_ptr to ustr in compat.
Changed to have elf_stack_put_user outside the #ifdef.
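Roughly like this (sketch only, assuming elf_stack_put_user also does the
right thing with a capability in the non-compat case):

    #if (ELF_COMPAT == 0)
        str_ptr = cheri_perms_and(ustr,
                (CHERI_PERM_GLOBAL | CHERI_PERM_STORE | CHERI_PERM_LOAD));
        str_ptr = cheri_bounds_set_exact(str_ptr, len);
    #else
        /* Nothing to derive in compat, keep the user pointer as is. */
        str_ptr = ustr;
    #endif
        if (elf_stack_put_user(str_ptr, stack_item++))
            return -EFAULT;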
I'd argue that in this case there's even an off-by len+1 ! I don't think

>> +        return -EFAULT;
>> +#endif
>> +
>> +    /*
>> +     * If right after the end of the argument length we have a zero,
>> +     * that means the argument alignment was adjusted in order to create a
>> +     * representable capability in purecap, even if we are not loading a
>> +     * purecap binary. This padding is added at the end, so find the real
>> +     * end by going through the padding.
>> +     */
>> +    if (get_user(c, ustr + len))
>> +        return -EFAULT;
>> +    if (c == '\0') {
>> +        size_t pad_len = 0;
>> +
>> +        while (pad_len < MAX_ARG_STRLEN && c == '\0') {
>> +            pad_len++;
> I think there's an off-by-one, since we increment pad_len, then call
> get_user(), and then only after that check that pad_len < MAX_ARG_STRLEN.
> I was actually going to suggest using a for loop instead, which makes such
> issues less likely.
Otherwise agree, there is a mistake.
> Also, I'm not sure we need to have that initial get_user() + if. We can
> directly have a loop, which may exit right away (need to be careful about
> limit conditions of course).
However I am not sure of the shape you would want for the for loop,
specifically if it were to remove the if check and get_user ? There are two
conditions there: is the padding too large, and is the character null.
Would you be happy with the following ?

    for (pad_len = 0; len + pad_len < MAX_ARG_STRLEN; pad_len += 1)

with the get_user and a break condition on != '\0' inside the loop body ?
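Written out in full, that would be something like (untested):

    for (pad_len = 0; len + pad_len < MAX_ARG_STRLEN; pad_len += 1) {
        if (get_user(c, ustr + len + pad_len))
            return -EFAULT;
        /* Stop at the first byte that is not part of the padding. */
        if (c != '\0')
            break;
    }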
But that doesn't exit right away, it goes through an iteration (which makes sense to be fair).
So not sure of what you'd expect :)
Fair, as mentioned I wasn't sure so happy to change it !

>> diff --git a/fs/exec.c b/fs/exec.c
>> index 892cd911549d..7bc423a6fc9a 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -66,6 +66,7 @@
>>  #include <linux/coredump.h>
>>  #include <linux/time_namespace.h>
>>  #include <linux/user_events.h>
>> +#include <linux/cheri.h>
>>  #include <linux/uaccess.h>
>>  #include <asm/mmu_context.h>
>> @@ -76,6 +77,12 @@
>>  #include <trace/events/sched.h>
>> +#ifdef CONFIG_CHERI_PURECAP_UABI
>> +#define STACK_ALIGN_SIZE MAX_ARG_STRLEN
>> +#else
>> +#define STACK_ALIGN_SIZE PAGE_SIZE
>> +#endif
>> +
>>  static int bprm_creds_from_file(struct linux_binprm *bprm);
>>  int suid_dumpable = 0;
>> @@ -515,6 +522,39 @@ static int bprm_stack_limits(struct linux_binprm *bprm)
>>      return 0;
>>  }
>> +#if defined(CONFIG_CHERI_PURECAP_UABI)
>> +/*
>> + * Compute the a new value for p that allows for creating an exactly
>> + * representable capability with all its padding after.
>> + */
>> +size_t align_p_for_exact_bounds(unsigned long p, size_t len)
> I would say s/exact/representable/ (it is indeed in order to be able to set
> exact capability bounds, but representability is the root cause).
>> +{
>> +    size_t repr_len = 0, repr_align;
>> +
>> +    repr_len = cheri_representable_length(len);
>> +    repr_align = ~cheri_representable_alignment_mask(len) + 1;
>> +    if (repr_len > len || repr_align > 0) {
> Do we actually need the conditions? It feels like the calculation should work
> in any case. That certainly appears to be true for the length, I would expect
> repr_align to compute to 1 if there is no requirement but worth checking.
> If it is the case, removing the conditions is not only more readable but also
> probably faster, as the calculation is straightforward arithmetic (better
> than adding a branch even if it is a no-op).
I am pretty sure it does, as I did check before adding the repr_align > 0
check. Removing the check certainly makes it much clearer.
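The start of the function then just becomes (sketch; the actual adjustment of
p that follows is unchanged and elided here):

    repr_len = cheri_representable_length(len);
    repr_align = ~cheri_representable_alignment_mask(len) + 1;
    /*
     * No condition needed: without padding, repr_len == len and
     * repr_align computes to 1, so the adjustment is a no-op.
     */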
That's fair, maybe not even an #else by moving it inside the function.

>> @@ -542,6 +582,18 @@ static int copy_strings(int argc, struct user_arg_ptr argv,
>>          if (!len)
>>              goto out;
>>
>> +#if defined(CONFIG_CHERI_PURECAP_UABI)
> The nice thing with having moved everything to a function is that we can
> easily have a #else there, returning p unconditionally, so that we can avoid
> this #ifdef.
That was the idea yes, changed to "skipped" as it's clearer.

>> +        /*
>> +         * Handle all strings as if they were for purecap binaries as we only
>> +         * know the calling process compat status, which might be different
>> +         * from the binary to be exec'ed.
>> +         *
>> +         * Start directly at the string end, with all the padding already
>> +         * taken into account.
> "taken into account" as in skipped? I'm not sure how to interpret this
> sentence.
> Kevin

>> +         */
>> +        bprm->p = align_p_for_exact_bounds(bprm->p, len);
>> +#endif
>> +
>>      ret = -E2BIG;
>>      if (!valid_arg_len(bprm, len))
>>          goto out;
>> @@ -775,14 +827,14 @@ int setup_arg_pages(struct linux_binprm *bprm,
>>      if (vma->vm_end - vma->vm_start > stack_base)
>>          return -ENOMEM;
>>
>> -    stack_base = PAGE_ALIGN(stack_top - stack_base);
>> +    stack_base = ALIGN(stack_top - stack_base, STACK_ALIGN_SIZE);
>>      stack_shift = vma->vm_start - stack_base;
>>
>>      mm->arg_start = bprm->p - stack_shift;
>>      bprm->p = vma->vm_end - stack_shift;
>>  #else
>>      stack_top = arch_align_stack(stack_top);
>> -    stack_top = PAGE_ALIGN(stack_top);
>> +    stack_top = ALIGN(stack_top, STACK_ALIGN_SIZE);
>>      if (unlikely(stack_top < mmap_min_addr) ||
>>          unlikely(vma->vm_end - vma->vm_start >= stack_top - mmap_min_addr))