On 12/10/2023 08:12, Kevin Brodsky wrote:
On 01/10/2023 20:05, Teo Couprie Diaz wrote:
argv and envp strings can be large enough for their bounds to not be exactly representable.
Let's elaborate on what we mean by "representable" here - the CHERI context is not necessarily obvious for the average reader.
During allocation on the stack, align the strings that need an adjusted base and pad if needed for alignment or exact bounds.
"alignment or exact bounds" sounds strange, because surely alignment is done precisely to create exact bounds?
Handle this eventual padding in fs/binfmt_elf by detecting unexpected 0s.
s/eventual/potential/
This is done whether or not the binary loaded is purecap, and whether or not the calling process is compat. Indeed, at the point where strings are put on the stack during `execve` we can only know the compat status of the process, but not if the binary will be in purecap.
It may not be obvious to the reader what "process" means, as opposed to "binary". Also to avoid talking about compat in one case and purecap in the other, we could generically say "the ABI".
Thus the compat path will be slightly slower and the memory used by the strings will be larger, as if to create capabilities with exact bounds. However, this memory impact should stay minimal and in rare cases where
"... should stay minimal and in rare cases..." doesn't sound grammatically correct, maybe split the sentence in two as these are two different points.
argv or envp strings are more than a couple thousand characters.
Add a a helper function to put already created capabilities on the stack during elf setup.
Additionally, when in purecap align the newly created stack to a multiple
In fact not only in purecap, also in compat64, for the same reason as above.
Agree with the comments on the commit message, will update.
of the maximum argument length to make sure the relocated arguments will still be respresentable with their new alignment.
Signed-off-by: Teo Couprie Diaz teo.coupriediaz@arm.com
fs/binfmt_elf.c | 112 ++++++++++++++++++++++++++++++++++++++++-------- fs/exec.c | 50 +++++++++++++++++++-- 2 files changed, 140 insertions(+), 22 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 6455313d7f36..9d4fe44c991b 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -152,6 +152,89 @@ static int padzero(unsigned long elf_bss) return 0; } +static size_t put_str_array(char __user *ustr,
elf_stack_item_t __user *stack_item,
int strc)
+{
- size_t array_len = 0;
array_len evokes the size of an array, i.e. its number of elements. IIUC this rather represents the total number of characters read in ustr, so maybe the naming could match that better (same for array_ret in create_elf_tables()).
Thinking some more, there isn't much point in that function working on the whole array, since it does not keep any state between iterations, aside from ustr and stack_item. If we made it operate on just one string (keeping the loop in the caller), then I think readability would improve overall.
Thinking about it a bit more, you are right and it would greatly simplify things. I think I didn't see it before because in v1 the function was much messier. Will pull the loop out of the functions, thanks.
- while (strc-- > 0) {
size_t len;
+#if defined(CONFIG_CHERI_PURECAP_UABI)
char c;
+#if (ELF_COMPAT == 0)
void __user *str_ptr;
len = strnlen_user(ustr, MAX_ARG_STRLEN);
if (!len || len > MAX_ARG_STRLEN)
return -EINVAL;
That part can be kept common to native and compat. That's also true for the final elf_stack_put_user() (we can set str_ptr to ustr in compat). This way we just have one block adding code in native, and no extra code in compat.
That makes sense indeed. So everything but the capability restriction part will be shared between compat and purecap.
/*
* Check if the next arg string starts right after the current one.
* If not, the len was not exactly representable and there is padding.
*/
if (get_user(c, ustr + len))
return -EFAULT;
if (c == '\0')
/*
* If the string has been adjusted for exact bounds representation,
* its start should already be properly aligned. Get the representable
* length by using the same length that was used during allocation:
* the length of the original string.
* This takes into account the padding due to length change after
* the string, but not that for alignment. Thus we might not end up
* at the start of the next arg. If not, we will need to take a slow
* path to go through the padding.
*/
len = cheri_representable_length(len);
Does it really make sense to skip this part in compat? The layout should be exactly the same, so if we don't do this, then we're going to spend more time in the get_user() loop below.
Oh, absolutely right, my mistake !
str_ptr = cheri_perms_and(ustr,
(CHERI_PERM_GLOBAL | CHERI_PERM_STORE | CHERI_PERM_LOAD));
str_ptr = cheri_bounds_set_exact(str_ptr, len);
if (put_user_ptr(str_ptr, stack_item++))
return -EFAULT;
+#else /* ELF_COMPAT */
len = strnlen_user(ustr, MAX_ARG_STRLEN);
if (!len || len > MAX_ARG_STRLEN)
return -EINVAL;
if (elf_stack_put_user(ustr, stack_item++))
return -EFAULT;
+#endif /* ELF_COMPAT */
/*
* If right after the end of the argument length we have a zero,
* that means the argument alignment was adjusted in order to create a
* representable capability in purecap, even if we are not loading a
* purecap binary. This padding is added at the end, so find the real
* end by going through the padding.
*/
if (get_user(c, ustr + len))
return -EFAULT;
if (c == '\0') {
size_t pad_len = 0;
while (pad_len < MAX_ARG_STRLEN && c == '\0') {
pad_len++;
if (get_user(c, ustr + len + pad_len))
return -EFAULT;
}
ustr += pad_len;
array_len += pad_len;
}
+#else /* CONFIG_CHERI_PURECAP_UABI */
len = strnlen_user(ustr, MAX_ARG_STRLEN);
if (!len || len > MAX_ARG_STRLEN)
return -EINVAL;
if (elf_stack_put_user(ustr, stack_item++))
return -EFAULT;
+#endif /* CONFIG_CHERI_PURECAP_UABI */
ustr += len;
array_len += len;
- }
- return array_len;
+}
- /* Let's use some macros to make this stack manipulation a little clearer */ #ifdef CONFIG_STACK_GROWSUP #define STACK_ADD(sp, items) ((sp) + (items) * sizeof(elf_stack_item_t))
@@ -213,6 +296,7 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, #if defined(CONFIG_CHERI_PURECAP_UABI) && (ELF_COMPAT == 0) elf_stack_item_t *mm_at_argv, *mm_at_envp; #endif
- size_t array_ret = 0;
/* * In some cases (e.g. Hyper-Threading), we want to avoid L1 @@ -396,15 +480,11 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, #endif /* In PCuABI, this derives a capability from SP pointing to arg_start */ ustr = sp + (mm->arg_start - user_ptr_addr(sp));
- while (argc-- > 0) {
size_t len;
if (elf_stack_put_user(ustr, stack_item++))
return -EFAULT;
len = strnlen_user(ustr, MAX_ARG_STRLEN);
if (!len || len > MAX_ARG_STRLEN)
return -EINVAL;
ustr += len;
- }
- array_ret = put_str_array(ustr, stack_item, argc);
- if (array_ret < 0)
array_ret is unsigned so the condition is never true, I would expect the compiler to warn about this. You could use long instead of size_t for instance.
Damn, sorry for missing that, will replace with long then.
return array_ret;
- stack_item += argc;
- ustr += array_ret; if (elf_stack_put_user(0, stack_item++)) return -EFAULT; mm->arg_end = user_ptr_addr(ustr);
@@ -414,15 +494,11 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, *mm_at_envp = (elf_stack_item_t)stack_item; #endif mm->env_start = user_ptr_addr(ustr);
- while (envc-- > 0) {
size_t len;
if (elf_stack_put_user(ustr, stack_item++))
return -EFAULT;
len = strnlen_user(ustr, MAX_ARG_STRLEN);
if (!len || len > MAX_ARG_STRLEN)
return -EINVAL;
ustr += len;
- }
- array_ret = put_str_array(ustr, stack_item, envc);
- if (array_ret < 0)
return array_ret;
- stack_item += envc;
- ustr += array_ret; if (elf_stack_put_user(0, stack_item++)) return -EFAULT; mm->env_end = user_ptr_addr(ustr);
diff --git a/fs/exec.c b/fs/exec.c index 892cd911549d..a584eb24cf0c 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -66,6 +66,7 @@ #include <linux/coredump.h> #include <linux/time_namespace.h> #include <linux/user_events.h> +#include <linux/cheri.h> #include <linux/uaccess.h> #include <asm/mmu_context.h> @@ -530,8 +531,11 @@ static int copy_strings(int argc, struct user_arg_ptr argv, while (argc-- > 0) { const char __user *str;
int len;
unsigned long pos;size_t len;
+#if defined(CONFIG_CHERI_PURECAP_UABI)
size_t tmp_len = 0, pad_len = 0;
+#endif ret = -EFAULT; str = get_user_arg_ptr(argv, argc); @@ -542,10 +546,43 @@ static int copy_strings(int argc, struct user_arg_ptr argv, if (!len) goto out; +#if defined(CONFIG_CHERI_PURECAP_UABI)
/*
* Handle all strings as if they were for purecap binaries as we only
* know the calling process compat status, which might be different
* from the binary to be exec'ed.
*/
tmp_len = cheri_representable_length(len);
Maybe repr_len?
Happy with that, will change
if (tmp_len > len) {
I'm not sure I understand this. Clearly len can be already representable, without bprm->p being sufficiently aligned (repr_align below)?
No you are right : looking at it now I don't know why I thought that way. Will pull the alignment out of the if and only have the padding update if there is a representation change of any sort.
size_t repr_align = ~cheri_representable_alignment_mask(len) + 1;
/*
* We want the capability base to be aligned. As we work from the
* last to the first argument, we can place the start of the string
* to be aligned for representability.
* All padding then goes after it, both for the representable
* length and for the alignment, to fill the space we left before
* the previous argument we put on the stack.
*
* This is slightly different than the usual way representable
* capabilities are created: usually we cannot change the address
* pointed by the capability, so there can be padding between the
* capability base and the capability value.
* Here we can decide where to place the string, so we build a
* capability with equal base and value, with all padding after.
*/
pad_len = bprm->p - ALIGN_DOWN(bprm->p - tmp_len, repr_align) - len;
}
+#endif
- ret = -E2BIG; if (!valid_arg_len(bprm, len)) goto out;
+#if defined(CONFIG_CHERI_PURECAP_UABI)
/* Offset all of the padding and start directly at the string end. */
bprm->p -= pad_len;
+#endif
I think everything could be done here, in one block. bprm->p can be directly adjusted, without an intermediate variable, which seems a lot simpler to understand. To improve the readability further, it seems quite easy to make it a separate function, as there are few inputs and just one output.
You're probably right, I like the idea of moving it out to a function. I will experiment with it !
- /* We're going to work our way backwards. */ pos = bprm->p; str += len;
@@ -774,16 +811,21 @@ int setup_arg_pages(struct linux_binprm *bprm, /* Make sure we didn't let the argument array grow too large. */ if (vma->vm_end - vma->vm_start > stack_base) return -ENOMEM;
+#ifdef CONFIG_CHERI_PURECAP_UABI
- stack_base = ALIGN(stack_top - stack_base, MAX_ARG_STRLEN);
To avoid some #ifdef'ing, we can use this line unconditionally and replace MAX_ARG_STRLEN with a macro (maybe defined at the top of this file).
I wasn't sure if that was better to do it that way, will definitely do that then.
Thanks for sending this v2, it's a big improvement! Just a few things left to iron out here and there.
Kevin
Thanks for the feedback, looking forward to this work being done ! :) Téo
+#else stack_base = PAGE_ALIGN(stack_top - stack_base);
+#endif stack_shift = vma->vm_start - stack_base; mm->arg_start = bprm->p - stack_shift; bprm->p = vma->vm_end - stack_shift; #else stack_top = arch_align_stack(stack_top); +#ifdef CONFIG_CHERI_PURECAP_UABI
- stack_top = ALIGN(stack_top, MAX_ARG_STRLEN);
+#else stack_top = PAGE_ALIGN(stack_top);
+#endif if (unlikely(stack_top < mmap_min_addr) || unlikely(vma->vm_end - vma->vm_start >= stack_top - mmap_min_addr)) return -ENOMEM;