Hello all,
Here is the v3 of the patch adding capability support for argv and envp strings.
This has mostly refinement and fixes compared to v2.
This time around the one thing I am unsure of is the *_put_user* part in fs/binfmt_elf.c:put_str_ptr(). I tried using the same code for both paths in PC-uABI, but there was a mismatch in expected pointer/capabilities, so I resorted to keeping a section in #if (ELF_COMPAT == 0) . I must be missing something, so open to suggestions.
Happy to take suggestions for any names as well, especially with a new function and a change in behavior in another.
Thanks in advance for your comments, Best regards Téo
# Changes from v2-v3[0] - Commit message clarified - Fixed some typing and use issues - Updated the stack alignment to use a define depending on the build type - Changed the put_str_array function to only put one string pointer, the looping over all strings is restored to be handled in create_elf_tables. - Moved the padding calculations to a separate function that returns the updated value of bprm->p rather than the padding. - Reduced the amount of code not shared between purecap/compat - Removed nonsensical check before alignment computation
# Changes from v1-v2[1]
- Rebased on top of the last release - Update to make use of the properly derived stack capability in binfmt_elf - Move the copying of argv and envp strings in binfmt_elf to a helper function - Check for padding after an arg only - Greatly simplify the change in exec by completely skipping the padding, rather than looping through it and allocating pages in exec - Add more details to the comments explaining the padding process - Rename most variables. I'm not great with names so hopefully they are OK, otherwise feel free to suggest new ones ! - Proper COMPAT handling, but with a slight loss compared to a regular kernel - Detail trade-off in comments and in commit message - Force a greater stack alignment in exec to mitigate issues during relocation - Simplify accesses and use `get_user()` rather than `copy_from_user()` - Use wrappers provided by <cheriintrin.h> rather than builtins - Get rid of the elf_stack_put_user_cap macro
Gitlab patch for review : https://git.morello-project.org/Teo-CD/linux/-/commit/b92f2f777a327b9e49fea8...
[0]: https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/... [1]: https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/... Teo Couprie Diaz (1): fs: Handle exact bounds for argv and envp
fs/binfmt_elf.c | 101 +++++++++++++++++++++++++++++++++++++++++------- fs/exec.c | 58 +++++++++++++++++++++++++-- 2 files changed, 142 insertions(+), 17 deletions(-)
argv and envp strings can be large enough so that a capability cannot be created to access them exclusively: their bounds would not be exactly representable and go outside of the strings.
During allocation on the stack, align the strings that need an adjusted base and pad if needed for alignment or a representable length. Handle this potential padding in fs/binfmt_elf by detecting unexpected 0s.
This is done irrespective of the calling binary or loaded binary ABIs. Indeed, at the point where strings are put on the stack during `execve` we can only know the ABI of the calling binary, but not that of the loaded binary. Thus the compat path will be slightly slower and the memory used by the strings will be larger, as if to create capabilities with exact bounds. However, this memory impact should stay minimal and in rare cases where argv or envp strings are more than a couple thousand characters.
Add a a helper function to put already created capabilities on the stack during elf setup.
Additionally, align the newly created stack to a multiple of the maximum argument length to make sure the relocated arguments will still be respresentable with their new alignment in a PC-uABI kernel.
Signed-off-by: Teo Couprie Diaz teo.coupriediaz@arm.com --- fs/binfmt_elf.c | 101 +++++++++++++++++++++++++++++++++++++++++------- fs/exec.c | 58 +++++++++++++++++++++++++-- 2 files changed, 142 insertions(+), 17 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 6455313d7f36..198c1ed442a6 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -151,6 +151,84 @@ static int padzero(unsigned long elf_bss) } return 0; } +/* + * Put a pointer or capability to a string on the stack, return the effective + * size of the string on the stack or an error. + * When building a PC-uABI kernel, this will take into account padding and + * create an exactly representable capability if in purecap. + */ +static long put_str_ptr(char __user *ustr, + elf_stack_item_t __user *stack_item) +{ + size_t len; +#if defined(CONFIG_CHERI_PURECAP_UABI) + char c; + void __user *str_ptr; + len = strnlen_user(ustr, MAX_ARG_STRLEN); + if (!len || len > MAX_ARG_STRLEN) + return -EINVAL; + + /* + * Check if the next arg string starts right after the current one. + * If not, the len was not exactly representable and there is padding. + */ + if (get_user(c, ustr + len)) + return -EFAULT; + if (c == '\0') + /* + * If the string has been adjusted for exact bounds representation, + * its start should already be properly aligned. Get the representable + * length by using the same length that was used during allocation: + * the length of the original string. + * This takes into account the padding due to length change after + * the string, but not that for alignment. Thus we might not end up + * at the start of the next arg. If not, we will need to take a slow + * path to go through the padding. + */ + len = cheri_representable_length(len); + +#if (ELF_COMPAT == 0) + str_ptr = cheri_perms_and(ustr, + (CHERI_PERM_GLOBAL | CHERI_PERM_STORE | CHERI_PERM_LOAD)); + str_ptr = cheri_bounds_set_exact(str_ptr, len); + if (put_user_ptr(str_ptr, stack_item++)) + return -EFAULT; +#else + if (elf_stack_put_user(ustr, stack_item++)) + return -EFAULT; +#endif + + /* + * If right after the end of the argument length we have a zero, + * that means the argument alignment was adjusted in order to create a + * representable capability in purecap, even if we are not loading a + * purecap binary. This padding is added at the end, so find the real + * end by going through the padding. + */ + if (get_user(c, ustr + len)) + return -EFAULT; + if (c == '\0') { + size_t pad_len = 0; + + while (pad_len < MAX_ARG_STRLEN && c == '\0') { + pad_len++; + if (get_user(c, ustr + len + pad_len)) + return -EFAULT; + } + ustr += pad_len; + len += pad_len; + } +#else /* CONFIG_CHERI_PURECAP_UABI */ + len = strnlen_user(ustr, MAX_ARG_STRLEN); + if (!len || len > MAX_ARG_STRLEN) + return -EINVAL; + + if (elf_stack_put_user(ustr, stack_item++)) + return -EFAULT; +#endif /* CONFIG_CHERI_PURECAP_UABI */ + + return len; +}
/* Let's use some macros to make this stack manipulation a little clearer */ #ifdef CONFIG_STACK_GROWSUP @@ -213,6 +291,7 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, #if defined(CONFIG_CHERI_PURECAP_UABI) && (ELF_COMPAT == 0) elf_stack_item_t *mm_at_argv, *mm_at_envp; #endif + long str_ret = 0;
/* * In some cases (e.g. Hyper-Threading), we want to avoid L1 @@ -397,13 +476,10 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, /* In PCuABI, this derives a capability from SP pointing to arg_start */ ustr = sp + (mm->arg_start - user_ptr_addr(sp)); while (argc-- > 0) { - size_t len; - if (elf_stack_put_user(ustr, stack_item++)) - return -EFAULT; - len = strnlen_user(ustr, MAX_ARG_STRLEN); - if (!len || len > MAX_ARG_STRLEN) - return -EINVAL; - ustr += len; + str_ret = put_str_ptr(ustr, stack_item++); + if (str_ret < 0) + return str_ret; + ustr += str_ret; } if (elf_stack_put_user(0, stack_item++)) return -EFAULT; @@ -415,13 +491,10 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, #endif mm->env_start = user_ptr_addr(ustr); while (envc-- > 0) { - size_t len; - if (elf_stack_put_user(ustr, stack_item++)) - return -EFAULT; - len = strnlen_user(ustr, MAX_ARG_STRLEN); - if (!len || len > MAX_ARG_STRLEN) - return -EINVAL; - ustr += len; + str_ret = put_str_ptr(ustr, stack_item++); + if (str_ret < 0) + return str_ret; + ustr += str_ret; } if (elf_stack_put_user(0, stack_item++)) return -EFAULT; diff --git a/fs/exec.c b/fs/exec.c index 892cd911549d..7bc423a6fc9a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -66,6 +66,7 @@ #include <linux/coredump.h> #include <linux/time_namespace.h> #include <linux/user_events.h> +#include <linux/cheri.h>
#include <linux/uaccess.h> #include <asm/mmu_context.h> @@ -76,6 +77,12 @@
#include <trace/events/sched.h>
+#ifdef CONFIG_CHERI_PURECAP_UABI +#define STACK_ALIGN_SIZE MAX_ARG_STRLEN +#else +#define STACK_ALIGN_SIZE PAGE_SIZE +#endif + static int bprm_creds_from_file(struct linux_binprm *bprm);
int suid_dumpable = 0; @@ -515,6 +522,39 @@ static int bprm_stack_limits(struct linux_binprm *bprm) return 0; }
+#if defined(CONFIG_CHERI_PURECAP_UABI) +/* + * Compute the a new value for p that allows for creating an exactly + * representable capability with all its padding after. + */ +size_t align_p_for_exact_bounds(unsigned long p, size_t len) +{ + size_t repr_len = 0, repr_align; + + repr_len = cheri_representable_length(len); + repr_align = ~cheri_representable_alignment_mask(len) + 1; + if (repr_len > len || repr_align > 0) { + /* + * We want the capability base to be aligned. As we work from the + * last to the first argument, we can place the start of the string + * to be aligned for representability. + * All padding then goes after it, both for the representable + * length and for the alignment, to fill the space we left before + * the previous argument we put on the stack. + * + * This is slightly different than the usual way representable + * capabilities are created: usually we cannot change the address + * pointed by the capability, so there can be padding between the + * capability base and the capability value. + * Here we can decide where to place the string, so we build a + * capability with equal base and value, with all padding after. + */ + p = ALIGN_DOWN(p - repr_len, repr_align) + len; + } + return p; +} +#endif /* CONFIG_CHERI_PURECAP_UABI */ + /* * 'copy_strings()' copies argument/environment strings from the old * processes's memory to the new process's stack. The call to get_user_pages() @@ -530,7 +570,7 @@ static int copy_strings(int argc, struct user_arg_ptr argv,
while (argc-- > 0) { const char __user *str; - int len; + size_t len; unsigned long pos;
ret = -EFAULT; @@ -542,6 +582,18 @@ static int copy_strings(int argc, struct user_arg_ptr argv, if (!len) goto out;
+#if defined(CONFIG_CHERI_PURECAP_UABI) + /* + * Handle all strings as if they were for purecap binaries as we only + * know the calling process compat status, which might be different + * from the binary to be exec'ed. + * + * Start directly at the string end, with all the padding already + * taken into account. + */ + bprm->p = align_p_for_exact_bounds(bprm->p, len); +#endif + ret = -E2BIG; if (!valid_arg_len(bprm, len)) goto out; @@ -775,14 +827,14 @@ int setup_arg_pages(struct linux_binprm *bprm, if (vma->vm_end - vma->vm_start > stack_base) return -ENOMEM;
- stack_base = PAGE_ALIGN(stack_top - stack_base); + stack_base = ALIGN(stack_top - stack_base, STACK_ALIGN_SIZE);
stack_shift = vma->vm_start - stack_base; mm->arg_start = bprm->p - stack_shift; bprm->p = vma->vm_end - stack_shift; #else stack_top = arch_align_stack(stack_top); - stack_top = PAGE_ALIGN(stack_top); + stack_top = ALIGN(stack_top, STACK_ALIGN_SIZE);
if (unlikely(stack_top < mmap_min_addr) || unlikely(vma->vm_end - vma->vm_start >= stack_top - mmap_min_addr))
On 25/10/2023 17:57, Teo Couprie Diaz wrote:
argv and envp strings can be large enough so that a capability cannot be created to access them exclusively: their bounds would not be exactly representable and go outside of the strings.
During allocation on the stack, align the strings that need an adjusted base and pad if needed for alignment or a representable length. Handle this potential padding in fs/binfmt_elf by detecting unexpected 0s.
This is done irrespective of the calling binary or loaded binary ABIs. Indeed, at the point where strings are put on the stack during `execve` we can only know the ABI of the calling binary, but not that of the loaded binary. Thus the compat path will be slightly slower and the memory used by the strings will be larger, as if to create capabilities with exact bounds. However, this memory impact should stay minimal and in rare cases where argv or envp strings are more than a couple thousand characters.
Add a a helper function to put already created capabilities on the stack during elf setup.
Additionally, align the newly created stack to a multiple of the maximum argument length to make sure the relocated arguments will still be respresentable with their new alignment in a PC-uABI kernel.
Signed-off-by: Teo Couprie Diaz teo.coupriediaz@arm.com
fs/binfmt_elf.c | 101 +++++++++++++++++++++++++++++++++++++++++------- fs/exec.c | 58 +++++++++++++++++++++++++-- 2 files changed, 142 insertions(+), 17 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 6455313d7f36..198c1ed442a6 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -213,6 +291,7 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, #if defined(CONFIG_CHERI_PURECAP_UABI) && (ELF_COMPAT == 0) elf_stack_item_t *mm_at_argv, *mm_at_envp; #endif
- long str_ret = 0;
Just noticed that since the while loops have been reintroduced, this variable can be made local to them.
Will take into account if v4 is needed (probably !)
/* * In some cases (e.g. Hyper-Threading), we want to avoid L1 @@ -397,13 +476,10 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, /* In PCuABI, this derives a capability from SP pointing to arg_start */ ustr = sp + (mm->arg_start - user_ptr_addr(sp)); while (argc-- > 0) {
size_t len;
if (elf_stack_put_user(ustr, stack_item++))
return -EFAULT;
len = strnlen_user(ustr, MAX_ARG_STRLEN);
if (!len || len > MAX_ARG_STRLEN)
return -EINVAL;
ustr += len;
str_ret = put_str_ptr(ustr, stack_item++);
if (str_ret < 0)
return str_ret;
} if (elf_stack_put_user(0, stack_item++)) return -EFAULT;ustr += str_ret;
@@ -415,13 +491,10 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, #endif mm->env_start = user_ptr_addr(ustr); while (envc-- > 0) {
size_t len;
if (elf_stack_put_user(ustr, stack_item++))
return -EFAULT;
len = strnlen_user(ustr, MAX_ARG_STRLEN);
if (!len || len > MAX_ARG_STRLEN)
return -EINVAL;
ustr += len;
str_ret = put_str_ptr(ustr, stack_item++);
if (str_ret < 0)
return str_ret;
} if (elf_stack_put_user(0, stack_item++)) return -EFAULT;ustr += str_ret;
On 25/10/2023 18:57, Teo Couprie Diaz wrote:
argv and envp strings can be large enough so that a capability cannot be created to access them exclusively: their bounds would not be exactly representable and go outside of the strings.
During allocation on the stack, align the strings that need an adjusted base and pad if needed for alignment or a representable length. Handle this potential padding in fs/binfmt_elf by detecting unexpected 0s.
"unexpected" is a bit... unexpected for something we actually expect? :D Maybe "extra".
This is done irrespective of the calling binary or loaded binary ABIs. Indeed, at the point where strings are put on the stack during `execve` we can only know the ABI of the calling binary, but not that of the loaded binary. Thus the compat path will be slightly slower and the memory used by the strings will be larger, as if to create capabilities with exact bounds. However, this memory impact should stay minimal and in rare cases where argv or envp strings are more than a couple thousand characters.
Same comment as in v2.
Add a a helper function to put already created capabilities on the stack
s/a a/a/
during elf setup.
Additionally, align the newly created stack to a multiple of the maximum argument length to make sure the relocated arguments will still be respresentable with their new alignment in a PC-uABI kernel.
s/respresentable/representable/
Also I would say "in their new location", because the point is that we preserve a sufficient alignment.
A big improvement over v2 overall, well done!
Signed-off-by: Teo Couprie Diaz teo.coupriediaz@arm.com
fs/binfmt_elf.c | 101 +++++++++++++++++++++++++++++++++++++++++------- fs/exec.c | 58 +++++++++++++++++++++++++-- 2 files changed, 142 insertions(+), 17 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 6455313d7f36..198c1ed442a6 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -151,6 +151,84 @@ static int padzero(unsigned long elf_bss) } return 0; } +/*
- Put a pointer or capability to a string on the stack, return the effective
- size of the string on the stack or an error.
- When building a PC-uABI kernel, this will take into account padding and
"PCuABI" without dash (just to make grepping and such more reliable).
- create an exactly representable capability if in purecap.
- */
+static long put_str_ptr(char __user *ustr,
elf_stack_item_t __user *stack_item)
+{
- size_t len;
+#if defined(CONFIG_CHERI_PURECAP_UABI)
- char c;
- void __user *str_ptr;
Nit: char __user *
- len = strnlen_user(ustr, MAX_ARG_STRLEN);
- if (!len || len > MAX_ARG_STRLEN)
return -EINVAL;
/*
* Check if the next arg string starts right after the current one.
* If not, the len was not exactly representable and there is padding.
*/
That comment looks misindented.
- if (get_user(c, ustr + len))
return -EFAULT;
- if (c == '\0')
/*
* If the string has been adjusted for exact bounds representation,
* its start should already be properly aligned. Get the representable
* length by using the same length that was used during allocation:
* the length of the original string.
* This takes into account the padding due to length change after
* the string, but not that for alignment. Thus we might not end up
* at the start of the next arg. If not, we will need to take a slow
* path to go through the padding.
*/
len = cheri_representable_length(len);
I'm realising: do we even need to do this conditionally? The representable length should always be equal to the length if there is no padding. Executing RRLEN unconditionally is definitely faster than get_user() (and of course that's simpler too).
+#if (ELF_COMPAT == 0)
- str_ptr = cheri_perms_and(ustr,
(CHERI_PERM_GLOBAL | CHERI_PERM_STORE | CHERI_PERM_LOAD));
- str_ptr = cheri_bounds_set_exact(str_ptr, len);
- if (put_user_ptr(str_ptr, stack_item++))
return -EFAULT;
+#else
- if (elf_stack_put_user(ustr, stack_item++))
Nit: to reduce duplication some more and make it even clearer what's happening, this line could be common to the two paths, simply setting str_ptr to ustr in compat.
return -EFAULT;
+#endif
- /*
* If right after the end of the argument length we have a zero,
* that means the argument alignment was adjusted in order to create a
* representable capability in purecap, even if we are not loading a
* purecap binary. This padding is added at the end, so find the real
* end by going through the padding.
*/
- if (get_user(c, ustr + len))
return -EFAULT;
- if (c == '\0') {
size_t pad_len = 0;
while (pad_len < MAX_ARG_STRLEN && c == '\0') {
pad_len++;
I think there's an off-by-one, since we increment pad_len, then call get_user(), and then only after that check that pad_len < MAX_ARG_STRLEN. I was actually going to suggest using a for loop instead, which makes such issues less likely.
Also, I'm not sure we need to have that initial get_user() + if. We can directly have a loop, which may exit right away (need to be careful about limit conditions of course).
if (get_user(c, ustr + len + pad_len))
return -EFAULT;
}
ustr += pad_len;
len += pad_len;
- }
+#else /* CONFIG_CHERI_PURECAP_UABI */
- len = strnlen_user(ustr, MAX_ARG_STRLEN);
- if (!len || len > MAX_ARG_STRLEN)
return -EINVAL;
- if (elf_stack_put_user(ustr, stack_item++))
return -EFAULT;
+#endif /* CONFIG_CHERI_PURECAP_UABI */
- return len;
+} /* Let's use some macros to make this stack manipulation a little clearer */ #ifdef CONFIG_STACK_GROWSUP @@ -213,6 +291,7 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, #if defined(CONFIG_CHERI_PURECAP_UABI) && (ELF_COMPAT == 0) elf_stack_item_t *mm_at_argv, *mm_at_envp; #endif
- long str_ret = 0;
/* * In some cases (e.g. Hyper-Threading), we want to avoid L1 @@ -397,13 +476,10 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, /* In PCuABI, this derives a capability from SP pointing to arg_start */ ustr = sp + (mm->arg_start - user_ptr_addr(sp)); while (argc-- > 0) {
size_t len;
if (elf_stack_put_user(ustr, stack_item++))
return -EFAULT;
len = strnlen_user(ustr, MAX_ARG_STRLEN);
if (!len || len > MAX_ARG_STRLEN)
return -EINVAL;
ustr += len;
str_ret = put_str_ptr(ustr, stack_item++);
if (str_ret < 0)
return str_ret;
} if (elf_stack_put_user(0, stack_item++)) return -EFAULT;ustr += str_ret;
@@ -415,13 +491,10 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, #endif mm->env_start = user_ptr_addr(ustr); while (envc-- > 0) {
size_t len;
if (elf_stack_put_user(ustr, stack_item++))
return -EFAULT;
len = strnlen_user(ustr, MAX_ARG_STRLEN);
if (!len || len > MAX_ARG_STRLEN)
return -EINVAL;
ustr += len;
str_ret = put_str_ptr(ustr, stack_item++);
if (str_ret < 0)
return str_ret;
} if (elf_stack_put_user(0, stack_item++)) return -EFAULT;ustr += str_ret;
diff --git a/fs/exec.c b/fs/exec.c index 892cd911549d..7bc423a6fc9a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -66,6 +66,7 @@ #include <linux/coredump.h> #include <linux/time_namespace.h> #include <linux/user_events.h> +#include <linux/cheri.h> #include <linux/uaccess.h> #include <asm/mmu_context.h> @@ -76,6 +77,12 @@ #include <trace/events/sched.h> +#ifdef CONFIG_CHERI_PURECAP_UABI +#define STACK_ALIGN_SIZE MAX_ARG_STRLEN +#else +#define STACK_ALIGN_SIZE PAGE_SIZE +#endif
static int bprm_creds_from_file(struct linux_binprm *bprm); int suid_dumpable = 0; @@ -515,6 +522,39 @@ static int bprm_stack_limits(struct linux_binprm *bprm) return 0; } +#if defined(CONFIG_CHERI_PURECAP_UABI) +/*
- Compute the a new value for p that allows for creating an exactly
- representable capability with all its padding after.
- */
+size_t align_p_for_exact_bounds(unsigned long p, size_t len)
I would say s/exact/representable/ (it is indeed in order to be able to set exact capability bounds, but representability is the root cause).
+{
- size_t repr_len = 0, repr_align;
- repr_len = cheri_representable_length(len);
- repr_align = ~cheri_representable_alignment_mask(len) + 1;
- if (repr_len > len || repr_align > 0) {
Do we actually need the conditions? It feels like the calculation should work in any case. That certainly appears to be true for the length, I would expect repr_align to compute to 1 if there is no requirement but worth checking. If it is the case, removing the conditions is not only more readable but also probably faster, as the calculation is straightforward arithmetic (better than adding a branch even if it is a no-op).
/*
* We want the capability base to be aligned. As we work from the
* last to the first argument, we can place the start of the string
* to be aligned for representability.
* All padding then goes after it, both for the representable
* length and for the alignment, to fill the space we left before
* the previous argument we put on the stack.
*
* This is slightly different than the usual way representable
* capabilities are created: usually we cannot change the address
* pointed by the capability, so there can be padding between the
* capability base and the capability value.
* Here we can decide where to place the string, so we build a
* capability with equal base and value, with all padding after.
*/
p = ALIGN_DOWN(p - repr_len, repr_align) + len;
- }
- return p;
+} +#endif /* CONFIG_CHERI_PURECAP_UABI */
/*
- 'copy_strings()' copies argument/environment strings from the old
- processes's memory to the new process's stack. The call to get_user_pages()
@@ -530,7 +570,7 @@ static int copy_strings(int argc, struct user_arg_ptr argv, while (argc-- > 0) { const char __user *str;
int len;
unsigned long pos;size_t len;
ret = -EFAULT; @@ -542,6 +582,18 @@ static int copy_strings(int argc, struct user_arg_ptr argv, if (!len) goto out; +#if defined(CONFIG_CHERI_PURECAP_UABI)
The nice thing with having moved everything to a function is that we can easily have a #else there, returning p unconditionally, so that we can avoid this #ifdef.
/*
* Handle all strings as if they were for purecap binaries as we only
* know the calling process compat status, which might be different
* from the binary to be exec'ed.
*
* Start directly at the string end, with all the padding already
* taken into account.
"taken into account" as in skipped? I'm not sure how to interpret this sentence.
Kevin
*/
bprm->p = align_p_for_exact_bounds(bprm->p, len);
+#endif
- ret = -E2BIG; if (!valid_arg_len(bprm, len)) goto out;
@@ -775,14 +827,14 @@ int setup_arg_pages(struct linux_binprm *bprm, if (vma->vm_end - vma->vm_start > stack_base) return -ENOMEM;
- stack_base = PAGE_ALIGN(stack_top - stack_base);
- stack_base = ALIGN(stack_top - stack_base, STACK_ALIGN_SIZE);
stack_shift = vma->vm_start - stack_base; mm->arg_start = bprm->p - stack_shift; bprm->p = vma->vm_end - stack_shift; #else stack_top = arch_align_stack(stack_top);
- stack_top = PAGE_ALIGN(stack_top);
- stack_top = ALIGN(stack_top, STACK_ALIGN_SIZE);
if (unlikely(stack_top < mmap_min_addr) || unlikely(vma->vm_end - vma->vm_start >= stack_top - mmap_min_addr))
On 30/10/2023 15:42, Kevin Brodsky wrote:
On 25/10/2023 18:57, Teo Couprie Diaz wrote:
argv and envp strings can be large enough so that a capability cannot be created to access them exclusively: their bounds would not be exactly representable and go outside of the strings.
During allocation on the stack, align the strings that need an adjusted base and pad if needed for alignment or a representable length. Handle this potential padding in fs/binfmt_elf by detecting unexpected 0s.
"unexpected" is a bit... unexpected for something we actually expect? :D Maybe "extra".
This is done irrespective of the calling binary or loaded binary ABIs. Indeed, at the point where strings are put on the stack during `execve` we can only know the ABI of the calling binary, but not that of the loaded binary. Thus the compat path will be slightly slower and the memory used by the strings will be larger, as if to create capabilities with exact bounds. However, this memory impact should stay minimal and in rare cases where argv or envp strings are more than a couple thousand characters.
Same comment as in v2.
Add a a helper function to put already created capabilities on the stack
s/a a/a/
during elf setup.
Additionally, align the newly created stack to a multiple of the maximum argument length to make sure the relocated arguments will still be respresentable with their new alignment in a PC-uABI kernel.
s/respresentable/representable/
Also I would say "in their new location", because the point is that we preserve a sufficient alignment.
Agree with all comments, sorry for forgetting that one from v2. Fixed !
A big improvement over v2 overall, well done!
Thanks for the comments ! v4 is mostly ready, just need a bit of input regarding one of the comments below.
Signed-off-by: Teo Couprie Diazteo.coupriediaz@arm.com
fs/binfmt_elf.c | 101 +++++++++++++++++++++++++++++++++++++++++------- fs/exec.c | 58 +++++++++++++++++++++++++-- 2 files changed, 142 insertions(+), 17 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 6455313d7f36..198c1ed442a6 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -151,6 +151,84 @@ static int padzero(unsigned long elf_bss) } return 0; } +/*
- Put a pointer or capability to a string on the stack, return the effective
- size of the string on the stack or an error.
- When building a PC-uABI kernel, this will take into account padding and
"PCuABI" without dash (just to make grepping and such more reliable).
I wasn't sure, updated wherever I put a dash !
- create an exactly representable capability if in purecap.
- */
+static long put_str_ptr(char __user *ustr,
elf_stack_item_t __user *stack_item)
+{
- size_t len;
+#if defined(CONFIG_CHERI_PURECAP_UABI)
- char c;
- void __user *str_ptr;
Nit: char __user *
Done
- len = strnlen_user(ustr, MAX_ARG_STRLEN);
- if (!len || len > MAX_ARG_STRLEN)
return -EINVAL;
/*
* Check if the next arg string starts right after the current one.
* If not, the len was not exactly representable and there is padding.
*/
That comment looks misindented.
It was indeed, but removed as per below.
- if (get_user(c, ustr + len))
return -EFAULT;
- if (c == '\0')
/*
* If the string has been adjusted for exact bounds representation,
* its start should already be properly aligned. Get the representable
* length by using the same length that was used during allocation:
* the length of the original string.
* This takes into account the padding due to length change after
* the string, but not that for alignment. Thus we might not end up
* at the start of the next arg. If not, we will need to take a slow
* path to go through the padding.
*/
len = cheri_representable_length(len);
I'm realising: do we even need to do this conditionally? The representable length should always be equal to the length if there is no padding. Executing RRLEN unconditionally is definitely faster than get_user() (and of course that's simpler too).
No I guess you are entirely right.
I think I wanted to not waste an instruction, but I obviously didn't think about it enough because of the get_user taking even more time.
Doing it unconditionally now.
+#if (ELF_COMPAT == 0)
- str_ptr = cheri_perms_and(ustr,
(CHERI_PERM_GLOBAL | CHERI_PERM_STORE | CHERI_PERM_LOAD));
- str_ptr = cheri_bounds_set_exact(str_ptr, len);
- if (put_user_ptr(str_ptr, stack_item++))
return -EFAULT;
+#else
- if (elf_stack_put_user(ustr, stack_item++))
Nit: to reduce duplication some more and make it even clearer what's happening, this line could be common to the two paths, simply setting str_ptr to ustr in compat.
I wasn't sure when preparing v3 as I got a compile time assertion when using what I felt was the most generic way to put the pointer on the stack, with put_user_ptr.
It seems to build when using elf_stack_put_user instead and gets rid of the warning introduced. Testing it looks to be just fine as well, so I just didn't understand those two properly :)
Changed to have elf_stack_put_user outside the #ifdef.
return -EFAULT;
+#endif
- /*
* If right after the end of the argument length we have a zero,
* that means the argument alignment was adjusted in order to create a
* representable capability in purecap, even if we are not loading a
* purecap binary. This padding is added at the end, so find the real
* end by going through the padding.
*/
- if (get_user(c, ustr + len))
return -EFAULT;
- if (c == '\0') {
size_t pad_len = 0;
while (pad_len < MAX_ARG_STRLEN && c == '\0') {
pad_len++;
I think there's an off-by-one, since we increment pad_len, then call get_user(), and then only after that check that pad_len < MAX_ARG_STRLEN. I was actually going to suggest using a for loop instead, which makes such issues less likely.
I'd argue that in this case there's even an off-by len+1 ! I don't think it makes sense to do use pad_len by itself here.
Otherwise agree, there is a mistake.
Also, I'm not sure we need to have that initial get_user() + if. We can directly have a loop, which may exit right away (need to be careful about limit conditions of course).
However I am not sure of the shape you would want for the for loop, specifically if it were to remove the if check and get_user ?
There's two conditions there, is the padding too large and is the character null.
Would you be happy with the following ?
for (pad_len = 0; len + pad_len < MAX_ARG_STRLEN ; pad_len += 1)
With the get_user and a break condtion on != '\0' inside the loop body ?
But that doesn't exit right away, it goes through an iteration (which makes sense to be fair).
So not sure of what you'd expect :)
diff --git a/fs/exec.c b/fs/exec.c index 892cd911549d..7bc423a6fc9a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -66,6 +66,7 @@ #include <linux/coredump.h> #include <linux/time_namespace.h> #include <linux/user_events.h> +#include <linux/cheri.h> #include <linux/uaccess.h> #include <asm/mmu_context.h> @@ -76,6 +77,12 @@ #include <trace/events/sched.h> +#ifdef CONFIG_CHERI_PURECAP_UABI +#define STACK_ALIGN_SIZE MAX_ARG_STRLEN +#else +#define STACK_ALIGN_SIZE PAGE_SIZE +#endif
- static int bprm_creds_from_file(struct linux_binprm *bprm);
int suid_dumpable = 0; @@ -515,6 +522,39 @@ static int bprm_stack_limits(struct linux_binprm *bprm) return 0; } +#if defined(CONFIG_CHERI_PURECAP_UABI) +/*
- Compute the a new value for p that allows for creating an exactly
- representable capability with all its padding after.
- */
+size_t align_p_for_exact_bounds(unsigned long p, size_t len)
I would say s/exact/representable/ (it is indeed in order to be able to set exact capability bounds, but representability is the root cause).
Fair, as mentioned I wasn't sure so happy to change it !
+{
- size_t repr_len = 0, repr_align;
- repr_len = cheri_representable_length(len);
- repr_align = ~cheri_representable_alignment_mask(len) + 1;
- if (repr_len > len || repr_align > 0) {
Do we actually need the conditions? It feels like the calculation should work in any case. That certainly appears to be true for the length, I would expect repr_align to compute to 1 if there is no requirement but worth checking. If it is the case, removing the conditions is not only more readable but also probably faster, as the calculation is straightforward arithmetic (better than adding a branch even if it is a no-op).
I am pretty sure it does, as I did check before adding the repr_align > 0 check.
It certainly makes it much more clear, removing the check.
@@ -542,6 +582,18 @@ static int copy_strings(int argc, struct user_arg_ptr argv, if (!len) goto out; +#if defined(CONFIG_CHERI_PURECAP_UABI)
The nice thing with having moved everything to a function is that we can easily have a #else there, returning p unconditionally, so that we can avoid this #ifdef.
That's fair, maybe not even an #else by moving it inside the function.
/*
* Handle all strings as if they were for purecap binaries as we only
* know the calling process compat status, which might be different
* from the binary to be exec'ed.
*
* Start directly at the string end, with all the padding already
* taken into account.
"taken into account" as in skipped? I'm not sure how to interpret this sentence.
That was the idea yes, changed to "skipped" as it's clearer.
Kevin
*/
bprm->p = align_p_for_exact_bounds(bprm->p, len);
+#endif
- ret = -E2BIG; if (!valid_arg_len(bprm, len)) goto out;
@@ -775,14 +827,14 @@ int setup_arg_pages(struct linux_binprm *bprm, if (vma->vm_end - vma->vm_start > stack_base) return -ENOMEM;
- stack_base = PAGE_ALIGN(stack_top - stack_base);
- stack_base = ALIGN(stack_top - stack_base, STACK_ALIGN_SIZE);
stack_shift = vma->vm_start - stack_base; mm->arg_start = bprm->p - stack_shift; bprm->p = vma->vm_end - stack_shift; #else stack_top = arch_align_stack(stack_top);
- stack_top = PAGE_ALIGN(stack_top);
- stack_top = ALIGN(stack_top, STACK_ALIGN_SIZE);
if (unlikely(stack_top < mmap_min_addr) || unlikely(vma->vm_end - vma->vm_start >= stack_top - mmap_min_addr))
On 09/11/2023 18:30, Teo Couprie Diaz wrote:
+#if (ELF_COMPAT == 0)
- str_ptr = cheri_perms_and(ustr,
(CHERI_PERM_GLOBAL | CHERI_PERM_STORE | CHERI_PERM_LOAD));
- str_ptr = cheri_bounds_set_exact(str_ptr, len);
- if (put_user_ptr(str_ptr, stack_item++))
return -EFAULT;
+#else
- if (elf_stack_put_user(ustr, stack_item++))
Nit: to reduce duplication some more and make it even clearer what's happening, this line could be common to the two paths, simply setting str_ptr to ustr in compat.
I wasn't sure when preparing v3 as I got a compile time assertion when using what I felt was the most generic way to put the pointer on the stack, with put_user_ptr.
It seems to build when using elf_stack_put_user instead and gets rid of the warning introduced. Testing it looks to be just fine as well, so I just didn't understand those two properly :)
I recently simplified elf_stack_put_user() [1], so it should be a little easier to follow what it does. Overall, it should do what you want, that is storing some value in a pointer-sized "stack slot", in both native and compat. In native, it is equivalent to put_user_ptr(), though with a cast so that the first argument can also be some arbitrary integer. In compat, it is an elf_addr_t-sized put_user(), and the casts allow a (native) user pointer to be passed (in PCuABI / compat64 this means that the address will be stored). Using put_user_ptr() directly is not valid in compat, as compat stack slots are smaller than a native user pointer.
[1] https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/...
Changed to have elf_stack_put_user outside the #ifdef.
return -EFAULT;
+#endif
- /*
* If right after the end of the argument length we have a zero,
* that means the argument alignment was adjusted in order to create a
* representable capability in purecap, even if we are not loading a
* purecap binary. This padding is added at the end, so find the real
* end by going through the padding.
*/
- if (get_user(c, ustr + len))
return -EFAULT;
- if (c == '\0') {
size_t pad_len = 0;
while (pad_len < MAX_ARG_STRLEN && c == '\0') {
pad_len++;
I think there's an off-by-one, since we increment pad_len, then call get_user(), and then only after that check that pad_len < MAX_ARG_STRLEN. I was actually going to suggest using a for loop instead, which makes such issues less likely.
I'd argue that in this case there's even an off-by len+1 ! I don't think it makes sense to do use pad_len by itself here.
Otherwise agree, there is a mistake.
Also, I'm not sure we need to have that initial get_user() + if. We can directly have a loop, which may exit right away (need to be careful about limit conditions of course).
However I am not sure of the shape you would want for the for loop, specifically if it were to remove the if check and get_user ?
There's two conditions there, is the padding too large and is the character null.
Would you be happy with the following ?
for (pad_len = 0; len + pad_len < MAX_ARG_STRLEN ; pad_len += 1)
With the get_user and a break condtion on != '\0' inside the loop body ?
But that doesn't exit right away, it goes through an iteration (which makes sense to be fair).
So not sure of what you'd expect :)
I suppose more or less what you suggest! It's pretty standard to have such a for loop (the last part can simply be pad_len++), and then a conditional break inside. My point about exiting right away was probably misleading - I meant that the first get_user() could break out of the loop, but of course that first iteration will always be run.
Kevin
On 13/11/2023 10:31, Kevin Brodsky wrote:
On 09/11/2023 18:30, Teo Couprie Diaz wrote:
+#if (ELF_COMPAT == 0)
- str_ptr = cheri_perms_and(ustr,
(CHERI_PERM_GLOBAL | CHERI_PERM_STORE | CHERI_PERM_LOAD));
- str_ptr = cheri_bounds_set_exact(str_ptr, len);
- if (put_user_ptr(str_ptr, stack_item++))
return -EFAULT;
+#else
- if (elf_stack_put_user(ustr, stack_item++))
Nit: to reduce duplication some more and make it even clearer what's happening, this line could be common to the two paths, simply setting str_ptr to ustr in compat.
I wasn't sure when preparing v3 as I got a compile time assertion when using what I felt was the most generic way to put the pointer on the stack, with put_user_ptr.
It seems to build when using elf_stack_put_user instead and gets rid of the warning introduced. Testing it looks to be just fine as well, so I just didn't understand those two properly :)
I recently simplified elf_stack_put_user() [1], so it should be a little easier to follow what it does. Overall, it should do what you want, that is storing some value in a pointer-sized "stack slot", in both native and compat. In native, it is equivalent to put_user_ptr(), though with a cast so that the first argument can also be some arbitrary integer. In compat, it is an elf_addr_t-sized put_user(), and the casts allow a (native) user pointer to be passed (in PCuABI / compat64 this means that the address will be stored). Using put_user_ptr() directly is not valid in compat, as compat stack slots are smaller than a native user pointer.
[1] https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/...
Makes sense, thanks for going into details.
It does indeed do what I expect with elf_stack_put_user, so all good.
Changed to have elf_stack_put_user outside the #ifdef.
return -EFAULT;
+#endif
- /*
* If right after the end of the argument length we have a zero,
* that means the argument alignment was adjusted in order to create a
* representable capability in purecap, even if we are not loading a
* purecap binary. This padding is added at the end, so find the real
* end by going through the padding.
*/
- if (get_user(c, ustr + len))
return -EFAULT;
- if (c == '\0') {
size_t pad_len = 0;
while (pad_len < MAX_ARG_STRLEN && c == '\0') {
pad_len++;
I think there's an off-by-one, since we increment pad_len, then call get_user(), and then only after that check that pad_len < MAX_ARG_STRLEN. I was actually going to suggest using a for loop instead, which makes such issues less likely.
I'd argue that in this case there's even an off-by len+1 ! I don't think it makes sense to do use pad_len by itself here.
Otherwise agree, there is a mistake.
Also, I'm not sure we need to have that initial get_user() + if. We can directly have a loop, which may exit right away (need to be careful about limit conditions of course).
However I am not sure of the shape you would want for the for loop, specifically if it were to remove the if check and get_user ?
There's two conditions there, is the padding too large and is the character null.
Would you be happy with the following ?
for (pad_len = 0; len + pad_len < MAX_ARG_STRLEN ; pad_len += 1)
With the get_user and a break condtion on != '\0' inside the loop body ?
But that doesn't exit right away, it goes through an iteration (which makes sense to be fair).
So not sure of what you'd expect :)
I suppose more or less what you suggest! It's pretty standard to have such a for loop (the last part can simply be pad_len++), and then a conditional break inside. My point about exiting right away was probably misleading - I meant that the first get_user() could break out of the loop, but of course that first iteration will always be run.
Ah right, ok, I tried to look too deep into it ! Will simply do that then.Thanks for the clarification, should send v4 soon.
Kevin
linux-morello@op-lists.linaro.org