On 23/03/2023 16:58, Luca Vizzarro wrote:
This commit adds support to the futex module to correctly load, store and handle capabilities.
I think this needs some clarifications. Functionally this patch should be a no-op for correct code in PCuABI. What it does is a number of related but separate things: capability-based uaccess for futex operations, appropriate handling of robust list entries, annotations for explicit checking, etc. I think splitting the patch would make it easier to explain what each part does in the commit message.
It would also be nice to mention that this is effectively reverting "arm64: futex: Access user memory via raw 64-bit pointers".
Signed-off-by: Luca Vizzarro Luca.Vizzarro@arm.com
This commit tackles the issue reported at: https://git.morello-project.org/morello/kernel/linux/-/issues/6 Commit also available at: https://git.morello-project.org/Sevenarth/linux/-/commits/morello/futex
The definitions added at arch/arm64/include/asm/futex.h are open to debate and not final. I assume that the mode switch can be important for many other parts of the project and not just futexes. I thought that this could go in cheri.h
This wouldn't be appropriate as A64/C64 is not a CHERI notion, it is completely specific to Morello. It could go to <asm/morello.h>, but I don't expect that we will need to switch to C64 anywhere else so I would keep these in <asm/futex.h> for now.
but I'm not that knowledgeable in these regards. Hence, I am seeking for feedback.
Best, Luca Vizzarro arch/arm64/include/asm/futex.h | 47 ++++++++++++++++++++++++---------- kernel/futex/core.c | 26 +++++++++++-------- kernel/futex/requeue.c | 4 +-- 3 files changed, 50 insertions(+), 27 deletions(-)
diff --git a/arch/arm64/include/asm/futex.h
b/arch/arm64/include/asm/futex.h
index 99d73e8a3175..db85ef63124a 100644 --- a/arch/arm64/include/asm/futex.h +++ b/arch/arm64/include/asm/futex.h @@ -10,6 +10,18 @@ #include <asm/errno.h> +#ifdef CONFIG_CHERI_PURECAP_UABI +#define __SWITCH_TO_C64 " bx #4\n" \ + ".arch morello+c64\n" +#define __SWITCH_TO_A64 " bx #4\n" \
Nit: maybe prefix those with __ASM to make it clear these are assembly strings.
+ ".arch morello\n" +#define __ADDR_REGISTER "+C"
Unfortunately the kernel often says "addr" when it means pointer, like "uaddr" below. It would be too much code to change, but at least in the code we introduce we should try to use the correct terminology. It would also be good to make it clear this is a R/W constraint (because of the +). All together, maybe __ASM_RW_UPTR_CONSTR?
+#else +#define __SWITCH_TO_C64 +#define __SWITCH_TO_A64 +#define __ADDR_REGISTER "+r" +#endif
#define FUTEX_MAX_LOOPS 128 /* What's the largest number you can
think of? */
#define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \ @@ -18,20 +30,24 @@ do { \ \ uaccess_enable_privileged(); \ asm volatile( \ -" prfm pstl1strm, %2\n" \ -"1: ldxr %w1, %2\n" \ + __SWITCH_TO_C64 \ +" prfm pstl1strm, [%2]\n" \ +"1: ldxr %w1, [%2]\n" \ insn "\n" \ -"2: stlxr %w0, %w3, %2\n" \ +"2: stlxr %w0, %w3, [%2]\n" \ " cbz %w0, 3f\n" \ " sub %w4, %w4, %w0\n" \ " cbnz %w4, 1b\n" \ " mov %w0, %w6\n" \ "3:\n" \ " dmb ish\n" \ + __SWITCH_TO_A64 \ _ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0) \ _ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0) \
Just a note in passing as it may not be obvious to other reviewers (and I did think there was an issue here at first): in case one of the instructions (LDXR / STLXR) faults, the fixup machinery (see ex_handler_uaccess_err_zero() in arch/arm64/mm/extable.c) will jump over the loop by moving PC (to label 3:). This works fine in A64, and it also works with this change because: - The faulting instruction and fixup target instruction are both supposed to be executed in C64, so no mode change is required during fixup. - The fixup address does not have its LSB set, which is exactly what we want, even in C64, as return from exception does not behave like a branch. The LSB must be unset, and A64/C64 is instead selected based on SPSR_ELx.C64, saved on exception entry.
Definitely worth testing that this works correctly though, by using futex() on a location that will fault (either because the memory cannot be accessed or because of a failed capability check). It wouldn't hurt to have some explanation in the commit message either (doesn't need to be very detailed).
- : "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp), \ - "+r" (loops) \ + /* FIXME: temporary solution for uaddr. Must be reverted to +Q once + * LLVM supports it for capabilities. */ \
We're trying to use the same format of TODOs to make them easier to find (amongst all the pre-existing TODOs in the kernel...), so it's better if you use "TODO [PCuABI]". Also a small thing, I would use "should" instead of "must", because it should still be functionally correct with this change (though suboptimal).
+ : "=&r" (ret), "=&r" (oldval), __ADDR_REGISTER (uaddr), \ + "=&r" (tmp), "+r" (loops) \ : "r" (oparg), "Ir" (-EAGAIN) \ : "memory"); \ uaccess_disable_privileged(); \ @@ -41,8 +57,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user
*_uaddr)
{ int oldval = 0, ret, tmp; - /* TODO [PCuABI] - perform the access via the user capability */ - u32 *uaddr = (u32 *)user_ptr_addr(__uaccess_mask_ptr(_uaddr)); + u32 __user *uaddr = __uaccess_mask_ptr(_uaddr); if (!access_ok(_uaddr, sizeof(u32))) return -EFAULT; @@ -85,20 +100,20 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32
__user *_uaddr,
int ret = 0; unsigned int loops = FUTEX_MAX_LOOPS; u32 val, tmp; - u32 *uaddr; + u32 __user *uaddr; if (!access_ok(_uaddr, sizeof(u32))) return -EFAULT; - /* TODO [PCuABI] - perform the access via the user capability */ - uaddr = (u32 *)user_ptr_addr(__uaccess_mask_ptr(_uaddr)); + uaddr = __uaccess_mask_ptr(_uaddr); uaccess_enable_privileged(); asm volatile("// futex_atomic_cmpxchg_inatomic\n" -" prfm pstl1strm, %2\n" -"1: ldxr %w1, %2\n" + __SWITCH_TO_C64 +" prfm pstl1strm, [%2]\n" +"1: ldxr %w1, [%2]\n" " sub %w3, %w1, %w5\n" " cbnz %w3, 4f\n" -"2: stlxr %w3, %w6, %2\n" +"2: stlxr %w3, %w6, [%2]\n" " cbz %w3, 3f\n" " sub %w4, %w4, %w3\n" " cbnz %w4, 1b\n" @@ -106,9 +121,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32
__user *_uaddr,
"3:\n" " dmb ish\n" "4:\n" + __SWITCH_TO_A64 _ASM_EXTABLE_UACCESS_ERR(1b, 4b, %w0) _ASM_EXTABLE_UACCESS_ERR(2b, 4b, %w0) - : "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), "+r" (loops) + /* FIXME: temporary solution for uaddr. Must be reverted to +Q once + * LLVM supports it for capabilities. */ + : "+r" (ret), "=&r" (val), __ADDR_REGISTER (uaddr), "=&r" (tmp), + "+r" (loops) : "r" (oldval), "r" (newval), "Ir" (-EAGAIN) : "memory"); uaccess_disable_privileged(); diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 759332a26b5a..1234223f274e 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -31,6 +31,7 @@ * "The futexes are also cursed." * "But they come in a choice of three flavours!" */ +#include <linux/cheri.h> #include <linux/compat.h> #include <linux/jhash.h> #include <linux/pagemap.h> @@ -226,6 +227,8 @@ int get_futex_key(u32 __user *uaddr, bool fshared,
union futex_key *key,
struct address_space *mapping; int err, ro = 0; + /* TODO [PCuABI] - capability checks for uaccess */
/* * The futex address must be "naturally" aligned. */ @@ -411,6 +414,8 @@ int fault_in_user_writeable(u32 __user *uaddr) struct mm_struct *mm = current->mm; int ret; + /* TODO [PCuABI] - capability checks for uaccess */
mmap_read_lock(mm); ret = fixup_user_fault(mm, user_ptr_addr(uaddr), FAULT_FLAG_WRITE, NULL); @@ -750,20 +755,19 @@ static inline int fetch_robust_entry(struct
robust_list __user **entry,
#endif unsigned int *pi) { - unsigned long uentry; + struct robust_list __user *uentry; + ptraddr_t uentry_ptr; - if (get_user(uentry, (unsigned long __user *)head)) + if (get_user_ptr(uentry, head)) return -EFAULT; - /* - * TODO [PCuABI] - pointer conversion to be checked - * Each entry points to either next one or head of the list - * so this should probably operate on capabilities and use - * get_user_ptr instead, or validate the capability prior to - * get_user - */ - *entry = uaddr_to_user_ptr(uentry & ~1UL); - *pi = uentry & 1; + uentry_ptr = user_ptr_addr(uentry); +#ifdef CONFIG_CHERI_PURECAP_UABI + *entry = cheri_address_set(uentry, uentry_ptr & ~1UL); +#else + *entry = uaddr_to_user_ptr(uentry_ptr & ~1UL); +#endif + *pi = uentry_ptr & 1;
This looks functionally correct but I think we can make it simpler, without #ifdef. Note that we don't want to keep using uaddr_to_user_ptr(): it was only introduced because we didn't actually understand the ABI. With your change we now read full user pointers, as we should, and we do not need to create any.
One options is to use USER_PTR_ALIGN_DOWN(), introduced some time ago and described in the user_ptr doc [1]. Clearing the LSB is equivalent to aligning down to the nearest multiple of 2, so USER_PTR_ALIGN_DOWN(uentry, 2) should do the trick.
Like __ADDR_REGISTER, uentry_ptr is badly named, since that's an address. We probably don't need it with that suggestion anyway, user_ptr_addr(uentry) & 1 should be fine.
[1] https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/D...
return 0; } diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c index cba8b1a6a4cc..77b2443a880c 100644 --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -388,7 +388,7 @@ int futex_requeue(u32 __user *uaddr1, unsigned int
flags, u32 __user *uaddr2,
* Requeue PI only works on two distinct uaddrs. This * check is only valid for private futexes. See below. */ - if (uaddr1 == uaddr2) + if (user_ptr_addr(uaddr1) == user_ptr_addr(uaddr2)) return -EINVAL; /* @@ -774,7 +774,7 @@ int futex_wait_requeue_pi(u32 __user *uaddr,
unsigned int flags,
if (!IS_ENABLED(CONFIG_FUTEX_PI)) return -ENOSYS; - if (uaddr == uaddr2) + if (user_ptr_addr(uaddr) == user_ptr_addr(uaddr2))
These two changes have no effect. Pointer comparison only considers the address, even for capabilities.
In some cases we actually want to compare full user pointers, user_ptr_is_same() was introduced for that purpose. However you are correct that in this case it is the address that matters, and as a result we can leave the original code unchanged.
Nice patch overall, I don't think you missed anything!
Kevin
return -EINVAL; if (!bitset)