On Wed, 27 Mar 2024 at 08:19, Kevin Brodsky <kevin.brodsky@arm.com> wrote:
On 26/03/2024 15:50, Joshua Lant wrote:
> There is an alignment issue at the user/kernel boundary in xtables with
> capabilities, encountered in macro XT_ALIGN, in the function
> xt_check_target (with message size of (kernel) and (user) not matching).
> This bug occured when running certain iptables commands in the wireguard test
> script netns.sh. e.g.
> iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 10.0.0.0/24 -j SNAT
> --to 10.0.0.1
>
> Signed-off-by: Joshua Lant joshualant@gmail.com
> ---
>  include/uapi/linux/netfilter/x_tables.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/uapi/linux/netfilter/x_tables.h b/include/uapi/linux/netfilter/x_tables.h
> index 796af83a963a..c53b46118531 100644
> --- a/include/uapi/linux/netfilter/x_tables.h
> +++ b/include/uapi/linux/netfilter/x_tables.h
> @@ -95,6 +95,7 @@ struct _xt_align {
>       __u16 u16;
>       __u32 u32;
>       __u64 u64;
> +        __uintcap_t ucap;

I can see how this sort of change might be necessary - if any of the
netfilter structs contains user pointers. However I did not manage to
find such a struct, either in those mentioned in the comment above (e.g.
ipt_entry) or others under include/uapi/linux/netfilter. Do you have an
idea which struct causes the issue? The error message printed by
xt_check_target() should be helpful.

On a separate note, we cannot use __uintcap_t unconditionally in uapi
headers - they need to be fully backwards-compatible. I think using void
__user * would work just fine here (we want to ensure user pointer
alignment regardless of the ABI, and this achieves exactly that).

Kevin

>  };

>  #define XT_ALIGN(s) __ALIGN_KERNEL((s), __alignof__(struct _xt_align))

Hi Kevin,

Thanks for reviewing the patches. Sorry for the long post here…

So the bug presents itself in iptables using this function which contains a sendmsg syscall:

mnl_nft_socket_sendmsg()

https://git.netfilter.org/iptables/tree/iptables/nft.c#n193

Forgive my ignorance, but the use of the __user annotation is a little unclear to me… Is it generally regarded as a requirement on all pointers passed between userspace and kernel space, or is it an optional way to provide additional safety to stop developers accidentally dereferencing user pointers?

Is this issue related to the fact that the sendmsg syscall is not supported in the transitional purecap ABI? And so could present itself in other places? I.e. my patch fixes my specific problem, but not necessarily all cases that someone may encounter around sendmsg? If this is the case then what sort of work would be required? Is adding support for this a task that I could reasonably undertake?

Here is the backtrace from the kernel at the point where the error message is printed, along with the message:

[  218.297499] Call trace:
[  218.299931]  dump_backtrace+0xec/0x140
[  218.303671]  show_stack+0x1c/0x30
[  218.306972]  dump_stack_lvl+0x5c/0x78
[  218.310623]  dump_stack+0x14/0x20
[  218.313925]  xt_check_target+0x1dc/0x270
[  218.317848]  nft_target_init+0x1f8/0x2c0
[  218.321759]  nf_tables_newrule+0x6d0/0x920
[  218.325849]  nfnetlink_rcv+0x4b8/0x820
[  218.329586]  netlink_unicast_kernel+0xc4/0x190
[  218.334017]  netlink_unicast+0x100/0x1b0
[  218.337927]  netlink_sendmsg+0x2c4/0x3b8
[  218.341837]  ____sys_sendmsg+0x1a0/0x288
[  218.345747]  __sys_sendmsg+0x154/0x1c8
[  218.349506]  __arm64_sys_sendmsg+0x28/0x38
[  218.353589]  el0_svc_common+0xd8/0x140
[  218.357326]  do_el0_svc+0x4c/0xb8
[  218.360629]  el0_svc+0x24/0x58
[  218.363670]  el0t_64_sync_handler+0x7c/0xe8
[  218.367839]  el0t_64_sync+0x1c0/0x1c8
[  218.371505] x_tables: ip_tables: SNAT.0 target: invalid size 24 (kernel) != (user) 32

It is unclear to me at this point what the actual message is that’s being sent (i.e. what is causing the size discrepancy). The user size is determined in nft_target_init(), by use of the nla_len() function. The len is determined from the total size-header size, so it is the payload I guess that contains an offending capability? However, I have not been able to dig enough yet to find the payload that is sent.

https://elixir.bootlin.com/linux/v6.4/source/net/netfilter/nft_compat.c#L236

(As a complete aside… In examining the backtrace, I see that this “__arm64_sys_sendmsg” function is not referenced anywhere when i grep the kernel… I don’t really understand where that has come from?)


In regards to: “On a separate note, we cannot use __uintcap_t unconditionally in uapi headers ”...

Apologies, I had originally made this conditional using #ifdef CONFIG_CHERI_PURECAP_UABI, but the compiler complained about “leak CONFIG_CHERI_PURECAP_UABI to user-space” so i did not include it…
I have just tried again with #ifdef __CHERI_PURE_CAPABILITY__ and it compiles just fine.

Is this preferable with using __unitcap_t? Or do you mean that by using void __user * instead it would not be required to be conditional at all?

Cheers,

Josh