On Tue, 2 Apr 2024 at 13:03, Kevin Brodsky <kevin.brodsky@arm.com> wrote:
On 28/03/2024 17:20, josh lant wrote:
> On Wed, 27 Mar 2024 at 08:19, Kevin Brodsky <kevin.brodsky@arm.com> wrote:
>
>     On 26/03/2024 15:50, Joshua Lant wrote:
>     > There is an alignment issue at the user/kernel boundary in
>     xtables with
>     > capabilities, encountered in macro XT_ALIGN, in the function
>     > xt_check_target (with message size of (kernel) and (user) not
>     matching).
>     > This bug occured when running certain iptables commands in the
>     wireguard test
>     > script netns.sh. e.g.
>     > iptables -t nat -A POSTROUTING -s 192.168.1.0/24
>     <http://192.168.1.0/24> -d 10.0.0.0/24 <http://10.0.0.0/24> -j SNAT
>     > --to 10.0.0.1
>     >
>     > Signed-off-by: Joshua Lant joshualant@gmail.com
>     > ---
>     >  include/uapi/linux/netfilter/x_tables.h | 1 +
>     >  1 file changed, 1 insertion(+)
>     >
>     > diff --git a/include/uapi/linux/netfilter/x_tables.h
>     b/include/uapi/linux/netfilter/x_tables.h
>     > index 796af83a963a..c53b46118531 100644
>     > --- a/include/uapi/linux/netfilter/x_tables.h
>     > +++ b/include/uapi/linux/netfilter/x_tables.h
>     > @@ -95,6 +95,7 @@ struct _xt_align {
>     >       __u16 u16;
>     >       __u32 u32;
>     >       __u64 u64;
>     > +        __uintcap_t ucap;
>
>     I can see how this sort of change might be necessary - if any of the
>     netfilter structs contains user pointers. However I did not manage to
>     find such a struct, either in those mentioned in the comment above
>     (e.g.
>     ipt_entry) or others under include/uapi/linux/netfilter. Do you
>     have an
>     idea which struct causes the issue? The error message printed by
>     xt_check_target() should be helpful.
>
>     On a separate note, we cannot use __uintcap_t unconditionally in uapi
>     headers - they need to be fully backwards-compatible. I think
>     using void
>     __user * would work just fine here (we want to ensure user pointer
>     alignment regardless of the ABI, and this achieves exactly that).
>
>     Kevin
>
>     >  };
>     > 
>     >  #define XT_ALIGN(s) __ALIGN_KERNEL((s), __alignof__(struct
>     _xt_align))
>
>
> Hi Kevin,
>
> Thanks for reviewing the patches. Sorry for the long post here…

On the contrary, thank you for providing those details, more is better
than less!

>
> So the bug presents itself in iptables using this function which
> contains a sendmsg syscall:
>
> mnl_nft_socket_sendmsg()
>
> https://git.netfilter.org/iptables/tree/iptables/nft.c#n193
>
> Forgive my ignorance, but the use of the __user annotation is a little
> unclear to me… Is it generally regarded as a requirement on all
> pointers passed between userspace and kernel space, or is it an
> optional way to provide additional safety to stop developers
> accidentally dereferencing user pointers?

In mainline, __user is an annotation that can be used by tools such as
sparse, but has otherwise no effect. In Morello Linux, when PCuABI is
selected, it is actually used to turn all user pointers into
capabilities, and it is therefore essential to use it for all user
pointers. You may find it useful to have a look at the PCuABI overview
in the tree [1].

>
> Is this issue related to the fact that the sendmsg syscall is not
> supported in the transitional purecap ABI? And so could present itself
> in other places? I.e. my patch fixes my specific problem, but not
> necessarily all cases that someone may encounter around sendmsg? If
> this is the case then what sort of work would be required? Is adding
> support for this a task that I could reasonably undertake?

The transitional ABI is not really relevant any more, as we are now able
to test most syscalls and are not aware of regressions in PCuABI (which
of course doesn't meant there isn't any!). The documentation does need
updating, this is something that is on my TODO list. Like the other
syscalls, sendmsg() is expected to work as normal, though our testing is
by no means exhaustive and you may well have found a defect here.

>
> Here is the backtrace from the kernel at the point where the error
> message is printed, along with the message:
>
> [  218.297499] Call trace:
> [  218.299931]  dump_backtrace+0xec/0x140
> [  218.303671]  show_stack+0x1c/0x30
> [  218.306972]  dump_stack_lvl+0x5c/0x78
> [  218.310623]  dump_stack+0x14/0x20
> [  218.313925]  xt_check_target+0x1dc/0x270
> [  218.317848]  nft_target_init+0x1f8/0x2c0
> [  218.321759]  nf_tables_newrule+0x6d0/0x920
> [  218.325849]  nfnetlink_rcv+0x4b8/0x820
> [  218.329586]  netlink_unicast_kernel+0xc4/0x190
> [  218.334017]  netlink_unicast+0x100/0x1b0
> [  218.337927]  netlink_sendmsg+0x2c4/0x3b8
> [  218.341837]  ____sys_sendmsg+0x1a0/0x288
> [  218.345747]  __sys_sendmsg+0x154/0x1c8
> [  218.349506]  __arm64_sys_sendmsg+0x28/0x38
> [  218.353589]  el0_svc_common+0xd8/0x140
> [  218.357326]  do_el0_svc+0x4c/0xb8
> [  218.360629]  el0_svc+0x24/0x58
> [  218.363670]  el0t_64_sync_handler+0x7c/0xe8
> [  218.367839]  el0t_64_sync+0x1c0/0x1c8
> [  218.371505] x_tables: ip_tables: SNAT.0 target: invalid size 24
> (kernel) != (user) 32

Luckily this error allows us to find the relevant struct easily.
"SNAT.0" shows that the relevant struct xt_target is this one [2]. From
there we find that the corresponding uapi struct is
nf_nat_ipv4_multi_range_compat [3]. Its size is 20 regardless of the ABI
(only 16- and 32-bit integers), so its 8-byte aligned size is 24, as
printed.

My suspicion is that the issue lies somewhere in iptables rather than in
the kernel itself, considering that this struct is clearly fixed-size.
It could be that the layout of the message/header that iptables uses
doesn't match what the kernel expects in purecap, and as a result the
size the kernel reads is not what iptables set. However, I know
essentially nothing about iptables or netfilter, so I'm afraid I won't
be of much help to investigate this further...

[1]
https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/Documentation/cheri/pcuabi.rst
[2]
https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/net/netfilter/xt_nat.c#L152
[3]
https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/include/uapi/linux/netfilter/nf_nat.h#L33
[4]
https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/arch/arm64/include/asm/syscall_wrapper.h#L164

>
> It is unclear to me at this point what the actual message is that’s
> being sent (i.e. what is causing the size discrepancy). The user size
> is determined in nft_target_init(), by use of the nla_len() function.
> The len is determined from the total size-header size, so it is the
> payload I guess that contains an offending capability? However, I have
> not been able to dig enough yet to find the payload that is sent.
>
> https://elixir.bootlin.com/linux/v6.4/source/net/netfilter/nft_compat.c#L236
>
> (As a complete aside… In examining the backtrace, I see that this
> “__arm64_sys_sendmsg” function is not referenced anywhere when i grep
> the kernel… I don’t really understand where that has come from?)

The syscall wrapper machinery does a lot of magic indeed! [4]

>
> In regards to: “On a separate note, we cannot use __uintcap_t
> unconditionally in uapi headers ”...
>
> Apologies, I had originally made this conditional using #ifdef
> CONFIG_CHERI_PURECAP_UABI, but the compiler complained about “leak
> CONFIG_CHERI_PURECAP_UABI to user-space” so i did not include it…

Indeed, CONFIG macros cannot be used in uapi headers, because userspace
cannot know their value when building against the headers.

> I have just tried again with #ifdef __CHERI_PURE_CAPABILITY__ and it
> compiles just fine.

We cannot use that either, because the kernel is not purecap itself.
That's why typing is so tricky with PCuABI in uapi headers :/

>
> Is this preferable with using __unitcap_t? Or do you mean that by
> using void __user * instead it would not be required to be conditional
> at all?

void __user * works unconditionally. From a kernel perspective, it is
always a user pointer (capability in PCuABI). From a user perspective,
__user is defined to nothing, so it is a normal pointer (also a
capability in purecap).

Hope that clarifies the situation a little!

Kevin

>
> Cheers,
>
> Josh
>  


Hi Kevin,

Thanks for your input here. I appreciate you taking the time to go through it bit by bit...

I guess I need to do some more digging. So should I not submit a v2 for this patch until I am able to have a more concrete answer about the cause of the issue? Or should I do that regardless, the update with just the void __user *, and not submitting the defconfig patch above at all (I only found 1 overlapping setting between the standard arm64 config and my updated one)?

Cheers,

Josh