Hi,
This RFC proposes changes to the bpf syscall to support propagating user pointers as capabilities in the pure-capability kernel-user ABI (PCuABI). It also includes an approach to supporting the existing aarch64 ABI as a COMPAT layer (compat64).
Since the bpf syscall is multiplexed, this RFC changes only the BPF_PROG_LOAD option. The idea is to agree on the general approach and gather feedback here before the remaining options are changed to fully support the syscall.
Since the changes are incomplete, functions in this series suffixed with _fixed will eventually replace the originals they update. The originals remain in place for now.
Patches 8 and 9 provide two potential solutions to the same problem. I don't know enough about LSMs to decide between them, so any input there is appreciated.
A basic LTP test that does only a minimal bpf(BPF_PROG_LOAD...) syscall has been created for this RFC to verify these changes. It also tests the new CHECK_ATTR macro implementation.
Kernel branch available at: https://git.morello-project.org/zdleaf/linux/-/commits/rfc/bpf
LTP test at: https://git.morello-project.org/zdleaf/morello-linux-test-project/-/commits/...
Thanks,
Zach
Zachary Leaf (9):
  arm64: morello: enable eBPF and tracing
  bpf/net: copy ptrs from user with bpf/sockptr_t
  bpf: extend bpfptr_t to use pointers
  bpf: use user pointer types for bpf_attr
  bpf: add compat64 handling of bpf syscall
  bpf: make CHECK_ATTR support compat
  temp for RFC: add padding to BPF_PROG_LOAD
  bpf/security: update bpf lsm API
  bpf/security: move lsm hook point
 .../morello_transitional_pcuabi_defconfig |  15 ++-
 include/linux/bpfptr.h                    |  19 ++++
 include/linux/sockptr.h                   |   9 ++
 include/uapi/linux/bpf.h                  |  55 +++++++++--
 kernel/bpf/syscall.c                      |  95 ++++++++++++++---
 kernel/bpf/verifier.c                     |  10 +-
 6 files changed, 176 insertions(+), 27 deletions(-)
-- 2.34.1
Enable support for running eBPF in the kernel. JIT compilation is disabled for now.
In addition enable syscall tracing options that are required to use BPF_PROG_TYPE_TRACEPOINT.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 .../configs/morello_transitional_pcuabi_defconfig | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/configs/morello_transitional_pcuabi_defconfig b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
index 2c0798c18553..0d32128d6610 100644
--- a/arch/arm64/configs/morello_transitional_pcuabi_defconfig
+++ b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
@@ -145,5 +145,18 @@ CONFIG_MAGIC_SYSRQ=y
 CONFIG_DEBUG_FS=y
 # CONFIG_SCHED_DEBUG is not set
 # CONFIG_DEBUG_PREEMPT is not set
-# CONFIG_FTRACE is not set
 CONFIG_MEMTEST=y
+# BPF
+CONFIG_BPF=y
+CONFIG_BPF_SYSCALL=y
+CONFIG_NET_CLS_BPF=m
+CONFIG_NET_ACT_BPF=m
+CONFIG_BPF_JIT=n
+CONFIG_HAVE_EBPF_JIT=y
+# for BPF tracepoint support
+CONFIG_BPF_EVENTS=y
+CONFIG_TRACEPOINTS=y
+CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
+CONFIG_PERF_EVENTS=y
+CONFIG_FTRACE=y
+CONFIG_FTRACE_SYSCALLS=y
On 15/11/2022 09:08, Zachary Leaf wrote:
Enable support for running eBPF in the kernel. JIT compilation is disabled for now.
In addition enable syscall tracing options that are required to use BPF_PROG_TYPE_TRACEPOINT.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
.../configs/morello_transitional_pcuabi_defconfig | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/configs/morello_transitional_pcuabi_defconfig b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
index 2c0798c18553..0d32128d6610 100644
--- a/arch/arm64/configs/morello_transitional_pcuabi_defconfig
+++ b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
@@ -145,5 +145,18 @@ CONFIG_MAGIC_SYSRQ=y
 CONFIG_DEBUG_FS=y
 # CONFIG_SCHED_DEBUG is not set
 # CONFIG_DEBUG_PREEMPT is not set
-# CONFIG_FTRACE is not set
 CONFIG_MEMTEST=y
+# BPF
+CONFIG_BPF=y
+CONFIG_BPF_SYSCALL=y
+CONFIG_NET_CLS_BPF=m
+CONFIG_NET_ACT_BPF=m
+CONFIG_BPF_JIT=n
Is this form valid? Shouldn't it be the following?
# CONFIG_BPF_JIT is not set
Or does =n force it to be disabled anyway?
Thanks, Téo
+CONFIG_HAVE_EBPF_JIT=y
+# for BPF tracepoint support
+CONFIG_BPF_EVENTS=y
+CONFIG_TRACEPOINTS=y
+CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
+CONFIG_PERF_EVENTS=y
+CONFIG_FTRACE=y
+CONFIG_FTRACE_SYSCALLS=y
On 15/11/2022 13:56, Teo Couprie Diaz wrote:
On 15/11/2022 09:08, Zachary Leaf wrote:
Enable support for running eBPF in the kernel. JIT compilation is disabled for now.
In addition enable syscall tracing options that are required to use BPF_PROG_TYPE_TRACEPOINT.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
.../configs/morello_transitional_pcuabi_defconfig | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/configs/morello_transitional_pcuabi_defconfig b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
index 2c0798c18553..0d32128d6610 100644
--- a/arch/arm64/configs/morello_transitional_pcuabi_defconfig
+++ b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
@@ -145,5 +145,18 @@ CONFIG_MAGIC_SYSRQ=y
 CONFIG_DEBUG_FS=y
 # CONFIG_SCHED_DEBUG is not set
 # CONFIG_DEBUG_PREEMPT is not set
-# CONFIG_FTRACE is not set
 CONFIG_MEMTEST=y
+# BPF
+CONFIG_BPF=y
+CONFIG_BPF_SYSCALL=y
+CONFIG_NET_CLS_BPF=m
+CONFIG_NET_ACT_BPF=m
+CONFIG_BPF_JIT=n
Is this form valid? Shouldn't it be the following?
I think so as per https://www.kernel.org/doc/html/latest/kbuild/kconfig-language.html
# CONFIG_BPF_JIT is not set
This is just a comment, so I don't believe it affects anything. It just means the option wasn't explicitly given when the config was generated. If an option is not set, it depends what the default for a given option is.
As far as I can tell CONFIG_BPF_JIT is enabled by default on arm64 as per kernel/bpf/Kconfig + arch/arm64/Kconfig.
This is my understanding based on the docs/above files.
In any case it probably doesn't hurt to be explicit with an =n.
I haven't looked into the JIT yet or how it works, so I thought it better to disable it for now. I should probably add that to the commit message.
Thanks, Zach
Or does =n force it to be disabled anyway?
Thanks, Téo
+CONFIG_HAVE_EBPF_JIT=y
+# for BPF tracepoint support
+CONFIG_BPF_EVENTS=y
+CONFIG_TRACEPOINTS=y
+CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
+CONFIG_PERF_EVENTS=y
+CONFIG_FTRACE=y
+CONFIG_FTRACE_SYSCALLS=y
On 15-11-2022 09:08, Zachary Leaf wrote:
Enable support for running eBPF in the kernel. JIT compilation is disabled for now.
In addition enable syscall tracing options that are required to use BPF_PROG_TYPE_TRACEPOINT.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
.../configs/morello_transitional_pcuabi_defconfig | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/configs/morello_transitional_pcuabi_defconfig b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
index 2c0798c18553..0d32128d6610 100644
--- a/arch/arm64/configs/morello_transitional_pcuabi_defconfig
+++ b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
@@ -145,5 +145,18 @@ CONFIG_MAGIC_SYSRQ=y
 CONFIG_DEBUG_FS=y
 # CONFIG_SCHED_DEBUG is not set
 # CONFIG_DEBUG_PREEMPT is not set
-# CONFIG_FTRACE is not set
 CONFIG_MEMTEST=y
+# BPF
Comments are nice, but are not typical to defconfigs.
+CONFIG_BPF=y
+CONFIG_BPF_SYSCALL=y
This one is already set on line 6.
+CONFIG_NET_CLS_BPF=m
+CONFIG_NET_ACT_BPF=m
Are we loading modules? I am not sure why we need these.
+CONFIG_BPF_JIT=n
+CONFIG_HAVE_EBPF_JIT=y
+# for BPF tracepoint support
+CONFIG_BPF_EVENTS=y
+CONFIG_TRACEPOINTS=y
+CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
+CONFIG_PERF_EVENTS=y
+CONFIG_FTRACE=y
+CONFIG_FTRACE_SYSCALLS=y
Whenever you modify a defconfig, it's useful to also run "make savedefconfig" and compare the created defconfig with the one you modified. In that way, you can identify if there are conflicts, duplicates, or redundant entries.
Thanks, Tudor
On 16/11/2022 14:50, Tudor Cretu wrote:
On 15-11-2022 09:08, Zachary Leaf wrote:
Enable support for running eBPF in the kernel. JIT compilation is disabled for now.
In addition enable syscall tracing options that are required to use BPF_PROG_TYPE_TRACEPOINT.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
.../configs/morello_transitional_pcuabi_defconfig | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/configs/morello_transitional_pcuabi_defconfig b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
index 2c0798c18553..0d32128d6610 100644
--- a/arch/arm64/configs/morello_transitional_pcuabi_defconfig
+++ b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
@@ -145,5 +145,18 @@ CONFIG_MAGIC_SYSRQ=y
 CONFIG_DEBUG_FS=y
 # CONFIG_SCHED_DEBUG is not set
 # CONFIG_DEBUG_PREEMPT is not set
-# CONFIG_FTRACE is not set
 CONFIG_MEMTEST=y
+# BPF
Comments are nice, but are not typical to defconfigs.
Ack
+CONFIG_BPF=y
+CONFIG_BPF_SYSCALL=y
This one is already set on line 6.
Ack
+CONFIG_NET_CLS_BPF=m
+CONFIG_NET_ACT_BPF=m
Are we loading modules? I am not sure why we need these.
You're right, it doesn't appear that we're building anything as a non-builtin module currently (i.e. no ...=m), at least based on my .config. It doesn't look like these modules are even built for whatever reason - build-scripts probably need updating. Do modules work?
As for its purpose: it enables network traffic-control classifier/actions, i.e. classifying packets and taking some action based on the result. eBPF can be used for both of those via the BPF_PROG_TYPE_SCHED_CLS/..._SCHED_ACT program types.
man 8 tc-bpf: https://man7.org/linux/man-pages/man8/tc-bpf.8.html
This article gives a good intro/overview: https://qmonnet.github.io/whirl-offload/2020/04/11/tc-bpf-direct-action/
In any case I didn't check or test this functionality so will remove for now in v2 unless anyone has different opinions.
+CONFIG_BPF_JIT=n
+CONFIG_HAVE_EBPF_JIT=y
+# for BPF tracepoint support
+CONFIG_BPF_EVENTS=y
+CONFIG_TRACEPOINTS=y
+CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
+CONFIG_PERF_EVENTS=y
+CONFIG_FTRACE=y
+CONFIG_FTRACE_SYSCALLS=y
Whenever you modify a defconfig, it's useful to also run "make savedefconfig" and compare the created defconfig with the one you modified. In that way, you can identify if there are conflicts, duplicates, or redundant entries.
Good tip.
Thanks, Zach
Thanks,
Tudor

_______________________________________________
linux-morello mailing list -- linux-morello@op-lists.linaro.org
To unsubscribe send an email to linux-morello-leave@op-lists.linaro.org
On 17/11/2022 13:53, Zachary Leaf wrote:
On 16/11/2022 14:50, Tudor Cretu wrote:
On 15-11-2022 09:08, Zachary Leaf wrote:
Enable support for running eBPF in the kernel. JIT compilation is disabled for now.
In addition enable syscall tracing options that are required to use BPF_PROG_TYPE_TRACEPOINT.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
.../configs/morello_transitional_pcuabi_defconfig | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/configs/morello_transitional_pcuabi_defconfig b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
index 2c0798c18553..0d32128d6610 100644
--- a/arch/arm64/configs/morello_transitional_pcuabi_defconfig
+++ b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
@@ -145,5 +145,18 @@ CONFIG_MAGIC_SYSRQ=y
 CONFIG_DEBUG_FS=y
 # CONFIG_SCHED_DEBUG is not set
 # CONFIG_DEBUG_PREEMPT is not set
-# CONFIG_FTRACE is not set
 CONFIG_MEMTEST=y
+# BPF
Comments are nice, but are not typical to defconfigs.
Ack
+CONFIG_BPF=y
Does this actually build right now? If not, in the complete series this patch should be last, or at any rate at a point where it does not stop the kernel from building. This is mainly for the purpose of bisectability - no commit should introduce a regression.
+CONFIG_BPF_SYSCALL=y
This one is already set on line 6.
Ack
+CONFIG_NET_CLS_BPF=m
+CONFIG_NET_ACT_BPF=m
Are we loading modules? I am not sure why we need these.
You're right, it doesn't appear that we're building anything as a non-builtin module currently (i.e. no ...=m), at least based on my .config. It doesn't look like these modules are even built for whatever reason - build-scripts probably need updating. Do modules work?
Not at the minute, in fact I have bumped into that issue again just yesterday and created a ticket to track this [1].
[1] https://git.morello-project.org/morello/kernel/linux/-/issues/41
As for its purpose: it enables network traffic-control classifier/actions, i.e. classifying packets and taking some action based on the result. eBPF can be used for both of those via the BPF_PROG_TYPE_SCHED_CLS/..._SCHED_ACT program types.
man 8 tc-bpf: https://man7.org/linux/man-pages/man8/tc-bpf.8.html
This article gives a good intro/overview: https://qmonnet.github.io/whirl-offload/2020/04/11/tc-bpf-direct-action/
In any case I didn't check or test this functionality so will remove for now in v2 unless anyone has different opinions.
Makes sense, that may be a bit too much for the defconfig.
+CONFIG_BPF_JIT=n
+CONFIG_HAVE_EBPF_JIT=y
+# for BPF tracepoint support
+CONFIG_BPF_EVENTS=y
+CONFIG_TRACEPOINTS=y
+CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
+CONFIG_PERF_EVENTS=y
+CONFIG_FTRACE=y
+CONFIG_FTRACE_SYSCALLS=y
Whenever you modify a defconfig, it's useful to also run "make savedefconfig" and compare the created defconfig with the one you modified. In that way, you can identify if there are conflicts, duplicates, or redundant entries.
Good tip.
In fact this is exactly how the defconfig should be updated: run `make savedefconfig`, then overwrite the defconfig with its output. This creates a "canonical" defconfig that is stable for a given set of options. It does mean you can't customise it in any way, though.
On that note, Téo is right: `# CONFIG_XYZ is not set` is used instead of `CONFIG_XYZ=n` in config files (yes such comments do do something, and no I don't know why it is done this way...).
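As a concrete illustration (a hypothetical fragment, not taken from the actual defconfig), a generated .config expresses a disabled and an enabled option like this; the Kconfig tooling parses the `# ... is not set` comment as an explicit "no", which is why it is more than a plain comment:

```
# CONFIG_BPF_JIT is not set
CONFIG_FTRACE=y
```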
Kevin
On 22/11/2022 08:21, Kevin Brodsky wrote:
On 17/11/2022 13:53, Zachary Leaf wrote:
On 16/11/2022 14:50, Tudor Cretu wrote:
On 15-11-2022 09:08, Zachary Leaf wrote:
Enable support for running eBPF in the kernel. JIT compilation is disabled for now.
In addition enable syscall tracing options that are required to use BPF_PROG_TYPE_TRACEPOINT.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
.../configs/morello_transitional_pcuabi_defconfig | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/configs/morello_transitional_pcuabi_defconfig b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
index 2c0798c18553..0d32128d6610 100644
--- a/arch/arm64/configs/morello_transitional_pcuabi_defconfig
+++ b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
@@ -145,5 +145,18 @@ CONFIG_MAGIC_SYSRQ=y
 CONFIG_DEBUG_FS=y
 # CONFIG_SCHED_DEBUG is not set
 # CONFIG_DEBUG_PREEMPT is not set
-# CONFIG_FTRACE is not set
 CONFIG_MEMTEST=y
+# BPF
Comments are nice, but are not typical to defconfigs.
Ack
+CONFIG_BPF=y
Does this actually build right now? If not, in the complete series this patch should be last, or at any rate at a point where it does not stop the kernel from building. This is mainly for the purpose of bisectability - no commit should introduce a regression.
It does, or at least it should. CONFIG_BPF is selected[1] by CONFIG_BPF_SYSCALL=y, which has already been enabled here since
d0edd12ea75f ("morello: Enable docker in defconfig")
BPF actually works as is currently without these patches since we're not enforcing capabilities. Every cap from userspace just gets cut down to a 64b address and it all appears to work no problem.
Point taken on regression, all these commits should at least compile, I have tested that.
1: https://github.com/torvalds/linux/blob/v5.18/kernel/bpf/Kconfig#L28
+CONFIG_BPF_SYSCALL=y
This one is already set on line 6.
Ack
+CONFIG_NET_CLS_BPF=m
+CONFIG_NET_ACT_BPF=m
Are we loading modules? I am not sure why we need these.
You're right, it doesn't appear that we're building anything as a non-builtin module currently (i.e. no ...=m), at least based on my .config. It doesn't look like these modules are even built for whatever reason - build-scripts probably need updating. Do modules work?
Not at the minute, in fact I have bumped into that issue again just yesterday and created a ticket to track this [1].
[1] https://git.morello-project.org/morello/kernel/linux/-/issues/41
As for its purpose: it enables network traffic-control classifier/actions, i.e. classifying packets and taking some action based on the result. eBPF can be used for both of those via the BPF_PROG_TYPE_SCHED_CLS/..._SCHED_ACT program types.
man 8 tc-bpf: https://man7.org/linux/man-pages/man8/tc-bpf.8.html
This article gives a good intro/overview: https://qmonnet.github.io/whirl-offload/2020/04/11/tc-bpf-direct-action/
In any case I didn't check or test this functionality so will remove for now in v2 unless anyone has different opinions.
Makes sense, that may be a bit too much for the defconfig.
+CONFIG_BPF_JIT=n
+CONFIG_HAVE_EBPF_JIT=y
+# for BPF tracepoint support
+CONFIG_BPF_EVENTS=y
+CONFIG_TRACEPOINTS=y
+CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
+CONFIG_PERF_EVENTS=y
+CONFIG_FTRACE=y
+CONFIG_FTRACE_SYSCALLS=y
Whenever you modify a defconfig, it's useful to also run "make savedefconfig" and compare the created defconfig with the one you modified. In that way, you can identify if there are conflicts, duplicates, or redundant entries.
Good tip.
In fact this is exactly how the defconfig should be updated: run `make savedefconfig`, then overwrite the defconfig with its output. This creates a "canonical" defconfig that is stable for a given set of options. It does mean you can't customise it in any way, though.
On that note, Téo is right: `# CONFIG_XYZ is not set` is used instead of `CONFIG_XYZ=n` in config files (yes such comments do do something, and no I don't know why it is done this way...).
Okay, thanks for the info on that; I wasn't aware of how defconfig worked. There are a lot of redundant additions here in that case; the only thing that actually needs to be here is:
CONFIG_FTRACE_SYSCALLS=y
That means this commit can be a lot smaller and it shouldn't really matter if it's at the start or end of the series since the ftrace series is already merged.
Thanks, Zach
Kevin
In PCuABI, when copying a block of memory from userspace containing capabilities/pointers, the copy_from_user_with_ptr variant needs to be used to ensure pointers are preserved in full.
Introduce copy_from_x_with_ptr methods for bpfptr_t and sockptr_t to support this.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 include/linux/bpfptr.h  | 11 +++++++++++
 include/linux/sockptr.h |  9 +++++++++
 2 files changed, 20 insertions(+)
diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h
index 46e1757d06a3..abb5b3641f5d 100644
--- a/include/linux/bpfptr.h
+++ b/include/linux/bpfptr.h
@@ -52,11 +52,22 @@ static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src,
 	return copy_from_sockptr_offset(dst, (sockptr_t) src, offset, size);
 }
 
+static inline int copy_from_bpfptr_offset_with_ptr(void *dst, bpfptr_t src,
+						   size_t offset, size_t size)
+{
+	return copy_from_sockptr_offset_with_ptr(dst, (sockptr_t) src, offset, size);
+}
+
 static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size)
 {
 	return copy_from_bpfptr_offset(dst, src, 0, size);
 }
 
+static inline int copy_from_bpfptr_with_ptr(void *dst, bpfptr_t src, size_t size)
+{
+	return copy_from_bpfptr_offset_with_ptr(dst, src, 0, size);
+}
+
 static inline int copy_to_bpfptr_offset(bpfptr_t dst, size_t offset,
 					const void *src, size_t size)
 {
diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h
index ea193414298b..e07da559327f 100644
--- a/include/linux/sockptr.h
+++ b/include/linux/sockptr.h
@@ -50,6 +50,15 @@ static inline int copy_from_sockptr_offset(void *dst, sockptr_t src,
 	return 0;
 }
 
+static inline int copy_from_sockptr_offset_with_ptr(void *dst, sockptr_t src,
+						    size_t offset, size_t size)
+{
+	if (!sockptr_is_kernel(src))
+		return copy_from_user_with_ptr(dst, src.user + offset, size);
+	memcpy(dst, src.kernel + offset, size);
+	return 0;
+}
+
 static inline int copy_from_sockptr(void *dst, sockptr_t src, size_t size)
 {
 	return copy_from_sockptr_offset(dst, src, 0, size);
On 15-11-2022 09:08, Zachary Leaf wrote:
In PCuABI, when copying a block of memory from userspace containing capabilities/pointers, the copy_from_user_with_ptr variant needs to be used to ensure pointers are preserved in full.
Introduce copy_from_x_with_ptr methods for bpfptr_t and sockptr_t to support this.
Is there a reason why this is not done for the copy_to_x functions as well?
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
include/linux/bpfptr.h | 11 +++++++++++ include/linux/sockptr.h | 9 +++++++++ 2 files changed, 20 insertions(+)
diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h
index 46e1757d06a3..abb5b3641f5d 100644
--- a/include/linux/bpfptr.h
+++ b/include/linux/bpfptr.h
@@ -52,11 +52,22 @@ static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src,
 	return copy_from_sockptr_offset(dst, (sockptr_t) src, offset, size);
 }
 
+static inline int copy_from_bpfptr_offset_with_ptr(void *dst, bpfptr_t src,
+						   size_t offset, size_t size)
+{
+	return copy_from_sockptr_offset_with_ptr(dst, (sockptr_t) src, offset, size);
+}
+
 static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size)
 {
 	return copy_from_bpfptr_offset(dst, src, 0, size);
 }
 
+static inline int copy_from_bpfptr_with_ptr(void *dst, bpfptr_t src, size_t size)
+{
+	return copy_from_bpfptr_offset_with_ptr(dst, src, 0, size);
copy_from_bpfptr_offset_with_ptr is missing from this patch I think...
+}
+
 static inline int copy_to_bpfptr_offset(bpfptr_t dst, size_t offset,
 					const void *src, size_t size)
 {
diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h
index ea193414298b..e07da559327f 100644
--- a/include/linux/sockptr.h
+++ b/include/linux/sockptr.h
@@ -50,6 +50,15 @@ static inline int copy_from_sockptr_offset(void *dst, sockptr_t src,
 	return 0;
 }
 
+static inline int copy_from_sockptr_offset_with_ptr(void *dst, sockptr_t src,
+						    size_t offset, size_t size)
+{
+	if (!sockptr_is_kernel(src))
+		return copy_from_user_with_ptr(dst, src.user + offset, size);
+	memcpy(dst, src.kernel + offset, size);
+	return 0;
+}
+
 static inline int copy_from_sockptr(void *dst, sockptr_t src, size_t size)
 {
 	return copy_from_sockptr_offset(dst, src, 0, size);
On 16/11/2022 15:01, Tudor Cretu wrote:
On 15-11-2022 09:08, Zachary Leaf wrote:
In PCuABI, when copying a block of memory from userspace containing capabilities/pointers, the copy_from_user_with_ptr variant needs to be used to ensure pointers are preserved in full.
Introduce copy_from_x_with_ptr methods for bpfptr_t and sockptr_t to support this.
Is there a reason why this is not done for the copy_to_x functions as well?
Only because I haven't come across anywhere it's required yet, and as a general principle I'd usually try to avoid adding any code that "might" be needed, because of the added maintenance/testing burden etc.
Since this is just the RFC looking at BPF_PROG_LOAD, if it comes up when adding the other options we can add it to this commit then.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
include/linux/bpfptr.h | 11 +++++++++++ include/linux/sockptr.h | 9 +++++++++ 2 files changed, 20 insertions(+)
diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h
index 46e1757d06a3..abb5b3641f5d 100644
--- a/include/linux/bpfptr.h
+++ b/include/linux/bpfptr.h
@@ -52,11 +52,22 @@ static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src,
 	return copy_from_sockptr_offset(dst, (sockptr_t) src, offset, size);
 }
 
+static inline int copy_from_bpfptr_offset_with_ptr(void *dst, bpfptr_t src,

                   ^ here

+						   size_t offset, size_t size)
+{
+	return copy_from_sockptr_offset_with_ptr(dst, (sockptr_t) src, offset, size);
+}
+
 static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size)
 {
 	return copy_from_bpfptr_offset(dst, src, 0, size);
 }
 
+static inline int copy_from_bpfptr_with_ptr(void *dst, bpfptr_t src, size_t size)
+{
+	return copy_from_bpfptr_offset_with_ptr(dst, src, 0, size);
copy_from_bpfptr_offset_with_ptr is missing from this patch I think...
Is it not above?
Thanks, Zach
+}
 static inline int copy_to_bpfptr_offset(bpfptr_t dst, size_t offset,
 					const void *src, size_t size)
 {
diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h
index ea193414298b..e07da559327f 100644
--- a/include/linux/sockptr.h
+++ b/include/linux/sockptr.h
@@ -50,6 +50,15 @@ static inline int copy_from_sockptr_offset(void *dst, sockptr_t src,
 	return 0;
 }
 
+static inline int copy_from_sockptr_offset_with_ptr(void *dst, sockptr_t src,
+						    size_t offset, size_t size)
+{
+	if (!sockptr_is_kernel(src))
+		return copy_from_user_with_ptr(dst, src.user + offset, size);
+	memcpy(dst, src.kernel + offset, size);
+	return 0;
+}
 static inline int copy_from_sockptr(void *dst, sockptr_t src, size_t size)
 {
 	return copy_from_sockptr_offset(dst, src, 0, size);
On 17-11-2022 13:13, Zachary Leaf wrote:
On 16/11/2022 15:01, Tudor Cretu wrote:
On 15-11-2022 09:08, Zachary Leaf wrote:
In PCuABI, when copying a block of memory from userspace containing capabilities/pointers, the copy_from_user_with_ptr variant needs to be used to ensure pointers are preserved in full.
Introduce copy_from_x_with_ptr methods for bpfptr_t and sockptr_t to support this.
Is there a reason why this is not done for the copy_to_x functions as well?
Only because I haven't come across anywhere it's required yet, and as a general principle I'd usually try to avoid adding any code that "might" be needed, because of the added maintenance/testing burden etc.
Since this is just the RFC looking at BPF_PROG_LOAD, if it comes up when adding the other options we can add it to this commit then.
Ack.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
include/linux/bpfptr.h | 11 +++++++++++ include/linux/sockptr.h | 9 +++++++++ 2 files changed, 20 insertions(+)
diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h
index 46e1757d06a3..abb5b3641f5d 100644
--- a/include/linux/bpfptr.h
+++ b/include/linux/bpfptr.h
@@ -52,11 +52,22 @@ static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src,
 	return copy_from_sockptr_offset(dst, (sockptr_t) src, offset, size);
 }
 
+static inline int copy_from_bpfptr_offset_with_ptr(void *dst, bpfptr_t src,

                   ^ here

+						   size_t offset, size_t size)
+{
+	return copy_from_sockptr_offset_with_ptr(dst, (sockptr_t) src, offset, size);
+}
+
 static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size)
 {
 	return copy_from_bpfptr_offset(dst, src, 0, size);
 }
 
+static inline int copy_from_bpfptr_with_ptr(void *dst, bpfptr_t src, size_t size)
+{
+	return copy_from_bpfptr_offset_with_ptr(dst, src, 0, size);
copy_from_bpfptr_offset_with_ptr is missing from this patch I think...
Is it not above?
Indeed, might've been tired, sorry about that!
Thanks, Tudor
Thanks, Zach
+}
 static inline int copy_to_bpfptr_offset(bpfptr_t dst, size_t offset,
 					const void *src, size_t size)
 {
diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h
index ea193414298b..e07da559327f 100644
--- a/include/linux/sockptr.h
+++ b/include/linux/sockptr.h
@@ -50,6 +50,15 @@ static inline int copy_from_sockptr_offset(void *dst, sockptr_t src,
 	return 0;
 }
 
+static inline int copy_from_sockptr_offset_with_ptr(void *dst, sockptr_t src,
+						    size_t offset, size_t size)
+{
+	if (!sockptr_is_kernel(src))
+		return copy_from_user_with_ptr(dst, src.user + offset, size);
+	memcpy(dst, src.kernel + offset, size);
+	return 0;
+}
 static inline int copy_from_sockptr(void *dst, sockptr_t src, size_t size)
 {
 	return copy_from_sockptr_offset(dst, src, 0, size);
On 17/11/2022 13:28, Tudor Cretu wrote:
On 17-11-2022 13:13, Zachary Leaf wrote:
On 16/11/2022 15:01, Tudor Cretu wrote:
On 15-11-2022 09:08, Zachary Leaf wrote:
In PCuABI, when copying a block of memory from userspace containing capabilities/pointers, the copy_from_user_with_ptr variant needs to be used to ensure pointers are preserved in full.
Introduce copy_from_x_with_ptr methods for bpfptr_t and sockptr_t to support this.
Is there a reason why this is not done for the copy_to_x functions as well?
Only because I haven't come across anywhere it's required yet, and as a general principle I'd usually try to avoid adding any code that "might" be needed, because of the added maintenance/testing burden etc.
Since this is just the RFC looking at BPF_PROG_LOAD, if it comes up when adding the other options we can add it to this commit then.
Ack.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
include/linux/bpfptr.h | 11 +++++++++++ include/linux/sockptr.h | 9 +++++++++ 2 files changed, 20 insertions(+)
diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h
index 46e1757d06a3..abb5b3641f5d 100644
--- a/include/linux/bpfptr.h
+++ b/include/linux/bpfptr.h
@@ -52,11 +52,22 @@ static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src,
 	return copy_from_sockptr_offset(dst, (sockptr_t) src, offset, size);
 }
 
+static inline int copy_from_bpfptr_offset_with_ptr(void *dst, bpfptr_t src,

                   ^ here

+						   size_t offset, size_t size)
+{
+	return copy_from_sockptr_offset_with_ptr(dst, (sockptr_t) src, offset, size);
+}
+
 static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size)
 {
 	return copy_from_bpfptr_offset(dst, src, 0, size);
 }
 
+static inline int copy_from_bpfptr_with_ptr(void *dst, bpfptr_t src, size_t size)
+{
+	return copy_from_bpfptr_offset_with_ptr(dst, src, 0, size);
copy_from_bpfptr_offset_with_ptr is missing from this patch I think...
Is it not above?
Indeed, might've been tired, sorry about that!
No problem. Thanks for the review.
Thanks, Zach
Thanks, Tudor
Thanks, Zach
+}
 static inline int copy_to_bpfptr_offset(bpfptr_t dst, size_t offset,
 					const void *src, size_t size)
 {
diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h
index ea193414298b..e07da559327f 100644
--- a/include/linux/sockptr.h
+++ b/include/linux/sockptr.h
@@ -50,6 +50,15 @@ static inline int copy_from_sockptr_offset(void *dst, sockptr_t src,
 	return 0;
 }
 
+static inline int copy_from_sockptr_offset_with_ptr(void *dst, sockptr_t src,
+						    size_t offset, size_t size)
+{
+	if (!sockptr_is_kernel(src))
+		return copy_from_user_with_ptr(dst, src.user + offset, size);
+	memcpy(dst, src.kernel + offset, size);
+	return 0;
+}
 static inline int copy_from_sockptr(void *dst, sockptr_t src, size_t size)
 {
 	return copy_from_sockptr_offset(dst, src, 0, size);
On 17/11/2022 14:13, Zachary Leaf wrote:
On 16/11/2022 15:01, Tudor Cretu wrote:
On 15-11-2022 09:08, Zachary Leaf wrote:
In PCuABI, when copying a block of memory from userspace containing capabilities/pointers, the copy_from_user_with_ptr variant needs to be used to ensure pointers are preserved in full.
Introduce copy_from_x_with_ptr methods for bpfptr_t and sockptr_t to support this.
Is there a reason why this is not done for the copy_to_x functions as well?
Only because I haven't come across anywhere it's required yet, and as a general principle I'd usually try to avoid adding any code that "might" be needed, because of the added maintenance/testing burden etc.
Since this is just the RFC looking at BPF_PROG_LOAD, if it comes up when adding the other options we can add it to this commit then.
Makes sense to me.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
include/linux/bpfptr.h | 11 +++++++++++ include/linux/sockptr.h | 9 +++++++++ 2 files changed, 20 insertions(+)
diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h
index 46e1757d06a3..abb5b3641f5d 100644
--- a/include/linux/bpfptr.h
+++ b/include/linux/bpfptr.h
@@ -52,11 +52,22 @@ static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src,
 	return copy_from_sockptr_offset(dst, (sockptr_t) src, offset, size);
 }
 
+static inline int copy_from_bpfptr_offset_with_ptr(void *dst, bpfptr_t src,

                   ^ here

+						   size_t offset, size_t size)
+{
+	return copy_from_sockptr_offset_with_ptr(dst, (sockptr_t) src, offset, size);
Some nits while I'm here: - The arguments continuation line should be aligned on the opening parenthesis. - Better to break lines at 80 chars when it's relatively easy like here.
Kevin
+}
static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size) { return copy_from_bpfptr_offset(dst, src, 0, size); } +static inline int copy_from_bpfptr_with_ptr(void *dst, bpfptr_t src, size_t size) +{ + return copy_from_bpfptr_offset_with_ptr(dst, src, 0, size);
copy_from_bpfptr_offset_with_ptr is missing from this patch I think...
Is it not above?
Thanks, Zach
+}
static inline int copy_to_bpfptr_offset(bpfptr_t dst, size_t offset, const void *src, size_t size) { diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h index ea193414298b..e07da559327f 100644 --- a/include/linux/sockptr.h +++ b/include/linux/sockptr.h @@ -50,6 +50,15 @@ static inline int copy_from_sockptr_offset(void *dst, sockptr_t src, return 0; } +static inline int copy_from_sockptr_offset_with_ptr(void *dst, sockptr_t src, + size_t offset, size_t size) +{ + if (!sockptr_is_kernel(src)) + return copy_from_user_with_ptr(dst, src.user + offset, size); + memcpy(dst, src.kernel + offset, size); + return 0; +}
static inline int copy_from_sockptr(void *dst, sockptr_t src, size_t size) { return copy_from_sockptr_offset(dst, src, 0, size);
linux-morello mailing list -- linux-morello@op-lists.linaro.org To unsubscribe send an email to linux-morello-leave@op-lists.linaro.org
The bpf_attr union currently stores pointers as u64 addresses, and hence the make_bpfptr helper function takes u64 addresses instead of pointers.
In preparation for changing bpf_attr for architectures where user pointers are longer than u64 (e.g. Morello PCuABI), adapt make_bpfptr to accept __kernel_aligned_uintptr_t. This will remain u64 on architectures where addresses and pointers are equivalent, but be extended on archs with larger pointer sizes.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com --- include/linux/bpfptr.h | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index abb5b3641f5d..627c1c9c0372 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -31,6 +31,14 @@ static inline bpfptr_t make_bpfptr(u64 addr, bool is_kernel) return USER_BPFPTR(u64_to_user_ptr(addr)); }
+static inline bpfptr_t make_bpfptr_fixed(__kernel_aligned_uintptr_t ptr, bool is_kernel) +{ + if (is_kernel) + return KERNEL_BPFPTR((void *) (uintptr_t) ptr); + else + return USER_BPFPTR((void __user *) ptr); +} + static inline bool bpfptr_is_null(bpfptr_t bpfptr) { if (bpfptr_is_kernel(bpfptr))
On 15-11-2022 09:08, Zachary Leaf wrote:
The bpf_attr union currently stores pointers as u64 addresses, and hence the make_bpfptr helper function takes u64 addresses instead of pointers.
In preparation for changing bpf_attr for architectures where user pointers are longer than u64 (e.g. Morello PCuABI), adapt make_bpfptr to accept __kernel_aligned_uintptr_t. This will remain u64 on architectures
Why __kernel_aligned_uintptr_t and not __kernel_uintptr_t? We're typically using __kernel_aligned_uintptr_t to replace __aligned_u64, but maybe I'm missing something.
where addresses and pointers are equivalent, but be extended on archs with larger pointer sizes.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
include/linux/bpfptr.h | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index abb5b3641f5d..627c1c9c0372 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -31,6 +31,14 @@ static inline bpfptr_t make_bpfptr(u64 addr, bool is_kernel) return USER_BPFPTR(u64_to_user_ptr(addr)); } +static inline bpfptr_t make_bpfptr_fixed(__kernel_aligned_uintptr_t ptr, bool is_kernel) +{
+	if (is_kernel)
+		return KERNEL_BPFPTR((void *) (uintptr_t) ptr);
a cast to (void *) should be sufficient without going through uintptr_t now.
+	else
+		return USER_BPFPTR((void __user *) ptr);
+}
+
 static inline bool bpfptr_is_null(bpfptr_t bpfptr)
 {
 	if (bpfptr_is_kernel(bpfptr))
On 16/11/2022 15:12, Tudor Cretu wrote:
On 15-11-2022 09:08, Zachary Leaf wrote:
The bpf_attr union currently stores pointers as u64 addresses, and hence the make_bpfptr helper function takes u64 addresses instead of pointers.
In preparation for changing bpf_attr for architectures where user pointers are longer than u64 (e.g. Morello PCuABI), adapt make_bpfptr to accept __kernel_aligned_uintptr_t. This will remain u64 on architectures
Why __kernel_aligned_uintptr_t and not __kernel_uintptr_t? We're typically using __kernel_aligned_uintptr_t to replace __aligned_u64, but maybe I'm missing something.
Not missing anything - we should probably use the more generic __kernel_uintptr_t version here. It's the exact same thing in PCuABI (__uintcap_t), and otherwise it's __u64 vs __aligned_u64; I don't think alignment matters here.
Will change to __kernel_uintptr_t.
where addresses and pointers are equivalent, but be extended on archs with larger pointer sizes.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
include/linux/bpfptr.h | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index abb5b3641f5d..627c1c9c0372 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -31,6 +31,14 @@ static inline bpfptr_t make_bpfptr(u64 addr, bool is_kernel) return USER_BPFPTR(u64_to_user_ptr(addr)); } +static inline bpfptr_t make_bpfptr_fixed(__kernel_aligned_uintptr_t ptr, bool is_kernel) +{ + if (is_kernel) + return KERNEL_BPFPTR((void *) (uintptr_t) ptr);
a cast to (void *) should be sufficient without going through uintptr_t now.
That results in a CToPtr warning for me:
./include/linux/bpfptr.h:37:24: warning: the following conversion will result in a CToPtr operation; the behaviour of CToPtr can be confusing since using CToPtr on an untagged capability will give 0 instead of the integer value and should therefore be explicitly annotated [-Wcheri-pointer-conversion]
For PCuABI __kernel_aligned_uintptr_t is __uintcap_t (128b), void* is 64b and uintptr_t should be u64.
I guess the warning here is erroneous since in this code path, in theory it should be the kernel calling this with a 64b kernel ptr, so ptr should contain a 64b integer representation in __uintcap_t (__kernel_uintptr_t), with nothing in the top bits.
Explicit cast to uintptr_t then tells the compiler there is no chance this __uintcap_t contains a capability so no CToPtr warning. I'm not sure exactly. Would be interested if anyone understands that differently.
Thanks, Zach
+ else + return USER_BPFPTR((void __user *) ptr); +}
static inline bool bpfptr_is_null(bpfptr_t bpfptr) { if (bpfptr_is_kernel(bpfptr))
On 21/11/2022 18:57, Zachary Leaf wrote:
+static inline bpfptr_t make_bpfptr_fixed(__kernel_aligned_uintptr_t ptr, bool is_kernel) +{ + if (is_kernel) + return KERNEL_BPFPTR((void *) (uintptr_t) ptr);
a cast to (void *) should be sufficient without going through uintptr_t now.
That results in a CToPtr warning for me:
./include/linux/bpfptr.h:37:24: warning: the following conversion will result in a CToPtr operation; the behaviour of CToPtr can be confusing since using CToPtr on an untagged capability will give 0 instead of the integer value and should therefore be explicitly annotated [-Wcheri-pointer-conversion]
For PCuABI __kernel_aligned_uintptr_t is __uintcap_t (128b), void* is 64b and uintptr_t should be u64.
I guess the warning here is erroneous since in this code path, in theory it should be the kernel calling this with a 64b kernel ptr, so ptr should contain a 64b integer representation in __uintcap_t (__kernel_uintptr_t), with nothing in the top bits.
Explicit cast to uintptr_t then tells the compiler there is no chance this __uintcap_t contains a capability so no CToPtr warning. I'm not sure exactly. Would be interested if anyone understands that differently.
The way I like to think about casts involving capabilities is that you should only do *one* of the two possible operations that a cast entails at a time, that is either 1. representation change (== actual change in the bit pattern) or 2. interpretation change (== moving to a type of a different nature but without any change in the representation).
So if you want to go from A = uintcap_t to B = void *, you actually need two casts, because the representation is different (capability -> 64-bit), and the interpretation is also different (integer -> pointer). uintcap_t can be cast to any other integer type (representation change), and any 64-bit integer can be cast to a pointer (interpretation change), so the following chain works: uintcap_t -> uintptr_t -> void *.
FWIW this maps quite well to the breakdown of casts in C++ (and indeed, on a vanilla 64-bit ABI an int cannot be directly converted to a void * using C++ casts): 1. Representation change == static_cast 2. Interpretation change == reinterpret_cast
Coming back to make_bpfptr(), if the argument contains a kernel pointer stored in a __kernel_uintptr_t, then you indeed know that the top 64 bits are irrelevant and truncating the type is what you want to do. However as discussed offline we should get to the bottom of whether kernel pointers actually get stored in e.g. union bpf_attr, as this is rather unclear at the moment.
Final remark regarding conversions: the warning that Clang prints is significant and explains why the conversion would fail if you didn't do the two-step cast. In hybrid, implicit conversions to/from capabilities have special null-pointer handling: a null pointer always yields the null capability, and an untagged capability always yields the null pointer. Here the input isn't a capability at all but a 64-bit pointer represented as a capability, so an explicit representation change is required.
Kevin
On 22-11-2022 13:59, Kevin Brodsky wrote:
On 21/11/2022 18:57, Zachary Leaf wrote:
+static inline bpfptr_t make_bpfptr_fixed(__kernel_aligned_uintptr_t ptr, bool is_kernel) +{ + if (is_kernel) + return KERNEL_BPFPTR((void *) (uintptr_t) ptr);
a cast to (void *) should be sufficient without going through uintptr_t now.
That results in a CToPtr warning for me:
./include/linux/bpfptr.h:37:24: warning: the following conversion will result in a CToPtr operation; the behaviour of CToPtr can be confusing since using CToPtr on an untagged capability will give 0 instead of the integer value and should therefore be explicitly annotated [-Wcheri-pointer-conversion]
For PCuABI __kernel_aligned_uintptr_t is __uintcap_t (128b), void* is 64b and uintptr_t should be u64.
I guess the warning here is erroneous since in this code path, in theory it should be the kernel calling this with a 64b kernel ptr, so ptr should contain a 64b integer representation in __uintcap_t (__kernel_uintptr_t), with nothing in the top bits.
Explicit cast to uintptr_t then tells the compiler there is no chance this __uintcap_t contains a capability so no CToPtr warning. I'm not sure exactly. Would be interested if anyone understands that differently.
The way I like to think about casts involving capabilities is that you should only do *one* of the two possible operations that a cast entails at a time, that is either 1. representation change (== actual change in the bit pattern) or 2. interpretation change (== moving to a type of a different nature but without any change in the representation).
So if you want to go from A = uintcap_t to B = void *, you actually need two casts, because the representation is different (capability -> 64-bit), and the interpretation is also different (integer -> pointer). uintcap_t can be cast to any other integer type (representation change), and any 64-bit integer can be cast to a pointer (interpretation change), so the following chain works: uintcap_t -> uintptr_t -> void *.
FWIW this maps quite well to the breakdown of casts in C++ (and indeed, on a vanilla 64-bit ABI an int cannot be directly converted to a void * using C++ casts):
1. Representation change == static_cast
2. Interpretation change == reinterpret_cast
Coming back to make_bpfptr(), if the argument contains a kernel pointer stored in a __kernel_uintptr_t, then you indeed know that the top 64 bits are irrelevant and truncating the type is what you want to do. However as discussed offline we should get to the bottom of whether kernel pointers actually get stored in e.g. union bpf_attr, as this is rather unclear at the moment.
Final remark regarding conversions: the warning that Clang prints is significant and explains why the conversion would fail if you didn't do the two-step cast. In hybrid, implicit conversions to/from capabilities have special null-pointer handling: a null pointer always yields the null capability, and an untagged capability always yields the null pointer. Here the input isn't a capability at all but a 64-bit pointer represented as a capability, so an explicit representation change is required.
That's very informative!
Thank you both, Tudor
Kevin
The bpf_attr union is used to pass config into the bpf syscall. In many cases the union members can contain user pointers. In Morello PCuABI a user pointer is a 128-bit capability (plus 1 out-of-band tag bit), so the __aligned_u64 type is not big enough to hold it. Use the __kernel_aligned_uintptr_t type instead, which is big enough on the affected architectures while remaining __aligned_u64 on others.
Use copy_from_user_with_ptr where blocks of memory containing pointers are being copied to preserve capabilities.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com --- include/uapi/linux/bpf.h | 14 +++++++------- kernel/bpf/syscall.c | 6 +++--- kernel/bpf/verifier.c | 10 +++++----- 3 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index d14b10b85e51..98a23f9a540e 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1330,11 +1330,11 @@ union bpf_attr { struct { /* anonymous struct used by BPF_PROG_LOAD command */ __u32 prog_type; /* one of enum bpf_prog_type */ __u32 insn_cnt; - __aligned_u64 insns; - __aligned_u64 license; + __kernel_aligned_uintptr_t insns; + __kernel_aligned_uintptr_t license; __u32 log_level; /* verbosity level of verifier */ __u32 log_size; /* size of user buffer */ - __aligned_u64 log_buf; /* user supplied buffer */ + __kernel_aligned_uintptr_t log_buf; /* user supplied buffer */ __u32 kern_version; /* not used */ __u32 prog_flags; char prog_name[BPF_OBJ_NAME_LEN]; @@ -1346,10 +1346,10 @@ union bpf_attr { __u32 expected_attach_type; __u32 prog_btf_fd; /* fd pointing to BTF type data */ __u32 func_info_rec_size; /* userspace bpf_func_info size */ - __aligned_u64 func_info; /* func info */ + __kernel_aligned_uintptr_t func_info; /* func info */ __u32 func_info_cnt; /* number of bpf_func_info records */ __u32 line_info_rec_size; /* userspace bpf_line_info size */ - __aligned_u64 line_info; /* line info */ + __kernel_aligned_uintptr_t line_info; /* line info */ __u32 line_info_cnt; /* number of bpf_line_info records */ __u32 attach_btf_id; /* in-kernel BTF type id to attach to */ union { @@ -1359,8 +1359,8 @@ union bpf_attr { __u32 attach_btf_obj_fd; }; __u32 core_relo_cnt; /* number of bpf_core_relo */ - __aligned_u64 fd_array; /* array of FDs */ - __aligned_u64 core_relos; + __kernel_aligned_uintptr_t fd_array; /* array of FDs */ + __kernel_aligned_uintptr_t core_relos; __u32 core_relo_rec_size; /* sizeof(struct bpf_core_relo) */ };
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index cdaa1152436a..518ab0e9f2a4 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2233,7 +2233,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
/* copy eBPF program license from user space */ if (strncpy_from_bpfptr(license, - make_bpfptr(attr->license, uattr.is_kernel), + make_bpfptr_fixed(attr->license, uattr.is_kernel), sizeof(license) - 1) < 0) return -EFAULT; license[sizeof(license) - 1] = 0; @@ -2320,7 +2320,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
err = -EFAULT; if (copy_from_bpfptr(prog->insns, - make_bpfptr(attr->insns, uattr.is_kernel), + make_bpfptr_fixed(attr->insns, uattr.is_kernel), bpf_prog_insn_size(prog)) != 0) goto free_prog_sec;
@@ -4633,7 +4633,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
/* copy attributes from user space, may be less than sizeof(bpf_attr) */ memset(&attr, 0, sizeof(attr)); - if (copy_from_bpfptr(&attr, uattr, size) != 0) + if (copy_from_bpfptr_with_ptr(&attr, uattr, size) != 0) return -EFAULT;
err = security_bpf(cmd, &attr, size); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index d175b70067b3..41827b00492e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -10296,7 +10296,7 @@ static int check_btf_func(struct bpf_verifier_env *env, prog = env->prog; btf = prog->aux->btf;
- urecord = make_bpfptr(attr->func_info, uattr.is_kernel); + urecord = make_bpfptr_fixed(attr->func_info, uattr.is_kernel); min_size = min_t(u32, krec_size, urec_size);
krecord = kvcalloc(nfuncs, krec_size, GFP_KERNEL | __GFP_NOWARN); @@ -10440,7 +10440,7 @@ static int check_btf_line(struct bpf_verifier_env *env,
s = 0; sub = env->subprog_info; - ulinfo = make_bpfptr(attr->line_info, uattr.is_kernel); + ulinfo = make_bpfptr_fixed(attr->line_info, uattr.is_kernel); expected_size = sizeof(struct bpf_line_info); ncopy = min_t(u32, expected_size, rec_size); for (i = 0; i < nr_linfo; i++) { @@ -10558,7 +10558,7 @@ static int check_core_relo(struct bpf_verifier_env *env, rec_size % sizeof(u32)) return -EINVAL;
- u_core_relo = make_bpfptr(attr->core_relos, uattr.is_kernel); + u_core_relo = make_bpfptr_fixed(attr->core_relos, uattr.is_kernel); expected_size = sizeof(struct bpf_core_relo); ncopy = min_t(u32, expected_size, rec_size);
@@ -14387,7 +14387,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr) env->insn_aux_data[i].orig_idx = i; env->prog = *prog; env->ops = bpf_verifier_ops[env->prog->type]; - env->fd_array = make_bpfptr(attr->fd_array, uattr.is_kernel); + env->fd_array = make_bpfptr_fixed(attr->fd_array, uattr.is_kernel); is_priv = bpf_capable();
bpf_get_btf_vmlinux(); @@ -14401,7 +14401,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr) * and supplied buffer to store the verification trace */ log->level = attr->log_level; - log->ubuf = (char __user *) (unsigned long) attr->log_buf; + log->ubuf = (char __user *) attr->log_buf; log->len_total = attr->log_size;
/* log attributes have to be sane */
The bpf syscall on a native 64-bit system originally did not need a separate compat handler for compat32. Since the union bpf_attr stored pointers as __u64 and was 8-byte aligned, the struct offsets/memory layout are the same for both 64-bit and 32-bit.
On some new architectures, e.g. Morello PCuABI, userspace pointers are 128 bit (plus 1 out-of-band tag bit), so storing pointers as __u64 is not possible. bpf_attr is therefore extended to hold 128b pointers on these archs, and remains __u64 on those that do not.
In order to support a 64b compat layer (compat64) in PCuABI, add a new union compat_bpf_attr with __u64 members. This retains the offsets and alignment of the original bpf_attr struct.
In order to handle compat64, use a generic void * in __sys_bpf(...) to represent either bpf_attr or compat_bpf_attr. Then in each of the multiplexed options, e.g. BPF_PROG_LOAD, if in a compat syscall, convert compat_bpf_attr into a native bpf_attr struct and handle as normal.
Since the bpf_attr union contains a number of anonymous structs, converting each option inside the option handlers, e.g. bpf_prog_load(), seems to be the best way to do it. After the conversion there should be minimal additional compat handling/in_compat_syscall() conditions etc.
Currently unrestricted capabilities are created in the kernel here with compat_ptr so some care needs to be taken to check none of these are returned to userspace at any point.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com ---
re: compat_ptr, there is probably another case to be made for having some kind of kernel only flag for capabilities here, to avoid them being returned to userspace by accident (as discussed for make_user_ptr_unsafe)
include/uapi/linux/bpf.h | 39 ++++++++++++++++++++++ kernel/bpf/syscall.c | 71 +++++++++++++++++++++++++++++++++++----- 2 files changed, 102 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 98a23f9a540e..752a520e1481 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1269,6 +1269,45 @@ struct bpf_stack_build_id {
#define BPF_OBJ_NAME_LEN 16U
+union compat_bpf_attr { + struct { /* anonymous struct used by BPF_PROG_LOAD command */ + __u32 prog_type; /* one of enum bpf_prog_type */ + __u32 insn_cnt; + __aligned_u64 insns; + __aligned_u64 license; + __u32 log_level; /* verbosity level of verifier */ + __u32 log_size; /* size of user buffer */ + __aligned_u64 log_buf; /* user supplied buffer */ + __u32 kern_version; /* not used */ + __u32 prog_flags; + char prog_name[BPF_OBJ_NAME_LEN]; + __u32 prog_ifindex; /* ifindex of netdev to prep for */ + /* For some prog types expected attach type must be known at + * load time to verify attach type specific parts of prog + * (context accesses, allowed helpers, etc). + */ + __u32 expected_attach_type; + __u32 prog_btf_fd; /* fd pointing to BTF type data */ + __u32 func_info_rec_size; /* userspace bpf_func_info size */ + __aligned_u64 func_info; /* func info */ + __u32 func_info_cnt; /* number of bpf_func_info records */ + __u32 line_info_rec_size; /* userspace bpf_line_info size */ + __aligned_u64 line_info; /* line info */ + __u32 line_info_cnt; /* number of bpf_line_info records */ + __u32 attach_btf_id; /* in-kernel BTF type id to attach to */ + union { + /* valid prog_fd to attach to bpf prog */ + __u32 attach_prog_fd; + /* or valid module BTF object fd or 0 to attach to vmlinux */ + __u32 attach_btf_obj_fd; + }; + __u32 core_relo_cnt; /* number of bpf_core_relo */ + __aligned_u64 fd_array; /* array of FDs */ + __aligned_u64 core_relos; + __u32 core_relo_rec_size; /* sizeof(struct bpf_core_relo) */ + }; +} __attribute__((aligned(8))); + union bpf_attr { struct { /* anonymous struct used by BPF_MAP_CREATE command */ __u32 map_type; /* one of enum bpf_map_type */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 518ab0e9f2a4..6e3b567d254a 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2203,21 +2203,68 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type) } }
+#ifdef CONFIG_COMPAT +static void convert_compat_bpf_prog_load(const void *vattr, union bpf_attr *dest) +{ + const union compat_bpf_attr *cattr = (union compat_bpf_attr *)vattr; + + dest->prog_type = READ_ONCE(cattr->prog_type); + dest->insn_cnt = READ_ONCE(cattr->insn_cnt); + dest->insns = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->insns)); + dest->license = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->license)); + dest->log_level = READ_ONCE(cattr->log_level); + dest->log_size = READ_ONCE(cattr->log_size); + dest->log_buf = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->log_buf)); + dest->kern_version = READ_ONCE(cattr->kern_version); + dest->prog_flags = READ_ONCE(cattr->prog_flags); + strncpy(dest->prog_name, cattr->prog_name, BPF_OBJ_NAME_LEN); + dest->prog_ifindex = READ_ONCE(cattr->prog_ifindex); + dest->expected_attach_type = READ_ONCE(cattr->expected_attach_type); + dest->prog_btf_fd = READ_ONCE(cattr->prog_btf_fd); + dest->func_info_rec_size = READ_ONCE(cattr->func_info_rec_size); + dest->func_info = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->func_info)); + dest->func_info_cnt = READ_ONCE(cattr->func_info_cnt); + dest->line_info_rec_size = READ_ONCE(cattr->line_info_rec_size); + dest->line_info = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->line_info)); + dest->line_info_cnt = READ_ONCE(cattr->line_info_cnt); + dest->attach_btf_id = READ_ONCE(cattr->attach_btf_id); + dest->attach_prog_fd = READ_ONCE(cattr->attach_prog_fd); + /* u32 attach_btf_obj_fd is in a union with u32 attach_prog_fd */ + dest->core_relo_cnt = READ_ONCE(cattr->core_relo_cnt); + dest->fd_array = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->fd_array)); + dest->core_relos = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->core_relos)); + dest->core_relo_rec_size = READ_ONCE(cattr->core_relo_rec_size); +} +#endif + /* last field in 'union bpf_attr' used by this command */ #define BPF_PROG_LOAD_LAST_FIELD 
core_relo_rec_size
-static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr) +static int bpf_prog_load(void *vattr, bpfptr_t uattr) { - enum bpf_prog_type type = attr->prog_type; + enum bpf_prog_type type; struct bpf_prog *prog, *dst_prog = NULL; struct btf *attach_btf = NULL; int err; char license[128]; bool is_gpl; + union bpf_attr *attr = NULL; +#ifdef CONFIG_COMPAT + union bpf_attr compat_attr; +#endif
if (CHECK_ATTR(BPF_PROG_LOAD)) return -EINVAL;
+#ifdef CONFIG_COMPAT + if (in_compat_syscall()) + convert_compat_bpf_prog_load(vattr, &compat_attr); + attr = in_compat_syscall() ? &compat_attr : (union bpf_attr *) vattr; +#else + attr = (union bpf_attr *) vattr; +#endif + type = attr->prog_type; + if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT | BPF_F_ANY_ALIGNMENT | BPF_F_TEST_STATE_FREQ | @@ -4621,22 +4668,30 @@ static int bpf_prog_bind_map(union bpf_attr *attr) static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) { union bpf_attr attr; +#ifdef CONFIG_COMPAT + union compat_bpf_attr cattr; + void *vattr = in_compat_syscall() ? (void *)&cattr : (void *)&attr; + size_t attrsz = in_compat_syscall() ? sizeof(cattr) : sizeof(attr); +#else + void *vattr = &attr; + size_t attrsz = sizeof(attr); +#endif int err;
if (sysctl_unprivileged_bpf_disabled && !bpf_capable()) return -EPERM;
- err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size); + err = bpf_check_uarg_tail_zero(uattr, attrsz, size); if (err) return err; - size = min_t(u32, size, sizeof(attr)); + size = min_t(u32, size, attrsz);
/* copy attributes from user space, may be less than sizeof(bpf_attr) */ - memset(&attr, 0, sizeof(attr)); - if (copy_from_bpfptr_with_ptr(&attr, uattr, size) != 0) + memset(vattr, 0, attrsz); + if (copy_from_bpfptr_with_ptr(vattr, uattr, size) != 0) return -EFAULT;
- err = security_bpf(cmd, &attr, size); + err = security_bpf(cmd, vattr, size); if (err < 0) return err;
@@ -4660,7 +4715,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) err = map_freeze(&attr); break; case BPF_PROG_LOAD: - err = bpf_prog_load(&attr, uattr); + err = bpf_prog_load(vattr, uattr); break; case BPF_OBJ_PIN: err = bpf_obj_pin(&attr); -- 2.34.1
On 15-11-2022 09:08, Zachary Leaf wrote:
The bpf syscall on a native 64 bit system originally did not need a separate compat handler for compat32. Since the union bpf_attr stored pointers as __u64 and was 8B aligned it results in the same struct offsets/memory layout for both 64/32b.
On some new architectures, e.g. Morello PCuABI, userspace pointers are 128 bit (plus 1 out-of-band tag bit), so storing pointers as __u64 is not possible. bpf_attr is therefore extended to hold 128b pointers on these archs, and remains __u64 on those that do not.
In order to support a 64b compat layer (compat64) in PCuABI, add a new union compat_bpf_attr with __u64 members. This retains the offsets and alignment of the original bpf_attr struct.
In order to handle compat64, use a generic void * in __sys_bpf(...) to represent either bpf_attr or compat_bpf_attr. Then in each of the multiplexed options, e.g. BPF_PROG_LOAD, if in a compat syscall, convert compat_bpf_attr into a native bpf_attr struct and handle as normal.
Since the bpf_attr union contains a number of anonymous structs, converting each option inside the option handlers, e.g. bpf_prog_load(), seems to be the best way to do it. After the conversion there should be minimal additional compat handling/in_compat_syscall() conditions etc.
Currently unrestricted capabilities are created in the kernel here with compat_ptr so some care needs to be taken to check none of these are returned to userspace at any point.
This patch should be before PATCH 4 I think. In this way, the compat64 wouldn't be broken between PATCH 4 and PATCH 5.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
re: compat_ptr, there is probably another case to be made for having some kind of kernel only flag for capabilities here, to avoid them being returned to userspace by accident (as discussed for make_user_ptr_unsafe)
include/uapi/linux/bpf.h | 39 ++++++++++++++++++++++ kernel/bpf/syscall.c | 71 +++++++++++++++++++++++++++++++++++----- 2 files changed, 102 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 98a23f9a540e..752a520e1481 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1269,6 +1269,45 @@ struct bpf_stack_build_id {
#define BPF_OBJ_NAME_LEN 16U
+union compat_bpf_attr {
+	struct { /* anonymous struct used by BPF_PROG_LOAD command */
+		__u32 prog_type;	/* one of enum bpf_prog_type */
+		__u32 insn_cnt;
+		__aligned_u64 insns;
+		__aligned_u64 license;
+		__u32 log_level;	/* verbosity level of verifier */
+		__u32 log_size;		/* size of user buffer */
+		__aligned_u64 log_buf;	/* user supplied buffer */
+		__u32 kern_version;	/* not used */
+		__u32 prog_flags;
+		char prog_name[BPF_OBJ_NAME_LEN];
+		__u32 prog_ifindex;	/* ifindex of netdev to prep for */
+		/* For some prog types expected attach type must be known at
+		 * load time to verify attach type specific parts of prog
+		 * (context accesses, allowed helpers, etc).
+		 */
+		__u32 expected_attach_type;
+		__u32 prog_btf_fd;	/* fd pointing to BTF type data */
+		__u32 func_info_rec_size;	/* userspace bpf_func_info size */
+		__aligned_u64 func_info;	/* func info */
+		__u32 func_info_cnt;	/* number of bpf_func_info records */
+		__u32 line_info_rec_size;	/* userspace bpf_line_info size */
+		__aligned_u64 line_info;	/* line info */
+		__u32 line_info_cnt;	/* number of bpf_line_info records */
+		__u32 attach_btf_id;	/* in-kernel BTF type id to attach to */
+		union {
+			/* valid prog_fd to attach to bpf prog */
+			__u32 attach_prog_fd;
+			/* or valid module BTF object fd or 0 to attach to vmlinux */
+			__u32 attach_btf_obj_fd;
+		};
+		__u32 core_relo_cnt;	/* number of bpf_core_relo */
+		__aligned_u64 fd_array;	/* array of FDs */
+		__aligned_u64 core_relos;
+		__u32 core_relo_rec_size;	/* sizeof(struct bpf_core_relo) */
+	};
+} __attribute__((aligned(8)));
+
 union bpf_attr { struct { /* anonymous struct used by BPF_MAP_CREATE command */ __u32 map_type; /* one of enum bpf_map_type */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 518ab0e9f2a4..6e3b567d254a 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2203,21 +2203,68 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type) } }
+#ifdef CONFIG_COMPAT +static void convert_compat_bpf_prog_load(const void *vattr, union bpf_attr *dest) +{
+	const union compat_bpf_attr *cattr = (union compat_bpf_attr *)vattr;
+
+	dest->prog_type = READ_ONCE(cattr->prog_type);
+	dest->insn_cnt = READ_ONCE(cattr->insn_cnt);
+	dest->insns = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->insns));
+	dest->license = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->license));
+	dest->log_level = READ_ONCE(cattr->log_level);
+	dest->log_size = READ_ONCE(cattr->log_size);
+	dest->log_buf = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->log_buf));
+	dest->kern_version = READ_ONCE(cattr->kern_version);
+	dest->prog_flags = READ_ONCE(cattr->prog_flags);
+	strncpy(dest->prog_name, cattr->prog_name, BPF_OBJ_NAME_LEN);
+	dest->prog_ifindex = READ_ONCE(cattr->prog_ifindex);
+	dest->expected_attach_type = READ_ONCE(cattr->expected_attach_type);
+	dest->prog_btf_fd = READ_ONCE(cattr->prog_btf_fd);
+	dest->func_info_rec_size = READ_ONCE(cattr->func_info_rec_size);
+	dest->func_info = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->func_info));
+	dest->func_info_cnt = READ_ONCE(cattr->func_info_cnt);
+	dest->line_info_rec_size = READ_ONCE(cattr->line_info_rec_size);
+	dest->line_info = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->line_info));
+	dest->line_info_cnt = READ_ONCE(cattr->line_info_cnt);
+	dest->attach_btf_id = READ_ONCE(cattr->attach_btf_id);
+	dest->attach_prog_fd = READ_ONCE(cattr->attach_prog_fd);
+	/* u32 attach_btf_obj_fd is in a union with u32 attach_prog_fd */
+	dest->core_relo_cnt = READ_ONCE(cattr->core_relo_cnt);
+	dest->fd_array = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->fd_array));
+	dest->core_relos = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->core_relos));
+	dest->core_relo_rec_size = READ_ONCE(cattr->core_relo_rec_size);
+} +#endif
 /* last field in 'union bpf_attr' used by this command */
 #define	BPF_PROG_LOAD_LAST_FIELD core_relo_rec_size

-static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
+static int bpf_prog_load(void *vattr, bpfptr_t uattr)
 {
-	enum bpf_prog_type type = attr->prog_type;
+	enum bpf_prog_type type;
 	struct bpf_prog *prog, *dst_prog = NULL;
 	struct btf *attach_btf = NULL;
 	int err;
 	char license[128];
 	bool is_gpl;
+	union bpf_attr *attr = NULL;
+#ifdef CONFIG_COMPAT
+	union bpf_attr compat_attr;
+#endif

 	if (CHECK_ATTR(BPF_PROG_LOAD))
 		return -EINVAL;

+#ifdef CONFIG_COMPAT
I think we should use CONFIG_COMPAT64 for newly added compat code. Maybe; I'm not sure, so leaving this comment here, as I meant to suggest changing CONFIG_COMPAT -> CONFIG_COMPAT64 in the io_uring series as well.
+	if (in_compat_syscall())
+		convert_compat_bpf_prog_load(vattr, &compat_attr);
+	attr = in_compat_syscall() ? &compat_attr : (union bpf_attr *) vattr;
+#else
+	attr = (union bpf_attr *) vattr;
+#endif
+
+	type = attr->prog_type;
+
 	if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT |
 				 BPF_F_ANY_ALIGNMENT |
 				 BPF_F_TEST_STATE_FREQ |
@@ -4621,22 +4668,30 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 {
 	union bpf_attr attr;
+#ifdef CONFIG_COMPAT
+	union compat_bpf_attr cattr;
+	void *vattr = in_compat_syscall() ? (void *)&cattr : (void *)&attr;
+	size_t attrsz = in_compat_syscall() ? sizeof(cattr) : sizeof(attr);
+#else
+	void *vattr = &attr;
+	size_t attrsz = sizeof(attr);
+#endif
 	int err;

 	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
 		return -EPERM;

-	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
+	err = bpf_check_uarg_tail_zero(uattr, attrsz, size);
 	if (err)
 		return err;
-	size = min_t(u32, size, sizeof(attr));
+	size = min_t(u32, size, attrsz);

 	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
-	memset(&attr, 0, sizeof(attr));
-	if (copy_from_bpfptr_with_ptr(&attr, uattr, size) != 0)
+	memset(vattr, 0, attrsz);
+	if (copy_from_bpfptr_with_ptr(vattr, uattr, size) != 0)
 		return -EFAULT;

-	err = security_bpf(cmd, &attr, size);
+	err = security_bpf(cmd, vattr, size);
 	if (err < 0)
 		return err;

@@ -4660,7 +4715,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 		err = map_freeze(&attr);
 		break;
 	case BPF_PROG_LOAD:
-		err = bpf_prog_load(&attr, uattr);
+		err = bpf_prog_load(vattr, uattr);
 		break;
 	case BPF_OBJ_PIN:
 		err = bpf_obj_pin(&attr);
-- 2.34.1
linux-morello mailing list -- linux-morello@op-lists.linaro.org To unsubscribe send an email to linux-morello-leave@op-lists.linaro.org
On 16/11/2022 15:32, Tudor Cretu wrote:
On 15-11-2022 09:08, Zachary Leaf wrote:
The bpf syscall on a native 64 bit system originally did not need a separate compat handler for compat32. Since the union bpf_attr stored pointers as __u64 and was 8B aligned it results in the same struct offsets/memory layout for both 64/32b.
On some new architectures, e.g. Morello PCuABI, userspace pointers are 128 bit (plus 1 out-of-band tag bit), so storing pointers as __u64 is not possible. bpf_attr is therefore extended to hold 128b pointers on these archs, and remains __u64 on those that do not.
In order to support a 64b compat layer (compat64) in PCuABI, add a new union compat_bpf_attr with __u64 members. This retains the offsets and alignment of the original bpf_attr struct.
In order to handle compat64, use a generic void * in __sys_bpf(...) to represent either bpf_attr or compat_bpf_attr. Then in each of the multiplexed options, e.g. BPF_PROG_LOAD, if in a compat syscall, convert compat_bpf_attr into a native bpf_attr struct and handle as normal.
Since the bpf_attr union contains a number of anonymous structs, converting each option inside the option handlers, e.g. bpf_prog_load(), seems to be the best way to do it. After the conversion there should be minimal additional compat handling, in_compat_syscall() conditions, etc.
Currently unrestricted capabilities are created in the kernel here with compat_ptr so some care needs to be taken to check none of these are returned to userspace at any point.
This patch should be before PATCH 4 I think. In this way, the compat64 wouldn't be broken between PATCH 4 and PATCH 5.
Sure, but then this uses compat_ptr() to add capabilities to pointers when converting to the new bpf_attr. How does that work without first amending bpf_attr to accept capabilities? i.e. using compat_ptr and putting it in a __u64 member. Would compat64 actually work?
Seems to me it would be broken either way, but I might be thinking about it the wrong way.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
re: compat_ptr, there is probably another case to be made for having some kind of kernel only flag for capabilities here, to avoid them being returned to userspace by accident (as discussed for make_user_ptr_unsafe)
 include/uapi/linux/bpf.h | 39 ++++++++++++++++++++++
 kernel/bpf/syscall.c     | 71 +++++++++++++++++++++++++++++++++++-----
 2 files changed, 102 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 98a23f9a540e..752a520e1481 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1269,6 +1269,45 @@ struct bpf_stack_build_id {
#define BPF_OBJ_NAME_LEN 16U
[...]
if (CHECK_ATTR(BPF_PROG_LOAD)) return -EINVAL;
+#ifdef CONFIG_COMPAT
I think we should use CONFIG_COMPAT64 for newly added compat code. Maybe, I'm not sure, leaving this here as I meant to change CONFIG_COMPAT -> CONFIG_COMPAT64 in the io_uring series as well.
Yeah, that was something I meant to ask about io_uring as well. CONFIG_COMPAT64 implies CONFIG_CHERI_PURECAP_UABI, so in that case the normal native 64/compat32 can apply without using the extra conversion structs etc.
In that case CONFIG_COMPAT64 is more appropriate here and in io_uring.
[...]
On 18/11/2022 13:05, Zachary Leaf wrote:
On 16/11/2022 15:32, Tudor Cretu wrote:
On 15-11-2022 09:08, Zachary Leaf wrote:
The bpf syscall on a native 64 bit system originally did not need a separate compat handler for compat32. Since the union bpf_attr stored pointers as __u64 and was 8B aligned it results in the same struct offsets/memory layout for both 64/32b.
On some new architectures, e.g. Morello PCuABI, userspace pointers are 128 bit (plus 1 out-of-band tag bit), so storing pointers as __u64 is not possible. bpf_attr is therefore extended to hold 128b pointers on these archs, and remains __u64 on those that do not.
In order to support a 64b compat layer (compat64) in PCuABI, add a new union compat_bpf_attr with __u64 members. This retains the offsets and alignment of the original bpf_attr struct.
In order to handle compat64, use a generic void * in __sys_bpf(...) to represent either bpf_attr or compat_bpf_attr. Then in each of the multiplexed options, e.g. BPF_PROG_LOAD, if in a compat syscall, convert compat_bpf_attr into a native bpf_attr struct and handle as normal.
Since the bpf_attr union contains a number of anonymous structs, converting each option inside the option handlers, e.g. bpf_prog_load(), seems to be the best way to do it. After the conversion there should be minimal additional compat handling, in_compat_syscall() conditions, etc.
Currently unrestricted capabilities are created in the kernel here with compat_ptr so some care needs to be taken to check none of these are returned to userspace at any point.
This patch should be before PATCH 4 I think. In this way, the compat64 wouldn't be broken between PATCH 4 and PATCH 5.
Sure, but then this uses compat_ptr() to add capabilities to pointers when converting to the new bpf_attr. How does that work without first amending bpf_attr to accept capabilities? i.e. using compat_ptr and putting it in a __u64 member. Would compat64 actually work?
Seems to me it would be broken either way, but I might be thinking about it the wrong way.
Have a look at how Tudor handled this in patch 3/4 of the io_uring series. Patch 3 splits out the compat path and adds completely "flat" conversion (i.e. even pointer members are assigned as-is). Then patch 4 modifies native to use __kernel_uintptr_t for pointer members, and makes use of compat_ptr() to convert them from the compat struct. I did ponder on this for a while but in the end I think this is the best we can do.
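To make the two-stage approach concrete, here is a minimal standalone sketch of the idea (the types and `example_compat_ptr()` are stand-ins invented for illustration, not the actual kernel code, which uses union compat_bpf_attr / union bpf_attr and compat_ptr()):

```c
#include <stdint.h>

/* Stand-ins for one pointer member of the compat/native unions;
 * the real unions have many more fields. */
struct example_compat_attr { uint64_t insns; };	/* __aligned_u64 */
struct example_attr { uintptr_t insns; };	/* __kernel_aligned_uintptr_t */

/* Stage 1 ("flat" conversion, io_uring patch 3 style): pointer
 * members are copied as-is, still as plain integers. */
static void convert_flat(const struct example_compat_attr *src,
			 struct example_attr *dst)
{
	dst->insns = src->insns;
}

/* Stand-in for compat_ptr(): on Morello this would derive a capability
 * from the caller's DDC; on a plain 64-bit host it just widens the value. */
static uintptr_t example_compat_ptr(uint64_t p)
{
	return (uintptr_t)p;
}

/* Stage 2 (patch 4 style): once the native struct holds
 * __kernel_uintptr_t members, pointers are converted via compat_ptr(). */
static void convert_with_ptr(const struct example_compat_attr *src,
			     struct example_attr *dst)
{
	dst->insns = example_compat_ptr(src->insns);
}
```

The point of splitting it this way is that patch 3 keeps compat64 working with the unmodified native struct, and patch 4 only has to touch the pointer members.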
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
re: compat_ptr, there is probably another case to be made for having some kind of kernel only flag for capabilities here, to avoid them being returned to userspace by accident (as discussed for make_user_ptr_unsafe)
Forgot to comment on this - I don't see how such pointers would end up returned to userspace here. Besides, even if pointers created by compat_ptr() were returned to the caller, it would hardly be a concern as such pointers are (will be) derived from the caller's DDC, so no escalation would be happening here.
 include/uapi/linux/bpf.h | 39 ++++++++++++++++++++++
 kernel/bpf/syscall.c     | 71 +++++++++++++++++++++++++++++++++++-----
 2 files changed, 102 insertions(+), 8 deletions(-)
[...]
if (CHECK_ATTR(BPF_PROG_LOAD)) return -EINVAL;
+#ifdef CONFIG_COMPAT
I think we should use CONFIG_COMPAT64 for newly added compat code. Maybe, I'm not sure, leaving this here as I meant to change CONFIG_COMPAT -> CONFIG_COMPAT64 in the io_uring series as well.
Yeah, that was something I meant to ask about io_uring as well. CONFIG_COMPAT64 implies CONFIG_CHERI_PURECAP_UABI, so in that case the normal native 64/compat32 can apply without using the extra conversion structs etc.
In that case CONFIG_COMPAT64 is more appropriate here and in io_uring.
There's nothing functionally wrong with using CONFIG_COMPAT, but as Arnd pointed out on Tudor's patches [1], this is unnecessary overhead in compat32, so it seems preferable to use CONFIG_COMPAT64. This is conceptually different from existing compat handlers (that need to handle actual pointers in structs for instance): here we know that the layout is the same for all ABIs except PCuABI. Arguably the most semantically correct check would be CONFIG_COMPAT && CONFIG_CHERI_PURECAP_UABI, but it's probably OK to take a shortcut and assume that a layout conversion is needed in compat64.
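Spelled out as a preprocessor guard, the distinction could look like this (BPF_NEEDS_COMPAT_LAYOUT is a made-up name for this sketch; CONFIG_COMPAT64 is assumed to collapse to this condition on Morello, and on a non-CHERI host build neither config is defined, so it evaluates to 0):

```c
/* Hypothetical guard expressing the "most semantically correct" check:
 * a layout conversion is only needed when a compat layer coexists with
 * the pure-capability uABI. */
#if defined(CONFIG_COMPAT) && defined(CONFIG_CHERI_PURECAP_UABI)
#define BPF_NEEDS_COMPAT_LAYOUT 1
#else
#define BPF_NEEDS_COMPAT_LAYOUT 0
#endif
```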
Kevin
[1] https://git.morello-project.org/tudcre01/linux/-/commit/4441ec802dd1229c5743...
On 23/11/2022 10:36, Kevin Brodsky wrote:
On 18/11/2022 13:05, Zachary Leaf wrote:
On 16/11/2022 15:32, Tudor Cretu wrote:
On 15-11-2022 09:08, Zachary Leaf wrote:
The bpf syscall on a native 64 bit system originally did not need a separate compat handler for compat32. Since the union bpf_attr stored pointers as __u64 and was 8B aligned it results in the same struct offsets/memory layout for both 64/32b.
On some new architectures, e.g. Morello PCuABI, userspace pointers are 128 bit (plus 1 out-of-band tag bit), so storing pointers as __u64 is not possible. bpf_attr is therefore extended to hold 128b pointers on these archs, and remains __u64 on those that do not.
In order to support a 64b compat layer (compat64) in PCuABI, add a new union compat_bpf_attr with __u64 members. This retains the offsets and alignment of the original bpf_attr struct.
In order to handle compat64, use a generic void * in __sys_bpf(...) to represent either bpf_attr or compat_bpf_attr. Then in each of the multiplexed options, e.g. BPF_PROG_LOAD, if in a compat syscall, convert compat_bpf_attr into a native bpf_attr struct and handle as normal.
Since the bpf_attr union contains a number of anonymous structs, converting each option inside the option handlers, e.g. bpf_prog_load(), seems to be the best way to do it. After the conversion there should be minimal additional compat handling, in_compat_syscall() conditions, etc.
Currently unrestricted capabilities are created in the kernel here with compat_ptr so some care needs to be taken to check none of these are returned to userspace at any point.
This patch should be before PATCH 4 I think. In this way, the compat64 wouldn't be broken between PATCH 4 and PATCH 5.
Sure, but then this uses compat_ptr() to add capabilities to pointers when converting to the new bpf_attr. How does that work without first amending bpf_attr to accept capabilities? i.e. using compat_ptr and putting it in a __u64 member. Would compat64 actually work?
Seems to me it would be broken either way, but I might be thinking about it the wrong way.
Have a look at how Tudor handled this in patch 3/4 of the io_uring series. Patch 3 splits out the compat path and adds completely "flat" conversion (i.e. even pointer members are assigned as-is). Then patch 4 modifies native to use __kernel_uintptr_t for pointer members, and makes use of compat_ptr() to convert them from the compat struct. I did ponder on this for a while but in the end I think this is the best we can do.
Thanks, will take a look and go with this approach.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
re: compat_ptr, there is probably another case to be made for having some kind of kernel only flag for capabilities here, to avoid them being returned to userspace by accident (as discussed for make_user_ptr_unsafe)
Forgot to comment on this - I don't see how such pointers would end up returned to userspace here. Besides, even if pointers created by compat_ptr() were returned to the caller, it would hardly be a concern as such pointers are (will be) derived from the caller's DDC, so no escalation would be happening here.
Ok yep, good point: there should be no problem once they're derived from the user's DDC. Currently these are root capabilities, which would be an issue wherever pointers (or memory containing pointers) are returned to the user.
I was just thinking about the general case/issue than anything specific here in bpf.
Thanks, Zach
[...]
On 15/11/2022 10:08, Zachary Leaf wrote:
The bpf syscall on a native 64 bit system originally did not need a separate compat handler for compat32. Since the union bpf_attr stored pointers as __u64 and was 8B aligned it results in the same struct offsets/memory layout for both 64/32b.
On some new architectures, e.g. Morello PCuABI, userspace pointers are 128 bit (plus 1 out-of-band tag bit), so storing pointers as __u64 is not possible. bpf_attr is therefore extended to hold 128b pointers on these archs, and remains __u64 on those that do not.
In order to support a 64b compat layer (compat64) in PCuABI, add a new union compat_bpf_attr with __u64 members. This retains the offsets and alignment of the original bpf_attr struct.
In order to handle compat64, use a generic void * in __sys_bpf(...) to represent either bpf_attr or compat_bpf_attr. Then in each of the multiplexed options, e.g. BPF_PROG_LOAD, if in a compat syscall, convert compat_bpf_attr into a native bpf_attr struct and handle as normal.
Since the bpf_attr union contains a number of anonymous structs, converting each option inside the option handlers, e.g. bpf_prog_load(), seems to be the best way to do it. After the conversion there should be minimal additional compat handling, in_compat_syscall() conditions, etc.
Currently unrestricted capabilities are created in the kernel here with compat_ptr so some care needs to be taken to check none of these are returned to userspace at any point.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
re: compat_ptr, there is probably another case to be made for having some kind of kernel only flag for capabilities here, to avoid them being returned to userspace by accident (as discussed for make_user_ptr_unsafe)
 include/uapi/linux/bpf.h | 39 ++++++++++++++++++++++
 kernel/bpf/syscall.c     | 71 +++++++++++++++++++++++++++++++++++-----
 2 files changed, 102 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 98a23f9a540e..752a520e1481 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
compat_* types do not belong to uapi headers, because they are irrelevant to userspace. There is unfortunately no agreed place to put them (they are often found in a kernel header or in the .c next to the compat handling), so you can choose to move it wherever makes more sense / is easier.
@@ -1269,6 +1269,45 @@ struct bpf_stack_build_id {
#define BPF_OBJ_NAME_LEN 16U
+union compat_bpf_attr {
+	struct { /* anonymous struct used by BPF_PROG_LOAD command */
+		__u32 prog_type;	/* one of enum bpf_prog_type */
+		__u32 insn_cnt;
+		__aligned_u64 insns;
+		__aligned_u64 license;
+		__u32 log_level;	/* verbosity level of verifier */
+		__u32 log_size;		/* size of user buffer */
+		__aligned_u64 log_buf;	/* user supplied buffer */
+		__u32 kern_version;	/* not used */
+		__u32 prog_flags;
+		char prog_name[BPF_OBJ_NAME_LEN];
+		__u32 prog_ifindex;	/* ifindex of netdev to prep for */
+		/* For some prog types expected attach type must be known at
+		 * load time to verify attach type specific parts of prog
+		 * (context accesses, allowed helpers, etc).
+		 */
+		__u32 expected_attach_type;
+		__u32 prog_btf_fd;	/* fd pointing to BTF type data */
+		__u32 func_info_rec_size;	/* userspace bpf_func_info size */
+		__aligned_u64 func_info;	/* func info */
+		__u32 func_info_cnt;		/* number of bpf_func_info records */
+		__u32 line_info_rec_size;	/* userspace bpf_line_info size */
+		__aligned_u64 line_info;	/* line info */
+		__u32 line_info_cnt;		/* number of bpf_line_info records */
+		__u32 attach_btf_id;		/* in-kernel BTF type id to attach to */
+		union {
+			/* valid prog_fd to attach to bpf prog */
+			__u32 attach_prog_fd;
+			/* or valid module BTF object fd or 0 to attach to vmlinux */
+			__u32 attach_btf_obj_fd;
+		};
+		__u32 core_relo_cnt;	/* number of bpf_core_relo */
+		__aligned_u64 fd_array;	/* array of FDs */
+		__aligned_u64 core_relos;
+		__u32 core_relo_rec_size;	/* sizeof(struct bpf_core_relo) */
+	};
+} __attribute__((aligned(8)));
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32 map_type;	/* one of enum bpf_map_type */

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 518ab0e9f2a4..6e3b567d254a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2203,21 +2203,68 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
 	}
 }
+#ifdef CONFIG_COMPAT
+static void convert_compat_bpf_prog_load(const void *vattr, union bpf_attr *dest)
+{
+	const union compat_bpf_attr *cattr = (union compat_bpf_attr *)vattr;
+
+	dest->prog_type = READ_ONCE(cattr->prog_type);
+	dest->insn_cnt = READ_ONCE(cattr->insn_cnt);
+	dest->insns = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->insns));
+	dest->license = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->license));
+	dest->log_level = READ_ONCE(cattr->log_level);
+	dest->log_size = READ_ONCE(cattr->log_size);
+	dest->log_buf = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->log_buf));
+	dest->kern_version = READ_ONCE(cattr->kern_version);
+	dest->prog_flags = READ_ONCE(cattr->prog_flags);
+	strncpy(dest->prog_name, cattr->prog_name, BPF_OBJ_NAME_LEN);
+	dest->prog_ifindex = READ_ONCE(cattr->prog_ifindex);
+	dest->expected_attach_type = READ_ONCE(cattr->expected_attach_type);
+	dest->prog_btf_fd = READ_ONCE(cattr->prog_btf_fd);
+	dest->func_info_rec_size = READ_ONCE(cattr->func_info_rec_size);
+	dest->func_info = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->func_info));
+	dest->func_info_cnt = READ_ONCE(cattr->func_info_cnt);
+	dest->line_info_rec_size = READ_ONCE(cattr->line_info_rec_size);
+	dest->line_info = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->line_info));
+	dest->line_info_cnt = READ_ONCE(cattr->line_info_cnt);
+	dest->attach_btf_id = READ_ONCE(cattr->attach_btf_id);
+	dest->attach_prog_fd = READ_ONCE(cattr->attach_prog_fd);
+	/* u32 attach_btf_obj_fd is in a union with u32 attach_prog_fd */
+	dest->core_relo_cnt = READ_ONCE(cattr->core_relo_cnt);
+	dest->fd_array = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->fd_array));
+	dest->core_relos = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->core_relos));
+	dest->core_relo_rec_size = READ_ONCE(cattr->core_relo_rec_size);
There is no need for special handling with READ_ONCE() here. io_uring is a very special case where the source is a piece of memory that is shared with userspace and extra care is required. Here everything is stored on the stack (as usual) and normal accesses are perfectly fine.
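In other words, each line can reduce to a plain assignment. A compilable stand-in sketch with two representative fields (the struct types here are simplified stand-ins invented for illustration, not the kernel's unions):

```c
#include <stdint.h>

/* Simplified stand-ins for a u32 member and a pointer member. */
struct example_compat { uint32_t log_level; uint64_t log_buf; };
struct example_native { uint32_t log_level; uintptr_t log_buf; };

/* The source union is a private kernel stack copy (already filled by a
 * copy from userspace), not memory shared with userspace as in
 * io_uring, so ordinary accesses are fine - no READ_ONCE() needed. */
static void example_convert(const struct example_compat *src,
			    struct example_native *dst)
{
	dst->log_level = src->log_level;
	dst->log_buf = (uintptr_t)src->log_buf;	/* compat_ptr() in the real code */
}
```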
+}
+#endif
 /* last field in 'union bpf_attr' used by this command */
 #define	BPF_PROG_LOAD_LAST_FIELD core_relo_rec_size

-static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
+static int bpf_prog_load(void *vattr, bpfptr_t uattr)
 {
-	enum bpf_prog_type type = attr->prog_type;
+	enum bpf_prog_type type;
 	struct bpf_prog *prog, *dst_prog = NULL;
 	struct btf *attach_btf = NULL;
 	int err;
 	char license[128];
 	bool is_gpl;
+	union bpf_attr *attr = NULL;
+#ifdef CONFIG_COMPAT
+	union bpf_attr compat_attr;
+#endif

 	if (CHECK_ATTR(BPF_PROG_LOAD))
 		return -EINVAL;

+#ifdef CONFIG_COMPAT
+	if (in_compat_syscall())
+		convert_compat_bpf_prog_load(vattr, &compat_attr);
+	attr = in_compat_syscall() ? &compat_attr : (union bpf_attr *) vattr;
+#else
+	attr = (union bpf_attr *) vattr;
+#endif
+
+	type = attr->prog_type;
+
 	if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT |
 				 BPF_F_ANY_ALIGNMENT |
 				 BPF_F_TEST_STATE_FREQ |
@@ -4621,22 +4668,30 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 {
 	union bpf_attr attr;
+#ifdef CONFIG_COMPAT
+	union compat_bpf_attr cattr;
+	void *vattr = in_compat_syscall() ? (void *)&cattr : (void *)&attr;
The fact that we are using in_compat_syscall() here reinforces my concern with the ABI that actually applies to bpf_sys_bpf(). If it is somehow obtaining bpf_attr from the current process, then this approach makes sense, otherwise we might have a problem here.
+	size_t attrsz = in_compat_syscall() ? sizeof(cattr) : sizeof(attr);
+#else
+	void *vattr = &attr;
+	size_t attrsz = sizeof(attr);
+#endif
 	int err;

 	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
 		return -EPERM;

-	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
+	err = bpf_check_uarg_tail_zero(uattr, attrsz, size);
 	if (err)
 		return err;
-	size = min_t(u32, size, sizeof(attr));
+	size = min_t(u32, size, attrsz);

 	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
-	memset(&attr, 0, sizeof(attr));
-	if (copy_from_bpfptr_with_ptr(&attr, uattr, size) != 0)
+	memset(vattr, 0, attrsz);
+	if (copy_from_bpfptr_with_ptr(vattr, uattr, size) != 0)
Note that _with_ptr is only appropriate for native, in compat there is no actual user pointer in the struct.
return -EFAULT;
-	err = security_bpf(cmd, &attr, size);
+	err = security_bpf(cmd, vattr, size);
 	if (err < 0)
 		return err;
@@ -4660,7 +4715,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 		err = map_freeze(&attr);
 		break;
 	case BPF_PROG_LOAD:
-		err = bpf_prog_load(&attr, uattr);
+		err = bpf_prog_load(vattr, uattr);
 		break;
 	case BPF_OBJ_PIN:
 		err = bpf_obj_pin(&attr);
Considering you will end up touching more than half of the lines in __sys_bpf(), I am tempted to recommend to actually create a compat handler for bpf and a new __sys_compat_bpf() or similar. The duplication this would create is minimal as there is barely any logic in __sys_bpf() - it's all nicely abstracted in helper functions - and this way you can keep all the compat logic in just one function (plus helpers for the struct layout conversions). bpf_prog_load() and friends can remain unchanged if all the conversion work happens in __sys_compat_bpf(), in a similar fashion to more "traditional" compat handlers.
A bit more rationale on why I'm recommending this: most syscalls that take a struct read the same members in that struct in any case, so converting from compat to native is straightforward (just one conversion helper). However here we effectively have a dozen structs stuffed into a union (only marginally nicer than the syscall taking a void *), so each subcommand needs its own conversion. What this means is that you need compat code to 1. read the whole compat union and 2. convert the layout of each struct. If all this compat code is scattered in the native syscall / subcommand handlers, that's an awful lot of conditionals (on in_compat_syscall()). Conversely if all of this can happen in the same function, that's a big win in readability (and maybe a bit faster). Finally if you have just one function doing everything, you might as well have a compat handler calling it (but that's a detail really).
Kevin
-- 2.34.1
linux-morello mailing list -- linux-morello@op-lists.linaro.org To unsubscribe send an email to linux-morello-leave@op-lists.linaro.org
On 23/11/2022 09:57, Kevin Brodsky wrote:
On 15/11/2022 10:08, Zachary Leaf wrote:
The bpf syscall on a native 64-bit system originally did not need a separate compat handler for compat32: since union bpf_attr stores pointers as __u64 and is 8-byte aligned, the struct offsets/memory layout are identical for 64-bit and 32-bit userspace.
On some new architectures, e.g. Morello PCuABI, userspace pointers are 128 bit (plus 1 out-of-band tag bit), so storing pointers as __u64 is not possible. bpf_attr is therefore extended to hold 128b pointers on these archs, and remains __u64 on those that do not.
In order to support a 64b compat layer (compat64) in PCuABI, add a new union compat_bpf_attr with __u64 members. This retains the offsets and alignment of the original bpf_attr struct.
In order to handle compat64, use a generic void * in __sys_bpf(...) to represent either a bpf_attr or a compat_bpf_attr. Then, in each of the multiplexed options e.g. BPF_PROG_LOAD, if in a compat syscall, convert the compat_bpf_attr into a native bpf_attr struct and handle it as normal.
Since the bpf_attr union contains a number of anonymous structs, converting each option inside the option handlers e.g. bpf_prog_load() seems to be the best way to do it. After the conversion there should be minimal additional compat handling/in_compat_syscall() conditions etc.
Currently unrestricted capabilities are created in the kernel here with compat_ptr so some care needs to be taken to check none of these are returned to userspace at any point.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
re: compat_ptr, there is probably another case to be made for having some kind of kernel only flag for capabilities here, to avoid them being returned to userspace by accident (as discussed for make_user_ptr_unsafe)
include/uapi/linux/bpf.h | 39 ++++++++++++++++++++++ kernel/bpf/syscall.c | 71 +++++++++++++++++++++++++++++++++++----- 2 files changed, 102 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 98a23f9a540e..752a520e1481 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h
compat_* types do not belong to uapi headers, because they are irrelevant to userspace. There is unfortunately no agreed place to put them (they are often found in a kernel header or in the .c next to the compat handling), so you can choose to move it wherever makes more sense / is easier.
Ack
@@ -1269,6 +1269,45 @@ struct bpf_stack_build_id {
#define BPF_OBJ_NAME_LEN 16U
+union compat_bpf_attr {
- struct { /* anonymous struct used by BPF_PROG_LOAD command */
__u32 prog_type; /* one of enum bpf_prog_type */
__u32 insn_cnt;
__aligned_u64 insns;
__aligned_u64 license;
__u32 log_level; /* verbosity level of verifier */
__u32 log_size; /* size of user buffer */
__aligned_u64 log_buf; /* user supplied buffer */
__u32 kern_version; /* not used */
__u32 prog_flags;
char prog_name[BPF_OBJ_NAME_LEN];
__u32 prog_ifindex; /* ifindex of netdev to prep for */
/* For some prog types expected attach type must be known at
* load time to verify attach type specific parts of prog
* (context accesses, allowed helpers, etc).
*/
__u32 expected_attach_type;
__u32 prog_btf_fd; /* fd pointing to BTF type data */
__u32 func_info_rec_size; /* userspace bpf_func_info size */
__aligned_u64 func_info; /* func info */
__u32 func_info_cnt; /* number of bpf_func_info records */
__u32 line_info_rec_size; /* userspace bpf_line_info size */
__aligned_u64 line_info; /* line info */
__u32 line_info_cnt; /* number of bpf_line_info records */
__u32 attach_btf_id; /* in-kernel BTF type id to attach to */
union {
/* valid prog_fd to attach to bpf prog */
__u32 attach_prog_fd;
/* or valid module BTF object fd or 0 to attach to vmlinux */
__u32 attach_btf_obj_fd;
};
__u32 core_relo_cnt; /* number of bpf_core_relo */
__aligned_u64 fd_array; /* array of FDs */
__aligned_u64 core_relos;
__u32 core_relo_rec_size; /* sizeof(struct bpf_core_relo) */
- };
+} __attribute__((aligned(8)));
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32 map_type;	/* one of enum bpf_map_type */

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 518ab0e9f2a4..6e3b567d254a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2203,21 +2203,68 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
 	}
 }
+#ifdef CONFIG_COMPAT
+static void convert_compat_bpf_prog_load(const void *vattr, union bpf_attr *dest)
+{
- const union compat_bpf_attr *cattr = (union compat_bpf_attr *)vattr;
- dest->prog_type = READ_ONCE(cattr->prog_type);
- dest->insn_cnt = READ_ONCE(cattr->insn_cnt);
- dest->insns = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->insns));
- dest->license = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->license));
- dest->log_level = READ_ONCE(cattr->log_level);
- dest->log_size = READ_ONCE(cattr->log_size);
- dest->log_buf = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->log_buf));
- dest->kern_version = READ_ONCE(cattr->kern_version);
- dest->prog_flags = READ_ONCE(cattr->prog_flags);
- strncpy(dest->prog_name, cattr->prog_name, BPF_OBJ_NAME_LEN);
- dest->prog_ifindex = READ_ONCE(cattr->prog_ifindex);
- dest->expected_attach_type = READ_ONCE(cattr->expected_attach_type);
- dest->prog_btf_fd = READ_ONCE(cattr->prog_btf_fd);
- dest->func_info_rec_size = READ_ONCE(cattr->func_info_rec_size);
- dest->func_info = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->func_info));
- dest->func_info_cnt = READ_ONCE(cattr->func_info_cnt);
- dest->line_info_rec_size = READ_ONCE(cattr->line_info_rec_size);
- dest->line_info = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->line_info));
- dest->line_info_cnt = READ_ONCE(cattr->line_info_cnt);
- dest->attach_btf_id = READ_ONCE(cattr->attach_btf_id);
- dest->attach_prog_fd = READ_ONCE(cattr->attach_prog_fd);
- /* u32 attach_btf_obj_fd is in a union with u32 attach_prog_fd */
- dest->core_relo_cnt = READ_ONCE(cattr->core_relo_cnt);
- dest->fd_array = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->fd_array));
- dest->core_relos = (__kernel_aligned_uintptr_t)compat_ptr(READ_ONCE(cattr->core_relos));
- dest->core_relo_rec_size = READ_ONCE(cattr->core_relo_rec_size);
There is no need for special handling with READ_ONCE() here. io_uring is a very special case where the source is a piece of memory that is shared with userspace and extra care is required. Here everything is stored on the stack (as usual) and normal accesses are perfectly fine.
Ah yep. I knew this or should have known it from the io_uring review.
+}
+#endif
 /* last field in 'union bpf_attr' used by this command */
 #define	BPF_PROG_LOAD_LAST_FIELD core_relo_rec_size

-static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
+static int bpf_prog_load(void *vattr, bpfptr_t uattr)
 {
-	enum bpf_prog_type type = attr->prog_type;
+	enum bpf_prog_type type;
 	struct bpf_prog *prog, *dst_prog = NULL;
 	struct btf *attach_btf = NULL;
 	int err;
 	char license[128];
 	bool is_gpl;
+	union bpf_attr *attr = NULL;
+#ifdef CONFIG_COMPAT
+	union bpf_attr compat_attr;
+#endif

 	if (CHECK_ATTR(BPF_PROG_LOAD))
 		return -EINVAL;
+#ifdef CONFIG_COMPAT
+	if (in_compat_syscall())
+		convert_compat_bpf_prog_load(vattr, &compat_attr);
+	attr = in_compat_syscall() ? &compat_attr : (union bpf_attr *) vattr;
+#else
+	attr = (union bpf_attr *) vattr;
+#endif
+
+	type = attr->prog_type;
+
 	if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT | BPF_F_ANY_ALIGNMENT | BPF_F_TEST_STATE_FREQ |
@@ -4621,22 +4668,30 @@ static int bpf_prog_bind_map(union bpf_attr *attr)

 static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 {
 	union bpf_attr attr;
+#ifdef CONFIG_COMPAT
+	union compat_bpf_attr cattr;
+	void *vattr = in_compat_syscall() ? (void *)&cattr : (void *)&attr;
The fact that we are using in_compat_syscall() here reinforces my concern with the ABI that actually applies to bpf_sys_bpf(). If it is somehow obtaining bpf_attr from the current process, then this approach makes sense, otherwise we might have a problem here.
Still will need to look into this more. I'd be surprised if what you're suggesting is the case, since that would mean we're mis-handling user pointers, not storing them with a __user tag, and possibly using them without going through uaccess. That may be possible so will check either way.
+	size_t attrsz = in_compat_syscall() ? sizeof(cattr) : sizeof(attr);
+#else
+	void *vattr = &attr;
+	size_t attrsz = sizeof(attr);
+#endif
 	int err;
 	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
 		return -EPERM;

-	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
+	err = bpf_check_uarg_tail_zero(uattr, attrsz, size);
 	if (err)
 		return err;
-	size = min_t(u32, size, sizeof(attr));
+	size = min_t(u32, size, attrsz);
 	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
-	memset(&attr, 0, sizeof(attr));
-	if (copy_from_bpfptr_with_ptr(&attr, uattr, size) != 0)
+	memset(vattr, 0, attrsz);
+	if (copy_from_bpfptr_with_ptr(vattr, uattr, size) != 0)
Note that _with_ptr is only appropriate for native, in compat there is no actual user pointer in the struct.
Ack. I assume you mean in compat there *are* user pointers, they just don't carry capabilities, so it makes no sense to go through the capability-preserving _with_ptr variant.
return -EFAULT;
-	err = security_bpf(cmd, &attr, size);
+	err = security_bpf(cmd, vattr, size);
 	if (err < 0)
 		return err;
@@ -4660,7 +4715,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 		err = map_freeze(&attr);
 		break;
 	case BPF_PROG_LOAD:
-		err = bpf_prog_load(&attr, uattr);
+		err = bpf_prog_load(vattr, uattr);
 		break;
 	case BPF_OBJ_PIN:
 		err = bpf_obj_pin(&attr);
Considering you will end up touching more than half of the lines in __sys_bpf(), I am tempted to recommend to actually create a compat handler for bpf and a new __sys_compat_bpf() or similar. The duplication this would create is minimal as there is barely any logic in __sys_bpf()
- it's all nicely abstracted in helper functions - and this way you can
keep all the compat logic in just one function (plus helpers for the struct layout conversions). bpf_prog_load() and friends can remain unchanged if all the conversion work happens in __sys_compat_bpf(), in a similar fashion to more "traditional" compat handlers.
A bit more rationale on why I'm recommending this: most syscalls that take a struct read the same members in that struct in any case, so converting from compat to native is straightforward (just one conversion helper). However here we effectively have a dozen structs stuffed into a union (only marginally nicer than the syscall taking a void *), so each subcommand needs its own conversion. What this means is that you need compat code to 1. read the whole compat union and 2. convert the layout of each struct. If all this compat code is scattered in the native syscall / subcommand handlers, that's an awful lot of conditionals (on in_compat_syscall()). Conversely if all of this can happen in the same function, that's a big win in readability (and maybe a bit faster). Finally if you have just one function doing everything, you might as well have a compat handler calling it (but that's a detail really).
Agreed on that - see response on [RFC PATCH 9/9] bpf/security: move lsm hook point.
Thanks, Zach
Kevin
-- 2.34.1
linux-morello mailing list -- linux-morello@op-lists.linaro.org To unsubscribe send an email to linux-morello-leave@op-lists.linaro.org
On 28/11/2022 16:05, Zachary Leaf wrote:
/* copy attributes from user space, may be less than sizeof(bpf_attr) */
- memset(&attr, 0, sizeof(attr));
- if (copy_from_bpfptr_with_ptr(&attr, uattr, size) != 0)
- memset(vattr, 0, attrsz);
- if (copy_from_bpfptr_with_ptr(vattr, uattr, size) != 0)
Note that _with_ptr is only appropriate for native, in compat there is no actual user pointer in the struct.
Ack. I assume you mean in compat there *is* user pointers, it just doesn't have a capability so makes no sense to go through the capability preserving _with_ptr variant.
This is a bit subtle indeed. There are technically no user pointers in compat structs, in the sense of no *native* user pointers. There can however be *compat* user pointers, i.e. compat_uptr_t (or compat_caddr_t). The _with_ptr variants are only concerned with native user pointers, which is alright for supporting compat64 in addition to compat32; in both cases compat user pointers remain integers.
Kevin
The CHECK_ATTR macro needs adapting to support union compat_bpf_attr as well as union bpf_attr.
The union bpf_attr contains many structs of various sizes, one for each of the bpf syscall's multiplexed options. After declaring bpf_attr, we zero the entire memory of the union, then copy_from_user() copies the bpf_attr config passed into the syscall into the new struct.
Since most of the structs do not take up the full memory footprint of the union, there should not be any data past the last struct member for any given option. The CHECK_ATTR macro checks that the gap between the end of the last member and the end of the union contains nothing but zeroes.
Adapt the macro to be able to handle compat calls based on in_compat_syscall(). If in compat, use compat_bpf_attr union with the appropriate offsets.
Since syscall options now take a void* parameter instead of bpf_attr* (to also support compat_bpf_attr*), adapt macro to cast the void* appropriately to select the correct member offsets.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com --- kernel/bpf/syscall.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6e3b567d254a..910544bcae6e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -735,6 +735,18 @@ int bpf_get_file_flag(int flags)
 		   offsetof(union bpf_attr, CMD##_LAST_FIELD) -	\
 		   sizeof(attr->CMD##_LAST_FIELD)) != NULL
+#define __CHECK_ATTR(CMD, TYPE)						\
+	(memchr_inv((void *) &(((TYPE *)vattr)->CMD##_LAST_FIELD) +	\
+		   sizeof(((TYPE *)vattr)->CMD##_LAST_FIELD), 0,	\
+		   sizeof(*(TYPE *)vattr) -				\
+		   offsetof(TYPE, CMD##_LAST_FIELD) -			\
+		   sizeof(((TYPE *)vattr)->CMD##_LAST_FIELD)) != NULL)
+
+#define CHECK_ATTR_FIXED(CMD)						\
+	(in_compat_syscall() ?						\
+		__CHECK_ATTR(CMD, union compat_bpf_attr) :		\
+		__CHECK_ATTR(CMD, union bpf_attr))
+
 /* dst and src must have at least "size" number of bytes.
  * Return strlen on success and < 0 on error.
  */
@@ -2253,7 +2265,7 @@ static int bpf_prog_load(void *vattr, bpfptr_t uattr)
 	union bpf_attr compat_attr;
 #endif

-	if (CHECK_ATTR(BPF_PROG_LOAD))
+	if (CHECK_ATTR_FIXED(BPF_PROG_LOAD))
 		return -EINVAL;

 #ifdef CONFIG_COMPAT
On 15/11/2022 10:08, Zachary Leaf wrote:
The CHECK_ATTR macro needs adapting to support union compat_bpf_attr as well as union bpf_attr.
The union bpf_attr contains many structs of various sizes for each of the bpf syscall's multiplexed options. After declaring bpf_attr, we zero the entire memory of the union, then copy_from_user() the bpf_attr config struct passed into the syscall to the new struct.
Since most of the structs do not take up the full memory footprint of the union, we should not have any data past the last struct member for any given option. CHECK_ATTR macro checks there is no data except 0 for the gap between the end of the last member and the end of the union.
Adapt the macro to be able to handle compat calls based on in_compat_syscall(). If in compat, use compat_bpf_attr union with the appropriate offsets.
Since syscall options now take a void* parameter instead of bpf_attr* (to also support compat_bpf_attr*), adapt macro to cast the void* appropriately to select the correct member offsets.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
kernel/bpf/syscall.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 6e3b567d254a..910544bcae6e 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -735,6 +735,18 @@ int bpf_get_file_flag(int flags) offsetof(union bpf_attr, CMD##_LAST_FIELD) - \ sizeof(attr->CMD##_LAST_FIELD)) != NULL +#define __CHECK_ATTR(CMD, TYPE) \
- (memchr_inv((void *) &(((TYPE *)vattr)->CMD##_LAST_FIELD) + \
sizeof(((TYPE *)vattr)->CMD##_LAST_FIELD), 0, \
sizeof(*(TYPE *)vattr) - \
offsetof(TYPE, CMD##_LAST_FIELD) - \
sizeof(((TYPE *)vattr)->CMD##_LAST_FIELD)) != NULL)
+#define CHECK_ATTR_FIXED(CMD) \
- (in_compat_syscall() ? \
__CHECK_ATTR(CMD, union compat_bpf_attr) : \
__CHECK_ATTR(CMD, union bpf_attr))
/* dst and src must have at least "size" number of bytes.
- Return strlen on success and < 0 on error.
*/ @@ -2253,7 +2265,7 @@ static int bpf_prog_load(void *vattr, bpfptr_t uattr) union bpf_attr compat_attr; #endif
- if (CHECK_ATTR(BPF_PROG_LOAD))
If you go for my recommendation in patch 5 (separate compat handler), then all these CHECK_ATTR() should move to __sys_bpf() and be duplicated in __sys_compat_bpf(). On the plus side, in that case, you don't need conditional code in that macro either - just need to replace union bpf_attr with a macro parameter (or even typeof(*attr)).
Kevin
- if (CHECK_ATTR_FIXED(BPF_PROG_LOAD)) return -EINVAL;
#ifdef CONFIG_COMPAT
note: this is a temporary commit to allow testing of CHECK_ATTR_FIXED and can safely be ignored. These changes will not appear in any final versions of this patchset.
In order to be able to check/test the changes made to CHECK_ATTR, some blank space is required beyond the end of the last struct member. CHECK_ATTR then checks that this space is zeroed.
The BPF_PROG_LOAD struct is the largest struct in the bpf_attr union, and so does not have any space beyond the last member. Since this RFC only implements handling for BPF_PROG_LOAD, add some temporary padding to verify changes to CHECK_ATTR with an LTP test created for this RFC.
Once other syscall options/bpf_attr structs are supported we can use the smaller structs with natural gaps off end for any tests we might want to add for this.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com --- include/uapi/linux/bpf.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 752a520e1481..b22e42b1e2f7 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1305,6 +1305,7 @@ union compat_bpf_attr { __aligned_u64 fd_array; /* array of FDs */ __aligned_u64 core_relos; __u32 core_relo_rec_size; /* sizeof(struct bpf_core_relo) */ + __u64 pad[2]; }; } __attribute__((aligned(8)));
@@ -1401,6 +1402,7 @@ union bpf_attr { __kernel_aligned_uintptr_t fd_array; /* array of FDs */ __kernel_aligned_uintptr_t core_relos; __u32 core_relo_rec_size; /* sizeof(struct bpf_core_relo) */ + __u64 pad[2]; };
struct { /* anonymous struct used by BPF_OBJ_* commands */
Currently there is a Linux Security Module (LSM) hook for all bpf syscalls directly after the bpf_attr union is copied into the kernel.
After changes to support compat64 for PCuABI, attr is now a void* since it can represent either a bpf_attr or a compat_bpf_attr.
Change the bpf LSM hook to align with this. eBPF LSMs will therefore have to check in_compat_syscall() to choose the correct union + memory layout of the attr passed to the module.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com --- Note: This is potential solution 1 of 2. I'm not sure this is ideal, as it would break existing LSMs.
The alternative would be to change the location of this hook to be inside each multiplexed option, after we have converted a compat_bpf_attr into native bpf_attr type. The security_bpf() call can then be made with the converted union without changing the signature of security_bpf() as this patch does. That should keep compatibility with existing LSMs.
The next commit in this series shows this alternative option.
include/linux/security.h | 4 ++-- security/security.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/security.h b/include/linux/security.h index 25b3ef71f495..f0ad6571d067 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1965,7 +1965,7 @@ struct bpf_map; struct bpf_prog; struct bpf_prog_aux; #ifdef CONFIG_SECURITY -extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size); +extern int security_bpf(int cmd, void *attr, unsigned int size); extern int security_bpf_map(struct bpf_map *map, fmode_t fmode); extern int security_bpf_prog(struct bpf_prog *prog); extern int security_bpf_map_alloc(struct bpf_map *map); @@ -1973,7 +1973,7 @@ extern void security_bpf_map_free(struct bpf_map *map); extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux); extern void security_bpf_prog_free(struct bpf_prog_aux *aux); #else -static inline int security_bpf(int cmd, union bpf_attr *attr, +static inline int security_bpf(int cmd, void *attr, unsigned int size) { return 0; diff --git a/security/security.c b/security/security.c index b7cf5cbfdc67..2c5e8cf47248 100644 --- a/security/security.c +++ b/security/security.c @@ -2587,7 +2587,7 @@ int security_audit_rule_match(u32 secid, u32 field, u32 op, void *lsmrule) #endif /* CONFIG_AUDIT */
#ifdef CONFIG_BPF_SYSCALL -int security_bpf(int cmd, union bpf_attr *attr, unsigned int size) +int security_bpf(int cmd, void *attr, unsigned int size) { return call_int_hook(bpf, 0, cmd, attr, size); } -- 2.34.1
With the addition of compat64 syscall support to eBPF, attr is now a void * pointing to either a union bpf_attr or a compat_bpf_attr. This breaks the Linux Security Module (LSM) hook security_bpf, which takes a bpf_attr param.
In order to keep the LSM hook and not break existing security modules, move the hooks inside each option where compat_bpf_attr has been converted into a bpf_attr.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com --- include/linux/security.h | 4 ++-- kernel/bpf/syscall.c | 12 ++++++------ security/security.c | 2 +- 3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/linux/security.h b/include/linux/security.h index f0ad6571d067..25b3ef71f495 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1965,7 +1965,7 @@ struct bpf_map; struct bpf_prog; struct bpf_prog_aux; #ifdef CONFIG_SECURITY -extern int security_bpf(int cmd, void *attr, unsigned int size); +extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size); extern int security_bpf_map(struct bpf_map *map, fmode_t fmode); extern int security_bpf_prog(struct bpf_prog *prog); extern int security_bpf_map_alloc(struct bpf_map *map); @@ -1973,7 +1973,7 @@ extern void security_bpf_map_free(struct bpf_map *map); extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux); extern void security_bpf_prog_free(struct bpf_prog_aux *aux); #else -static inline int security_bpf(int cmd, void *attr, +static inline int security_bpf(int cmd, union bpf_attr *attr, unsigned int size) { return 0; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 910544bcae6e..aa4ebdb92a18 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2252,7 +2252,7 @@ static void convert_compat_bpf_prog_load(const void *vattr, union bpf_attr *dest /* last field in 'union bpf_attr' used by this command */ #define BPF_PROG_LOAD_LAST_FIELD core_relo_rec_size
-static int bpf_prog_load(void *vattr, bpfptr_t uattr)
+static int bpf_prog_load(void *vattr, bpfptr_t uattr, size_t size)
 {
 	enum bpf_prog_type type;
 	struct bpf_prog *prog, *dst_prog = NULL;
@@ -2277,6 +2277,10 @@ static int bpf_prog_load(void *vattr, bpfptr_t uattr)
 #endif

 	type = attr->prog_type;

+	err = security_bpf(BPF_PROG_LOAD, attr, size);
+	if (err < 0)
+		return err;
+
 	if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT | BPF_F_ANY_ALIGNMENT | BPF_F_TEST_STATE_FREQ |
@@ -4703,10 +4707,6 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	if (copy_from_bpfptr_with_ptr(vattr, uattr, size) != 0)
 		return -EFAULT;

-	err = security_bpf(cmd, vattr, size);
-	if (err < 0)
-		return err;
-
 	switch (cmd) {
 	case BPF_MAP_CREATE:
 		err = map_create(&attr);
@@ -4727,7 +4727,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 		err = map_freeze(&attr);
 		break;
 	case BPF_PROG_LOAD:
-		err = bpf_prog_load(vattr, uattr);
+		err = bpf_prog_load(vattr, uattr, size);
 		break;
 	case BPF_OBJ_PIN:
 		err = bpf_obj_pin(&attr);
diff --git a/security/security.c b/security/security.c
index 2c5e8cf47248..b7cf5cbfdc67 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2587,7 +2587,7 @@ int security_audit_rule_match(u32 secid, u32 field, u32 op, void *lsmrule)
 #endif /* CONFIG_AUDIT */
#ifdef CONFIG_BPF_SYSCALL -int security_bpf(int cmd, void *attr, unsigned int size) +int security_bpf(int cmd, union bpf_attr *attr, unsigned int size) { return call_int_hook(bpf, 0, cmd, attr, size); }
On 15-11-2022 09:08, Zachary Leaf wrote:
With the addition of compat64 syscall support to eBPF, attr is now a void* either to a union bpf_attr or compat_bpf_attr. This breaks the Linux Security Module (LSM) hook security_bpf that takes a bpf_attr param.
In order to keep the LSM hook and not break existing security modules, move the hooks inside each option where compat_bpf_attr has been converted into a bpf_attr.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
include/linux/security.h | 4 ++-- kernel/bpf/syscall.c | 12 ++++++------ security/security.c | 2 +- 3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/linux/security.h b/include/linux/security.h index f0ad6571d067..25b3ef71f495 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1965,7 +1965,7 @@ struct bpf_map; struct bpf_prog; struct bpf_prog_aux; #ifdef CONFIG_SECURITY -extern int security_bpf(int cmd, void *attr, unsigned int size); +extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size); extern int security_bpf_map(struct bpf_map *map, fmode_t fmode); extern int security_bpf_prog(struct bpf_prog *prog); extern int security_bpf_map_alloc(struct bpf_map *map); @@ -1973,7 +1973,7 @@ extern void security_bpf_map_free(struct bpf_map *map); extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux); extern void security_bpf_prog_free(struct bpf_prog_aux *aux); #else -static inline int security_bpf(int cmd, void *attr, +static inline int security_bpf(int cmd, union bpf_attr *attr, unsigned int size) { return 0; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 910544bcae6e..aa4ebdb92a18 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2252,7 +2252,7 @@ static void convert_compat_bpf_prog_load(const void *vattr, union bpf_attr *dest /* last field in 'union bpf_attr' used by this command */ #define BPF_PROG_LOAD_LAST_FIELD core_relo_rec_size -static int bpf_prog_load(void *vattr, bpfptr_t uattr) +static int bpf_prog_load(void *vattr, bpfptr_t uattr, size_t size) { enum bpf_prog_type type; struct bpf_prog *prog, *dst_prog = NULL; @@ -2277,6 +2277,10 @@ static int bpf_prog_load(void *vattr, bpfptr_t uattr) #endif type = attr->prog_type;
- err = security_bpf(BPF_PROG_LOAD, attr, size);
I am not sure how security_bpf works, but is "size" still correct here in compat64? The converted attr occupies more space than the size received from userspace, since the compat pointers have been expanded via compat_ptr(). Maybe size needs to be recalculated?
In any case, I think this version is more suitable than PATCH 8.
Thanks, Tudor
- if (err < 0)
return err;
- if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT | BPF_F_ANY_ALIGNMENT | BPF_F_TEST_STATE_FREQ |
@@ -4703,10 +4707,6 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) if (copy_from_bpfptr_with_ptr(vattr, uattr, size) != 0) return -EFAULT;
- err = security_bpf(cmd, vattr, size);
- if (err < 0)
return err;
- switch (cmd) { case BPF_MAP_CREATE: err = map_create(&attr);
@@ -4727,7 +4727,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) err = map_freeze(&attr); break; case BPF_PROG_LOAD:
err = bpf_prog_load(vattr, uattr);
break; case BPF_OBJ_PIN: err = bpf_obj_pin(&attr);err = bpf_prog_load(vattr, uattr, size);
diff --git a/security/security.c b/security/security.c index 2c5e8cf47248..b7cf5cbfdc67 100644 --- a/security/security.c +++ b/security/security.c @@ -2587,7 +2587,7 @@ int security_audit_rule_match(u32 secid, u32 field, u32 op, void *lsmrule) #endif /* CONFIG_AUDIT */ #ifdef CONFIG_BPF_SYSCALL -int security_bpf(int cmd, void *attr, unsigned int size) +int security_bpf(int cmd, union bpf_attr *attr, unsigned int size) { return call_int_hook(bpf, 0, cmd, attr, size); }
On 16/11/2022 16:49, Tudor Cretu wrote:
On 15-11-2022 09:08, Zachary Leaf wrote:
With the addition of compat64 syscall support to eBPF, attr is now a void* either to a union bpf_attr or compat_bpf_attr. This breaks the Linux Security Module (LSM) hook security_bpf that takes a bpf_attr param.
In order to keep the LSM hook and not break existing security modules, move the hooks inside each option where compat_bpf_attr has been converted into a bpf_attr.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
include/linux/security.h | 4 ++-- kernel/bpf/syscall.c | 12 ++++++------ security/security.c | 2 +- 3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/linux/security.h b/include/linux/security.h index f0ad6571d067..25b3ef71f495 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1965,7 +1965,7 @@ struct bpf_map; struct bpf_prog; struct bpf_prog_aux; #ifdef CONFIG_SECURITY -extern int security_bpf(int cmd, void *attr, unsigned int size); +extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size); extern int security_bpf_map(struct bpf_map *map, fmode_t fmode); extern int security_bpf_prog(struct bpf_prog *prog); extern int security_bpf_map_alloc(struct bpf_map *map); @@ -1973,7 +1973,7 @@ extern void security_bpf_map_free(struct bpf_map *map); extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux); extern void security_bpf_prog_free(struct bpf_prog_aux *aux); #else -static inline int security_bpf(int cmd, void *attr, +static inline int security_bpf(int cmd, union bpf_attr *attr, unsigned int size) { return 0; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 910544bcae6e..aa4ebdb92a18 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2252,7 +2252,7 @@ static void convert_compat_bpf_prog_load(const void *vattr, union bpf_attr *dest /* last field in 'union bpf_attr' used by this command */ #define BPF_PROG_LOAD_LAST_FIELD core_relo_rec_size -static int bpf_prog_load(void *vattr, bpfptr_t uattr) +static int bpf_prog_load(void *vattr, bpfptr_t uattr, size_t size) { enum bpf_prog_type type; struct bpf_prog *prog, *dst_prog = NULL; @@ -2277,6 +2277,10 @@ static int bpf_prog_load(void *vattr, bpfptr_t uattr) #endif type = attr->prog_type; + err = security_bpf(BPF_PROG_LOAD, attr, size);
I'm not sure how security_bpf works, but is "size" still correct here in compat64? attr would be populated with a larger layout than the size received from userspace if compat_ptrs were converted. Maybe size needs to be recalculated?
Ah yeah. I think this is just plainly wrong. Will fix.
In any case, I think this version is more suitable than PATCH 8.
Seems to be the consensus so far.
Thanks, Zach
Thanks, Tudor
+	if (err < 0)
+		return err;
+
 	if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT |
 				 BPF_F_ANY_ALIGNMENT |
 				 BPF_F_TEST_STATE_FREQ |
@@ -4703,10 +4707,6 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	if (copy_from_bpfptr_with_ptr(vattr, uattr, size) != 0)
 		return -EFAULT;

-	err = security_bpf(cmd, vattr, size);
-	if (err < 0)
-		return err;
-
 	switch (cmd) {
 	case BPF_MAP_CREATE:
 		err = map_create(&attr);
@@ -4727,7 +4727,7 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 		err = map_freeze(&attr);
 		break;
 	case BPF_PROG_LOAD:
-		err = bpf_prog_load(vattr, uattr);
+		err = bpf_prog_load(vattr, uattr, size);
 		break;
 	case BPF_OBJ_PIN:
 		err = bpf_obj_pin(&attr);
diff --git a/security/security.c b/security/security.c
index 2c5e8cf47248..b7cf5cbfdc67 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2587,7 +2587,7 @@ int security_audit_rule_match(u32 secid, u32 field, u32 op, void *lsmrule)
 #endif /* CONFIG_AUDIT */
 #ifdef CONFIG_BPF_SYSCALL
-int security_bpf(int cmd, void *attr, unsigned int size)
+int security_bpf(int cmd, union bpf_attr *attr, unsigned int size)
 {
 	return call_int_hook(bpf, 0, cmd, attr, size);
 }
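[Editorial sketch] The size concern raised above can be made concrete with a minimal userspace sketch. All struct and field names here are hypothetical, and the 16-byte struct capability merely emulates a PCuABI capability: once a compat layout whose user pointers travel as plain u64s is converted into a native layout with full capabilities, the native struct occupies more bytes than the size userspace passed in, so any callee inspecting the converted struct needs a recalculated size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compat layout: user pointers carried as plain 64-bit words */
struct compat_prog_load {
	uint32_t prog_type;
	uint64_t insns;		/* user pointer as a u64 */
	uint32_t insn_cnt;
};

/* Emulated 16-byte capability, standing in for a PCuABI user pointer */
struct capability {
	uint64_t value;		/* address part */
	uint64_t metadata;	/* bounds/permissions, emulated */
};

/* Hypothetical native layout: user pointers are full capabilities */
struct native_prog_load {
	uint32_t prog_type;
	struct capability insns;
	uint32_t insn_cnt;
};

/*
 * Convert the compat layout and return the size that now describes the
 * native struct; the size received from userspace only described *src.
 */
static size_t convert_prog_load(const struct compat_prog_load *src,
				struct native_prog_load *dst)
{
	memset(dst, 0, sizeof(*dst));
	dst->prog_type = src->prog_type;
	dst->insns.value = src->insns;	/* real capability derived elsewhere */
	dst->insn_cnt = src->insn_cnt;
	return sizeof(*dst);
}
```

With these layouts the native struct is strictly larger than the compat one, which is exactly why passing the userspace-supplied size alongside the converted attr would be wrong.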
On 15/11/2022 10:08, Zachary Leaf wrote:
With the addition of compat64 syscall support to eBPF, attr is now a void* either to a union bpf_attr or compat_bpf_attr. This breaks the Linux Security Module (LSM) hook security_bpf that takes a bpf_attr param.
In order to keep the LSM hook and not break existing security modules,
Realistically I think we have to do this, the approach in patch 8 is really unappealing.
move the hooks inside each option where compat_bpf_attr has been converted into a bpf_attr.
Seeing this patch makes me slightly amend my recommendation in patch 5. Because security_bpf() needs to get the native layout regardless of the subcommand, the least invasive approach is to separate out the layout conversion from the calling of each subcommand handler. In other words, the big switch in __sys_bpf(), along with the call to security_bpf(), could move to a new function, say bpf_dispatch(), and then we would get the following paths:
- Native: SYSCALL_DEFINE3(bpf) -> __sys_bpf() -> bpf_dispatch() -> bpf_prog_load()
- Compat: COMPAT_SYSCALL_DEFINE3(bpf) -> __sys_compat_bpf() -> bpf_dispatch() -> bpf_prog_load()
__sys_bpf() would do the same thing as before this series in terms of loading the union. __sys_compat_bpf() would load the compat union, have a big switch to convert the struct layout for each subcommand, then call the common dispatcher.
AFAICT this approach is both the most minimal (diff-wise) and the most readable, but maybe I'm missing something!
Kevin
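[Editorial sketch] A minimal userspace sketch of the dispatcher split proposed above. All names, layouts, and the -22/EINVAL stub are illustrative rather than the kernel's actual code: both entry points converge on a common bpf_dispatch() that only ever sees the native layout, with the compat path converting each subcommand's struct up front.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical native and compat layouts for one subcommand */
union bpf_attr_s {
	struct { uint32_t prog_type; void *insns; } load;
};
union compat_attr_s {
	struct { uint32_t prog_type; uint64_t insns; } load;
};

enum { BPF_PROG_LOAD_CMD = 5 };	/* illustrative command number */

static int bpf_prog_load_stub(union bpf_attr_s *attr)
{
	return attr->load.prog_type == 0 ? -22 /* -EINVAL */ : 0;
}

/*
 * Common dispatcher: always sees the native layout, so the security_bpf()
 * LSM hook could be called once here, covering every subcommand.
 */
static int bpf_dispatch(int cmd, union bpf_attr_s *attr)
{
	switch (cmd) {
	case BPF_PROG_LOAD_CMD:
		return bpf_prog_load_stub(attr);
	default:
		return -22;
	}
}

/* Native path: the union is already in the right layout */
static int sys_bpf(int cmd, union bpf_attr_s *attr)
{
	return bpf_dispatch(cmd, attr);
}

/*
 * Compat path: convert each subcommand's layout up front, then share
 * everything downstream with the native path.
 */
static int sys_compat_bpf(int cmd, union compat_attr_s *cattr)
{
	union bpf_attr_s attr;

	memset(&attr, 0, sizeof(attr));
	switch (cmd) {
	case BPF_PROG_LOAD_CMD:
		attr.load.prog_type = cattr->load.prog_type;
		attr.load.insns = (void *)(uintptr_t)cattr->load.insns;
		break;
	}
	return bpf_dispatch(cmd, &attr);
}
```

The point of the shape is that nothing after bpf_dispatch() needs to know whether the call came in via the native or the compat entry point.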
On 23/11/2022 14:03, Kevin Brodsky wrote:
On 15/11/2022 10:08, Zachary Leaf wrote:
With the addition of compat64 syscall support to eBPF, attr is now a void* either to a union bpf_attr or compat_bpf_attr. This breaks the Linux Security Module (LSM) hook security_bpf that takes a bpf_attr param.
In order to keep the LSM hook and not break existing security modules,
Realistically I think we have to do this, the approach in patch 8 is really unappealing.
move the hooks inside each option where compat_bpf_attr has been converted into a bpf_attr.
Seeing this patch makes me slightly amend my recommendation in patch 5. Because security_bpf() needs to get the native layout regardless of the subcommand, the least invasive approach is to separate out the layout conversion from the calling of each subcommand handler. In other words, the big switch in __sys_bpf(), along with the call to security_bpf(), could move to a new function, say bpf_dispatch(), and then we would get the following paths:
- Native: SYSCALL_DEFINE3(bpf) -> __sys_bpf() -> bpf_dispatch() -> bpf_prog_load()
- Compat: COMPAT_SYSCALL_DEFINE3(bpf) -> __sys_compat_bpf() -> bpf_dispatch() -> bpf_prog_load()
__sys_bpf() would do the same thing as before this series in terms of loading the union. __sys_compat_bpf() would load the compat union, have a big switch to convert the struct layout for each subcommand, then call the common dispatcher.
AFAICT this approach is both the most minimal (diff-wise) and the most readable, but maybe I'm missing something!
Not missing anything, I think. The original idea here was to avoid one giant conversion of the entire union, given the number of subcommands and different structs inside it, so the conversion was broken up and kept close to where it's used. In hindsight, a big con of this approach is that it requires changes inside each subcommand and ends up being more invasive, especially with the security_bpf problem here. You also end up with a bunch of in_compat_syscall() checks inside every subcommand.
So I agree the approach above is nice: we front-load all the compat handling, so it is abstracted/hidden away pretty cleanly and resolved early. From there, everything else should remain pretty much untouched. __sys_compat_bpf() is going to be large and unpretty, but as discussed offline the conversions should be relatively straightforward; there shouldn't be much subcommand-specific "knowledge" that would benefit from being closer to where it's used.
I think this should be better overall. I'll give this a go in a v2 with the other comments.
Thanks for the review.
Zach
Kevin
linux-morello mailing list -- linux-morello@op-lists.linaro.org To unsubscribe send an email to linux-morello-leave@op-lists.linaro.org
Hi,
On 15/11/2022 09:08, Zachary Leaf wrote:
Hi,
This RFC proposes changes to the bpf syscall to support propagating user pointers as capabilities in the pure-capability kernel-user ABI (PCuABI). It also includes an approach to supporting the existing aarch64 ABI as a COMPAT layer (compat64).
Since the bpf syscall is multiplexed this RFC changes only the BPF_PROG_LOAD option. The idea is to agree on the general approach and get some feedback here before the remaining options are changed to fully support the syscall.
Since the changes are incomplete, functions in this series suffixed with _fixed will eventually replace the original function it is updating. The originals remain in place for now.
Patches 8 and 9 provide two potential solutions to one problem. I don't know enough about LSMs to decide there so any input there appreciated.
Reading both of them, the second solution requires a bit more work upfront from us, but I feel that it makes the most sense: no need to update the LSMs, and the compat/non-compat split is handled right where it happens. I haven't looked into it, but maybe other parts of the kernel already need something similar?
A basic LTP test that only does a bare minimal bpf(BPF_PROG_LOAD...) syscall has been created for this RFC to verify these changes. This also tests the new CHECK_ATTR macro implementation.
Kernel branch available at: https://git.morello-project.org/zdleaf/linux/-/commits/rfc/bpf
LTP test at: https://git.morello-project.org/zdleaf/morello-linux-test-project/-/commits/...
Thanks,
Zach
Overall nothing jumped out at me; it looks pretty good and makes sense to me! It did help to have read Tudor's patches previously, as some things are quite similar between the two. Cheers, Téo
On 15/11/2022 13:53, Teo Couprie Diaz wrote:
Hi,
On 15/11/2022 09:08, Zachary Leaf wrote:
Hi,
This RFC proposes changes to the bpf syscall to support propagating user pointers as capabilities in the pure-capability kernel-user ABI (PCuABI). It also includes an approach to supporting the existing aarch64 ABI as a COMPAT layer (compat64).
Since the bpf syscall is multiplexed this RFC changes only the BPF_PROG_LOAD option. The idea is to agree on the general approach and get some feedback here before the remaining options are changed to fully support the syscall.
Since the changes are incomplete, functions in this series suffixed with _fixed will eventually replace the original function it is updating. The originals remain in place for now.
Patches 8 and 9 provide two potential solutions to one problem. I don't know enough about LSMs to decide there so any input there appreciated.
Reading both of them the second solution requires a bit more work upfront from us but I feel that it makes the most sense : no need to update the LSMs and the compat/not compat split is handled right where it happens. I haven't looked into it but maybe other parts of the kernel need something similar already ?
That's my thinking as well, and a good summary of how the situation appears to me. Where the hook has been moved to, we're not adding much in between except a switch statement and the conversion from compat_bpf_attr to bpf_attr. That doesn't seem like an issue to me, but it does increase the diff size a bit, since the hook now appears in every option instead of only inside __sys_bpf.
A basic LTP test that only does a bare minimal bpf(BPF_PROG_LOAD...) syscall has been created for this RFC to verify these changes. This also tests the new CHECK_ATTR macro implementation.
Kernel branch available at: https://git.morello-project.org/zdleaf/linux/-/commits/rfc/bpf
LTP test at: https://git.morello-project.org/zdleaf/morello-linux-test-project/-/commits/...
Thanks,
Zach
Overall nothing jumped out at me; it looks pretty good and makes sense to me! It did help to have read Tudor's patches previously, as some things are quite similar between the two.
Thanks for taking a look. Yes it's pretty similar to the io_uring and epoll changes. epoll is worth a look too, as well as the changes to io_submit/aio if you didn't catch those.
Thanks, Zach
On 15-11-2022 09:08, Zachary Leaf wrote:
Hi,
This RFC proposes changes to the bpf syscall to support propagating user pointers as capabilities in the pure-capability kernel-user ABI (PCuABI). It also includes an approach to supporting the existing aarch64 ABI as a COMPAT layer (compat64).
Since the bpf syscall is multiplexed this RFC changes only the BPF_PROG_LOAD option. The idea is to agree on the general approach and get some feedback here before the remaining options are changed to fully support the syscall.
Since the changes are incomplete, functions in this series suffixed with _fixed will eventually replace the original function it is updating. The originals remain in place for now.
Patches 8 and 9 provide two potential solutions to one problem. I don't know enough about LSMs to decide there so any input there appreciated.
A basic LTP test that only does a bare minimal bpf(BPF_PROG_LOAD...) syscall has been created for this RFC to verify these changes. This also tests the new CHECK_ATTR macro implementation.
Kernel branch available at: https://git.morello-project.org/zdleaf/linux/-/commits/rfc/bpf
LTP test at: https://git.morello-project.org/zdleaf/morello-linux-test-project/-/commits/...
It's a great series, well done! I had only a few comments/questions; the patches I haven't commented on look good to me. To me, patch 9 looks better than patch 8, especially because we keep the bpf changes inside the module.
Thanks, Tudor