Hi,
After getting sidetracked by eBPF libraries/tools (libbpf/bpftool) and kselftest cross-compilation, here are the core kernel changes following on from the RFC[1] posted late last year.
The bpf syscall is updated to propagate user pointers as capabilities in the pure-capability kernel-user ABI (PCuABI). The series also includes an approach to supporting the existing aarch64 ABI as a compatibility layer (compat64).
One complication here is that this syscall supports many multiplexed sub-commands, some of which are themselves further multiplexed with a number of nested options.
A further complication is that the existing syscall uses a trick of storing user pointers as u64 to avoid needing a compat handler for 32-bit systems (see patch 3). To retain compatibility with the aarch64 ABI and add Morello support, a compat layer is added here only for the compat64 case, guarded by #ifdef CONFIG_COMPAT64. Normal compat32 operation is therefore unchanged.
Compared to the original RFC, inbound (userspace->kernel) conversion between compat64/native struct layouts is now handled upfront. This minimises changes to subcommand handlers. Some subcommands require conversion back out to userspace and that is by necessity handled where it occurs.
Patch 1 is not essential to this series but it's a nice debug feature to have and works[2]. It enables BPF_PROG_TYPE_TRACEPOINT which many eBPF kselftests use.
Patch 2 is required setup for the rest of the patches.
Patches 3-8 implement the core compat64 handling. Each commit compiles cleanly, but the relevant functionality will be broken in between. They're split mainly to make review easier.
Patch 9 extends a check to also cover configs passed in via compat64.
Patch 10 finally enables capabilities in the kernel.
Testing-wise, see the associated LTP changes below, which will be posted to the linux-morello-ltp mailing list. The eBPF LTP tests are fairly minimal and exercise only a small part of the changes here. There's a new test covering patch 9.
The kernel kselftests contain much more extensive eBPF tests. The kselftests have been used to test many parts of the compat64 handling, but overall more work needs to be done here:
a) enable cross-compilation for purecap as well as x86->aarch64
b) replace ptr_to_u64() with casts to uintptr_t in tests
c) general libbpf/bpftool enablement and fixes, since many tests rely on this
d) CONFIG_DEBUG_INFO_BTF is required for many tests, but this requires the build system to have a recent version of the pahole tool
The next step once we have the core kernel support is porting libbpf and bpftool to purecap, plus work on enabling the kselftests as above.
Kernel branch available at: https://git.morello-project.org/zdleaf/linux/-/tree/morello/bpf
Associated LTP test/changes at: https://git.morello-project.org/zdleaf/morello-linux-test-project/-/tree/mor...
Thanks,
Zach
[1] [RFC PATCH 0/9] update bpf syscall for PCuABI/compat64 https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/... [2] [PATCH v3 0/5] Restore syscall tracing on Morello https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/...
Zachary Leaf (10): arm64: morello: enable syscall tracing bpf/net: copy ptrs from user with bpf/sockptr_t bpf: compat64: add handler and convert bpf_attr in bpf: compat64: bpf_attr convert out bpf: compat64: handle bpf_btf_info bpf: compat64: handle bpf_prog_info bpf: compat64: handle bpf_map_info bpf: compat64: handle bpf_link_info bpf: compat64: support CHECK_ATTR macro bpf: use user pointer types in uAPI structs
.../morello_transitional_pcuabi_defconfig | 2 +- arch/arm64/kernel/sys_compat64.c | 4 + drivers/media/rc/bpf-lirc.c | 7 +- include/linux/bpf_compat.h | 413 ++++++ include/linux/bpfptr.h | 18 +- include/linux/sockptr.h | 9 + include/uapi/linux/bpf.h | 94 +- kernel/bpf/bpf_iter.c | 2 +- kernel/bpf/btf.c | 97 +- kernel/bpf/cgroup.c | 10 +- kernel/bpf/hashtab.c | 13 +- kernel/bpf/net_namespace.c | 7 +- kernel/bpf/offload.c | 2 +- kernel/bpf/syscall.c | 1136 +++++++++++++---- kernel/bpf/verifier.c | 2 +- kernel/trace/bpf_trace.c | 6 +- net/bpf/bpf_dummy_struct_ops.c | 3 +- net/bpf/test_run.c | 32 +- net/core/sock_map.c | 7 +- 19 files changed, 1534 insertions(+), 330 deletions(-) create mode 100644 include/linux/bpf_compat.h
-- 2.34.1
Enable syscall tracing via ftrace. This also enables eBPF programs of type BPF_PROG_TYPE_TRACEPOINT to hook syscall traces and execute eBPF code.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com --- arch/arm64/configs/morello_transitional_pcuabi_defconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/configs/morello_transitional_pcuabi_defconfig b/arch/arm64/configs/morello_transitional_pcuabi_defconfig index 40b74494980c..46bb365df97f 100644 --- a/arch/arm64/configs/morello_transitional_pcuabi_defconfig +++ b/arch/arm64/configs/morello_transitional_pcuabi_defconfig @@ -162,6 +162,6 @@ CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUG_FS=y # CONFIG_SCHED_DEBUG is not set # CONFIG_DEBUG_PREEMPT is not set -# CONFIG_FTRACE is not set CONFIG_CORESIGHT=y CONFIG_MEMTEST=y +CONFIG_FTRACE_SYSCALLS=y
On 08/06/2023 09:42, Zachary Leaf wrote:
Enable syscall tracing via ftrace. This also enables eBPF programs of type BPF_PROG_TYPE_TRACEPOINT to hook syscall traces and execute eBPF code.
As a note on this: it might be something we don't want to enable by default, but only on the CI build if/when the bpf kselftests get added there.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
arch/arm64/configs/morello_transitional_pcuabi_defconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/configs/morello_transitional_pcuabi_defconfig b/arch/arm64/configs/morello_transitional_pcuabi_defconfig index 40b74494980c..46bb365df97f 100644 --- a/arch/arm64/configs/morello_transitional_pcuabi_defconfig +++ b/arch/arm64/configs/morello_transitional_pcuabi_defconfig @@ -162,6 +162,6 @@ CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUG_FS=y # CONFIG_SCHED_DEBUG is not set # CONFIG_DEBUG_PREEMPT is not set -# CONFIG_FTRACE is not set CONFIG_CORESIGHT=y CONFIG_MEMTEST=y +CONFIG_FTRACE_SYSCALLS=y
In PCuABI, when copying a block of memory from userspace containing capabilities/pointers, the copy_from_user_with_ptr variant needs to be used to ensure pointers are preserved in full.
Introduce copy_from_x_with_ptr methods for bpfptr_t and sockptr_t to support this.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com --- include/linux/bpfptr.h | 12 ++++++++++++ include/linux/sockptr.h | 9 +++++++++ 2 files changed, 21 insertions(+)
diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index 79b2f78eec1a..407e25d608eb 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -54,11 +54,23 @@ static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src, return copy_from_kernel_nofault(dst, src.kernel + offset, size); }
+static inline int copy_from_bpfptr_offset_with_ptr(void *dst, bpfptr_t src, + size_t offset, size_t size) +{ + return copy_from_sockptr_offset_with_ptr(dst, (sockptr_t) src, offset, + size); +} + static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size) { return copy_from_bpfptr_offset(dst, src, 0, size); }
+static inline int copy_from_bpfptr_with_ptr(void *dst, bpfptr_t src, size_t size) +{ + return copy_from_bpfptr_offset_with_ptr(dst, src, 0, size); +} + static inline int copy_to_bpfptr_offset(bpfptr_t dst, size_t offset, const void *src, size_t size) { diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h index bae5e2369b4f..08b17706b21a 100644 --- a/include/linux/sockptr.h +++ b/include/linux/sockptr.h @@ -50,6 +50,15 @@ static inline int copy_from_sockptr_offset(void *dst, sockptr_t src, return 0; }
+static inline int copy_from_sockptr_offset_with_ptr(void *dst, sockptr_t src, + size_t offset, size_t size) +{ + if (!sockptr_is_kernel(src)) + return copy_from_user_with_ptr(dst, src.user + offset, size); + memcpy(dst, src.kernel + offset, size); + return 0; +} + static inline int copy_from_sockptr(void *dst, sockptr_t src, size_t size) { return copy_from_sockptr_offset(dst, src, 0, size);
On 08/06/2023 10:42, Zachary Leaf wrote:
In PCuABI, when copying a block of memory from userspace containing capabilities/pointers, the copy_from_user_with_ptr variant needs to be used to ensure pointers are preserved in full.
Introduce copy_from_x_with_ptr methods for bpfptr_t and sockptr_t to support this.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
include/linux/bpfptr.h | 12 ++++++++++++ include/linux/sockptr.h | 9 +++++++++ 2 files changed, 21 insertions(+)
diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index 79b2f78eec1a..407e25d608eb 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -54,11 +54,23 @@ static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src, return copy_from_kernel_nofault(dst, src.kernel + offset, size); } +static inline int copy_from_bpfptr_offset_with_ptr(void *dst, bpfptr_t src,
+					size_t offset, size_t size)
+{
+	return copy_from_sockptr_offset_with_ptr(dst, (sockptr_t) src, offset,
+						 size);
I think you might have missed [1] when rebasing. copy_from_sockptr_offset() is no longer used by copy_from_bpfptr_offset(). I guess that means that we don't need copy_from_sockptr_offset_with_ptr() any more either?
Kevin
[1] https://lore.kernel.org/bpf/20220727132905.45166-1-jinghao@linux.ibm.com/
+}
 static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size)
 {
 	return copy_from_bpfptr_offset(dst, src, 0, size);
 }
+static inline int copy_from_bpfptr_with_ptr(void *dst, bpfptr_t src, size_t size)
+{
+	return copy_from_bpfptr_offset_with_ptr(dst, src, 0, size);
+}
static inline int copy_to_bpfptr_offset(bpfptr_t dst, size_t offset, const void *src, size_t size) { diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h index bae5e2369b4f..08b17706b21a 100644 --- a/include/linux/sockptr.h +++ b/include/linux/sockptr.h @@ -50,6 +50,15 @@ static inline int copy_from_sockptr_offset(void *dst, sockptr_t src, return 0; } +static inline int copy_from_sockptr_offset_with_ptr(void *dst, sockptr_t src,
+					size_t offset, size_t size)
+{
+	if (!sockptr_is_kernel(src))
+		return copy_from_user_with_ptr(dst, src.user + offset, size);
+	memcpy(dst, src.kernel + offset, size);
+	return 0;
+}
static inline int copy_from_sockptr(void *dst, sockptr_t src, size_t size) { return copy_from_sockptr_offset(dst, src, 0, size);
On 12/07/2023 12:18, Kevin Brodsky wrote:
On 08/06/2023 10:42, Zachary Leaf wrote:
In PCuABI, when copying a block of memory from userspace containing capabilities/pointers, the copy_from_user_with_ptr variant needs to be used to ensure pointers are preserved in full.
Introduce copy_from_x_with_ptr methods for bpfptr_t and sockptr_t to support this.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
include/linux/bpfptr.h | 12 ++++++++++++ include/linux/sockptr.h | 9 +++++++++ 2 files changed, 21 insertions(+)
diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index 79b2f78eec1a..407e25d608eb 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -54,11 +54,23 @@ static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src, return copy_from_kernel_nofault(dst, src.kernel + offset, size); } +static inline int copy_from_bpfptr_offset_with_ptr(void *dst, bpfptr_t src,
+					size_t offset, size_t size)
+{
+	return copy_from_sockptr_offset_with_ptr(dst, (sockptr_t) src, offset,
+						 size);
I think you might have missed [1] when rebasing. copy_from_sockptr_offset() is no longer used by copy_from_bpfptr_offset(). I guess that means that we don't need copy_from_sockptr_offset_with_ptr() any more either?
Ah yep. Missed that rebasing from 5.18->6.1. Fixed.
Thanks, Zach
Kevin
[1] https://lore.kernel.org/bpf/20220727132905.45166-1-jinghao@linux.ibm.com/
+}
 static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size)
 {
 	return copy_from_bpfptr_offset(dst, src, 0, size);
 }
+static inline int copy_from_bpfptr_with_ptr(void *dst, bpfptr_t src, size_t size)
+{
+	return copy_from_bpfptr_offset_with_ptr(dst, src, 0, size);
+}
static inline int copy_to_bpfptr_offset(bpfptr_t dst, size_t offset, const void *src, size_t size) { diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h index bae5e2369b4f..08b17706b21a 100644 --- a/include/linux/sockptr.h +++ b/include/linux/sockptr.h @@ -50,6 +50,15 @@ static inline int copy_from_sockptr_offset(void *dst, sockptr_t src, return 0; } +static inline int copy_from_sockptr_offset_with_ptr(void *dst, sockptr_t src,
+					size_t offset, size_t size)
+{
+	if (!sockptr_is_kernel(src))
+		return copy_from_user_with_ptr(dst, src.user + offset, size);
+	memcpy(dst, src.kernel + offset, size);
+	return 0;
+}
static inline int copy_from_sockptr(void *dst, sockptr_t src, size_t size) { return copy_from_sockptr_offset(dst, src, 0, size);
The bpf syscall does not require a compat handler for 32-bit compat. This is achieved by using u64 instead of pointer types in the bpf_attr union used to pass arguments to the syscall. This means that in a system where pointers are 32-bit, the struct/union layouts and offsets are the same as in a 64-bit arch, since the u64 field is split into two u32 fields/registers.
This greatly simplifies 32-bit compat at the small cost of requiring pointers passed in through the uAPI to be cast to u64 (generally via ptr_to_u64() helper functions).
This poses a problem in architectures where user pointers are longer than 64 bits, such as Morello/PCuABI, where pointers are represented as 129-bit capabilities. In order to extend the bpf syscall interface to accept capabilities and still retain compatibility with the existing 64/32-bit ABI, a 64-bit compat layer and appropriate conversions must be added to handle the different union/struct sizes caused by this pointer size mismatch.
Before extending the number of bits in union bpf_attr to accept capabilities, lay the groundwork with a compat64 handler and conversion function to take a compat64-sized bpf_attr and convert it to what will be the new native offsets.
Inbound conversion is handled upfront to minimise impact on existing code and reduce overall diff size. After dispatch_bpf the majority of code can remain unchanged. The cases where conversion back out to userspace is required are handled in subsequent commits.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com --- arch/arm64/kernel/sys_compat64.c | 4 + include/linux/bpf_compat.h | 274 +++++++++++++++++++ kernel/bpf/syscall.c | 438 ++++++++++++++++++++++++++----- 3 files changed, 652 insertions(+), 64 deletions(-) create mode 100644 include/linux/bpf_compat.h
diff --git a/arch/arm64/kernel/sys_compat64.c b/arch/arm64/kernel/sys_compat64.c index 1442581ec292..0687f88baa16 100644 --- a/arch/arm64/kernel/sys_compat64.c +++ b/arch/arm64/kernel/sys_compat64.c @@ -13,6 +13,10 @@
#include <asm/syscall.h>
+#ifdef CONFIG_COMPAT64 +#define __arm64_compatentry_sys_bpf __arm64_compatentry_compat_sys_bpf +#endif + #define __arm64_compatentry_sys_personality __arm64_compatentry_sys_arm64_personality
/* diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h new file mode 100644 index 000000000000..cc12f2e3b204 --- /dev/null +++ b/include/linux/bpf_compat.h @@ -0,0 +1,274 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (C) 2023 Arm Ltd */ + +#ifdef CONFIG_COMPAT64 + +union compat_bpf_attr { + struct { /* anonymous struct used by BPF_MAP_CREATE command */ + __u32 map_type; /* one of enum bpf_map_type */ + __u32 key_size; /* size of key in bytes */ + __u32 value_size; /* size of value in bytes */ + __u32 max_entries; /* max number of entries in a map */ + __u32 map_flags; /* BPF_MAP_CREATE related + * flags defined above. + */ + __u32 inner_map_fd; /* fd pointing to the inner map */ + __u32 numa_node; /* numa node (effective only if + * BPF_F_NUMA_NODE is set). + */ + char map_name[BPF_OBJ_NAME_LEN]; + __u32 map_ifindex; /* ifindex of netdev to create on */ + __u32 btf_fd; /* fd pointing to a BTF type data */ + __u32 btf_key_type_id; /* BTF type_id of the key */ + __u32 btf_value_type_id; /* BTF type_id of the value */ + __u32 btf_vmlinux_value_type_id;/* BTF type_id of a kernel- + * struct stored as the + * map value + */ + /* Any per-map-type extra fields + * + * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the + * number of hash functions (if 0, the bloom filter will default + * to using 5 hash functions). 
+ */ + __u64 map_extra; + }; + + struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ + __u32 map_fd; + __aligned_u64 key; + union { + __aligned_u64 value; + __aligned_u64 next_key; + }; + __u64 flags; + }; + + struct { /* struct used by BPF_MAP_*_BATCH commands */ + __aligned_u64 in_batch; /* start batch, + * NULL to start from beginning + */ + __aligned_u64 out_batch; /* output: next start batch */ + __aligned_u64 keys; + __aligned_u64 values; + __u32 count; /* input/output: + * input: # of key/value + * elements + * output: # of filled elements + */ + __u32 map_fd; + __u64 elem_flags; + __u64 flags; + } batch; + + struct { /* anonymous struct used by BPF_PROG_LOAD command */ + __u32 prog_type; /* one of enum bpf_prog_type */ + __u32 insn_cnt; + __aligned_u64 insns; + __aligned_u64 license; + __u32 log_level; /* verbosity level of verifier */ + __u32 log_size; /* size of user buffer */ + __aligned_u64 log_buf; /* user supplied buffer */ + __u32 kern_version; /* not used */ + __u32 prog_flags; + char prog_name[BPF_OBJ_NAME_LEN]; + __u32 prog_ifindex; /* ifindex of netdev to prep for */ + /* For some prog types expected attach type must be known at + * load time to verify attach type specific parts of prog + * (context accesses, allowed helpers, etc). 
+ */ + __u32 expected_attach_type; + __u32 prog_btf_fd; /* fd pointing to BTF type data */ + __u32 func_info_rec_size; /* userspace bpf_func_info size */ + __aligned_u64 func_info; /* func info */ + __u32 func_info_cnt; /* number of bpf_func_info records */ + __u32 line_info_rec_size; /* userspace bpf_line_info size */ + __aligned_u64 line_info; /* line info */ + __u32 line_info_cnt; /* number of bpf_line_info records */ + __u32 attach_btf_id; /* in-kernel BTF type id to attach to */ + union { + /* valid prog_fd to attach to bpf prog */ + __u32 attach_prog_fd; + /* or valid module BTF object fd or 0 to attach to vmlinux */ + __u32 attach_btf_obj_fd; + }; + __u32 core_relo_cnt; /* number of bpf_core_relo */ + __aligned_u64 fd_array; /* array of FDs */ + __aligned_u64 core_relos; + __u32 core_relo_rec_size; /* sizeof(struct bpf_core_relo) */ + }; + + struct { /* anonymous struct used by BPF_OBJ_* commands */ + __aligned_u64 pathname; + __u32 bpf_fd; + __u32 file_flags; + }; + + struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */ + __u32 target_fd; /* container object to attach to */ + __u32 attach_bpf_fd; /* eBPF program to attach */ + __u32 attach_type; + __u32 attach_flags; + __u32 replace_bpf_fd; /* previously attached eBPF + * program to replace if + * BPF_F_REPLACE is used + */ + }; + + struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */ + __u32 prog_fd; + __u32 retval; + __u32 data_size_in; /* input: len of data_in */ + __u32 data_size_out; /* input/output: len of data_out + * returns ENOSPC if data_out + * is too small. + */ + __aligned_u64 data_in; + __aligned_u64 data_out; + __u32 repeat; + __u32 duration; + __u32 ctx_size_in; /* input: len of ctx_in */ + __u32 ctx_size_out; /* input/output: len of ctx_out + * returns ENOSPC if ctx_out + * is too small. 
+ */ + __aligned_u64 ctx_in; + __aligned_u64 ctx_out; + __u32 flags; + __u32 cpu; + __u32 batch_size; + } test; + + struct { /* anonymous struct used by BPF_*_GET_*_ID */ + union { + __u32 start_id; + __u32 prog_id; + __u32 map_id; + __u32 btf_id; + __u32 link_id; + }; + __u32 next_id; + __u32 open_flags; + }; + + struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */ + __u32 bpf_fd; + __u32 info_len; + __aligned_u64 info; + } info; + + struct { /* anonymous struct used by BPF_PROG_QUERY command */ + __u32 target_fd; /* container object to query */ + __u32 attach_type; + __u32 query_flags; + __u32 attach_flags; + __aligned_u64 prog_ids; + __u32 prog_cnt; + /* output: per-program attach_flags. + * not allowed to be set during effective query. + */ + __aligned_u64 prog_attach_flags; + } query; + + struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */ + __u64 name; + __u32 prog_fd; + } raw_tracepoint; + + struct { /* anonymous struct for BPF_BTF_LOAD */ + __aligned_u64 btf; + __aligned_u64 btf_log_buf; + __u32 btf_size; + __u32 btf_log_size; + __u32 btf_log_level; + }; + + struct { + __u32 pid; /* input: pid */ + __u32 fd; /* input: fd */ + __u32 flags; /* input: flags */ + __u32 buf_len; /* input/output: buf len */ + __aligned_u64 buf; /* input/output: + * tp_name for tracepoint + * symbol for kprobe + * filename for uprobe + */ + __u32 prog_id; /* output: prod_id */ + __u32 fd_type; /* output: BPF_FD_TYPE_* */ + __u64 probe_offset; /* output: probe_offset */ + __u64 probe_addr; /* output: probe_addr */ + } task_fd_query; + + struct { /* struct used by BPF_LINK_CREATE command */ + __u32 prog_fd; /* eBPF program to attach */ + union { + __u32 target_fd; /* object to attach to */ + __u32 target_ifindex; /* target ifindex */ + }; + __u32 attach_type; /* attach type */ + __u32 flags; /* extra flags */ + union { + __u32 target_btf_id; /* btf_id of target to attach to */ + struct { + __aligned_u64 iter_info; /* extra bpf_iter_link_info */ + __u32 
iter_info_len; /* iter_info length */ + }; + struct { + /* black box user-provided value passed through + * to BPF program at the execution time and + * accessible through bpf_get_attach_cookie() BPF helper + */ + __u64 bpf_cookie; + } perf_event; + struct { + __u32 flags; + __u32 cnt; + __aligned_u64 syms; + __aligned_u64 addrs; + __aligned_u64 cookies; + } kprobe_multi; + struct { + /* this is overlaid with the target_btf_id above. */ + __u32 target_btf_id; + /* black box user-provided value passed through + * to BPF program at the execution time and + * accessible through bpf_get_attach_cookie() BPF helper + */ + __u64 cookie; + } tracing; + }; + } link_create; + + struct { /* struct used by BPF_LINK_UPDATE command */ + __u32 link_fd; /* link fd */ + /* new program fd to update link with */ + __u32 new_prog_fd; + __u32 flags; /* extra flags */ + /* expected link's program fd; is specified only if + * BPF_F_REPLACE flag is set in flags */ + __u32 old_prog_fd; + } link_update; + + struct { + __u32 link_fd; + } link_detach; + + struct { /* struct used by BPF_ENABLE_STATS command */ + __u32 type; + } enable_stats; + + struct { /* struct used by BPF_ITER_CREATE command */ + __u32 link_fd; + __u32 flags; + } iter_create; + + struct { /* struct used by BPF_PROG_BIND_MAP command */ + __u32 prog_fd; + __u32 map_fd; + __u32 flags; /* extra flags */ + } prog_bind_map; + +} __attribute__((aligned(8))); + +#endif /* CONFIG_COMPAT64 */ + diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 7b373a5e861f..818ca8b63295 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3,6 +3,7 @@ */ #include <linux/bpf.h> #include <linux/bpf-cgroup.h> +#include <linux/bpf_compat.h> #include <linux/bpf_trace.h> #include <linux/bpf_lirc.h> #include <linux/bpf_verifier.h> @@ -4908,153 +4909,127 @@ static int bpf_prog_bind_map(union bpf_attr *attr) return ret; }
-static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) +static int dispatch_bpf(int cmd, union bpf_attr *attr, bpfptr_t uattr, int size) { - union bpf_attr attr; - bool capable; int err;
- capable = bpf_capable() || !sysctl_unprivileged_bpf_disabled; - - /* Intent here is for unprivileged_bpf_disabled to block key object - * creation commands for unprivileged users; other actions depend - * of fd availability and access to bpffs, so are dependent on - * object creation success. Capabilities are later verified for - * operations such as load and map create, so even with unprivileged - * BPF disabled, capability checks are still carried out for these - * and other operations. - */ - if (!capable && - (cmd == BPF_MAP_CREATE || cmd == BPF_PROG_LOAD)) - return -EPERM; - - err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size); - if (err) - return err; - size = min_t(u32, size, sizeof(attr)); - - /* copy attributes from user space, may be less than sizeof(bpf_attr) */ - memset(&attr, 0, sizeof(attr)); - if (copy_from_bpfptr(&attr, uattr, size) != 0) - return -EFAULT; - - err = security_bpf(cmd, &attr, size); + err = security_bpf(cmd, attr, size); if (err < 0) return err;
switch (cmd) { case BPF_MAP_CREATE: - err = map_create(&attr); + err = map_create(attr); break; case BPF_MAP_LOOKUP_ELEM: - err = map_lookup_elem(&attr); + err = map_lookup_elem(attr); break; case BPF_MAP_UPDATE_ELEM: - err = map_update_elem(&attr, uattr); + err = map_update_elem(attr, uattr); break; case BPF_MAP_DELETE_ELEM: - err = map_delete_elem(&attr, uattr); + err = map_delete_elem(attr, uattr); break; case BPF_MAP_GET_NEXT_KEY: - err = map_get_next_key(&attr); + err = map_get_next_key(attr); break; case BPF_MAP_FREEZE: - err = map_freeze(&attr); + err = map_freeze(attr); break; case BPF_PROG_LOAD: - err = bpf_prog_load(&attr, uattr); + err = bpf_prog_load(attr, uattr); break; case BPF_OBJ_PIN: - err = bpf_obj_pin(&attr); + err = bpf_obj_pin(attr); break; case BPF_OBJ_GET: - err = bpf_obj_get(&attr); + err = bpf_obj_get(attr); break; case BPF_PROG_ATTACH: - err = bpf_prog_attach(&attr); + err = bpf_prog_attach(attr); break; case BPF_PROG_DETACH: - err = bpf_prog_detach(&attr); + err = bpf_prog_detach(attr); break; case BPF_PROG_QUERY: - err = bpf_prog_query(&attr, uattr.user); + err = bpf_prog_query(attr, uattr.user); break; case BPF_PROG_TEST_RUN: - err = bpf_prog_test_run(&attr, uattr.user); + err = bpf_prog_test_run(attr, uattr.user); break; case BPF_PROG_GET_NEXT_ID: - err = bpf_obj_get_next_id(&attr, uattr.user, + err = bpf_obj_get_next_id(attr, uattr.user, &prog_idr, &prog_idr_lock); break; case BPF_MAP_GET_NEXT_ID: - err = bpf_obj_get_next_id(&attr, uattr.user, + err = bpf_obj_get_next_id(attr, uattr.user, &map_idr, &map_idr_lock); break; case BPF_BTF_GET_NEXT_ID: - err = bpf_obj_get_next_id(&attr, uattr.user, + err = bpf_obj_get_next_id(attr, uattr.user, &btf_idr, &btf_idr_lock); break; case BPF_PROG_GET_FD_BY_ID: - err = bpf_prog_get_fd_by_id(&attr); + err = bpf_prog_get_fd_by_id(attr); break; case BPF_MAP_GET_FD_BY_ID: - err = bpf_map_get_fd_by_id(&attr); + err = bpf_map_get_fd_by_id(attr); break; case BPF_OBJ_GET_INFO_BY_FD: - err = 
bpf_obj_get_info_by_fd(&attr, uattr.user); + err = bpf_obj_get_info_by_fd(attr, uattr.user); break; case BPF_RAW_TRACEPOINT_OPEN: - err = bpf_raw_tracepoint_open(&attr); + err = bpf_raw_tracepoint_open(attr); break; case BPF_BTF_LOAD: - err = bpf_btf_load(&attr, uattr); + err = bpf_btf_load(attr, uattr); break; case BPF_BTF_GET_FD_BY_ID: - err = bpf_btf_get_fd_by_id(&attr); + err = bpf_btf_get_fd_by_id(attr); break; case BPF_TASK_FD_QUERY: - err = bpf_task_fd_query(&attr, uattr.user); + err = bpf_task_fd_query(attr, uattr.user); break; case BPF_MAP_LOOKUP_AND_DELETE_ELEM: - err = map_lookup_and_delete_elem(&attr); + err = map_lookup_and_delete_elem(attr); break; case BPF_MAP_LOOKUP_BATCH: - err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_LOOKUP_BATCH); + err = bpf_map_do_batch(attr, uattr.user, BPF_MAP_LOOKUP_BATCH); break; case BPF_MAP_LOOKUP_AND_DELETE_BATCH: - err = bpf_map_do_batch(&attr, uattr.user, + err = bpf_map_do_batch(attr, uattr.user, BPF_MAP_LOOKUP_AND_DELETE_BATCH); break; case BPF_MAP_UPDATE_BATCH: - err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_UPDATE_BATCH); + err = bpf_map_do_batch(attr, uattr.user, BPF_MAP_UPDATE_BATCH); break; case BPF_MAP_DELETE_BATCH: - err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_DELETE_BATCH); + err = bpf_map_do_batch(attr, uattr.user, BPF_MAP_DELETE_BATCH); break; case BPF_LINK_CREATE: - err = link_create(&attr, uattr); + err = link_create(attr, uattr); break; case BPF_LINK_UPDATE: - err = link_update(&attr); + err = link_update(attr); break; case BPF_LINK_GET_FD_BY_ID: - err = bpf_link_get_fd_by_id(&attr); + err = bpf_link_get_fd_by_id(attr); break; case BPF_LINK_GET_NEXT_ID: - err = bpf_obj_get_next_id(&attr, uattr.user, + err = bpf_obj_get_next_id(attr, uattr.user, &link_idr, &link_idr_lock); break; case BPF_ENABLE_STATS: - err = bpf_enable_stats(&attr); + err = bpf_enable_stats(attr); break; case BPF_ITER_CREATE: - err = bpf_iter_create(&attr); + err = bpf_iter_create(attr); break; case BPF_LINK_DETACH: - err 
= link_detach(&attr); + err = link_detach(attr); break; case BPF_PROG_BIND_MAP: - err = bpf_prog_bind_map(&attr); + err = bpf_prog_bind_map(attr); break; default: err = -EINVAL; @@ -5064,11 +5039,346 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) return err; }
+#ifdef CONFIG_COMPAT64
+static void convert_compat_bpf_attr(union bpf_attr *dest, const union compat_bpf_attr *cattr, int cmd)
+{
+	struct bpf_prog *prog;
+
+	switch (cmd) {
+	case BPF_MAP_CREATE:
+		dest->map_type = cattr->map_type;
+		dest->key_size = cattr->key_size;
+		dest->value_size = cattr->value_size;
+		dest->max_entries = cattr->max_entries;
+		dest->map_flags = cattr->map_flags;
+		dest->inner_map_fd = cattr->inner_map_fd;
+		dest->numa_node = cattr->numa_node;
+		strncpy(dest->map_name, cattr->map_name, BPF_OBJ_NAME_LEN);
+		dest->map_ifindex = cattr->map_ifindex;
+		dest->btf_fd = cattr->btf_fd;
+		dest->btf_key_type_id = cattr->btf_key_type_id;
+		dest->btf_value_type_id = cattr->btf_value_type_id;
+		dest->btf_vmlinux_value_type_id = cattr->btf_vmlinux_value_type_id;
+		dest->map_extra = cattr->map_extra;
+		break;
+	case BPF_MAP_LOOKUP_ELEM:
+	case BPF_MAP_UPDATE_ELEM:
+	case BPF_MAP_DELETE_ELEM:
+	case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
+		dest->map_fd = cattr->map_fd;
+		dest->key = cattr->key;
+		dest->value = cattr->value;
+		/* u64 next_key is in a union with u64 value */
+		dest->flags = cattr->flags;
+		break;
+	case BPF_MAP_LOOKUP_BATCH:
+	case BPF_MAP_LOOKUP_AND_DELETE_BATCH:
+	case BPF_MAP_UPDATE_BATCH:
+	case BPF_MAP_DELETE_BATCH:
+		dest->batch.in_batch = cattr->batch.in_batch;
+		dest->batch.out_batch = cattr->batch.out_batch;
+		dest->batch.keys = cattr->batch.keys;
+		dest->batch.values = cattr->batch.values;
+		dest->batch.count = cattr->batch.count;
+		dest->batch.map_fd = cattr->batch.map_fd;
+		dest->batch.elem_flags = cattr->batch.elem_flags;
+		dest->batch.flags = cattr->batch.flags;
+		break;
+	case BPF_PROG_LOAD:
+		dest->prog_type = cattr->prog_type;
+		dest->insn_cnt = cattr->insn_cnt;
+		dest->insns = cattr->insns;
+		dest->license = cattr->license;
+		dest->log_level = cattr->log_level;
+		dest->log_size = cattr->log_size;
+		dest->log_buf = cattr->log_buf;
+		dest->kern_version = cattr->kern_version;
+		dest->prog_flags = cattr->prog_flags;
+		strncpy(dest->prog_name, cattr->prog_name, BPF_OBJ_NAME_LEN);
+		dest->prog_ifindex = cattr->prog_ifindex;
+		dest->expected_attach_type = cattr->expected_attach_type;
+		dest->prog_btf_fd = cattr->prog_btf_fd;
+		dest->func_info_rec_size = cattr->func_info_rec_size;
+		dest->func_info = cattr->func_info;
+		dest->func_info_cnt = cattr->func_info_cnt;
+		dest->line_info_rec_size = cattr->line_info_rec_size;
+		dest->line_info = cattr->line_info;
+		dest->line_info_cnt = cattr->line_info_cnt;
+		dest->attach_btf_id = cattr->attach_btf_id;
+		dest->attach_prog_fd = cattr->attach_prog_fd;
+		/* u32 attach_btf_obj_fd is in a union with u32 attach_prog_fd */
+		dest->core_relo_cnt = cattr->core_relo_cnt;
+		dest->fd_array = cattr->fd_array;
+		dest->core_relos = cattr->core_relos;
+		dest->core_relo_rec_size = cattr->core_relo_rec_size;
+		break;
+	case BPF_OBJ_PIN:
+	case BPF_OBJ_GET:
+		dest->pathname = cattr->pathname;
+		dest->bpf_fd = cattr->bpf_fd;
+		dest->file_flags = cattr->file_flags;
+		break;
+	case BPF_PROG_ATTACH:
+	case BPF_PROG_DETACH:
+		dest->target_fd = cattr->target_fd;
+		dest->attach_bpf_fd = cattr->attach_bpf_fd;
+		dest->attach_type = cattr->attach_type;
+		dest->attach_flags = cattr->attach_flags;
+		dest->replace_bpf_fd = cattr->replace_bpf_fd;
+		break;
+	case BPF_PROG_RUN: /* same as BPF_PROG_TEST_RUN */
+		dest->test.prog_fd = cattr->test.prog_fd;
+		dest->test.retval = cattr->test.retval;
+		dest->test.data_size_in = cattr->test.data_size_in;
+		dest->test.data_size_out = cattr->test.data_size_out;
+		dest->test.data_in = cattr->test.data_in;
+		dest->test.data_out = cattr->test.data_out;
+		dest->test.repeat = cattr->test.repeat;
+		dest->test.duration = cattr->test.duration;
+		dest->test.ctx_size_in = cattr->test.ctx_size_in;
+		dest->test.ctx_size_out = cattr->test.ctx_size_out;
+		dest->test.ctx_in = cattr->test.ctx_in;
+		dest->test.ctx_out = cattr->test.ctx_out;
+		dest->test.flags = cattr->test.flags;
+		dest->test.cpu = cattr->test.cpu;
+		dest->test.batch_size = cattr->test.batch_size;
+		break;
+	case BPF_PROG_GET_NEXT_ID:
+	case BPF_MAP_GET_NEXT_ID:
+	case BPF_PROG_GET_FD_BY_ID:
+	case BPF_MAP_GET_FD_BY_ID:
+	case BPF_BTF_GET_FD_BY_ID:
+	case BPF_BTF_GET_NEXT_ID:
+	case BPF_LINK_GET_FD_BY_ID:
+	case BPF_LINK_GET_NEXT_ID:
+		/* u32 prog_id, map_id, btf_id + link_id are in a union with
+		 * u32 start_id */
+		dest->start_id = cattr->start_id;
+		dest->next_id = cattr->next_id;
+		dest->open_flags = cattr->open_flags;
+		break;
+	case BPF_OBJ_GET_INFO_BY_FD:
+		dest->info.bpf_fd = cattr->info.bpf_fd;
+		dest->info.info_len = cattr->info.info_len;
+		dest->info.info = cattr->info.info;
+		break;
+	case BPF_PROG_QUERY:
+		dest->query.target_fd = cattr->query.target_fd;
+		dest->query.attach_type = cattr->query.attach_type;
+		dest->query.query_flags = cattr->query.query_flags;
+		dest->query.attach_flags = cattr->query.attach_flags;
+		dest->query.prog_ids = cattr->query.prog_ids;
+		dest->query.prog_cnt = cattr->query.prog_cnt;
+		dest->query.prog_attach_flags = cattr->query.prog_attach_flags;
+		break;
+	case BPF_RAW_TRACEPOINT_OPEN:
+		dest->raw_tracepoint.name = cattr->raw_tracepoint.name;
+		dest->raw_tracepoint.prog_fd = cattr->raw_tracepoint.prog_fd;
+		break;
+	case BPF_BTF_LOAD:
+		dest->btf = cattr->btf;
+		dest->btf_log_buf = cattr->btf_log_buf;
+		dest->btf_size = cattr->btf_size;
+		dest->btf_log_size = cattr->btf_log_size;
+		dest->btf_log_level = cattr->btf_log_level;
+		break;
+	case BPF_TASK_FD_QUERY:
+		dest->task_fd_query.pid = cattr->task_fd_query.pid;
+		dest->task_fd_query.fd = cattr->task_fd_query.fd;
+		dest->task_fd_query.flags = cattr->task_fd_query.flags;
+		dest->task_fd_query.buf_len = cattr->task_fd_query.buf_len;
+		dest->task_fd_query.buf = cattr->task_fd_query.buf;
+		dest->task_fd_query.prog_id = cattr->task_fd_query.prog_id;
+		dest->task_fd_query.fd_type = cattr->task_fd_query.fd_type;
+		dest->task_fd_query.probe_offset = cattr->task_fd_query.probe_offset;
+		dest->task_fd_query.probe_addr = cattr->task_fd_query.probe_addr;
+		break;
+	case BPF_LINK_CREATE:
+		dest->link_create.prog_fd = cattr->link_create.prog_fd;
+		dest->link_create.target_fd = cattr->link_create.target_fd;
+		/* u32 target_ifindex is in a union with u32 target_fd */
+		dest->link_create.attach_type = cattr->link_create.attach_type;
+		dest->link_create.flags = cattr->link_create.flags;
+
+		prog = bpf_prog_get(cattr->link_create.prog_fd);
+
+		if (prog->type == BPF_PROG_TYPE_CGROUP_SKB ||
+		    prog->type == BPF_PROG_TYPE_CGROUP_SOCK ||
+		    prog->type == BPF_PROG_TYPE_CGROUP_SOCK_ADDR ||
+		    prog->type == BPF_PROG_TYPE_SOCK_OPS ||
+		    prog->type == BPF_PROG_TYPE_CGROUP_DEVICE ||
+		    prog->type == BPF_PROG_TYPE_CGROUP_SYSCTL ||
+		    prog->type == BPF_PROG_TYPE_CGROUP_SOCKOPT)
+			break;
+
+		if (prog->type == BPF_PROG_TYPE_EXT) {
+			dest->link_create.tracing.target_btf_id =
+				cattr->link_create.tracing.target_btf_id;
+			dest->link_create.tracing.cookie =
+				cattr->link_create.tracing.cookie;
+			break;
+		}
+
+		if (prog->type == BPF_PROG_TYPE_LSM ||
+		    prog->type == BPF_PROG_TYPE_TRACING) {
+			if (prog->expected_attach_type == BPF_TRACE_ITER) {
+				/* iter_info is a user pointer to union
+				 * bpf_iter_link_info however since this union
+				 * contains no pointers, the size/offsets are
+				 * the same for compat64/purecap; hence no
+				 * conversion needed */
+				dest->link_create.iter_info =
+					cattr->link_create.iter_info;
+				dest->link_create.iter_info_len =
+					cattr->link_create.iter_info_len;
+				break;
+			} else if (prog->expected_attach_type == BPF_TRACE_RAW_TP
+				   || prog->expected_attach_type == BPF_LSM_CGROUP) {
+				/* only uses common fields above */
+				break;
+			} else {
+				dest->link_create.target_btf_id =
+					cattr->link_create.target_btf_id;
+				dest->link_create.tracing.cookie =
+					cattr->link_create.tracing.cookie;
+				break;
+			}
+		}
+
+		if (prog->type == BPF_PROG_TYPE_FLOW_DISSECTOR ||
+		    prog->type == BPF_PROG_TYPE_SK_LOOKUP ||
+		    prog->type == BPF_PROG_TYPE_XDP)
+			break;
+
+		/* bpf_cookie is used in bpf_perf_link_attach() */
+		if (prog->type == BPF_PROG_TYPE_PERF_EVENT ||
+		    prog->type == BPF_PROG_TYPE_TRACEPOINT ||
+		    (prog->type == BPF_PROG_TYPE_KPROBE &&
+		     cattr->link_create.attach_type == BPF_PERF_EVENT)) {
+			dest->link_create.perf_event.bpf_cookie =
+				cattr->link_create.perf_event.bpf_cookie;
+			break;
+		}
+
+		/* kprobe_multi is used in bpf_kprobe_multi_link_attach() */
+		if (prog->type == BPF_PROG_TYPE_KPROBE &&
+		    cattr->link_create.attach_type != BPF_PERF_EVENT) {
+			dest->link_create.kprobe_multi.flags =
+				cattr->link_create.kprobe_multi.flags;
+			dest->link_create.kprobe_multi.cnt =
+				cattr->link_create.kprobe_multi.cnt;
+			dest->link_create.kprobe_multi.syms =
+				cattr->link_create.kprobe_multi.syms;
+			dest->link_create.kprobe_multi.addrs =
+				cattr->link_create.kprobe_multi.addrs;
+			dest->link_create.kprobe_multi.cookies =
+				cattr->link_create.kprobe_multi.cookies;
+			break;
+		}
+		break;
+	case BPF_LINK_UPDATE:
+		dest->link_update.link_fd = cattr->link_update.link_fd;
+		dest->link_update.new_prog_fd = cattr->link_update.new_prog_fd;
+		dest->link_update.flags = cattr->link_update.flags;
+		dest->link_update.old_prog_fd = cattr->link_update.old_prog_fd;
+		break;
+	case BPF_LINK_DETACH:
+		dest->link_detach.link_fd = cattr->link_detach.link_fd;
+		break;
+	case BPF_ENABLE_STATS:
+		dest->enable_stats.type = cattr->enable_stats.type;
+		break;
+	case BPF_ITER_CREATE:
+		dest->iter_create.link_fd = cattr->iter_create.link_fd;
+		dest->iter_create.flags = cattr->iter_create.flags;
+		break;
+	case BPF_PROG_BIND_MAP:
+		dest->prog_bind_map.prog_fd = cattr->prog_bind_map.prog_fd;
+		dest->prog_bind_map.map_fd = cattr->prog_bind_map.map_fd;
+		dest->prog_bind_map.flags = cattr->prog_bind_map.flags;
+		break;
+	};
+}
+#endif /* CONFIG_COMPAT64 */
+
+static int bpf_check_perms(int cmd)
+{
+	bool capable = bpf_capable() || !sysctl_unprivileged_bpf_disabled;
+
+	/* Intent here is for unprivileged_bpf_disabled to block key object
+	 * creation commands for unprivileged users; other actions depend
+	 * of fd availability and access to bpffs, so are dependent on
+	 * object creation success. Capabilities are later verified for
+	 * operations such as load and map create, so even with unprivileged
+	 * BPF disabled, capability checks are still carried out for these
+	 * and other operations.
+	 */
+	if (!capable &&
+	    (cmd == BPF_MAP_CREATE || cmd == BPF_PROG_LOAD))
+		return -EPERM;
+
+	return 0;
+}
+
+static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
+{
+	union bpf_attr attr;
+	int err;
+
+	err = bpf_check_perms(cmd);
+	if (err)
+		return err;
+
+	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
+	if (err)
+		return err;
+	size = min_t(u32, size, sizeof(attr));
+
+	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
+	memset(&attr, 0, sizeof(attr));
+	if (copy_from_bpfptr_with_ptr(&attr, uattr, size) != 0)
+		return -EFAULT;
+
+	return dispatch_bpf(cmd, &attr, uattr, size);
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	return __sys_bpf(cmd, USER_BPFPTR(uattr), size);
 }

+#ifdef CONFIG_COMPAT64
+static int __sys_compat_bpf(int cmd, bpfptr_t uattr, unsigned int size)
+{
+	union bpf_attr attr;
+	union compat_bpf_attr cattr;
+	int err;
+
+	err = bpf_check_perms(cmd);
+	if (err)
+		return err;
+
+	err = bpf_check_uarg_tail_zero(uattr, sizeof(cattr), size);
+	if (err)
+		return err;
+	size = min_t(u32, size, sizeof(cattr));
+
+	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
+	memset(&cattr, 0, sizeof(cattr));
+	if (copy_from_bpfptr_with_ptr(&cattr, uattr, size) != 0)
+		return -EFAULT;
+
+	convert_compat_bpf_attr(&attr, &cattr, cmd);
+
+	return dispatch_bpf(cmd, &attr, uattr, sizeof(attr));
+}
+
+COMPAT_SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
+{
+	return __sys_compat_bpf(cmd, USER_BPFPTR(uattr), size);
+}
+#endif /* CONFIG_COMPAT64 */
+
 static bool syscall_prog_is_valid_access(int off, int size,
					 enum bpf_access_type type,
					 const struct bpf_prog *prog,
On 08/06/2023 10:42, Zachary Leaf wrote:
The bpf syscall does not require a compat handler for 32-bit compat. This is achieved by using u64 instead of pointer types in the bpf_attr union used to pass arguments to the syscall. This means that in a system where pointers are 32-bit, the struct/union layouts and offsets are the same as in a 64-bit arch, since the u64 field is split into two u32 fields/registers.
This greatly simplifies 32-bit compat at the small cost of requiring casting pointers passed in through the uAPI to u64 (generally via ptr_to_u64() helper functions).
This poses a problem in architectures where user pointers are longer than 64b such as Morello/PCuABI where pointers are represented as 129b capabilities. In order to extend the bpf syscall interface to accept capabilities and still retain compatibility with the existing 64/32b ABI, a 64-bit compat layer and appropriate conversions must be added to handle the different union/struct sizes caused by this pointer size mismatch.
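[Editor's note: the u64 trick described above can be sketched in plain C. This is an illustrative userspace model only; `ptr_to_u64()` mirrors the helpers mentioned in the commit message, and `attr_like` is a made-up stand-in for union bpf_attr, not the kernel's definition.]

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the ptr_to_u64() helpers used by BPF userspace: a user
 * pointer is widened to u64 before being placed in the attr union. */
static inline uint64_t ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/* Pointer-carrying fields are declared u64, so the layout is identical
 * on 32-bit and 64-bit architectures and no compat32 handler is needed.
 * A 129b Morello capability cannot fit in these 8 bytes, which is what
 * forces the compat64 layer described here. */
struct attr_like {
	uint64_t key;   /* really a user pointer */
	uint64_t value; /* really a user pointer */
	uint32_t map_fd;
};
```

Because every field has a fixed size, `sizeof(struct attr_like)` and all member offsets are the same regardless of the native pointer width.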
Before extending the number of bits in union bpf_attr to accept capabilitities, set the groundwork with a compat64 handler and
"capabilitities" sounds rather wrong, better with one less "ti" :D
conversion function to take a compat64 sized bpf_attr and convert it to what will be the new native offsets.
Inbound conversion is handled upfront to minimise impact on existing code and reduce overall diff size. After dispatch_bpf the majority of code can remain unchanged. The cases where conversion back out to userspace is required are handled in subsequent commits.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
[...]
+#ifdef CONFIG_COMPAT64
+static void convert_compat_bpf_attr(union bpf_attr *dest, const union compat_bpf_attr *cattr, int cmd)
+{
- struct bpf_prog *prog;
- switch (cmd) {
- case BPF_MAP_CREATE:
dest->map_type = cattr->map_type;
dest->key_size = cattr->key_size;
dest->value_size = cattr->value_size;
dest->max_entries = cattr->max_entries;
dest->map_flags = cattr->map_flags;
dest->inner_map_fd = cattr->inner_map_fd;
dest->numa_node = cattr->numa_node;
strncpy(dest->map_name, cattr->map_name, BPF_OBJ_NAME_LEN);
dest->map_ifindex = cattr->map_ifindex;
dest->btf_fd = cattr->btf_fd;
dest->btf_key_type_id = cattr->btf_key_type_id;
dest->btf_value_type_id = cattr->btf_value_type_id;
dest->btf_vmlinux_value_type_id = cattr->btf_vmlinux_value_type_id;
dest->map_extra = cattr->map_extra;
break;
- case BPF_MAP_LOOKUP_ELEM:
- case BPF_MAP_UPDATE_ELEM:
- case BPF_MAP_DELETE_ELEM:
- case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
dest->map_fd = cattr->map_fd;
dest->key = cattr->key;
dest->value = cattr->value;
/* u64 next_key is in a union with u64 value */
dest->flags = cattr->flags;
break;
- case BPF_MAP_LOOKUP_BATCH:
- case BPF_MAP_LOOKUP_AND_DELETE_BATCH:
- case BPF_MAP_UPDATE_BATCH:
- case BPF_MAP_DELETE_BATCH:
dest->batch.in_batch = cattr->batch.in_batch;
dest->batch.out_batch = cattr->batch.out_batch;
dest->batch.keys = cattr->batch.keys;
dest->batch.values = cattr->batch.values;
dest->batch.count = cattr->batch.count;
dest->batch.map_fd = cattr->batch.map_fd;
dest->batch.elem_flags = cattr->batch.elem_flags;
dest->batch.flags = cattr->batch.flags;
break;
- case BPF_PROG_LOAD:
dest->prog_type = cattr->prog_type;
dest->insn_cnt = cattr->insn_cnt;
dest->insns = cattr->insns;
dest->license = cattr->license;
dest->log_level = cattr->log_level;
dest->log_size = cattr->log_size;
dest->log_buf = cattr->log_buf;
dest->kern_version = cattr->kern_version;
dest->prog_flags = cattr->prog_flags;
strncpy(dest->prog_name, cattr->prog_name, BPF_OBJ_NAME_LEN);
dest->prog_ifindex = cattr->prog_ifindex;
dest->expected_attach_type = cattr->expected_attach_type;
dest->prog_btf_fd = cattr->prog_btf_fd;
dest->func_info_rec_size = cattr->func_info_rec_size;
dest->func_info = cattr->func_info;
dest->func_info_cnt = cattr->func_info_cnt;
dest->line_info_rec_size = cattr->line_info_rec_size;
dest->line_info = cattr->line_info;
dest->line_info_cnt = cattr->line_info_cnt;
dest->attach_btf_id = cattr->attach_btf_id;
dest->attach_prog_fd = cattr->attach_prog_fd;
/* u32 attach_btf_obj_fd is in a union with u32 attach_prog_fd */
dest->core_relo_cnt = cattr->core_relo_cnt;
dest->fd_array = cattr->fd_array;
dest->core_relos = cattr->core_relos;
dest->core_relo_rec_size = cattr->core_relo_rec_size;
break;
- case BPF_OBJ_PIN:
- case BPF_OBJ_GET:
dest->pathname = cattr->pathname;
dest->bpf_fd = cattr->bpf_fd;
dest->file_flags = cattr->file_flags;
break;
- case BPF_PROG_ATTACH:
- case BPF_PROG_DETACH:
dest->target_fd = cattr->target_fd;
dest->attach_bpf_fd = cattr->attach_bpf_fd;
dest->attach_type = cattr->attach_type;
dest->attach_flags = cattr->attach_flags;
dest->replace_bpf_fd = cattr->replace_bpf_fd;
break;
- case BPF_PROG_RUN: /* same as BPF_PROG_TEST_RUN */
dest->test.prog_fd = cattr->test.prog_fd;
dest->test.retval = cattr->test.retval;
dest->test.data_size_in = cattr->test.data_size_in;
dest->test.data_size_out = cattr->test.data_size_out;
dest->test.data_in = cattr->test.data_in;
dest->test.data_out = cattr->test.data_out;
dest->test.repeat = cattr->test.repeat;
dest->test.duration = cattr->test.duration;
dest->test.ctx_size_in = cattr->test.ctx_size_in;
dest->test.ctx_size_out = cattr->test.ctx_size_out;
dest->test.ctx_in = cattr->test.ctx_in;
dest->test.ctx_out = cattr->test.ctx_out;
dest->test.flags = cattr->test.flags;
dest->test.cpu = cattr->test.cpu;
dest->test.batch_size = cattr->test.batch_size;
break;
- case BPF_PROG_GET_NEXT_ID:
- case BPF_MAP_GET_NEXT_ID:
- case BPF_PROG_GET_FD_BY_ID:
- case BPF_MAP_GET_FD_BY_ID:
- case BPF_BTF_GET_FD_BY_ID:
- case BPF_BTF_GET_NEXT_ID:
- case BPF_LINK_GET_FD_BY_ID:
- case BPF_LINK_GET_NEXT_ID:
/* u32 prog_id, map_id, btf_id + link_id are in a union with
* u32 start_id */
dest->start_id = cattr->start_id;
dest->next_id = cattr->next_id;
dest->open_flags = cattr->open_flags;
break;
- case BPF_OBJ_GET_INFO_BY_FD:
dest->info.bpf_fd = cattr->info.bpf_fd;
dest->info.info_len = cattr->info.info_len;
dest->info.info = cattr->info.info;
break;
- case BPF_PROG_QUERY:
dest->query.target_fd = cattr->query.target_fd;
dest->query.attach_type = cattr->query.attach_type;
dest->query.query_flags = cattr->query.query_flags;
dest->query.attach_flags = cattr->query.attach_flags;
dest->query.prog_ids = cattr->query.prog_ids;
dest->query.prog_cnt = cattr->query.prog_cnt;
dest->query.prog_attach_flags = cattr->query.prog_attach_flags;
break;
- case BPF_RAW_TRACEPOINT_OPEN:
dest->raw_tracepoint.name = cattr->raw_tracepoint.name;
dest->raw_tracepoint.prog_fd = cattr->raw_tracepoint.prog_fd;
break;
- case BPF_BTF_LOAD:
dest->btf = cattr->btf;
dest->btf_log_buf = cattr->btf_log_buf;
dest->btf_size = cattr->btf_size;
dest->btf_log_size = cattr->btf_log_size;
dest->btf_log_level = cattr->btf_log_level;
break;
- case BPF_TASK_FD_QUERY:
dest->task_fd_query.pid = cattr->task_fd_query.pid;
dest->task_fd_query.fd = cattr->task_fd_query.fd;
dest->task_fd_query.flags = cattr->task_fd_query.flags;
dest->task_fd_query.buf_len = cattr->task_fd_query.buf_len;
dest->task_fd_query.buf = cattr->task_fd_query.buf;
dest->task_fd_query.prog_id = cattr->task_fd_query.prog_id;
dest->task_fd_query.fd_type = cattr->task_fd_query.fd_type;
dest->task_fd_query.probe_offset = cattr->task_fd_query.probe_offset;
dest->task_fd_query.probe_addr = cattr->task_fd_query.probe_addr;
break;
- case BPF_LINK_CREATE:
dest->link_create.prog_fd = cattr->link_create.prog_fd;
dest->link_create.target_fd = cattr->link_create.target_fd;
/* u32 target_ifindex is in a union with u32 target_fd */
dest->link_create.attach_type = cattr->link_create.attach_type;
dest->link_create.flags = cattr->link_create.flags;
prog = bpf_prog_get(cattr->link_create.prog_fd);
if (prog->type == BPF_PROG_TYPE_CGROUP_SKB ||
prog->type == BPF_PROG_TYPE_CGROUP_SOCK ||
prog->type == BPF_PROG_TYPE_CGROUP_SOCK_ADDR ||
prog->type == BPF_PROG_TYPE_SOCK_OPS ||
prog->type == BPF_PROG_TYPE_CGROUP_DEVICE ||
prog->type == BPF_PROG_TYPE_CGROUP_SYSCTL ||
prog->type == BPF_PROG_TYPE_CGROUP_SOCKOPT)
break;
if (prog->type == BPF_PROG_TYPE_EXT) {
dest->link_create.tracing.target_btf_id =
cattr->link_create.tracing.target_btf_id;
dest->link_create.tracing.cookie =
cattr->link_create.tracing.cookie;
break;
}
if (prog->type == BPF_PROG_TYPE_LSM ||
prog->type == BPF_PROG_TYPE_TRACING) {
if (prog->expected_attach_type == BPF_TRACE_ITER) {
/* iter_info is a user pointer to union
* bpf_iter_link_info however since this union
* contains no pointers, the size/offsets are
* the same for compat64/purecap; hence no
* conversion needed */
dest->link_create.iter_info =
cattr->link_create.iter_info;
dest->link_create.iter_info_len =
cattr->link_create.iter_info_len;
break;
} else if (prog->expected_attach_type == BPF_TRACE_RAW_TP
|| prog->expected_attach_type == BPF_LSM_CGROUP) {
/* only uses common fields above */
break;
} else {
dest->link_create.target_btf_id =
cattr->link_create.target_btf_id;
dest->link_create.tracing.cookie =
cattr->link_create.tracing.cookie;
break;
}
}
if (prog->type == BPF_PROG_TYPE_FLOW_DISSECTOR ||
prog->type == BPF_PROG_TYPE_SK_LOOKUP ||
prog->type == BPF_PROG_TYPE_XDP)
break;
/* bpf_cookie is used in bpf_perf_link_attach() */
if (prog->type == BPF_PROG_TYPE_PERF_EVENT ||
prog->type == BPF_PROG_TYPE_TRACEPOINT ||
(prog->type == BPF_PROG_TYPE_KPROBE &&
cattr->link_create.attach_type == BPF_PERF_EVENT)) {
dest->link_create.perf_event.bpf_cookie =
cattr->link_create.perf_event.bpf_cookie;
break;
}
/* kprobe_multi is used in bpf_kprobe_multi_link_attach() */
if (prog->type == BPF_PROG_TYPE_KPROBE &&
cattr->link_create.attach_type != BPF_PERF_EVENT) {
dest->link_create.kprobe_multi.flags =
cattr->link_create.kprobe_multi.flags;
dest->link_create.kprobe_multi.cnt =
cattr->link_create.kprobe_multi.cnt;
dest->link_create.kprobe_multi.syms =
cattr->link_create.kprobe_multi.syms;
dest->link_create.kprobe_multi.addrs =
cattr->link_create.kprobe_multi.addrs;
dest->link_create.kprobe_multi.cookies =
cattr->link_create.kprobe_multi.cookies;
break;
}
break;
- case BPF_LINK_UPDATE:
dest->link_update.link_fd = cattr->link_update.link_fd;
dest->link_update.new_prog_fd = cattr->link_update.new_prog_fd;
dest->link_update.flags = cattr->link_update.flags;
dest->link_update.old_prog_fd = cattr->link_update.old_prog_fd;
break;
- case BPF_LINK_DETACH:
dest->link_detach.link_fd = cattr->link_detach.link_fd;
break;
- case BPF_ENABLE_STATS:
dest->enable_stats.type = cattr->enable_stats.type;
break;
- case BPF_ITER_CREATE:
dest->iter_create.link_fd = cattr->iter_create.link_fd;
dest->iter_create.flags = cattr->iter_create.flags;
break;
- case BPF_PROG_BIND_MAP:
dest->prog_bind_map.prog_fd = cattr->prog_bind_map.prog_fd;
dest->prog_bind_map.map_fd = cattr->prog_bind_map.map_fd;
dest->prog_bind_map.flags = cattr->prog_bind_map.flags;
break;
- };
Considering the really large number of assignments, I think it would be worth introducing a macro that takes dest, cattr and the name of the member to copy. That would shorten things a bit for nested structs, but most importantly it would prevent typos slipping in. A similar macro for pointer conversion could be introduced in the last patch (which would also hide the very noisy cast to __kernel_aligned_uintptr_t).
+}
+#endif /* CONFIG_COMPAT64 */
+static int bpf_check_perms(int cmd)
+{
- bool capable = bpf_capable() || !sysctl_unprivileged_bpf_disabled;
- /* Intent here is for unprivileged_bpf_disabled to block key object
* creation commands for unprivileged users; other actions depend
* of fd availability and access to bpffs, so are dependent on
* object creation success. Capabilities are later verified for
* operations such as load and map create, so even with unprivileged
* BPF disabled, capability checks are still carried out for these
* and other operations.
*/
- if (!capable &&
(cmd == BPF_MAP_CREATE || cmd == BPF_PROG_LOAD))
return -EPERM;
- return 0;
+}
+static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
+{
- union bpf_attr attr;
- int err;
- err = bpf_check_perms(cmd);
- if (err)
return err;
- err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
- if (err)
return err;
- size = min_t(u32, size, sizeof(attr));
- /* copy attributes from user space, may be less than sizeof(bpf_attr) */
- memset(&attr, 0, sizeof(attr));
- if (copy_from_bpfptr_with_ptr(&attr, uattr, size) != 0)
I would prefer if the switch to the *_with_ptr() variants happened in the final patch, like in the io_uring series. That's because it is more logical this way: at this stage there is no actual pointer in bpf_attr, the need for *_with_ptr() arises when changing the types in the final patch. This also applies to patch 5-8.
return -EFAULT;
- return dispatch_bpf(cmd, &attr, uattr, size);
+}
SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
{
	return __sys_bpf(cmd, USER_BPFPTR(uattr), size);
}

+#ifdef CONFIG_COMPAT64
+static int __sys_compat_bpf(int cmd, bpfptr_t uattr, unsigned int size)
+{
- union bpf_attr attr;
- union compat_bpf_attr cattr;
- int err;
- err = bpf_check_perms(cmd);
- if (err)
return err;
- err = bpf_check_uarg_tail_zero(uattr, sizeof(cattr), size);
- if (err)
return err;
- size = min_t(u32, size, sizeof(cattr));
- /* copy attributes from user space, may be less than sizeof(bpf_attr) */
- memset(&cattr, 0, sizeof(cattr));
- if (copy_from_bpfptr_with_ptr(&cattr, uattr, size) != 0)
compat structs/unions do not hold actual pointers so we shouldn't use *_with_ptr() in that case.
return -EFAULT;
- convert_compat_bpf_attr(&attr, &cattr, cmd);
- return dispatch_bpf(cmd, &attr, uattr, sizeof(attr));
+}
+COMPAT_SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
+{
- return __sys_compat_bpf(cmd, USER_BPFPTR(uattr), size);
+}
Adding a compat handler is the "normal" way to handle this situation, so the approach in this patch is sound. However, it seems preferable to keep a native handler only and use in_compat_syscall() in __sys_bpf() for a couple of reasons: - AFAICT the diff in this file would be a lot smaller - just a few lines changed in __sys_bpf(), and of course the new conversion function. - It would avoid overriding the handler in sys_compat64.c, which is not ideal as it makes it hard to figure out what handler is actually used. The situation would be different if we were adding a "generic" compat handler, as in that case we could just change include/uapi/asm-generic/unistd.h (the normal way to do this).
With that approach, a further small improvement can be made. On the same model as Tudor's "io_uring: Implement compat versions of uAPI structs and handle them", __sys_bpf() could call a generic function, say copy_bpf_attr_from_user(). This function would then call the conversion function in compat64. Besides a better encapsulation, the main advantage is that the union compat_bpf_attr variable is allocated in the conversion function, and it is therefore live only for the duration of the conversion. Conversely, allocating it in __sys_compat_bpf() means it lives for the whole duration of the syscall handling. In other words, it slightly improves the stack usage. Not essential but nice to have. I think that approach could also be used in patch 5-8.
Finally to reduce the amount of #ifdef, we could have a macro like io_in_compat64() that only returns true in compat64. This allows keeping most of the code out of any #ifdef, assuming struct definitions and such are not #ifdef'd either (which is not particularly useful anyway).
Kevin
On 12/07/2023 12:19, Kevin Brodsky wrote:
On 08/06/2023 10:42, Zachary Leaf wrote:
The bpf syscall does not require a compat handler for 32-bit compat. This is achieved by using u64 instead of pointer types in the bpf_attr union used to pass arguments to the syscall. This means that in a system where pointers are 32-bit, the struct/union layouts and offsets are the same as in a 64-bit arch, since the u64 field is split into two u32 fields/registers.
This greatly simplifies 32-bit compat at the small cost of requiring casting pointers passed in through the uAPI to u64 (generally via ptr_to_u64() helper functions).
This poses a problem in architectures where user pointers are longer than 64b such as Morello/PCuABI where pointers are represented as 129b capabilities. In order to extend the bpf syscall interface to accept capabilities and still retain compatibility with the existing 64/32b ABI, a 64-bit compat layer and appropriate conversions must be added to handle the different union/struct sizes caused by this pointer size mismatch.
Before extending the number of bits in union bpf_attr to accept capabilitities, set the groundwork with a compat64 handler and
"capabilitities" sounds rather wrong, better with one less "ti" :D
conversion function to take a compat64 sized bpf_attr and convert it to what will be the new native offsets.
Inbound conversion is handled upfront to minimise impact on existing code and reduce overall diff size. After dispatch_bpf the majority of code can remain unchanged. The cases where conversion back out to userspace is required are handled in subsequent commits.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
[...]
+#ifdef CONFIG_COMPAT64
+static void convert_compat_bpf_attr(union bpf_attr *dest, const union compat_bpf_attr *cattr, int cmd)
+{
- struct bpf_prog *prog;
- switch (cmd) {
[snip]
- case BPF_PROG_BIND_MAP:
dest->prog_bind_map.prog_fd = cattr->prog_bind_map.prog_fd;
dest->prog_bind_map.map_fd = cattr->prog_bind_map.map_fd;
dest->prog_bind_map.flags = cattr->prog_bind_map.flags;
break;
- };
Considering the really large number of assignments, I think it would be worth introducing a macro that takes dest, cattr and the name of the member to copy. That would shorten things a bit for nested structs, but most importantly it would prevent typos slipping in.
That's a good idea.
I've added a generic copy_field() macro in include/linux/stddef.h:
#define copy_field(DEST, SRC, FIELD) \
	((DEST)->FIELD = (SRC)->FIELD)
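[Editor's note: a standalone sketch of how such a macro is used, for illustration; the struct names here are hypothetical, not the kernel's.]

```c
#include <assert.h>

/* Field-copy macro as proposed above */
#define copy_field(DEST, SRC, FIELD) \
	((DEST)->FIELD = (SRC)->FIELD)

/* Hypothetical compat/native pairs; the macro only requires that both
 * structs have a member with the same name, so a typo in the member
 * name fails to compile instead of silently copying the wrong field. */
struct compat_attr { unsigned int map_fd; unsigned long long flags; };
struct native_attr { unsigned int map_fd; unsigned long long flags; };
```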
A similar macro for pointer conversion could be introduced in the last patch (which would also hide the very noisy cast to __kernel_aligned_uintptr_t).
This is a bit more tricky to name and figure out where it should live. So far I have compat_uptr_to_kern() in arch/arm64/include/asm/compat.h next to compat_ptr() define just for lack of a better place. It's kind of generic (not bpf related) but also pretty specific at the same time, hence hard to name without the name being really long. Maybe we just make it bpf specific and limit it to bpf/syscall.c + bpf/btf.c where it's used.
#define compat_uptr_to_kern(DEST, SRC, FIELD) \
	((DEST)->FIELD = (__kernel_aligned_uintptr_t)compat_ptr((SRC)->FIELD))
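[Editor's note: a userspace model of the shape of this conversion macro. `compat_ptr()` and `__kernel_aligned_uintptr_t` are kernel-side definitions (under PCuABI the former rederives a full capability from a 64-bit compat user address); both are mocked here with plain `uintptr_t` purely for illustration, and the structs are hypothetical.]

```c
#include <assert.h>
#include <stdint.h>

/* Mocks standing in for the kernel definitions, illustration only */
typedef uintptr_t kernel_aligned_uintptr_t; /* ~ __kernel_aligned_uintptr_t */
static inline void *compat_ptr(uint64_t uptr) /* ~ kernel compat_ptr() */
{
	return (void *)(uintptr_t)uptr;
}

/* Conversion macro with the same shape as proposed above: widen a
 * 64-bit compat user address into the native pointer-sized field. */
#define compat_uptr_to_kern(DEST, SRC, FIELD) \
	((DEST)->FIELD = (kernel_aligned_uintptr_t)compat_ptr((SRC)->FIELD))

struct compat_attr_u { uint64_t key; };                 /* hypothetical */
struct native_attr_u { kernel_aligned_uintptr_t key; }; /* hypothetical */
```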
+}
+#endif /* CONFIG_COMPAT64 */
+static int bpf_check_perms(int cmd)
+{
- bool capable = bpf_capable() || !sysctl_unprivileged_bpf_disabled;
- /* Intent here is for unprivileged_bpf_disabled to block key object
* creation commands for unprivileged users; other actions depend
* of fd availability and access to bpffs, so are dependent on
* object creation success. Capabilities are later verified for
* operations such as load and map create, so even with unprivileged
* BPF disabled, capability checks are still carried out for these
* and other operations.
*/
- if (!capable &&
(cmd == BPF_MAP_CREATE || cmd == BPF_PROG_LOAD))
return -EPERM;
- return 0;
+}
+static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
+{
- union bpf_attr attr;
- int err;
- err = bpf_check_perms(cmd);
- if (err)
return err;
- err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
- if (err)
return err;
- size = min_t(u32, size, sizeof(attr));
- /* copy attributes from user space, may be less than sizeof(bpf_attr) */
- memset(&attr, 0, sizeof(attr));
- if (copy_from_bpfptr_with_ptr(&attr, uattr, size) != 0)
I would prefer if the switch to the *_with_ptr() variants happened in the final patch, like in the io_uring series. That's because it is more logical this way: at this stage there is no actual pointer in bpf_attr, the need for *_with_ptr() arises when changing the types in the final patch. This also applies to patch 5-8.
Ack
return -EFAULT;
- return dispatch_bpf(cmd, &attr, uattr, size);
+}
SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
{
	return __sys_bpf(cmd, USER_BPFPTR(uattr), size);
}

+#ifdef CONFIG_COMPAT64
+static int __sys_compat_bpf(int cmd, bpfptr_t uattr, unsigned int size)
+{
- union bpf_attr attr;
- union compat_bpf_attr cattr;
- int err;
- err = bpf_check_perms(cmd);
- if (err)
return err;
- err = bpf_check_uarg_tail_zero(uattr, sizeof(cattr), size);
- if (err)
return err;
- size = min_t(u32, size, sizeof(cattr));
- /* copy attributes from user space, may be less than sizeof(bpf_attr) */
- memset(&cattr, 0, sizeof(cattr));
- if (copy_from_bpfptr_with_ptr(&cattr, uattr, size) != 0)
compat structs/unions do not hold actual pointers so we shouldn't use *_with_ptr() in that case.
Ack
return -EFAULT;
- convert_compat_bpf_attr(&attr, &cattr, cmd);
- return dispatch_bpf(cmd, &attr, uattr, sizeof(attr));
+}
+COMPAT_SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
+{
- return __sys_compat_bpf(cmd, USER_BPFPTR(uattr), size);
+}
Adding a compat handler is the "normal" way to handle this situation, so the approach in this patch is sound. However, it seems preferable to keep a native handler only and use in_compat_syscall() in __sys_bpf() for a couple of reasons:
- AFAICT the diff in this file would be a lot smaller - just a few lines
changed in __sys_bpf(), and of course the new conversion function.
It's a bit more compact - but there's some extra handling to make sure we're selecting the right size and union for everything:
inline bool in_compat64_syscall()
{
	// generic - can go in arch/arm64/include/asm/compat.h
	return IS_ENABLED(CONFIG_COMPAT64) && in_compat_syscall();
}

#define select_attr (in_compat64_syscall() ? &cattr : attr)
#define attr_size (in_compat64_syscall() ? sizeof(union compat_bpf_attr) : \
		    sizeof(union bpf_attr))
#define bpfptr_copy (in_compat64_syscall() ? copy_from_bpfptr : \
		     copy_from_bpfptr_with_ptr)

static int copy_bpf_attr_from_user(union bpf_attr *attr, int cmd,
				   bpfptr_t uattr, unsigned int *size)
{
	union compat_bpf_attr cattr;
	int err;

	err = bpf_check_uarg_tail_zero(uattr, attr_size, *size);
	if (err)
		return err;
	*size = min_t(u32, *size, attr_size);

	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
	memset(select_attr, 0, attr_size);
	if (bpfptr_copy(select_attr, uattr, *size) != 0)
		return -EFAULT;

	err = check_attr(cmd, select_attr);
	if (err)
		return -EINVAL;

	if (in_compat64_syscall())
		convert_compat_bpf_attr(attr, &cattr, cmd);

	return 0;
}
#undef select_attr
#undef attr_size
#undef bpfptr_copy

static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
{
	union bpf_attr attr;
	int err;

	err = bpf_check_perms(cmd);
	if (err)
		return err;

	err = copy_bpf_attr_from_user(&attr, cmd, uattr, &size);
	if (err)
		return err;

	return dispatch_bpf(cmd, &attr, uattr, size);
}
- It would avoid overriding the handler in sys_compat64.c, which is not
ideal as it makes it hard to figure out what handler is actually used. The situation would be different if we were adding a "generic" compat handler, as in that case we could just change include/uapi/asm-generic/unistd.h (the normal way to do this).
I don't follow that - it adds a handler in sys_compat64.c, and the handler in bpf/syscall.c is clearly guarded by #ifdef CONFIG_COMPAT64. We can even add a comment stating the compat handler is for compat64 only.
With that approach, a further small improvement can be made. On the same model as Tudor's "io_uring: Implement compat versions of uAPI structs and handle them", __sys_bpf() could call a generic function, say copy_bpf_attr_from_user(). This function would then call the conversion function in compat64. Besides a better encapsulation, the main advantage is that the union compat_bpf_attr variable is allocated in the conversion function, and it is therefore live only for the duration of the conversion. Conversely, allocating it in __sys_compat_bpf() means it lives for the whole duration of the syscall handling. In other words, it slightly improves the stack usage. Not essential but nice to have. I think that approach could also be used in patch 5-8.
Makes sense. That should be possible with the __sys_compat_bpf() handler too, just call out into another small helper function.
Finally to reduce the amount of #ifdef, we could have a macro like io_in_compat64() that only returns true in compat64. This allows keeping most of the code out of any #ifdef, assuming struct definitions and such are not #ifdef'd either (which is not particularly useful anyway).
I had preferred the other option just because everything compat64 is behind an #ifdef; so in the CONFIG_CHERI_PURECAP_UABI=n case it's as if compat64 doesn't even exist. We don't allocate, define or otherwise do anything compat64 related when CONFIG_COMPAT64 is disabled. So what's the problem with #ifdef? Maybe that is not that useful since I doubt anyone is using this kernel without PCuABI. It just seems a bit "cleaner" and more straightforward to me, at the cost of some minor duplication.
Either way should work really, so I don't know if you have strong feelings one way or the other after seeing the above. I'm leaning towards __sys_compat_bpf() as before, but I might have misunderstood some stuff or not implemented it correctly.
Thanks,
Zach
Kevin
On 18/08/2023 17:15, Zachary Leaf wrote:
- case BPF_PROG_BIND_MAP:
dest->prog_bind_map.prog_fd = cattr->prog_bind_map.prog_fd;
dest->prog_bind_map.map_fd = cattr->prog_bind_map.map_fd;
dest->prog_bind_map.flags = cattr->prog_bind_map.flags;
break;
- };
Considering the really large number of assignments, I think it would be worth introducing a macro that takes dest, cattr and the name of the member to copy. That would shorten things a bit for nested structs, but most importantly it would prevent typos slipping in.
That's a good idea.
I've added a generic copy_field() macro in include/linux/stddef.h:
#define copy_field(DEST, SRC, FIELD) \
	((DEST)->FIELD = (SRC)->FIELD)
I wasn't thinking about adding it to a generic header but considering that it is indeed a completely generic macro, I don't see why not! stddef.h looks like a reasonable place to add it. Usual comment: macro arguments should not be capitalised.
A similar macro for pointer conversion could be introduced in the last patch (which would also hide the very noisy cast to __kernel_aligned_uintptr_t).
This is a bit more tricky to name and figure out where it should live. So far I have compat_uptr_to_kern() in arch/arm64/include/asm/compat.h next to compat_ptr() define just for lack of a better place. It's kind of generic (not bpf related) but also pretty specific at the same time, hence hard to name without the name being really long. Maybe we just make it bpf specific and limit it to bpf/syscall.c + bpf/btf.c where it's used.
I think we should indeed do that, it would have its place in bpf_compat.h. You are right that it is somewhat generic as we need to do this everywhere we introduced compat64 handling (e.g. io_uring), but the number of conversions is sufficiently low there that the most readable is probably to leave them as-is.
One comment on the naming, "to_kern" doesn't make much sense - it suggests converting to a kernel pointer, which is not what happens. __kernel_aligned_uintptr_t is thus named because it is a uapi type, like __kernel_long_t for instance - that is "kernel" means a kernel type here, nothing else. Maybe bpf_uattr_compat_ptr(), combining compat_ptr() and bpf_put_uattr()?
#define compat_uptr_to_kern(DEST, SRC, FIELD) \
	((DEST)->FIELD = (__kernel_aligned_uintptr_t)compat_ptr((SRC)->FIELD))
[...]
+COMPAT_SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr,
+		       unsigned int, size)
+{
+	return __sys_compat_bpf(cmd, USER_BPFPTR(uattr), size);
+}
Adding a compat handler is the "normal" way to handle this situation, so the approach in this patch is sound. However, it seems preferable to keep a native handler only and use in_compat_syscall() in __sys_bpf() for a couple of reasons:
- AFAICT the diff in this file would be a lot smaller - just a few lines
changed in __sys_bpf(), and of course the new conversion function.
It's a bit more compact - but there's some extra handling to make sure we're selecting the right size and union for everything:
The principle looks good to me, I think the diff should be quite clear too. Some comments inline.
inline bool in_compat64_syscall()
{
	// generic - can go in arch/arm64/include/asm/compat.h
Why not, this could be used in a few other places too. That said if we do this we should make it consistent with [1], in fact Kristina mentioned the idea of adding 64-bit helpers in the commit message.
[1] https://git.morello-project.org/morello/kernel/linux/-/commit/fd6902f3c5266
	return IS_ENABLED(CONFIG_COMPAT64) && in_compat_syscall();
}
#define select_attr (in_compat64_syscall() ? &cattr : attr)
#define attr_size (in_compat64_syscall() ? sizeof(union compat_bpf_attr) : \
		   sizeof(union bpf_attr))
I don't think these macros are necessary (also note that macros that make function calls should themselves be function macros to make it clear they are not constants). We could have a local variable for each instead.
#define bpfptr_copy (in_compat64_syscall() ? copy_from_bpfptr : \
		     copy_from_bpfptr_with_ptr)

static int copy_bpf_attr_from_user(union bpf_attr *attr, int cmd,
				   bpfptr_t uattr, unsigned int *size)
{
	union compat_bpf_attr cattr;
	int err;
	err = bpf_check_uarg_tail_zero(uattr, attr_size, *size);
	if (err)
		return err;
	*size = min_t(u32, *size, attr_size);

	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
	memset(select_attr, 0, attr_size);
	if (bpfptr_copy(select_attr, uattr, *size) != 0)
		return -EFAULT;

	err = check_attr(cmd, select_attr);
	if (err)
		return -EINVAL;

	if (in_compat64_syscall())
		convert_compat_bpf_attr(attr, &cattr, cmd);

	return 0;
}
#undef select_attr
#undef attr_size
#undef bpfptr_copy

static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
{
	union bpf_attr attr;
	int err;

	err = bpf_check_perms(cmd);
	if (err)
		return err;

	err = copy_bpf_attr_from_user(&attr, cmd, uattr, &size);
	if (err)
		return err;

	return dispatch_bpf(cmd, &attr, uattr, size);
}
- It would avoid overriding the handler in sys_compat64.c, which is not
ideal as it makes it hard to figure out what handler is actually used. The situation would be different if we were adding a "generic" compat handler, as in that case we could just change include/uapi/asm-generic/unistd.h (the normal way to do this).
I don't follow that - it adds a handler in sys_compat64.c, and the handler in bpf/syscall.c is clearly guarded by #ifdef CONFIG_COMPAT64. We can even add a comment stating the compat handler is for compat64 only.
Sure, we could do that. My point is that ideally we shouldn't use this mechanism as it's a hack - it makes it hard to tell how a syscall is handled in compat64, unless you already know where to look (unlike the regular mechanism, where include/uapi/asm-generic/unistd.h makes it clear).
With that approach, a further small improvement can be made. On the same model as Tudor's "io_uring: Implement compat versions of uAPI structs and handle them", __sys_bpf() could call a generic function, say copy_bpf_attr_from_user(). This function would then call the conversion function in compat64. Besides a better encapsulation, the main advantage is that the union compat_bpf_attr variable is allocated in the conversion function, and it is therefore live only for the duration of the conversion. Conversely, allocating it in __sys_compat_bpf() means it lives for the whole duration of the syscall handling. In other words, it slightly improves the stack usage. Not essential but nice to have. I think that approach could also be used in patch 5-8.
Makes sense. That should be possible with the __sys_compat_bpf() handler too, just call out into another small helper function.
Finally to reduce the amount of #ifdef, we could have a macro like io_in_compat64() that only returns true in compat64. This allows keeping most of the code out of any #ifdef, assuming struct definitions and such are not #ifdef'd either (which is not particularly useful anyway).
I had preferred the other option just because everything compat64 is behind an #ifdef; so in the CONFIG_CHERI_PURECAP_UABI=n case it's as if compat64 doesn't even exist. We don't allocate, define or otherwise do anything compat64 related when CONFIG_COMPAT64 is disabled. So what's the problem with #ifdef? Maybe that is not that useful since I doubt anyone is using this kernel without PCuABI. It just seems a bit "cleaner" and more straightforward to me, at the cost of some minor duplication.
There's no real point in #ifdef'ing declarations, as they have no impact on codegen. However #ifdef's have a negative impact on readability, and they should especially be avoided in the middle of a big function. Since we can use some macro magic to replace the explicit #ifdef's with no difference in codegen, we could make patch 5-8 more readable without much downside.
Either way should work really, so I don't know if you have strong feelings one way or the other after seeing the above. I'm leaning towards __sys_compat_bpf() as before, but I might have misunderstood some stuff or not implemented it correctly.
Hopefully I've clarified a few aspects! No strong preference either but it does feel like just one handler is more straightforward.
Kevin
On 23/08/2023 15:02, Kevin Brodsky wrote:
On 18/08/2023 17:15, Zachary Leaf wrote:
- case BPF_PROG_BIND_MAP:
dest->prog_bind_map.prog_fd = cattr->prog_bind_map.prog_fd;
dest->prog_bind_map.map_fd = cattr->prog_bind_map.map_fd;
dest->prog_bind_map.flags = cattr->prog_bind_map.flags;
break;
- };
Considering the really large number of assignments, I think it would be worth introducing a macro that takes dest, cattr and the name of the member to copy. That would shorten things a bit for nested structs, but most importantly it would prevent typos slipping in.
That's a good idea.
I've added a generic copy_field() macro in include/linux/stddef.h:
#define copy_field(DEST, SRC, FIELD) \
	((DEST)->FIELD = (SRC)->FIELD)
I wasn't thinking about adding it to a generic header but considering that it is indeed a completely generic macro, I don't see why not! stddef.h looks like a reasonable place to add it. Usual comment: macro arguments should not be capitalised.
All macro arguments in include/linux/stddef.h are capitalised so I've made it consistent with that.
A similar macro for pointer conversion could be introduced in the last patch (which would also hide the very noisy cast to __kernel_aligned_uintptr_t).
This is a bit more tricky to name and figure out where it should live. So far I have compat_uptr_to_kern() in arch/arm64/include/asm/compat.h next to compat_ptr() define just for lack of a better place. It's kind of generic (not bpf related) but also pretty specific at the same time, hence hard to name without the name being really long. Maybe we just make it bpf specific and limit it to bpf/syscall.c + bpf/btf.c where it's used.
I think we should indeed do that, it would have its place in bpf_compat.h. You are right that it is somewhat generic as we need to do this everywhere we introduced compat64 handling (e.g. io_uring), but the number of conversions is sufficiently low there that the most readable is probably to leave them as-is.
One comment on the naming, "to_kern" doesn't make much sense - it suggests converting to a kernel pointer, which is not what happens. __kernel_aligned_uintptr_t is thus named because it is a uapi type, like __kernel_long_t for instance - that is "kernel" means a kernel type here, nothing else. Maybe bpf_uattr_compat_ptr(), combining compat_ptr() and bpf_put_uattr()?
Thanks. bpf_uattr_compat_ptr() now in bpf_compat.h.
#define compat_uptr_to_kern(DEST, SRC, FIELD) \
	((DEST)->FIELD = (__kernel_aligned_uintptr_t)compat_ptr((SRC)->FIELD))
[...]
+COMPAT_SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr,
+		       unsigned int, size)
+{
+	return __sys_compat_bpf(cmd, USER_BPFPTR(uattr), size);
+}
Adding a compat handler is the "normal" way to handle this situation, so the approach in this patch is sound. However, it seems preferable to keep a native handler only and use in_compat_syscall() in __sys_bpf() for a couple of reasons:
- AFAICT the diff in this file would be a lot smaller - just a few lines
changed in __sys_bpf(), and of course the new conversion function.
It's a bit more compact - but there's some extra handling to make sure we're selecting the right size and union for everything:
The principle looks good to me, I think the diff should be quite clear too. Some comments inline.
inline bool in_compat64_syscall()
{
	// generic - can go in arch/arm64/include/asm/compat.h
Why not, this could be used in a few other places too. That said if we do this we should make it consistent with [1], in fact Kristina mentioned the idea of adding 64-bit helpers in the commit message.
[1] https://git.morello-project.org/morello/kernel/linux/-/commit/fd6902f3c5266
New patches in v2 to introduce the compat64 helpers consistent with above. Also renamed in_32bit_compat_syscall() to in_compat32_syscall() to shorten things up a bit.
	return IS_ENABLED(CONFIG_COMPAT64) && in_compat_syscall();
}
#define select_attr (in_compat64_syscall() ? &cattr : attr)
#define attr_size (in_compat64_syscall() ? sizeof(union compat_bpf_attr) : \
		   sizeof(union bpf_attr))
I don't think these macros are necessary (also note that macros that make function calls should themselves be function macros to make it clear they are not constants). We could have a local variable for each instead.
Ack - looks better.
#define bpfptr_copy (in_compat64_syscall() ? copy_from_bpfptr : \
		     copy_from_bpfptr_with_ptr)

static int copy_bpf_attr_from_user(union bpf_attr *attr, int cmd,
				   bpfptr_t uattr, unsigned int *size)
{
	union compat_bpf_attr cattr;
	int err;
	err = bpf_check_uarg_tail_zero(uattr, attr_size, *size);
	if (err)
		return err;
	*size = min_t(u32, *size, attr_size);

	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
	memset(select_attr, 0, attr_size);
	if (bpfptr_copy(select_attr, uattr, *size) != 0)
		return -EFAULT;

	err = check_attr(cmd, select_attr);
	if (err)
		return -EINVAL;

	if (in_compat64_syscall())
		convert_compat_bpf_attr(attr, &cattr, cmd);

	return 0;
}
#undef select_attr
#undef attr_size
#undef bpfptr_copy

static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
{
	union bpf_attr attr;
	int err;

	err = bpf_check_perms(cmd);
	if (err)
		return err;

	err = copy_bpf_attr_from_user(&attr, cmd, uattr, &size);
	if (err)
		return err;

	return dispatch_bpf(cmd, &attr, uattr, size);
}
- It would avoid overriding the handler in sys_compat64.c, which is not
ideal as it makes it hard to figure out what handler is actually used. The situation would be different if we were adding a "generic" compat handler, as in that case we could just change include/uapi/asm-generic/unistd.h (the normal way to do this).
I don't follow that - it adds a handler in sys_compat64.c, and the handler in bpf/syscall.c is clearly guarded by #ifdef CONFIG_COMPAT64. We can even add a comment stating the compat handler is for compat64 only.
Sure, we could do that. My point is that ideally we shouldn't use this mechanism as it's a hack - it makes it hard to tell how a syscall is handled in compat64, unless you already know where to look (unlike the regular mechanism, where include/uapi/asm-generic/unistd.h makes it clear).
With that approach, a further small improvement can be made. On the same model as Tudor's "io_uring: Implement compat versions of uAPI structs and handle them", __sys_bpf() could call a generic function, say copy_bpf_attr_from_user(). This function would then call the conversion function in compat64. Besides a better encapsulation, the main advantage is that the union compat_bpf_attr variable is allocated in the conversion function, and it is therefore live only for the duration of the conversion. Conversely, allocating it in __sys_compat_bpf() means it lives for the whole duration of the syscall handling. In other words, it slightly improves the stack usage. Not essential but nice to have. I think that approach could also be used in patch 5-8.
Implemented this for bpf_attr and in bpf_xyz_info as well.
Makes sense. That should be possible with the __sys_compat_bpf() handler too, just call out into another small helper function.
Finally to reduce the amount of #ifdef, we could have a macro like io_in_compat64() that only returns true in compat64. This allows keeping most of the code out of any #ifdef, assuming struct definitions and such are not #ifdef'd either (which is not particularly useful anyway).
I had preferred the other option just because everything compat64 is behind an #ifdef; so in the CONFIG_CHERI_PURECAP_UABI=n case it's as if compat64 doesn't even exist. We don't allocate, define or otherwise do anything compat64 related when CONFIG_COMPAT64 is disabled. So what's the problem with #ifdef? Maybe that is not that useful since I doubt anyone is using this kernel without PCuABI. It just seems a bit "cleaner" and more straightforward to me, at the cost of some minor duplication.
There's no real point in #ifdef'ing declarations, as they have no impact on codegen. However #ifdef's have a negative impact on readability, and they should especially be avoided in the middle of a big function. Since we can use some macro magic to replace the explicit #ifdef's with no difference in codegen, we could make patch 5-8 more readable without much downside.
The point I originally missed is that in the case of !IS_ENABLED(CONFIG_COMPAT64) where we have something like:
if (in_compat64_syscall())
	f(arg1, arg2);
then the compiler is smart enough to compile this whole statement + the entirety of f() out. No need to explicitly exclude it with the #ifdefs. This is assuming that f() is marked as static and only called in the context of the above if statement.
v2 is a complete flip in that regard, gone from manually/explicitly excluding everything with #ifdef to using in_compat64_syscall() and relying on compiler to remove things for us.
I know that's how it was done in io_uring but it makes a lot more sense now.
Either way should work really, so I don't know if you have strong feelings one way or the other after seeing the above. I'm leaning towards __sys_compat_bpf() as before, but I might have misunderstood some stuff or not implemented it correctly.
Hopefully I've clarified a few aspects! No strong preference either but it does feel like just one handler is more straightforward.
For sure, thanks - it's definitely an improvement. Changes to existing functions are minimised + removes a lot of duplication.
Thanks, Zach
Kevin
_______________________________________________
linux-morello mailing list -- linux-morello@op-lists.linaro.org
To unsubscribe send an email to linux-morello-leave@op-lists.linaro.org
On 12/09/2023 11:22, Zachary Leaf wrote:
- case BPF_PROG_BIND_MAP:
dest->prog_bind_map.prog_fd = cattr->prog_bind_map.prog_fd;
dest->prog_bind_map.map_fd = cattr->prog_bind_map.map_fd;
dest->prog_bind_map.flags = cattr->prog_bind_map.flags;
break;
- };
Considering the really large number of assignments, I think it would be worth introducing a macro that takes dest, cattr and the name of the member to copy. That would shorten things a bit for nested structs, but most importantly it would prevent typos slipping in.
That's a good idea.
I've added a generic copy_field() macro in include/linux/stddef.h:
#define copy_field(DEST, SRC, FIELD) \
	((DEST)->FIELD = (SRC)->FIELD)
I wasn't thinking about adding it to a generic header but considering that it is indeed a completely generic macro, I don't see why not! stddef.h looks like a reasonable place to add it. Usual comment: macro arguments should not be capitalised.
All macro arguments in include/linux/stddef.h are capitalised so I've made it consistent with that.
Ah yes that makes sense. Looks like offsetof() has had capitalised arguments since <forever>, and as a result everything else added to that header followed the same convention. I doubt there is much logic to it, but local consistency always wins!
Kevin
There are two ways that the kernel writes back to __user *bpf_attr unions and each requires different compat64 handling:

1) Where the union member to be written to is a __user *ptr e.g. a pointer to some memory buffer
2) Writing to a normal non-ptr field e.g. __u32 prog_cnt
In the first instance, no additional conversion out or handling is required. When converting inbound each __user *ptr is simply copied from union compat_bpf_attr into a new union bpf_attr and this can be written to directly with copy_to_user().
For the second case, the kernel writes back out directly to user memory for a specific field. This is problematic for compat64 where the field offsets are different between native/compat64. That means if compat64 is enabled and we're in a compat syscall, we need to cast uattr to correctly select the correct member field/offset to write to.
The "technically correct" approach is probably to replace all functions that take a union bpf_attr __user * parameter with void __user * (at least where we pass uattr.user in dispatch_bpf() and propagate this down). This forces us to check the type and cast before we use uattr, since the type is technically unknown. It avoids any potential misuse by not allowing a ptr to be accessed with native union offsets before checking for compat64. The downside of this approach is that we would end up touching many function signatures as uattr propagates down the call stack (currently 73).
The minimal diff approach used here is to handle only specifically where we write directly to uattr with a fairly bad macro (it's not ideal but I've seen worse). This greatly reduces the diff size at the cost of having compat_bpf_attr unions passed around as union bpf_attr and needing to remember to cast back where required.
The instances where we write out to uattr were mostly found by grepping 'copy_to_user(&uattr' + 'put_user(.*uattr' so the list may or may not be complete. Luckily this doesn't appear to happen that often and usage seems to be consistent.
We can also replace all instances of copy_to_user() here with put_user() which somewhat simplifies things.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
---
 drivers/media/rc/bpf-lirc.c    |  5 +++--
 include/linux/bpf_compat.h     | 17 +++++++++++++++++
 kernel/bpf/cgroup.c            |  5 +++--
 kernel/bpf/hashtab.c           |  5 +++--
 kernel/bpf/net_namespace.c     |  5 +++--
 kernel/bpf/syscall.c           | 22 +++++++++++-----------
 net/bpf/bpf_dummy_struct_ops.c |  3 ++-
 net/bpf/test_run.c             | 16 ++++++++--------
 net/core/sock_map.c            |  5 +++--
 9 files changed, 53 insertions(+), 30 deletions(-)

diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index fe17c7f98e81..9988232aeab5 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -4,6 +4,7 @@
 // Copyright (C) 2018 Sean Young sean@mess.org

 #include <linux/bpf.h>
+#include <linux/bpf_compat.h>
 #include <linux/filter.h>
 #include <linux/bpf_lirc.h>
 #include "rc-core-priv.h"
@@ -319,12 +320,12 @@ int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
 	progs = lirc_rcu_dereference(rcdev->raw->progs);
 	cnt = progs ? bpf_prog_array_length(progs) : 0;

-	if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt))) {
+	if (PUT_USER_UATTR(cnt, query.prog_cnt)) {
 		ret = -EFAULT;
 		goto unlock;
 	}

-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags))) {
+	if (PUT_USER_UATTR(flags, query.attach_flags)) {
 		ret = -EFAULT;
 		goto unlock;
 	}
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h
index cc12f2e3b204..3d5f6bd2cd46 100644
--- a/include/linux/bpf_compat.h
+++ b/include/linux/bpf_compat.h
@@ -1,6 +1,23 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /* Copyright (C) 2023 Arm Ltd */

+/*
+ * This isn't great but the cast is necessary to select the correct member
+ * offset when writing out to userspace for compat64
+ */
+#define __PUT_USER_UATTR(TYPE, X, TO_FIELD) \
+	(put_user(X, &(((TYPE)uattr)->TO_FIELD)))
+
+#ifdef CONFIG_COMPAT64
+#define PUT_USER_UATTR(X, TO_FIELD) \
+	(in_compat_syscall() ? \
+		__PUT_USER_UATTR(union compat_bpf_attr __user *, X, TO_FIELD) : \
+		__PUT_USER_UATTR(union bpf_attr __user *, X, TO_FIELD))
+#else
+#define PUT_USER_UATTR(X, TO_FIELD) \
+	__PUT_USER_UATTR(union bpf_attr __user *, X, TO_FIELD)
+#endif
+
 #ifdef CONFIG_COMPAT64
 union compat_bpf_attr {
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index bf2fdb33fb31..5f9d13848986 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -14,6 +14,7 @@
 #include <linux/string.h>
 #include <linux/bpf.h>
 #include <linux/bpf-cgroup.h>
+#include <linux/bpf_compat.h>
 #include <linux/bpf_lsm.h>
 #include <linux/bpf_verifier.h>
 #include <net/sock.h>
@@ -1061,9 +1062,9 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,

 	/* always output uattr->query.attach_flags as 0 during effective query */
 	flags = effective_query ? 0 : flags;
-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+	if (PUT_USER_UATTR(flags, query.attach_flags))
 		return -EFAULT;
-	if (copy_to_user(&uattr->query.prog_cnt, &total_cnt, sizeof(total_cnt)))
+	if (PUT_USER_UATTR(total_cnt, query.prog_cnt))
 		return -EFAULT;
 	if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
 		/* return early if user requested only program count + flags */
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index f39ee3e05589..5cf4ddc2a433 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2016 Facebook
  */
 #include <linux/bpf.h>
+#include <linux/bpf_compat.h>
 #include <linux/btf.h>
 #include <linux/jhash.h>
 #include <linux/filter.h>
@@ -1686,7 +1687,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	if (!max_count)
 		return 0;

-	if (put_user(0, &uattr->batch.count))
+	if (PUT_USER_UATTR(0, batch.count))
 		return -EFAULT;

 	batch = 0;
@@ -1867,7 +1868,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	/* copy # of entries and next batch */
 	ubatch = u64_to_user_ptr(attr->batch.out_batch);
 	if (copy_to_user(ubatch, &batch, sizeof(batch)) ||
-	    put_user(total, &uattr->batch.count))
+	    PUT_USER_UATTR(total, batch.count))
 		ret = -EFAULT;

 out:
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index 868cc2c43899..3bfc97b37774 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0

 #include <linux/bpf.h>
+#include <linux/bpf_compat.h>
 #include <linux/bpf-netns.h>
 #include <linux/filter.h>
 #include <net/net_namespace.h>
@@ -257,9 +258,9 @@ static int __netns_bpf_prog_query(const union bpf_attr *attr,
 	if (run_array)
 		prog_cnt = bpf_prog_array_length(run_array);

-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+	if (PUT_USER_UATTR(flags, query.attach_flags))
 		return -EFAULT;
-	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
+	if (PUT_USER_UATTR(prog_cnt, query.prog_cnt))
 		return -EFAULT;
 	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
 		return 0;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 818ca8b63295..4c86b1b23f36 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1601,7 +1601,7 @@ int generic_map_delete_batch(struct bpf_map *map,
 			break;
 		cond_resched();
 	}
-	if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
+	if (PUT_USER_UATTR(cp, batch.count))
 		err = -EFAULT;
 	kvfree(key);
@@ -1662,7 +1662,7 @@ int generic_map_update_batch(struct bpf_map *map,
 		cond_resched();
 	}

-	if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
+	if (PUT_USER_UATTR(cp, batch.count))
 		err = -EFAULT;

 	kvfree(value);
@@ -1698,7 +1698,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
 	if (!max_count)
 		return 0;

-	if (put_user(0, &uattr->batch.count))
+	if (PUT_USER_UATTR(0, batch.count))
 		return -EFAULT;

 	buf_prevkey = kvmalloc(map->key_size, GFP_USER | __GFP_NOWARN);
@@ -1763,8 +1763,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
 	if (err == -EFAULT)
 		goto free_buf;

-	if ((copy_to_user(&uattr->batch.count, &cp, sizeof(cp)) ||
-	    (cp && copy_to_user(uobatch, prev_key, map->key_size))))
+	if (PUT_USER_UATTR(cp, batch.count) ||
+	    (cp && copy_to_user(uobatch, prev_key, map->key_size)))
 		err = -EFAULT;

 free_buf:
@@ -3657,7 +3657,7 @@ static int bpf_obj_get_next_id(const union bpf_attr *attr,
 	spin_unlock_bh(lock);

 	if (!err)
-		err = put_user(next_id, &uattr->next_id);
+		err = PUT_USER_UATTR(next_id, next_id);

 	return err;
 }
@@ -4348,7 +4348,7 @@ static int bpf_task_fd_query_copy(const union bpf_attr *attr,
 	u32 len = buf ? strlen(buf) : 0, input_len;
 	int err = 0;

-	if (put_user(len, &uattr->task_fd_query.buf_len))
+	if (PUT_USER_UATTR(len, task_fd_query.buf_len))
 		return -EFAULT;
 	input_len = attr->task_fd_query.buf_len;
 	if (input_len && ubuf) {
@@ -4376,10 +4376,10 @@ static int bpf_task_fd_query_copy(const union bpf_attr *attr,
 		}
 	}

-	if (put_user(prog_id, &uattr->task_fd_query.prog_id) ||
-	    put_user(fd_type, &uattr->task_fd_query.fd_type) ||
-	    put_user(probe_offset, &uattr->task_fd_query.probe_offset) ||
-	    put_user(probe_addr, &uattr->task_fd_query.probe_addr))
+	if (PUT_USER_UATTR(prog_id, task_fd_query.prog_id) ||
+	    PUT_USER_UATTR(fd_type, task_fd_query.fd_type) ||
+	    PUT_USER_UATTR(probe_offset, task_fd_query.probe_offset) ||
+	    PUT_USER_UATTR(probe_addr, task_fd_query.probe_addr))
 		return -EFAULT;
 	return err;
diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c
index e78dadfc5829..20cecfda5188 100644
--- a/net/bpf/bpf_dummy_struct_ops.c
+++ b/net/bpf/bpf_dummy_struct_ops.c
@@ -5,6 +5,7 @@
 #include <linux/kernel.h>
 #include <linux/bpf_verifier.h>
 #include <linux/bpf.h>
+#include <linux/bpf_compat.h>
 #include <linux/btf.h>

 extern struct bpf_struct_ops bpf_bpf_dummy_ops;
@@ -131,7 +132,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
 	err = dummy_ops_copy_args(args);
 	if (err)
 		goto out;
-	if (put_user(prog_ret, &uattr->test.retval))
+	if (PUT_USER_UATTR(prog_ret, test.retval))
 		err = -EFAULT;
 out:
 	kfree(args);
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index fcb3e6c5e03c..58a5d02c5a8e 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2017 Facebook */

 #include <linux/bpf.h>
+#include <linux/bpf_compat.h>
 #include <linux/btf.h>
 #include <linux/btf_ids.h>
 #include <linux/slab.h>
@@ -464,11 +465,11 @@ static int bpf_test_finish(const union bpf_attr *kattr,
 		}
 	}

-	if (copy_to_user(&uattr->test.data_size_out, &size, sizeof(size)))
+	if (PUT_USER_UATTR(size, test.data_size_out))
 		goto out;
-	if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval)))
+	if (PUT_USER_UATTR(retval, test.retval))
 		goto out;
-	if (copy_to_user(&uattr->test.duration, &duration, sizeof(duration)))
+	if (PUT_USER_UATTR(duration, test.duration))
 		goto out;
 	if (err != -ENOSPC)
 		err = 0;
@@ -822,7 +823,7 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog,
 	}

 	retval = ((u32)side_effect << 16) | ret;
-	if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval)))
+	if (PUT_USER_UATTR(retval, test.retval))
 		goto out;

 	err = 0;
@@ -897,8 +898,7 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 	}
 	put_cpu();

-	if (!err &&
-	    copy_to_user(&uattr->test.retval, &info.retval, sizeof(u32)))
+	if (!err && PUT_USER_UATTR(info.retval, test.retval))
 		err = -EFAULT;

 	kfree(info.ctx);
@@ -954,7 +954,7 @@ static int bpf_ctx_finish(const union bpf_attr *kattr,

 	if (copy_to_user(data_out, data, copy_size))
 		goto out;
-	if (copy_to_user(&uattr->test.ctx_size_out, &size, sizeof(size)))
+	if (PUT_USER_UATTR(size, test.ctx_size_out))
 		goto out;
 	if (err != -ENOSPC)
 		err = 0;
@@ -1632,7 +1632,7 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog,
 	retval = bpf_prog_run_pin_on_cpu(prog, ctx);
 	rcu_read_unlock_trace();

-	if (copy_to_user(&uattr->test.retval, &retval, sizeof(u32))) {
+	if (PUT_USER_UATTR(retval, test.retval)) {
 		err = -EFAULT;
 		goto out;
 	}
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 81beb16ab1eb..e476159a53c7 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2017 - 2018 Covalent IO, Inc. http://covalent.io */

 #include <linux/bpf.h>
+#include <linux/bpf_compat.h>
 #include <linux/btf_ids.h>
 #include <linux/filter.h>
 #include <linux/errno.h>
@@ -1525,9 +1526,9 @@ int sock_map_bpf_prog_query(const union bpf_attr *attr,
 end:
 	rcu_read_unlock();

-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)) ||
+	if (PUT_USER_UATTR(flags, query.attach_flags) ||
 	    (id != 0 && copy_to_user(prog_ids, &id, sizeof(u32))) ||
-	    copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
+	    PUT_USER_UATTR(prog_cnt, query.prog_cnt))
 		ret = -EFAULT;

 	fdput(f);
On 08/06/2023 10:42, Zachary Leaf wrote:
There are two ways that the kernel writes back to __user *bpf_attr unions and each requires different compat64 handling:
- Where the union member to be written to is a __user *ptr e.g. a pointer to some memory buffer
- Writing to a normal non-ptr field e.g. __u32 prog_cnt
In the first instance, no additional conversion out or handling is required. When converting inbound each __user *ptr is simply copied from union compat_bpf_attr into a new union bpf_attr and this can be written to directly with copy_to_user().
It took me a while but I think eventually I managed to get what you're trying to say: in that case, because there is a pointer in the struct, what we're getting from userspace is a compat layout, which we have to convert into a native struct and then we operate on that.
Assuming I got this right, this prompts another question: in the second case, how is the layout different at all between compat and native? I can't really grasp these two situations as described. My initial expectation would have been more or less the opposite: if you have pointers, then the layout is different, and when writing out you will have to do something: either write to a (temporary) native struct and then convert before writing to userspace, or write directly at the right offset with the sort of macro magic you've got in this patch. I don't immediately see how the field you're writing to makes a difference, rather the fields *before* do (as they change the offset). I'm probably missing at least a couple of things though!
For the second case, the kernel writes back out directly to user memory for a specific field. This is problematic for compat64 where the field offsets are different between native/compat64. That means if compat64 is enabled and we're in a compat syscall, we need to cast uattr to select the correct member field/offset to write to.
The "technically correct" approach is probably to replace all functions with a union bpf_attr __user * parameter with void __user * (at leasta where we pass uattr.user in dispatch_bpf() and propagate this down). This forces us to check the type and cast before we use uattr, since the type is technically unknown. This avoids any potential mis-use by not checking for compat64 before trying to access a ptr with native union offsets. The downside of this approach is we would end up touching many function signatures as uattr propagates down the call stack (currently 73).
The minimal diff approach used here is to handle only specifically where we write directly to uattr with a fairly bad macro (it's not ideal but I've seen worse). This greatly reduces the diff size at the cost of having compat_bpf_attr unions passed around as union bpf_attr and needing to remember to cast back where required.
The instances where we write out to uattr were mostly found by grepping 'copy_to_user(&uattr' + 'put_user(.*uattr' so the list may or may not be complete. Luckily this doesn't appear to happen that often and usage seems to be consistent.
We can also replace all instances of copy_to_user() here with put_user() which somewhat simplifies things.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
[...]
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h index cc12f2e3b204..3d5f6bd2cd46 100644 --- a/include/linux/bpf_compat.h +++ b/include/linux/bpf_compat.h @@ -1,6 +1,23 @@ /* SPDX-License-Identifier: GPL-2.0 */ /* Copyright (C) 2023 Arm Ltd */
+/*
+ * This isn't great but the cast is necessary to select the correct member
+ * offset when writing out to userspace for compat64
+ */
+#define __PUT_USER_UATTR(TYPE, X, TO_FIELD) \
+	(put_user(X, &(((TYPE)uattr)->TO_FIELD)))
A few things:
- Public identifiers are typically loosely namespaced, this one is very generic. Adding "bpf" somewhere in the name would help.
- Macro arguments are not normally capitalised in the kernel. In fact, for such a function-like macro, I wouldn't capitalise its name either.
- Using implicit arguments like uattr here is highly discouraged for any macro that's part of a public API (i.e. not a macro that's used locally in a couple of functions). It saves some typing but hides interactions with the argument (it's quite common to look at how a variable is used by grepping it, and uattr wouldn't show up here). For that matter I find the existing CHECK_ATTR() pretty borderline, but at least it is local to syscall.c.
Also I think these macros could go directly in bpf.h - they are not specific to compat, they just have a different implementation in compat.
Kevin
+#ifdef CONFIG_COMPAT64
+#define PUT_USER_UATTR(X, TO_FIELD) \
+	(in_compat_syscall() ? \
+	 __PUT_USER_UATTR(union compat_bpf_attr __user *, X, TO_FIELD) : \
+	 __PUT_USER_UATTR(union bpf_attr __user *, X, TO_FIELD))
+#else
+#define PUT_USER_UATTR(X, TO_FIELD) \
+	__PUT_USER_UATTR(union bpf_attr __user *, X, TO_FIELD)
+#endif
[...]
On 12/07/2023 12:19, Kevin Brodsky wrote:
On 08/06/2023 10:42, Zachary Leaf wrote:
There are two ways that the kernel writes back to __user *bpf_attr unions and each requires different compat64 handling:
- Where the union member to be written to is a __user *ptr e.g. a pointer to some memory buffer
- Writing to a normal non-ptr field e.g. __u32 prog_cnt
In the first instance, no additional conversion out or handling is required. When converting inbound each __user *ptr is simply copied from union compat_bpf_attr into a new union bpf_attr and this can be written to directly with copy_to_user().
It took me a while but I think eventually I managed to get what you're trying to say: in that case, because there is a pointer in the struct, what we're getting from userspace is a compat layout, which we have to convert into a native struct and then we operate on that.
What you say is correct, but it's not what I was getting at.
Assume you have copied some compat struct, containing both __user ptrs and normal members, into the kernel and converted it into a struct with native offsets. You then have two types of members in the native struct: __user pointers, e.g. the location of some memory buffer in userspace, and normal non-ptr members.
When you write out to the __user pointer you don't need to interact with or worry about the struct offsets anymore, as you have a direct pointer to some user memory. You just do copy_to_user() on that __user pointer and start writing directly to the memory buffer wherever it is in userspace memory. The compat and native __user pointer is the same - it's just an address, and writing to that address doesn't depend on which struct offsets we used to obtain it.
If you're trying to write back to some specific field in the struct, e.g. uattr->some_int, you now need to ensure you're writing at the correct offset, hence all this casting magic.
Assuming I got this right, this prompts another question: in the second case, how is the layout different at all between compat and native? I can't really grasp these two situations as described. My initial expectation would have been more or less the opposite: if you have pointers, then the layout is different, and when writing out you will have to do something: either write to a (temporary) native struct and then convert before writing to userspace, or write directly at the right offset with the sort of macro magic you've got in this patch. I don't immediately see how the field you're writing to makes a difference, rather the fields *before* do (as they change the offset). I'm probably missing at least a couple of things though!
I think you're seeing the distinction here as structs with pointers and structs without pointers, but really it's about how you can write back out to some memory and where it is.
With non-ptr members, the memory you're writing to is inside the struct, so you need to ensure the offsets are correct to write to the correct location.
With user ptr members, the memory you're writing to is outside the struct, so you can just write directly to it, you don't need to worry about struct offsets anymore.
Hence writing out to __user pointer members doesn't require any conversion out. For everything else you need to do this magic. I'll try and rework the commit message to hopefully clear this up.
For the second case, the kernel writes back out directly to user memory for a specific field. This is problematic for compat64 where the field offsets are different between native/compat64. That means if compat64 is enabled and we're in a compat syscall, we need to cast uattr to select the correct member field/offset to write to.
The "technically correct" approach is probably to replace all functions with a union bpf_attr __user * parameter with void __user * (at leasta where we pass uattr.user in dispatch_bpf() and propagate this down). This forces us to check the type and cast before we use uattr, since the type is technically unknown. This avoids any potential mis-use by not checking for compat64 before trying to access a ptr with native union offsets. The downside of this approach is we would end up touching many function signatures as uattr propagates down the call stack (currently 73).
The minimal diff approach used here is to handle only specifically where we write directly to uattr with a fairly bad macro (it's not ideal but I've seen worse). This greatly reduces the diff size at the cost of having compat_bpf_attr unions passed around as union bpf_attr and needing to remember to cast back where required.
The instances where we write out to uattr were mostly found by grepping 'copy_to_user(&uattr' + 'put_user(.*uattr' so the list may or may not be complete. Luckily this doesn't appear to happen that often and usage seems to be consistent.
We can also replace all instances of copy_to_user() here with put_user() which somewhat simplifies things.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
[...]
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h index cc12f2e3b204..3d5f6bd2cd46 100644 --- a/include/linux/bpf_compat.h +++ b/include/linux/bpf_compat.h @@ -1,6 +1,23 @@ /* SPDX-License-Identifier: GPL-2.0 */ /* Copyright (C) 2023 Arm Ltd */
+/*
+ * This isn't great but the cast is necessary to select the correct member
+ * offset when writing out to userspace for compat64
+ */
+#define __PUT_USER_UATTR(TYPE, X, TO_FIELD) \
+	(put_user(X, &(((TYPE)uattr)->TO_FIELD)))
A few things:
- Public identifiers are typically loosely namespaced, this one is very generic. Adding "bpf" somewhere in the name would help.
- Macro arguments are not normally capitalised in the kernel. In fact, for such a function-like macro, I wouldn't capitalise its name either.
- Using implicit arguments like uattr here is highly discouraged for any macro that's part of a public API (i.e. not a macro that's used locally in a couple of functions). It saves some typing but hides interactions with the argument (it's quite common to look at how a variable is used by grepping it, and uattr wouldn't show up here). For that matter I find the existing CHECK_ATTR() pretty borderline, but at least it is local to syscall.c.
Also I think these macros could go directly in bpf.h - they are not specific to compat, they just have a different implementation in compat.
ACK on all this.
Thanks,
Zach
Kevin
+#ifdef CONFIG_COMPAT64
+#define PUT_USER_UATTR(X, TO_FIELD) \
+	(in_compat_syscall() ? \
+	 __PUT_USER_UATTR(union compat_bpf_attr __user *, X, TO_FIELD) : \
+	 __PUT_USER_UATTR(union bpf_attr __user *, X, TO_FIELD))
+#else
+#define PUT_USER_UATTR(X, TO_FIELD) \
+	__PUT_USER_UATTR(union bpf_attr __user *, X, TO_FIELD)
+#endif
[...]
On 13/07/2023 09:17, Zachary Leaf wrote:
On 12/07/2023 12:19, Kevin Brodsky wrote:
On 08/06/2023 10:42, Zachary Leaf wrote:
There are two ways that the kernel writes back to __user *bpf_attr unions and each requires different compat64 handling:
- Where the union member to be written to is a __user *ptr e.g. a pointer to some memory buffer
- Writing to a normal non-ptr field e.g. __u32 prog_cnt
In the first instance, no additional conversion out or handling is required. When converting inbound each __user *ptr is simply copied from union compat_bpf_attr into a new union bpf_attr and this can be written to directly with copy_to_user().
It took me a while but I think eventually I managed to get what you're trying to say: in that case, because there is a pointer in the struct, what we're getting from userspace is a compat layout, which we have to convert into a native struct and then we operate on that.
What you say is correct, but it's not what I was getting at.
Assume you have copied some compat struct, containing both __user ptrs and normal members, into the kernel and converted it into a struct with native offsets. You then have two types of members in the native struct: __user pointers, e.g. the location of some memory buffer in userspace, and normal non-ptr members.
When you write out to the __user pointer you don't need to interact with or worry about the struct offsets anymore, as you have a direct pointer to some user memory. You just do copy_to_user() on that __user pointer and start writing directly to the memory buffer wherever it is in userspace memory. The compat and native __user pointer is the same - it's just an address, and writing to that address doesn't depend on which struct offsets we used to obtain it.
If you're trying to write back to some specific field in the struct, e.g. uattr->some_int, you now need to ensure you're writing at the correct offset, hence all this casting magic.
Assuming I got this right, this prompts another question: in the second case, how is the layout different at all between compat and native? I can't really grasp these two situations as described. My initial expectation would have been more or less the opposite: if you have pointers, then the layout is different, and when writing out you will have to do something: either write to a (temporary) native struct and then convert before writing to userspace, or write directly at the right offset with the sort of macro magic you've got in this patch. I don't immediately see how the field you're writing to makes a difference, rather the fields *before* do (as they change the offset). I'm probably missing at least a couple of things though!
I think you're seeing the distinction here as structs with pointers and structs without pointers, but really it's about how you can write back out to some memory and where it is.
With non-ptr members, the memory you're writing to is inside the struct, so you need to ensure the offsets are correct to write to the correct location.
Sure, I was just trying to make sense of my wrong interpretation so that got me on a tangent :) Clearly you do want to have this sort of macro to write to struct members regardless of where they are in the struct (i.e. even if they have the same offset in native and compat), to make things reasonably robust and consistent.
With user ptr members, the memory you're writing to is outside the struct, so you can just write directly to it, you don't need to worry about struct offsets anymore.
Hence writing out to __user pointer members doesn't require any conversion out. For everything else you need to do this magic. I'll try and rework the commit message to hopefully clear this up.
Right, that was the critical bit I was missing: for pointer members, it is always *the pointer target* that is written to. In that case clearly the struct layout is irrelevant, as you clarified above. In hindsight it is rather unlikely for this syscall that the *pointer itself* would be written to (where would that new user pointer come from?), but it is still worth making the language more precise in the commit message. "Writing to a pointer" can be interpreted both as writing to the pointer itself, or writing to its target.
Thinking further, there is a more fundamental reason why I found that wording confusing: if you are writing to the pointer target, you are not writing back to the user-provided union bpf_attr (uattr) at all. Rather you are extracting the user pointer from the copied-in (and potentially converted) union bpf_attr (attr), and then writing through that. That's why I didn't think about that situation, as it is fundamentally different. To make the commit message more intelligible, I would rather make a note about this aspect at the end of it, and start with what the patch is actually addressing.
Kevin
On 13/07/2023 09:14, Kevin Brodsky wrote:
On 13/07/2023 09:17, Zachary Leaf wrote:
On 12/07/2023 12:19, Kevin Brodsky wrote:
On 08/06/2023 10:42, Zachary Leaf wrote:
There are two ways that the kernel writes back to __user *bpf_attr unions and each requires different compat64 handling:
- Where the union member to be written to is a __user *ptr e.g. a pointer to some memory buffer
- Writing to a normal non-ptr field e.g. __u32 prog_cnt
In the first instance, no additional conversion out or handling is required. When converting inbound each __user *ptr is simply copied from union compat_bpf_attr into a new union bpf_attr and this can be written to directly with copy_to_user().
It took me a while but I think eventually I managed to get what you're trying to say: in that case, because there is a pointer in the struct, what we're getting from userspace is a compat layout, which we have to convert into a native struct and then we operate on that.
What you say is correct, but it's not what I was getting at.
Assume you have copied some compat struct, containing both __user ptrs and normal members, into the kernel and converted it into a struct with native offsets. You then have two types of members in the native struct: __user pointers, e.g. the location of some memory buffer in userspace, and normal non-ptr members.
When you write out to the __user pointer you don't need to interact with or worry about the struct offsets anymore, as you have a direct pointer to some user memory. You just do copy_to_user() on that __user pointer and start writing directly to the memory buffer wherever it is in userspace memory. The compat and native __user pointer is the same - it's just an address, and writing to that address doesn't depend on which struct offsets we used to obtain it.
If you're trying to write back to some specific field in the struct, e.g. uattr->some_int, you now need to ensure you're writing at the correct offset, hence all this casting magic.
Assuming I got this right, this prompts another question: in the second case, how is the layout different at all between compat and native? I can't really grasp these two situations as described. My initial expectation would have been more or less the opposite: if you have pointers, then the layout is different, and when writing out you will have to do something: either write to a (temporary) native struct and then convert before writing to userspace, or write directly at the right offset with the sort of macro magic you've got in this patch. I don't immediately see how the field you're writing to makes a difference, rather the fields *before* do (as they change the offset). I'm probably missing at least a couple of things though!
I think you're seeing the distinction here as structs with pointers and structs without pointers, but really it's about how you can write back out to some memory and where it is.
With non-ptr members, the memory you're writing to is inside the struct, so you need to ensure the offsets are correct to write to the correct location.
Sure, I was just trying to make sense of my wrong interpretation so that got me on a tangent :) Clearly you do want to have this sort of macro to write to struct members regardless of where they are in the struct (i.e. even if they have the same offset in native and compat), to make things reasonably robust and consistent.
With user ptr members, the memory you're writing to is outside the struct, so you can just write directly to it, you don't need to worry about struct offsets anymore.
Hence writing out to __user pointer members doesn't require any conversion out. For everything else you need to do this magic. I'll try and rework the commit message to hopefully clear this up.
Right, that was the critical bit I was missing: for pointer members, it is always *the pointer target* that is written to. In that case clearly the struct layout is irrelevant, as you clarified above. In hindsight it is rather unlikely for this syscall that the *pointer itself* would be written to (where would that new user pointer come from?), but it is still worth making the language more precise in the commit message. "Writing to a pointer" can be interpreted both as writing to the pointer itself, or writing to its target.
Thinking further, there is a more fundamental reason why I found that wording confusing: if you are writing to the pointer target, you are not writing back to the user-provided union bpf_attr (uattr) at all. Rather you are extracting the user pointer from the copied-in (and potentially converted) union bpf_attr (attr), and then writing through that. That's why I didn't think about that situation, as it is fundamentally different. To make the commit message more intelligible, I would rather make a note about this aspect at the end of it, and start with what the patch is actually addressing.
Right, there's a distinction there between pointer target and pointer itself. In hindsight it could all have been clearer. Thanks for the comments. Will have another go at the commit message in v2.
Thanks, Zach
Kevin
On 12/07/2023 12:19, Kevin Brodsky wrote:
On 08/06/2023 10:42, Zachary Leaf wrote:
There are two ways that the kernel writes back to __user *bpf_attr unions and each requires different compat64 handling:
- Where the union member to be written to is a __user *ptr e.g. a pointer to some memory buffer
- Writing to a normal non-ptr field e.g. __u32 prog_cnt
In the first instance, no additional conversion out or handling is required. When converting inbound each __user *ptr is simply copied from union compat_bpf_attr into a new union bpf_attr and this can be written to directly with copy_to_user().
It took me a while but I think eventually I managed to get what you're trying to say: in that case, because there is a pointer in the struct, what we're getting from userspace is a compat layout, which we have to convert into a native struct and then we operate on that.
Assuming I got this right, this prompts another question: in the second case, how is the layout different at all between compat and native? I can't really grasp these two situations as described. My initial expectation would have been more or less the opposite: if you have pointers, then the layout is different, and when writing out you will have to do something: either write to a (temporary) native struct and then convert before writing to userspace, or write directly at the right offset with the sort of macro magic you've got in this patch. I don't immediately see how the field you're writing to makes a difference, rather the fields *before* do (as they change the offset). I'm probably missing at least a couple of things though!
For the second case, the kernel writes back out directly to user memory for a specific field. This is problematic for compat64 where the field offsets are different between native/compat64. That means if compat64 is enabled and we're in a compat syscall, we need to cast uattr to select the correct member field/offset to write to.
The "technically correct" approach is probably to replace all functions with a union bpf_attr __user * parameter with void __user * (at leasta where we pass uattr.user in dispatch_bpf() and propagate this down). This forces us to check the type and cast before we use uattr, since the type is technically unknown. This avoids any potential mis-use by not checking for compat64 before trying to access a ptr with native union offsets. The downside of this approach is we would end up touching many function signatures as uattr propagates down the call stack (currently 73).
The minimal diff approach used here is to handle only specifically where we write directly to uattr with a fairly bad macro (it's not ideal but I've seen worse). This greatly reduces the diff size at the cost of having compat_bpf_attr unions passed around as union bpf_attr and needing to remember to cast back where required.
The instances where we write out to uattr were mostly found by grepping 'copy_to_user(&uattr' + 'put_user(.*uattr' so the list may or may not be complete. Luckily this doesn't appear to happen that often and usage seems to be consistent.
We can also replace all instances of copy_to_user() here with put_user() which somewhat simplifies things.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
[...]
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h index cc12f2e3b204..3d5f6bd2cd46 100644 --- a/include/linux/bpf_compat.h +++ b/include/linux/bpf_compat.h @@ -1,6 +1,23 @@ /* SPDX-License-Identifier: GPL-2.0 */ /* Copyright (C) 2023 Arm Ltd */
+/*
+ * This isn't great but the cast is necessary to select the correct member
+ * offset when writing out to userspace for compat64
+ */
+#define __PUT_USER_UATTR(TYPE, X, TO_FIELD) \
+	(put_user(X, &(((TYPE)uattr)->TO_FIELD)))
A few things:
- Public identifiers are typically loosely namespaced, this one is very generic. Adding "bpf" somewhere in the name would help.
- Macro arguments are not normally capitalised in the kernel. In fact, for such a function-like macro, I wouldn't capitalise its name either.
- Using implicit arguments like uattr here is highly discouraged for any macro that's part of a public API (i.e. not a macro that's used locally in a couple of functions). It saves some typing but hides interactions with the argument (it's quite common to look at how a variable is used by grepping it, and uattr wouldn't show up here). For that matter I find the existing CHECK_ATTR() pretty borderline, but at least it is local to syscall.c.
Also I think these macros could go directly in bpf.h - they are not specific to compat, they just have a different implementation in compat.
Ack to all the above.
Turns out I also missed some places where we write to userspace using copy_to_bpfptr_offset() - in v2 there's another macro called bpfptr_put_uattr() to handle these cases.
Thanks,
Zach
Kevin
+#ifdef CONFIG_COMPAT64
+#define PUT_USER_UATTR(X, TO_FIELD) \
+	(in_compat_syscall() ? \
+	 __PUT_USER_UATTR(union compat_bpf_attr __user *, X, TO_FIELD) : \
+	 __PUT_USER_UATTR(union bpf_attr __user *, X, TO_FIELD))
+#else
+#define PUT_USER_UATTR(X, TO_FIELD) \
+	__PUT_USER_UATTR(union bpf_attr __user *, X, TO_FIELD)
+#endif
[...]
union bpf_attr member info.info is a __user *ptr to one of:
- struct bpf_btf_info
- struct bpf_prog_info
- struct bpf_map_info
- struct bpf_link_info
These info structs are passed in from userspace and used to store information returned from calls to BPF_OBJ_GET_INFO_BY_FD.
While inbound conversion for all other members of union bpf_attr is handled upfront, this same approach is not possible here for two main reasons:
1. Due to the extra complexity of a further userspace struct that is one of 4 types. These types themselves can be a union/multiplex of even more additional options and types. It is not straightforward, or in many cases possible, to determine which elements of the union are active upfront, and each has different native/compat64 layouts/offsets.
2. Due to the nature of this subcommand, whose main purpose is to copy information back out to userspace, there is more work required to convert the struct back out to compat64-compatible offsets than for other subcommands.
Instead of an upfront conversion, convert where each is used at the end site both in/out.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com --- include/linux/bpf_compat.h | 9 ++++ kernel/bpf/btf.c | 91 +++++++++++++++++++++++++++++++++++--- kernel/bpf/syscall.c | 8 ---- 3 files changed, 93 insertions(+), 15 deletions(-)
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h index 3d5f6bd2cd46..6882322cefc2 100644 --- a/include/linux/bpf_compat.h +++ b/include/linux/bpf_compat.h @@ -287,5 +287,14 @@ union compat_bpf_attr {
} __attribute__((aligned(8)));
+struct compat_bpf_btf_info { + __aligned_u64 btf; + __u32 btf_size; + __u32 id; + __aligned_u64 name; + __u32 name_len; + __u32 kernel_btf; +} __attribute__((aligned(8))); + #endif /* CONFIG_COMPAT64 */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 35c07afac924..25cb5231f4df 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -16,6 +16,7 @@ #include <linux/kernel.h> #include <linux/idr.h> #include <linux/sort.h> +#include <linux/bpf_compat.h> #include <linux/bpf_verifier.h> #include <linux/btf.h> #include <linux/btf_ids.h> @@ -6942,6 +6943,30 @@ struct btf *btf_get_by_fd(int fd) return btf; }
+#ifdef CONFIG_COMPAT64
+void convert_compat_btf_info_in(struct bpf_btf_info *dest,
+				struct compat_bpf_btf_info *cinfo)
+{
+	dest->btf = cinfo->btf;
+	dest->btf_size = cinfo->btf_size;
+	dest->id = cinfo->id;
+	dest->name = cinfo->name;
+	dest->name_len = cinfo->name_len;
+	dest->kernel_btf = cinfo->kernel_btf;
+}
+
+void convert_compat_btf_info_out(struct compat_bpf_btf_info *dest,
+				 struct bpf_btf_info *info)
+{
+	dest->btf = info->btf;
+	dest->btf_size = info->btf_size;
+	dest->id = info->id;
+	dest->name = info->name;
+	dest->name_len = info->name_len;
+	dest->kernel_btf = info->kernel_btf;
+}
+#endif /* CONFIG_COMPAT64 */
+
 int btf_get_info_by_fd(const struct btf *btf,
 		       const union bpf_attr *attr,
 		       union bpf_attr __user *uattr)
@@ -6953,15 +6978,56 @@ int btf_get_info_by_fd(const struct btf *btf,
 	char __user *uname;
 	u32 uinfo_len, uname_len, name_len;
 	int ret = 0;
+#ifdef CONFIG_COMPAT64
+	struct compat_bpf_btf_info __user *cuinfo;
+	struct compat_bpf_btf_info cinfo;
+	union compat_bpf_attr __user *cuattr =
+		(union compat_bpf_attr __user *)uattr;
+	u32 cuinfo_len;

-	uinfo = u64_to_user_ptr(attr->info.info);
-	uinfo_len = attr->info.info_len;
+	/*
+	 * for compat64, uattr has already been converted and copied into
+	 * attr - however attr->info.info is a ptr to uapi struct bpf_btf_info,
+	 * which also needs converting
+	 *
+	 * convert bpf_btf_info coming in, and later reverse this conversion
+	 * when writing back out to userspace
+	 */
+	if (in_compat_syscall()) {
+		cuinfo = (struct compat_bpf_btf_info __user *)attr->info.info;
+		cuinfo_len = attr->info.info_len;

-	info_copy = min_t(u32, uinfo_len, sizeof(info));
-	memset(&info, 0, sizeof(info));
-	if (copy_from_user(&info, uinfo, info_copy))
-		return -EFAULT;
+		ret = bpf_check_uarg_tail_zero(USER_BPFPTR(cuinfo),
+					       sizeof(cinfo), cuinfo_len);
+		if (ret)
+			return ret;
+
+		info_copy = min_t(u32, cuinfo_len, sizeof(cinfo));
+		if (copy_from_user_with_ptr(&cinfo, cuinfo, info_copy))
+			return -EFAULT;
+		memset(&info, 0, sizeof(info));
+		convert_compat_btf_info_in(&info, &cinfo);
+	} else
+#endif
+	{
+		uinfo = u64_to_user_ptr(attr->info.info);
+		uinfo_len = attr->info.info_len;
+
+		ret = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo),
+					       sizeof(info), uinfo_len);
+		if (ret)
+			return ret;

+		info_copy = min_t(u32, uinfo_len, sizeof(info));
+		memset(&info, 0, sizeof(info));
+		if (copy_from_user_with_ptr(&info, uinfo, info_copy))
+			return -EFAULT;
+	}
+
+	/*
+	 * struct btf found in kernel/bpf/btf.c is kernel only
+	 * copy relevant fields into bpf_btf_info, no need to convert
+	 * anything in compat64 case
+	 */
 	info.id = btf->id;
 	ubtf = u64_to_user_ptr(info.btf);
 	btf_copy = min_t(u32, btf->data_size, info.btf_size);
@@ -6995,7 +7061,18 @@ int btf_get_info_by_fd(const struct btf *btf,
 		}
 	}

-	if (copy_to_user(uinfo, &info, info_copy) ||
+#ifdef CONFIG_COMPAT64
+	if (in_compat_syscall()) {
+		convert_compat_btf_info_out(&cinfo, &info);
+		if (copy_to_user_with_ptr(cuinfo, &cinfo, info_copy))
+			return -EFAULT;
+		if (put_user(info_copy, &cuattr->info.info_len))
+			return -EFAULT;
+
+		return ret;
+	}
+#endif
+	if (copy_to_user_with_ptr(uinfo, &info, info_copy) ||
 	    put_user(info_copy, &uattr->info.info_len))
 		return -EFAULT;

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4c86b1b23f36..d06700136e1f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4230,14 +4230,6 @@ static int bpf_btf_get_info_by_fd(struct file *file,
 				  const union bpf_attr *attr,
 				  union bpf_attr __user *uattr)
 {
-	struct bpf_btf_info __user *uinfo = u64_to_user_ptr(attr->info.info);
-	u32 info_len = attr->info.info_len;
-	int err;
-
-	err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(*uinfo), info_len);
-	if (err)
-		return err;
-
 	return btf_get_info_by_fd(btf, attr, uattr);
 }
On 08/06/2023 10:42, Zachary Leaf wrote:
union bpf_attr member info.info is a __user *ptr to one of:

  struct bpf_btf_info
  struct bpf_prog_info
  struct bpf_map_info
  struct bpf_link_info
These info structs are passed in from userspace and used to store information returned from calls to BPF_OBJ_GET_INFO_BY_FD.
While inbound conversion for all other members of union bpf_attr is handled upfront, this same approach is not possible here for two main reasons:
1. Due to the extra complexity of a further userspace struct that is one of four types. These types can themselves be a union/multiplex of yet more options and types. It is often not straightforward, or even possible, to determine upfront which elements of the union are active, and each has different native/compat64 layouts/offsets.

2. Due to the nature of this subcommand: since its main purpose is to copy information back out to userspace, there is more work required to convert the struct back out to compat64-compatible offsets than for other subcommands.

Instead of an upfront conversion, convert each struct where it is used, at the end site, both inbound and outbound.
The commit message is helpful, thanks, but having exactly the same one in patches 5-8 is less so. It raises the question of whether it really makes sense for these patches to be separate. I realise that merging them would create a big diff, but considering that each patch does pretty much the same thing for each struct, there is not much complexity involved, and having it all in one patch would be consistent.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 include/linux/bpf_compat.h |  9 ++++
 kernel/bpf/btf.c           | 91 +++++++++++++++++++++++++++++++++++---
 kernel/bpf/syscall.c       |  8 ----
 3 files changed, 93 insertions(+), 15 deletions(-)
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h
index 3d5f6bd2cd46..6882322cefc2 100644
--- a/include/linux/bpf_compat.h
+++ b/include/linux/bpf_compat.h
@@ -287,5 +287,14 @@ union compat_bpf_attr {
 } __attribute__((aligned(8)));
 
+struct compat_bpf_btf_info {
+	__aligned_u64 btf;
+	__u32 btf_size;
+	__u32 id;
+	__aligned_u64 name;
+	__u32 name_len;
+	__u32 kernel_btf;
+} __attribute__((aligned(8)));
+
#endif /* CONFIG_COMPAT64 */ diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 35c07afac924..25cb5231f4df 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -16,6 +16,7 @@ #include <linux/kernel.h> #include <linux/idr.h> #include <linux/sort.h> +#include <linux/bpf_compat.h> #include <linux/bpf_verifier.h> #include <linux/btf.h> #include <linux/btf_ids.h> @@ -6942,6 +6943,30 @@ struct btf *btf_get_by_fd(int fd) return btf; } +#ifdef CONFIG_COMPAT64 +void convert_compat_btf_info_in(struct bpf_btf_info *dest,
struct compat_bpf_btf_info *cinfo)
Nit: the input pointer could be const. Same in similar conversion functions.
+{
- dest->btf = cinfo->btf;
- dest->btf_size = cinfo->btf_size;
- dest->id = cinfo->id;
- dest->name = cinfo->name;
- dest->name_len = cinfo->name_len;
- dest->kernel_btf = cinfo->kernel_btf;
+}
+void convert_compat_btf_info_out(struct compat_bpf_btf_info *dest,
struct bpf_btf_info *info)
+{
- dest->btf = info->btf;
- dest->btf_size = info->btf_size;
- dest->id = info->id;
- dest->name = info->name;
- dest->name_len = info->name_len;
- dest->kernel_btf = info->kernel_btf;
+} +#endif /* CONFIG_COMPAT64 */
int btf_get_info_by_fd(const struct btf *btf, const union bpf_attr *attr, union bpf_attr __user *uattr) @@ -6953,15 +6978,56 @@ int btf_get_info_by_fd(const struct btf *btf, char __user *uname; u32 uinfo_len, uname_len, name_len; int ret = 0; +#ifdef CONFIG_COMPAT64
- struct compat_bpf_btf_info __user *cuinfo;
- struct compat_bpf_btf_info cinfo;
- union compat_bpf_attr __user *cuattr =
(union compat_bpf_attr __user *)uattr;
- u32 cuinfo_len;
- uinfo = u64_to_user_ptr(attr->info.info);
- uinfo_len = attr->info.info_len;
- /*
* for compat64, uattr has already been converted and copied into
* attr - however attr->info.info is a ptr to uapi struct bpf_btf_info,
* which also needs converting
*
* convert bpf_btf_info coming in, and later reverse this conversion
* when writing back out to userspace
*/
- if (in_compat_syscall()) {
cuinfo = (struct compat_bpf_btf_info __user *)attr->info.info;
This is not a valid cast at this stage, as info.info is still a 64-bit integer. I'd just use u64_to_user_ptr() and remove it in the final patch like the others.
cuinfo_len = attr->info.info_len;
- info_copy = min_t(u32, uinfo_len, sizeof(info));
- memset(&info, 0, sizeof(info));
- if (copy_from_user(&info, uinfo, info_copy))
return -EFAULT;
ret = bpf_check_uarg_tail_zero(USER_BPFPTR(cuinfo),
sizeof(cinfo), cuinfo_len);
if (ret)
return ret;
info_copy = min_t(u32, cuinfo_len, sizeof(cinfo));
if (copy_from_user_with_ptr(&cinfo, cuinfo, info_copy))
return -EFAULT;
memset(&info, 0, sizeof(info));
convert_compat_btf_info_in(&info, &cinfo);
- } else
+#endif
- {
uinfo = u64_to_user_ptr(attr->info.info);
uinfo_len = attr->info.info_len;
ret = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo),
sizeof(info), uinfo_len);
if (ret)
return ret;
info_copy = min_t(u32, uinfo_len, sizeof(info));
memset(&info, 0, sizeof(info));
if (copy_from_user_with_ptr(&info, uinfo, info_copy))
return -EFAULT;
- }
- /*
* struct btf found in kernel/bpf/btf.c is kernel only
* copy relevant fields into bpf_btf_info, no need to convert
* anything in compat64 case
*/
Not sure this comment is adding much; it doesn't make much difference that the data comes from struct btf rather than somewhere else.
Kevin
info.id = btf->id; ubtf = u64_to_user_ptr(info.btf); btf_copy = min_t(u32, btf->data_size, info.btf_size); @@ -6995,7 +7061,18 @@ int btf_get_info_by_fd(const struct btf *btf, } }
- if (copy_to_user(uinfo, &info, info_copy) ||
+#ifdef CONFIG_COMPAT64
- if (in_compat_syscall()) {
convert_compat_btf_info_out(&cinfo, &info);
if (copy_to_user_with_ptr(cuinfo, &cinfo, info_copy))
return -EFAULT;
if (put_user(info_copy, &cuattr->info.info_len))
return -EFAULT;
return ret;
- }
+#endif
- if (copy_to_user_with_ptr(uinfo, &info, info_copy) || put_user(info_copy, &uattr->info.info_len)) return -EFAULT;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 4c86b1b23f36..d06700136e1f 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -4230,14 +4230,6 @@ static int bpf_btf_get_info_by_fd(struct file *file, const union bpf_attr *attr, union bpf_attr __user *uattr) {
- struct bpf_btf_info __user *uinfo = u64_to_user_ptr(attr->info.info);
- u32 info_len = attr->info.info_len;
- int err;
- err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(*uinfo), info_len);
- if (err)
return err;
- return btf_get_info_by_fd(btf, attr, uattr);
}
On 12/07/2023 12:20, Kevin Brodsky wrote:
On 08/06/2023 10:42, Zachary Leaf wrote:
union bpf_attr member info.info is a __user *ptr to one of:

  struct bpf_btf_info
  struct bpf_prog_info
  struct bpf_map_info
  struct bpf_link_info
These info structs are passed in from userspace and used to store information returned from calls to BPF_OBJ_GET_INFO_BY_FD.
While inbound conversion for all other members of union bpf_attr is handled upfront, this same approach is not possible here for two main reasons:
1. Due to the extra complexity of a further userspace struct that is one of four types. These types can themselves be a union/multiplex of yet more options and types. It is often not straightforward, or even possible, to determine upfront which elements of the union are active, and each has different native/compat64 layouts/offsets.

2. Due to the nature of this subcommand: since its main purpose is to copy information back out to userspace, there is more work required to convert the struct back out to compat64-compatible offsets than for other subcommands.

Instead of an upfront conversion, convert each struct where it is used, at the end site, both inbound and outbound.
The commit message is helpful, thanks, but having exactly the same one in patches 5-8 is less so. It raises the question of whether it really makes sense for these patches to be separate. I realise that merging them would create a big diff, but considering that each patch does pretty much the same thing for each struct, there is not much complexity involved, and having it all in one patch would be consistent.
No problem. As per the cover letter, it was really just to make review easier. I agree these bpf_xyz_info patches can be merged, especially now that the bpf_map_info patch has been removed.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 include/linux/bpf_compat.h |  9 ++++
 kernel/bpf/btf.c           | 91 +++++++++++++++++++++++++++++++++++---
 kernel/bpf/syscall.c       |  8 ----
 3 files changed, 93 insertions(+), 15 deletions(-)
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h
index 3d5f6bd2cd46..6882322cefc2 100644
--- a/include/linux/bpf_compat.h
+++ b/include/linux/bpf_compat.h
@@ -287,5 +287,14 @@ union compat_bpf_attr {
 } __attribute__((aligned(8)));
 
+struct compat_bpf_btf_info {
+	__aligned_u64 btf;
+	__u32 btf_size;
+	__u32 id;
+	__aligned_u64 name;
+	__u32 name_len;
+	__u32 kernel_btf;
+} __attribute__((aligned(8)));
+
#endif /* CONFIG_COMPAT64 */ diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 35c07afac924..25cb5231f4df 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -16,6 +16,7 @@ #include <linux/kernel.h> #include <linux/idr.h> #include <linux/sort.h> +#include <linux/bpf_compat.h> #include <linux/bpf_verifier.h> #include <linux/btf.h> #include <linux/btf_ids.h> @@ -6942,6 +6943,30 @@ struct btf *btf_get_by_fd(int fd) return btf; } +#ifdef CONFIG_COMPAT64 +void convert_compat_btf_info_in(struct bpf_btf_info *dest,
struct compat_bpf_btf_info *cinfo)
Nit: the input pointer could be const. Same in similar conversion functions.
Ack
+{
- dest->btf = cinfo->btf;
- dest->btf_size = cinfo->btf_size;
- dest->id = cinfo->id;
- dest->name = cinfo->name;
- dest->name_len = cinfo->name_len;
- dest->kernel_btf = cinfo->kernel_btf;
+}
+void convert_compat_btf_info_out(struct compat_bpf_btf_info *dest,
struct bpf_btf_info *info)
+{
- dest->btf = info->btf;
- dest->btf_size = info->btf_size;
- dest->id = info->id;
- dest->name = info->name;
- dest->name_len = info->name_len;
- dest->kernel_btf = info->kernel_btf;
+} +#endif /* CONFIG_COMPAT64 */
int btf_get_info_by_fd(const struct btf *btf, const union bpf_attr *attr, union bpf_attr __user *uattr) @@ -6953,15 +6978,56 @@ int btf_get_info_by_fd(const struct btf *btf, char __user *uname; u32 uinfo_len, uname_len, name_len; int ret = 0; +#ifdef CONFIG_COMPAT64
- struct compat_bpf_btf_info __user *cuinfo;
- struct compat_bpf_btf_info cinfo;
- union compat_bpf_attr __user *cuattr =
(union compat_bpf_attr __user *)uattr;
- u32 cuinfo_len;
- uinfo = u64_to_user_ptr(attr->info.info);
- uinfo_len = attr->info.info_len;
- /*
* for compat64, uattr has already been converted and copied into
* attr - however attr->info.info is a ptr to uapi struct bpf_btf_info,
* which also needs converting
*
* convert bpf_btf_info coming in, and later reverse this conversion
* when writing back out to userspace
*/
- if (in_compat_syscall()) {
cuinfo = (struct compat_bpf_btf_info __user *)attr->info.info;
This is not a valid cast at this stage, as info.info is still a 64-bit integer. I'd just use u64_to_user_ptr() and remove it in the final patch like the others.
Ack
cuinfo_len = attr->info.info_len;
- info_copy = min_t(u32, uinfo_len, sizeof(info));
- memset(&info, 0, sizeof(info));
- if (copy_from_user(&info, uinfo, info_copy))
return -EFAULT;
ret = bpf_check_uarg_tail_zero(USER_BPFPTR(cuinfo),
sizeof(cinfo), cuinfo_len);
if (ret)
return ret;
info_copy = min_t(u32, cuinfo_len, sizeof(cinfo));
if (copy_from_user_with_ptr(&cinfo, cuinfo, info_copy))
return -EFAULT;
memset(&info, 0, sizeof(info));
convert_compat_btf_info_in(&info, &cinfo);
- } else
+#endif
- {
uinfo = u64_to_user_ptr(attr->info.info);
uinfo_len = attr->info.info_len;
ret = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo),
sizeof(info), uinfo_len);
if (ret)
return ret;
info_copy = min_t(u32, uinfo_len, sizeof(info));
memset(&info, 0, sizeof(info));
if (copy_from_user_with_ptr(&info, uinfo, info_copy))
return -EFAULT;
- }
- /*
* struct btf found in kernel/bpf/btf.c is kernel only
* copy relevant fields into bpf_btf_info, no need to convert
* anything in compat64 case
*/
Not sure this comment is adding much; it doesn't make much difference that the data comes from struct btf rather than somewhere else.
Ack
Kevin
info.id = btf->id; ubtf = u64_to_user_ptr(info.btf); btf_copy = min_t(u32, btf->data_size, info.btf_size); @@ -6995,7 +7061,18 @@ int btf_get_info_by_fd(const struct btf *btf, } }
- if (copy_to_user(uinfo, &info, info_copy) ||
+#ifdef CONFIG_COMPAT64
- if (in_compat_syscall()) {
convert_compat_btf_info_out(&cinfo, &info);
if (copy_to_user_with_ptr(cuinfo, &cinfo, info_copy))
return -EFAULT;
if (put_user(info_copy, &cuattr->info.info_len))
return -EFAULT;
return ret;
- }
+#endif
- if (copy_to_user_with_ptr(uinfo, &info, info_copy) || put_user(info_copy, &uattr->info.info_len)) return -EFAULT;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 4c86b1b23f36..d06700136e1f 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -4230,14 +4230,6 @@ static int bpf_btf_get_info_by_fd(struct file *file, const union bpf_attr *attr, union bpf_attr __user *uattr) {
- struct bpf_btf_info __user *uinfo = u64_to_user_ptr(attr->info.info);
- u32 info_len = attr->info.info_len;
- int err;
- err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(*uinfo), info_len);
- if (err)
return err;
- return btf_get_info_by_fd(btf, attr, uattr);
}
union bpf_attr member info.info is a __user *ptr to one of:

  struct bpf_btf_info
  struct bpf_prog_info
  struct bpf_map_info
  struct bpf_link_info
These info structs are passed in from userspace and used to store information returned from calls to BPF_OBJ_GET_INFO_BY_FD.
While inbound conversion for all other members of union bpf_attr is handled upfront, this same approach is not possible here for two main reasons:
1. Due to the extra complexity of a further userspace struct that is one of four types. These types can themselves be a union/multiplex of yet more options and types. It is often not straightforward, or even possible, to determine upfront which elements of the union are active, and each has different native/compat64 layouts/offsets.

2. Due to the nature of this subcommand: since its main purpose is to copy information back out to userspace, there is more work required to convert the struct back out to compat64-compatible offsets than for other subcommands.

Instead of an upfront conversion, convert each struct where it is used, at the end site, both inbound and outbound.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 include/linux/bpf_compat.h |  42 ++++++++++
 kernel/bpf/syscall.c       | 154 ++++++++++++++++++++++++++++++++++---
 2 files changed, 186 insertions(+), 10 deletions(-)
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h index 6882322cefc2..34aa707b021e 100644 --- a/include/linux/bpf_compat.h +++ b/include/linux/bpf_compat.h @@ -287,6 +287,48 @@ union compat_bpf_attr {
} __attribute__((aligned(8)));
+struct compat_bpf_prog_info { + __u32 type; + __u32 id; + __u8 tag[BPF_TAG_SIZE]; + __u32 jited_prog_len; + __u32 xlated_prog_len; + __aligned_u64 jited_prog_insns; + __aligned_u64 xlated_prog_insns; + __u64 load_time; /* ns since boottime */ + __u32 created_by_uid; + __u32 nr_map_ids; + __aligned_u64 map_ids; + char name[BPF_OBJ_NAME_LEN]; + __u32 ifindex; + __u32 gpl_compatible:1; + __u32:31; /* alignment pad */ + __u64 netns_dev; + __u64 netns_ino; + __u32 nr_jited_ksyms; + __u32 nr_jited_func_lens; + __aligned_u64 jited_ksyms; + __aligned_u64 jited_func_lens; + __u32 btf_id; + __u32 func_info_rec_size; + __aligned_u64 func_info; + __u32 nr_func_info; + __u32 nr_line_info; + __aligned_u64 line_info; + __aligned_u64 jited_line_info; + __u32 nr_jited_line_info; + __u32 line_info_rec_size; + __u32 jited_line_info_rec_size; + __u32 nr_prog_tags; + __aligned_u64 prog_tags; + __u64 run_time_ns; + __u64 run_cnt; + __u64 recursion_misses; + __u32 verified_insns; + __u32 attach_btf_obj_id; + __u32 attach_btf_id; +} __attribute__((aligned(8))); + struct compat_bpf_btf_info { __aligned_u64 btf; __u32 btf_size; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index d06700136e1f..929b515c56ae 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3891,28 +3891,152 @@ static int set_info_rec_size(struct bpf_prog_info *info) return 0; }
+#ifdef CONFIG_COMPAT64 +void convert_compat_prog_info_in(struct bpf_prog_info *dest, + struct compat_bpf_prog_info *cinfo) +{ + dest->type = cinfo->type; + dest->id = cinfo->id; + strncpy((char *)dest->tag, (char *)cinfo->tag, BPF_TAG_SIZE); + dest->jited_prog_len = cinfo->jited_prog_len; + dest->xlated_prog_len = cinfo->xlated_prog_len; + dest->jited_prog_insns = cinfo->jited_prog_insns; + dest->xlated_prog_insns = cinfo->xlated_prog_insns; + dest->load_time = cinfo->load_time; + dest->created_by_uid = cinfo->created_by_uid; + dest->nr_map_ids = cinfo->nr_map_ids; + dest->map_ids = cinfo->map_ids; + strncpy((char *)dest->name, (char *)cinfo->name, BPF_OBJ_NAME_LEN); + dest->ifindex = cinfo->ifindex; + dest->gpl_compatible = cinfo->gpl_compatible; + dest->netns_dev = cinfo->netns_dev; + dest->netns_ino = cinfo->netns_ino; + dest->nr_jited_ksyms = cinfo->nr_jited_ksyms; + dest->nr_jited_func_lens = cinfo->nr_jited_func_lens; + dest->jited_ksyms = cinfo->jited_ksyms; + dest->jited_func_lens = cinfo->jited_func_lens; + dest->btf_id = cinfo->btf_id; + dest->func_info_rec_size = cinfo->func_info_rec_size; + dest->func_info = cinfo->func_info; + dest->nr_func_info = cinfo->nr_func_info; + dest->nr_line_info = cinfo->nr_line_info; + dest->line_info = cinfo->line_info; + dest->jited_line_info = cinfo->jited_line_info; + dest->nr_jited_line_info = cinfo->nr_jited_line_info; + dest->line_info_rec_size = cinfo->line_info_rec_size; + dest->jited_line_info_rec_size = cinfo->jited_line_info_rec_size; + dest->nr_prog_tags = cinfo->nr_prog_tags; + dest->prog_tags = cinfo->prog_tags; + dest->run_time_ns = cinfo->run_time_ns; + dest->run_cnt = cinfo->run_cnt; + dest->recursion_misses = cinfo->recursion_misses; + dest->verified_insns = cinfo->verified_insns; + dest->attach_btf_obj_id = cinfo->attach_btf_obj_id; + dest->attach_btf_id = cinfo->attach_btf_id; +} + +void convert_compat_prog_info_out(struct compat_bpf_prog_info *dest, + struct bpf_prog_info *info) +{ + dest->type = 
info->type; + dest->id = info->id; + strncpy((char *)dest->tag, (char *)info->tag, BPF_TAG_SIZE); + dest->jited_prog_len = info->jited_prog_len; + dest->xlated_prog_len = info->xlated_prog_len; + dest->jited_prog_insns = info->jited_prog_insns; + dest->xlated_prog_insns = info->xlated_prog_insns; + dest->load_time = info->load_time; + dest->created_by_uid = info->created_by_uid; + dest->nr_map_ids = info->nr_map_ids; + dest->map_ids = info->map_ids; + strncpy((char *)dest->name, (char *)info->name, BPF_OBJ_NAME_LEN); + dest->ifindex = info->ifindex; + dest->gpl_compatible = info->gpl_compatible; + dest->netns_dev = info->netns_dev; + dest->netns_ino = info->netns_ino; + dest->nr_jited_ksyms = info->nr_jited_ksyms; + dest->nr_jited_func_lens = info->nr_jited_func_lens; + dest->jited_ksyms = info->jited_ksyms; + dest->jited_func_lens = info->jited_func_lens; + dest->btf_id = info->btf_id; + dest->func_info_rec_size = info->func_info_rec_size; + dest->func_info = info->func_info; + dest->nr_func_info = info->nr_func_info; + dest->nr_line_info = info->nr_line_info; + dest->line_info = info->line_info; + dest->jited_line_info = info->jited_line_info; + dest->nr_jited_line_info = info->nr_jited_line_info; + dest->line_info_rec_size = info->line_info_rec_size; + dest->jited_line_info_rec_size = info->jited_line_info_rec_size; + dest->nr_prog_tags = info->nr_prog_tags; + dest->prog_tags = info->prog_tags; + dest->run_time_ns = info->run_time_ns; + dest->run_cnt = info->run_cnt; + dest->recursion_misses = info->recursion_misses; + dest->verified_insns = info->verified_insns; + dest->attach_btf_obj_id = info->attach_btf_obj_id; + dest->attach_btf_id = info->attach_btf_id; +} + +#endif /* CONFIG_COMPAT64 */ + static int bpf_prog_get_info_by_fd(struct file *file, struct bpf_prog *prog, const union bpf_attr *attr, union bpf_attr __user *uattr) { - struct bpf_prog_info __user *uinfo = u64_to_user_ptr(attr->info.info); + struct bpf_prog_info __user *uinfo; struct btf *attach_btf 
= bpf_prog_get_target_btf(prog); struct bpf_prog_info info; - u32 info_len = attr->info.info_len; + u32 info_len; struct bpf_prog_kstats stats; char __user *uinsns; u32 ulen; int err; +#ifdef CONFIG_COMPAT64 + struct compat_bpf_prog_info __user *cuinfo; + struct compat_bpf_prog_info cinfo; + union compat_bpf_attr __user *cuattr = + (union compat_bpf_attr __user *)uattr; + u32 cuinfo_len; + /* + * for compat64, uattr has already been converted and copied into + * attr - however attr->info.info is a ptr to uapi struct bpf_prog_info, + * which also needs converting + * + * convert bpf_prog_info coming in, and later reverse this conversion + * when writing back out to userspace + */ + if (in_compat_syscall()) { + cuinfo = (struct compat_bpf_prog_info __user *)attr->info.info; + cuinfo_len = attr->info.info_len;
- err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(info), info_len); - if (err) - return err; - info_len = min_t(u32, sizeof(info), info_len); + err = bpf_check_uarg_tail_zero(USER_BPFPTR(cuinfo), + sizeof(cinfo), cuinfo_len); + if (err) + return err; + cuinfo_len = min_t(u32, sizeof(cinfo), cuinfo_len);
- memset(&info, 0, sizeof(info)); - if (copy_from_user(&info, uinfo, info_len)) - return -EFAULT; + if (copy_from_user_with_ptr(&cinfo, cuinfo, cuinfo_len)) + return -EFAULT; + memset(&info, 0, sizeof(info)); + convert_compat_prog_info_in(&info, &cinfo); + } else +#endif + { + uinfo = u64_to_user_ptr(attr->info.info); + info_len = attr->info.info_len; + + err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), + sizeof(info), info_len); + if (err) + return err; + info_len = min_t(u32, sizeof(info), info_len); + + memset(&info, 0, sizeof(info)); + if (copy_from_user_with_ptr(&info, uinfo, info_len)) + return -EFAULT; + }
info.type = prog->type; info.id = prog->aux->id; @@ -4173,7 +4297,17 @@ static int bpf_prog_get_info_by_fd(struct file *file, }
done: - if (copy_to_user(uinfo, &info, info_len) || +#ifdef CONFIG_COMPAT64 + if (in_compat_syscall()) { + convert_compat_prog_info_out(&cinfo, &info); + if (copy_to_user_with_ptr(cuinfo, &cinfo, cuinfo_len) || + put_user(cuinfo_len, &cuattr->info.info_len)) + return -EFAULT; + + return 0; + } +#endif + if (copy_to_user_with_ptr(uinfo, &info, info_len) || put_user(info_len, &uattr->info.info_len)) return -EFAULT;
union bpf_attr member info.info is a __user *ptr to one of:

  struct bpf_btf_info
  struct bpf_prog_info
  struct bpf_map_info
  struct bpf_link_info
These info structs are passed in from userspace and used to store information returned from calls to BPF_OBJ_GET_INFO_BY_FD.
While inbound conversion for all other members of union bpf_attr is handled upfront, this same approach is not possible here for two main reasons:
1. Due to the extra complexity of a further userspace struct that is one of four types. These types can themselves be a union/multiplex of yet more options and types. It is often not straightforward, or even possible, to determine upfront which elements of the union are active, and each has different native/compat64 layouts/offsets.

2. Due to the nature of this subcommand: since its main purpose is to copy information back out to userspace, there is more work required to convert the struct back out to compat64-compatible offsets than for other subcommands.

Instead of an upfront conversion, convert each struct where it is used, at the end site, both inbound and outbound.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 include/linux/bpf_compat.h | 19 ++++++++++
 kernel/bpf/syscall.c       | 75 +++++++++++++++++++++++++++++++++++---
 2 files changed, 88 insertions(+), 6 deletions(-)
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h index 34aa707b021e..93a2fcab5746 100644 --- a/include/linux/bpf_compat.h +++ b/include/linux/bpf_compat.h @@ -329,6 +329,25 @@ struct compat_bpf_prog_info { __u32 attach_btf_id; } __attribute__((aligned(8)));
+struct compat_bpf_map_info { + __u32 type; + __u32 id; + __u32 key_size; + __u32 value_size; + __u32 max_entries; + __u32 map_flags; + char name[BPF_OBJ_NAME_LEN]; + __u32 ifindex; + __u32 btf_vmlinux_value_type_id; + __u64 netns_dev; + __u64 netns_ino; + __u32 btf_id; + __u32 btf_key_type_id; + __u32 btf_value_type_id; + __u32:32; /* alignment pad */ + __u64 map_extra; +} __attribute__((aligned(8))); + struct compat_bpf_btf_info { __aligned_u64 btf; __u32 btf_size; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 929b515c56ae..588e4d5d4077 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -4314,20 +4314,74 @@ static int bpf_prog_get_info_by_fd(struct file *file, return 0; }
+#ifdef CONFIG_COMPAT64 +void convert_compat_map_info_out(struct compat_bpf_map_info *dest, + struct bpf_map_info *info) +{ + dest->type = info->type; + dest->id = info->id; + dest->key_size = info->key_size; + dest->value_size = info->value_size; + dest->max_entries = info->max_entries; + dest->map_flags = info->map_flags; + strncpy((char *)dest->name, (char *)info->name, BPF_OBJ_NAME_LEN); + dest->ifindex = info->ifindex; + dest->btf_vmlinux_value_type_id = info->btf_vmlinux_value_type_id; + dest->netns_dev = info->netns_dev; + dest->netns_ino = info->netns_ino; + dest->btf_id = info->btf_id; + dest->btf_key_type_id = info->btf_key_type_id; + dest->btf_value_type_id = info->btf_value_type_id; + dest->map_extra = info->map_extra; +} +#endif /* CONFIG_COMPAT64 */ + static int bpf_map_get_info_by_fd(struct file *file, struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { - struct bpf_map_info __user *uinfo = u64_to_user_ptr(attr->info.info); + struct bpf_map_info __user *uinfo; struct bpf_map_info info; - u32 info_len = attr->info.info_len; + u32 info_len; int err; +#ifdef CONFIG_COMPAT64 + struct compat_bpf_map_info __user *cuinfo; + struct compat_bpf_map_info cinfo; + union compat_bpf_attr __user *cuattr = + (union compat_bpf_attr __user *)uattr; + u32 cuinfo_len; + /* + * for compat64, uattr has already been converted and copied into + * attr - however attr->info.info is a ptr to uapi struct bpf_map_info + * + * bpf_map_get_info_by_fd() does not use any info from the + * bpf_map_info struct passed in from userspace, so only need to + * convert to compat64 struct layout when writing back out to userspace + */ + if (in_compat_syscall()) { + cuinfo = (struct compat_bpf_map_info __user *)attr->info.info; + cuinfo_len = attr->info.info_len;
- err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(info), info_len); - if (err) - return err; - info_len = min_t(u32, sizeof(info), info_len); + err = bpf_check_uarg_tail_zero(USER_BPFPTR(cuinfo), + sizeof(cinfo), cuinfo_len); + if (err) + return err; + + cuinfo_len = min_t(u32, sizeof(cinfo), cuinfo_len); + } else +#endif + { + uinfo = u64_to_user_ptr(attr->info.info); + info_len = attr->info.info_len; + + err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), + sizeof(info), info_len); + if (err) + return err; + + info_len = min_t(u32, sizeof(info), info_len); + }
memset(&info, 0, sizeof(info)); info.type = map->map_type; @@ -4352,6 +4406,15 @@ static int bpf_map_get_info_by_fd(struct file *file, return err; }
+#ifdef CONFIG_COMPAT64 + if (in_compat_syscall()) { + convert_compat_map_info_out(&cinfo, &info); + if (copy_to_user(cuinfo, &cinfo, cuinfo_len) || + put_user(cuinfo_len, &cuattr->info.info_len)) + return -EFAULT; + return 0; + } +#endif if (copy_to_user(uinfo, &info, info_len) || put_user(info_len, &uattr->info.info_len)) return -EFAULT;
On 08/06/2023 09:42, Zachary Leaf wrote:
union bpf_attr member info.info is a __user *ptr to one of:

  struct bpf_btf_info
  struct bpf_prog_info
  struct bpf_map_info
  struct bpf_link_info
These info structs are passed in from userspace and used to store information returned from calls to BPF_OBJ_GET_INFO_BY_FD.
While inbound conversion for all other members of union bpf_attr is handled upfront, this same approach is not possible here for two main reasons:
1. Due to the extra complexity of a further userspace struct that is one of four types. These types can themselves be a union/multiplex of yet more options and types. It is often not straightforward, or even possible, to determine upfront which elements of the union are active, and each has different native/compat64 layouts/offsets.

2. Due to the nature of this subcommand: since its main purpose is to copy information back out to userspace, there is more work required to convert the struct back out to compat64-compatible offsets than for other subcommands.

Instead of an upfront conversion, convert each struct where it is used, at the end site, both inbound and outbound.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 include/linux/bpf_compat.h | 19 ++++++++++
 kernel/bpf/syscall.c       | 75 +++++++++++++++++++++++++++++++++++---
 2 files changed, 88 insertions(+), 6 deletions(-)
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h
index 34aa707b021e..93a2fcab5746 100644
--- a/include/linux/bpf_compat.h
+++ b/include/linux/bpf_compat.h
@@ -329,6 +329,25 @@ struct compat_bpf_prog_info {
 	__u32 attach_btf_id;
 } __attribute__((aligned(8)));
 
+struct compat_bpf_map_info {
+	__u32 type;
+	__u32 id;
+	__u32 key_size;
+	__u32 value_size;
+	__u32 max_entries;
+	__u32 map_flags;
+	char name[BPF_OBJ_NAME_LEN];
+	__u32 ifindex;
+	__u32 btf_vmlinux_value_type_id;
+	__u64 netns_dev;
+	__u64 netns_ino;
+	__u32 btf_id;
+	__u32 btf_key_type_id;
+	__u32 btf_value_type_id;
+	__u32 :32;	/* alignment pad */
+	__u64 map_extra;
+} __attribute__((aligned(8)));
+
I've realised this entire patch can be skipped. Since the struct doesn't contain any pointers, its size and layout are the same in native PCuABI and compat64, so no conversion is required.
 struct compat_bpf_btf_info {
 	__aligned_u64 btf;
 	__u32 btf_size;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 929b515c56ae..588e4d5d4077 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4314,20 +4314,74 @@ static int bpf_prog_get_info_by_fd(struct file *file,
 	return 0;
 }
 
+#ifdef CONFIG_COMPAT64
+void convert_compat_map_info_out(struct compat_bpf_map_info *dest,
+				 struct bpf_map_info *info)
+{
+	dest->type = info->type;
+	dest->id = info->id;
+	dest->key_size = info->key_size;
+	dest->value_size = info->value_size;
+	dest->max_entries = info->max_entries;
+	dest->map_flags = info->map_flags;
+	strncpy((char *)dest->name, (char *)info->name, BPF_OBJ_NAME_LEN);
+	dest->ifindex = info->ifindex;
+	dest->btf_vmlinux_value_type_id = info->btf_vmlinux_value_type_id;
+	dest->netns_dev = info->netns_dev;
+	dest->netns_ino = info->netns_ino;
+	dest->btf_id = info->btf_id;
+	dest->btf_key_type_id = info->btf_key_type_id;
+	dest->btf_value_type_id = info->btf_value_type_id;
+	dest->map_extra = info->map_extra;
+}
+#endif /* CONFIG_COMPAT64 */
+
 static int bpf_map_get_info_by_fd(struct file *file,
 				  struct bpf_map *map,
 				  const union bpf_attr *attr,
 				  union bpf_attr __user *uattr)
 {
-	struct bpf_map_info __user *uinfo = u64_to_user_ptr(attr->info.info);
+	struct bpf_map_info __user *uinfo;
 	struct bpf_map_info info;
-	u32 info_len = attr->info.info_len;
+	u32 info_len;
 	int err;
+#ifdef CONFIG_COMPAT64
+	struct compat_bpf_map_info __user *cuinfo;
+	struct compat_bpf_map_info cinfo;
+	union compat_bpf_attr __user *cuattr =
+		(union compat_bpf_attr __user *)uattr;
+	u32 cuinfo_len;
+
+	/*
+	 * for compat64, uattr has already been converted and copied into
+	 * attr - however attr->info.info is a ptr to uapi struct bpf_map_info
+	 *
+	 * bpf_map_get_info_by_fd() does not use any info from the
+	 * bpf_map_info struct passed in from userspace, so only need to
+	 * convert to compat64 struct layout when writing back out to userspace
+	 */
+	if (in_compat_syscall()) {
+		cuinfo = (struct compat_bpf_map_info __user *)attr->info.info;
+		cuinfo_len = attr->info.info_len;
 
-	err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(info), info_len);
-	if (err)
-		return err;
-	info_len = min_t(u32, sizeof(info), info_len);
+		err = bpf_check_uarg_tail_zero(USER_BPFPTR(cuinfo),
+					       sizeof(cinfo), cuinfo_len);
+		if (err)
+			return err;
+		cuinfo_len = min_t(u32, sizeof(cinfo), cuinfo_len);
+	} else
+#endif
+	{
+		uinfo = u64_to_user_ptr(attr->info.info);
+		info_len = attr->info.info_len;
+
+		err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo),
+					       sizeof(info), info_len);
+		if (err)
+			return err;
+		info_len = min_t(u32, sizeof(info), info_len);
+	}
 
 	memset(&info, 0, sizeof(info));
 	info.type = map->map_type;
@@ -4352,6 +4406,15 @@ static int bpf_map_get_info_by_fd(struct file *file,
 		return err;
 	}
 
+#ifdef CONFIG_COMPAT64
+	if (in_compat_syscall()) {
+		convert_compat_map_info_out(&cinfo, &info);
+		if (copy_to_user(cuinfo, &cinfo, cuinfo_len) ||
+		    put_user(cuinfo_len, &cuattr->info.info_len))
+			return -EFAULT;
+
+		return 0;
+	}
+#endif
 	if (copy_to_user(uinfo, &info, info_len) ||
 	    put_user(info_len, &uattr->info.info_len))
 		return -EFAULT;
union bpf_attr member info.info is a __user ptr to one of:

    struct bpf_btf_info
    struct bpf_prog_info
    struct bpf_map_info
    struct bpf_link_info
These info structs are passed in from userspace and used to store information returned from calls to BPF_OBJ_GET_INFO_BY_FD.
While inbound conversion for all other members of union bpf_attr is handled upfront, this same approach is not possible here for two main reasons:
1. The extra complexity of a further userspace struct that is one of four types. These types can themselves be a union/multiplex of yet more options and types. It is not straightforward, or in many cases even possible, to determine upfront which members of the union are active, and each has different native/compat64 layouts/offsets.

2. The main purpose of this subcommand is to copy information back out to userspace, so compared to other subcommands there is additional work to convert the structs back out to compat64-compatible offsets.

Instead of an upfront conversion, convert each struct where it is used at the end site, both in and out.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 include/linux/bpf_compat.h |  52 ++++++++++++
 kernel/bpf/syscall.c       | 167 ++++++++++++++++++++++++++++++++++---
 2 files changed, 209 insertions(+), 10 deletions(-)
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h
index 93a2fcab5746..bdbc6a44f2bc 100644
--- a/include/linux/bpf_compat.h
+++ b/include/linux/bpf_compat.h
@@ -357,5 +357,57 @@ struct compat_bpf_btf_info {
 	__u32 kernel_btf;
 } __attribute__((aligned(8)));
+struct compat_bpf_link_info {
+	__u32 type;
+	__u32 id;
+	__u32 prog_id;
+	union {
+		struct {
+			__aligned_u64 tp_name; /* in/out: tp_name buffer ptr */
+			__u32 tp_name_len;     /* in/out: tp_name buffer len */
+		} raw_tracepoint;
+		struct {
+			__u32 attach_type;
+			__u32 target_obj_id; /* prog_id for PROG_EXT, otherwise btf object id */
+			__u32 target_btf_id; /* BTF type id inside the object */
+		} tracing;
+		struct {
+			__u64 cgroup_id;
+			__u32 attach_type;
+		} cgroup;
+		struct {
+			__aligned_u64 target_name; /* in/out: target_name buffer ptr */
+			__u32 target_name_len;	   /* in/out: target_name buffer len */
+
+			/* If the iter specific field is 32 bits, it can be put
+			 * in the first or second union. Otherwise it should be
+			 * put in the second union.
+			 */
+			union {
+				struct {
+					__u32 map_id;
+				} map;
+			};
+			union {
+				struct {
+					__u64 cgroup_id;
+					__u32 order;
+				} cgroup;
+				struct {
+					__u32 tid;
+					__u32 pid;
+				} task;
+			};
+		} iter;
+		struct {
+			__u32 netns_ino;
+			__u32 attach_type;
+		} netns;
+		struct {
+			__u32 ifindex;
+		} xdp;
+	};
+} __attribute__((aligned(8)));
+
 #endif /* CONFIG_COMPAT64 */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 588e4d5d4077..cdba1e563688 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4430,24 +4430,161 @@ static int bpf_btf_get_info_by_fd(struct file *file,
 	return btf_get_info_by_fd(btf, attr, uattr);
 }
+#ifdef CONFIG_COMPAT64
+void convert_compat_link_info_in(struct bpf_link_info *dest,
+				 struct compat_bpf_link_info *cinfo,
+				 enum bpf_link_type type)
+{
+	/*
+	 * Only a few fields in bpf_link_info are used as input, the
+	 * rest do not need conversion in
+	 */
+	if (type == BPF_LINK_TYPE_RAW_TRACEPOINT) {
+		dest->raw_tracepoint.tp_name =
+			cinfo->raw_tracepoint.tp_name;
+		dest->raw_tracepoint.tp_name_len =
+			cinfo->raw_tracepoint.tp_name_len;
+		return;
+	}
+	if (type == BPF_LINK_TYPE_ITER) {
+		dest->iter.target_name = cinfo->iter.target_name;
+		dest->iter.target_name_len = cinfo->iter.target_name_len;
+		return;
+	}
+}
+
+/* is_iter_XYZ_target copied from tools/bpf/bpftool/link.c */
+static bool is_iter_map_target(const char *target_name)
+{
+	return strcmp(target_name, "bpf_map_elem") == 0 ||
+	       strcmp(target_name, "bpf_sk_storage_map") == 0;
+}
+
+static bool is_iter_cgroup_target(const char *target_name)
+{
+	return strcmp(target_name, "cgroup") == 0;
+}
+
+static bool is_iter_task_target(const char *target_name)
+{
+	return strcmp(target_name, "task") == 0 ||
+	       strcmp(target_name, "task_file") == 0 ||
+	       strcmp(target_name, "task_vma") == 0;
+}
+
+void convert_compat_link_info_out(struct compat_bpf_link_info *dest,
+				  struct bpf_link_info *info,
+				  enum bpf_link_type type)
+{
+	const char *iter_target;
+
+	dest->type = info->type;
+	dest->id = info->id;
+	dest->prog_id = info->prog_id;
+
+	switch (type) {
+	case BPF_LINK_TYPE_RAW_TRACEPOINT:
+		dest->raw_tracepoint.tp_name =
+			info->raw_tracepoint.tp_name;
+		dest->raw_tracepoint.tp_name_len =
+			info->raw_tracepoint.tp_name_len;
+		return;
+	case BPF_LINK_TYPE_TRACING:
+		dest->tracing.attach_type =
+			info->tracing.attach_type;
+		dest->tracing.target_obj_id =
+			info->tracing.target_obj_id;
+		dest->tracing.target_btf_id =
+			info->tracing.target_btf_id;
+		return;
+	case BPF_LINK_TYPE_CGROUP:
+		dest->cgroup.cgroup_id = info->cgroup.cgroup_id;
+		dest->cgroup.attach_type = info->cgroup.attach_type;
+		return;
+	case BPF_LINK_TYPE_ITER:
+		dest->iter.target_name = info->iter.target_name;
+		dest->iter.target_name_len = info->iter.target_name_len;
+
+		iter_target =
+			(const char *)(uintptr_t)info->iter.target_name;
+		if (is_iter_map_target(iter_target))
+			dest->iter.map.map_id = info->iter.map.map_id;
+		else if (is_iter_cgroup_target(iter_target)) {
+			dest->iter.cgroup.cgroup_id =
+				info->iter.cgroup.cgroup_id;
+			dest->iter.cgroup.order =
+				info->iter.cgroup.order;
+		} else if (is_iter_task_target(iter_target)) {
+			dest->iter.task.tid = info->iter.task.tid;
+			dest->iter.task.pid = info->iter.task.pid;
+		}
+		return;
+	case BPF_LINK_TYPE_NETNS:
+		dest->netns.netns_ino = info->netns.netns_ino;
+		dest->netns.attach_type = info->netns.attach_type;
+		return;
+	case BPF_LINK_TYPE_XDP:
+		dest->xdp.ifindex = info->xdp.ifindex;
+		return;
+	default:
+		return;
+	}
+}
+#endif /* CONFIG_COMPAT64 */
+
 static int bpf_link_get_info_by_fd(struct file *file,
 				  struct bpf_link *link,
 				  const union bpf_attr *attr,
 				  union bpf_attr __user *uattr)
 {
-	struct bpf_link_info __user *uinfo = u64_to_user_ptr(attr->info.info);
+	struct bpf_link_info __user *uinfo;
 	struct bpf_link_info info;
-	u32 info_len = attr->info.info_len;
+	u32 info_len;
 	int err;
+#ifdef CONFIG_COMPAT64
+	struct compat_bpf_link_info __user *cuinfo;
+	struct compat_bpf_link_info cinfo;
+	union compat_bpf_attr __user *cuattr =
+		(union compat_bpf_attr __user *)uattr;
+	u32 cuinfo_len;
+
+	/*
+	 * for compat64, uattr has already been converted and copied into
+	 * attr - however attr->info.info is a ptr to uapi struct bpf_link_info,
+	 * which also needs converting
+	 *
+	 * convert bpf_link_info coming in, and later reverse this conversion
+	 * when writing back out to userspace
+	 */
+	if (in_compat_syscall()) {
+		cuinfo = (struct compat_bpf_link_info __user *)attr->info.info;
+		cuinfo_len = attr->info.info_len;
 
-	err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(info), info_len);
-	if (err)
-		return err;
-	info_len = min_t(u32, sizeof(info), info_len);
+		err = bpf_check_uarg_tail_zero(USER_BPFPTR(cuinfo),
+					       sizeof(cinfo), cuinfo_len);
+		if (err)
+			return err;
+		cuinfo_len = min_t(u32, sizeof(cinfo), cuinfo_len);
 
-	memset(&info, 0, sizeof(info));
-	if (copy_from_user(&info, uinfo, info_len))
-		return -EFAULT;
+		if (copy_from_user_with_ptr(&cinfo, cuinfo, cuinfo_len))
+			return -EFAULT;
+
+		memset(&info, 0, sizeof(info));
+		convert_compat_link_info_in(&info, &cinfo, link->type);
+	} else
+#endif
+	{
+		uinfo = u64_to_user_ptr(attr->info.info);
+		info_len = attr->info.info_len;
+
+		err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo),
+					       sizeof(info), info_len);
+		if (err)
+			return err;
+		info_len = min_t(u32, sizeof(info), info_len);
+
+		memset(&info, 0, sizeof(info));
+		if (copy_from_user_with_ptr(&info, uinfo, info_len))
+			return -EFAULT;
+	}
 
 	info.type = link->type;
 	info.id = link->id;
@@ -4459,7 +4596,17 @@ static int bpf_link_get_info_by_fd(struct file *file,
 		return err;
 	}
 
-	if (copy_to_user(uinfo, &info, info_len) ||
+#ifdef CONFIG_COMPAT64
+	if (in_compat_syscall()) {
+		convert_compat_link_info_out(&cinfo, &info, link->type);
+		if (copy_to_user_with_ptr(cuinfo, &cinfo, cuinfo_len) ||
+		    put_user(cuinfo_len, &cuattr->info.info_len))
+			return -EFAULT;
+
+		return 0;
+	}
+#endif
+	if (copy_to_user_with_ptr(uinfo, &info, info_len) ||
 	    put_user(info_len, &uattr->info.info_len))
 		return -EFAULT;
On 08/06/2023 10:42, Zachary Leaf wrote:
[...]
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 588e4d5d4077..cdba1e563688 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -4430,24 +4430,161 @@ static int bpf_btf_get_info_by_fd(struct file *file,
>  	return btf_get_info_by_fd(btf, attr, uattr);
>  }
>  
> +#ifdef CONFIG_COMPAT64
> +void convert_compat_link_info_in(struct bpf_link_info *dest,
> +				 struct compat_bpf_link_info *cinfo,
> +				 enum bpf_link_type type)
> +{
> +	/*
> +	 * Only a few fields in bpf_link_info are used as input, the
> +	 * rest do not need conversion in
> +	 */
> +	if (type == BPF_LINK_TYPE_RAW_TRACEPOINT) {
> +		dest->raw_tracepoint.tp_name =
> +			cinfo->raw_tracepoint.tp_name;
> +		dest->raw_tracepoint.tp_name_len =
> +			cinfo->raw_tracepoint.tp_name_len;
> +		return;
> +	}
> +	if (type == BPF_LINK_TYPE_ITER) {
> +		dest->iter.target_name = cinfo->iter.target_name;
> +		dest->iter.target_name_len = cinfo->iter.target_name_len;
> +		return;
> +	}
Nit: could make it a switch.
> +}
> +
> +/* is_iter_XYZ_target copied from tools/bpf/bpftool/link.c */
> +static bool is_iter_map_target(const char *target_name)
> +{
> +	return strcmp(target_name, "bpf_map_elem") == 0 ||
> +	       strcmp(target_name, "bpf_sk_storage_map") == 0;
> +}
> +
> +static bool is_iter_cgroup_target(const char *target_name)
> +{
> +	return strcmp(target_name, "cgroup") == 0;
> +}
> +
> +static bool is_iter_task_target(const char *target_name)
> +{
> +	return strcmp(target_name, "task") == 0 ||
> +	       strcmp(target_name, "task_file") == 0 ||
> +	       strcmp(target_name, "task_vma") == 0;
> +}
> +
> +void convert_compat_link_info_out(struct compat_bpf_link_info *dest,
> +				  struct bpf_link_info *info,
> +				  enum bpf_link_type type)
> +{
> +	const char *iter_target;
> +
> +	dest->type = info->type;
> +	dest->id = info->id;
> +	dest->prog_id = info->prog_id;
> +
> +	switch (type) {
> +	case BPF_LINK_TYPE_RAW_TRACEPOINT:
> +		dest->raw_tracepoint.tp_name =
> +			info->raw_tracepoint.tp_name;
> +		dest->raw_tracepoint.tp_name_len =
> +			info->raw_tracepoint.tp_name_len;
> +		return;
> +	case BPF_LINK_TYPE_TRACING:
> +		dest->tracing.attach_type =
> +			info->tracing.attach_type;
> +		dest->tracing.target_obj_id =
> +			info->tracing.target_obj_id;
> +		dest->tracing.target_btf_id =
> +			info->tracing.target_btf_id;
> +		return;
> +	case BPF_LINK_TYPE_CGROUP:
> +		dest->cgroup.cgroup_id = info->cgroup.cgroup_id;
> +		dest->cgroup.attach_type = info->cgroup.attach_type;
> +		return;
> +	case BPF_LINK_TYPE_ITER:
> +		dest->iter.target_name = info->iter.target_name;
> +		dest->iter.target_name_len = info->iter.target_name_len;
> +
> +		iter_target =
> +			(const char *)(uintptr_t)info->iter.target_name;
> +		if (is_iter_map_target(iter_target))
> +			dest->iter.map.map_id = info->iter.map.map_id;
> +		else if (is_iter_cgroup_target(iter_target)) {
> +			dest->iter.cgroup.cgroup_id =
> +				info->iter.cgroup.cgroup_id;
> +			dest->iter.cgroup.order =
> +				info->iter.cgroup.order;
> +		} else if (is_iter_task_target(iter_target)) {
> +			dest->iter.task.tid = info->iter.task.tid;
> +			dest->iter.task.pid = info->iter.task.pid;
> +		}
This looks like the semantically correct way to do the conversion, but the complexity of matching the string doesn't feel worth it, considering that nothing here is a pointer and therefore we don't strictly need to know which union member is in use. Everything is a fixed-size integer, so I think it is safe to assume that the layout is the same in native and compat64. This means we could directly memcpy() the remaining members of the iter struct, from offsetof(map) to the end of the struct.
Kevin
On 12/07/2023 12:20, Kevin Brodsky wrote:
> On 08/06/2023 10:42, Zachary Leaf wrote:
> [...]
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 588e4d5d4077..cdba1e563688 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -4430,24 +4430,161 @@ static int bpf_btf_get_info_by_fd(struct file *file,
>>  	return btf_get_info_by_fd(btf, attr, uattr);
>>  }
>>  
>> +#ifdef CONFIG_COMPAT64
>> +void convert_compat_link_info_in(struct bpf_link_info *dest,
>> +				 struct compat_bpf_link_info *cinfo,
>> +				 enum bpf_link_type type)
>> +{
>> +	/*
>> +	 * Only a few fields in bpf_link_info are used as input, the
>> +	 * rest do not need conversion in
>> +	 */
>> +	if (type == BPF_LINK_TYPE_RAW_TRACEPOINT) {
>> +		dest->raw_tracepoint.tp_name =
>> +			cinfo->raw_tracepoint.tp_name;
>> +		dest->raw_tracepoint.tp_name_len =
>> +			cinfo->raw_tracepoint.tp_name_len;
>> +		return;
>> +	}
>> +	if (type == BPF_LINK_TYPE_ITER) {
>> +		dest->iter.target_name = cinfo->iter.target_name;
>> +		dest->iter.target_name_len = cinfo->iter.target_name_len;
>> +		return;
>> +	}
> Nit: could make it a switch.
Done
>> +}
>> [...]
>> +	case BPF_LINK_TYPE_ITER:
>> +		dest->iter.target_name = info->iter.target_name;
>> +		dest->iter.target_name_len = info->iter.target_name_len;
>> +
>> +		iter_target =
>> +			(const char *)(uintptr_t)info->iter.target_name;
>> +		if (is_iter_map_target(iter_target))
>> +			dest->iter.map.map_id = info->iter.map.map_id;
>> +		else if (is_iter_cgroup_target(iter_target)) {
>> +			dest->iter.cgroup.cgroup_id =
>> +				info->iter.cgroup.cgroup_id;
>> +			dest->iter.cgroup.order =
>> +				info->iter.cgroup.order;
>> +		} else if (is_iter_task_target(iter_target)) {
>> +			dest->iter.task.tid = info->iter.task.tid;
>> +			dest->iter.task.pid = info->iter.task.pid;
>> +		}
> This looks like the semantically correct way to do the conversion, but
> the complexity of matching the string doesn't feel worth it, considering
> that nothing here is a pointer and therefore we don't strictly need to
> know which union member is in use. Everything is a fixed-size integer,
> so I think it is safe to assume that the layout is the same in native
> and compat64. This means we could directly memcpy() the remaining
> members of the iter struct, from offsetof(map) to the end of the struct.
This should do the trick:
memcpy((u8 *)dest + offsetof(struct compat_bpf_link_info, iter.map),
       (u8 *)info + offsetof(struct bpf_link_info, iter.map),
       offsetofend(struct bpf_link_info, iter.cgroup.order) -
	       offsetof(struct bpf_link_info, iter.map.map_id));
4 lines instead of ~30 is definitely an improvement there.
Thanks, Zach
Kevin
The CHECK_ATTR macro needs adapting to support union compat_bpf_attr as well as union bpf_attr.
The union bpf_attr passed in from userspace contains many structs of various sizes for each of the bpf syscall's multiplexed options. Since most of the structs do not take up the full memory footprint of the union, we should not have any data past the last struct member for any given option.
The CHECK_ATTR macro checks for non-zero data in the gap between the end of the last member and the end of the union. Any non-zero data there signals a malformed request from userspace and should be rejected.
Adapt the macro to be able to handle the compat64 offsets based on in_compat_syscall().
In order to check the compat64 union passed in from userspace, we now need to run the check before the conversion occurs, i.e. in __sys_compat_bpf; otherwise invalid data from userspace cannot be caught.
This presents a problem since the existing macro relies on the token XYZ_LAST_FIELD being expanded inside each sub-command handler. However, at the __sys_bpf and __sys_compat_bpf stage we have only int cmd to distinguish subcommands as per enum bpf_cmd, and:
1. there is no clean way to go from the int cmd to the enum entry's name
2. enum bpf_cmd does not match 1:1 with XYZ_LAST_FIELD in all cases
We can either:
a) rework/refactor the XYZ_LAST_FIELD system to encode by int cmd
b) add a check_attr function with a case statement that invokes the macro with the correct XYZ_LAST_FIELD argument
The latter is preferred here as the easier and cleaner approach. Rather than refactor the entire system, abstract the details away in a check_attr() function called from __sys_bpf and __sys_compat_bpf.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 kernel/bpf/syscall.c | 185 ++++++++++++++++++++++---------------------
 1 file changed, 95 insertions(+), 90 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index cdba1e563688..6fc61b71d1ce 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -867,13 +867,21 @@ int bpf_get_file_flag(int flags)
 	return O_RDWR;
 }
 
-/* helper macro to check that unused fields 'union bpf_attr' are zero */
+#define __CHECK_ATTR(CMD, TYPE) \
+	(memchr_inv((void *) &(((TYPE *)vattr)->CMD##_LAST_FIELD) + \
+		    sizeof(((TYPE *)vattr)->CMD##_LAST_FIELD), 0, \
+		    sizeof(*(TYPE *)vattr) - \
+		    offsetof(TYPE, CMD##_LAST_FIELD) - \
+		    sizeof(((TYPE *)vattr)->CMD##_LAST_FIELD)) != NULL)
+
+#ifdef CONFIG_COMPAT64
 #define CHECK_ATTR(CMD) \
-	memchr_inv((void *) &attr->CMD##_LAST_FIELD + \
-		   sizeof(attr->CMD##_LAST_FIELD), 0, \
-		   sizeof(*attr) - \
-		   offsetof(union bpf_attr, CMD##_LAST_FIELD) - \
-		   sizeof(attr->CMD##_LAST_FIELD)) != NULL
+	(in_compat_syscall() ? \
+		__CHECK_ATTR(CMD, union compat_bpf_attr) : \
+		__CHECK_ATTR(CMD, union bpf_attr))
+#else
+#define CHECK_ATTR(CMD) __CHECK_ATTR(CMD, union bpf_attr)
+#endif
 /* dst and src must have at least "size" number of bytes.
  * Return strlen on success and < 0 on error.
@@ -1078,10 +1086,6 @@ static int map_create(union bpf_attr *attr)
 	int f_flags;
 	int err;
 
-	err = CHECK_ATTR(BPF_MAP_CREATE);
-	if (err)
-		return -EINVAL;
-
 	if (attr->btf_vmlinux_value_type_id) {
 		if (attr->map_type != BPF_MAP_TYPE_STRUCT_OPS ||
 		    attr->btf_key_type_id || attr->btf_value_type_id)
@@ -1317,9 +1321,6 @@ static int map_lookup_elem(union bpf_attr *attr)
 	struct fd f;
 	int err;
 
-	if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM))
-		return -EINVAL;
-
 	if (attr->flags & ~BPF_F_LOCK)
 		return -EINVAL;
 
@@ -1392,9 +1393,6 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
 	struct fd f;
 	int err;
 
-	if (CHECK_ATTR(BPF_MAP_UPDATE_ELEM))
-		return -EINVAL;
-
 	f = fdget(ufd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
@@ -1446,9 +1444,6 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
 	void *key;
 	int err;
 
-	if (CHECK_ATTR(BPF_MAP_DELETE_ELEM))
-		return -EINVAL;
-
 	f = fdget(ufd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
@@ -1502,9 +1497,6 @@ static int map_get_next_key(union bpf_attr *attr)
 	struct fd f;
 	int err;
 
-	if (CHECK_ATTR(BPF_MAP_GET_NEXT_KEY))
-		return -EINVAL;
-
 	f = fdget(ufd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
@@ -1786,9 +1778,6 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
 	struct fd f;
 	int err;
 
-	if (CHECK_ATTR(BPF_MAP_LOOKUP_AND_DELETE_ELEM))
-		return -EINVAL;
-
 	if (attr->flags & ~BPF_F_LOCK)
 		return -EINVAL;
 
@@ -1874,9 +1863,6 @@ static int map_freeze(const union bpf_attr *attr)
 	struct bpf_map *map;
 	struct fd f;
 
-	if (CHECK_ATTR(BPF_MAP_FREEZE))
-		return -EINVAL;
-
 	f = fdget(ufd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
@@ -2470,9 +2456,6 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 	char license[128];
 	bool is_gpl;
 
-	if (CHECK_ATTR(BPF_PROG_LOAD))
-		return -EINVAL;
-
 	if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT |
 				 BPF_F_ANY_ALIGNMENT |
 				 BPF_F_TEST_STATE_FREQ |
@@ -2659,7 +2642,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 static int bpf_obj_pin(const union bpf_attr *attr)
 {
-	if (CHECK_ATTR(BPF_OBJ) || attr->file_flags != 0)
+	if (attr->file_flags != 0)
 		return -EINVAL;
 
 	return bpf_obj_pin_user(attr->bpf_fd, u64_to_user_ptr(attr->pathname));
@@ -2667,8 +2650,7 @@ static int bpf_obj_pin(const union bpf_attr *attr)
 
 static int bpf_obj_get(const union bpf_attr *attr)
 {
-	if (CHECK_ATTR(BPF_OBJ) || attr->bpf_fd != 0 ||
-	    attr->file_flags & ~BPF_OBJ_FLAG_MASK)
+	if (attr->bpf_fd != 0 || attr->file_flags & ~BPF_OBJ_FLAG_MASK)
 		return -EINVAL;
 
 	return bpf_obj_get_user(u64_to_user_ptr(attr->pathname),
@@ -3355,9 +3337,6 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 	struct bpf_prog *prog;
 	int fd;
 
-	if (CHECK_ATTR(BPF_RAW_TRACEPOINT_OPEN))
-		return -EINVAL;
-
 	prog = bpf_prog_get(attr->raw_tracepoint.prog_fd);
 	if (IS_ERR(prog))
 		return PTR_ERR(prog);
@@ -3465,9 +3444,6 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	struct bpf_prog *prog;
 	int ret;
 
-	if (CHECK_ATTR(BPF_PROG_ATTACH))
-		return -EINVAL;
-
 	if (attr->attach_flags & ~BPF_F_ATTACH_MASK)
 		return -EINVAL;
 
@@ -3524,9 +3500,6 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 {
 	enum bpf_prog_type ptype;
 
-	if (CHECK_ATTR(BPF_PROG_DETACH))
-		return -EINVAL;
-
 	ptype = attach_type_to_prog_type(attr->attach_type);
 
 	switch (ptype) {
@@ -3558,8 +3531,6 @@ static int bpf_prog_query(const union bpf_attr *attr,
 {
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
-	if (CHECK_ATTR(BPF_PROG_QUERY))
-		return -EINVAL;
 	if (attr->query.query_flags & ~BPF_F_QUERY_EFFECTIVE)
 		return -EINVAL;
 
@@ -3612,9 +3583,6 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
 	struct bpf_prog *prog;
 	int ret = -ENOTSUPP;
 
-	if (CHECK_ATTR(BPF_PROG_TEST_RUN))
-		return -EINVAL;
-
 	if ((attr->test.ctx_size_in && !attr->test.ctx_in) ||
 	    (!attr->test.ctx_size_in && attr->test.ctx_in))
 		return -EINVAL;
@@ -3644,7 +3612,7 @@ static int bpf_obj_get_next_id(const union bpf_attr *attr,
 	u32 next_id = attr->start_id;
 	int err = 0;
 
-	if (CHECK_ATTR(BPF_OBJ_GET_NEXT_ID) || next_id >= INT_MAX)
+	if (next_id >= INT_MAX)
 		return -EINVAL;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -3725,9 +3693,6 @@ static int bpf_prog_get_fd_by_id(const union bpf_attr *attr)
 	u32 id = attr->prog_id;
 	int fd;
 
-	if (CHECK_ATTR(BPF_PROG_GET_FD_BY_ID))
-		return -EINVAL;
-
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
@@ -3751,8 +3716,7 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
 	int f_flags;
 	int fd;
 
-	if (CHECK_ATTR(BPF_MAP_GET_FD_BY_ID) ||
-	    attr->open_flags & ~BPF_OBJ_FLAG_MASK)
+	if (attr->open_flags & ~BPF_OBJ_FLAG_MASK)
 		return -EINVAL;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -4623,9 +4587,6 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 	struct fd f;
 	int err;
 
-	if (CHECK_ATTR(BPF_OBJ_GET_INFO_BY_FD))
-		return -EINVAL;
-
 	f = fdget(ufd);
 	if (!f.file)
 		return -EBADFD;
@@ -4652,9 +4613,6 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 
 static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr)
 {
-	if (CHECK_ATTR(BPF_BTF_LOAD))
-		return -EINVAL;
-
 	if (!bpf_capable())
 		return -EPERM;
 
@@ -4665,9 +4623,6 @@ static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr)
 
 static int bpf_btf_get_fd_by_id(const union bpf_attr *attr)
 {
-	if (CHECK_ATTR(BPF_BTF_GET_FD_BY_ID))
-		return -EINVAL;
-
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
@@ -4733,9 +4688,6 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
 	struct file *file;
 	int err;
 
-	if (CHECK_ATTR(BPF_TASK_FD_QUERY))
-		return -EINVAL;
-
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
@@ -4817,9 +4769,6 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
 	int err, ufd;
 	struct fd f;
 
-	if (CHECK_ATTR(BPF_MAP_BATCH))
-		return -EINVAL;
-
 	ufd = attr->batch.map_fd;
 	f = fdget(ufd);
 	map = __bpf_map_get(f);
@@ -4858,9 +4807,6 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 	struct bpf_prog *prog;
 	int ret;
 
-	if (CHECK_ATTR(BPF_LINK_CREATE))
-		return -EINVAL;
-
 	prog = bpf_prog_get(attr->link_create.prog_fd);
 	if (IS_ERR(prog))
 		return PTR_ERR(prog);
@@ -4968,9 +4914,6 @@ static int link_update(union bpf_attr *attr)
 	u32 flags;
 	int ret;
 
-	if (CHECK_ATTR(BPF_LINK_UPDATE))
-		return -EINVAL;
-
 	flags = attr->link_update.flags;
 	if (flags & ~BPF_F_REPLACE)
 		return -EINVAL;
@@ -5019,9 +4962,6 @@ static int link_detach(union bpf_attr *attr)
 	struct bpf_link *link;
 	int ret;
 
-	if (CHECK_ATTR(BPF_LINK_DETACH))
-		return -EINVAL;
-
 	link = bpf_link_get_from_fd(attr->link_detach.link_fd);
 	if (IS_ERR(link))
 		return PTR_ERR(link);
@@ -5089,9 +5029,6 @@ static int bpf_link_get_fd_by_id(const union bpf_attr *attr)
 	u32 id = attr->link_id;
 	int fd;
 
-	if (CHECK_ATTR(BPF_LINK_GET_FD_BY_ID))
-		return -EINVAL;
-
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
@@ -5145,9 +5082,6 @@ static int bpf_enable_runtime_stats(void)
 
 static int bpf_enable_stats(union bpf_attr *attr)
 {
-	if (CHECK_ATTR(BPF_ENABLE_STATS))
-		return -EINVAL;
-
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
@@ -5167,9 +5101,6 @@ static int bpf_iter_create(union bpf_attr *attr)
 	struct bpf_link *link;
 	int err;
 
-	if (CHECK_ATTR(BPF_ITER_CREATE))
-		return -EINVAL;
-
 	if (attr->iter_create.flags)
 		return -EINVAL;
 
@@ -5192,9 +5123,6 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 	struct bpf_map **used_maps_old, **used_maps_new;
 	int i, ret = 0;
 
-	if (CHECK_ATTR(BPF_PROG_BIND_MAP))
-		return -EINVAL;
-
 	if (attr->prog_bind_map.flags)
 		return -EINVAL;
 
@@ -5375,6 +5303,75 @@ static int dispatch_bpf(int cmd, union bpf_attr *attr, bpfptr_t uattr, int size)
 	return err;
 }
 
+static int check_attr(int cmd, void *vattr)
+{
+	switch (cmd) {
+	case BPF_MAP_CREATE:
+		return CHECK_ATTR(BPF_MAP_CREATE);
+	case BPF_MAP_LOOKUP_ELEM:
+		return CHECK_ATTR(BPF_MAP_LOOKUP_ELEM);
+	case BPF_MAP_GET_NEXT_KEY:
+		return CHECK_ATTR(BPF_MAP_GET_NEXT_KEY);
+	case BPF_MAP_FREEZE:
+		return CHECK_ATTR(BPF_MAP_FREEZE);
+	case BPF_PROG_LOAD:
+		return CHECK_ATTR(BPF_PROG_LOAD);
+	case BPF_OBJ_PIN:
+		return CHECK_ATTR(BPF_OBJ);
+	case BPF_OBJ_GET:
+		return CHECK_ATTR(BPF_OBJ);
+	case BPF_PROG_ATTACH:
+		return CHECK_ATTR(BPF_PROG_ATTACH);
+	case BPF_PROG_DETACH:
+		return CHECK_ATTR(BPF_PROG_DETACH);
+	case BPF_PROG_QUERY:
+		return CHECK_ATTR(BPF_PROG_QUERY);
+	case BPF_PROG_TEST_RUN:
+		return CHECK_ATTR(BPF_PROG_TEST_RUN);
+	case BPF_PROG_GET_NEXT_ID:
+		return CHECK_ATTR(BPF_OBJ_GET_NEXT_ID);
+	case BPF_MAP_GET_NEXT_ID:
+		return CHECK_ATTR(BPF_OBJ_GET_NEXT_ID);
+	case BPF_BTF_GET_NEXT_ID:
+		return CHECK_ATTR(BPF_OBJ_GET_NEXT_ID);
+	case BPF_PROG_GET_FD_BY_ID:
+		return CHECK_ATTR(BPF_PROG_GET_FD_BY_ID);
+	case BPF_MAP_GET_FD_BY_ID:
+		return CHECK_ATTR(BPF_MAP_GET_FD_BY_ID);
+	case BPF_OBJ_GET_INFO_BY_FD:
+		return CHECK_ATTR(BPF_OBJ_GET_INFO_BY_FD);
+	case BPF_RAW_TRACEPOINT_OPEN:
+		return CHECK_ATTR(BPF_RAW_TRACEPOINT_OPEN);
+	case BPF_BTF_LOAD:
+		return CHECK_ATTR(BPF_BTF_LOAD);
+	case BPF_BTF_GET_FD_BY_ID:
+		return CHECK_ATTR(BPF_BTF_GET_FD_BY_ID);
+	case BPF_TASK_FD_QUERY:
+		return CHECK_ATTR(BPF_TASK_FD_QUERY);
+	case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
+		return CHECK_ATTR(BPF_MAP_LOOKUP_AND_DELETE_ELEM);
+	case BPF_LINK_CREATE:
+		return CHECK_ATTR(BPF_LINK_CREATE);
+	case BPF_LINK_UPDATE:
+		return CHECK_ATTR(BPF_LINK_UPDATE);
+	case BPF_LINK_GET_FD_BY_ID:
+		return CHECK_ATTR(BPF_LINK_GET_FD_BY_ID);
+	case BPF_LINK_GET_NEXT_ID:
+		return CHECK_ATTR(BPF_OBJ_GET_NEXT_ID);
+	case BPF_ENABLE_STATS:
+		return CHECK_ATTR(BPF_ENABLE_STATS);
+	case BPF_ITER_CREATE:
+		return CHECK_ATTR(BPF_ITER_CREATE);
+	case BPF_LINK_DETACH:
+		return CHECK_ATTR(BPF_LINK_DETACH);
+	case BPF_PROG_BIND_MAP:
+		return CHECK_ATTR(BPF_PROG_BIND_MAP);
+	default:
+		return 0;
+	}
+}
+
 #ifdef CONFIG_COMPAT64
 static void convert_compat_bpf_attr(union bpf_attr *dest,
 				    const union compat_bpf_attr *cattr, int cmd)
 {
@@ -5675,6 +5672,10 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	if (copy_from_bpfptr_with_ptr(&attr, uattr, size) != 0)
 		return -EFAULT;
 
+	err = check_attr(cmd, (void *)&attr);
+	if (err)
+		return -EINVAL;
+
 	return dispatch_bpf(cmd, &attr, uattr, size);
 }
 
@@ -5704,6 +5705,10 @@ static int __sys_compat_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	if (copy_from_bpfptr_with_ptr(&cattr, uattr, size) != 0)
 		return -EFAULT;
 
+	err = check_attr(cmd, (void *)&cattr);
+	if (err)
+		return -EINVAL;
+
 	convert_compat_bpf_attr(&attr, &cattr, cmd);
 
 	return dispatch_bpf(cmd, &attr, uattr, sizeof(attr));
On 08/06/2023 10:42, Zachary Leaf wrote:
The CHECK_ATTR macro needs adapting to support union compat_bpf_attr as well as union bpf_attr.
The union bpf_attr passed in from userspace contains many structs of various sizes for each of the bpf syscall's multiplexed options. Since most of the structs do not take up the full memory footprint of the union, we should not have any data past the last struct member for any given option.
The CHECK_ATTR macro checks for non-null data in the gap between the end of the last member and the end of the union. Any non-zero data signals a malformed request from userspace so should be rejected.
Adapt the macro to be able to handle the compat64 offsets based on in_compat_syscall().
In order to check the compat64 union passed in from userspace, we now need to check before the conversion occurs i.e. in __compat_sys_bpf or we will not be able to catch invalid data from userspace here.
This presents a problem since the existing MACRO relies on the string XYZ_LAST_FIELD called inside each sub-command handler. However at the __sys_bpf and __compat_sys_bpf stage we have only int cmd to distinguish subcommands as per enum bpf_cmd, and: 1. there is no way to go cleanly from int to enum string/name 2. enum bpf_cmd does not cleanly match 1:1 in all cases for XYZ_LAST_FIELD
We can either: a) rework/refactor XYZ_LAST_FIELD system to encode by int cmd b) add a check_attr function with case statement to call the macro with the correct XYZ_LAST_FIELD argument
The latter is preferred here as the slightly easier and cleaner approach. Rather than refactor the entire system, abstract the details away in a check_attr() function called from __sys_bpf and __sys_compat_bpf.
I've pondered that issue for a while and I tend to agree with your approach. Moving CHECK_ATTR() from each handler into a big switch is a bit unfortunate, but I don't have a better idea. The fact that we still use *_LAST_FIELD in compat64 is quite nice.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
[...]
#ifdef CONFIG_COMPAT64
static void convert_compat_bpf_attr(union bpf_attr *dest,
				    const union compat_bpf_attr *cattr, int cmd)
{
@@ -5675,6 +5672,10 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	if (copy_from_bpfptr_with_ptr(&attr, uattr, size) != 0)
 		return -EFAULT;
+	err = check_attr(cmd, (void *)&attr);
No need for the cast: in C, void * is implicitly convertible to and from any object pointer type. Same below.
Kevin
+	if (err)
+		return -EINVAL;
 	return dispatch_bpf(cmd, &attr, uattr, size);
 }
@@ -5704,6 +5705,10 @@ static int __sys_compat_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	if (copy_from_bpfptr_with_ptr(&cattr, uattr, size) != 0)
 		return -EFAULT;
+	err = check_attr(cmd, (void *)&cattr);
+	if (err)
+		return -EINVAL;
 	convert_compat_bpf_attr(&attr, &cattr, cmd);
return dispatch_bpf(cmd, &attr, uattr, sizeof(attr));
union bpf_attr and structs bpf_xyz_info are used to pass config to the kernel or return info to userspace via the bpf syscall.
In order to avoid needing a separate 32-bit compat handler for the bpf syscall, __user pointers inside these union/structs are stored as __u64 or __aligned_u64.
In Morello PCuABI a user pointer is a 129-bit capability, so the __aligned_u64 type is not big enough to hold it. Use the type __kernel_aligned_uintptr_t instead, which is big enough on the affected architectures while remaining __aligned_u64 on others.
Use copy_from_user_with_ptr where blocks of memory containing user pointers are copied, so that capabilities are preserved.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com --- drivers/media/rc/bpf-lirc.c | 2 +- include/linux/bpfptr.h | 6 +- include/uapi/linux/bpf.h | 94 +++++++++++---------- kernel/bpf/bpf_iter.c | 2 +- kernel/bpf/btf.c | 12 +-- kernel/bpf/cgroup.c | 5 +- kernel/bpf/hashtab.c | 8 +- kernel/bpf/net_namespace.c | 2 +- kernel/bpf/offload.c | 2 +- kernel/bpf/syscall.c | 163 +++++++++++++++++++++--------------- kernel/bpf/verifier.c | 2 +- kernel/trace/bpf_trace.c | 6 +- net/bpf/test_run.c | 16 ++-- net/core/sock_map.c | 2 +- 14 files changed, 177 insertions(+), 145 deletions(-)
diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c index 9988232aeab5..c49d80a6dbef 100644 --- a/drivers/media/rc/bpf-lirc.c +++ b/drivers/media/rc/bpf-lirc.c @@ -295,7 +295,7 @@ int lirc_prog_detach(const union bpf_attr *attr)
int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) { - __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids); + __u32 __user *prog_ids = (__u32 __user *)attr->query.prog_ids; struct bpf_prog_array *progs; struct rc_dev *rcdev; u32 cnt, flags = 0; diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index 407e25d608eb..2aebf9b59035 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -23,12 +23,12 @@ static inline bpfptr_t USER_BPFPTR(void __user *p) return (bpfptr_t) { .user = p }; }
-static inline bpfptr_t make_bpfptr(u64 addr, bool is_kernel) +static inline bpfptr_t make_bpfptr(__kernel_uintptr_t ptr, bool is_kernel) { if (is_kernel) - return KERNEL_BPFPTR((void*) (uintptr_t) addr); + return KERNEL_BPFPTR((void *)(uintptr_t)ptr); else - return USER_BPFPTR(u64_to_user_ptr(addr)); + return USER_BPFPTR((void __user *)ptr); }
static inline bool bpfptr_is_null(bpfptr_t bpfptr) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 51b9aa640ad2..e95709cf4147 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1330,21 +1330,21 @@ union bpf_attr {
struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ __u32 map_fd; - __aligned_u64 key; + __kernel_aligned_uintptr_t key; union { - __aligned_u64 value; - __aligned_u64 next_key; + __kernel_aligned_uintptr_t value; + __kernel_aligned_uintptr_t next_key; }; __u64 flags; };
struct { /* struct used by BPF_MAP_*_BATCH commands */ - __aligned_u64 in_batch; /* start batch, - * NULL to start from beginning - */ - __aligned_u64 out_batch; /* output: next start batch */ - __aligned_u64 keys; - __aligned_u64 values; + /* start batch, NULL to start from beginning */ + __kernel_aligned_uintptr_t in_batch; + /* output: next start batch */ + __kernel_aligned_uintptr_t out_batch; + __kernel_aligned_uintptr_t keys; + __kernel_aligned_uintptr_t values; __u32 count; /* input/output: * input: # of key/value * elements @@ -1358,11 +1358,11 @@ union bpf_attr { struct { /* anonymous struct used by BPF_PROG_LOAD command */ __u32 prog_type; /* one of enum bpf_prog_type */ __u32 insn_cnt; - __aligned_u64 insns; - __aligned_u64 license; + __kernel_aligned_uintptr_t insns; + __kernel_aligned_uintptr_t license; __u32 log_level; /* verbosity level of verifier */ __u32 log_size; /* size of user buffer */ - __aligned_u64 log_buf; /* user supplied buffer */ + __kernel_aligned_uintptr_t log_buf; /* user supplied buffer */ __u32 kern_version; /* not used */ __u32 prog_flags; char prog_name[BPF_OBJ_NAME_LEN]; @@ -1374,10 +1374,10 @@ union bpf_attr { __u32 expected_attach_type; __u32 prog_btf_fd; /* fd pointing to BTF type data */ __u32 func_info_rec_size; /* userspace bpf_func_info size */ - __aligned_u64 func_info; /* func info */ + __kernel_aligned_uintptr_t func_info; /* func info */ __u32 func_info_cnt; /* number of bpf_func_info records */ __u32 line_info_rec_size; /* userspace bpf_line_info size */ - __aligned_u64 line_info; /* line info */ + __kernel_aligned_uintptr_t line_info; /* line info */ __u32 line_info_cnt; /* number of bpf_line_info records */ __u32 attach_btf_id; /* in-kernel BTF type id to attach to */ union { @@ -1387,13 +1387,13 @@ union bpf_attr { __u32 attach_btf_obj_fd; }; __u32 core_relo_cnt; /* number of bpf_core_relo */ - __aligned_u64 fd_array; /* array of FDs */ - __aligned_u64 core_relos; + __kernel_aligned_uintptr_t fd_array; /* array of 
FDs */ + __kernel_aligned_uintptr_t core_relos; __u32 core_relo_rec_size; /* sizeof(struct bpf_core_relo) */ };
struct { /* anonymous struct used by BPF_OBJ_* commands */ - __aligned_u64 pathname; + __kernel_aligned_uintptr_t pathname; __u32 bpf_fd; __u32 file_flags; }; @@ -1417,8 +1417,8 @@ union bpf_attr { * returns ENOSPC if data_out * is too small. */ - __aligned_u64 data_in; - __aligned_u64 data_out; + __kernel_aligned_uintptr_t data_in; + __kernel_aligned_uintptr_t data_out; __u32 repeat; __u32 duration; __u32 ctx_size_in; /* input: len of ctx_in */ @@ -1426,8 +1426,8 @@ union bpf_attr { * returns ENOSPC if ctx_out * is too small. */ - __aligned_u64 ctx_in; - __aligned_u64 ctx_out; + __kernel_aligned_uintptr_t ctx_in; + __kernel_aligned_uintptr_t ctx_out; __u32 flags; __u32 cpu; __u32 batch_size; @@ -1448,7 +1448,7 @@ union bpf_attr { struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */ __u32 bpf_fd; __u32 info_len; - __aligned_u64 info; + __kernel_aligned_uintptr_t info; } info;
struct { /* anonymous struct used by BPF_PROG_QUERY command */ @@ -1456,22 +1456,22 @@ union bpf_attr { __u32 attach_type; __u32 query_flags; __u32 attach_flags; - __aligned_u64 prog_ids; + __kernel_aligned_uintptr_t prog_ids; __u32 prog_cnt; /* output: per-program attach_flags. * not allowed to be set during effective query. */ - __aligned_u64 prog_attach_flags; + __kernel_aligned_uintptr_t prog_attach_flags; } query;
struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */ - __u64 name; + __kernel_aligned_uintptr_t name; __u32 prog_fd; } raw_tracepoint;
struct { /* anonymous struct for BPF_BTF_LOAD */ - __aligned_u64 btf; - __aligned_u64 btf_log_buf; + __kernel_aligned_uintptr_t btf; + __kernel_aligned_uintptr_t btf_log_buf; __u32 btf_size; __u32 btf_log_size; __u32 btf_log_level; @@ -1482,7 +1482,7 @@ union bpf_attr { __u32 fd; /* input: fd */ __u32 flags; /* input: flags */ __u32 buf_len; /* input/output: buf len */ - __aligned_u64 buf; /* input/output: + __kernel_aligned_uintptr_t buf; /* input/output: * tp_name for tracepoint * symbol for kprobe * filename for uprobe @@ -1504,8 +1504,10 @@ union bpf_attr { union { __u32 target_btf_id; /* btf_id of target to attach to */ struct { - __aligned_u64 iter_info; /* extra bpf_iter_link_info */ - __u32 iter_info_len; /* iter_info length */ + /* extra bpf_iter_link_info */ + __kernel_aligned_uintptr_t iter_info; + /* iter_info length */ + __u32 iter_info_len; }; struct { /* black box user-provided value passed through @@ -1517,9 +1519,9 @@ union bpf_attr { struct { __u32 flags; __u32 cnt; - __aligned_u64 syms; - __aligned_u64 addrs; - __aligned_u64 cookies; + __kernel_aligned_uintptr_t syms; + __kernel_aligned_uintptr_t addrs; + __kernel_aligned_uintptr_t cookies; } kprobe_multi; struct { /* this is overlaid with the target_btf_id above. 
*/ @@ -6168,12 +6170,12 @@ struct bpf_prog_info { __u8 tag[BPF_TAG_SIZE]; __u32 jited_prog_len; __u32 xlated_prog_len; - __aligned_u64 jited_prog_insns; - __aligned_u64 xlated_prog_insns; + __kernel_aligned_uintptr_t jited_prog_insns; + __kernel_aligned_uintptr_t xlated_prog_insns; __u64 load_time; /* ns since boottime */ __u32 created_by_uid; __u32 nr_map_ids; - __aligned_u64 map_ids; + __kernel_aligned_uintptr_t map_ids; char name[BPF_OBJ_NAME_LEN]; __u32 ifindex; __u32 gpl_compatible:1; @@ -6182,20 +6184,20 @@ struct bpf_prog_info { __u64 netns_ino; __u32 nr_jited_ksyms; __u32 nr_jited_func_lens; - __aligned_u64 jited_ksyms; - __aligned_u64 jited_func_lens; + __kernel_aligned_uintptr_t jited_ksyms; + __kernel_aligned_uintptr_t jited_func_lens; __u32 btf_id; __u32 func_info_rec_size; - __aligned_u64 func_info; + __kernel_aligned_uintptr_t func_info; __u32 nr_func_info; __u32 nr_line_info; - __aligned_u64 line_info; - __aligned_u64 jited_line_info; + __kernel_aligned_uintptr_t line_info; + __kernel_aligned_uintptr_t jited_line_info; __u32 nr_jited_line_info; __u32 line_info_rec_size; __u32 jited_line_info_rec_size; __u32 nr_prog_tags; - __aligned_u64 prog_tags; + __kernel_aligned_uintptr_t prog_tags; __u64 run_time_ns; __u64 run_cnt; __u64 recursion_misses; @@ -6224,10 +6226,10 @@ struct bpf_map_info { } __attribute__((aligned(8)));
struct bpf_btf_info { - __aligned_u64 btf; + __kernel_aligned_uintptr_t btf; __u32 btf_size; __u32 id; - __aligned_u64 name; + __kernel_aligned_uintptr_t name; __u32 name_len; __u32 kernel_btf; } __attribute__((aligned(8))); @@ -6238,7 +6240,7 @@ struct bpf_link_info { __u32 prog_id; union { struct { - __aligned_u64 tp_name; /* in/out: tp_name buffer ptr */ + __kernel_aligned_uintptr_t tp_name; /* in/out: tp_name buffer ptr */ __u32 tp_name_len; /* in/out: tp_name buffer len */ } raw_tracepoint; struct { @@ -6251,7 +6253,7 @@ struct bpf_link_info { __u32 attach_type; } cgroup; struct { - __aligned_u64 target_name; /* in/out: target_name buffer ptr */ + __kernel_aligned_uintptr_t target_name; /* in/out: target_name buffer ptr */ __u32 target_name_len; /* in/out: target_name buffer len */
/* If the iter specific field is 32 bits, it can be put diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c index 5dc307bdeaeb..fc3d2b440a16 100644 --- a/kernel/bpf/bpf_iter.c +++ b/kernel/bpf/bpf_iter.c @@ -456,7 +456,7 @@ static int bpf_iter_link_fill_link_info(const struct bpf_link *link, { struct bpf_iter_link *iter_link = container_of(link, struct bpf_iter_link, link); - char __user *ubuf = u64_to_user_ptr(info->iter.target_name); + char __user *ubuf = (char __user *)info->iter.target_name; bpf_iter_fill_link_info_t fill_link_info; u32 ulen = info->iter.target_name_len; const char *target_name; diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 25cb5231f4df..f7da1077b488 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6897,7 +6897,7 @@ int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr)
btf = btf_parse(make_bpfptr(attr->btf, uattr.is_kernel), attr->btf_size, attr->btf_log_level, - u64_to_user_ptr(attr->btf_log_buf), + (char __user *)attr->btf_log_buf, attr->btf_log_size); if (IS_ERR(btf)) return PTR_ERR(btf); @@ -6947,10 +6947,10 @@ struct btf *btf_get_by_fd(int fd) void convert_compat_btf_info_in(struct bpf_btf_info *dest, struct compat_bpf_btf_info *cinfo) { - dest->btf = cinfo->btf; + dest->btf = (__kernel_aligned_uintptr_t)compat_ptr(cinfo->btf); dest->btf_size = cinfo->btf_size; dest->id = cinfo->id; - dest->name = cinfo->name; + dest->name = (__kernel_aligned_uintptr_t)compat_ptr(cinfo->name); dest->name_len = cinfo->name_len; dest->kernel_btf = cinfo->kernel_btf; } @@ -7010,7 +7010,7 @@ int btf_get_info_by_fd(const struct btf *btf, } else #endif { - uinfo = u64_to_user_ptr(attr->info.info); + uinfo = (struct bpf_btf_info __user *)attr->info.info; uinfo_len = attr->info.info_len;
ret = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), @@ -7029,7 +7029,7 @@ int btf_get_info_by_fd(const struct btf *btf, * anything in compat64 case */ info.id = btf->id; - ubtf = u64_to_user_ptr(info.btf); + ubtf = (void __user *)info.btf; btf_copy = min_t(u32, btf->data_size, info.btf_size); if (copy_to_user(ubtf, btf->data, btf_copy)) return -EFAULT; @@ -7037,7 +7037,7 @@ int btf_get_info_by_fd(const struct btf *btf,
info.kernel_btf = btf->kernel_btf;
- uname = u64_to_user_ptr(info.name); + uname = (char __user *)info.name; uname_len = info.name_len; if (!uname ^ !uname_len) return -EINVAL; diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 5f9d13848986..5345898af631 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1020,9 +1020,10 @@ static int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, union bpf_attr __user *uattr) { - __u32 __user *prog_attach_flags = u64_to_user_ptr(attr->query.prog_attach_flags); + __u32 __user *prog_attach_flags = + (__u32 __user *)attr->query.prog_attach_flags; bool effective_query = attr->query.query_flags & BPF_F_QUERY_EFFECTIVE; - __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids); + __u32 __user *prog_ids = (__u32 __user *)attr->query.prog_ids; enum bpf_attach_type type = attr->query.attach_type; enum cgroup_bpf_attach_type from_atype, to_atype; enum cgroup_bpf_attach_type atype; diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 5cf4ddc2a433..0b187a352483 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -1660,9 +1660,9 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, struct bpf_htab *htab = container_of(map, struct bpf_htab, map); u32 bucket_cnt, total, key_size, value_size, roundup_key_size; void *keys = NULL, *values = NULL, *value, *dst_key, *dst_val; - void __user *uvalues = u64_to_user_ptr(attr->batch.values); - void __user *ukeys = u64_to_user_ptr(attr->batch.keys); - void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch); + void __user *uvalues = (void __user *)attr->batch.values; + void __user *ukeys = (void __user *)attr->batch.keys; + void __user *ubatch = (void __user *)attr->batch.in_batch; u32 batch, max_count, size, bucket_size, map_id; struct htab_elem *node_to_free = NULL; u64 elem_map_flags, map_flags; @@ -1866,7 +1866,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, goto out;
/* copy # of entries and next batch */ - ubatch = u64_to_user_ptr(attr->batch.out_batch); + ubatch = (void __user *)attr->batch.out_batch; if (copy_to_user(ubatch, &batch, sizeof(batch)) || PUT_USER_UATTR(total, batch.count)) ret = -EFAULT; diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c index 3bfc97b37774..4a527ae97c2d 100644 --- a/kernel/bpf/net_namespace.c +++ b/kernel/bpf/net_namespace.c @@ -249,7 +249,7 @@ static int __netns_bpf_prog_query(const union bpf_attr *attr, struct net *net, enum netns_bpf_attach_type type) { - __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids); + __u32 __user *prog_ids = (__u32 __user *)attr->query.prog_ids; struct bpf_prog_array *run_array; u32 prog_cnt = 0, flags = 0;
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c index 13e4efc971e6..cdca154ffea7 100644 --- a/kernel/bpf/offload.c +++ b/kernel/bpf/offload.c @@ -322,7 +322,7 @@ int bpf_prog_offload_info_fill(struct bpf_prog_info *info, ulen = info->jited_prog_len; info->jited_prog_len = aux->offload->jited_len; if (info->jited_prog_len && ulen) { - uinsns = u64_to_user_ptr(info->jited_prog_insns); + uinsns = (char __user *)info->jited_prog_insns; ulen = min_t(u32, info->jited_prog_len, ulen); if (copy_to_user(uinsns, aux->offload->jited_image, ulen)) { up_read(&bpf_devs_lock); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 6fc61b71d1ce..b45855ac7c50 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1312,8 +1312,8 @@ static void *___bpf_copy_key(bpfptr_t ukey, u64 key_size)
static int map_lookup_elem(union bpf_attr *attr) { - void __user *ukey = u64_to_user_ptr(attr->key); - void __user *uvalue = u64_to_user_ptr(attr->value); + void __user *ukey = (void __user *)attr->key; + void __user *uvalue = (void __user *)attr->value; int ufd = attr->map_fd; struct bpf_map *map; void *key, *value; @@ -1489,8 +1489,8 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
static int map_get_next_key(union bpf_attr *attr) { - void __user *ukey = u64_to_user_ptr(attr->key); - void __user *unext_key = u64_to_user_ptr(attr->next_key); + void __user *ukey = (void __user *)attr->key; + void __user *unext_key = (void __user *)attr->next_key; int ufd = attr->map_fd; struct bpf_map *map; void *key, *next_key; @@ -1552,7 +1552,7 @@ int generic_map_delete_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { - void __user *keys = u64_to_user_ptr(attr->batch.keys); + void __user *keys = (void __user *)attr->batch.keys; u32 cp, max_count; int err = 0; void *key; @@ -1606,8 +1606,8 @@ int generic_map_update_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { - void __user *values = u64_to_user_ptr(attr->batch.values); - void __user *keys = u64_to_user_ptr(attr->batch.keys); + void __user *values = (void __user *)attr->batch.values; + void __user *keys = (void __user *)attr->batch.keys; u32 value_size, cp, max_count; int ufd = attr->batch.map_fd; void *key, *value; @@ -1669,10 +1669,10 @@ int generic_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { - void __user *uobatch = u64_to_user_ptr(attr->batch.out_batch); - void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch); - void __user *values = u64_to_user_ptr(attr->batch.values); - void __user *keys = u64_to_user_ptr(attr->batch.keys); + void __user *uobatch = (void __user *)attr->batch.out_batch; + void __user *ubatch = (void __user *)attr->batch.in_batch; + void __user *values = (void __user *)attr->batch.values; + void __user *keys = (void __user *)attr->batch.keys; void *buf, *buf_prevkey, *prev_key, *key, *value; int err, retry = MAP_LOOKUP_RETRIES; u32 value_size, cp, max_count; @@ -1769,8 +1769,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
static int map_lookup_and_delete_elem(union bpf_attr *attr) { - void __user *ukey = u64_to_user_ptr(attr->key); - void __user *uvalue = u64_to_user_ptr(attr->value); + void __user *ukey = (void __user *)attr->key; + void __user *uvalue = (void __user *)attr->value; int ufd = attr->map_fd; struct bpf_map *map; void *key, *value; @@ -2645,7 +2645,7 @@ static int bpf_obj_pin(const union bpf_attr *attr) if (attr->file_flags != 0) return -EINVAL;
- return bpf_obj_pin_user(attr->bpf_fd, u64_to_user_ptr(attr->pathname)); + return bpf_obj_pin_user(attr->bpf_fd, (void __user *)attr->pathname); }
static int bpf_obj_get(const union bpf_attr *attr) @@ -2653,7 +2653,7 @@ static int bpf_obj_get(const union bpf_attr *attr) if (attr->bpf_fd != 0 || attr->file_flags & ~BPF_OBJ_FLAG_MASK) return -EINVAL;
- return bpf_obj_get_user(u64_to_user_ptr(attr->pathname), + return bpf_obj_get_user((void __user *)attr->pathname, attr->file_flags); }
@@ -3148,7 +3148,7 @@ static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link, { struct bpf_raw_tp_link *raw_tp_link = container_of(link, struct bpf_raw_tp_link, link); - char __user *ubuf = u64_to_user_ptr(info->raw_tracepoint.tp_name); + char __user *ubuf = (char __user *)info->raw_tracepoint.tp_name; const char *tp_name = raw_tp_link->btp->tp->name; u32 ulen = info->raw_tracepoint.tp_name_len; size_t tp_len = strlen(tp_name); @@ -3341,7 +3341,8 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr) if (IS_ERR(prog)) return PTR_ERR(prog);
- fd = bpf_raw_tp_link_attach(prog, u64_to_user_ptr(attr->raw_tracepoint.name)); + fd = bpf_raw_tp_link_attach(prog, + (const char __user *)attr->raw_tracepoint.name); if (fd < 0) bpf_prog_put(prog); return fd; @@ -3864,12 +3865,15 @@ void convert_compat_prog_info_in(struct bpf_prog_info *dest, strncpy((char *)dest->tag, (char *)cinfo->tag, BPF_TAG_SIZE); dest->jited_prog_len = cinfo->jited_prog_len; dest->xlated_prog_len = cinfo->xlated_prog_len; - dest->jited_prog_insns = cinfo->jited_prog_insns; - dest->xlated_prog_insns = cinfo->xlated_prog_insns; + dest->jited_prog_insns = (__kernel_aligned_uintptr_t) + compat_ptr(cinfo->jited_prog_insns); + dest->xlated_prog_insns = (__kernel_aligned_uintptr_t) + compat_ptr(cinfo->xlated_prog_insns); dest->load_time = cinfo->load_time; dest->created_by_uid = cinfo->created_by_uid; dest->nr_map_ids = cinfo->nr_map_ids; - dest->map_ids = cinfo->map_ids; + dest->map_ids = (__kernel_aligned_uintptr_t) + compat_ptr(cinfo->map_ids); strncpy((char *)dest->name, (char *)cinfo->name, BPF_OBJ_NAME_LEN); dest->ifindex = cinfo->ifindex; dest->gpl_compatible = cinfo->gpl_compatible; @@ -3877,20 +3881,26 @@ void convert_compat_prog_info_in(struct bpf_prog_info *dest, dest->netns_ino = cinfo->netns_ino; dest->nr_jited_ksyms = cinfo->nr_jited_ksyms; dest->nr_jited_func_lens = cinfo->nr_jited_func_lens; - dest->jited_ksyms = cinfo->jited_ksyms; - dest->jited_func_lens = cinfo->jited_func_lens; + dest->jited_ksyms = (__kernel_aligned_uintptr_t) + compat_ptr(cinfo->jited_ksyms); + dest->jited_func_lens = (__kernel_aligned_uintptr_t) + compat_ptr(cinfo->jited_func_lens); dest->btf_id = cinfo->btf_id; dest->func_info_rec_size = cinfo->func_info_rec_size; - dest->func_info = cinfo->func_info; + dest->func_info = (__kernel_aligned_uintptr_t) + compat_ptr(cinfo->func_info); dest->nr_func_info = cinfo->nr_func_info; dest->nr_line_info = cinfo->nr_line_info; - dest->line_info = cinfo->line_info; - dest->jited_line_info = cinfo->jited_line_info; + 
dest->line_info = (__kernel_aligned_uintptr_t) + compat_ptr(cinfo->line_info); + dest->jited_line_info = (__kernel_aligned_uintptr_t) + compat_ptr(cinfo->jited_line_info); dest->nr_jited_line_info = cinfo->nr_jited_line_info; dest->line_info_rec_size = cinfo->line_info_rec_size; dest->jited_line_info_rec_size = cinfo->jited_line_info_rec_size; dest->nr_prog_tags = cinfo->nr_prog_tags; - dest->prog_tags = cinfo->prog_tags; + dest->prog_tags = (__kernel_aligned_uintptr_t) + compat_ptr(cinfo->prog_tags); dest->run_time_ns = cinfo->run_time_ns; dest->run_cnt = cinfo->run_cnt; dest->recursion_misses = cinfo->recursion_misses; @@ -3988,7 +3998,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, } else #endif { - uinfo = u64_to_user_ptr(attr->info.info); + uinfo = (struct bpf_prog_info __user *)attr->info.info; info_len = attr->info.info_len;
err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), @@ -4017,7 +4027,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, info.nr_map_ids = prog->aux->used_map_cnt; ulen = min_t(u32, info.nr_map_ids, ulen); if (ulen) { - u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids); + u32 __user *user_map_ids = (u32 __user *)info.map_ids; u32 i;
for (i = 0; i < ulen; i++) @@ -4064,7 +4074,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, insns_sanitized = bpf_insn_prepare_dump(prog, file->f_cred); if (!insns_sanitized) return -ENOMEM; - uinsns = u64_to_user_ptr(info.xlated_prog_insns); + uinsns = (char __user *)info.xlated_prog_insns; ulen = min_t(u32, info.xlated_prog_len, ulen); fault = copy_to_user(uinsns, insns_sanitized, ulen); kfree(insns_sanitized); @@ -4096,7 +4106,7 @@ static int bpf_prog_get_info_by_fd(struct file *file,
if (info.jited_prog_len && ulen) { if (bpf_dump_raw_ok(file->f_cred)) { - uinsns = u64_to_user_ptr(info.jited_prog_insns); + uinsns = (char __user *)info.jited_prog_insns; ulen = min_t(u32, info.jited_prog_len, ulen);
/* for multi-function programs, copy the JITed @@ -4139,7 +4149,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, * corresponding to each function */ ulen = min_t(u32, info.nr_jited_ksyms, ulen); - user_ksyms = u64_to_user_ptr(info.jited_ksyms); + user_ksyms = (u64 __user *)info.jited_ksyms; if (prog->aux->func_cnt) { for (i = 0; i < ulen; i++) { ksym_addr = (unsigned long) @@ -4167,7 +4177,7 @@ static int bpf_prog_get_info_by_fd(struct file *file,
/* copy the JITed image lengths for each function */ ulen = min_t(u32, info.nr_jited_func_lens, ulen); - user_lens = u64_to_user_ptr(info.jited_func_lens); + user_lens = (u32 __user *)info.jited_func_lens; if (prog->aux->func_cnt) { for (i = 0; i < ulen; i++) { func_len = @@ -4196,7 +4206,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, if (info.nr_func_info && ulen) { char __user *user_finfo;
- user_finfo = u64_to_user_ptr(info.func_info); + user_finfo = (char __user *)info.func_info; ulen = min_t(u32, info.nr_func_info, ulen); if (copy_to_user(user_finfo, prog->aux->func_info, info.func_info_rec_size * ulen)) @@ -4208,7 +4218,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, if (info.nr_line_info && ulen) { __u8 __user *user_linfo;
- user_linfo = u64_to_user_ptr(info.line_info); + user_linfo = (__u8 __user *)info.line_info; ulen = min_t(u32, info.nr_line_info, ulen); if (copy_to_user(user_linfo, prog->aux->linfo, info.line_info_rec_size * ulen)) @@ -4226,7 +4236,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, __u64 __user *user_linfo; u32 i;
- user_linfo = u64_to_user_ptr(info.jited_line_info); + user_linfo = (__u64 __user *)info.jited_line_info; ulen = min_t(u32, info.nr_jited_line_info, ulen); for (i = 0; i < ulen; i++) { line_addr = (unsigned long)prog->aux->jited_linfo[i]; @@ -4241,20 +4251,20 @@ static int bpf_prog_get_info_by_fd(struct file *file, ulen = info.nr_prog_tags; info.nr_prog_tags = prog->aux->func_cnt ? : 1; if (ulen) { - __u8 __user (*user_prog_tags)[BPF_TAG_SIZE]; + __u8 __user *user_prog_tags; u32 i;
- user_prog_tags = u64_to_user_ptr(info.prog_tags); + user_prog_tags = (__u8 __user *)info.prog_tags; ulen = min_t(u32, info.nr_prog_tags, ulen); if (prog->aux->func_cnt) { for (i = 0; i < ulen; i++) { - if (copy_to_user(user_prog_tags[i], + if (copy_to_user(user_prog_tags+i, prog->aux->func[i]->tag, BPF_TAG_SIZE)) return -EFAULT; } } else { - if (copy_to_user(user_prog_tags[0], + if (copy_to_user(user_prog_tags, prog->tag, BPF_TAG_SIZE)) return -EFAULT; } @@ -4336,7 +4346,7 @@ static int bpf_map_get_info_by_fd(struct file *file, } else #endif { - uinfo = u64_to_user_ptr(attr->info.info); + uinfo = (struct bpf_map_info __user *)attr->info.info; info_len = attr->info.info_len;
err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), @@ -4404,14 +4414,14 @@ void convert_compat_link_info_in(struct bpf_link_info *dest, * rest do not need conversion in */ if (type == BPF_LINK_TYPE_RAW_TRACEPOINT) { - dest->raw_tracepoint.tp_name = - cinfo->raw_tracepoint.tp_name; - dest->raw_tracepoint.tp_name_len = - cinfo->raw_tracepoint.tp_name_len; + dest->raw_tracepoint.tp_name = (__kernel_aligned_uintptr_t) + compat_ptr(cinfo->raw_tracepoint.tp_name); + dest->raw_tracepoint.tp_name_len = cinfo->raw_tracepoint.tp_name_len; return; } if (type == BPF_LINK_TYPE_ITER) { - dest->iter.target_name = cinfo->iter.target_name; + dest->iter.target_name = (__kernel_aligned_uintptr_t) + compat_ptr(cinfo->iter.target_name); dest->iter.target_name_len = cinfo->iter.target_name_len; return; } @@ -4536,7 +4546,7 @@ static int bpf_link_get_info_by_fd(struct file *file, } else #endif { - uinfo = u64_to_user_ptr(attr->info.info); + uinfo = (struct bpf_link_info __user *)attr->info.info; info_len = attr->info.info_len;
err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), @@ -4635,7 +4645,7 @@ static int bpf_task_fd_query_copy(const union bpf_attr *attr, const char *buf, u64 probe_offset, u64 probe_addr) { - char __user *ubuf = u64_to_user_ptr(attr->task_fd_query.buf); + char __user *ubuf = (char __user *)attr->task_fd_query.buf; u32 len = buf ? strlen(buf) : 0, input_len; int err = 0;
@@ -5399,19 +5409,25 @@ static void convert_compat_bpf_attr(union bpf_attr *dest, const union compat_bpf
 	case BPF_MAP_DELETE_ELEM:
 	case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
 		dest->map_fd = cattr->map_fd;
-		dest->key = cattr->key;
-		dest->value = cattr->value;
-		/* u64 next_key is in a union with u64 value */
+		dest->key =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->key);
+		dest->value =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->value);
+		/* next_key is in a union with value */
 		dest->flags = cattr->flags;
 		break;
 	case BPF_MAP_LOOKUP_BATCH:
 	case BPF_MAP_LOOKUP_AND_DELETE_BATCH:
 	case BPF_MAP_UPDATE_BATCH:
 	case BPF_MAP_DELETE_BATCH:
-		dest->batch.in_batch = cattr->batch.in_batch;
-		dest->batch.out_batch = cattr->batch.out_batch;
-		dest->batch.keys = cattr->batch.keys;
-		dest->batch.values = cattr->batch.values;
+		dest->batch.in_batch =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->batch.in_batch);
+		dest->batch.out_batch =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->batch.out_batch);
+		dest->batch.keys =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->batch.keys);
+		dest->batch.values =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->batch.values);
 		dest->batch.count = cattr->batch.count;
 		dest->batch.map_fd = cattr->batch.map_fd;
 		dest->batch.elem_flags = cattr->batch.elem_flags;
@@ -5420,11 +5436,14 @@ static void convert_compat_bpf_attr(union bpf_attr *dest, const union compat_bpf
 	case BPF_PROG_LOAD:
 		dest->prog_type = cattr->prog_type;
 		dest->insn_cnt = cattr->insn_cnt;
-		dest->insns = cattr->insns;
-		dest->license = cattr->license;
+		dest->insns =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->insns);
+		dest->license =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->license);
 		dest->log_level = cattr->log_level;
 		dest->log_size = cattr->log_size;
-		dest->log_buf = cattr->log_buf;
+		dest->log_buf =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->log_buf);
 		dest->kern_version = cattr->kern_version;
 		dest->prog_flags = cattr->prog_flags;
 		strncpy(dest->prog_name,
			cattr->prog_name, BPF_OBJ_NAME_LEN);
@@ -5432,22 +5451,27 @@ static void convert_compat_bpf_attr(union bpf_attr *dest, const union compat_bpf
 		dest->expected_attach_type = cattr->expected_attach_type;
 		dest->prog_btf_fd = cattr->prog_btf_fd;
 		dest->func_info_rec_size = cattr->func_info_rec_size;
-		dest->func_info = cattr->func_info;
+		dest->func_info =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->func_info);
 		dest->func_info_cnt = cattr->func_info_cnt;
 		dest->line_info_rec_size = cattr->line_info_rec_size;
-		dest->line_info = cattr->line_info;
+		dest->line_info =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->line_info);
 		dest->line_info_cnt = cattr->line_info_cnt;
 		dest->attach_btf_id = cattr->attach_btf_id;
 		dest->attach_prog_fd = cattr->attach_prog_fd;
 		/* u32 attach_btf_obj_fd is in a union with u32 attach_prog_fd */
 		dest->core_relo_cnt = cattr->core_relo_cnt;
-		dest->fd_array = cattr->fd_array;
-		dest->core_relos = cattr->core_relos;
+		dest->fd_array =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->fd_array);
+		dest->core_relos =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->core_relos);
 		dest->core_relo_rec_size = cattr->core_relo_rec_size;
 		break;
 	case BPF_OBJ_PIN:
 	case BPF_OBJ_GET:
-		dest->pathname = cattr->pathname;
+		dest->pathname =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->pathname);
 		dest->bpf_fd = cattr->bpf_fd;
 		dest->file_flags = cattr->file_flags;
 		break;
@@ -5464,14 +5488,18 @@ static void convert_compat_bpf_attr(union bpf_attr *dest, const union compat_bpf
 		dest->test.retval = cattr->test.retval;
 		dest->test.data_size_in = cattr->test.data_size_in;
 		dest->test.data_size_out = cattr->test.data_size_out;
-		dest->test.data_in = cattr->test.data_in;
-		dest->test.data_out = cattr->test.data_out;
+		dest->test.data_in =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->test.data_in);
+		dest->test.data_out =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->test.data_out);
 		dest->test.repeat = cattr->test.repeat;
 		dest->test.duration = cattr->test.duration;
 		dest->test.ctx_size_in = cattr->test.ctx_size_in;
 		dest->test.ctx_size_out = cattr->test.ctx_size_out;
-		dest->test.ctx_in = cattr->test.ctx_in;
-		dest->test.ctx_out = cattr->test.ctx_out;
+		dest->test.ctx_in =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->test.ctx_in);
+		dest->test.ctx_out =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->test.ctx_out);
 		dest->test.flags = cattr->test.flags;
 		dest->test.cpu = cattr->test.cpu;
 		dest->test.batch_size = cattr->test.batch_size;
@@ -5493,7 +5521,8 @@ static void convert_compat_bpf_attr(union bpf_attr *dest, const union compat_bpf
 	case BPF_OBJ_GET_INFO_BY_FD:
 		dest->info.bpf_fd = cattr->info.bpf_fd;
 		dest->info.info_len = cattr->info.info_len;
-		dest->info.info = cattr->info.info;
+		dest->info.info =
+			(__kernel_aligned_uintptr_t)compat_ptr(cattr->info.info);
 		break;
 	case BPF_PROG_QUERY:
 		dest->query.target_fd = cattr->query.target_fd;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 264b3dc714cc..d513cd5f6cb8 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15213,7 +15213,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
 	 * and supplied buffer to store the verification trace
 	 */
 	log->level = attr->log_level;
-	log->ubuf = (char __user *) (unsigned long) attr->log_buf;
+	log->ubuf = (char __user *)attr->log_buf;
 	log->len_total = attr->log_size;
 
 	/* log attributes have to be sane */
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 1ed08967fb97..526c8ae45f5d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2686,8 +2686,8 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
 	if (flags & ~BPF_F_KPROBE_MULTI_RETURN)
 		return -EINVAL;
 
-	uaddrs = u64_to_user_ptr(attr->link_create.kprobe_multi.addrs);
-	usyms = u64_to_user_ptr(attr->link_create.kprobe_multi.syms);
+	uaddrs = (void __user *)attr->link_create.kprobe_multi.addrs;
+	usyms = (void __user *)attr->link_create.kprobe_multi.syms;
 	if (!!uaddrs == !!usyms)
 		return -EINVAL;
 
@@ -2700,7 +2700,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
 	if (!addrs)
 		return -ENOMEM;
 
-	ucookies = u64_to_user_ptr(attr->link_create.kprobe_multi.cookies);
+	ucookies = (void __user *)attr->link_create.kprobe_multi.cookies;
 	if (ucookies) {
 		cookies = kvmalloc_array(cnt, sizeof(*addrs), GFP_KERNEL);
 		if (!cookies) {
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 58a5d02c5a8e..92ded1c16d93 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -416,7 +416,7 @@ static int bpf_test_finish(const union bpf_attr *kattr,
 			   struct skb_shared_info *sinfo, u32 size,
 			   u32 retval, u32 duration)
 {
-	void __user *data_out = u64_to_user_ptr(kattr->test.data_out);
+	void __user *data_out = (void __user *)kattr->test.data_out;
 	int err = -EFAULT;
 	u32 copy_size = size;
@@ -766,7 +766,7 @@ BTF_SET8_END(test_sk_check_kfunc_ids)
 static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
 			   u32 size, u32 headroom, u32 tailroom)
 {
-	void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
+	void __user *data_in = (void __user *)kattr->test.data_in;
 	void *data;
 
 	if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom)
@@ -852,7 +852,7 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 			     const union bpf_attr *kattr,
 			     union bpf_attr __user *uattr)
 {
-	void __user *ctx_in = u64_to_user_ptr(kattr->test.ctx_in);
+	void __user *ctx_in = (void __user *)kattr->test.ctx_in;
 	__u32 ctx_size_in = kattr->test.ctx_size_in;
 	struct bpf_raw_tp_test_run_info info;
 	int cpu = kattr->test.cpu, err = 0;
@@ -907,8 +907,8 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 
 static void *bpf_ctx_init(const union bpf_attr *kattr, u32 max_size)
 {
-	void __user *data_in = u64_to_user_ptr(kattr->test.ctx_in);
-	void __user *data_out = u64_to_user_ptr(kattr->test.ctx_out);
+	void __user *data_in = (void __user *)kattr->test.ctx_in;
+	void __user *data_out = (void __user *)kattr->test.ctx_out;
 	u32 size = kattr->test.ctx_size_in;
 	void *data;
 	int err;
@@ -940,7 +940,7 @@ static int bpf_ctx_finish(const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr, const void *data,
 			  u32 size)
 {
-	void __user *data_out = u64_to_user_ptr(kattr->test.ctx_out);
+	void __user *data_out = (void __user *)kattr->test.ctx_out;
 	int err = -EFAULT;
 	u32 copy_size = size;
@@ -1347,7 +1347,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 		goto free_data;
 
 	if (unlikely(kattr->test.data_size_in > size)) {
-		void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
+		void __user *data_in = (void __user *)kattr->test.data_in;
 
 		while (size < kattr->test.data_size_in) {
 			struct page *page;
@@ -1605,7 +1605,7 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog,
 			      const union bpf_attr *kattr,
 			      union bpf_attr __user *uattr)
 {
-	void __user *ctx_in = u64_to_user_ptr(kattr->test.ctx_in);
+	void __user *ctx_in = (void __user *)kattr->test.ctx_in;
 	__u32 ctx_size_in = kattr->test.ctx_size_in;
 	void *ctx = NULL;
 	u32 retval;
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index e476159a53c7..718f3621ad9f 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -1487,7 +1487,7 @@ static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
 int sock_map_bpf_prog_query(const union bpf_attr *attr,
 			    union bpf_attr __user *uattr)
 {
-	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
+	__u32 __user *prog_ids = (__u32 __user *)attr->query.prog_ids;
 	u32 prog_cnt = 0, flags = 0, ufd = attr->target_fd;
 	struct bpf_prog **pprog;
 	struct bpf_prog *prog;
On 08/06/2023 10:42, Zachary Leaf wrote:
[...]
@@ -4241,20 +4251,20 @@ static int bpf_prog_get_info_by_fd(struct file *file,
 	ulen = info.nr_prog_tags;
 	info.nr_prog_tags = prog->aux->func_cnt ? : 1;
 	if (ulen) {
-		__u8 __user (*user_prog_tags)[BPF_TAG_SIZE];
+		__u8 __user *user_prog_tags;
 		u32 i;
 
-		user_prog_tags = u64_to_user_ptr(info.prog_tags);
+		user_prog_tags = (__u8 __user *)info.prog_tags;
 		ulen = min_t(u32, info.nr_prog_tags, ulen);
 		if (prog->aux->func_cnt) {
 			for (i = 0; i < ulen; i++) {
-				if (copy_to_user(user_prog_tags[i],
+				if (copy_to_user(user_prog_tags+i,

Nit: &user_prog_tags[i] would be slightly more idiomatic (and closer to the original code). Similarly, &user_prog_tags[0] could be used below.

Kevin

 						 prog->aux->func[i]->tag,
 						 BPF_TAG_SIZE))
 					return -EFAULT;
 			}
 		} else {
-			if (copy_to_user(user_prog_tags[0],
+			if (copy_to_user(user_prog_tags,
 					 prog->tag, BPF_TAG_SIZE))
 				return -EFAULT;
 		}
 	}
On 08/06/2023 10:42, Zachary Leaf wrote:
[...]
Nice work! Kudos for the commit messages (and that cover letter), I really appreciate the effort put into writing those.
I have mostly looked at the patches from a functional perspective, without going into the details of each subcommand / field at this stage (I don't expect major issues there anyway). Overall the approach looks good, I think the chosen trade-offs make sense. I've got a number of specific comments/questions as usual, inline in each patch.
Cheers, Kevin
[...]
Next steps once we have the core kernel support are porting libbpf and bpftool for purecap, plus work on enabling kselftests as above.
Kernel branch available at: https://git.morello-project.org/zdleaf/linux/-/tree/morello/bpf
Associated LTP test/changes at: https://git.morello-project.org/zdleaf/morello-linux-test-project/-/tree/mor...
Thanks,
Zach
[1] [RFC PATCH 0/9] update bpf syscall for PCuABI/compat64 https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/... [2] [PATCH v3 0/5] Restore syscall tracing on Morello https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/...
Zachary Leaf (10):
  arm64: morello: enable syscall tracing
  bpf/net: copy ptrs from user with bpf/sockptr_t
  bpf: compat64: add handler and convert bpf_attr in
  bpf: compat64: bpf_attr convert out
  bpf: compat64: handle bpf_btf_info
  bpf: compat64: handle bpf_prog_info
  bpf: compat64: handle bpf_map_info
  bpf: compat64: handle bpf_link_info
  bpf: compat64: support CHECK_ATTR macro
  bpf: use user pointer types in uAPI structs
 .../morello_transitional_pcuabi_defconfig |    2 +-
 arch/arm64/kernel/sys_compat64.c          |    4 +
 drivers/media/rc/bpf-lirc.c               |    7 +-
 include/linux/bpf_compat.h                |  413 ++++++
 include/linux/bpfptr.h                    |   18 +-
 include/linux/sockptr.h                   |    9 +
 include/uapi/linux/bpf.h                  |   94 +-
 kernel/bpf/bpf_iter.c                     |    2 +-
 kernel/bpf/btf.c                          |   97 +-
 kernel/bpf/cgroup.c                       |   10 +-
 kernel/bpf/hashtab.c                      |   13 +-
 kernel/bpf/net_namespace.c                |    7 +-
 kernel/bpf/offload.c                      |    2 +-
 kernel/bpf/syscall.c                      | 1136 +++++++++++++----
 kernel/bpf/verifier.c                     |    2 +-
 kernel/trace/bpf_trace.c                  |    6 +-
 net/bpf/bpf_dummy_struct_ops.c            |    3 +-
 net/bpf/test_run.c                        |   32 +-
 net/core/sock_map.c                       |    7 +-
 19 files changed, 1534 insertions(+), 330 deletions(-)
 create mode 100644 include/linux/bpf_compat.h
-- 2.34.1
linux-morello mailing list -- linux-morello@op-lists.linaro.org To unsubscribe send an email to linux-morello-leave@op-lists.linaro.org