Hi,
Here's v2 of the bpf syscall patches, following on from the RFC[1] and v1[2].
The bpf syscall is updated to propagate user pointers as capabilities in the pure-capability kernel-user ABI (PCuABI). It also includes an approach to support the existing aarch64 ABI (compat64).
One complication here is that this syscall supports many multiplexed sub-commands, some of which are themselves multiplexed again with a number of nested options.
A further complication is that the existing syscall uses a trick of storing user pointers as u64 to avoid needing a compat handler for 32-bit systems. To retain compatibility with the aarch64 ABI and add Morello support, special compat64 conversion and handling is implemented.
Inbound (userspace->kernel) conversion between compat64/native struct layouts is handled upfront on entry to the syscall (with the exception of bpf_xyz_info structs). This minimises changes to subcommand handlers. Some subcommands require conversion back out to userspace and that is by necessity handled where it occurs.
Patch 1 is not essential to this series, but it's a nice debug feature to have, and it works now that syscall tracing has been restored[3]. It enables BPF_PROG_TYPE_TRACEPOINT, which many eBPF kselftests use.
Patches 2-4, 9 are setup and helper functions.
Patches 5-7 implement the core compat64 handling. Each commit compiles cleanly, but relevant parts will be broken in between.
Patch 8 fixes the CHECK_ATTR macro so that it also checks attributes passed in via compat64.
Patch 10 finally enables capabilities in the kernel.
Patches 11 and 12 handle uaccess that occurs in two eBPF helper functions.
Testing-wise, see the associated LTP changes below, as posted to the LTP mailing list[4]. The eBPF LTP tests are fairly minimal and exercise only a small part of the changes here. There's a new test covering CHECK_ATTR from patch 8.
The kernel kselftests contain much more extensive eBPF tests. They can be built fairly easily natively on aarch64, which is useful for testing compat64. More work needs to be done here though to:
a) enable out-of-tree cross-compilation for purecap as well as
   x86->aarch64
b) replace ptr_to_u64() with casts to uintptr_t in tests (see the
   sketch after this list)
c) general libbpf/bpftool enablement and fixes, since many tests rely
   on these
d) enable CONFIG_DEBUG_INFO_BTF, which many tests require, but which
   needs a recent version of the pahole tool on the build system
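For (b), the change is roughly as follows (a sketch only - exact helpers vary from test to test):

  static inline __u64 ptr_to_u64(const void *ptr)
  {
  	return (__u64)(unsigned long)ptr;
  }

  /* before: the capability is truncated to 64 bits */
  attr.insns = ptr_to_u64(insns);

  /* after: uintptr_t is capability-sized in purecap, so the full
   * pointer is preserved (assumes the uAPI field has been widened)
   */
  attr.insns = (uintptr_t)insns;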
Next steps, once we have the core kernel support, would be porting libbpf and bpftool to purecap, plus work on enabling the kselftests as above.
Kernel branch available at: https://git.morello-project.org/zdleaf/linux/-/tree/morello/bpf_v2
Associated LTP test/changes at: https://git.morello-project.org/zdleaf/morello-linux-test-project/-/tree/mor...
Thanks,
Zach
[1] [RFC PATCH 0/9] update bpf syscall for PCuABI/compat64
https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/...
[2] [PATCH 00/10] update bpf syscall for PCuABI/compat64
https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/...
[3] [PATCH v3 0/5] Restore syscall tracing on Morello
https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org/...
[4] [PATCH 0/3] add eBPF support
https://op-lists.linaro.org/archives/list/linux-morello-ltp@op-lists.linaro....

-----------------------------------------------------------------------
v2:
- Fixed copy_from_bpfptr_offset_with_ptr - this no longer uses sockptr
  as of 6.1 (see patch 9)
- Rebase on 6.4 - new struct members need conversion/handling:
  - New BPF_PROG_TYPE_NETFILTER + associated bpf_attr BPF_LINK_CREATE
    members
  - New bpf_link_info types BPF_LINK_TYPE_NETFILTER +
    BPF_LINK_TYPE_STRUCT_OPS
- Renamed in_32bit_compat_syscall() to in_compat32_syscall() and added
  in_compat64_syscall()
- Handled uaccess from bpf helper programs in
  bpf/helpers.c:bpf_copy_from_user + bpf_copy_from_user_task
- Added new stddef.h macro copy_field() to simplify conversion
  assignments
- bpf: compat64: add handler and convert bpf_attr in
  - Replaced #ifdef CONFIG_COMPAT64 with in_compat64_syscall() + use
    copy_bpf_attr_from_user() helper function for inbound compat
    conversion
  - This removes the __sys_compat_bpf compat64 handler, now using the
    existing __sys_bpf + in_compat64_syscall() for reduced diff size,
    less duplication and clearer code
  - Conversion now happens in copy_bpf_attr_from_user() for better
    encapsulation + stack usage
- bpf: compat64: bpf_attr convert out
  - Renamed PUT_USER_ATTR() to bpf_put_uattr() + moved to bpf.h
  - Introduced bpfptr_put_uattr() in bpfptr.h
  - Originally missed handling cases writing out to userspace with
    copy_to_bpfptr_offset() - now replaced with new bpfptr_put_uattr()
    macro
  - 6.4 now has a new field showing what log size will be needed - see
    47a71c1f9af0 ("bpf: Add log_true_size output field to return
    necessary log buffer size")
  - This also requires new macro bpf_field_exists() to handle compat64
    when checking that userspace supports this new field
- bpf: compat64: handle bpf_xyz_info
  - Simplified bpf_link_info conversion handling using memcpy
  - Removed bpf_map_info patch since struct is the same in
    compat64/native (no ptrs)
  - Replaced #ifdef CONFIG_COMPAT64 with in_compat64_syscall() + use
    copy_bpf_xyz_info_{from,to}_user() helper functions for compat
    conversions
  - Merged bpf_{btf,prog,link}_info into a single patch
- bpf: use user pointer types in uAPI structs
  - Added new compat_uptr_to_kern() macro to simplify casting/converting
    in compat ptrs
  - Usage of copy_{to,from}_user_with_ptr variants now correctly applied
    with new helpers
-----------------------------------------------------------------------
Zachary Leaf (12):
  arm64: morello: enable syscall tracing
  arch: rename to in_compat32_syscall
  arch: add compat helpers specific to 64-bit
  stddef: introduce copy_field helper
  bpf: compat64: add handler and convert bpf_attr in
  bpf: compat64: bpf_attr convert out
  bpf: compat64: handle bpf_xyz_info
  bpf: compat64: support CHECK_ATTR macro
  bpf: copy_{to,from}_user_with_ptr helpers
  bpf: use user pointer types in uAPI structs
  bpf: use addr for bpf_copy_from_user_with_task
  bpf: use addr for bpf_copy_from_user
 .../morello_transitional_pcuabi_defconfig |   3 +-
 arch/arm64/include/asm/compat.h           |   5 +
 arch/sparc/include/asm/compat.h           |   2 +-
 arch/x86/include/asm/compat.h             |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   2 +-
 drivers/input/input.c                     |   2 +-
 drivers/media/rc/bpf-lirc.c               |   7 +-
 fs/ext4/dir.c                             |   2 +-
 fs/nfs/dir.c                              |   2 +-
 include/linux/bpf.h                       |  13 +
 include/linux/bpf_compat.h                | 423 ++++++++
 include/linux/bpfptr.h                    |  27 +-
 include/linux/compat.h                    |  14 +-
 include/linux/stddef.h                    |   3 +
 include/uapi/linux/bpf.h                  |  94 +-
 kernel/bpf/bpf_iter.c                     |   2 +-
 kernel/bpf/btf.c                          | 100 +-
 kernel/bpf/cgroup.c                       |  10 +-
 kernel/bpf/hashtab.c                      |  13 +-
 kernel/bpf/helpers.c                      |   9 +-
 kernel/bpf/net_namespace.c                |   7 +-
 kernel/bpf/offload.c                      |   2 +-
 kernel/bpf/syscall.c                      | 991 ++++++++++++++----
 kernel/bpf/verifier.c                     |  21 +-
 kernel/time/time.c                        |   2 +-
 kernel/trace/bpf_trace.c                  |   6 +-
 net/bpf/bpf_dummy_struct_ops.c            |   9 +-
 net/bpf/test_run.c                        |  32 +-
 net/core/sock_map.c                       |   7 +-
 30 files changed, 1449 insertions(+), 365 deletions(-)
 create mode 100644 include/linux/bpf_compat.h
-- 2.34.1
Enable syscall tracing via ftrace. This also enables eBPF programs of type BPF_PROG_TYPE_TRACEPOINT to hook syscall traces and execute eBPF code.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 arch/arm64/configs/morello_transitional_pcuabi_defconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/configs/morello_transitional_pcuabi_defconfig b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
index 0f692600a181..738e7c930d41 100644
--- a/arch/arm64/configs/morello_transitional_pcuabi_defconfig
+++ b/arch/arm64/configs/morello_transitional_pcuabi_defconfig
@@ -161,6 +161,7 @@ CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
 CONFIG_MAGIC_SYSRQ=y
 CONFIG_DEBUG_FS=y
 # CONFIG_SCHED_DEBUG is not set
-# CONFIG_FTRACE is not set
+# CONFIG_DEBUG_PREEMPT is not set
 CONFIG_CORESIGHT=y
 CONFIG_MEMTEST=y
+CONFIG_FTRACE_SYSCALLS=y
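For reference, this is the sort of program the change enables - a minimal libbpf-style BPF_PROG_TYPE_TRACEPOINT program hooking a syscall trace event (a sketch, not part of this series; the section name follows the usual syscalls:sys_enter_* convention):

  // SPDX-License-Identifier: GPL-2.0
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* runs on each openat() entry once syscall trace events exist */
  SEC("tracepoint/syscalls/sys_enter_openat")
  int trace_openat(void *ctx)
  {
  	bpf_printk("openat called");
  	return 0;
  }

  char LICENSE[] SEC("license") = "GPL";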
in_32bit_compat_syscall() + is_32bit_compat_task() were originally named to be consistent with is_32bit_compat_thread() from asm/compat.h in the commit ("arch: add compat helpers specific to 32-bit").
Defined in linux/compat.h, in_32bit_compat_syscall() doesn't necessarily have to follow the same pattern. in_compat32_syscall() is shorter, matches the config option CONFIG_COMPAT32, and is generally more consistent with the common name/way 32-bit compat is referred to.
Rename in_32bit_compat_syscall() to in_compat32_syscall().
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
---
 arch/sparc/include/asm/compat.h          | 2 +-
 arch/x86/include/asm/compat.h            | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 2 +-
 drivers/input/input.c                    | 2 +-
 fs/ext4/dir.c                            | 2 +-
 fs/nfs/dir.c                             | 2 +-
 include/linux/compat.h                   | 8 ++++----
 kernel/time/time.c                       | 2 +-
 9 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/sparc/include/asm/compat.h b/arch/sparc/include/asm/compat.h
index 1d2e5ad1c894..3816422c817a 100644
--- a/arch/sparc/include/asm/compat.h
+++ b/arch/sparc/include/asm/compat.h
@@ -156,7 +156,7 @@ static inline bool in_compat_syscall(void)
 	return pt_regs_trap_type(current_pt_regs()) == 0x110;
 }
 #define in_compat_syscall in_compat_syscall
-#define in_32bit_compat_syscall in_compat_syscall
+#define in_compat32_syscall in_compat_syscall
 #endif
 
 #endif /* _ASM_SPARC64_COMPAT_H */
diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index 2f7facbb592c..5dcb9afbcb9c 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -100,7 +100,7 @@ static inline bool in_compat_syscall(void)
 	return in_32bit_syscall();
 }
 #define in_compat_syscall in_compat_syscall	/* override the generic impl */
-#define in_32bit_compat_syscall in_compat_syscall
+#define in_compat32_syscall in_compat_syscall
 #define compat_need_64bit_alignment_fixup in_ia32_syscall
 #endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 57bbf4a4ee2d..d1fd4eec97f7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -132,7 +132,7 @@ static int kfd_open(struct inode *inode, struct file *filep)
 	if (iminor(inode) != 0)
 		return -ENODEV;
 
-	is_32bit_user_mode = in_32bit_compat_syscall();
+	is_32bit_user_mode = in_compat32_syscall();
 
 	if (is_32bit_user_mode) {
 		dev_warn(kfd_device,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 5a037594450d..210bb5c9e876 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1427,7 +1427,7 @@ static struct kfd_process *create_process(const struct task_struct *thread)
 	err = kfd_event_init_process(process);
 	if (err)
 		goto err_event_init;
-	process->is_32bit_user_mode = in_32bit_compat_syscall();
+	process->is_32bit_user_mode = in_compat32_syscall();
 
 	process->pasid = kfd_pasid_alloc();
 	if (process->pasid == 0) {
diff --git a/drivers/input/input.c b/drivers/input/input.c
index b74f7b536ad6..13985b40772a 100644
--- a/drivers/input/input.c
+++ b/drivers/input/input.c
@@ -1074,7 +1074,7 @@ static int input_bits_to_string(char *buf, int buf_size,
 {
 	int len = 0;
 
-	if (in_32bit_compat_syscall()) {
+	if (in_compat32_syscall()) {
 		u32 dword = bits >> 32;
 		if (dword || !skip_empty)
 			len += snprintf(buf, buf_size, "%x ", dword);
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 87b105a00cd0..cfa1b4e54209 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -315,7 +315,7 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx)
 static inline int is_32bit_api(void)
 {
 #ifdef CONFIG_COMPAT
-	return in_32bit_compat_syscall();
+	return in_compat32_syscall();
 #else
 	return (BITS_PER_LONG == 32);
 #endif
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 8689d5a06078..3649258ff9ab 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -460,7 +460,7 @@ static inline int is_32bit_api(void)
 {
 #ifdef CONFIG_COMPAT
-	return in_32bit_compat_syscall();
+	return in_compat32_syscall();
 #else
 	return (BITS_PER_LONG == 32);
 #endif
diff --git a/include/linux/compat.h b/include/linux/compat.h
index d72881d525ac..f6979417c673 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -1020,8 +1020,8 @@ int kcompat_sys_fstatfs64(unsigned int fd, compat_size_t sz,
 #ifndef in_compat_syscall
 static inline bool in_compat_syscall(void) { return is_compat_task(); }
 #endif
-#ifndef in_32bit_compat_syscall
-static inline bool in_32bit_compat_syscall(void) { return is_32bit_compat_task(); }
+#ifndef in_compat32_syscall
+static inline bool in_compat32_syscall(void) { return is_32bit_compat_task(); }
 #endif
 
 #else /* !CONFIG_COMPAT */
@@ -1031,8 +1031,8 @@ static inline bool in_32bit_compat_syscall(void) { return is_32bit_compat_task()
 
 /* Ensure no one redefines in_compat_syscall() under !CONFIG_COMPAT */
 #define in_compat_syscall in_compat_syscall
 static inline bool in_compat_syscall(void) { return false; }
-#define in_32bit_compat_syscall in_32bit_compat_syscall
-static inline bool in_32bit_compat_syscall(void) { return false; }
+#define in_compat32_syscall in_compat32_syscall
+static inline bool in_compat32_syscall(void) { return false; }
 
 #endif /* CONFIG_COMPAT */
 
diff --git a/kernel/time/time.c b/kernel/time/time.c
index 3d9be77e0425..1e2133b201cc 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -791,7 +791,7 @@ int get_timespec64(struct timespec64 *ts,
 	ts->tv_sec = kts.tv_sec;
 
 	/* Zero out the padding in compat mode */
-	if (in_32bit_compat_syscall())
+	if (in_compat32_syscall())
 		kts.tv_nsec &= 0xFFFFFFFFUL;
 
 	/* In 32-bit mode, this drops the padding */
On 12/09/2023 11:51, Zachary Leaf wrote:
in_32bit_compat_syscall() + is_32bit_compat_task() were originally named to be consistent with is_32bit_compat_thread() from asm/compat.h in the commit ("arch: add compat helpers specific to 32-bit").
AFAICT it is the other way round: first the generic helpers were introduced, then the arm64-specific is_compat_thread() was renamed [1]. As long as Kristina doesn't see an issue with the renaming, neither do I, but I would prefer the renaming to be complete (all compat32 helpers including the arm64 ones).
Kevin
[1] https://git.morello-project.org/morello/kernel/linux/-/commit/a53b230b237d3c...
On 15/09/2023 15:44, Kevin Brodsky wrote:
On 12/09/2023 11:51, Zachary Leaf wrote:
in_32bit_compat_syscall() + is_32bit_compat_task() were originally named to be consistent with is_32bit_compat_thread() from asm/compat.h in the commit ("arch: add compat helpers specific to 32-bit").
AFAICT it is the other way round: first the generic helpers were introduced, then the arm64-specific is_compat_thread() was renamed [1]. As long as Kristina doesn't see an issue with the renaming, neither do I, but I would prefer the renaming to be complete (all compat32 helpers including the arm64 ones).
Kevin
[1] https://git.morello-project.org/morello/kernel/linux/-/commit/a53b230b237d3c...
Indeed, I think they were named to be consistent with is_32bit_task(), not is_32bit_compat_thread().
I don't see any issue with renaming them, looks good to me. (And I would also have a slight preference for renaming all of them. is_32bit_task() is a bit different anyway (it's not compat).)
Thanks, Kristina
On 15/09/2023 16:02, Kristina Martsenko wrote:
On 15/09/2023 15:44, Kevin Brodsky wrote:
On 12/09/2023 11:51, Zachary Leaf wrote:
in_32bit_compat_syscall() + is_32bit_compat_task() were originally named to be consistent with is_32bit_compat_thread() from asm/compat.h in the commit ("arch: add compat helpers specific to 32-bit").
AFAICT it is the other way round: first the generic helpers were introduced, then the arm64-specific is_compat_thread() was renamed [1]. As long as Kristina doesn't see an issue with the renaming, neither do I, but I would prefer the renaming to be complete (all compat32 helpers including the arm64 ones).
Kevin
[1] https://git.morello-project.org/morello/kernel/linux/-/commit/a53b230b237d3c...
Indeed, I think they were named to be consistent with is_32bit_task(), not is_32bit_compat_thread().
I don't see any issue with renaming them, looks good to me. (And I would also have a slight preference for renaming all of them. is_32bit_task() is a bit different anyway (it's not compat).)
Oops yep - got the history a bit wrong.
Sounds good though. Thanks both.
Zach
Thanks, Kristina
Similar to ("arch: add compat helpers specific to 32-bit"), add 64 bit variants to determine if we're in a COMPAT64 syscall.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 arch/arm64/include/asm/compat.h | 5 +++++
 include/linux/compat.h          | 6 ++++++
 2 files changed, 11 insertions(+)
diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h
index 40cd9f48eb53..16a0a3ff78c9 100644
--- a/arch/arm64/include/asm/compat.h
+++ b/arch/arm64/include/asm/compat.h
@@ -112,6 +112,11 @@ static inline int is_32bit_compat_task(void)
 	return IS_ENABLED(CONFIG_COMPAT32) && test_thread_flag(TIF_32BIT);
 }
 
+static inline bool is_64bit_compat_task(void)
+{
+	return IS_ENABLED(CONFIG_COMPAT64) && test_thread_flag(TIF_64BIT_COMPAT);
+}
+
 static inline int is_compat_task(void)
 {
 	return (IS_ENABLED(CONFIG_COMPAT32) && test_thread_flag(TIF_32BIT)) ||
diff --git a/include/linux/compat.h b/include/linux/compat.h
index f6979417c673..54dd8a8a16bc 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -1023,16 +1023,22 @@ static inline bool in_compat_syscall(void) { return is_compat_task(); }
 #ifndef in_compat32_syscall
 static inline bool in_compat32_syscall(void) { return is_32bit_compat_task(); }
 #endif
+#ifndef in_compat64_syscall
+static inline bool in_compat64_syscall(void) { return is_64bit_compat_task(); }
+#endif
 
 #else /* !CONFIG_COMPAT */
 
 #define is_compat_task() (0)
 #define is_32bit_compat_task() (0)
+#define is_64bit_compat_task() (0)
 /* Ensure no one redefines in_compat_syscall() under !CONFIG_COMPAT */
 #define in_compat_syscall in_compat_syscall
 static inline bool in_compat_syscall(void) { return false; }
 #define in_compat32_syscall in_compat32_syscall
 static inline bool in_compat32_syscall(void) { return false; }
+#define in_compat64_syscall in_compat64_syscall
+static inline bool in_compat64_syscall(void) { return false; }
 
 #endif /* CONFIG_COMPAT */
Introduce a helper to shorten lines when copying fields from one struct to another, where both have identical fields.
This is useful in compat conversion functions, where structs have identical fields, but different offsets.
e.g. dest->task_fd_query.probe_offset = src->task_fd_query.probe_offset
is replaced by: copy_field(dest, src, task_fd_query.probe_offset)
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 include/linux/stddef.h | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/include/linux/stddef.h b/include/linux/stddef.h
index 929d67710cc5..d81504dbc632 100644
--- a/include/linux/stddef.h
+++ b/include/linux/stddef.h
@@ -12,6 +12,9 @@ enum {
 	true	= 1
 };
 
+#define copy_field(DEST, SRC, FIELD) \
+	((DEST)->FIELD = (SRC)->FIELD)
+
 #undef offsetof
 #define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
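As a usage sketch (field names taken from the bpf_attr conversion added later in this series):

  /* without the helper, the field path is repeated on both sides: */
  dest->task_fd_query.probe_offset = src->task_fd_query.probe_offset;

  /* with copy_field(), it is written once: */
  copy_field(dest, src, task_fd_query.probe_offset);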
The bpf syscall does not require a compat handler for 32-bit compat. This is achieved by using u64 instead of pointer types in the bpf_attr union used to pass arguments to the syscall. This means that in a system where pointers are 32-bit, the struct/union layouts and offsets are the same as in a 64-bit arch, since the u64 field is split into two u32 fields/registers.
This greatly simplifies 32-bit compat at the small cost of requiring casting pointers passed in through the uAPI to u64 (generally via ptr_to_u64() helper functions).
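On the userspace side the pattern looks roughly like this (a sketch; prepare_obj_pin() is an illustrative name, but ptr_to_u64() is the usual helper in libbpf and the kselftests):

  static inline __u64 ptr_to_u64(const void *ptr)
  {
  	return (__u64)(unsigned long)ptr;
  }

  /* e.g. preparing a BPF_OBJ_PIN command */
  void prepare_obj_pin(union bpf_attr *attr, const char *path, int fd)
  {
  	memset(attr, 0, sizeof(*attr));
  	attr->pathname = ptr_to_u64(path);	/* user pointer stored as u64 */
  	attr->bpf_fd = fd;
  }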
This poses a problem on architectures where user pointers are longer than 64 bits, such as Morello/PCuABI, where pointers are represented as 129-bit capabilities. In order to extend the bpf syscall interface to accept capabilities and still retain compatibility with the existing 64/32-bit ABIs, 64-bit compat handling and appropriate conversions must be added to handle the different union/struct sizes caused by this pointer-size mismatch.
Before extending the number of bits in union bpf_attr to accept capabilities, set the groundwork for handling compat64. When in_compat64_syscall(), take the compat64-sized bpf_attr and convert it to what will be the new native offsets.
Inbound conversion is handled upfront to minimise the impact on existing code and reduce the overall diff size. After dispatch_bpf() the majority of code can remain unchanged. The cases where conversion back out to userspace is required are handled in subsequent commits.
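To illustrate the mismatch being handled here, consider the BPF_OBJ_* struct (a sketch; the __kernel_aligned_uintptr_t layout is an assumption based on the later patch that widens the uAPI pointer fields):

  /* compat64 (existing aarch64 ABI): pointer carried as u64 */
  struct {
  	__aligned_u64	pathname;	/* offset 0, size 8 */
  	__u32		bpf_fd;		/* offset 8 */
  	__u32		file_flags;	/* offset 12 */
  };

  /* PCuABI: pathname becomes capability-sized, shifting every
   * subsequent member
   */
  struct {
  	__kernel_aligned_uintptr_t pathname;	/* offset 0, size 16 */
  	__u32		bpf_fd;			/* offset 16 */
  	__u32		file_flags;		/* offset 20 */
  };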
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 include/linux/bpf_compat.h | 300 ++++++++++++++++++++++++++
 kernel/bpf/syscall.c       | 427 +++++++++++++++++++++++++++++++------
 2 files changed, 663 insertions(+), 64 deletions(-)
 create mode 100644 include/linux/bpf_compat.h
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h
new file mode 100644
index 000000000000..85e0198bede7
--- /dev/null
+++ b/include/linux/bpf_compat.h
@@ -0,0 +1,300 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2023 Arm Ltd */
+
+union compat_bpf_attr {
+	struct { /* anonymous struct used by BPF_MAP_CREATE command */
+		__u32	map_type;	/* one of enum bpf_map_type */
+		__u32	key_size;	/* size of key in bytes */
+		__u32	value_size;	/* size of value in bytes */
+		__u32	max_entries;	/* max number of entries in a map */
+		__u32	map_flags;	/* BPF_MAP_CREATE related
+					 * flags defined above.
+					 */
+		__u32	inner_map_fd;	/* fd pointing to the inner map */
+		__u32	numa_node;	/* numa node (effective only if
+					 * BPF_F_NUMA_NODE is set).
+					 */
+		char	map_name[BPF_OBJ_NAME_LEN];
+		__u32	map_ifindex;	/* ifindex of netdev to create on */
+		__u32	btf_fd;		/* fd pointing to a BTF type data */
+		__u32	btf_key_type_id;	/* BTF type_id of the key */
+		__u32	btf_value_type_id;	/* BTF type_id of the value */
+		__u32	btf_vmlinux_value_type_id;/* BTF type_id of a kernel-
+						   * struct stored as the
+						   * map value
+						   */
+		/* Any per-map-type extra fields
+		 *
+		 * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
+		 * number of hash functions (if 0, the bloom filter will default
+		 * to using 5 hash functions).
+		 */
+		__u64	map_extra;
+	};
+
+	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
+		__u32		map_fd;
+		__aligned_u64	key;
+		union {
+			__aligned_u64 value;
+			__aligned_u64 next_key;
+		};
+		__u64		flags;
+	};
+
+	struct { /* struct used by BPF_MAP_*_BATCH commands */
+		__aligned_u64	in_batch;	/* start batch,
+						 * NULL to start from beginning
+						 */
+		__aligned_u64	out_batch;	/* output: next start batch */
+		__aligned_u64	keys;
+		__aligned_u64	values;
+		__u32		count;		/* input/output:
+						 * input: # of key/value
+						 * elements
+						 * output: # of filled elements
+						 */
+		__u32		map_fd;
+		__u64		elem_flags;
+		__u64		flags;
+	} batch;
+
+	struct { /* anonymous struct used by BPF_PROG_LOAD command */
+		__u32		prog_type;	/* one of enum bpf_prog_type */
+		__u32		insn_cnt;
+		__aligned_u64	insns;
+		__aligned_u64	license;
+		__u32		log_level;	/* verbosity level of verifier */
+		__u32		log_size;	/* size of user buffer */
+		__aligned_u64	log_buf;	/* user supplied buffer */
+		__u32		kern_version;	/* not used */
+		__u32		prog_flags;
+		char		prog_name[BPF_OBJ_NAME_LEN];
+		__u32		prog_ifindex;	/* ifindex of netdev to prep for */
+		/* For some prog types expected attach type must be known at
+		 * load time to verify attach type specific parts of prog
+		 * (context accesses, allowed helpers, etc).
+		 */
+		__u32		expected_attach_type;
+		__u32		prog_btf_fd;	/* fd pointing to BTF type data */
+		__u32		func_info_rec_size;	/* userspace bpf_func_info size */
+		__aligned_u64	func_info;	/* func info */
+		__u32		func_info_cnt;	/* number of bpf_func_info records */
+		__u32		line_info_rec_size;	/* userspace bpf_line_info size */
+		__aligned_u64	line_info;	/* line info */
+		__u32		line_info_cnt;	/* number of bpf_line_info records */
+		__u32		attach_btf_id;	/* in-kernel BTF type id to attach to */
+		union {
+			/* valid prog_fd to attach to bpf prog */
+			__u32		attach_prog_fd;
+			/* or valid module BTF object fd or 0 to attach to vmlinux */
+			__u32		attach_btf_obj_fd;
+		};
+		__u32		core_relo_cnt;	/* number of bpf_core_relo */
+		__aligned_u64	fd_array;	/* array of FDs */
+		__aligned_u64	core_relos;
+		__u32		core_relo_rec_size; /* sizeof(struct bpf_core_relo) */
+		/* output: actual total log contents size (including termintaing zero).
+		 * It could be both larger than original log_size (if log was
+		 * truncated), or smaller (if log buffer wasn't filled completely).
+		 */
+		__u32		log_true_size;
+	};
+
+	struct { /* anonymous struct used by BPF_OBJ_* commands */
+		__aligned_u64	pathname;
+		__u32		bpf_fd;
+		__u32		file_flags;
+	};
+
+	struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
+		__u32		target_fd;	/* container object to attach to */
+		__u32		attach_bpf_fd;	/* eBPF program to attach */
+		__u32		attach_type;
+		__u32		attach_flags;
+		__u32		replace_bpf_fd;	/* previously attached eBPF
+						 * program to replace if
+						 * BPF_F_REPLACE is used
+						 */
+	};
+
+	struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
+		__u32		prog_fd;
+		__u32		retval;
+		__u32		data_size_in;	/* input: len of data_in */
+		__u32		data_size_out;	/* input/output: len of data_out
+						 * returns ENOSPC if data_out
+						 * is too small.
+						 */
+		__aligned_u64	data_in;
+		__aligned_u64	data_out;
+		__u32		repeat;
+		__u32		duration;
+		__u32		ctx_size_in;	/* input: len of ctx_in */
+		__u32		ctx_size_out;	/* input/output: len of ctx_out
+						 * returns ENOSPC if ctx_out
+						 * is too small.
+						 */
+		__aligned_u64	ctx_in;
+		__aligned_u64	ctx_out;
+		__u32		flags;
+		__u32		cpu;
+		__u32		batch_size;
+	} test;
+
+	struct { /* anonymous struct used by BPF_*_GET_*_ID */
+		union {
+			__u32		start_id;
+			__u32		prog_id;
+			__u32		map_id;
+			__u32		btf_id;
+			__u32		link_id;
+		};
+		__u32		next_id;
+		__u32		open_flags;
+	};
+
+	struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
+		__u32		bpf_fd;
+		__u32		info_len;
+		__aligned_u64	info;
+	} info;
+
+	struct { /* anonymous struct used by BPF_PROG_QUERY command */
+		__u32		target_fd;	/* container object to query */
+		__u32		attach_type;
+		__u32		query_flags;
+		__u32		attach_flags;
+		__aligned_u64	prog_ids;
+		__u32		prog_cnt;
+		/* output: per-program attach_flags.
+		 * not allowed to be set during effective query.
+		 */
+		__aligned_u64	prog_attach_flags;
+	} query;
+
+	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
+		__u64		name;
+		__u32		prog_fd;
+	} raw_tracepoint;
+
+	struct { /* anonymous struct for BPF_BTF_LOAD */
+		__aligned_u64	btf;
+		__aligned_u64	btf_log_buf;
+		__u32		btf_size;
+		__u32		btf_log_size;
+		__u32		btf_log_level;
+		/* output: actual total log contents size (including termintaing zero).
+		 * It could be both larger than original log_size (if log was
+		 * truncated), or smaller (if log buffer wasn't filled completely).
+		 */
+		__u32		btf_log_true_size;
+	};
+
+	struct {
+		__u32		pid;		/* input: pid */
+		__u32		fd;		/* input: fd */
+		__u32		flags;		/* input: flags */
+		__u32		buf_len;	/* input/output: buf len */
+		__aligned_u64	buf;		/* input/output:
+						 *   tp_name for tracepoint
+						 *   symbol for kprobe
+						 *   filename for uprobe
+						 */
+		__u32		prog_id;	/* output: prod_id */
+		__u32		fd_type;	/* output: BPF_FD_TYPE_* */
+		__u64		probe_offset;	/* output: probe_offset */
+		__u64		probe_addr;	/* output: probe_addr */
+	} task_fd_query;
+
+	struct { /* struct used by BPF_LINK_CREATE command */
+		union {
+			__u32		prog_fd;	/* eBPF program to attach */
+			__u32		map_fd;		/* struct_ops to attach */
+		};
+		union {
+			__u32		target_fd;	/* object to attach to */
+			__u32		target_ifindex; /* target ifindex */
+		};
+		__u32		attach_type;	/* attach type */
+		__u32		flags;		/* extra flags */
+		union {
+			__u32		target_btf_id;	/* btf_id of target to attach to */
+			struct {
+				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
+				__u32		iter_info_len;	/* iter_info length */
+			};
+			struct {
+				/* black box user-provided value passed through
+				 * to BPF program at the execution time and
+				 * accessible through bpf_get_attach_cookie() BPF helper
+				 */
+				__u64		bpf_cookie;
+			} perf_event;
+			struct {
+				__u32		flags;
+				__u32		cnt;
+				__aligned_u64	syms;
+				__aligned_u64	addrs;
+				__aligned_u64	cookies;
+			} kprobe_multi;
+			struct {
+				/* this is overlaid with the target_btf_id above. */
+				__u32		target_btf_id;
+				/* black box user-provided value passed through
+				 * to BPF program at the execution time and
+				 * accessible through bpf_get_attach_cookie() BPF helper
+				 */
+				__u64		cookie;
+			} tracing;
+			struct {
+				__u32		pf;
+				__u32		hooknum;
+				__s32		priority;
+				__u32		flags;
+			} netfilter;
+		};
+	} link_create;
+
+	struct { /* struct used by BPF_LINK_UPDATE command */
+		__u32		link_fd;	/* link fd */
+		union {
+			/* new program fd to update link with */
+			__u32		new_prog_fd;
+			/* new struct_ops map fd to update link with */
+			__u32		new_map_fd;
+		};
+		__u32		flags;		/* extra flags */
+		union {
+			/* expected link's program fd; is specified only if
+			 * BPF_F_REPLACE flag is set in flags.
+			 */
+			__u32		old_prog_fd;
+			/* expected link's map fd; is specified only
+			 * if BPF_F_REPLACE flag is set.
+			 */
+			__u32		old_map_fd;
+		};
+	} link_update;
+
+	struct {
+		__u32		link_fd;
+	} link_detach;
+
+	struct { /* struct used by BPF_ENABLE_STATS command */
+		__u32		type;
+	} enable_stats;
+
+	struct { /* struct used by BPF_ITER_CREATE command */
+		__u32		link_fd;
+		__u32		flags;
+	} iter_create;
+
+	struct { /* struct used by BPF_PROG_BIND_MAP command */
+		__u32		prog_fd;
+		__u32		map_fd;
+		__u32		flags;		/* extra flags */
+	} prog_bind_map;
+
+} __attribute__((aligned(8)));
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f1c8733f76b8..867c51d45a83 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3,6 +3,7 @@
  */
 #include <linux/bpf.h>
 #include <linux/bpf-cgroup.h>
+#include <linux/bpf_compat.h>
 #include <linux/bpf_trace.h>
 #include <linux/bpf_lirc.h>
 #include <linux/bpf_verifier.h>
@@ -11,6 +12,7 @@
 #include <linux/syscalls.h>
 #include <linux/slab.h>
 #include <linux/sched/signal.h>
+#include <linux/stddef.h>
 #include <linux/vmalloc.h>
 #include <linux/mmzone.h>
 #include <linux/anon_inodes.h>
@@ -5015,153 +5017,128 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 	return ret;
 }
 
-static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
+static int dispatch_bpf(int cmd, union bpf_attr *attr, bpfptr_t uattr,
+			unsigned int size)
 {
-	union bpf_attr attr;
-	bool capable;
 	int err;
 
-	capable = bpf_capable() || !sysctl_unprivileged_bpf_disabled;
-
-	/* Intent here is for unprivileged_bpf_disabled to block key object
-	 * creation commands for unprivileged users; other actions depend
-	 * of fd availability and access to bpffs, so are dependent on
-	 * object creation success. Capabilities are later verified for
-	 * operations such as load and map create, so even with unprivileged
-	 * BPF disabled, capability checks are still carried out for these
-	 * and other operations.
-	 */
-	if (!capable &&
-	    (cmd == BPF_MAP_CREATE || cmd == BPF_PROG_LOAD))
-		return -EPERM;
-
-	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
-	if (err)
-		return err;
-	size = min_t(u32, size, sizeof(attr));
-
-	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
-	memset(&attr, 0, sizeof(attr));
-	if (copy_from_bpfptr(&attr, uattr, size) != 0)
-		return -EFAULT;
-
-	err = security_bpf(cmd, &attr, size);
+	err = security_bpf(cmd, attr, size);
 	if (err < 0)
 		return err;
 
 	switch (cmd) {
 	case BPF_MAP_CREATE:
-		err = map_create(&attr);
+		err = map_create(attr);
 		break;
 	case BPF_MAP_LOOKUP_ELEM:
-		err = map_lookup_elem(&attr);
+		err = map_lookup_elem(attr);
 		break;
 	case BPF_MAP_UPDATE_ELEM:
-		err = map_update_elem(&attr, uattr);
+		err = map_update_elem(attr, uattr);
 		break;
 	case BPF_MAP_DELETE_ELEM:
-		err = map_delete_elem(&attr, uattr);
+		err = map_delete_elem(attr, uattr);
 		break;
 	case BPF_MAP_GET_NEXT_KEY:
-		err = map_get_next_key(&attr);
+		err = map_get_next_key(attr);
 		break;
 	case BPF_MAP_FREEZE:
-		err = map_freeze(&attr);
+		err = map_freeze(attr);
 		break;
 	case BPF_PROG_LOAD:
-		err = bpf_prog_load(&attr, uattr, size);
+		err = bpf_prog_load(attr, uattr, size);
 		break;
 	case BPF_OBJ_PIN:
-		err = bpf_obj_pin(&attr);
+		err = bpf_obj_pin(attr);
 		break;
 	case BPF_OBJ_GET:
-		err = bpf_obj_get(&attr);
+		err = bpf_obj_get(attr);
 		break;
 	case BPF_PROG_ATTACH:
-		err = bpf_prog_attach(&attr);
+		err = bpf_prog_attach(attr);
 		break;
 	case BPF_PROG_DETACH:
-		err = bpf_prog_detach(&attr);
+		err = bpf_prog_detach(attr);
 		break;
 	case BPF_PROG_QUERY:
-		err = bpf_prog_query(&attr, uattr.user);
+		err = bpf_prog_query(attr, uattr.user);
 		break;
 	case BPF_PROG_TEST_RUN:
-		err = bpf_prog_test_run(&attr, uattr.user);
+		err = bpf_prog_test_run(attr, uattr.user);
 		break;
 	case BPF_PROG_GET_NEXT_ID:
-		err = bpf_obj_get_next_id(&attr, uattr.user,
+		err = bpf_obj_get_next_id(attr, uattr.user,
 					  &prog_idr, &prog_idr_lock);
 		break;
 	case BPF_MAP_GET_NEXT_ID:
-		err = bpf_obj_get_next_id(&attr, uattr.user,
+		err = bpf_obj_get_next_id(attr, uattr.user,
 					  &map_idr, &map_idr_lock);
 		break;
 	case BPF_BTF_GET_NEXT_ID:
-		err = bpf_obj_get_next_id(&attr, uattr.user,
+		err = bpf_obj_get_next_id(attr, uattr.user,
 					  &btf_idr, &btf_idr_lock);
 		break;
 	case BPF_PROG_GET_FD_BY_ID:
-		err = bpf_prog_get_fd_by_id(&attr);
+		err = bpf_prog_get_fd_by_id(attr);
 		break;
 	case BPF_MAP_GET_FD_BY_ID:
-		err = bpf_map_get_fd_by_id(&attr);
+		err = bpf_map_get_fd_by_id(attr);
 		break;
 	case BPF_OBJ_GET_INFO_BY_FD:
-		err = bpf_obj_get_info_by_fd(&attr, uattr.user);
+		err = bpf_obj_get_info_by_fd(attr, uattr.user);
 		break;
 	case BPF_RAW_TRACEPOINT_OPEN:
-		err = bpf_raw_tracepoint_open(&attr);
+		err = bpf_raw_tracepoint_open(attr);
 		break;
 	case BPF_BTF_LOAD:
-		err = bpf_btf_load(&attr, uattr, size);
+		err = bpf_btf_load(attr, uattr, size);
 		break;
 	case BPF_BTF_GET_FD_BY_ID:
-		err = bpf_btf_get_fd_by_id(&attr);
+		err = bpf_btf_get_fd_by_id(attr);
 		break;
 	case BPF_TASK_FD_QUERY:
-		err = bpf_task_fd_query(&attr, uattr.user);
+		err = bpf_task_fd_query(attr, uattr.user);
 		break;
 	case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
-		err = map_lookup_and_delete_elem(&attr);
+		err = map_lookup_and_delete_elem(attr);
 		break;
 	case BPF_MAP_LOOKUP_BATCH:
-		err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_LOOKUP_BATCH);
+		err = bpf_map_do_batch(attr, uattr.user, BPF_MAP_LOOKUP_BATCH);
 		break;
 	case BPF_MAP_LOOKUP_AND_DELETE_BATCH:
-		err = bpf_map_do_batch(&attr, uattr.user,
+		err = bpf_map_do_batch(attr, uattr.user,
 				       BPF_MAP_LOOKUP_AND_DELETE_BATCH);
 		break;
 	case BPF_MAP_UPDATE_BATCH:
-		err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_UPDATE_BATCH);
+		err = bpf_map_do_batch(attr, uattr.user, BPF_MAP_UPDATE_BATCH);
 		break;
 	case BPF_MAP_DELETE_BATCH:
-		err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_DELETE_BATCH);
+		err = bpf_map_do_batch(attr, uattr.user, BPF_MAP_DELETE_BATCH);
 		break;
 	case BPF_LINK_CREATE:
-		err = link_create(&attr, uattr);
+		err = link_create(attr, uattr);
 		break;
 	case BPF_LINK_UPDATE:
-		err = link_update(&attr);
+		err = link_update(attr);
 		break;
 	case BPF_LINK_GET_FD_BY_ID:
-		err = bpf_link_get_fd_by_id(&attr);
+		err = bpf_link_get_fd_by_id(attr);
 		break;
 	case BPF_LINK_GET_NEXT_ID:
-		err = bpf_obj_get_next_id(&attr, uattr.user,
+		err = bpf_obj_get_next_id(attr, uattr.user,
 					  &link_idr, &link_idr_lock);
 		break;
 	case BPF_ENABLE_STATS:
-		err = bpf_enable_stats(&attr);
+		err = bpf_enable_stats(attr);
 		break;
 	case BPF_ITER_CREATE:
-		err = bpf_iter_create(&attr);
+		err = bpf_iter_create(attr);
 		break;
 	case BPF_LINK_DETACH:
-		err = link_detach(&attr);
+		err = link_detach(attr);
 		break;
 	case BPF_PROG_BIND_MAP:
-		err = bpf_prog_bind_map(&attr);
+		err = bpf_prog_bind_map(attr);
 		break;
 	default:
 		err = -EINVAL;
@@ -5171,6 +5148,328 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	return err;
 }
 
+static void convert_compat_bpf_attr(union bpf_attr *dest,
+				    const union compat_bpf_attr *cattr, int cmd)
+{
+	struct bpf_prog *prog;
+
+	memset(dest, 0, sizeof(union bpf_attr));
+
+	switch (cmd) {
+	case BPF_MAP_CREATE:
+		copy_field(dest, cattr, map_type);
+		copy_field(dest, cattr, key_size);
+		copy_field(dest, cattr, value_size);
+		copy_field(dest, cattr, max_entries);
+		copy_field(dest, cattr, map_flags);
+		copy_field(dest, cattr, inner_map_fd);
+		copy_field(dest, cattr, numa_node);
+		strncpy(dest->map_name, cattr->map_name, BPF_OBJ_NAME_LEN);
+		copy_field(dest, cattr, map_ifindex);
+		copy_field(dest, cattr, btf_fd);
+		copy_field(dest, cattr, btf_key_type_id);
+		copy_field(dest, cattr, btf_value_type_id);
+		copy_field(dest, cattr, btf_vmlinux_value_type_id);
+		copy_field(dest, cattr, map_extra);
+		break;
+	case BPF_MAP_LOOKUP_ELEM:
+	case BPF_MAP_UPDATE_ELEM:
+	case BPF_MAP_DELETE_ELEM:
+	case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
+		copy_field(dest, cattr, map_fd);
+		copy_field(dest, cattr, key);
+		copy_field(dest, cattr, value);
+		/* u64 next_key is in a union with u64 value */
+		copy_field(dest, cattr, flags);
+		break;
+	case BPF_MAP_LOOKUP_BATCH:
+	case BPF_MAP_LOOKUP_AND_DELETE_BATCH:
+	case BPF_MAP_UPDATE_BATCH:
+	case BPF_MAP_DELETE_BATCH:
+		copy_field(dest, cattr, batch.in_batch);
+		copy_field(dest, cattr, batch.out_batch);
+		copy_field(dest, cattr, batch.keys);
+		copy_field(dest, cattr, batch.values);
+		copy_field(dest, cattr, batch.count);
+		copy_field(dest, cattr, batch.map_fd);
+		copy_field(dest, cattr, batch.elem_flags);
+		copy_field(dest, cattr, batch.flags);
+		break;
+	case BPF_PROG_LOAD:
+		copy_field(dest, cattr, prog_type);
+		copy_field(dest, cattr, insn_cnt);
+		copy_field(dest, cattr, insns);
+		copy_field(dest, cattr, license);
+		copy_field(dest, cattr, log_level);
+		copy_field(dest, cattr, log_size);
+		copy_field(dest, cattr, log_buf);
+		copy_field(dest, cattr, kern_version);
+		copy_field(dest, cattr, prog_flags);
+		strncpy(dest->prog_name, cattr->prog_name, BPF_OBJ_NAME_LEN);
+		copy_field(dest, cattr, prog_ifindex);
+		copy_field(dest, cattr, expected_attach_type);
+		copy_field(dest, cattr, prog_btf_fd);
+		copy_field(dest, cattr, func_info_rec_size);
+		copy_field(dest, cattr, func_info);
+		copy_field(dest, cattr, func_info_cnt);
+		copy_field(dest, cattr, line_info_rec_size);
+		copy_field(dest, cattr, line_info);
+		copy_field(dest, cattr, line_info_cnt);
+		copy_field(dest, cattr, attach_btf_id);
+		copy_field(dest, cattr, attach_prog_fd);
+		/* u32 attach_btf_obj_fd is in a union with u32 attach_prog_fd */
+		copy_field(dest, cattr, core_relo_cnt);
+		copy_field(dest, cattr, fd_array);
+		copy_field(dest, cattr, core_relos);
+		copy_field(dest, cattr, core_relo_rec_size);
+		copy_field(dest, cattr, log_true_size);
+		break;
+	case BPF_OBJ_PIN:
+	case BPF_OBJ_GET:
+		copy_field(dest, cattr, pathname);
+		copy_field(dest, cattr, bpf_fd);
+		copy_field(dest, cattr, file_flags);
+		break;
+	case BPF_PROG_ATTACH:
+	case BPF_PROG_DETACH:
+		copy_field(dest, cattr, target_fd);
+		copy_field(dest, cattr, attach_bpf_fd);
+		copy_field(dest, cattr, attach_type);
+		copy_field(dest, cattr, attach_flags);
+		copy_field(dest, cattr, replace_bpf_fd);
+		break;
+	case BPF_PROG_RUN: /* same as BPF_PROG_TEST_RUN */
+		copy_field(dest, cattr, test.prog_fd);
+		copy_field(dest, cattr, test.retval);
+		copy_field(dest, cattr, test.data_size_in);
+		copy_field(dest, cattr, test.data_size_out);
+		copy_field(dest, cattr, test.data_in);
+		copy_field(dest, cattr, test.data_out);
+		copy_field(dest, cattr, test.repeat);
+		copy_field(dest, cattr, test.duration);
+		copy_field(dest, cattr, test.ctx_size_in);
+		copy_field(dest, cattr, test.ctx_size_out);
+		copy_field(dest, cattr, test.ctx_in);
+		copy_field(dest, cattr, test.ctx_out);
+		copy_field(dest, cattr, test.flags);
+		copy_field(dest, cattr, test.cpu);
+		copy_field(dest, cattr, test.batch_size);
+		break;
+	case BPF_PROG_GET_NEXT_ID:
+	case BPF_MAP_GET_NEXT_ID:
+	case BPF_PROG_GET_FD_BY_ID:
+	case BPF_MAP_GET_FD_BY_ID:
+	case BPF_BTF_GET_FD_BY_ID:
+	case BPF_BTF_GET_NEXT_ID:
+	case BPF_LINK_GET_FD_BY_ID:
+	case BPF_LINK_GET_NEXT_ID:
+		/* u32 prog_id, map_id, btf_id + link_id are in a union with
+		 * u32 start_id */
+		copy_field(dest, cattr, start_id);
+		copy_field(dest, cattr, next_id);
+		copy_field(dest, cattr, open_flags);
+		break;
+	case BPF_OBJ_GET_INFO_BY_FD:
+		copy_field(dest, cattr, info.bpf_fd);
+		copy_field(dest, cattr, info.info_len);
+		copy_field(dest, cattr, info.info);
+		break;
+	case BPF_PROG_QUERY:
+		copy_field(dest, cattr, query.target_fd);
+		copy_field(dest, cattr, query.attach_type);
+		copy_field(dest, cattr, query.query_flags);
+		copy_field(dest, cattr, query.attach_flags);
+		copy_field(dest, cattr, query.prog_ids);
+		copy_field(dest, cattr, query.prog_cnt);
+		copy_field(dest, cattr, query.prog_attach_flags);
+		break;
+	case BPF_RAW_TRACEPOINT_OPEN:
+		copy_field(dest, cattr, raw_tracepoint.name);
+		copy_field(dest, cattr, raw_tracepoint.prog_fd);
+		break;
+	case BPF_BTF_LOAD:
+		copy_field(dest, cattr, btf);
+		copy_field(dest, cattr, btf_log_buf);
+		copy_field(dest, cattr, btf_size);
+		copy_field(dest, cattr, btf_log_size);
+		copy_field(dest, cattr, btf_log_level);
+		copy_field(dest, cattr, btf_log_true_size);
+		break;
+	case BPF_TASK_FD_QUERY:
+		copy_field(dest, cattr, task_fd_query.pid);
+		copy_field(dest, cattr, task_fd_query.fd);
+		copy_field(dest, cattr, task_fd_query.flags);
+		copy_field(dest, cattr, task_fd_query.buf_len);
+		copy_field(dest, cattr, task_fd_query.buf);
+		copy_field(dest, cattr, task_fd_query.prog_id);
+		copy_field(dest, cattr, task_fd_query.fd_type);
+		copy_field(dest, cattr, task_fd_query.probe_offset);
+		copy_field(dest, cattr, task_fd_query.probe_addr);
+		break;
+	case BPF_LINK_CREATE:
+		copy_field(dest, cattr, link_create.prog_fd);
+		copy_field(dest, cattr, link_create.target_fd);
+		/* u32 target_ifindex is in a union with u32 target_fd */
+		copy_field(dest, cattr, link_create.attach_type);
+		copy_field(dest, cattr, link_create.flags);
+
+		prog = bpf_prog_get(cattr->link_create.prog_fd);
+
+		if (prog->type == BPF_PROG_TYPE_CGROUP_SKB ||
+		    prog->type == BPF_PROG_TYPE_CGROUP_SOCK ||
+		    prog->type == BPF_PROG_TYPE_CGROUP_SOCK_ADDR ||
+		    prog->type == BPF_PROG_TYPE_SOCK_OPS ||
+		    prog->type == BPF_PROG_TYPE_CGROUP_DEVICE ||
+		    prog->type == BPF_PROG_TYPE_CGROUP_SYSCTL ||
+		    prog->type == BPF_PROG_TYPE_CGROUP_SOCKOPT)
+			break;
+
+		if (prog->type == BPF_PROG_TYPE_EXT) {
+			copy_field(dest, cattr, link_create.tracing.target_btf_id);
+			copy_field(dest, cattr, link_create.tracing.cookie);
+			break;
+		}
+
+		if (prog->type == BPF_PROG_TYPE_LSM ||
+		    prog->type == BPF_PROG_TYPE_TRACING) {
+			if (prog->expected_attach_type == BPF_TRACE_ITER) {
+				/* iter_info is a user pointer to union
+				 * bpf_iter_link_info however since this union
+				 * contains no pointers, the size/offsets are
+				 * the same for compat64/purecap; hence no
+				 * conversion needed
+				 */
+				copy_field(dest, cattr, link_create.iter_info);
+				copy_field(dest, cattr, link_create.iter_info_len);
+				break;
+			} else if (prog->expected_attach_type == BPF_TRACE_RAW_TP ||
+				   prog->expected_attach_type == BPF_LSM_CGROUP) {
+				/* only uses common fields above */
+				break;
+			} else {
+				copy_field(dest, cattr, link_create.target_btf_id);
+				copy_field(dest, cattr, link_create.tracing.cookie);
+				break;
+			}
+		}
+
+		if (prog->type == BPF_PROG_TYPE_FLOW_DISSECTOR ||
+		    prog->type == BPF_PROG_TYPE_SK_LOOKUP ||
+		    prog->type == BPF_PROG_TYPE_XDP)
+			break;
+
+		/* bpf_cookie is used in bpf_perf_link_attach() */
+		if (prog->type == BPF_PROG_TYPE_PERF_EVENT ||
+		    prog->type == BPF_PROG_TYPE_TRACEPOINT ||
+		    (prog->type == BPF_PROG_TYPE_KPROBE &&
+		     cattr->link_create.attach_type == BPF_PERF_EVENT)) {
+			copy_field(dest, cattr, link_create.perf_event.bpf_cookie);
+			break;
+		}
+
+		/* kprobe_multi is used in bpf_kprobe_multi_link_attach() */
+		if (prog->type == BPF_PROG_TYPE_KPROBE &&
+		    cattr->link_create.attach_type != BPF_PERF_EVENT) {
+			copy_field(dest, cattr, link_create.kprobe_multi.flags);
+			copy_field(dest, cattr, link_create.kprobe_multi.cnt);
+			copy_field(dest, cattr, link_create.kprobe_multi.syms);
+			copy_field(dest, cattr, link_create.kprobe_multi.addrs);
+			copy_field(dest, cattr, link_create.kprobe_multi.cookies);
+			break;
+		}
+
+		if (prog->type == BPF_PROG_TYPE_NETFILTER) {
+			copy_field(dest, cattr, link_create.netfilter.pf);
+			copy_field(dest, cattr, link_create.netfilter.hooknum);
+			copy_field(dest, cattr, link_create.netfilter.priority);
+			copy_field(dest, cattr, link_create.netfilter.flags);
+			break;
+		}
+		break;
+	case BPF_LINK_UPDATE:
+		copy_field(dest, cattr, link_update.link_fd);
+		copy_field(dest, cattr, link_update.new_prog_fd);
+		copy_field(dest, cattr, link_update.flags);
+		copy_field(dest, cattr, link_update.old_prog_fd);
+		break;
+	case BPF_LINK_DETACH:
+		copy_field(dest, cattr, link_detach.link_fd);
+		break;
+	case BPF_ENABLE_STATS:
+		copy_field(dest, cattr, enable_stats.type);
+		break;
+	case BPF_ITER_CREATE:
+		copy_field(dest, cattr, iter_create.link_fd);
+		copy_field(dest, cattr, iter_create.flags);
+		break;
+	case BPF_PROG_BIND_MAP:
+		copy_field(dest, cattr, prog_bind_map.prog_fd);
+		copy_field(dest, cattr, prog_bind_map.map_fd);
+		copy_field(dest, cattr, prog_bind_map.flags);
+		break;
+	};
+}
+
+static int bpf_check_perms(int cmd)
+{
+	bool capable = bpf_capable() || !sysctl_unprivileged_bpf_disabled;
+
+	/* Intent here is for unprivileged_bpf_disabled to block key object
+	 * creation commands for unprivileged users; other actions depend
+	 * of fd availability and access to bpffs, so are dependent on
+	 * object creation success. Capabilities are later verified for
+	 * operations such as load and map create, so even with unprivileged
+	 * BPF disabled, capability checks are still carried out for these
+	 * and other operations.
+	 */
+	if (!capable &&
+	    (cmd == BPF_MAP_CREATE || cmd == BPF_PROG_LOAD))
+		return -EPERM;
+
+	return 0;
+}
+
+static int copy_bpf_attr_from_user(union bpf_attr *attr, int cmd,
+				   bpfptr_t uattr, unsigned int *size)
+{
+	union compat_bpf_attr cattr;
+	void *select_attr = in_compat64_syscall() ? &cattr : attr;
+	size_t attr_size = in_compat64_syscall() ? sizeof(union compat_bpf_attr)
+						 : sizeof(union bpf_attr);
+	int err;
+
+	err = bpf_check_uarg_tail_zero(uattr, attr_size, *size);
+	if (err)
+		return err;
+	*size = min_t(u32, *size, attr_size);
+
+	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
+	memset(select_attr, 0, attr_size);
+	if (copy_from_bpfptr(select_attr, uattr, *size) != 0)
+		return -EFAULT;
+
+	if (in_compat64_syscall())
+		convert_compat_bpf_attr(attr, &cattr, cmd);
+
+	return 0;
+}
+
+static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
+{
+	union bpf_attr attr;
+	int err;
+
+	err = bpf_check_perms(cmd);
+	if (err)
+		return err;
+
+	err = copy_bpf_attr_from_user(&attr, cmd, uattr, &size);
+	if (err)
+		return err;
+
+	return dispatch_bpf(cmd, &attr, uattr, size);
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	return __sys_bpf(cmd, USER_BPFPTR(uattr), size);
On 12/09/2023 11:51, Zachary Leaf wrote:
[...]
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f1c8733f76b8..867c51d45a83 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3,6 +3,7 @@
  */
 #include <linux/bpf.h>
 #include <linux/bpf-cgroup.h>
+#include <linux/bpf_compat.h>
 #include <linux/bpf_trace.h>
 #include <linux/bpf_lirc.h>
 #include <linux/bpf_verifier.h>
@@ -11,6 +12,7 @@
 #include <linux/syscalls.h>
 #include <linux/slab.h>
 #include <linux/sched/signal.h>
+#include <linux/stddef.h>
We can skip this. It's included from <linux/kernel.h> so we can consider it available by default.
[...]
+static int copy_bpf_attr_from_user(union bpf_attr *attr, int cmd,
+				   bpfptr_t uattr, unsigned int *size)
+{
+	union compat_bpf_attr cattr;
+	void *select_attr = in_compat64_syscall() ? &cattr : attr;
Nit: select_attr sounds a bit strange, I guess as in "selected", but that doesn't describe what it is. Maybe target_attr? That would apply to the other conversion functions as well.
+	size_t attr_size = in_compat64_syscall() ? sizeof(union compat_bpf_attr)
+						 : sizeof(union bpf_attr);
+	int err;
+
+	err = bpf_check_uarg_tail_zero(uattr, attr_size, *size);
+	if (err)
+		return err;
+	*size = min_t(u32, *size, attr_size);
+
+	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
+	memset(select_attr, 0, attr_size);
+	if (copy_from_bpfptr(select_attr, uattr, *size) != 0)
+		return -EFAULT;
+
+	if (in_compat64_syscall())
+		convert_compat_bpf_attr(attr, &cattr, cmd);
+
+	return 0;
+}
+static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
+{
+	union bpf_attr attr;
+	int err;
+
+	err = bpf_check_perms(cmd);
+	if (err)
+		return err;
+
+	err = copy_bpf_attr_from_user(&attr, cmd, uattr, &size);
+	if (err)
+		return err;
+
+	return dispatch_bpf(cmd, &attr, uattr, size);
+}
Let's merge this and dispatch_bpf together. This way, we keep __sys_bpf mostly unchanged, removing the big diff in the switch (replacing &attr with attr). This is partly the reason why I suggested not having a separate compat handler: no need to create a common function, reducing the diff further. We probably don't need to create bpf_check_perms() either.
Kevin
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	return __sys_bpf(cmd, USER_BPFPTR(uattr), size);
On 15/09/2023 15:45, Kevin Brodsky wrote:
On 12/09/2023 11:51, Zachary Leaf wrote:
[...]
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f1c8733f76b8..867c51d45a83 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3,6 +3,7 @@
  */
 #include <linux/bpf.h>
 #include <linux/bpf-cgroup.h>
+#include <linux/bpf_compat.h>
 #include <linux/bpf_trace.h>
 #include <linux/bpf_lirc.h>
 #include <linux/bpf_verifier.h>
@@ -11,6 +12,7 @@
 #include <linux/syscalls.h>
 #include <linux/slab.h>
 #include <linux/sched/signal.h>
+#include <linux/stddef.h>
We can skip this. It's included from <linux/kernel.h> so we can consider it available by default.
Ack
[...]
+static int copy_bpf_attr_from_user(union bpf_attr *attr, int cmd,
+				   bpfptr_t uattr, unsigned int *size)
+{
+	union compat_bpf_attr cattr;
+	void *select_attr = in_compat64_syscall() ? &cattr : attr;
Nit: select_attr sounds a bit strange, I guess as in "selected", but that doesn't describe what it is. Maybe target_attr? That would apply to the other conversion functions as well.
That was how I was thinking about it - "selecting" the correct attr, but target is probably clearer.
The same applies to the bpf_{prog,btf,link}_info conversion functions, which are changed in v3: select_info -> target_info
+	size_t attr_size = in_compat64_syscall() ? sizeof(union compat_bpf_attr)
+						 : sizeof(union bpf_attr);
+	int err;
+
+	err = bpf_check_uarg_tail_zero(uattr, attr_size, *size);
+	if (err)
+		return err;
+	*size = min_t(u32, *size, attr_size);
+
+	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
+	memset(select_attr, 0, attr_size);
+	if (copy_from_bpfptr(select_attr, uattr, *size) != 0)
+		return -EFAULT;
+
+	if (in_compat64_syscall())
+		convert_compat_bpf_attr(attr, &cattr, cmd);
+
+	return 0;
+}
+static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
+{
+	union bpf_attr attr;
+	int err;
+
+	err = bpf_check_perms(cmd);
+	if (err)
+		return err;
+
+	err = copy_bpf_attr_from_user(&attr, cmd, uattr, &size);
+	if (err)
+		return err;
+
+	return dispatch_bpf(cmd, &attr, uattr, size);
+}
Let's merge this and dispatch_bpf together. This way, we keep __sys_bpf mostly unchanged, removing the big diff in the switch (replacing &attr with attr). This is partly the reason why I suggested not having a separate compat handler: no need to create a common function, reducing the diff further. We probably don't need to create bpf_check_perms() either.
Right - of course. That simplifies the diff a lot. I probably prefer it split out instead of one giant function but it's not worth the maintenance burden on that.
Thanks, Zach
Kevin
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	return __sys_bpf(cmd, USER_BPFPTR(uattr), size);
The patch "bpf: compat64: add handler and convert bpf_attr in" handled the case where a compat64 version of union bpf_attr is passed into the kernel via the bpf syscall. It resolved the differences in size and member offsets by copying the compat version into a native version of the struct to be used in the kernel.
When writing back out to the original __user *bpf_attr union, we must reverse the conversion to correctly write to the appropriate offsets.
The "technically correct" approach would probably be to replace all functions that have a union bpf_attr __user* parameter with void __user* (at least where we pass uattr.user in dispatch_bpf() and propagate this down). This forces us to check the type and cast before we use uattr, since the type is technically unknown. This avoids any potential mis-use by not checking for compat64 before trying to access a compat_bpf_attr ptr with native union offsets. The downside of this approach is we would end up touching many function signatures as uattr propagates down the call stack (currently 73).
The minimal diff approach used here is to handle only specifically where we write directly to uattr with macros to select the correct struct type (compat or native). This greatly reduces the diff size at the cost of having compat_bpf_attr unions passed around as union bpf_attr and needing to remember to cast back where required.
The instances where we write out to uattr were found by grepping 'copy_to_user(&uattr', 'put_user(.*uattr' and 'copy_to_bpfptr' so the list may or may not be complete. Luckily this doesn't appear to happen that often and usage seems to be consistent.
We can also replace all instances of copy_to_user() here with put_user() which somewhat simplifies things. Similarly since we're only copying individual members with copy_to_bpfptr_offset(), bpfptr_put_uattr() usage is simplified by not having to specify the size.
Note: conversion out is only required where we're writing out to the original userspace union bpf_attr i.e. uattr. There are many instances where we receive a __user ptr from userspace via bpf_attr, e.g. a ptr to some userspace buffer. This pointer is copied into a native struct (it may be converted from a compat layout), and we can then *write directly to pointer target* without requiring any additional conversion out. copy_to_user() or put_user() calls where the target is a __user ptr (not a ptr to the __user union bpf_attr) therefore do not require any additional handling, since the offsets and struct layouts are irrelevant.
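To make the distinction concrete (a sketch, simplified from the hunks below):

  /* Case 1: writing a member of the userspace union bpf_attr itself -
   * offsets differ between compat64 and native, so the layout-aware
   * macro is required:
   */
  if (bpf_put_uattr(uattr, cnt, query.prog_cnt))
  	return -EFAULT;

  /* Case 2: writing through a user pointer that was merely carried in
   * bpf_attr - the destination is a plain user buffer, so no layout
   * conversion is involved:
   */
  ubatch = u64_to_user_ptr(attr->batch.out_batch);
  if (copy_to_user(ubatch, &batch, sizeof(batch)))
  	return -EFAULT;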
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 drivers/media/rc/bpf-lirc.c    |  5 +++--
 include/linux/bpf.h            | 13 +++++++++++++
 include/linux/bpfptr.h         |  8 ++++++++
 kernel/bpf/btf.c               |  6 +++---
 kernel/bpf/cgroup.c            |  5 +++--
 kernel/bpf/hashtab.c           |  5 +++--
 kernel/bpf/net_namespace.c     |  5 +++--
 kernel/bpf/syscall.c           | 22 +++++++++++-----------
 kernel/bpf/verifier.c          | 18 ++++++------------
 net/bpf/bpf_dummy_struct_ops.c |  3 ++-
 net/bpf/test_run.c             | 16 ++++++++--------
 net/core/sock_map.c            |  5 +++--
 12 files changed, 66 insertions(+), 45 deletions(-)
diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index fe17c7f98e81..f419d7452295 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -4,6 +4,7 @@
 // Copyright (C) 2018 Sean Young <sean@mess.org>
 
 #include <linux/bpf.h>
+#include <linux/bpf_compat.h>
 #include <linux/filter.h>
 #include <linux/bpf_lirc.h>
 #include "rc-core-priv.h"
@@ -319,12 +320,12 @@ int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
 	progs = lirc_rcu_dereference(rcdev->raw->progs);
 	cnt = progs ? bpf_prog_array_length(progs) : 0;
 
-	if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt))) {
+	if (bpf_put_uattr(uattr, cnt, query.prog_cnt)) {
 		ret = -EFAULT;
 		goto unlock;
 	}
 
-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags))) {
+	if (bpf_put_uattr(uattr, flags, query.attach_flags)) {
 		ret = -EFAULT;
 		goto unlock;
 	}
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e53ceee1df37..d73442571290 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -58,6 +58,19 @@ extern struct kobject *btf_kobj;
 extern struct bpf_mem_alloc bpf_global_ma;
 extern bool bpf_global_ma_set;
 
+#define __bpf_put_uattr(uattr, x, to_field) \
+	(put_user(x, &((uattr)->to_field)))
+
+#define bpf_put_uattr(uattr, x, to_field) \
+	(in_compat_syscall() ? \
+		__bpf_put_uattr((union compat_bpf_attr __user *)uattr, x, to_field) : \
+		__bpf_put_uattr((union bpf_attr __user *)uattr, x, to_field))
+
+#define bpf_field_exists(uattr_size, field) \
+	(in_compat_syscall() ? \
+		(uattr_size >= offsetofend(union compat_bpf_attr, field)) : \
+		(uattr_size >= offsetofend(union bpf_attr, field)))
+
 typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
 typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
 					struct bpf_iter_aux_info *aux);
diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h
index 79b2f78eec1a..7fdf9692d76e 100644
--- a/include/linux/bpfptr.h
+++ b/include/linux/bpfptr.h
@@ -8,6 +8,14 @@
 
 typedef sockptr_t bpfptr_t;
 
+#define __bpfptr_put_uattr(type, x, uattr, to_field) \
+	(copy_to_bpfptr_offset(uattr, offsetof(type, to_field), &x, sizeof(x)))
+
+#define bpfptr_put_uattr(x, uattr, to_field) \
+	(in_compat_syscall() ? \
+		__bpfptr_put_uattr(union compat_bpf_attr, x, uattr, to_field) : \
+		__bpfptr_put_uattr(union bpf_attr, x, uattr, to_field))
+
 static inline bool bpfptr_is_kernel(bpfptr_t bpfptr)
 {
 	return bpfptr.is_kernel;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 72b32b7cd9cd..8c4a90172eb9 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -16,6 +16,7 @@
 #include <linux/kernel.h>
 #include <linux/idr.h>
 #include <linux/sort.h>
+#include <linux/bpf_compat.h>
 #include <linux/bpf_verifier.h>
 #include <linux/btf.h>
 #include <linux/btf_ids.h>
@@ -5454,9 +5455,8 @@ static int finalize_log(struct bpf_verifier_log *log, bpfptr_t uattr, u32 uattr_
 
 	err = bpf_vlog_finalize(log, &log_true_size);
 
-	if (uattr_size >= offsetofend(union bpf_attr, btf_log_true_size) &&
-	    copy_to_bpfptr_offset(uattr, offsetof(union bpf_attr, btf_log_true_size),
-				  &log_true_size, sizeof(log_true_size)))
+	if (bpf_field_exists(uattr_size, btf_log_true_size) &&
+	    bpfptr_put_uattr(log_true_size, uattr, btf_log_true_size))
 		err = -EFAULT;
 
 	return err;
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 517b6a5928cc..fa5e67fbbbca 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -14,6 +14,7 @@
 #include <linux/string.h>
 #include <linux/bpf.h>
 #include <linux/bpf-cgroup.h>
+#include <linux/bpf_compat.h>
 #include <linux/bpf_lsm.h>
 #include <linux/bpf_verifier.h>
 #include <net/sock.h>
@@ -1061,9 +1062,9 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 
 	/* always output uattr->query.attach_flags as 0 during effective query */
 	flags = effective_query ? 0 : flags;
-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+	if (bpf_put_uattr(uattr, flags, query.attach_flags))
 		return -EFAULT;
-	if (copy_to_user(&uattr->query.prog_cnt, &total_cnt, sizeof(total_cnt)))
+	if (bpf_put_uattr(uattr, total_cnt, query.prog_cnt))
 		return -EFAULT;
 	if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
 		/* return early if user requested only program count + flags */
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 9901efee4339..acd3561a1254 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2016 Facebook
  */
 #include <linux/bpf.h>
+#include <linux/bpf_compat.h>
 #include <linux/btf.h>
 #include <linux/jhash.h>
 #include <linux/filter.h>
@@ -1693,7 +1694,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	if (!max_count)
 		return 0;
 
-	if (put_user(0, &uattr->batch.count))
+	if (bpf_put_uattr(uattr, 0, batch.count))
 		return -EFAULT;
 
 	batch = 0;
@@ -1875,7 +1876,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	/* copy # of entries and next batch */
 	ubatch = u64_to_user_ptr(attr->batch.out_batch);
 	if (copy_to_user(ubatch, &batch, sizeof(batch)) ||
-	    put_user(total, &uattr->batch.count))
+	    bpf_put_uattr(uattr, total, batch.count))
 		ret = -EFAULT;
 
 out:
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index 868cc2c43899..29ae0e3fe5bd 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
 #include <linux/bpf.h>
+#include <linux/bpf_compat.h>
 #include <linux/bpf-netns.h>
 #include <linux/filter.h>
 #include <net/net_namespace.h>
@@ -257,9 +258,9 @@ static int __netns_bpf_prog_query(const union bpf_attr *attr,
 	if (run_array)
 		prog_cnt = bpf_prog_array_length(run_array);
- if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags))) + if (bpf_put_uattr(uattr, flags, query.attach_flags)) return -EFAULT; - if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt))) + if (bpf_put_uattr(uattr, prog_cnt, query.prog_cnt)) return -EFAULT; if (!attr->query.prog_cnt || !prog_ids || !prog_cnt) return 0; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 867c51d45a83..e3ac922bd62e 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1651,7 +1651,7 @@ int generic_map_delete_batch(struct bpf_map *map, break; cond_resched(); } - if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp))) + if (bpf_put_uattr(uattr, cp, batch.count)) err = -EFAULT;
kvfree(key); @@ -1709,7 +1709,7 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file, cond_resched(); }
- if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp))) + if (bpf_put_uattr(uattr, cp, batch.count)) err = -EFAULT;
kvfree(value); @@ -1744,7 +1744,7 @@ int generic_map_lookup_batch(struct bpf_map *map, if (!max_count) return 0;
- if (put_user(0, &uattr->batch.count)) + if (bpf_put_uattr(uattr, 0, batch.count)) return -EFAULT;
buf_prevkey = kvmalloc(map->key_size, GFP_USER | __GFP_NOWARN); @@ -1809,8 +1809,8 @@ int generic_map_lookup_batch(struct bpf_map *map, if (err == -EFAULT) goto free_buf;
- if ((copy_to_user(&uattr->batch.count, &cp, sizeof(cp)) || - (cp && copy_to_user(uobatch, prev_key, map->key_size)))) + if (bpf_put_uattr(uattr, cp, batch.count) || + (cp && copy_to_user(uobatch, prev_key, map->key_size))) err = -EFAULT;
free_buf: @@ -3718,7 +3718,7 @@ static int bpf_obj_get_next_id(const union bpf_attr *attr, spin_unlock_bh(lock);
if (!err) - err = put_user(next_id, &uattr->next_id); + err = bpf_put_uattr(uattr, next_id, next_id);
return err; } @@ -4410,7 +4410,7 @@ static int bpf_task_fd_query_copy(const union bpf_attr *attr, u32 len = buf ? strlen(buf) : 0, input_len; int err = 0;
- if (put_user(len, &uattr->task_fd_query.buf_len)) + if (bpf_put_uattr(uattr, len, task_fd_query.buf_len)) return -EFAULT; input_len = attr->task_fd_query.buf_len; if (input_len && ubuf) { @@ -4438,10 +4438,10 @@ static int bpf_task_fd_query_copy(const union bpf_attr *attr, } }
- if (put_user(prog_id, &uattr->task_fd_query.prog_id) || - put_user(fd_type, &uattr->task_fd_query.fd_type) || - put_user(probe_offset, &uattr->task_fd_query.probe_offset) || - put_user(probe_addr, &uattr->task_fd_query.probe_addr)) + if (bpf_put_uattr(uattr, prog_id, task_fd_query.prog_id) || + bpf_put_uattr(uattr, fd_type, task_fd_query.fd_type) || + bpf_put_uattr(uattr, probe_offset, task_fd_query.probe_offset) || + bpf_put_uattr(uattr, probe_addr, task_fd_query.probe_addr)) return -EFAULT;
return err; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index cf5f230360f5..e524f2ef0e73 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -9,6 +9,7 @@ #include <linux/types.h> #include <linux/slab.h> #include <linux/bpf.h> +#include <linux/bpf_compat.h> #include <linux/btf.h> #include <linux/bpf_verifier.h> #include <linux/filter.h> @@ -14253,9 +14254,7 @@ static int check_btf_func(struct bpf_verifier_env *env, /* set the size kernel expects so loader can zero * out the rest of the record. */ - if (copy_to_bpfptr_offset(uattr, - offsetof(union bpf_attr, func_info_rec_size), - &min_size, sizeof(min_size))) + if (bpfptr_put_uattr(min_size, uattr, func_info_rec_size)) ret = -EFAULT; } goto err_free; @@ -14387,9 +14386,7 @@ static int check_btf_line(struct bpf_verifier_env *env, if (err) { if (err == -E2BIG) { verbose(env, "nonzero tailing record in line_info"); - if (copy_to_bpfptr_offset(uattr, - offsetof(union bpf_attr, line_info_rec_size), - &expected_size, sizeof(expected_size))) + if (bpfptr_put_uattr(expected_size, uattr, line_info_rec_size)) err = -EFAULT; } goto err_free; @@ -14510,9 +14507,7 @@ static int check_core_relo(struct bpf_verifier_env *env, if (err) { if (err == -E2BIG) { verbose(env, "nonzero tailing record in core_relo"); - if (copy_to_bpfptr_offset(uattr, - offsetof(union bpf_attr, core_relo_rec_size), - &expected_size, sizeof(expected_size))) + if (bpfptr_put_uattr(expected_size, uattr, core_relo_rec_size)) err = -EFAULT; } break; @@ -18957,9 +18952,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3 if (err) ret = err;
- if (uattr_size >= offsetofend(union bpf_attr, log_true_size) && - copy_to_bpfptr_offset(uattr, offsetof(union bpf_attr, log_true_size), - &log_true_size, sizeof(log_true_size))) { + if (bpf_field_exists(uattr_size, log_true_size) && + bpfptr_put_uattr(log_true_size, uattr, log_true_size)) { ret = -EFAULT; goto err_release_maps; } diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c index 5918d1b32e19..50848fbeb26c 100644 --- a/net/bpf/bpf_dummy_struct_ops.c +++ b/net/bpf/bpf_dummy_struct_ops.c @@ -5,6 +5,7 @@ #include <linux/kernel.h> #include <linux/bpf_verifier.h> #include <linux/bpf.h> +#include <linux/bpf_compat.h> #include <linux/btf.h>
extern struct bpf_struct_ops bpf_bpf_dummy_ops; @@ -130,7 +131,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, err = dummy_ops_copy_args(args); if (err) goto out; - if (put_user(prog_ret, &uattr->test.retval)) + if (bpf_put_uattr(uattr, prog_ret, test.retval)) err = -EFAULT; out: kfree(args); diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index e79e3a415ca9..899f6d2d3ff4 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -2,6 +2,7 @@ /* Copyright (c) 2017 Facebook */ #include <linux/bpf.h> +#include <linux/bpf_compat.h> #include <linux/btf.h> #include <linux/btf_ids.h> #include <linux/slab.h> @@ -485,11 +486,11 @@ static int bpf_test_finish(const union bpf_attr *kattr, } }
- if (copy_to_user(&uattr->test.data_size_out, &size, sizeof(size))) + if (bpf_put_uattr(uattr, size, test.data_size_out)) goto out; - if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval))) + if (bpf_put_uattr(uattr, retval, test.retval)) goto out; - if (copy_to_user(&uattr->test.duration, &duration, sizeof(duration))) + if (bpf_put_uattr(uattr, duration, test.duration)) goto out; if (err != -ENOSPC) err = 0; @@ -871,7 +872,7 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog, }
retval = ((u32)side_effect << 16) | ret; - if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval))) + if (bpf_put_uattr(uattr, retval, test.retval)) goto out;
err = 0; @@ -946,8 +947,7 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog, } put_cpu();
- if (!err && - copy_to_user(&uattr->test.retval, &info.retval, sizeof(u32))) + if (!err && bpf_put_uattr(uattr, info.retval, test.retval)) err = -EFAULT;
kfree(info.ctx); @@ -1003,7 +1003,7 @@ static int bpf_ctx_finish(const union bpf_attr *kattr,
if (copy_to_user(data_out, data, copy_size)) goto out; - if (copy_to_user(&uattr->test.ctx_size_out, &size, sizeof(size))) + if (bpf_put_uattr(uattr, size, test.ctx_size_out)) goto out; if (err != -ENOSPC) err = 0; @@ -1681,7 +1681,7 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog, retval = bpf_prog_run_pin_on_cpu(prog, ctx); rcu_read_unlock_trace();
- if (copy_to_user(&uattr->test.retval, &retval, sizeof(u32))) { + if (bpf_put_uattr(uattr, retval, test.retval)) { err = -EFAULT; goto out; } diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 00afb66cd095..aa7522a61e30 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -2,6 +2,7 @@ /* Copyright (c) 2017 - 2018 Covalent IO, Inc. http://covalent.io */
#include <linux/bpf.h> +#include <linux/bpf_compat.h> #include <linux/btf_ids.h> #include <linux/filter.h> #include <linux/errno.h> @@ -1547,9 +1548,9 @@ int sock_map_bpf_prog_query(const union bpf_attr *attr, end: rcu_read_unlock();
- if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)) || + if (bpf_put_uattr(uattr, flags, query.attach_flags) || (id != 0 && copy_to_user(prog_ids, &id, sizeof(u32))) || - copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt))) + bpf_put_uattr(uattr, prog_cnt, query.prog_cnt)) ret = -EFAULT;
fdput(f);
On 12/09/2023 11:51, Zachary Leaf wrote:
The patch "bpf: compat64: add handler and convert bpf_attr in" handled the case where a compat64 version of union bpf_attr is passed into the kernel via the bpf syscall. It resolved the differences in size and member offsets by copying the compat version into a native version of the struct to be used in the kernel.
When writing back out to the original __user *bpf_attr union, we must reverse the conversion to correctly write to the appropriate offsets.
The "technically correct" approach would probably be to replace all functions that have a union bpf_attr __user* parameter with void __user* (at least where we pass uattr.user in dispatch_bpf() and propagate this down). This forces us to check the type and cast before we use uattr, since the type is technically unknown. This avoids any potential mis-use by not checking for compat64 before trying to access a compat_bpf_attr ptr with native union offsets. The downside of this approach is we would end up touching many function signatures as uattr propagates down the call stack (currently 73).
The minimal diff approach used here is to handle only the specific places where we write directly to uattr, using macros to select the correct struct type (compat or native). This greatly reduces the diff size, at the cost of having compat_bpf_attr unions passed around as union bpf_attr and needing to remember to cast back where required.
The instances where we write out to uattr were found by grepping for 'copy_to_user(&uattr', 'put_user(.*uattr' and 'copy_to_bpfptr', so the list may not be complete. Fortunately this does not appear to happen often, and usage seems to be consistent.

We can also replace all instances of copy_to_user() here with put_user(), which somewhat simplifies things. Similarly, since we're only copying individual members with copy_to_bpfptr_offset(), bpfptr_put_uattr() usage is simplified by not having to specify the size.

Note: conversion out is only required where we're writing out to the original userspace union bpf_attr, i.e. uattr. There are many instances where we receive a __user ptr from userspace via bpf_attr, e.g. a ptr to some userspace buffer. This pointer is copied into a native struct (it may be converted from a compat layout), and we can then *write directly to the pointer target* without requiring any additional conversion out. copy_to_user() or put_user() calls where the target is a __user ptr (not a ptr to the __user union bpf_attr) therefore do not require any additional handling, since the offsets and struct layouts are irrelevant.
Much clearer now, thanks!
This patch actually simplifies the existing code quite a bit, regardless of compat. It would be interesting to try and upstream these helpers (without the compat handling naturally). If they were accepted, it would make our patches more robust when rebasing (besides the obvious reduction in diff).
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
drivers/media/rc/bpf-lirc.c | 5 +++-- include/linux/bpf.h | 13 +++++++++++++ include/linux/bpfptr.h | 8 ++++++++ kernel/bpf/btf.c | 6 +++--- kernel/bpf/cgroup.c | 5 +++-- kernel/bpf/hashtab.c | 5 +++-- kernel/bpf/net_namespace.c | 5 +++-- kernel/bpf/syscall.c | 22 +++++++++++----------- kernel/bpf/verifier.c | 18 ++++++------------ net/bpf/bpf_dummy_struct_ops.c | 3 ++- net/bpf/test_run.c | 16 ++++++++-------- net/core/sock_map.c | 5 +++-- 12 files changed, 66 insertions(+), 45 deletions(-)
diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c index fe17c7f98e81..f419d7452295 100644 --- a/drivers/media/rc/bpf-lirc.c +++ b/drivers/media/rc/bpf-lirc.c @@ -4,6 +4,7 @@ // Copyright (C) 2018 Sean Young sean@mess.org #include <linux/bpf.h> +#include <linux/bpf_compat.h>
I was going to say "no longer needed", but then realised that in fact these new macros all need bpf_compat.h for the definition of union compat_bpf_attr. Unless there's a particular issue with that, I think we should just have that #include in bpf.h, as it doesn't make much sense to have the macros there otherwise.
#include <linux/filter.h> #include <linux/bpf_lirc.h> #include "rc-core-priv.h" @@ -319,12 +320,12 @@ int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) progs = lirc_rcu_dereference(rcdev->raw->progs); cnt = progs ? bpf_prog_array_length(progs) : 0;
- if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt))) {
- if (bpf_put_uattr(uattr, cnt, query.prog_cnt)) { ret = -EFAULT; goto unlock; }
- if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags))) {
- if (bpf_put_uattr(uattr, flags, query.attach_flags)) { ret = -EFAULT; goto unlock; }
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index e53ceee1df37..d73442571290 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -58,6 +58,19 @@ extern struct kobject *btf_kobj; extern struct bpf_mem_alloc bpf_global_ma; extern bool bpf_global_ma_set; +#define __bpf_put_uattr(uattr, x, to_field) \
+	(put_user(x, &((uattr)->to_field)))
+
+#define bpf_put_uattr(uattr, x, to_field) \
+	(in_compat_syscall() ? \
+	__bpf_put_uattr((union compat_bpf_attr __user *)uattr, x, to_field) : \
+	__bpf_put_uattr((union bpf_attr __user *)uattr, x, to_field))
+#define bpf_field_exists(uattr_size, field) \
The naming is a bit strange, though to be fair it's not easy to come up with something meaningful. Maybe bpf_has_field?
+	(in_compat_syscall() ? \
+	(uattr_size >= offsetofend(union compat_bpf_attr, field)) : \
+	(uattr_size >= offsetofend(union bpf_attr, field)))
typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64); typedef int (*bpf_iter_init_seq_priv_t)(void *private_data, struct bpf_iter_aux_info *aux); diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index 79b2f78eec1a..7fdf9692d76e 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -8,6 +8,14 @@ typedef sockptr_t bpfptr_t; +#define __bpfptr_put_uattr(type, x, uattr, to_field) \
+	(copy_to_bpfptr_offset(uattr, offsetof(type, to_field), &x, sizeof(x)))
+#define bpfptr_put_uattr(x, uattr, to_field) \
The order of arguments should probably be the same in both bpf_put_uattr() and bpfptr_put_uattr(). I'm not sure which makes more sense, both look plausible.
Kevin
On 15/09/2023 16:02, Kevin Brodsky wrote:
On 12/09/2023 11:51, Zachary Leaf wrote:
The patch "bpf: compat64: add handler and convert bpf_attr in" handled the case where a compat64 version of union bpf_attr is passed into the kernel via the bpf syscall. It resolved the differences in size and member offsets by copying the compat version into a native version of the struct to be used in the kernel.
When writing back out to the original __user *bpf_attr union, we must reverse the conversion to correctly write to the appropriate offsets.
The "technically correct" approach would probably be to replace all functions that have a union bpf_attr __user* parameter with void __user* (at least where we pass uattr.user in dispatch_bpf() and propagate this down). This forces us to check the type and cast before we use uattr, since the type is technically unknown. This avoids any potential mis-use by not checking for compat64 before trying to access a compat_bpf_attr ptr with native union offsets. The downside of this approach is we would end up touching many function signatures as uattr propagates down the call stack (currently 73).
The minimal diff approach used here is to handle only specifically where we write directly to uattr with macros to select the correct struct type (compat or native). This greatly reduces the diff size at the cost of having compat_bpf_attr unions passed around as union bpf_attr and needing to remember to cast back where required.
The instances where we write out to uattr were found by grepping for 'copy_to_user(&uattr', 'put_user(.*uattr' and 'copy_to_bpfptr', so the list may not be complete. Fortunately this does not appear to happen often, and usage seems to be consistent.

We can also replace all instances of copy_to_user() here with put_user(), which somewhat simplifies things. Similarly, since we're only copying individual members with copy_to_bpfptr_offset(), bpfptr_put_uattr() usage is simplified by not having to specify the size.

Note: conversion out is only required where we're writing out to the original userspace union bpf_attr, i.e. uattr. There are many instances where we receive a __user ptr from userspace via bpf_attr, e.g. a ptr to some userspace buffer. This pointer is copied into a native struct (it may be converted from a compat layout), and we can then *write directly to the pointer target* without requiring any additional conversion out. copy_to_user() or put_user() calls where the target is a __user ptr (not a ptr to the __user union bpf_attr) therefore do not require any additional handling, since the offsets and struct layouts are irrelevant.
Much clearer now, thanks!
This patch actually simplifies the existing code quite a bit, regardless of compat. It would be interesting to try and upstream these helpers (without the compat handling naturally). If they were accepted, it would make our patches more robust when rebasing (besides the obvious reduction in diff).
Possibly, at least for bpfptr_put_uattr() and bpf_field_exists().
Then the copy_to_user()'s can be replaced with put_user() where appropriate.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
drivers/media/rc/bpf-lirc.c | 5 +++-- include/linux/bpf.h | 13 +++++++++++++ include/linux/bpfptr.h | 8 ++++++++ kernel/bpf/btf.c | 6 +++--- kernel/bpf/cgroup.c | 5 +++-- kernel/bpf/hashtab.c | 5 +++-- kernel/bpf/net_namespace.c | 5 +++-- kernel/bpf/syscall.c | 22 +++++++++++----------- kernel/bpf/verifier.c | 18 ++++++------------ net/bpf/bpf_dummy_struct_ops.c | 3 ++- net/bpf/test_run.c | 16 ++++++++-------- net/core/sock_map.c | 5 +++-- 12 files changed, 66 insertions(+), 45 deletions(-)
diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c index fe17c7f98e81..f419d7452295 100644 --- a/drivers/media/rc/bpf-lirc.c +++ b/drivers/media/rc/bpf-lirc.c @@ -4,6 +4,7 @@ // Copyright (C) 2018 Sean Young sean@mess.org #include <linux/bpf.h> +#include <linux/bpf_compat.h>
I was going to say "no longer needed", but then realised that in fact these new macros all need bpf_compat.h for the definition of union compat_bpf_attr. Unless there's a particular issue with that, I think we should just have that #include in bpf.h, as it doesn't make much sense to have the macros there otherwise.
Sure, makes sense.
#include <linux/filter.h> #include <linux/bpf_lirc.h> #include "rc-core-priv.h" @@ -319,12 +320,12 @@ int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) progs = lirc_rcu_dereference(rcdev->raw->progs); cnt = progs ? bpf_prog_array_length(progs) : 0;
- if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt))) {
- if (bpf_put_uattr(uattr, cnt, query.prog_cnt)) { ret = -EFAULT; goto unlock; }
- if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags))) {
- if (bpf_put_uattr(uattr, flags, query.attach_flags)) { ret = -EFAULT; goto unlock; }
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index e53ceee1df37..d73442571290 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -58,6 +58,19 @@ extern struct kobject *btf_kobj; extern struct bpf_mem_alloc bpf_global_ma; extern bool bpf_global_ma_set; +#define __bpf_put_uattr(uattr, x, to_field) \
+	(put_user(x, &((uattr)->to_field)))
+
+#define bpf_put_uattr(uattr, x, to_field) \
+	(in_compat_syscall() ? \
+	__bpf_put_uattr((union compat_bpf_attr __user *)uattr, x, to_field) : \
+	__bpf_put_uattr((union bpf_attr __user *)uattr, x, to_field))
+#define bpf_field_exists(uattr_size, field) \
The naming is a bit strange, though to be fair it's not easy to come up with something meaningful. Maybe bpf_has_field?
There is some precedent in libbpf/userspace, see tools/lib/bpf/bpf_core_read.h - there's bpf_core_field_exists() to check a similar thing but the other way around i.e. userspace checks if kernel has a field.
This is the exact opposite, kernel is checking if userspace has a field.
Also see: https://nakryiko.com/posts/bpf-core-reference-guide/#bpf-core-field-exists
+	(in_compat_syscall() ? \
+	(uattr_size >= offsetofend(union compat_bpf_attr, field)) : \
+	(uattr_size >= offsetofend(union bpf_attr, field)))
typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64); typedef int (*bpf_iter_init_seq_priv_t)(void *private_data, struct bpf_iter_aux_info *aux); diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index 79b2f78eec1a..7fdf9692d76e 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -8,6 +8,14 @@ typedef sockptr_t bpfptr_t; +#define __bpfptr_put_uattr(type, x, uattr, to_field) \
+	(copy_to_bpfptr_offset(uattr, offsetof(type, to_field), &x, sizeof(x)))
+#define bpfptr_put_uattr(x, uattr, to_field) \
The order of arguments should probably be the same in both bpf_put_uattr() and bpfptr_put_uattr(). I'm not sure which makes more sense, both look plausible.
Better to make it match closer with put_user() I think, since that is what it's trying to replicate
e.g.

-	if (put_user(0, &uattr->batch.count))
+	if (bpf_put_uattr(0, uattr, batch.count))
Thanks, Zach
Kevin
On 26/09/2023 11:44, Zachary Leaf wrote:
+#define bpf_field_exists(uattr_size, field) \
The naming is a bit strange, though to be fair it's not easy to come up with something meaningful. Maybe bpf_has_field?
There is some precedent in libbpf/userspace, see tools/lib/bpf/bpf_core_read.h - there's bpf_core_field_exists() to check a similar thing but the other way around i.e. userspace checks if kernel has a field.
This is the exact opposite, kernel is checking if userspace has a field.
Also see: https://nakryiko.com/posts/bpf-core-reference-guide/#bpf-core-field-exists
Fair enough, makes sense to reuse that naming.
+	(in_compat_syscall() ? \
+	(uattr_size >= offsetofend(union compat_bpf_attr, field)) : \
+	(uattr_size >= offsetofend(union bpf_attr, field)))
typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64); typedef int (*bpf_iter_init_seq_priv_t)(void *private_data, struct bpf_iter_aux_info *aux); diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index 79b2f78eec1a..7fdf9692d76e 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -8,6 +8,14 @@ typedef sockptr_t bpfptr_t; +#define __bpfptr_put_uattr(type, x, uattr, to_field) \
+	(copy_to_bpfptr_offset(uattr, offsetof(type, to_field), &x, sizeof(x)))
+#define bpfptr_put_uattr(x, uattr, to_field) \
The order of arguments should probably be the same in both bpf_put_uattr() and bpfptr_put_uattr(). I'm not sure which makes more sense, both look plausible.
Better to make it match closer with put_user() I think, since that is what it's trying to replicate
e.g.
- if (put_user(0, &uattr->batch.count))
+	if (bpf_put_uattr(0, uattr, batch.count))
I agree, that feels most natural.
Kevin
union bpf_attr member info.info is a __user *ptr to either:
struct bpf_btf_info
struct bpf_prog_info
struct bpf_map_info
struct bpf_link_info
These info structs are passed in from userspace and used to store information returned from calls to BPF_OBJ_GET_INFO_BY_FD.
While inbound conversion for all other members of union bpf_attr is handled upfront, this same approach is not possible here for two main reasons:
1. Due to the extra complexity of a further userspace struct that is one of 4 types. These types themselves can be a union/multiplex of even more additional options and types. It is not straightforward, or in many cases even possible, to determine upfront which elements of the union are active, and each type has different native/compat64 layouts/offsets (except bpf_map_info - see below).

2. Due to the nature of this subcommand: since its main purpose is to copy information back out to userspace, there is more work to do than for other subcommands to convert the struct back out to compat64-compatible offsets.

Instead of an upfront conversion, each struct is therefore converted where it is used, at the end site, both in and out.
Note: since struct bpf_map_info contains no pointers, the layout is the same in native/compat64. No conversion or additional handling is therefore required.
Signed-off-by: Zachary Leaf zachary.leaf@arm.com --- include/linux/bpf_compat.h | 112 ++++++++++++++ kernel/bpf/btf.c | 88 +++++++++-- kernel/bpf/syscall.c | 297 +++++++++++++++++++++++++++++++++---- 3 files changed, 458 insertions(+), 39 deletions(-)
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h index 85e0198bede7..710815417a27 100644 --- a/include/linux/bpf_compat.h +++ b/include/linux/bpf_compat.h @@ -298,3 +298,115 @@ union compat_bpf_attr {
} __attribute__((aligned(8)));
+struct compat_bpf_prog_info { + __u32 type; + __u32 id; + __u8 tag[BPF_TAG_SIZE]; + __u32 jited_prog_len; + __u32 xlated_prog_len; + __aligned_u64 jited_prog_insns; + __aligned_u64 xlated_prog_insns; + __u64 load_time; /* ns since boottime */ + __u32 created_by_uid; + __u32 nr_map_ids; + __aligned_u64 map_ids; + char name[BPF_OBJ_NAME_LEN]; + __u32 ifindex; + __u32 gpl_compatible:1; + __u32:31; /* alignment pad */ + __u64 netns_dev; + __u64 netns_ino; + __u32 nr_jited_ksyms; + __u32 nr_jited_func_lens; + __aligned_u64 jited_ksyms; + __aligned_u64 jited_func_lens; + __u32 btf_id; + __u32 func_info_rec_size; + __aligned_u64 func_info; + __u32 nr_func_info; + __u32 nr_line_info; + __aligned_u64 line_info; + __aligned_u64 jited_line_info; + __u32 nr_jited_line_info; + __u32 line_info_rec_size; + __u32 jited_line_info_rec_size; + __u32 nr_prog_tags; + __aligned_u64 prog_tags; + __u64 run_time_ns; + __u64 run_cnt; + __u64 recursion_misses; + __u32 verified_insns; + __u32 attach_btf_obj_id; + __u32 attach_btf_id; +} __attribute__((aligned(8))); + +struct compat_bpf_btf_info { + __aligned_u64 btf; + __u32 btf_size; + __u32 id; + __aligned_u64 name; + __u32 name_len; + __u32 kernel_btf; +} __attribute__((aligned(8))); + +struct compat_bpf_link_info { + __u32 type; + __u32 id; + __u32 prog_id; + union { + struct { + __aligned_u64 tp_name; /* in/out: tp_name buffer ptr */ + __u32 tp_name_len; /* in/out: tp_name buffer len */ + } raw_tracepoint; + struct { + __u32 attach_type; + __u32 target_obj_id; /* prog_id for PROG_EXT, otherwise btf object id */ + __u32 target_btf_id; /* BTF type id inside the object */ + } tracing; + struct { + __u64 cgroup_id; + __u32 attach_type; + } cgroup; + struct { + __aligned_u64 target_name; /* in/out: target_name buffer ptr */ + __u32 target_name_len; /* in/out: target_name buffer len */ + + /* If the iter specific field is 32 bits, it can be put + * in the first or second union. Otherwise it should be + * put in the second union. + */ + union { + struct { + __u32 map_id; + } map; + }; + union { + struct { + __u64 cgroup_id; + __u32 order; + } cgroup; + struct { + __u32 tid; + __u32 pid; + } task; + }; + } iter; + struct { + __u32 netns_ino; + __u32 attach_type; + } netns; + struct { + __u32 ifindex; + } xdp; + struct { + __u32 map_id; + } struct_ops; + struct { + __u32 pf; + __u32 hooknum; + __s32 priority; + __u32 flags; + } netfilter; + }; +} __attribute__((aligned(8))); + diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 8c4a90172eb9..caeb98dcf3cf 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -7215,25 +7215,89 @@ struct btf *btf_get_by_fd(int fd) return btf; }
+static void convert_compat_btf_info_in(struct bpf_btf_info *dest, + const struct compat_bpf_btf_info *cinfo) +{ + copy_field(dest, cinfo, btf); + copy_field(dest, cinfo, btf_size); + copy_field(dest, cinfo, id); + copy_field(dest, cinfo, name); + copy_field(dest, cinfo, name_len); + copy_field(dest, cinfo, kernel_btf); +} + +static void convert_compat_btf_info_out(struct compat_bpf_btf_info *dest, + const struct bpf_btf_info *info) +{ + copy_field(dest, info, btf); + copy_field(dest, info, btf_size); + copy_field(dest, info, id); + copy_field(dest, info, name); + copy_field(dest, info, name_len); + copy_field(dest, info, kernel_btf); +} + +static int copy_bpf_btf_info_from_user(const union bpf_attr *attr, + struct bpf_btf_info *info, + u32 *info_len) +{ + struct compat_bpf_btf_info cinfo; + int err; + void *select_info = in_compat64_syscall() ? &cinfo : info; + size_t info_size = in_compat64_syscall() ? sizeof(struct compat_bpf_btf_info) + : sizeof(struct bpf_btf_info); + void __user *uinfo = u64_to_user_ptr(attr->info.info); + *info_len = attr->info.info_len; + + err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), + info_size, *info_len); + if (err) + return err; + *info_len = min_t(u32, info_size, *info_len); + + memset(info, 0, sizeof(struct bpf_btf_info)); + if (copy_from_user(select_info, uinfo, *info_len)) + return -EFAULT; + + if (in_compat64_syscall()) + convert_compat_btf_info_in(info, &cinfo); + + return 0; +} + +static int copy_bpf_btf_info_to_user(const union bpf_attr *attr, + union bpf_attr __user *uattr, + struct bpf_btf_info *info, + u32 *info_len) +{ + struct compat_bpf_btf_info cinfo; + void *select_info = in_compat64_syscall() ? &cinfo : info; + void __user *uinfo = (void __user *)attr->info.info; + + if (in_compat64_syscall()) + convert_compat_btf_info_out(&cinfo, info); + + if (copy_to_user(uinfo, select_info, *info_len) || + bpf_put_uattr(uattr, *info_len, info.info_len)) + return -EFAULT; + + return 0; +} + int btf_get_info_by_fd(const struct btf *btf, const union bpf_attr *attr, union bpf_attr __user *uattr) { - struct bpf_btf_info __user *uinfo; struct bpf_btf_info info; - u32 info_copy, btf_copy; + u32 btf_copy; void __user *ubtf; char __user *uname; u32 uinfo_len, uname_len, name_len; int ret = 0;
- uinfo = u64_to_user_ptr(attr->info.info); - uinfo_len = attr->info.info_len; - - info_copy = min_t(u32, uinfo_len, sizeof(info)); - memset(&info, 0, sizeof(info)); - if (copy_from_user(&info, uinfo, info_copy)) - return -EFAULT; + ret = copy_bpf_btf_info_from_user(attr, &info, &uinfo_len); + if (ret) + return ret;
info.id = btf->id; ubtf = u64_to_user_ptr(info.btf); @@ -7268,9 +7332,9 @@ int btf_get_info_by_fd(const struct btf *btf, } }
- if (copy_to_user(uinfo, &info, info_copy) || - put_user(info_copy, &uattr->info.info_len)) - return -EFAULT; + ret = copy_bpf_btf_info_to_user(attr, uattr, &info, &uinfo_len); + if (ret) + return ret;
return ret; } diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index e3ac922bd62e..17f55deddf48 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3952,28 +3952,157 @@ static int set_info_rec_size(struct bpf_prog_info *info) return 0; }
+static void +convert_compat_prog_info_in(struct bpf_prog_info *dest, + const struct compat_bpf_prog_info *cinfo) +{ + copy_field(dest, cinfo, type); + copy_field(dest, cinfo, id); + strncpy((char *)dest->tag, (char *)cinfo->tag, BPF_TAG_SIZE); + copy_field(dest, cinfo, jited_prog_len); + copy_field(dest, cinfo, xlated_prog_len); + copy_field(dest, cinfo, jited_prog_insns); + copy_field(dest, cinfo, xlated_prog_insns); + copy_field(dest, cinfo, load_time); + copy_field(dest, cinfo, created_by_uid); + copy_field(dest, cinfo, nr_map_ids); + copy_field(dest, cinfo, map_ids); + strncpy((char *)dest->name, (char *)cinfo->name, BPF_OBJ_NAME_LEN); + copy_field(dest, cinfo, ifindex); + copy_field(dest, cinfo, gpl_compatible); + copy_field(dest, cinfo, netns_dev); + copy_field(dest, cinfo, netns_ino); + copy_field(dest, cinfo, nr_jited_ksyms); + copy_field(dest, cinfo, nr_jited_func_lens); + copy_field(dest, cinfo, jited_ksyms); + copy_field(dest, cinfo, jited_func_lens); + copy_field(dest, cinfo, btf_id); + copy_field(dest, cinfo, func_info_rec_size); + copy_field(dest, cinfo, func_info); + copy_field(dest, cinfo, nr_func_info); + copy_field(dest, cinfo, nr_line_info); + copy_field(dest, cinfo, line_info); + copy_field(dest, cinfo, jited_line_info); + copy_field(dest, cinfo, nr_jited_line_info); + copy_field(dest, cinfo, line_info_rec_size); + copy_field(dest, cinfo, jited_line_info_rec_size); + copy_field(dest, cinfo, nr_prog_tags); + copy_field(dest, cinfo, prog_tags); + copy_field(dest, cinfo, run_time_ns); + copy_field(dest, cinfo, run_cnt); + copy_field(dest, cinfo, recursion_misses); + copy_field(dest, cinfo, verified_insns); + copy_field(dest, cinfo, attach_btf_obj_id); + copy_field(dest, cinfo, attach_btf_id); +} + +static void +convert_compat_prog_info_out(struct compat_bpf_prog_info *dest, + const struct bpf_prog_info *info) +{ + copy_field(dest, info, type); + copy_field(dest, info, id); + strncpy((char *)dest->tag, (char *)info->tag, BPF_TAG_SIZE); + copy_field(dest, info, jited_prog_len); + copy_field(dest, info, xlated_prog_len); + copy_field(dest, info, jited_prog_insns); + copy_field(dest, info, xlated_prog_insns); + copy_field(dest, info, load_time); + copy_field(dest, info, created_by_uid); + copy_field(dest, info, nr_map_ids); + copy_field(dest, info, map_ids); + strncpy((char *)dest->name, (char *)info->name, BPF_OBJ_NAME_LEN); + copy_field(dest, info, ifindex); + copy_field(dest, info, gpl_compatible); + copy_field(dest, info, netns_dev); + copy_field(dest, info, netns_ino); + copy_field(dest, info, nr_jited_ksyms); + copy_field(dest, info, nr_jited_func_lens); + copy_field(dest, info, jited_ksyms); + copy_field(dest, info, jited_func_lens); + copy_field(dest, info, btf_id); + copy_field(dest, info, func_info_rec_size); + copy_field(dest, info, func_info); + copy_field(dest, info, nr_func_info); + copy_field(dest, info, nr_line_info); + copy_field(dest, info, line_info); + copy_field(dest, info, jited_line_info); + copy_field(dest, info, nr_jited_line_info); + copy_field(dest, info, line_info_rec_size); + copy_field(dest, info, jited_line_info_rec_size); + copy_field(dest, info, nr_prog_tags); + copy_field(dest, info, prog_tags); + copy_field(dest, info, run_time_ns); + copy_field(dest, info, run_cnt); + copy_field(dest, info, recursion_misses); + copy_field(dest, info, verified_insns); + copy_field(dest, info, attach_btf_obj_id); + copy_field(dest, info, attach_btf_id); +} + +static int copy_bpf_prog_info_from_user(const union bpf_attr *attr, + struct bpf_prog_info *info, + 
u32 *info_len) +{ + struct compat_bpf_prog_info cinfo; + int err; + void *select_info = in_compat64_syscall() ? &cinfo : info; + size_t info_size = in_compat64_syscall() ? sizeof(struct compat_bpf_prog_info) + : sizeof(struct bpf_prog_info); + void __user *uinfo = u64_to_user_ptr(attr->info.info); + *info_len = attr->info.info_len; + + err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), + info_size, *info_len); + if (err) + return err; + *info_len = min_t(u32, info_size, *info_len); + + memset(info, 0, sizeof(struct bpf_prog_info)); + if (copy_from_user(select_info, uinfo, *info_len)) + return -EFAULT; + + if (in_compat64_syscall()) + convert_compat_prog_info_in(info, &cinfo); + + return 0; +} + +static int copy_bpf_prog_info_to_user(const union bpf_attr *attr, + union bpf_attr __user *uattr, + struct bpf_prog_info *info, + u32 *info_len) +{ + struct compat_bpf_prog_info cinfo; + void *select_info = in_compat64_syscall() ? &cinfo : info; + void __user *uinfo = (void __user *)attr->info.info; + + if (in_compat64_syscall()) + convert_compat_prog_info_out(&cinfo, info); + + if (copy_to_user(uinfo, select_info, *info_len) || + bpf_put_uattr(uattr, *info_len, info.info_len)) + return -EFAULT; + + return 0; +} + static int bpf_prog_get_info_by_fd(struct file *file, struct bpf_prog *prog, const union bpf_attr *attr, union bpf_attr __user *uattr) { - struct bpf_prog_info __user *uinfo = u64_to_user_ptr(attr->info.info); struct btf *attach_btf = bpf_prog_get_target_btf(prog); struct bpf_prog_info info; - u32 info_len = attr->info.info_len; + u32 info_len; struct bpf_prog_kstats stats; char __user *uinsns; u32 ulen; int err;
- err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(info), info_len); + err = copy_bpf_prog_info_from_user(attr, &info, &info_len); if (err) return err; - info_len = min_t(u32, sizeof(info), info_len); - - memset(&info, 0, sizeof(info)); - if (copy_from_user(&info, uinfo, info_len)) - return -EFAULT;
info.type = prog->type; info.id = prog->aux->id; @@ -4234,9 +4363,9 @@ static int bpf_prog_get_info_by_fd(struct file *file, }
done: - if (copy_to_user(uinfo, &info, info_len) || - put_user(info_len, &uattr->info.info_len)) - return -EFAULT; + err = copy_bpf_prog_info_to_user(attr, uattr, &info, &info_len); + if (err) + return err;
return 0; } @@ -4291,15 +4420,134 @@ static int bpf_btf_get_info_by_fd(struct file *file, const union bpf_attr *attr, union bpf_attr __user *uattr) { - struct bpf_btf_info __user *uinfo = u64_to_user_ptr(attr->info.info); - u32 info_len = attr->info.info_len; - int err; + return btf_get_info_by_fd(btf, attr, uattr); +} + +static void +convert_compat_link_info_in(struct bpf_link_info *dest, + const struct compat_bpf_link_info *cinfo, + enum bpf_link_type type) +{ + /* + * Only a few fields in bpf_link_info are used as input, the + * rest do not need conversion in + */ + switch (type) { + case BPF_LINK_TYPE_RAW_TRACEPOINT: + copy_field(dest, cinfo, raw_tracepoint.tp_name); + copy_field(dest, cinfo, raw_tracepoint.tp_name_len); + return; + case BPF_LINK_TYPE_ITER: + copy_field(dest, cinfo, iter.target_name); + copy_field(dest, cinfo, iter.target_name_len); + return; + default: + return; + } +} + +static void +convert_compat_link_info_out(struct compat_bpf_link_info *dest, + const struct bpf_link_info *info, + enum bpf_link_type type) +{ + copy_field(dest, info, type); + copy_field(dest, info, id); + copy_field(dest, info, prog_id); + + switch (type) { + case BPF_LINK_TYPE_RAW_TRACEPOINT: + copy_field(dest, info, raw_tracepoint.tp_name); + copy_field(dest, info, raw_tracepoint.tp_name_len); + return; + case BPF_LINK_TYPE_TRACING: + copy_field(dest, info, tracing.attach_type); + copy_field(dest, info, tracing.target_obj_id); + copy_field(dest, info, tracing.target_btf_id); + return; + case BPF_LINK_TYPE_CGROUP: + copy_field(dest, info, cgroup.cgroup_id); + copy_field(dest, info, cgroup.attach_type); + return; + case BPF_LINK_TYPE_ITER: + copy_field(dest, info, iter.target_name); + copy_field(dest, info, iter.target_name_len); + /* remaining struct is fixed size integers, so identical in + * purecap/compat64 - since figuring out what members are active + * is non-trivial, memcpy to the end of the struct */ + memcpy((u8 *)dest+offsetof(struct compat_bpf_link_info, iter.map), + (u8 *)info+offsetof(struct bpf_link_info, iter.map), + offsetofend(struct bpf_link_info, iter.cgroup.order) - + offsetof(struct bpf_link_info, iter.map.map_id)); + return; + case BPF_LINK_TYPE_NETNS: + copy_field(dest, info, netns.netns_ino); + copy_field(dest, info, netns.attach_type); + return; + case BPF_LINK_TYPE_XDP: + copy_field(dest, info, xdp.ifindex); + return; + case BPF_LINK_TYPE_STRUCT_OPS: + copy_field(dest, info, struct_ops.map_id); + return; + case BPF_LINK_TYPE_NETFILTER: + copy_field(dest, info, netfilter.pf); + copy_field(dest, info, netfilter.hooknum); + copy_field(dest, info, netfilter.priority); + copy_field(dest, info, netfilter.flags); + return; + default: + return; + } +}
- err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(*uinfo), info_len); +static int copy_bpf_link_info_from_user(const union bpf_attr *attr, + struct bpf_link_info *info, + u32 *info_len, + enum bpf_link_type type) +{ + struct compat_bpf_link_info cinfo; + int err; + void *select_info = in_compat64_syscall() ? &cinfo : info; + size_t info_size = in_compat64_syscall() ? sizeof(struct compat_bpf_link_info) + : sizeof(struct bpf_link_info); + void __user *uinfo = u64_to_user_ptr(attr->info.info); + *info_len = attr->info.info_len; + + err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), + info_size, *info_len); if (err) return err; + *info_len = min_t(u32, info_size, *info_len);
- return btf_get_info_by_fd(btf, attr, uattr); + memset(info, 0, sizeof(struct bpf_link_info)); + if (copy_from_user(select_info, uinfo, *info_len)) + return -EFAULT; + + if (in_compat64_syscall()) + convert_compat_link_info_in(info, &cinfo, type); + + return 0; +} + +static int copy_bpf_link_info_to_user(const union bpf_attr *attr, + union bpf_attr __user *uattr, + struct bpf_link_info *info, + u32 *info_len, + enum bpf_link_type type) +{ + struct compat_bpf_link_info cinfo; + void *select_info = in_compat64_syscall() ? &cinfo : info; + void __user *uinfo = (void __user *)attr->info.info; + + if (in_compat64_syscall()) + convert_compat_link_info_out(&cinfo, info, type); + + if (copy_to_user(uinfo, select_info, *info_len) || + bpf_put_uattr(uattr, *info_len, info.info_len)) + return -EFAULT; + + return 0; }
static int bpf_link_get_info_by_fd(struct file *file, @@ -4307,19 +4555,13 @@ static int bpf_link_get_info_by_fd(struct file *file, const union bpf_attr *attr, union bpf_attr __user *uattr) { - struct bpf_link_info __user *uinfo = u64_to_user_ptr(attr->info.info); struct bpf_link_info info; - u32 info_len = attr->info.info_len; + u32 info_len; int err;
- err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(info), info_len); + err = copy_bpf_link_info_from_user(attr, &info, &info_len, link->type); if (err) return err; - info_len = min_t(u32, sizeof(info), info_len); - - memset(&info, 0, sizeof(info)); - if (copy_from_user(&info, uinfo, info_len)) - return -EFAULT;
info.type = link->type; info.id = link->id; @@ -4332,9 +4574,10 @@ static int bpf_link_get_info_by_fd(struct file *file, return err; }
- if (copy_to_user(uinfo, &info, info_len) || - put_user(info_len, &uattr->info.info_len)) - return -EFAULT; + err = copy_bpf_link_info_to_user(attr, uattr, &info, &info_len, + link->type); + if (err) + return err;
return 0; }
Nit (in title): "xyz" might be misinterpreted :D bpf_*_info would be clearer.
On 12/09/2023 11:51, Zachary Leaf wrote:
[...]
+static int copy_bpf_btf_info_from_user(const union bpf_attr *attr,
+				       struct bpf_btf_info *info,
+				       u32 *info_len)
+{
+	struct compat_bpf_btf_info cinfo;
+	int err;
+	void *select_info = in_compat64_syscall() ? &cinfo : info;
+	size_t info_size = in_compat64_syscall() ? sizeof(struct compat_bpf_btf_info)
+					: sizeof(struct bpf_btf_info);
+	void __user *uinfo = u64_to_user_ptr(attr->info.info);
Nit: there is normally always an empty line between declarations and the first non-declaration statement.
+	*info_len = attr->info.info_len;
+	err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo),
+				       info_size, *info_len);
It took me a long time to figure out why this suddenly appeared - eventually I realised that it got moved from bpf_btf_get_info_by_fd(). It's a small thing but could we still have it in a separate patch? That will also make the diff in syscall.c less noisy around that function.
+	if (err)
+		return err;
+	*info_len = min_t(u32, info_size, *info_len);
+
+	memset(info, 0, sizeof(struct bpf_btf_info));
Can use sizeof(*info), similarly to the original code.
Thinking some more about the zeroing, having looked at some other conversion functions, I feel that we might be missing some. There are two situations where zeroing is crucial:
1. When copying in and the user-supplied struct is smaller than the kernel one. If we miss zeroing, then we're going to read garbage from the last struct members.
2. When copying out the entire struct. We should have set all the members at this point, but if we miss zeroing then any potential padding will be garbage. This is even worse than 1. as arbitrary kernel data should never be written to user memory.
2. was not an issue before because that padding was provided by userspace, when we copy_from_user(). Overall because we now have both a native and a compat path, it gets harder to track if the zeroing is right or not. In this case, I think we're missing 1. in the compat case, since we don't zero the compat struct and then read it to initialise members of the native struct. I sense that in general we should zero both the native and compat structs.
It's worth mentioning that because Morello Clang (possibly GCC too) supports it, we get CONFIG_INIT_STACK_ALL_ZERO=y by default (see security/Kconfig.hardening). This means that we won't actually see whether we get zeroing right or not, in fact the codegen should be unchanged. We could verify it using CONFIG_INIT_STACK_ALL_PATTERN=y, but I don't think it's worth the hassle. Let's just try our best to be consistent.
+	if (copy_from_user(select_info, uinfo, *info_len))
+		return -EFAULT;
+
+	if (in_compat64_syscall())
+		convert_compat_btf_info_in(info, &cinfo);
+
+	return 0;
+}
+static int copy_bpf_btf_info_to_user(const union bpf_attr *attr,
+				     union bpf_attr __user *uattr,
+				     struct bpf_btf_info *info,
+				     u32 *info_len)
+{
+	struct compat_bpf_btf_info cinfo;
+	void *select_info = in_compat64_syscall() ? &cinfo : info;
+	void __user *uinfo = (void __user *)attr->info.info;
That won't build, you could stick to u64_to_user_ptr() as this is temporary anyway.
+	if (in_compat64_syscall())
+		convert_compat_btf_info_out(&cinfo, info);
+
+	if (copy_to_user(uinfo, select_info, *info_len) ||
+	    bpf_put_uattr(uattr, *info_len, info.info_len))
+		return -EFAULT;
+
+	return 0;
+}
int btf_get_info_by_fd(const struct btf *btf, const union bpf_attr *attr, union bpf_attr __user *uattr) {
-	struct bpf_btf_info __user *uinfo;
 	struct bpf_btf_info info;
-	u32 info_copy, btf_copy;
+	u32 btf_copy;
 	void __user *ubtf;
 	char __user *uname;
 	u32 uinfo_len, uname_len, name_len;
 	int ret = 0;
Nit: better remove the 0 initialiser since ret is now assigned just after.
-	uinfo = u64_to_user_ptr(attr->info.info);
-	uinfo_len = attr->info.info_len;
-
-	info_copy = min_t(u32, uinfo_len, sizeof(info));
-	memset(&info, 0, sizeof(info));
-	if (copy_from_user(&info, uinfo, info_copy))
-		return -EFAULT;
+	ret = copy_bpf_btf_info_from_user(attr, &info, &uinfo_len);
+	if (ret)
+		return ret;
info.id = btf->id; ubtf = u64_to_user_ptr(info.btf); @@ -7268,9 +7332,9 @@ int btf_get_info_by_fd(const struct btf *btf, } }
-	if (copy_to_user(uinfo, &info, info_copy) ||
-	    put_user(info_copy, &uattr->info.info_len))
-		return -EFAULT;
+	ret = copy_bpf_btf_info_to_user(attr, uattr, &info, &uinfo_len);
There is a semantic change here in case ret is set to something else than 0 (3 lines above). Given that these out functions only return -EFAULT as an error, we may be better off keeping the existing style in general, that is if (copy...) return -EFAULT.
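Concretely, something like this (sketch) keeps any earlier non-zero ret intact while still reporting the copy failure:

	if (copy_bpf_btf_info_to_user(attr, uattr, &info, &uinfo_len))
		return -EFAULT;

	return ret;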
[...]
+static void
+convert_compat_link_info_out(struct compat_bpf_link_info *dest,
+			     const struct bpf_link_info *info,
+			     enum bpf_link_type type)
+{
+	copy_field(dest, info, type);
+	copy_field(dest, info, id);
+	copy_field(dest, info, prog_id);
+
+	switch (type) {
+	case BPF_LINK_TYPE_RAW_TRACEPOINT:
+		copy_field(dest, info, raw_tracepoint.tp_name);
+		copy_field(dest, info, raw_tracepoint.tp_name_len);
+		return;
+	case BPF_LINK_TYPE_TRACING:
+		copy_field(dest, info, tracing.attach_type);
+		copy_field(dest, info, tracing.target_obj_id);
+		copy_field(dest, info, tracing.target_btf_id);
+		return;
+	case BPF_LINK_TYPE_CGROUP:
+		copy_field(dest, info, cgroup.cgroup_id);
+		copy_field(dest, info, cgroup.attach_type);
+		return;
+	case BPF_LINK_TYPE_ITER:
+		copy_field(dest, info, iter.target_name);
+		copy_field(dest, info, iter.target_name_len);
+		/* remaining struct is fixed size integers, so identical in
+		 * purecap/compat64 - since figuring out what members are active
Nit: to avoid being too specific in generic code, we could just say "any 64-bit ABI" (excluding 32-bit because alignof(__u64) is 4 on x86_32, unlike any other ABI). This is compat64-specific so 32-bit ABIs are irrelevant.
+		 * is non-trivial, memcpy to the end of the struct */
+		memcpy((u8 *)dest+offsetof(struct compat_bpf_link_info, iter.map),
+		       (u8 *)info+offsetof(struct bpf_link_info, iter.map),
+		       offsetofend(struct bpf_link_info, iter.cgroup.order) -
+		       offsetof(struct bpf_link_info, iter.map.map_id));
We could slightly simplify this by referring to the structs and not the innermost members, that is iter.map instead of iter.map_id and iter.cgroup instead of iter.cgroup.order. The latter may result in a slightly larger copy size due to padding, but that's not an issue because we (should) have zeroed that padding to start with (see comment above). In any case, this memcpy() relies on any padding being initialised.
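i.e. something like (sketch of the suggested form):

		memcpy((u8 *)dest + offsetof(struct compat_bpf_link_info, iter.map),
		       (u8 *)info + offsetof(struct bpf_link_info, iter.map),
		       offsetofend(struct bpf_link_info, iter.cgroup) -
		       offsetof(struct bpf_link_info, iter.map));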
Kevin
On 15/09/2023 15:47, Kevin Brodsky wrote:
Nit (in title): "xyz" might be misinterpreted :D bpf_*_info would be clearer.
I went with bpf_{btf,prog,link}_info to be even clearer (:
On 12/09/2023 11:51, Zachary Leaf wrote:
[...]
+static int copy_bpf_btf_info_from_user(const union bpf_attr *attr,
+				       struct bpf_btf_info *info,
+				       u32 *info_len)
+{
+	struct compat_bpf_btf_info cinfo;
+	int err;
+	void *select_info = in_compat64_syscall() ? &cinfo : info;
+	size_t info_size = in_compat64_syscall() ? sizeof(struct compat_bpf_btf_info)
+					: sizeof(struct bpf_btf_info);
+	void __user *uinfo = u64_to_user_ptr(attr->info.info);
Nit: there is normally always an empty line between declarations and the first non-declaration statement.
+	*info_len = attr->info.info_len;
+	err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo),
+				       info_size, *info_len);
It took me a long time to figure out why this suddenly appeared - eventually I realised that it got moved from bpf_btf_get_info_by_fd(). It's a small thing but could we still have it in a separate patch? That will also make the diff in syscall.c less noisy around that function.
Sure
+	if (err)
+		return err;
+	*info_len = min_t(u32, info_size, *info_len);
+
+	memset(info, 0, sizeof(struct bpf_btf_info));
Can use sizeof(*info), similarly to the original code.
Done
Thinking some more about the zeroing, having looked at some other conversion functions, I feel that we might be missing some. There are two situations where zeroing is crucial:
1. When copying in and the user-supplied struct is smaller than the kernel one. If we miss zeroing, then we're going to read garbage from the last struct members.
I think you're right. Thinking out loud:
We're handling different userspace struct sizes by getting the size from userspace, and using that if it's smaller i.e. *info_len = min_t(u32, kernel_size, user_size);
So if the user supplied struct is smaller, then we'd only copy in user_size bytes. That part is fine.
The problem is now when we're doing the compat64 conversion and copying fields from that compat struct into a native struct. If we never zero-initialised with memset, and never copied in the last fields because userspace had a smaller struct, those remaining members end up holding garbage.
So agree we need to zero initialise there, because we don't want garbage in those fields to be used by the kernel.
I think those fields shouldn't ever get written out. When we do copy_to_user, we would again use the userspace provided size if it was smaller. Still - it's a problem because the garbage could be used elsewhere.
2. When copying out the entire struct. We should have set all the members at this point, but if we miss zeroing then any potential padding will be garbage. This is even worse than 1. as arbitrary kernel data should never be written to user memory.
Agreed, the padding could then contain garbage and get written out, so this is a problem. We need to zero initialise before writing out.
2. was not an issue before because that padding was provided by userspace, when we copy_from_user(). Overall because we now have both a native and a compat path, it gets harder to track if the zeroing is right or not. In this case, I think we're missing 1. in the compat case, since we don't zero the compat struct and then read it to initialise members of the native struct. I sense that in general we should zero both the native and compat structs.
In that case we need to explicitly memset() rather than initialise the struct with {0}, since the latter doesn't set the padding.
So we end up with:

	if (in_compat64_syscall())
		memset(&cinfo, 0, sizeof(cinfo));

in all the copy to/from functions.
Similar thing for bpf_attr in relevant commit.
It's worth mentioning that because Morello Clang (possibly GCC too) supports it, we get CONFIG_INIT_STACK_ALL_ZERO=y by default (see security/Kconfig.hardening). This means that we won't actually see whether we get zeroing right or not, in fact the codegen should be unchanged. We could verify it using CONFIG_INIT_STACK_ALL_PATTERN=y, but I don't think it's worth the hassle. Let's just try our best to be consistent.
Okay thanks, good to know about security/Kconfig.hardening.
+	if (copy_from_user(select_info, uinfo, *info_len))
+		return -EFAULT;
+
+	if (in_compat64_syscall())
+		convert_compat_btf_info_in(info, &cinfo);
+
+	return 0;
+}
+static int copy_bpf_btf_info_to_user(const union bpf_attr *attr,
+				     union bpf_attr __user *uattr,
+				     struct bpf_btf_info *info,
+				     u32 *info_len)
+{
+	struct compat_bpf_btf_info cinfo;
+	void *select_info = in_compat64_syscall() ? &cinfo : info;
+	void __user *uinfo = (void __user *)attr->info.info;
That won't build, you could stick to u64_to_user_ptr() as this is temporary anyway.
Ah yep. Missed this one, or it somehow found its way back in.
+	if (in_compat64_syscall())
+		convert_compat_btf_info_out(&cinfo, info);
+
+	if (copy_to_user(uinfo, select_info, *info_len) ||
+	    bpf_put_uattr(uattr, *info_len, info.info_len))
+		return -EFAULT;
+
+	return 0;
+}
int btf_get_info_by_fd(const struct btf *btf, const union bpf_attr *attr, union bpf_attr __user *uattr) {
-	struct bpf_btf_info __user *uinfo;
 	struct bpf_btf_info info;
-	u32 info_copy, btf_copy;
+	u32 btf_copy;
 	void __user *ubtf;
 	char __user *uname;
 	u32 uinfo_len, uname_len, name_len;
 	int ret = 0;
Nit: better remove the 0 initialiser since ret is now assigned just after.
Done.
- uinfo = u64_to_user_ptr(attr->info.info);
- uinfo_len = attr->info.info_len;
- info_copy = min_t(u32, uinfo_len, sizeof(info));
- memset(&info, 0, sizeof(info));
- if (copy_from_user(&info, uinfo, info_copy))
return -EFAULT;
- ret = copy_bpf_btf_info_from_user(attr, &info, &uinfo_len);
- if (ret)
return ret;
	info.id = btf->id;
	ubtf = u64_to_user_ptr(info.btf);
@@ -7268,9 +7332,9 @@ int btf_get_info_by_fd(const struct btf *btf,
		}
	}
- if (copy_to_user(uinfo, &info, info_copy) ||
put_user(info_copy, &uattr->info.info_len))
return -EFAULT;
- ret = copy_bpf_btf_info_to_user(attr, uattr, &info, &uinfo_len);
There is a semantic change here in case ret is set to something other than 0 (3 lines above). Given that these out functions only return -EFAULT as an error, we may be better off keeping the existing style in general, that is if (copy...) return -EFAULT.
Yeah I see. Agreed for copy...**to** here. For copy...**from**, bpf_check_uarg_tail_zero() can return something other than -EFAULT, so that needs to stay as is.
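So we end up with something like this (sketch):

        ret = copy_bpf_btf_info_from_user(attr, &info, &uinfo_len);
        if (ret)
                return ret;     /* may be bpf_check_uarg_tail_zero()'s error */

        /* ... */

        if (copy_bpf_btf_info_to_user(attr, uattr, &info, &uinfo_len))
                return -EFAULT; /* the out direction only ever fails with -EFAULT */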
[...]
+static void
+convert_compat_link_info_out(struct compat_bpf_link_info *dest,
const struct bpf_link_info *info,
enum bpf_link_type type)
+{
- copy_field(dest, info, type);
- copy_field(dest, info, id);
- copy_field(dest, info, prog_id);
- switch (type) {
- case BPF_LINK_TYPE_RAW_TRACEPOINT:
copy_field(dest, info, raw_tracepoint.tp_name);
copy_field(dest, info, raw_tracepoint.tp_name_len);
return;
- case BPF_LINK_TYPE_TRACING:
copy_field(dest, info, tracing.attach_type);
copy_field(dest, info, tracing.target_obj_id);
copy_field(dest, info, tracing.target_btf_id);
return;
- case BPF_LINK_TYPE_CGROUP:
copy_field(dest, info, cgroup.cgroup_id);
copy_field(dest, info, cgroup.attach_type);
return;
- case BPF_LINK_TYPE_ITER:
copy_field(dest, info, iter.target_name);
copy_field(dest, info, iter.target_name_len);
/* remaining struct is fixed size integers, so identical in
* purecap/compat64 - since figuring out what members are active
Nit: to avoid being too specific in generic code, we could just say "any 64-bit ABI" (excluding 32-bit because alignof(__u64) is 4 on x86_32, unlike any other ABI). This is compat64-specific so 32-bit ABIs are irrelevant.
Done
* is non-trivial, memcpy to the end of the struct */
memcpy((u8 *)dest+offsetof(struct compat_bpf_link_info, iter.map),
(u8 *)info+offsetof(struct bpf_link_info, iter.map),
offsetofend(struct bpf_link_info, iter.cgroup.order) -
offsetof(struct bpf_link_info, iter.map.map_id));
We could slightly simplify this by referring to the structs and not the innermost members, that is iter.map instead of iter.map_id and iter.cgroup instead of iter.cgroup.order. The latter may result in a slightly larger copy size due to padding, but that's not an issue because we (should) have zeroed that padding to start with (see comment above). In any case, this memcpy() relies on any padding being initialised.
That works too.
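Concretely, something like this (relying on the padding having been zeroed, per the discussion above):

        memcpy((u8 *)dest + offsetof(struct compat_bpf_link_info, iter.map),
               (u8 *)info + offsetof(struct bpf_link_info, iter.map),
               offsetofend(struct bpf_link_info, iter.cgroup) -
               offsetof(struct bpf_link_info, iter.map));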
Thanks,
Zach
Kevin
On 26/09/2023 11:44, Zachary Leaf wrote:
In that case we need to explicitly memset() rather than initialise the struct with {0}, since the latter doesn't set the padding.
Well that's fascinating, I hadn't realised that! Found a nice blog post [1] regarding that can of worms. TL;DR is that padding does get zero-initialised at -O2 with both Clang and GCC, but that is not required by the standard (at least not yet). So yes, an explicit memset() is a good idea.
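The trap in a nutshell (illustrative struct, not from the patch):

        struct s {
                char c;         /* 7 bytes of padding follow on a 64-bit ABI */
                __u64 x;
        };

        struct s a = { 0 };             /* members zeroed, padding unspecified */
        struct s b;
        memset(&b, 0, sizeof(b));       /* members and padding both zeroed */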
Kevin
[1] https://interrupt.memfault.com/blog/c-struct-padding-initialization
The CHECK_ATTR macro needs adapting to support union compat_bpf_attr as well as union bpf_attr.
The union bpf_attr passed in from userspace contains many structs of various sizes for each of the bpf syscall's multiplexed options. Since most of the structs do not take up the full memory footprint of the union, we should not have any data past the last struct member for any given option.
The CHECK_ATTR macro checks for non-zero data in the gap between the end of the last member and the end of the union. Any non-zero data signals a malformed request from userspace, so it should be rejected.
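Conceptually, for a given sub-command the check is (a sketch, with XYZ_LAST_FIELD standing in for that command's last member):

        size_t end = offsetofend(union bpf_attr, XYZ_LAST_FIELD);

        if (memchr_inv((u8 *)attr + end, 0, sizeof(*attr) - end))
                return -EINVAL; /* non-zero bytes in the tail -> reject */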
Adapt the macro to handle the compat64 offsets based on in_compat64_syscall().
Since compat64 conversion involves copying individual fields out to a new native-sized union, we need to check for invalid data before the conversion occurs, i.e. in copy_bpf_attr_from_user.
This presents a problem since the existing macro relies on the XYZ_LAST_FIELD token being pasted in inside each sub-command handler. However, at the __sys_bpf + copy_bpf_attr_from_user stage we have only int cmd to distinguish subcommands as per enum bpf_cmd, and:
1. there is no way to go cleanly from the int to the enum string/name
2. enum bpf_cmd does not cleanly match 1:1 with XYZ_LAST_FIELD in all cases
We can either:
a) rework/refactor the XYZ_LAST_FIELD system to encode by int cmd
b) add a check_attr function with a case statement to call the macro with the correct XYZ_LAST_FIELD argument
The latter is preferred here as the slightly easier and cleaner approach. Rather than refactor the entire system, abstract the details away in a check_attr() function called from copy_bpf_attr_from_user().
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 kernel/bpf/syscall.c | 177 +++++++++++++++++++++----------------------
 1 file changed, 87 insertions(+), 90 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 17f55deddf48..89c202b69f6c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -974,13 +974,17 @@ int bpf_get_file_flag(int flags)
 	return O_RDWR;
 }
 
-/* helper macro to check that unused fields 'union bpf_attr' are zero */
+#define __CHECK_ATTR(CMD, TYPE) \
+	(memchr_inv((void *) &(((TYPE *)vattr)->CMD##_LAST_FIELD) + \
+		    sizeof(((TYPE *)vattr)->CMD##_LAST_FIELD), 0, \
+		    sizeof(*(TYPE *)vattr) - \
+		    offsetof(TYPE, CMD##_LAST_FIELD) - \
+		    sizeof(((TYPE *)vattr)->CMD##_LAST_FIELD)) != NULL)
+
 #define CHECK_ATTR(CMD) \
-	memchr_inv((void *) &attr->CMD##_LAST_FIELD + \
-		   sizeof(attr->CMD##_LAST_FIELD), 0, \
-		   sizeof(*attr) - \
-		   offsetof(union bpf_attr, CMD##_LAST_FIELD) - \
-		   sizeof(attr->CMD##_LAST_FIELD)) != NULL
+	(in_compat64_syscall() ? \
+		__CHECK_ATTR(CMD, union compat_bpf_attr) : \
+		__CHECK_ATTR(CMD, union bpf_attr))
/* dst and src must have at least "size" number of bytes. * Return strlen on success and < 0 on error. @@ -1134,10 +1138,6 @@ static int map_create(union bpf_attr *attr) int f_flags; int err;
- err = CHECK_ATTR(BPF_MAP_CREATE); - if (err) - return -EINVAL; - if (attr->btf_vmlinux_value_type_id) { if (attr->map_type != BPF_MAP_TYPE_STRUCT_OPS || attr->btf_key_type_id || attr->btf_value_type_id) @@ -1367,9 +1367,6 @@ static int map_lookup_elem(union bpf_attr *attr) struct fd f; int err;
- if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM)) - return -EINVAL; - if (attr->flags & ~BPF_F_LOCK) return -EINVAL;
@@ -1442,9 +1439,6 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr) struct fd f; int err;
- if (CHECK_ATTR(BPF_MAP_UPDATE_ELEM)) - return -EINVAL; - f = fdget(ufd); map = __bpf_map_get(f); if (IS_ERR(map)) @@ -1496,9 +1490,6 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr) void *key; int err;
- if (CHECK_ATTR(BPF_MAP_DELETE_ELEM)) - return -EINVAL; - f = fdget(ufd); map = __bpf_map_get(f); if (IS_ERR(map)) @@ -1552,9 +1543,6 @@ static int map_get_next_key(union bpf_attr *attr) struct fd f; int err;
- if (CHECK_ATTR(BPF_MAP_GET_NEXT_KEY)) - return -EINVAL; - f = fdget(ufd); map = __bpf_map_get(f); if (IS_ERR(map)) @@ -1832,9 +1820,6 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr) struct fd f; int err;
- if (CHECK_ATTR(BPF_MAP_LOOKUP_AND_DELETE_ELEM)) - return -EINVAL; - if (attr->flags & ~BPF_F_LOCK) return -EINVAL;
@@ -1920,9 +1905,6 @@ static int map_freeze(const union bpf_attr *attr) struct bpf_map *map; struct fd f;
- if (CHECK_ATTR(BPF_MAP_FREEZE)) - return -EINVAL; - f = fdget(ufd); map = __bpf_map_get(f); if (IS_ERR(map)) @@ -2510,9 +2492,6 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size) char license[128]; bool is_gpl;
- if (CHECK_ATTR(BPF_PROG_LOAD)) - return -EINVAL; - if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT | BPF_F_ANY_ALIGNMENT | BPF_F_TEST_STATE_FREQ | @@ -2707,7 +2686,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
static int bpf_obj_pin(const union bpf_attr *attr) { - if (CHECK_ATTR(BPF_OBJ) || attr->file_flags != 0) + if (attr->file_flags != 0) return -EINVAL;
return bpf_obj_pin_user(attr->bpf_fd, u64_to_user_ptr(attr->pathname)); @@ -2715,8 +2694,7 @@ static int bpf_obj_pin(const union bpf_attr *attr)
static int bpf_obj_get(const union bpf_attr *attr) { - if (CHECK_ATTR(BPF_OBJ) || attr->bpf_fd != 0 || - attr->file_flags & ~BPF_OBJ_FLAG_MASK) + if (attr->bpf_fd != 0 || attr->file_flags & ~BPF_OBJ_FLAG_MASK) return -EINVAL;
return bpf_obj_get_user(u64_to_user_ptr(attr->pathname), @@ -3411,9 +3389,6 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr) struct bpf_prog *prog; int fd;
- if (CHECK_ATTR(BPF_RAW_TRACEPOINT_OPEN)) - return -EINVAL; - prog = bpf_prog_get(attr->raw_tracepoint.prog_fd); if (IS_ERR(prog)) return PTR_ERR(prog); @@ -3526,9 +3501,6 @@ static int bpf_prog_attach(const union bpf_attr *attr) struct bpf_prog *prog; int ret;
- if (CHECK_ATTR(BPF_PROG_ATTACH)) - return -EINVAL; - if (attr->attach_flags & ~BPF_F_ATTACH_MASK) return -EINVAL;
@@ -3585,9 +3557,6 @@ static int bpf_prog_detach(const union bpf_attr *attr) { enum bpf_prog_type ptype;
- if (CHECK_ATTR(BPF_PROG_DETACH)) - return -EINVAL; - ptype = attach_type_to_prog_type(attr->attach_type);
switch (ptype) { @@ -3619,8 +3588,6 @@ static int bpf_prog_query(const union bpf_attr *attr, { if (!capable(CAP_NET_ADMIN)) return -EPERM; - if (CHECK_ATTR(BPF_PROG_QUERY)) - return -EINVAL; if (attr->query.query_flags & ~BPF_F_QUERY_EFFECTIVE) return -EINVAL;
@@ -3673,9 +3640,6 @@ static int bpf_prog_test_run(const union bpf_attr *attr, struct bpf_prog *prog; int ret = -ENOTSUPP;
- if (CHECK_ATTR(BPF_PROG_TEST_RUN)) - return -EINVAL; - if ((attr->test.ctx_size_in && !attr->test.ctx_in) || (!attr->test.ctx_size_in && attr->test.ctx_in)) return -EINVAL; @@ -3705,7 +3669,7 @@ static int bpf_obj_get_next_id(const union bpf_attr *attr, u32 next_id = attr->start_id; int err = 0;
- if (CHECK_ATTR(BPF_OBJ_GET_NEXT_ID) || next_id >= INT_MAX) + if (next_id >= INT_MAX) return -EINVAL;
if (!capable(CAP_SYS_ADMIN)) @@ -3786,9 +3750,6 @@ static int bpf_prog_get_fd_by_id(const union bpf_attr *attr) u32 id = attr->prog_id; int fd;
- if (CHECK_ATTR(BPF_PROG_GET_FD_BY_ID)) - return -EINVAL; - if (!capable(CAP_SYS_ADMIN)) return -EPERM;
@@ -3812,8 +3773,7 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr) int f_flags; int fd;
- if (CHECK_ATTR(BPF_MAP_GET_FD_BY_ID) || - attr->open_flags & ~BPF_OBJ_FLAG_MASK) + if (attr->open_flags & ~BPF_OBJ_FLAG_MASK) return -EINVAL;
if (!capable(CAP_SYS_ADMIN)) @@ -4592,9 +4552,6 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr, struct fd f; int err;
- if (CHECK_ATTR(BPF_OBJ_GET_INFO_BY_FD)) - return -EINVAL; - f = fdget(ufd); if (!f.file) return -EBADFD; @@ -4621,9 +4578,6 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size) { - if (CHECK_ATTR(BPF_BTF_LOAD)) - return -EINVAL; - if (!bpf_capable()) return -EPERM;
@@ -4634,9 +4588,6 @@ static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_
static int bpf_btf_get_fd_by_id(const union bpf_attr *attr) { - if (CHECK_ATTR(BPF_BTF_GET_FD_BY_ID)) - return -EINVAL; - if (!capable(CAP_SYS_ADMIN)) return -EPERM;
@@ -4702,9 +4653,6 @@ static int bpf_task_fd_query(const union bpf_attr *attr, struct file *file; int err;
- if (CHECK_ATTR(BPF_TASK_FD_QUERY)) - return -EINVAL; - if (!capable(CAP_SYS_ADMIN)) return -EPERM;
@@ -4786,9 +4734,6 @@ static int bpf_map_do_batch(const union bpf_attr *attr, int err, ufd; struct fd f;
- if (CHECK_ATTR(BPF_MAP_BATCH)) - return -EINVAL; - ufd = attr->batch.map_fd; f = fdget(ufd); map = __bpf_map_get(f); @@ -4827,9 +4772,6 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) struct bpf_prog *prog; int ret;
- if (CHECK_ATTR(BPF_LINK_CREATE)) - return -EINVAL; - if (attr->link_create.attach_type == BPF_STRUCT_OPS) return bpf_struct_ops_link_create(attr);
@@ -4978,9 +4920,6 @@ static int link_update(union bpf_attr *attr) u32 flags; int ret;
- if (CHECK_ATTR(BPF_LINK_UPDATE)) - return -EINVAL; - flags = attr->link_update.flags; if (flags & ~BPF_F_REPLACE) return -EINVAL; @@ -5034,9 +4973,6 @@ static int link_detach(union bpf_attr *attr) struct bpf_link *link; int ret;
- if (CHECK_ATTR(BPF_LINK_DETACH)) - return -EINVAL; - link = bpf_link_get_from_fd(attr->link_detach.link_fd); if (IS_ERR(link)) return PTR_ERR(link); @@ -5104,9 +5040,6 @@ static int bpf_link_get_fd_by_id(const union bpf_attr *attr) u32 id = attr->link_id; int fd;
- if (CHECK_ATTR(BPF_LINK_GET_FD_BY_ID)) - return -EINVAL; - if (!capable(CAP_SYS_ADMIN)) return -EPERM;
@@ -5160,9 +5093,6 @@ static int bpf_enable_runtime_stats(void) static int bpf_enable_stats(union bpf_attr *attr) {
- if (CHECK_ATTR(BPF_ENABLE_STATS)) - return -EINVAL; - if (!capable(CAP_SYS_ADMIN)) return -EPERM;
@@ -5182,9 +5112,6 @@ static int bpf_iter_create(union bpf_attr *attr) struct bpf_link *link; int err;
- if (CHECK_ATTR(BPF_ITER_CREATE)) - return -EINVAL; - if (attr->iter_create.flags) return -EINVAL;
@@ -5207,9 +5134,6 @@ static int bpf_prog_bind_map(union bpf_attr *attr) struct bpf_map **used_maps_old, **used_maps_new; int i, ret = 0;
- if (CHECK_ATTR(BPF_PROG_BIND_MAP)) - return -EINVAL; - if (attr->prog_bind_map.flags) return -EINVAL;
@@ -5391,6 +5315,75 @@ static int dispatch_bpf(int cmd, union bpf_attr *attr, bpfptr_t uattr, return err; }
+static int check_attr(int cmd, void *vattr) +{ + switch (cmd) { + case BPF_MAP_CREATE: + return CHECK_ATTR(BPF_MAP_CREATE); + case BPF_MAP_LOOKUP_ELEM: + return CHECK_ATTR(BPF_MAP_LOOKUP_ELEM); + case BPF_MAP_GET_NEXT_KEY: + return CHECK_ATTR(BPF_MAP_GET_NEXT_KEY); + case BPF_MAP_FREEZE: + return CHECK_ATTR(BPF_MAP_FREEZE); + case BPF_PROG_LOAD: + return CHECK_ATTR(BPF_PROG_LOAD); + case BPF_OBJ_PIN: + return CHECK_ATTR(BPF_OBJ); + case BPF_OBJ_GET: + return CHECK_ATTR(BPF_OBJ); + case BPF_PROG_ATTACH: + return CHECK_ATTR(BPF_PROG_ATTACH); + case BPF_PROG_DETACH: + return CHECK_ATTR(BPF_PROG_DETACH); + case BPF_PROG_QUERY: + return CHECK_ATTR(BPF_PROG_QUERY); + case BPF_PROG_TEST_RUN: + return CHECK_ATTR(BPF_PROG_TEST_RUN); + case BPF_PROG_GET_NEXT_ID: + return CHECK_ATTR(BPF_OBJ_GET_NEXT_ID); + case BPF_MAP_GET_NEXT_ID: + return CHECK_ATTR(BPF_OBJ_GET_NEXT_ID); + case BPF_BTF_GET_NEXT_ID: + return CHECK_ATTR(BPF_OBJ_GET_NEXT_ID); + case BPF_PROG_GET_FD_BY_ID: + return CHECK_ATTR(BPF_PROG_GET_FD_BY_ID); + case BPF_MAP_GET_FD_BY_ID: + return CHECK_ATTR(BPF_MAP_GET_FD_BY_ID); + case BPF_OBJ_GET_INFO_BY_FD: + return CHECK_ATTR(BPF_OBJ_GET_INFO_BY_FD); + case BPF_RAW_TRACEPOINT_OPEN: + return CHECK_ATTR(BPF_RAW_TRACEPOINT_OPEN); + case BPF_BTF_LOAD: + return CHECK_ATTR(BPF_BTF_LOAD); + case BPF_BTF_GET_FD_BY_ID: + return CHECK_ATTR(BPF_BTF_GET_FD_BY_ID); + case BPF_TASK_FD_QUERY: + return CHECK_ATTR(BPF_TASK_FD_QUERY); + case BPF_MAP_LOOKUP_AND_DELETE_ELEM: + return CHECK_ATTR(BPF_MAP_LOOKUP_AND_DELETE_ELEM); + case BPF_LINK_CREATE: + return CHECK_ATTR(BPF_LINK_CREATE); + case BPF_LINK_UPDATE: + return CHECK_ATTR(BPF_LINK_UPDATE); + case BPF_LINK_GET_FD_BY_ID: + return CHECK_ATTR(BPF_LINK_GET_FD_BY_ID); + case BPF_LINK_GET_NEXT_ID: + return CHECK_ATTR(BPF_OBJ_GET_NEXT_ID); + case BPF_ENABLE_STATS: + return CHECK_ATTR(BPF_ENABLE_STATS); + case BPF_ITER_CREATE: + return CHECK_ATTR(BPF_ITER_CREATE); + case BPF_LINK_DETACH: + return CHECK_ATTR(BPF_LINK_DETACH); + case BPF_PROG_BIND_MAP: + return CHECK_ATTR(BPF_PROG_BIND_MAP); + default: + return 0; + } + +} + static void convert_compat_bpf_attr(union bpf_attr *dest, const union compat_bpf_attr *cattr, int cmd) { @@ -5691,6 +5684,10 @@ static int copy_bpf_attr_from_user(union bpf_attr *attr, int cmd, if (copy_from_bpfptr(select_attr, uattr, *size) != 0) return -EFAULT;
+ err = check_attr(cmd, select_attr); + if (err) + return -EINVAL; + if (in_compat64_syscall()) convert_compat_bpf_attr(attr, &cattr, cmd);
On 12/09/2023 11:51, Zachary Leaf wrote:
[...]
+static int check_attr(int cmd, void *vattr) +{
- switch (cmd) {
- case BPF_MAP_CREATE:
return CHECK_ATTR(BPF_MAP_CREATE);
- case BPF_MAP_LOOKUP_ELEM:
return CHECK_ATTR(BPF_MAP_LOOKUP_ELEM);
- case BPF_MAP_GET_NEXT_KEY:
return CHECK_ATTR(BPF_MAP_GET_NEXT_KEY);
- case BPF_MAP_FREEZE:
return CHECK_ATTR(BPF_MAP_FREEZE);
- case BPF_PROG_LOAD:
return CHECK_ATTR(BPF_PROG_LOAD);
- case BPF_OBJ_PIN:
return CHECK_ATTR(BPF_OBJ);
- case BPF_OBJ_GET:
return CHECK_ATTR(BPF_OBJ);
- case BPF_PROG_ATTACH:
return CHECK_ATTR(BPF_PROG_ATTACH);
- case BPF_PROG_DETACH:
return CHECK_ATTR(BPF_PROG_DETACH);
- case BPF_PROG_QUERY:
return CHECK_ATTR(BPF_PROG_QUERY);
- case BPF_PROG_TEST_RUN:
return CHECK_ATTR(BPF_PROG_TEST_RUN);
- case BPF_PROG_GET_NEXT_ID:
return CHECK_ATTR(BPF_OBJ_GET_NEXT_ID);
- case BPF_MAP_GET_NEXT_ID:
return CHECK_ATTR(BPF_OBJ_GET_NEXT_ID);
- case BPF_BTF_GET_NEXT_ID:
return CHECK_ATTR(BPF_OBJ_GET_NEXT_ID);
- case BPF_PROG_GET_FD_BY_ID:
return CHECK_ATTR(BPF_PROG_GET_FD_BY_ID);
- case BPF_MAP_GET_FD_BY_ID:
return CHECK_ATTR(BPF_MAP_GET_FD_BY_ID);
- case BPF_OBJ_GET_INFO_BY_FD:
return CHECK_ATTR(BPF_OBJ_GET_INFO_BY_FD);
- case BPF_RAW_TRACEPOINT_OPEN:
return CHECK_ATTR(BPF_RAW_TRACEPOINT_OPEN);
- case BPF_BTF_LOAD:
return CHECK_ATTR(BPF_BTF_LOAD);
- case BPF_BTF_GET_FD_BY_ID:
return CHECK_ATTR(BPF_BTF_GET_FD_BY_ID);
- case BPF_TASK_FD_QUERY:
return CHECK_ATTR(BPF_TASK_FD_QUERY);
- case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
return CHECK_ATTR(BPF_MAP_LOOKUP_AND_DELETE_ELEM);
- case BPF_LINK_CREATE:
return CHECK_ATTR(BPF_LINK_CREATE);
- case BPF_LINK_UPDATE:
return CHECK_ATTR(BPF_LINK_UPDATE);
- case BPF_LINK_GET_FD_BY_ID:
return CHECK_ATTR(BPF_LINK_GET_FD_BY_ID);
- case BPF_LINK_GET_NEXT_ID:
return CHECK_ATTR(BPF_OBJ_GET_NEXT_ID);
- case BPF_ENABLE_STATS:
return CHECK_ATTR(BPF_ENABLE_STATS);
- case BPF_ITER_CREATE:
return CHECK_ATTR(BPF_ITER_CREATE);
- case BPF_LINK_DETACH:
return CHECK_ATTR(BPF_LINK_DETACH);
- case BPF_PROG_BIND_MAP:
return CHECK_ATTR(BPF_PROG_BIND_MAP);
- default:
return 0;
- }
Nit: no empty line between curly brackets.
Kevin
+}
static void convert_compat_bpf_attr(union bpf_attr *dest, const union compat_bpf_attr *cattr, int cmd) { @@ -5691,6 +5684,10 @@ static int copy_bpf_attr_from_user(union bpf_attr *attr, int cmd, if (copy_from_bpfptr(select_attr, uattr, *size) != 0) return -EFAULT;
- err = check_attr(cmd, select_attr);
- if (err)
return -EINVAL;
- if (in_compat64_syscall()) convert_compat_bpf_attr(attr, &cattr, cmd);
In PCuABI, when copying a block of memory to/from userspace containing capabilities/pointers, the copy_{to,from}_user_with_ptr variants need to be used to ensure capabilities are preserved in full.
Introduce bpf-specific helper functions/macros to support this.
Since compat64 syscalls do not contain capabilities, the _with_ptr variants are not required in the in_compat64_syscall() case.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 include/linux/bpf_compat.h |  8 ++++++++
 include/linux/bpfptr.h     | 13 +++++++++++++
 2 files changed, 21 insertions(+)
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h index 710815417a27..9e16f7f5eda6 100644 --- a/include/linux/bpf_compat.h +++ b/include/linux/bpf_compat.h @@ -1,6 +1,14 @@ /* SPDX-License-Identifier: GPL-2.0 */ /* Copyright (C) 2023 Arm Ltd */
+#define bpf_copy_from_user_with_ptr(dest, src, size) \ + (in_compat64_syscall() ? copy_from_user(dest, src, size) \ + : copy_from_user_with_ptr(dest, src, size)) + +#define bpf_copy_to_user_with_ptr(dest, src, size) \ + (in_compat64_syscall() ? copy_to_user(dest, src, size) \ + : copy_to_user_with_ptr(dest, src, size)) + union compat_bpf_attr { struct { /* anonymous struct used by BPF_MAP_CREATE command */ __u32 map_type; /* one of enum bpf_map_type */ diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index 7fdf9692d76e..f6828c5f7858 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -62,11 +62,24 @@ static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src, return copy_from_kernel_nofault(dst, src.kernel + offset, size); }
+static inline int copy_from_bpfptr_offset_with_ptr(void *dst, bpfptr_t src, + size_t offset, size_t size) +{ + if (!bpfptr_is_kernel(src)) + return copy_from_user_with_ptr(dst, src.user + offset, size); + return copy_from_kernel_nofault(dst, src.kernel + offset, size); +} + static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size) { return copy_from_bpfptr_offset(dst, src, 0, size); }
+static inline int copy_from_bpfptr_with_ptr(void *dst, bpfptr_t src, size_t size) +{ + return copy_from_bpfptr_offset_with_ptr(dst, src, 0, size); +} + static inline int copy_to_bpfptr_offset(bpfptr_t dst, size_t offset, const void *src, size_t size) {
On 12/09/2023 11:51, Zachary Leaf wrote:
In PCuABI, when copying a block of memory to/from userspace containing capabilities/pointers, the copy_{to,from}_user_with_ptr variants need to be used to ensure capabilities are preserved in full.
Introduce bpf-specific helper functions/macros to support this.
Since compat64 syscalls do not contain capabilities, the _with_ptr variants are not required in the in_compat64_syscall() case.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
 include/linux/bpf_compat.h |  8 ++++++++
 include/linux/bpfptr.h     | 13 +++++++++++++
 2 files changed, 21 insertions(+)
diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h index 710815417a27..9e16f7f5eda6 100644 --- a/include/linux/bpf_compat.h +++ b/include/linux/bpf_compat.h @@ -1,6 +1,14 @@ /* SPDX-License-Identifier: GPL-2.0 */ /* Copyright (C) 2023 Arm Ltd */ +#define bpf_copy_from_user_with_ptr(dest, src, size) \
- (in_compat64_syscall() ? copy_from_user(dest, src, size) \
: copy_from_user_with_ptr(dest, src, size))
+#define bpf_copy_to_user_with_ptr(dest, src, size) \
- (in_compat64_syscall() ? copy_to_user(dest, src, size) \
: copy_to_user_with_ptr(dest, src, size))
I think these belong to bpf.h, like bpf_put_uattr and friends.
Kevin
union compat_bpf_attr { struct { /* anonymous struct used by BPF_MAP_CREATE command */ __u32 map_type; /* one of enum bpf_map_type */ diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index 7fdf9692d76e..f6828c5f7858 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -62,11 +62,24 @@ static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src, return copy_from_kernel_nofault(dst, src.kernel + offset, size); } +static inline int copy_from_bpfptr_offset_with_ptr(void *dst, bpfptr_t src,
size_t offset, size_t size)
+{
- if (!bpfptr_is_kernel(src))
return copy_from_user_with_ptr(dst, src.user + offset, size);
- return copy_from_kernel_nofault(dst, src.kernel + offset, size);
+}
static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size) { return copy_from_bpfptr_offset(dst, src, 0, size); } +static inline int copy_from_bpfptr_with_ptr(void *dst, bpfptr_t src, size_t size) +{
- return copy_from_bpfptr_offset_with_ptr(dst, src, 0, size);
+}
static inline int copy_to_bpfptr_offset(bpfptr_t dst, size_t offset, const void *src, size_t size) {
union bpf_attr and structs bpf_xyz_info are used to pass config to the kernel or return info to userspace via the bpf syscall.
In order to avoid needing a separate 32-bit compat handler for the bpf syscall, __user pointers inside these union/structs are stored as __u64 or __aligned_u64.
In Morello PCuABI a user pointer is a 129-bit capability, so the __aligned_u64 type is not big enough to hold it. Use type __kernel_aligned_uintptr_t instead, which is big enough on the affected architectures while remaining __aligned_u64 on others.
Use the copy_{to,from}_user_with_ptr variants where blocks of memory containing pointers are being copied, to preserve capabilities.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
---
 drivers/media/rc/bpf-lirc.c    |   2 +-
 include/linux/bpf_compat.h     |   3 +
 include/linux/bpfptr.h         |   6 +-
 include/uapi/linux/bpf.h       |  94 +++++++++++-----------
 kernel/bpf/bpf_iter.c          |   2 +-
 kernel/bpf/btf.c               |  16 ++--
 kernel/bpf/cgroup.c            |   5 +-
 kernel/bpf/hashtab.c           |   8 +-
 kernel/bpf/net_namespace.c     |   2 +-
 kernel/bpf/offload.c           |   2 +-
 kernel/bpf/syscall.c           | 138 +++++++++++++++++----------
 kernel/bpf/verifier.c          |   3 +-
 kernel/trace/bpf_trace.c       |   6 +-
 net/bpf/bpf_dummy_struct_ops.c |   6 +-
 net/bpf/test_run.c             |  16 ++--
 net/core/sock_map.c            |   2 +-
 16 files changed, 161 insertions(+), 150 deletions(-)
diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c index f419d7452295..093ec66dbedd 100644 --- a/drivers/media/rc/bpf-lirc.c +++ b/drivers/media/rc/bpf-lirc.c @@ -295,7 +295,7 @@ int lirc_prog_detach(const union bpf_attr *attr)
int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) { - __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids); + __u32 __user *prog_ids = (__u32 __user *)attr->query.prog_ids; struct bpf_prog_array *progs; struct rc_dev *rcdev; u32 cnt, flags = 0; diff --git a/include/linux/bpf_compat.h b/include/linux/bpf_compat.h index 9e16f7f5eda6..0c1c299deebf 100644 --- a/include/linux/bpf_compat.h +++ b/include/linux/bpf_compat.h @@ -1,6 +1,9 @@ /* SPDX-License-Identifier: GPL-2.0 */ /* Copyright (C) 2023 Arm Ltd */
+#define bpf_uattr_compat_ptr(DEST, SRC, FIELD) \ + ((DEST)->FIELD = (__kernel_aligned_uintptr_t)compat_ptr((SRC)->FIELD)) + #define bpf_copy_from_user_with_ptr(dest, src, size) \ (in_compat64_syscall() ? copy_from_user(dest, src, size) \ : copy_from_user_with_ptr(dest, src, size)) diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index f6828c5f7858..485deaf96dcb 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -31,12 +31,12 @@ static inline bpfptr_t USER_BPFPTR(void __user *p) return (bpfptr_t) { .user = p }; }
-static inline bpfptr_t make_bpfptr(u64 addr, bool is_kernel) +static inline bpfptr_t make_bpfptr(__kernel_uintptr_t ptr, bool is_kernel) { if (is_kernel) - return KERNEL_BPFPTR((void*) (uintptr_t) addr); + return KERNEL_BPFPTR((void *)(uintptr_t)ptr); else - return USER_BPFPTR(u64_to_user_ptr(addr)); + return USER_BPFPTR((void __user *)ptr); }
static inline bool bpfptr_is_null(bpfptr_t bpfptr) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index c994ff5b157c..a536d8f8805e 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1350,21 +1350,21 @@ union bpf_attr {
struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ __u32 map_fd; - __aligned_u64 key; + __kernel_aligned_uintptr_t key; union { - __aligned_u64 value; - __aligned_u64 next_key; + __kernel_aligned_uintptr_t value; + __kernel_aligned_uintptr_t next_key; }; __u64 flags; };
struct { /* struct used by BPF_MAP_*_BATCH commands */ - __aligned_u64 in_batch; /* start batch, - * NULL to start from beginning - */ - __aligned_u64 out_batch; /* output: next start batch */ - __aligned_u64 keys; - __aligned_u64 values; + /* start batch, NULL to start from beginning */ + __kernel_aligned_uintptr_t in_batch; + /* output: next start batch */ + __kernel_aligned_uintptr_t out_batch; + __kernel_aligned_uintptr_t keys; + __kernel_aligned_uintptr_t values; __u32 count; /* input/output: * input: # of key/value * elements @@ -1378,11 +1378,11 @@ union bpf_attr { struct { /* anonymous struct used by BPF_PROG_LOAD command */ __u32 prog_type; /* one of enum bpf_prog_type */ __u32 insn_cnt; - __aligned_u64 insns; - __aligned_u64 license; + __kernel_aligned_uintptr_t insns; + __kernel_aligned_uintptr_t license; __u32 log_level; /* verbosity level of verifier */ __u32 log_size; /* size of user buffer */ - __aligned_u64 log_buf; /* user supplied buffer */ + __kernel_aligned_uintptr_t log_buf; /* user supplied buffer */ __u32 kern_version; /* not used */ __u32 prog_flags; char prog_name[BPF_OBJ_NAME_LEN]; @@ -1394,10 +1394,10 @@ union bpf_attr { __u32 expected_attach_type; __u32 prog_btf_fd; /* fd pointing to BTF type data */ __u32 func_info_rec_size; /* userspace bpf_func_info size */ - __aligned_u64 func_info; /* func info */ + __kernel_aligned_uintptr_t func_info; /* func info */ __u32 func_info_cnt; /* number of bpf_func_info records */ __u32 line_info_rec_size; /* userspace bpf_line_info size */ - __aligned_u64 line_info; /* line info */ + __kernel_aligned_uintptr_t line_info; /* line info */ __u32 line_info_cnt; /* number of bpf_line_info records */ __u32 attach_btf_id; /* in-kernel BTF type id to attach to */ union { @@ -1407,8 +1407,8 @@ union bpf_attr { __u32 attach_btf_obj_fd; }; __u32 core_relo_cnt; /* number of bpf_core_relo */ - __aligned_u64 fd_array; /* array of FDs */ - __aligned_u64 core_relos; + __kernel_aligned_uintptr_t fd_array; /* array of FDs */ + __kernel_aligned_uintptr_t core_relos; __u32 core_relo_rec_size; /* sizeof(struct bpf_core_relo) */ /* output: actual total log contents size (including termintaing zero). * It could be both larger than original log_size (if log was @@ -1418,7 +1418,7 @@ union bpf_attr { };
struct { /* anonymous struct used by BPF_OBJ_* commands */ - __aligned_u64 pathname; + __kernel_aligned_uintptr_t pathname; __u32 bpf_fd; __u32 file_flags; }; @@ -1442,8 +1442,8 @@ union bpf_attr { * returns ENOSPC if data_out * is too small. */ - __aligned_u64 data_in; - __aligned_u64 data_out; + __kernel_aligned_uintptr_t data_in; + __kernel_aligned_uintptr_t data_out; __u32 repeat; __u32 duration; __u32 ctx_size_in; /* input: len of ctx_in */ @@ -1451,8 +1451,8 @@ union bpf_attr { * returns ENOSPC if ctx_out * is too small. */ - __aligned_u64 ctx_in; - __aligned_u64 ctx_out; + __kernel_aligned_uintptr_t ctx_in; + __kernel_aligned_uintptr_t ctx_out; __u32 flags; __u32 cpu; __u32 batch_size; @@ -1473,7 +1473,7 @@ union bpf_attr { struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */ __u32 bpf_fd; __u32 info_len; - __aligned_u64 info; + __kernel_aligned_uintptr_t info; } info;
struct { /* anonymous struct used by BPF_PROG_QUERY command */ @@ -1481,22 +1481,22 @@ union bpf_attr { __u32 attach_type; __u32 query_flags; __u32 attach_flags; - __aligned_u64 prog_ids; + __kernel_aligned_uintptr_t prog_ids; __u32 prog_cnt; /* output: per-program attach_flags. * not allowed to be set during effective query. */ - __aligned_u64 prog_attach_flags; + __kernel_aligned_uintptr_t prog_attach_flags; } query;
struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */ - __u64 name; + __kernel_aligned_uintptr_t name; __u32 prog_fd; } raw_tracepoint;
struct { /* anonymous struct for BPF_BTF_LOAD */ - __aligned_u64 btf; - __aligned_u64 btf_log_buf; + __kernel_aligned_uintptr_t btf; + __kernel_aligned_uintptr_t btf_log_buf; __u32 btf_size; __u32 btf_log_size; __u32 btf_log_level; @@ -1512,7 +1512,7 @@ union bpf_attr { __u32 fd; /* input: fd */ __u32 flags; /* input: flags */ __u32 buf_len; /* input/output: buf len */ - __aligned_u64 buf; /* input/output: + __kernel_aligned_uintptr_t buf; /* input/output: * tp_name for tracepoint * symbol for kprobe * filename for uprobe @@ -1537,8 +1537,10 @@ union bpf_attr { union { __u32 target_btf_id; /* btf_id of target to attach to */ struct { - __aligned_u64 iter_info; /* extra bpf_iter_link_info */ - __u32 iter_info_len; /* iter_info length */ + /* extra bpf_iter_link_info */ + __kernel_aligned_uintptr_t iter_info; + /* iter_info length */ + __u32 iter_info_len; }; struct { /* black box user-provided value passed through @@ -1550,9 +1552,9 @@ union bpf_attr { struct { __u32 flags; __u32 cnt; - __aligned_u64 syms; - __aligned_u64 addrs; - __aligned_u64 cookies; + __kernel_aligned_uintptr_t syms; + __kernel_aligned_uintptr_t addrs; + __kernel_aligned_uintptr_t cookies; } kprobe_multi; struct { /* this is overlaid with the target_btf_id above. */ @@ -6303,12 +6305,12 @@ struct bpf_prog_info { __u8 tag[BPF_TAG_SIZE]; __u32 jited_prog_len; __u32 xlated_prog_len; - __aligned_u64 jited_prog_insns; - __aligned_u64 xlated_prog_insns; + __kernel_aligned_uintptr_t jited_prog_insns; + __kernel_aligned_uintptr_t xlated_prog_insns; __u64 load_time; /* ns since boottime */ __u32 created_by_uid; __u32 nr_map_ids; - __aligned_u64 map_ids; + __kernel_aligned_uintptr_t map_ids; char name[BPF_OBJ_NAME_LEN]; __u32 ifindex; __u32 gpl_compatible:1; @@ -6317,20 +6319,20 @@ struct bpf_prog_info { __u64 netns_ino; __u32 nr_jited_ksyms; __u32 nr_jited_func_lens; - __aligned_u64 jited_ksyms; - __aligned_u64 jited_func_lens; + __kernel_aligned_uintptr_t jited_ksyms; + __kernel_aligned_uintptr_t jited_func_lens; __u32 btf_id; __u32 func_info_rec_size; - __aligned_u64 func_info; + __kernel_aligned_uintptr_t func_info; __u32 nr_func_info; __u32 nr_line_info; - __aligned_u64 line_info; - __aligned_u64 jited_line_info; + __kernel_aligned_uintptr_t line_info; + __kernel_aligned_uintptr_t jited_line_info; __u32 nr_jited_line_info; __u32 line_info_rec_size; __u32 jited_line_info_rec_size; __u32 nr_prog_tags; - __aligned_u64 prog_tags; + __kernel_aligned_uintptr_t prog_tags; __u64 run_time_ns; __u64 run_cnt; __u64 recursion_misses; @@ -6359,10 +6361,10 @@ struct bpf_map_info { } __attribute__((aligned(8)));
struct bpf_btf_info { - __aligned_u64 btf; + __kernel_aligned_uintptr_t btf; __u32 btf_size; __u32 id; - __aligned_u64 name; + __kernel_aligned_uintptr_t name; __u32 name_len; __u32 kernel_btf; } __attribute__((aligned(8))); @@ -6373,7 +6375,7 @@ struct bpf_link_info { __u32 prog_id; union { struct { - __aligned_u64 tp_name; /* in/out: tp_name buffer ptr */ + __kernel_aligned_uintptr_t tp_name; /* in/out: tp_name buffer ptr */ __u32 tp_name_len; /* in/out: tp_name buffer len */ } raw_tracepoint; struct { @@ -6386,7 +6388,7 @@ struct bpf_link_info { __u32 attach_type; } cgroup; struct { - __aligned_u64 target_name; /* in/out: target_name buffer ptr */ + __kernel_aligned_uintptr_t target_name; /* in/out: target_name buffer ptr */ __u32 target_name_len; /* in/out: target_name buffer len */
/* If the iter specific field is 32 bits, it can be put diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c index 96856f130cbf..669efe46a5a6 100644 --- a/kernel/bpf/bpf_iter.c +++ b/kernel/bpf/bpf_iter.c @@ -456,7 +456,7 @@ static int bpf_iter_link_fill_link_info(const struct bpf_link *link, { struct bpf_iter_link *iter_link = container_of(link, struct bpf_iter_link, link); - char __user *ubuf = u64_to_user_ptr(info->iter.target_name); + char __user *ubuf = (char __user *)info->iter.target_name; bpf_iter_fill_link_info_t fill_link_info; u32 ulen = info->iter.target_name_len; const char *target_name; diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index caeb98dcf3cf..d050fa1371b2 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -5465,7 +5465,7 @@ static int finalize_log(struct bpf_verifier_log *log, bpfptr_t uattr, u32 uattr_ static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size) { bpfptr_t btf_data = make_bpfptr(attr->btf, uattr.is_kernel); - char __user *log_ubuf = u64_to_user_ptr(attr->btf_log_buf); + char __user *log_ubuf = (char __user *)attr->btf_log_buf; struct btf_struct_metas *struct_meta_tab; struct btf_verifier_env *env = NULL; struct btf *btf = NULL; @@ -7218,10 +7218,10 @@ struct btf *btf_get_by_fd(int fd) static void convert_compat_btf_info_in(struct bpf_btf_info *dest, const struct compat_bpf_btf_info *cinfo) { - copy_field(dest, cinfo, btf); + bpf_uattr_compat_ptr(dest, cinfo, btf); copy_field(dest, cinfo, btf_size); copy_field(dest, cinfo, id); - copy_field(dest, cinfo, name); + bpf_uattr_compat_ptr(dest, cinfo, name); copy_field(dest, cinfo, name_len); copy_field(dest, cinfo, kernel_btf); } @@ -7246,7 +7246,7 @@ static int copy_bpf_btf_info_from_user(const union bpf_attr *attr, void *select_info = in_compat64_syscall() ? &cinfo : info; size_t info_size = in_compat64_syscall() ? sizeof(struct compat_bpf_btf_info) : sizeof(struct bpf_btf_info); - void __user *uinfo = u64_to_user_ptr(attr->info.info); + void __user *uinfo = (void __user *)attr->info.info; *info_len = attr->info.info_len;
err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), @@ -7256,7 +7256,7 @@ static int copy_bpf_btf_info_from_user(const union bpf_attr *attr, *info_len = min_t(u32, info_size, *info_len);
memset(info, 0, sizeof(struct bpf_btf_info)); - if (copy_from_user(select_info, uinfo, *info_len)) + if (bpf_copy_from_user_with_ptr(select_info, uinfo, *info_len)) return -EFAULT;
if (in_compat64_syscall()) @@ -7277,7 +7277,7 @@ static int copy_bpf_btf_info_to_user(const union bpf_attr *attr, if (in_compat64_syscall()) convert_compat_btf_info_out(&cinfo, info);
- if (copy_to_user(uinfo, select_info, *info_len) || + if (bpf_copy_to_user_with_ptr(uinfo, select_info, *info_len) || bpf_put_uattr(uattr, *info_len, info.info_len)) return -EFAULT;
@@ -7300,7 +7300,7 @@ int btf_get_info_by_fd(const struct btf *btf, return ret;
info.id = btf->id; - ubtf = u64_to_user_ptr(info.btf); + ubtf = (void __user *)info.btf; btf_copy = min_t(u32, btf->data_size, info.btf_size); if (copy_to_user(ubtf, btf->data, btf_copy)) return -EFAULT; @@ -7308,7 +7308,7 @@ int btf_get_info_by_fd(const struct btf *btf,
info.kernel_btf = btf->kernel_btf;
- uname = u64_to_user_ptr(info.name); + uname = (char __user *)info.name; uname_len = info.name_len; if (!uname ^ !uname_len) return -EINVAL; diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index fa5e67fbbbca..1f00b75cf2f9 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1020,9 +1020,10 @@ static int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, union bpf_attr __user *uattr) { - __u32 __user *prog_attach_flags = u64_to_user_ptr(attr->query.prog_attach_flags); + __u32 __user *prog_attach_flags = + (__u32 __user *)attr->query.prog_attach_flags; bool effective_query = attr->query.query_flags & BPF_F_QUERY_EFFECTIVE; - __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids); + __u32 __user *prog_ids = (__u32 __user *)attr->query.prog_ids; enum bpf_attach_type type = attr->query.attach_type; enum cgroup_bpf_attach_type from_atype, to_atype; enum cgroup_bpf_attach_type atype; diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index acd3561a1254..bab4b2cdd1e2 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -1667,9 +1667,9 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, struct bpf_htab *htab = container_of(map, struct bpf_htab, map); u32 bucket_cnt, total, key_size, value_size, roundup_key_size; void *keys = NULL, *values = NULL, *value, *dst_key, *dst_val; - void __user *uvalues = u64_to_user_ptr(attr->batch.values); - void __user *ukeys = u64_to_user_ptr(attr->batch.keys); - void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch); + void __user *uvalues = (void __user *)attr->batch.values; + void __user *ukeys = (void __user *)attr->batch.keys; + void __user *ubatch = (void __user *)attr->batch.in_batch; u32 batch, max_count, size, bucket_size, map_id; struct htab_elem *node_to_free = NULL; u64 elem_map_flags, map_flags; @@ -1874,7 +1874,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, goto out;
/* copy # of entries and next batch */ - ubatch = u64_to_user_ptr(attr->batch.out_batch); + ubatch = (void __user *)attr->batch.out_batch; if (copy_to_user(ubatch, &batch, sizeof(batch)) || bpf_put_uattr(uattr, total, batch.count)) ret = -EFAULT; diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c index 29ae0e3fe5bd..7056cd18ee5f 100644 --- a/kernel/bpf/net_namespace.c +++ b/kernel/bpf/net_namespace.c @@ -249,7 +249,7 @@ static int __netns_bpf_prog_query(const union bpf_attr *attr, struct net *net, enum netns_bpf_attach_type type) { - __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids); + __u32 __user *prog_ids = (__u32 __user *)attr->query.prog_ids; struct bpf_prog_array *run_array; u32 prog_cnt = 0, flags = 0;
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c index 8a26cd8814c1..a6c0d674fdec 100644 --- a/kernel/bpf/offload.c +++ b/kernel/bpf/offload.c @@ -479,7 +479,7 @@ int bpf_prog_offload_info_fill(struct bpf_prog_info *info, ulen = info->jited_prog_len; info->jited_prog_len = aux->offload->jited_len; if (info->jited_prog_len && ulen) { - uinsns = u64_to_user_ptr(info->jited_prog_insns); + uinsns = (char __user *)info->jited_prog_insns; ulen = min_t(u32, info->jited_prog_len, ulen); if (copy_to_user(uinsns, aux->offload->jited_image, ulen)) { up_read(&bpf_devs_lock); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 89c202b69f6c..dfcc9a9c8dff 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1358,8 +1358,8 @@ static void *___bpf_copy_key(bpfptr_t ukey, u64 key_size)
static int map_lookup_elem(union bpf_attr *attr) { - void __user *ukey = u64_to_user_ptr(attr->key); - void __user *uvalue = u64_to_user_ptr(attr->value); + void __user *ukey = (void __user *)attr->key; + void __user *uvalue = (void __user *)attr->value; int ufd = attr->map_fd; struct bpf_map *map; void *key, *value; @@ -1535,8 +1535,8 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
static int map_get_next_key(union bpf_attr *attr) { - void __user *ukey = u64_to_user_ptr(attr->key); - void __user *unext_key = u64_to_user_ptr(attr->next_key); + void __user *ukey = (void __user *)attr->key; + void __user *unext_key = (void __user *)attr->next_key; int ufd = attr->map_fd; struct bpf_map *map; void *key, *next_key; @@ -1598,7 +1598,7 @@ int generic_map_delete_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { - void __user *keys = u64_to_user_ptr(attr->batch.keys); + void __user *keys = (void __user *)attr->batch.keys; u32 cp, max_count; int err = 0; void *key; @@ -1652,8 +1652,8 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file, const union bpf_attr *attr, union bpf_attr __user *uattr) { - void __user *values = u64_to_user_ptr(attr->batch.values); - void __user *keys = u64_to_user_ptr(attr->batch.keys); + void __user *values = (void __user *)attr->batch.values; + void __user *keys = (void __user *)attr->batch.keys; u32 value_size, cp, max_count; void *key, *value; int err = 0; @@ -1711,10 +1711,10 @@ int generic_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { - void __user *uobatch = u64_to_user_ptr(attr->batch.out_batch); - void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch); - void __user *values = u64_to_user_ptr(attr->batch.values); - void __user *keys = u64_to_user_ptr(attr->batch.keys); + void __user *uobatch = (void __user *)attr->batch.out_batch; + void __user *ubatch = (void __user *)attr->batch.in_batch; + void __user *values = (void __user *)attr->batch.values; + void __user *keys = (void __user *)attr->batch.keys; void *buf, *buf_prevkey, *prev_key, *key, *value; int err, retry = MAP_LOOKUP_RETRIES; u32 value_size, cp, max_count; @@ -1811,8 +1811,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
static int map_lookup_and_delete_elem(union bpf_attr *attr) { - void __user *ukey = u64_to_user_ptr(attr->key); - void __user *uvalue = u64_to_user_ptr(attr->value); + void __user *ukey = (void __user *)attr->key; + void __user *uvalue = (void __user *)attr->value; int ufd = attr->map_fd; struct bpf_map *map; void *key, *value; @@ -2689,7 +2689,7 @@ static int bpf_obj_pin(const union bpf_attr *attr) if (attr->file_flags != 0) return -EINVAL;
- return bpf_obj_pin_user(attr->bpf_fd, u64_to_user_ptr(attr->pathname)); + return bpf_obj_pin_user(attr->bpf_fd, (void __user *)attr->pathname); }
static int bpf_obj_get(const union bpf_attr *attr) @@ -2697,7 +2697,7 @@ static int bpf_obj_get(const union bpf_attr *attr) if (attr->bpf_fd != 0 || attr->file_flags & ~BPF_OBJ_FLAG_MASK) return -EINVAL;
- return bpf_obj_get_user(u64_to_user_ptr(attr->pathname), + return bpf_obj_get_user((void __user *)attr->pathname, attr->file_flags); }
@@ -3200,7 +3200,7 @@ static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link, { struct bpf_raw_tp_link *raw_tp_link = container_of(link, struct bpf_raw_tp_link, link); - char __user *ubuf = u64_to_user_ptr(info->raw_tracepoint.tp_name); + char __user *ubuf = (void __user *)info->raw_tracepoint.tp_name; const char *tp_name = raw_tp_link->btp->tp->name; u32 ulen = info->raw_tracepoint.tp_name_len; size_t tp_len = strlen(tp_name); @@ -3393,7 +3393,8 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr) if (IS_ERR(prog)) return PTR_ERR(prog);
- fd = bpf_raw_tp_link_attach(prog, u64_to_user_ptr(attr->raw_tracepoint.name)); + fd = bpf_raw_tp_link_attach(prog, + (const char __user *)attr->raw_tracepoint.name); if (fd < 0) bpf_prog_put(prog); return fd; @@ -3921,12 +3922,12 @@ convert_compat_prog_info_in(struct bpf_prog_info *dest, strncpy((char *)dest->tag, (char *)cinfo->tag, BPF_TAG_SIZE); copy_field(dest, cinfo, jited_prog_len); copy_field(dest, cinfo, xlated_prog_len); - copy_field(dest, cinfo, jited_prog_insns); - copy_field(dest, cinfo, xlated_prog_insns); + bpf_uattr_compat_ptr(dest, cinfo, jited_prog_insns); + bpf_uattr_compat_ptr(dest, cinfo, xlated_prog_insns); copy_field(dest, cinfo, load_time); copy_field(dest, cinfo, created_by_uid); copy_field(dest, cinfo, nr_map_ids); - copy_field(dest, cinfo, map_ids); + bpf_uattr_compat_ptr(dest, cinfo, map_ids); strncpy((char *)dest->name, (char *)cinfo->name, BPF_OBJ_NAME_LEN); copy_field(dest, cinfo, ifindex); copy_field(dest, cinfo, gpl_compatible); @@ -3934,20 +3935,20 @@ convert_compat_prog_info_in(struct bpf_prog_info *dest, copy_field(dest, cinfo, netns_ino); copy_field(dest, cinfo, nr_jited_ksyms); copy_field(dest, cinfo, nr_jited_func_lens); - copy_field(dest, cinfo, jited_ksyms); - copy_field(dest, cinfo, jited_func_lens); + bpf_uattr_compat_ptr(dest, cinfo, jited_ksyms); + bpf_uattr_compat_ptr(dest, cinfo, jited_func_lens); copy_field(dest, cinfo, btf_id); copy_field(dest, cinfo, func_info_rec_size); - copy_field(dest, cinfo, func_info); + bpf_uattr_compat_ptr(dest, cinfo, func_info); copy_field(dest, cinfo, nr_func_info); copy_field(dest, cinfo, nr_line_info); - copy_field(dest, cinfo, line_info); - copy_field(dest, cinfo, jited_line_info); + bpf_uattr_compat_ptr(dest, cinfo, line_info); + bpf_uattr_compat_ptr(dest, cinfo, jited_line_info); copy_field(dest, cinfo, nr_jited_line_info); copy_field(dest, cinfo, line_info_rec_size); copy_field(dest, cinfo, jited_line_info_rec_size); copy_field(dest, cinfo, nr_prog_tags); - copy_field(dest, cinfo, prog_tags); + bpf_uattr_compat_ptr(dest, cinfo, prog_tags); copy_field(dest, cinfo, run_time_ns); copy_field(dest, cinfo, run_cnt); copy_field(dest, cinfo, recursion_misses); @@ -4009,7 +4010,7 @@ static int copy_bpf_prog_info_from_user(const union bpf_attr *attr, void *select_info = in_compat64_syscall() ? &cinfo : info; size_t info_size = in_compat64_syscall() ? sizeof(struct compat_bpf_prog_info) : sizeof(struct bpf_prog_info); - void __user *uinfo = u64_to_user_ptr(attr->info.info); + void __user *uinfo = (void __user *)attr->info.info; *info_len = attr->info.info_len;
err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), @@ -4019,7 +4020,7 @@ static int copy_bpf_prog_info_from_user(const union bpf_attr *attr, *info_len = min_t(u32, info_size, *info_len);
memset(info, 0, sizeof(struct bpf_prog_info)); - if (copy_from_user(select_info, uinfo, *info_len)) + if (bpf_copy_from_user_with_ptr(select_info, uinfo, *info_len)) return -EFAULT;
if (in_compat64_syscall()) @@ -4040,7 +4041,7 @@ static int copy_bpf_prog_info_to_user(const union bpf_attr *attr, if (in_compat64_syscall()) convert_compat_prog_info_out(&cinfo, info);
- if (copy_to_user(uinfo, select_info, *info_len) || + if (bpf_copy_to_user_with_ptr(uinfo, select_info, *info_len) || bpf_put_uattr(uattr, *info_len, info.info_len)) return -EFAULT;
@@ -4079,7 +4080,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, info.nr_map_ids = prog->aux->used_map_cnt; ulen = min_t(u32, info.nr_map_ids, ulen); if (ulen) { - u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids); + u32 __user *user_map_ids = (u32 __user *)info.map_ids; u32 i;
for (i = 0; i < ulen; i++) @@ -4126,7 +4127,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, insns_sanitized = bpf_insn_prepare_dump(prog, file->f_cred); if (!insns_sanitized) return -ENOMEM; - uinsns = u64_to_user_ptr(info.xlated_prog_insns); + uinsns = (char __user *)info.xlated_prog_insns; ulen = min_t(u32, info.xlated_prog_len, ulen); fault = copy_to_user(uinsns, insns_sanitized, ulen); kfree(insns_sanitized); @@ -4158,7 +4159,7 @@ static int bpf_prog_get_info_by_fd(struct file *file,
if (info.jited_prog_len && ulen) { if (bpf_dump_raw_ok(file->f_cred)) { - uinsns = u64_to_user_ptr(info.jited_prog_insns); + uinsns = (char __user *)info.jited_prog_insns; ulen = min_t(u32, info.jited_prog_len, ulen);
/* for multi-function programs, copy the JITed @@ -4201,7 +4202,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, * corresponding to each function */ ulen = min_t(u32, info.nr_jited_ksyms, ulen); - user_ksyms = u64_to_user_ptr(info.jited_ksyms); + user_ksyms = (u64 __user *)info.jited_ksyms; if (prog->aux->func_cnt) { for (i = 0; i < ulen; i++) { ksym_addr = (unsigned long) @@ -4229,7 +4230,7 @@ static int bpf_prog_get_info_by_fd(struct file *file,
/* copy the JITed image lengths for each function */ ulen = min_t(u32, info.nr_jited_func_lens, ulen); - user_lens = u64_to_user_ptr(info.jited_func_lens); + user_lens = (u32 __user *)info.jited_func_lens; if (prog->aux->func_cnt) { for (i = 0; i < ulen; i++) { func_len = @@ -4258,7 +4259,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, if (info.nr_func_info && ulen) { char __user *user_finfo;
- user_finfo = u64_to_user_ptr(info.func_info); + user_finfo = (char __user *)info.func_info; ulen = min_t(u32, info.nr_func_info, ulen); if (copy_to_user(user_finfo, prog->aux->func_info, info.func_info_rec_size * ulen)) @@ -4270,7 +4271,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, if (info.nr_line_info && ulen) { __u8 __user *user_linfo;
- user_linfo = u64_to_user_ptr(info.line_info); + user_linfo = (__u8 __user *)info.line_info; ulen = min_t(u32, info.nr_line_info, ulen); if (copy_to_user(user_linfo, prog->aux->linfo, info.line_info_rec_size * ulen)) @@ -4288,7 +4289,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, __u64 __user *user_linfo; u32 i;
- user_linfo = u64_to_user_ptr(info.jited_line_info); + user_linfo = (__u64 __user *)info.jited_line_info; ulen = min_t(u32, info.nr_jited_line_info, ulen); for (i = 0; i < ulen; i++) { line_addr = (unsigned long)prog->aux->jited_linfo[i]; @@ -4303,20 +4304,20 @@ static int bpf_prog_get_info_by_fd(struct file *file, ulen = info.nr_prog_tags; info.nr_prog_tags = prog->aux->func_cnt ? : 1; if (ulen) { - __u8 __user (*user_prog_tags)[BPF_TAG_SIZE]; + __u8 __user *user_prog_tags; u32 i;
- user_prog_tags = u64_to_user_ptr(info.prog_tags); + user_prog_tags = (__u8 __user *)info.prog_tags; ulen = min_t(u32, info.nr_prog_tags, ulen); if (prog->aux->func_cnt) { for (i = 0; i < ulen; i++) { - if (copy_to_user(user_prog_tags[i], + if (copy_to_user(&user_prog_tags[i], prog->aux->func[i]->tag, BPF_TAG_SIZE)) return -EFAULT; } } else { - if (copy_to_user(user_prog_tags[0], + if (copy_to_user(&user_prog_tags[0], prog->tag, BPF_TAG_SIZE)) return -EFAULT; } @@ -4335,7 +4336,8 @@ static int bpf_map_get_info_by_fd(struct file *file, const union bpf_attr *attr, union bpf_attr __user *uattr) { - struct bpf_map_info __user *uinfo = u64_to_user_ptr(attr->info.info); + struct bpf_map_info __user *uinfo = + (struct bpf_map_info __user *)attr->info.info; struct bpf_map_info info; u32 info_len = attr->info.info_len; int err; @@ -4394,11 +4396,11 @@ convert_compat_link_info_in(struct bpf_link_info *dest, */ switch (type) { case BPF_LINK_TYPE_RAW_TRACEPOINT: - copy_field(dest, cinfo, raw_tracepoint.tp_name); + bpf_uattr_compat_ptr(dest, cinfo, raw_tracepoint.tp_name); copy_field(dest, cinfo, raw_tracepoint.tp_name_len); return; case BPF_LINK_TYPE_ITER: - copy_field(dest, cinfo, iter.target_name); + bpf_uattr_compat_ptr(dest, cinfo, iter.target_name); copy_field(dest, cinfo, iter.target_name_len); return; default: @@ -4471,7 +4473,7 @@ static int copy_bpf_link_info_from_user(const union bpf_attr *attr, void *select_info = in_compat64_syscall() ? &cinfo : info; size_t info_size = in_compat64_syscall() ? sizeof(struct compat_bpf_link_info) : sizeof(struct bpf_link_info); - void __user *uinfo = u64_to_user_ptr(attr->info.info); + void __user *uinfo = (void __user *)attr->info.info; *info_len = attr->info.info_len;
err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), @@ -4481,7 +4483,7 @@ static int copy_bpf_link_info_from_user(const union bpf_attr *attr, *info_len = min_t(u32, info_size, *info_len);
memset(info, 0, sizeof(struct bpf_link_info)); - if (copy_from_user(select_info, uinfo, *info_len)) + if (bpf_copy_from_user_with_ptr(select_info, uinfo, *info_len)) return -EFAULT;
if (in_compat64_syscall()) @@ -4503,7 +4505,7 @@ static int copy_bpf_link_info_to_user(const union bpf_attr *attr, if (in_compat64_syscall()) convert_compat_link_info_out(&cinfo, info, type);
- if (copy_to_user(uinfo, select_info, *info_len) || + if (bpf_copy_to_user_with_ptr(uinfo, select_info, *info_len) || bpf_put_uattr(uattr, *info_len, info.info_len)) return -EFAULT;
@@ -4600,7 +4602,7 @@ static int bpf_task_fd_query_copy(const union bpf_attr *attr, const char *buf, u64 probe_offset, u64 probe_addr) { - char __user *ubuf = u64_to_user_ptr(attr->task_fd_query.buf); + char __user *ubuf = (char __user *)attr->task_fd_query.buf; u32 len = buf ? strlen(buf) : 0, input_len; int err = 0;
@@ -5422,10 +5424,10 @@ static void convert_compat_bpf_attr(union bpf_attr *dest, case BPF_MAP_LOOKUP_AND_DELETE_BATCH: case BPF_MAP_UPDATE_BATCH: case BPF_MAP_DELETE_BATCH: - copy_field(dest, cattr, batch.in_batch); - copy_field(dest, cattr, batch.out_batch); - copy_field(dest, cattr, batch.keys); - copy_field(dest, cattr, batch.values); + bpf_uattr_compat_ptr(dest, cattr, batch.in_batch); + bpf_uattr_compat_ptr(dest, cattr, batch.out_batch); + bpf_uattr_compat_ptr(dest, cattr, batch.keys); + bpf_uattr_compat_ptr(dest, cattr, batch.values); copy_field(dest, cattr, batch.count); copy_field(dest, cattr, batch.map_fd); copy_field(dest, cattr, batch.elem_flags); @@ -5434,11 +5436,11 @@ static void convert_compat_bpf_attr(union bpf_attr *dest, case BPF_PROG_LOAD: copy_field(dest, cattr, prog_type); copy_field(dest, cattr, insn_cnt); - copy_field(dest, cattr, insns); - copy_field(dest, cattr, license); + bpf_uattr_compat_ptr(dest, cattr, insns); + bpf_uattr_compat_ptr(dest, cattr, license); copy_field(dest, cattr, log_level); copy_field(dest, cattr, log_size); - copy_field(dest, cattr, log_buf); + bpf_uattr_compat_ptr(dest, cattr, log_buf); copy_field(dest, cattr, kern_version); copy_field(dest, cattr, prog_flags); strncpy(dest->prog_name, cattr->prog_name, BPF_OBJ_NAME_LEN); @@ -5446,23 +5448,23 @@ static void convert_compat_bpf_attr(union bpf_attr *dest, copy_field(dest, cattr, expected_attach_type); copy_field(dest, cattr, prog_btf_fd); copy_field(dest, cattr, func_info_rec_size); - copy_field(dest, cattr, func_info); + bpf_uattr_compat_ptr(dest, cattr, func_info); copy_field(dest, cattr, func_info_cnt); copy_field(dest, cattr, line_info_rec_size); - copy_field(dest, cattr, line_info); + bpf_uattr_compat_ptr(dest, cattr, line_info); copy_field(dest, cattr, line_info_cnt); copy_field(dest, cattr, attach_btf_id); copy_field(dest, cattr, attach_prog_fd); /* u32 attach_btf_obj_fd is in a union with u32 attach_prog_fd */ copy_field(dest, cattr, core_relo_cnt); - copy_field(dest, cattr, fd_array); - copy_field(dest, cattr, core_relos); + bpf_uattr_compat_ptr(dest, cattr, fd_array); + bpf_uattr_compat_ptr(dest, cattr, core_relos); copy_field(dest, cattr, core_relo_rec_size); copy_field(dest, cattr, log_true_size); break; case BPF_OBJ_PIN: case BPF_OBJ_GET: - copy_field(dest, cattr, pathname); + bpf_uattr_compat_ptr(dest, cattr, pathname); copy_field(dest, cattr, bpf_fd); copy_field(dest, cattr, file_flags); break; @@ -5479,14 +5481,14 @@ static void convert_compat_bpf_attr(union bpf_attr *dest, copy_field(dest, cattr, test.retval); copy_field(dest, cattr, test.data_size_in); copy_field(dest, cattr, test.data_size_out); - copy_field(dest, cattr, test.data_in); - copy_field(dest, cattr, test.data_out); + bpf_uattr_compat_ptr(dest, cattr, test.data_in); + bpf_uattr_compat_ptr(dest, cattr, test.data_out); copy_field(dest, cattr, test.repeat); copy_field(dest, cattr, test.duration); copy_field(dest, cattr, test.ctx_size_in); copy_field(dest, cattr, test.ctx_size_out); - copy_field(dest, cattr, test.ctx_in); - copy_field(dest, cattr, test.ctx_out); + bpf_uattr_compat_ptr(dest, cattr, test.ctx_in); + bpf_uattr_compat_ptr(dest, cattr, test.ctx_out); copy_field(dest, cattr, test.flags); copy_field(dest, cattr, test.cpu); copy_field(dest, cattr, test.batch_size); @@ -5508,7 +5510,7 @@ static void convert_compat_bpf_attr(union bpf_attr *dest, case BPF_OBJ_GET_INFO_BY_FD: copy_field(dest, cattr, info.bpf_fd); copy_field(dest, cattr, info.info_len); - copy_field(dest, cattr, info.info); + bpf_uattr_compat_ptr(dest, cattr, info.info); break; case BPF_PROG_QUERY: copy_field(dest, cattr, query.target_fd); @@ -5665,6 +5667,9 @@ static int bpf_check_perms(int cmd) return 0; }
+#define bpfptr_copy(dest, src, size) \ + (in_compat64_syscall() ? copy_from_bpfptr(dest, src, size) \ + : copy_from_bpfptr_with_ptr(dest, src, size)) static int copy_bpf_attr_from_user(union bpf_attr *attr, int cmd, bpfptr_t uattr, unsigned int *size) { @@ -5681,7 +5686,7 @@ static int copy_bpf_attr_from_user(union bpf_attr *attr, int cmd,
/* copy attributes from user space, may be less than sizeof(bpf_attr) */ memset(select_attr, 0, attr_size); - if (copy_from_bpfptr(select_attr, uattr, *size) != 0) + if (bpfptr_copy(select_attr, uattr, *size) != 0) return -EFAULT;
err = check_attr(cmd, select_attr); @@ -5693,6 +5698,7 @@ static int copy_bpf_attr_from_user(union bpf_attr *attr, int cmd,
return 0; } +#undef bpfptr_copy
static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) { diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index e524f2ef0e73..c2787fe4ec6f 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -18830,8 +18830,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3 * and supplied buffer to store the verification trace */ ret = bpf_vlog_init(&env->log, attr->log_level, - (char __user *) (unsigned long) attr->log_buf, - attr->log_size); + (char __user *)attr->log_buf, attr->log_size); if (ret) goto err_unlock;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 1f4b07da327a..dcaf4c329ee6 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -2797,8 +2797,8 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr if (flags & ~BPF_F_KPROBE_MULTI_RETURN) return -EINVAL;
- uaddrs = u64_to_user_ptr(attr->link_create.kprobe_multi.addrs); - usyms = u64_to_user_ptr(attr->link_create.kprobe_multi.syms); + uaddrs = (void __user *)attr->link_create.kprobe_multi.addrs; + usyms = (void __user *)attr->link_create.kprobe_multi.syms; if (!!uaddrs == !!usyms) return -EINVAL;
@@ -2811,7 +2811,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr if (!addrs) return -ENOMEM;
- ucookies = u64_to_user_ptr(attr->link_create.kprobe_multi.cookies); + ucookies = (void __user *)attr->link_create.kprobe_multi.cookies; if (ucookies) { cookies = kvmalloc_array(cnt, sizeof(*addrs), GFP_KERNEL); if (!cookies) { diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c index 50848fbeb26c..b414762e9d32 100644 --- a/net/bpf/bpf_dummy_struct_ops.c +++ b/net/bpf/bpf_dummy_struct_ops.c @@ -34,12 +34,12 @@ dummy_ops_init_args(const union bpf_attr *kattr, unsigned int nr) if (!args) return ERR_PTR(-ENOMEM);
- ctx_in = u64_to_user_ptr(kattr->test.ctx_in); + ctx_in = (void __user *)kattr->test.ctx_in; if (copy_from_user(args->args, ctx_in, size_in)) goto out;
/* args[0] is 0 means state argument of test_N will be NULL */ - u_state = u64_to_user_ptr(args->args[0]); + u_state = (void __user *)args->args[0]; if (u_state && copy_from_user(&args->state, u_state, sizeof(args->state))) goto out; @@ -54,7 +54,7 @@ static int dummy_ops_copy_args(struct bpf_dummy_ops_test_args *args) { void __user *u_state;
- u_state = u64_to_user_ptr(args->args[0]); + u_state = (void __user *)args->args[0]; if (u_state && copy_to_user(u_state, &args->state, sizeof(args->state))) return -EFAULT;
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 899f6d2d3ff4..abcc0390e5d2 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -437,7 +437,7 @@ static int bpf_test_finish(const union bpf_attr *kattr, struct skb_shared_info *sinfo, u32 size, u32 retval, u32 duration) { - void __user *data_out = u64_to_user_ptr(kattr->test.data_out); + void __user *data_out = (void __user *)kattr->test.data_out; int err = -EFAULT; u32 copy_size = size;
@@ -814,7 +814,7 @@ BTF_SET8_END(test_sk_check_kfunc_ids) static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size, u32 size, u32 headroom, u32 tailroom) { - void __user *data_in = u64_to_user_ptr(kattr->test.data_in); + void __user *data_in = (void __user *)kattr->test.data_in; void *data;
if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom) @@ -901,7 +901,7 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { - void __user *ctx_in = u64_to_user_ptr(kattr->test.ctx_in); + void __user *ctx_in = (void __user *)kattr->test.ctx_in; __u32 ctx_size_in = kattr->test.ctx_size_in; struct bpf_raw_tp_test_run_info info; int cpu = kattr->test.cpu, err = 0; @@ -956,8 +956,8 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
static void *bpf_ctx_init(const union bpf_attr *kattr, u32 max_size) { - void __user *data_in = u64_to_user_ptr(kattr->test.ctx_in); - void __user *data_out = u64_to_user_ptr(kattr->test.ctx_out); + void __user *data_in = (void __user *)kattr->test.ctx_in; + void __user *data_out = (void __user *)kattr->test.ctx_out; u32 size = kattr->test.ctx_size_in; void *data; int err; @@ -989,7 +989,7 @@ static int bpf_ctx_finish(const union bpf_attr *kattr, union bpf_attr __user *uattr, const void *data, u32 size) { - void __user *data_out = u64_to_user_ptr(kattr->test.ctx_out); + void __user *data_out = (void __user *)kattr->test.ctx_out; int err = -EFAULT; u32 copy_size = size;
@@ -1396,7 +1396,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, goto free_data;
if (unlikely(kattr->test.data_size_in > size)) { - void __user *data_in = u64_to_user_ptr(kattr->test.data_in); + void __user *data_in = (void __user *)kattr->test.data_in;
while (size < kattr->test.data_size_in) { struct page *page; @@ -1654,7 +1654,7 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { - void __user *ctx_in = u64_to_user_ptr(kattr->test.ctx_in); + void __user *ctx_in = (void __user *)kattr->test.ctx_in; __u32 ctx_size_in = kattr->test.ctx_size_in; void *ctx = NULL; u32 retval; diff --git a/net/core/sock_map.c b/net/core/sock_map.c index aa7522a61e30..d38e20a3f049 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -1509,7 +1509,7 @@ static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog, int sock_map_bpf_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) { - __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids); + __u32 __user *prog_ids = (__u32 __user *)attr->query.prog_ids; u32 prog_cnt = 0, flags = 0, ufd = attr->target_fd; struct bpf_prog **pprog; struct bpf_prog *prog;
On 12/09/2023 11:51, Zachary Leaf wrote:
[...] +#define bpf_uattr_compat_ptr(DEST, SRC, FIELD) \
+ ((DEST)->FIELD = (__kernel_aligned_uintptr_t)compat_ptr((SRC)->FIELD))
Nit: better use lowercase arguments here to be consistent with the surrounding macros.
Also, since this isn't tied to struct bpf_attr in any way, it might be better to call it something like bpf_compat_ptr_field(). It's not actually specific to BPF either, but I'm not sure it makes sense to have such a uapi-specific helper in stddef.h, so I'd stick to the middle-ground approach.
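For illustration, the macro with both suggestions applied could look something like this (the name is just a suggestion):

#define bpf_compat_ptr_field(dest, src, field) \
	((dest)->field = (__kernel_aligned_uintptr_t)compat_ptr((src)->field))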
[...] @@ -3200,7 +3200,7 @@ static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link, { struct bpf_raw_tp_link *raw_tp_link = container_of(link, struct bpf_raw_tp_link, link);
- char __user *ubuf = u64_to_user_ptr(info->raw_tracepoint.tp_name);
+ char __user *ubuf = (void __user *)info->raw_tracepoint.tp_name;
Nit: s/void/char/
[...] +#define bpfptr_copy(dest, src, size) \
+ (in_compat64_syscall() ? copy_from_bpfptr(dest, src, size) \
+ : copy_from_bpfptr_with_ptr(dest, src, size))
Nit: empty line before a function.
I'm not a big fan of this sort of local macro but I have to admit it makes the function more readable. It's not the only case where we want a conditional *_with_ptr, I'll think about adding a generic helper. Let's keep this one for now.
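A generic helper could perhaps take the two variants as parameters, something along these lines (a rough sketch only, nothing concrete yet):

/* Sketch: pick the _with_ptr uaccess variant unless the caller is a
 * compat64 task, whose user pointers are plain 64-bit integers. */
#define copy_cond_with_ptr(with_ptr_op, op, dst, src, size) \
	(in_compat64_syscall() ? op(dst, src, size) \
			       : with_ptr_op(dst, src, size))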
Kevin
static int copy_bpf_attr_from_user(union bpf_attr *attr, int cmd, bpfptr_t uattr, unsigned int *size) {
[...]
On 15/09/2023 15:51, Kevin Brodsky wrote:
On 12/09/2023 11:51, Zachary Leaf wrote:
[...] +#define bpf_uattr_compat_ptr(DEST, SRC, FIELD) \
+ ((DEST)->FIELD = (__kernel_aligned_uintptr_t)compat_ptr((SRC)->FIELD))
Nit: better use lowercase arguments here to be consistent with the surrounding macros.
Also, since this isn't tied to struct bpf_attr in any way, it might be better to call it something like bpf_compat_ptr_field(). It's not actually specific to BPF either, but I'm not sure it makes sense to have such a uapi-specific helper in stddef.h, so I'd stick to the middle-ground approach.
[...] @@ -3200,7 +3200,7 @@ static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link, { struct bpf_raw_tp_link *raw_tp_link = container_of(link, struct bpf_raw_tp_link, link);
- char __user *ubuf = u64_to_user_ptr(info->raw_tracepoint.tp_name);
+ char __user *ubuf = (void __user *)info->raw_tracepoint.tp_name;
Nit: s/void/char/
[...] +#define bpfptr_copy(dest, src, size) \
+ (in_compat64_syscall() ? copy_from_bpfptr(dest, src, size) \
+ : copy_from_bpfptr_with_ptr(dest, src, size))
Nit: empty line before a function.
I'm not a big fan of this sort of local macro but I have to admit it makes the function more readable. It's not the only case where we want a conditional *_with_ptr, I'll think about adding a generic helper. Let's keep this one for now.
Yeah, it's very similar to the bpf_copy_{to,from}_user_with_ptr() helpers in bpf.h.
What I've ended up doing is putting the in_compat64_syscall() conditional into bpfptr.h:copy_from_bpfptr_with_ptr. That way selecting the correct variant is all abstracted away and we can avoid this temporary #define.
static inline int copy_from_bpfptr_with_ptr(void *dst, bpfptr_t src, size_t size)
{
	return (in_compat64_syscall() ?
		copy_from_bpfptr_offset(dst, src, 0, size) :
		copy_from_bpfptr_offset_with_ptr(dst, src, 0, size));
}
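The call site in copy_bpf_attr_from_user() then reduces to something like:

	if (copy_from_bpfptr_with_ptr(select_attr, uattr, *size) != 0)
		return -EFAULT;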
Thanks, Zach
Kevin
static int copy_bpf_attr_from_user(union bpf_attr *attr, int cmd, bpfptr_t uattr, unsigned int *size) {
[...]
The purpose of the bpf_copy_from_user_task eBPF helper is to copy user data from the provided task. Since __user pointers are only valid for the current task/process, what is actually being passed in here is an address. access_process_vm() is then used to access that address in the context of the task specified by the task_struct.
Additionally, this helper is called from an eBPF program, which does not support capabilities. It is therefore unable to pass in a capability as a __user ptr in the first place.
The use case of this helper is to read arbitrary user memory, as used by security or monitoring eBPF programs. There is therefore no requirement to check user capabilities or uaccess here. Loading of eBPF applications using this helper type is already strictly limited to privileged/root users.
Change the inbound type from __user * to ptraddr_t and remove the TODO since no capability checks are required here.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com> --- kernel/bpf/helpers.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 1362d7736a93..b4f3c2079484 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -670,7 +670,7 @@ const struct bpf_func_proto bpf_copy_from_user_proto = { };
BPF_CALL_5(bpf_copy_from_user_task, void *, dst, u32, size, - const void __user *, user_ptr, struct task_struct *, tsk, u64, flags) + const ptraddr_t, addr, struct task_struct *, tsk, u64, flags) { int ret;
@@ -681,8 +681,7 @@ BPF_CALL_5(bpf_copy_from_user_task, void *, dst, u32, size, if (unlikely(!size)) return 0;
- /* TODO [PCuABI] - capability checks for uaccess */ - ret = access_process_vm(tsk, user_ptr_addr(user_ptr), dst, size, 0); + ret = access_process_vm(tsk, addr, dst, size, 0); if (ret == size) return 0;
On 12/09/2023 11:51, Zachary Leaf wrote:
The purpose of the bpf_copy_from_user_task eBPF helper is to copy user data from the provided task. Since __user pointers are only valid for the current task/process, what is actually being passed in here is an address. access_process_vm() is then used to access that address in the context of the task specified by the task_struct.
Additionally, this helper is called from an eBPF program, which does not support capabilities. It is therefore unable to pass in a capability as a __user ptr in the first place.
(in PCuABI)
The use case of this helper is to read arbitrary user memory, as used by security or monitoring eBPF programs. There is therefore no requirement to check user capabilities or uaccess here. Loading of eBPF applications
It might be worth clarifying that access_process_vm() is not really uaccess in the usual sense, as uaccess is about accessing current. We simply never check such accesses at the moment.
using this helper type is already strictly limited to privileged/root users.
Change the inbound type from __user * to ptraddr_t and remove the TODO since no capability checks are required here.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
kernel/bpf/helpers.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 1362d7736a93..b4f3c2079484 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -670,7 +670,7 @@ const struct bpf_func_proto bpf_copy_from_user_proto = { }; BPF_CALL_5(bpf_copy_from_user_task, void *, dst, u32, size,
- const void __user *, user_ptr, struct task_struct *, tsk, u64, flags)
+ const ptraddr_t, addr, struct task_struct *, tsk, u64, flags)
There's no point in passing const arguments - pointers to const are a different story. Same comment in next patch.
Kevin
{ int ret; @@ -681,8 +681,7 @@ BPF_CALL_5(bpf_copy_from_user_task, void *, dst, u32, size, if (unlikely(!size)) return 0;
- /* TODO [PCuABI] - capability checks for uaccess */
- ret = access_process_vm(tsk, user_ptr_addr(user_ptr), dst, size, 0);
+ ret = access_process_vm(tsk, addr, dst, size, 0); if (ret == size) return 0;
The purpose of this eBPF helper function is to read user memory, as used by security or monitoring eBPF programs. There is therefore no requirement to check user capabilities or uaccess here. Loading of eBPF applications using this helper type is already strictly limited to privileged/root users.
Additionally, this helper is called from an eBPF program, which does not support capabilities. It is therefore unable to pass in a capability as a __user ptr in the first place, so what is actually passed in is an address.
In order to access user memory with just an address, use make_user_ptr_for_read_uaccess() to generate a capability of appropriate bounds for the kernel to use.
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com> --- kernel/bpf/helpers.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index b4f3c2079484..780eedc3f430 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -647,8 +647,10 @@ const struct bpf_func_proto bpf_event_output_data_proto = { };
BPF_CALL_3(bpf_copy_from_user, void *, dst, u32, size, - const void __user *, user_ptr) + const ptraddr_t, addr) { + const void __user *user_ptr = + make_user_ptr_for_read_uaccess(addr, size); int ret = copy_from_user(dst, user_ptr, size);
if (unlikely(ret)) {