Hello,
This patch series enables purecap applications to utilise
tcp_zerocopy_receive to map packets directly from a network interface
card into a shared space with the process and kernel.
I have tested these changes against musl. One issue remains in this
implementation: using copybuf, and potentially msg_control, generates
an EFAULT error.
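For context, here is a minimal userspace sketch of the existing
(non-purecap) call that this series deals with. It assumes that map_addr
lies inside a region previously obtained by calling mmap() on the same
TCP socket; zc_receive() is a hypothetical helper name used only for
illustration and is not part of the series. Note that the address field
of the current uapi struct is a plain 64-bit integer, so it can carry an
address but not a capability.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/tcp.h>

/*
 * Hypothetical helper: ask the kernel to map up to map_len bytes of
 * received payload at map_addr.
 * Returns 0 on success, or -1 with errno set by getsockopt().
 */
static int zc_receive(int fd, void *map_addr, uint32_t map_len,
                      struct tcp_zerocopy_receive *zc)
{
    socklen_t zc_len = sizeof(*zc);

    assert(zc != NULL);
    memset(zc, 0, sizeof(*zc));
    /* Only the address survives this conversion to a 64-bit integer. */
    zc->address = (uint64_t)(uintptr_t)map_addr;
    zc->length = map_len;

    return getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, zc, &zc_len);
}
```

On success, the kernel is expected to update zc->length to the number of
bytes actually mapped and zc->recv_skip_hint to the number of bytes that
should instead be consumed with a regular read.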
Gitlab Issue:
- https://git.morello-project.org/morello/kernel/linux/-/issues/46
Review branch:
- https://git.morello-project.org/harryramsey/linux/-/commits/tcp_zerocopy
Thanks,
Harry
Harry Ramsey (2):
tcp: Support userspace capabilities for tcp_zerocopy_receive
tcp: Implement compat version of tcp_zerocopy_receive
include/linux/sockptr.h | 28 ++++++++++
include/uapi/linux/tcp.h | 27 +++++++---
net/ipv4/tcp.c | 111 +++++++++++++++++++++++++++++++++------
3 files changed, 145 insertions(+), 21 deletions(-)
--
2.34.1
Now that explicit capability checking is carried out in the core
kernel and the drivers that are already supported in PCuABI, add a
section to the porting guide explaining in which situations explicit
checking might be required and what should be done.
Also add a few notes to the table of code examples, to make it clear
that using the new user pointer API is not necessarily enough in
itself.
Signed-off-by: Kevin Brodsky <kevin.brodsky(a)arm.com>
---
A follow-up to Luca's explicit checking series.
Rendered doc:
https://git.morello-project.org/kbrodsky-arm/linux/-/blob/porting_guide_exp…
Documentation/cheri/pcuabi-porting.rst | 81 +++++++++++++++++++++-----
1 file changed, 67 insertions(+), 14 deletions(-)
diff --git a/Documentation/cheri/pcuabi-porting.rst b/Documentation/cheri/pcuabi-porting.rst
index 2a38862e869a..fcda3b0f1d83 100644
--- a/Documentation/cheri/pcuabi-porting.rst
+++ b/Documentation/cheri/pcuabi-porting.rst
@@ -15,20 +15,25 @@ The appropriate API to represent and convert user pointers is described
in the `user pointer documentation`_. A few examples of modifications
required for PCuABI compliance:
-+--------------------------------+------------------------------------+
-| Invalid code                   | Potential replacement              |
-+================================+====================================+
-| ``(unsigned long)uptr``        | ``user_ptr_addr(uptr)``            |
-+--------------------------------+------------------------------------+
-| ``(void __user *)u64``         | | ``uaddr_to_user_ptr(u64)``       |
-|                                | | ``as_user_ptr(u64)``             |
-+--------------------------------+------------------------------------+
-| ``get_user(uptr, &uarg->ptr)`` | ``get_user_ptr(uptr, &uarg->ptr)`` |
-+--------------------------------+------------------------------------+
-| ``IS_ERR(ubuf)``               | ``USER_PTR_IS_ERR(ubuf)``          |
-+--------------------------------+------------------------------------+
-| ...                            | ...                                |
-+--------------------------------+------------------------------------+
++--------------------------------+------------------------------------+------------------------------------------+
+| Invalid code                   | Potential replacement              | Notes                                    |
++================================+====================================+==========================================+
+| ``(unsigned long)uptr``        | ``user_ptr_addr(uptr)``            | Extracting the address of a user pointer |
+|                                |                                    | may indicate that explicit capability    |
+|                                |                                    | checking is required, see the            |
+|                                |                                    | `Explicit capability checking`_ section. |
++--------------------------------+------------------------------------+------------------------------------------+
+| ``(void __user *)u64``         | | ``uaddr_to_user_ptr(u64)``       | Creating a user pointer from an address  |
+|                                | | ``as_user_ptr(u64)``             | should be avoided.                       |
+|                                |                                    | uapi structs may need to be modified to  |
+|                                |                                    | hold full user pointers.                 |
++--------------------------------+------------------------------------+------------------------------------------+
+| ``get_user(uptr, &uarg->ptr)`` | ``get_user_ptr(uptr, &uarg->ptr)`` |                                          |
++--------------------------------+------------------------------------+------------------------------------------+
+| ``IS_ERR(ubuf)``               | ``USER_PTR_IS_ERR(ubuf)``          |                                          |
++--------------------------------+------------------------------------+------------------------------------------+
+| ...                            | ...                                |                                          |
++--------------------------------+------------------------------------+------------------------------------------+
``ioctl`` handlers' third argument
==================================
@@ -363,6 +368,54 @@ of the `PCuABI documentation`_. For instance::
Fortunately, ``__user`` is mostly used in simple types, and such fixups
are rarely needed in driver code.
+Explicit capability checking
+============================
+
+In the vast majority of cases, the memory referenced by user pointers is
+accessed through the user mapping, using uaccess functions such as
+``copy_from_user()``. As long as the original user pointer is wholly
+propagated to the uaccess function, no particular attention is required.
+
+In certain situations, such accesses may instead occur via a kernel
+mapping (of the same underlying pages). Often, this kernel mapping is
+created by a function in the GUP family (``get_user_pages()``,
+``pin_user_pages()``). These cases should be carefully considered and
+typically require the user pointer to be explicitly checked (see below).
+
+The following code patterns may indicate that user memory is being
+accessed via a kernel mapping:
+
+* A call to any function named ``get_user_pages*`` or
+  ``pin_user_pages*``. Explicit checking is generally required before
+  calling such a function. Note that this does not normally apply to
+  ``{get,pin}_user_pages_remote()``, because they are intended to access
+  another process's memory and such an operation does not need to (and
+  typically cannot) be authorised by a capability.
+
+* Casting a user pointer to an integer (``(unsigned long)uptr``). This
+  typically indicates that an address-based operation, such as GUP, is
+  going to be carried out. ``user_ptr_addr()`` should be used instead of
+  the cast, but the way the address is used should also be carefully
+  considered.
+
+* Calling ``access_ok()``. Standard uaccess functions call that function
+  themselves, so an explicit call indicates either that a low-level
+  uaccess function (e.g. ``__copy_from_user()``) is going to be used -
+  which is fine - or that the access is not going to be done via uaccess
+  at all - which requires explicit checking. Note that ``access_ok()``
+  does not itself require a valid capability (i.e. it only considers the
+  address) and ``as_user_ptr()`` may occasionally be needed to pass it
+  a raw user address, but in general a full user pointer should be
+  provided by userspace and validated (either by uaccess or explicit
+  checking).
+
+Explicit checking should be done using one of the ``check_user_ptr_*()``
+functions, see the "Explicit checking" section of the `user pointer
+documentation`_. The required permissions (R/W/RW) should be minimal: if
+the kernel only reads memory via the pointer, then
+``check_user_ptr_read()`` should be used, so that a pointer without
+write permission will pass the check.
+
.. _user pointer documentation: Documentation/core-api/user_ptr.rst
.. _PCuABI documentation: Documentation/cheri/pcuabi.rst
.. _pure-capability kernel-user ABI: `PCuABI documentation`_
--
2.38.1
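As an illustration of the "Explicit capability checking" section added
by the patch above, here is a toy userspace model of checking a user
pointer with the minimal required permission before accessing memory
via a kernel mapping. This is only a sketch: every toy_* name is
hypothetical, the struct is a simplified stand-in for a capability, and
the real ``check_user_ptr_*()`` helpers are deliberately not called
because their signatures are not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a user capability: bounds plus permission bits. */
enum { TOY_PERM_READ = 1 << 0, TOY_PERM_WRITE = 1 << 1 };

struct toy_user_ptr {
    uint64_t addr;   /* current address */
    uint64_t base;   /* lower bound, inclusive */
    uint64_t limit;  /* upper bound, exclusive */
    unsigned perms;  /* TOY_PERM_* bits */
    bool tag;        /* validity tag */
};

/* True if [addr, addr + len) is in bounds and all of `need` is granted. */
static bool toy_check(const struct toy_user_ptr *p, size_t len, unsigned need)
{
    assert(p != NULL);
    if (!p->tag || (p->perms & need) != need)
        return false;
    if (p->addr < p->base || p->addr > p->limit)
        return false;
    return len <= p->limit - p->addr;
}

static bool toy_check_read(const struct toy_user_ptr *p, size_t len)
{
    return toy_check(p, len, TOY_PERM_READ);
}

static bool toy_check_write(const struct toy_user_ptr *p, size_t len)
{
    return toy_check(p, len, TOY_PERM_WRITE);
}

/*
 * Hypothetical driver path that only reads user memory through a
 * kernel mapping: check with the minimal permission before pinning.
 * Returns 0 on success, -1 if the pointer does not authorise the access.
 */
static int toy_pin_for_read(const struct toy_user_ptr *p, size_t len)
{
    if (!toy_check_read(p, len))
        return -1; /* kernel code would return an error such as -EFAULT */
    /* ...a GUP call on p->addr would go here... */
    return 0;
}
```

Because the read path only asks for read permission, a read-only user
pointer passes the check, while the same pointer would be rejected by a
path that needs to write through the kernel mapping.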
Hi,
This small series disables KSM (Kernel Samepage Merging) in the Morello
defconfig, as it is currently unsafe in the presence of tags - see patch
1 for details. To guarantee correctness even if it is manually enabled,
patch 2 forces memcmp_pages() to report a difference.
It is quite possible that KSM would still be worthwhile even with the
extra cost of comparing tags. Issue #62 [1] covers that investigation.
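To make the problem concrete, here is a toy userspace model (not kernel
code) of why a data-only page comparison is unsafe with tags. All toy_*
names are hypothetical, the "page" is shrunk to 64 bytes, and one tag is
modelled per 16-byte granule. The last function sketches what a
tag-aware comparison might look like; it is an assumption about a
possible future direction, not what this series implements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 64u                       /* tiny page for illustration */
#define TOY_GRANULE   16u                       /* one tag per 16-byte granule */
#define TOY_NTAGS     (TOY_PAGE_SIZE / TOY_GRANULE)

struct toy_page {
    uint8_t data[TOY_PAGE_SIZE];
    uint8_t tags[TOY_NTAGS];                    /* 1 = granule holds a valid capability */
};

/* Data-only comparison: blind to tags, so it may call tagged and untagged pages equal. */
static int toy_memcmp_data_only(const struct toy_page *a, const struct toy_page *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a->data, b->data, TOY_PAGE_SIZE);
}

/* Conservative approach: always report a difference, so pages are never merged. */
static int toy_memcmp_never_equal(const struct toy_page *a, const struct toy_page *b)
{
    (void)a;
    (void)b;
    return 1;
}

/* Possible tag-aware variant: compare data first, then tags. */
static int toy_memcmp_with_tags(const struct toy_page *a, const struct toy_page *b)
{
    int ret = toy_memcmp_data_only(a, b);

    return ret ? ret : memcmp(a->tags, b->tags, sizeof(a->tags));
}
```

Two pages with identical bytes but different tags compare equal in the
data-only version, which is exactly the case where merging would lose or
forge a tag.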
Review branch:
https://git.morello-project.org/kbrodsky-arm/linux/-/commits/morello/disabl…
Cheers,
Kevin
[1] https://git.morello-project.org/morello/kernel/linux/-/issues/62
Kevin Brodsky (2):
arm64: morello: Disable KSM in defconfig
arm64: morello: Make memcmp_pages() always report a difference
.../configs/morello_transitional_pcuabi_defconfig | 1 -
arch/arm64/kernel/morello.c | 14 ++++++++++++++
2 files changed, 14 insertions(+), 1 deletion(-)
--
2.38.1
Hi,
This is a short series that makes a few simplifications we can now
afford: getting rid of morello_capcpy(), as it is unnecessary now that
the kernel is a "proper" hybrid binary with a tag-preserving memcpy();
and making use of <linux/cheri.h> to simplify includes.
The Morello helpers could be further simplified by reimplementing many
of them in C instead of assembly; issue #61 [1] was created to that end.
Review branch:
https://git.morello-project.org/kbrodsky-arm/linux/-/commits/morello/cheri_…
Cheers,
Kevin
[1] https://git.morello-project.org/morello/kernel/linux/-/issues/61
Kevin Brodsky (3):
arm64: morello: Replace morello_capcpy() with standard copy
arm64: Replace cheriintrin.h with linux/cheri.h
lib: Replace cheriintrin.h with linux/cheri.h
arch/arm64/include/asm/morello.h | 6 ------
arch/arm64/kernel/process.c | 9 +--------
arch/arm64/kernel/signal.c | 5 +----
arch/arm64/lib/morello.S | 18 ------------------
lib/test_printf.c | 4 +---
lib/vsprintf.c | 5 +----
6 files changed, 4 insertions(+), 43 deletions(-)
--
2.38.1
Hi,
This short series refactors the way pointers to the stack are
manipulated in binfmt_elf. The changes are generic and arguably improve
binfmt_elf, but the main objective is to eliminate unnecessary creation
of capabilities in PCuABI (through calls to uaddr_to_user_ptr_safe()).
This is done by using an actual user pointer to keep track of the
current position on the stack, and writing all data through that
pointer, instead of using an address and creating a new user pointer
for every access. This is what patch 1 does. Patch 2 simplifies the
elf_stack_put_user* macros we previously introduced, as we do not need
them to do something special in PCuABI any more.
This series should help with further work on restricting initial
capabilities [1]. It does not have any user-visible effect itself,
however. The new "root stack capability" is still unrestricted, but the
fact that all capabilities to the stack are derived from it means that
any later narrowing of its bounds or permissions will automatically
propagate.
Note that these changes are mostly orthogonal to Téo's series [2] that
partially addresses [1]; it just means that using
uaddr_to_user_ptr_safe() is no longer necessary to derive the argv /
envp capabilities.
Review branch:
https://git.morello-project.org/kbrodsky-arm/linux/-/commits/morello/binfmt…
Thanks,
Kevin
[1] https://git.morello-project.org/morello/kernel/linux/-/issues/19
[2] https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org…
Kevin Brodsky (2):
fs/binfmt_elf: Improve SP manipulation in PCuABI
fs/binfmt_elf: Simplify elf_stack_put_user*
fs/binfmt_elf.c | 85 +++++++++++++++++++++++-------------------
fs/compat_binfmt_elf.c | 9 +----
include/linux/elf.h | 12 +-----
3 files changed, 48 insertions(+), 58 deletions(-)
--
2.38.1
Hi,
After getting sidetracked by eBPF libraries/tools (libbpf/bpftool) and
kselftest cross-compilation, here are the core kernel changes following
on from the RFC [1] posted late last year.
The bpf syscall is updated to propagate user pointers as capabilities in
the pure-capability kernel-user ABI (PCuABI). It also includes an
approach to support the existing aarch64 ABI as a compatibility layer
(compat64).
One complication is that this syscall supports many multiplexed
sub-commands, some of which are themselves multiplexed with a number of
nested options.
A further complication is that the existing syscall uses a trick of
storing user pointers as u64 to avoid needing a compat handler for
32-bit systems (see patch 3). To retain compatibility with the aarch64
ABI and add Morello support, a compat layer is added here only for the
compat64 case, guarded by #ifdef CONFIG_COMPAT64. Normal compat32
operation is therefore unchanged.
Compared to the original RFC, inbound (userspace->kernel) conversion
between compat64/native struct layouts is now handled upfront. This
minimises changes to subcommand handlers. Some subcommands require
conversion back out to userspace and that is by necessity handled where
it occurs.
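As a rough illustration of the upfront inbound conversion, here is a toy
userspace model. It is not the code in this series: the toy_* structs
and functions are hypothetical and carry only three fields. The compat64
layout stores the user pointer as a 64-bit integer, the native layout
stores a real pointer, and the handler only ever sees the native layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* compat64 layout: user pointers travel as 64-bit integers. */
struct toy_compat_attr {
    uint32_t prog_type;
    uint32_t insn_cnt;
    uint64_t insns;        /* address only */
};

/* Native layout: the same fields, but with a full pointer. */
struct toy_attr {
    uint32_t prog_type;
    uint32_t insn_cnt;
    const void *insns;
};

/* Inbound conversion, done once before any sub-command handler runs. */
static void toy_convert_attr_in(struct toy_attr *dst,
                                const struct toy_compat_attr *src)
{
    assert(dst != NULL && src != NULL);
    memset(dst, 0, sizeof(*dst));
    dst->prog_type = src->prog_type;
    dst->insn_cnt = src->insn_cnt;
    dst->insns = (const void *)(uintptr_t)src->insns;
}

/* A sub-command handler that only understands the native layout. */
static int toy_handle_prog_load(const struct toy_attr *attr)
{
    return (attr->insns != NULL && attr->insn_cnt > 0) ? 0 : -1;
}

/* compat64 entry point: convert upfront, then reuse the native handler. */
static int toy_bpf_compat64(const struct toy_compat_attr *cattr)
{
    struct toy_attr attr;

    toy_convert_attr_in(&attr, cattr);
    return toy_handle_prog_load(&attr);
}
```

Keeping the conversion at the entry point is what lets the handlers stay
unaware of the compat layout; only data copied back out to userspace
needs layout-specific handling at the point where it is written.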
Patch 1 is not essential to this series but it's a nice debug feature to
have and works[2]. It enables BPF_PROG_TYPE_TRACEPOINT which many eBPF
kselftests use.
Patch 2 is required setup for the rest of the patches.
Patches 3-8 implement the core compat64 handling. Each commit compiles
cleanly but relevant parts will be broken in between. They're split
mainly to make review here easier.
Patch 9 fixes a check to also check configs passed in via compat64.
Patch 10 finally enables capabilities in the kernel.
Testing-wise, see the associated LTP changes below, which will be posted
to the linux-morello-ltp mailing list. The eBPF LTP tests are fairly
minimal and test only a small part of the changes here. There is a new
test covering patch 9.
The kernel kselftests contain much more extensive eBPF tests. The
kselftests have been used to test many parts of the compat64 handling
but overall more work needs to be done here:
a) enable cross-compilation for purecap as well as x86->aarch64
b) replace ptr_to_u64() with casts to uintptr_t in tests
c) general libbpf/bpftool enablement and fixes since many tests rely
   on this
d) CONFIG_DEBUG_INFO_BTF is required for many tests but this requires
   the build system to have a recent version of the pahole tool
The next steps, once we have the core kernel support, are porting libbpf
and bpftool to purecap and enabling kselftests as above.
Kernel branch available at:
https://git.morello-project.org/zdleaf/linux/-/tree/morello/bpf
Associated LTP test/changes at:
https://git.morello-project.org/zdleaf/morello-linux-test-project/-/tree/mo…
Thanks,
Zach
[1] [RFC PATCH 0/9] update bpf syscall for PCuABI/compat64
https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org…
[2] [PATCH v3 0/5] Restore syscall tracing on Morello
https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org…
Zachary Leaf (10):
arm64: morello: enable syscall tracing
bpf/net: copy ptrs from user with bpf/sockptr_t
bpf: compat64: add handler and convert bpf_attr in
bpf: compat64: bpf_attr convert out
bpf: compat64: handle bpf_btf_info
bpf: compat64: handle bpf_prog_info
bpf: compat64: handle bpf_map_info
bpf: compat64: handle bpf_link_info
bpf: compat64: support CHECK_ATTR macro
bpf: use user pointer types in uAPI structs
.../morello_transitional_pcuabi_defconfig | 2 +-
arch/arm64/kernel/sys_compat64.c | 4 +
drivers/media/rc/bpf-lirc.c | 7 +-
include/linux/bpf_compat.h | 413 ++++++
include/linux/bpfptr.h | 18 +-
include/linux/sockptr.h | 9 +
include/uapi/linux/bpf.h | 94 +-
kernel/bpf/bpf_iter.c | 2 +-
kernel/bpf/btf.c | 97 +-
kernel/bpf/cgroup.c | 10 +-
kernel/bpf/hashtab.c | 13 +-
kernel/bpf/net_namespace.c | 7 +-
kernel/bpf/offload.c | 2 +-
kernel/bpf/syscall.c | 1136 +++++++++++++----
kernel/bpf/verifier.c | 2 +-
kernel/trace/bpf_trace.c | 6 +-
net/bpf/bpf_dummy_struct_ops.c | 3 +-
net/bpf/test_run.c | 32 +-
net/core/sock_map.c | 7 +-
19 files changed, 1534 insertions(+), 330 deletions(-)
create mode 100644 include/linux/bpf_compat.h
--
2.34.1
Hello!
Here is v5 of the explicit capability checking series for
issue #7 [1].
This patch series can be found on my fork[2].
Kind regards,
Luca
[1] https://git.morello-project.org/morello/kernel/linux/-/issues/7
[2] https://git.morello-project.org/Sevenarth/linux/-/commits/morello/gup-check…
v5:
- rephrased commit descriptions
- changed explicit checks for the USB code to be performed only when
performing DMA transfers
v4:
- rebased onto morello/next
- rephrased commit descriptions and notes left in the code
- signature of first_iovec_segment has been updated to return a pointer
instead of an address and the appropriate changes have been made
- read+write checks have been combined together in the same if statement
- unlikely check has been removed where appropriate
- the USB User Request Block buffer is now checked against both write and
read permissions according to the transfer direction as indicated by
is_in
- a leftover from v2 at io_uring/rsrc.c:1249 has been reverted back to
original
v3:
- rebased onto morello/next
- amended commit description for "gup: Add explicit capability checks"
- refactored mm/gup.c
- refactored lib/iov_iter.c
- removed bpf patch
- moved USB Request Block explicit check to proc_do_submiturb
- removed explicit check in get_futex_key
- changed prototype of io_uring_cmd_import_fixed and io_import_fixed to
use a pointer type and adjusted the relevant castings
- fixed io_uring_cmd_import_fixed prototype for !defined(CONFIG_IO_URING)
- refactored explicit check in io_uring/kbuf.c:io_register_pbuf_ring(..)
- removed explicit check from io_uring/kbuf.c:io_add_buffers(..)
- rephrased the no explicit check needed note in io_sqe_buffer_register
- reverted "struct io_mapped_ubuf" to use u64
- removed explicit check from io_uring_cmd_prep
- updated TODO for the NVMe driver
Luca Vizzarro (7):
gup: Add explicit capability checks
iov_iter: Add explicit capability checks
usb: core: Fix copy of URB from userspace
usb: core: Add explicit capability checks
futex: Add explicit capability checks
io_uring: Add explicit capability checks
nvme: Add TODO for PCuABI implementation
drivers/nvme/host/ioctl.c | 1 +
drivers/usb/core/devio.c | 10 ++++++++--
include/linux/io_uring.h | 6 +++---
include/linux/pagemap.h | 2 +-
io_uring/kbuf.c | 26 +++++++++++++-------------
io_uring/net.c | 3 +--
io_uring/rsrc.c | 14 ++++++++++++--
io_uring/rsrc.h | 2 +-
io_uring/rw.c | 3 +--
io_uring/uring_cmd.c | 2 +-
kernel/futex/core.c | 11 ++++++++---
lib/iov_iter.c | 31 ++++++++++++++++++++++++-------
mm/gup.c | 6 ++++--
13 files changed, 78 insertions(+), 39 deletions(-)
--
2.34.1
Hello!
Here is v4 of the explicit capability checking series for
issue #7 [1].
This patch series will be found on my fork at the link in the
footnotes[2], as soon as GitLab is fixed.
Kind regards,
Luca
[1] https://git.morello-project.org/morello/kernel/linux/-/issues/7
[2] https://git.morello-project.org/Sevenarth/linux/-/commits/morello/gup-check…
v4:
- rebased onto morello/next
- rephrased commit descriptions and notes left in the code
- signature of first_iovec_segment has been updated to return a pointer
instead of an address and the appropriate changes have been made
- read+write checks have been combined together in the same if statement
- unlikely check has been removed where appropriate
- the USB User Request Block buffer is now checked against both write and
read permissions according to the transfer direction as indicated by
is_in
- a leftover from v2 at io_uring/rsrc.c:1249 has been reverted back to
original
v3:
- rebased onto morello/next
- amended commit description for "gup: Add explicit capability checks"
- refactored mm/gup.c
- refactored lib/iov_iter.c
- removed bpf patch
- moved USB Request Block explicit check to proc_do_submiturb
- removed explicit check in get_futex_key
- changed prototype of io_uring_cmd_import_fixed and io_import_fixed to
use a pointer type and adjusted the relevant castings
- fixed io_uring_cmd_import_fixed prototype for !defined(CONFIG_IO_URING)
- refactored explicit check in io_uring/kbuf.c:io_register_pbuf_ring(..)
- removed explicit check from io_uring/kbuf.c:io_add_buffers(..)
- rephrased the no explicit check needed note in io_sqe_buffer_register
- reverted "struct io_mapped_ubuf" to use u64
- removed explicit check from io_uring_cmd_prep
- updated TODO for the NVMe driver
Luca Vizzarro (7):
gup: Add explicit capability checks
iov_iter: Add explicit capability checks
usb: core: Fix copy of URB from userspace
usb: core: Add explicit capability checks
futex: Add explicit capability checks
io_uring: Add explicit capability checks
nvme: Add TODO for PCuABI implementation
drivers/nvme/host/ioctl.c | 1 +
drivers/usb/core/devio.c | 8 ++++++--
include/linux/io_uring.h | 6 +++---
include/linux/pagemap.h | 2 +-
io_uring/kbuf.c | 26 +++++++++++++-------------
io_uring/net.c | 3 +--
io_uring/rsrc.c | 14 ++++++++++++--
io_uring/rsrc.h | 2 +-
io_uring/rw.c | 3 +--
io_uring/uring_cmd.c | 2 +-
kernel/futex/core.c | 11 ++++++++---
lib/iov_iter.c | 31 ++++++++++++++++++++++++-------
mm/gup.c | 6 ++++--
13 files changed, 76 insertions(+), 39 deletions(-)
--
2.34.1
Signal handlers that intend to set PCC to a new value need to be
careful not to use a sealed function pointer (sentry) directly. In
purecap, function pointers are typically sentries and therefore need
to be explicitly unsealed and their LSB cleared (as per the bullet
point above).
Reported-by: Yury Khrustalev <yury.khrustalev(a)arm.com>
Signed-off-by: Kevin Brodsky <kevin.brodsky(a)arm.com>
---
Documentation/arm64/morello.rst | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/Documentation/arm64/morello.rst b/Documentation/arm64/morello.rst
index 3452f4fe4fa9..0a76bbf06290 100644
--- a/Documentation/arm64/morello.rst
+++ b/Documentation/arm64/morello.rst
@@ -552,6 +552,11 @@ Note: modifying the saved Morello context
to modify the ISA of the interrupted context by writing to the C64
bit of the saved PSTATE in ``sigcontext``.
+ * RB-sealed capabilities. The saved PCC should not be RB-sealed; unlike
+ capability-based branch instructions, exception return uses the target
+ capability as-is, without automatic unsealing. Explicit unsealing is
+ therefore required to avoid a capability sealed fault.
+
C64 ISA support
---------------
--
2.38.1
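The note added to morello.rst above can be pictured with a toy userspace
model. This is only a sketch under simplifying assumptions: struct
toy_cap and the toy_* functions are hypothetical, sealing is reduced to
a boolean, and the model says nothing about how unsealing is actually
performed or authorised on Morello.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy capability: just an address and a flag modelling RB-sealing. */
struct toy_cap {
    uint64_t addr;
    bool sealed;
};

/* What a signal handler would store as the saved PCC: unsealed, LSB cleared. */
static struct toy_cap toy_make_saved_pcc(struct toy_cap fn)
{
    fn.sealed = false;
    fn.addr &= ~UINT64_C(1);
    return fn;
}

/*
 * Models exception return: the target is used as-is, with no automatic
 * unsealing. Returns 0 and sets *next_pc, or -1 for a sealed target.
 */
static int toy_exception_return(struct toy_cap pcc, uint64_t *next_pc)
{
    assert(next_pc != NULL);
    if (pcc.sealed)
        return -1;   /* stands in for a capability sealed fault */
    *next_pc = pcc.addr;
    return 0;
}
```

Storing a function pointer directly corresponds to passing a sealed
toy_cap to toy_exception_return(), which fails; converting it first
with toy_make_saved_pcc() succeeds.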