This series of patches enables nfs rootfs
support on the Morello board.
Patch 01 is fixing the inital kernel build error
associated with a wrong function pointer type within
the sunrpc modules due to the unlocked_ioctl fp,
the error occurs upon enabling nfs within the defconfig.
Patch 02 deals with the fallout caused by changes
inferred by patches 01. See details in the description
of the patch.
Patch 03 is enabling nfs rootfs by default in the kernel.
It was confirmed that the kernel can boot with a nfs rootfs.
V3 changes:
- fix commit title @ patch 2
- align the change in proc.c
Pawel Zalewski (3):
net: sunrpc: fix unlocked_ioctl handler signature
proc: change proc_ops.proc_ioct handler signature
arm64: morello: enable nfs rootfs by default
arch/arm64/configs/morello_transitional_pcuabi_defconfig | 2 ++
drivers/pci/proc.c | 4 ++--
include/linux/proc_fs.h | 2 +-
net/sunrpc/cache.c | 6 +++---
net/sunrpc/rpc_pipe.c | 2 +-
5 files changed, 9 insertions(+), 7 deletions(-)
--
2.34.1
According to the documentation of fcntl, some commands take an int as
argument. In practice not all of them enforce this behaviour, as they
instead accept a more permissive long and in most cases not even a
range check is performed.
An issue could possibly arise from a combination of the handling of the
varargs in user space and the ABI rules of the target, which may result
in the top bits of an int argument being non-zero.
This issue was originally raised and detailed in the following thread:
https://lore.kernel.org/linux-api/Y1%2FDS6uoWP7OSkmd@arm.com/
And was discovered during the porting of Linux to Morello [1].
This series modifies the interested commands so that they explicitly
take an int argument. It also propagates this change down to helper and
related functions as necessary.
This series is also available on my fork at:
https://git.morello-project.org/Sevenarth/linux/-/commits/fcntl-int-handlin…
Best regards,
Luca Vizzarro
[1] https://git.morello-project.org/morello/kernel/linux
Luca Vizzarro (5):
fcntl: Cast commands with int args explicitly
fs: Pass argument to fcntl_setlease as int
pipe: Pass argument of pipe_fcntl as int
memfd: Pass argument of memfd_fcntl as int
dnotify: Pass argument of fcntl_dirnotify as int
fs/cifs/cifsfs.c | 2 +-
fs/fcntl.c | 29 +++++++++++++++--------------
fs/libfs.c | 2 +-
fs/locks.c | 20 ++++++++++----------
fs/nfs/nfs4_fs.h | 2 +-
fs/nfs/nfs4file.c | 2 +-
fs/nfs/nfs4proc.c | 4 ++--
fs/notify/dnotify/dnotify.c | 4 ++--
fs/pipe.c | 6 +++---
include/linux/dnotify.h | 4 ++--
include/linux/filelock.h | 12 ++++++------
include/linux/fs.h | 6 +++---
include/linux/memfd.h | 4 ++--
include/linux/pipe_fs_i.h | 4 ++--
mm/memfd.c | 6 +-----
15 files changed, 52 insertions(+), 55 deletions(-)
--
2.34.1
This series makes it possible for purecap apps to use the aio_ring
shared memory region to bypass the io_getevents syscall's overhead.
This functionality is also used in libaio.
With these patches, all io_* LTP tests pass in both Purecap and
plain AArch64 modes. Note that the LTP tests only address the basic
functionality of the aio system and a significant portion of the
functionality is untested in LTP.
For a more comprehensive testing, libaio has been updated with the new
uAPI and ported. All the tests in libaio pass accordingly, in both
Purecap and plain AArch64 modes.
Gitlab issue:
https://git.morello-project.org/morello/kernel/linux/-/issues/49
Review branch:
https://git.morello-project.org/tudcre01/linux/-/commits/morello/aio_v1
Tudor Cretu (4):
aio: Fix the relationship between ctx pages and io_events array
aio: Implement compat handling for the io_event struct
aio: Allow capability tag access on the shared memory
aio: Use user pointer type in the io_event struct and aio_context_t
fs/aio.c | 197 ++++++++++++++++++++++++++---------
include/uapi/linux/aio_abi.h | 12 +--
2 files changed, 153 insertions(+), 56 deletions(-)
--
2.34.1
Hi,
Here are a few small patches to fix kernel module loading in the PCuABI kernel [1].
(Sorry about the delay!) There are some notes on testing in the third patch.
Thanks,
Kristina
[1] https://git.morello-project.org/morello/kernel/linux/-/issues/41
Kristina Martsenko (3):
module: Allow arch overrides for ELF arch check
arm64: elf: Enable module loading for PCuABI kernels
arm64: morello: Add test modules to defconfig
.../configs/morello_transitional_pcuabi_defconfig | 6 ++++++
arch/arm64/include/asm/elf.h | 10 ++--------
include/linux/moduleloader.h | 4 ++++
kernel/module/main.c | 2 +-
4 files changed, 13 insertions(+), 9 deletions(-)
base-commit: 7f84d159d4eb989bf2f42d2e3b27a204cb3c1ec4
--
2.25.1
Hi,
This series introduces new user_ptr helpers to help in certain
uaccess-related situations. This is a follow-up to my previous series
"New CHERI API and separation of root capabilities"; the CHERI helpers
it introduced are used to implement the new generic user_ptr helpers in
PCuABI.
The new helpers are (see patch 1 for details):
* make_user_ptr_for_<perms>_uaccess(), to create user pointers in order
to perform uaccess, with appropriate bounds and permissions.
* check_user_ptr_<perms>(), to perform explicit checking of user
pointers.
This series does not actually make use of check_user_ptr_<perms>(),
rather it prepares the ground for implementing explicit checking when
user memory is accessed via kernel mappings [1].
The rest of the series (patch 2-9) is about converting existing uses of
uaddr_to_user_ptr_safe(), as it should now only be used for *providing*
user pointers to userspace, and not for uaccess.
After this series, the only remaining users of uaddr_to_user_ptr_safe()
are:
- fs/binfmt_elf.c to provide all the initial capabilities (stack,
AT_CHERI_*_CAP, etc.). uaddr_to_user_ptr_safe() is still used to write
the initial data on the stack too; it didn't seem worthwhile to
refactor this code as it is going to change anyway as part of [2]
and [3].
- mmap / mremap / shmat to return a valid capability.
To clarify which helper should be used in which situation, here are two
tables specifying the helper to use depending on whether the address is
specified by userspace or the kernel itself, and whether the pointer is
provided to userspace or used by the kernel itself.
*Before* this series:
+-----------------------------------+---------------------+--------------------------+
| Pointer for \ Address provided by | User | Kernel |
+===================================+=====================+==========================+
| User | - | uaddr_to_user_ptr_safe() |
+-----------------------------------+---------------------+--------------------------+
| Kernel (uaccess) | uaddr_to_user_ptr() | uaddr_to_user_ptr_safe() |
+-----------------------------------+---------------------+--------------------------+
*After* this series:
+-----------------------------------+---------------------+-------------------------------+
| Pointer for \ Address provided by | User | Kernel |
+===================================+=====================+===============================+
| User | - | uaddr_to_user_ptr_safe() |
+-----------------------------------+---------------------+-------------------------------+
| Kernel (uaccess) | uaddr_to_user_ptr() | make_user_ptr_*_for_uaccess() |
+-----------------------------------+---------------------+-------------------------------+
Eventually both uaddr_to_user_ptr() and uaddr_to_user_ptr_safe() should
disappear, the first thanks to userspace always providing full pointers
and the second being replaced by handcrafted code creating capabilities
in line with the PCuABI spec (whose bounds give access to only the
intended object and potentially padding).
Note that patch 1 and 4 were included in the first RFC of the CHERI API
series [4]. They remain broadly the same, but:
- make_privileged_user_ptr and check_user_ptr() have been renamed, and the
permissions are now specified by calling the right variant of the function
instead of passing a bitfield. They are now called respectively
make_user_ptr_for_<perms>_uaccess() and check_user_ptr_<perms>().
- The user_ptr documentation has been updated accordingly.
- The commit messages have been improved to reflect the overall
intention better.
Review branch:
https://git.morello-project.org/kbrodsky-arm/linux/-/commits/morello/user_p…
Rendered doc:
https://git.morello-project.org/kbrodsky-arm/linux/-/blob/morello/user_ptr_…
Thanks,
Kevin
[1] https://git.morello-project.org/morello/kernel/linux/-/issues/7
[2] https://git.morello-project.org/morello/kernel/linux/-/issues/19
[3] https://git.morello-project.org/morello/kernel/linux/-/issues/22
[4] https://op-lists.linaro.org/archives/list/linux-morello@op-lists.linaro.org…
Kevin Brodsky (9):
linux/user_ptr.h: Introduce uaccess-related helpers
fs/binfmt_elf: Create appropriate user pointer for uaccess
coredump: Create appropriate user pointer for uaccess
mm/memory: Create appropriate user pointer for uaccess
Revert "mm/hugetlb: Use appropriate user pointer conversions"
Revert "mm/shmem: Use appropriate user pointer conversions"
audit: Create appropriate user pointer for uaccess
perf: Avoid uaddr_to_user_ptr_safe() for arbitrary user address
arm64: Create appropriate user pointer for uaccess
Documentation/core-api/user_ptr.rst | 100 ++++++++++++++++++----------
arch/arm64/kernel/debug-monitors.c | 3 +-
arch/arm64/kernel/traps.c | 2 +-
fs/binfmt_elf.c | 14 ++--
fs/coredump.c | 4 +-
include/linux/user_ptr.h | 86 ++++++++++++++++++++++--
kernel/auditsc.c | 3 +-
kernel/events/internal.h | 2 +-
lib/user_ptr.c | 46 +++++++++++++
mm/hugetlb.c | 2 +-
mm/memory.c | 2 +-
mm/shmem.c | 2 +-
12 files changed, 216 insertions(+), 50 deletions(-)
--
2.38.1
Some io_uring operations' SQEs store user_data values in the addr2 field.
These don't need to be modified as they're not dereferenced by the kernel.
Reported-by: Kevin Brodsky <kevin.brodsky(a)arm.com>
Signed-off-by: Tudor Cretu <tudor.cretu(a)arm.com>
---
v2:
- Updated the comment that it's only propagated, not matched.
Review branch:
https://git.morello-project.org/tudcre01/linux/-/commits/morello/addr2_fix_…
---
io_uring/io_uring.h | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 5b4f0f298ad9..26cfe280b049 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -132,9 +132,22 @@ static inline void convert_compat64_io_uring_sqe(struct io_ring_ctx *ctx,
sqe->ioprio = READ_ONCE(compat_sqe->ioprio);
sqe->fd = READ_ONCE(compat_sqe->fd);
BUILD_BUG_COMPAT_SQE_UNION_ELEM(addr2, addr);
- sqe->addr2 = (__kernel_uintptr_t)compat_ptr(READ_ONCE(compat_sqe->addr2));
- BUILD_BUG_COMPAT_SQE_UNION_ELEM(addr, len);
+ /*
+ * Some opcodes set a user_data value in the addr2 field to propagate
+ * it as-is to the user_data field of a CQE. It's not dereferenced
+ * by the kernel, so don't modify it.
+ */
+ switch (sqe->opcode) {
+ case IORING_OP_POLL_REMOVE:
+ case IORING_OP_MSG_RING:
+ sqe->addr2 = (__kernel_uintptr_t)READ_ONCE(compat_sqe->addr2);
+ break;
+ default:
+ sqe->addr2 = (__kernel_uintptr_t)compat_ptr(READ_ONCE(compat_sqe->addr2));
+ break;
+ }
+ BUILD_BUG_COMPAT_SQE_UNION_ELEM(addr, len);
/*
* Some opcodes set a user_data value in the addr field to be matched
* with a pre-existing IO event's user_data. It's not dereferenced by
--
2.34.1
With this patch capabilities provided to userspace to access argv and envp
are properly bounded with the permissions defined in the PCuABI spec.
This change is split between two files, as they handle two parts of the
process. This cover letter tries to explain the process surrounding the
argument strings, hopefully to facilitate reviewing the patch but also
to check my understanding, and the logic implemented by the patch.
Hopefully I can find some time to refine it after reviews, but in case
I cannot I hope the details will be useful.
# General patch comments
One thing that might be an issue is that I don't see how to update
fs/exec.c:bprm_stack_limits to handle this new process short of
reproducing it almost entirely inside the function.
There are warnings because elf_stack_put_user_cap expects uintcap_t and
cheri_build_user_cap builds a capability, but I didn't really know
how to handle that.
The function itself might be entirely unecessary.
checkpatch complains about elf_stack_put_user_cap, but it follows the
style of the other functions, and from missing blank lines which appear
to be there, so I'm not too sure what to do about it.
# Giving argument strings to userspace
Calling a new executable is done through execve, where userspace passes
the argv and envp pointer arrays to the kernel.
In fs/exec.c:do_execveat_common, the kernel creates a struct linux_binprm
which is used throughout the process of loading a new binary.
Among other things, it keeps track of the stack for the new executable,
both the memory space and its current top, which it allocates here.
The argument are copied to this new stack in fs/exec.c:copy_strings.
It does so from the top of the stack, starting with the last envp element
and ending with the first of argv: opposite to how it will be read later.
The function allocates pages as it goes, thus splitting the string in at
most page-sized chunks, and handles offsets where the page was already
partially used or the remainder of the string will not fill it.
This only puts the /strings/ themselves on the stack.
Later, in fs/binfmt_elf.c:create_elf_tables, the kernel puts the auxv,
argv and envp arrays themselves – thus the pointers to the strings put
on the stack earlier – on the stack.
It does so in the order expected by userspace: from bottom to top of the
strings, so from the first of argv to the last of envp.
# Making it CHERI
We thus have a potential issue: the moment where the strings themselves
are allocated is different from the place where the capabilities are
created, with limited ways to pass information in between.
However, one piece of information is available in both cases: the length
of the null-terminated string.
This way, we can get the same representable length from the length of the
string, which remains unchanged, while allocating the string *and* when
creating the capability.
Assuming we can properly align the strings in copy_strings, we don't need
more information to properly create a capability with exact bounds
in create_elf_tables.
Thankfully, we can re-use most of the machinery in copy_strings to do so.
If the representable length of a string differs from its real length, we
need to get the mask needed for aligning it for exact bounds.
Given this new length, we can compute the position where it would end up
without alignment and ALIGN_DOWN() as we are going downards the stack.
This gives the position where the string *must* start for exact capability
bounds. Thus, all the padding will be after the string.
However as we are going backwards, we need to put the padding first and
the string second.
The code already goes through the string chunk by chunk, handling smaller
chunks at page and argument boundaries.
We can re-use this by first going through the length of padding, and once
this is done go through the string as normal.
This guarantees the string will start at the properly aligned address with
the appropriate amount of padding behind the previous allocations.
We do need to handle this padding and to detect when strings might need
proper handling for exact bounds.
As there isn't padding normally, we can simply check if there is some.
If there is, get the representable length of the string and use it for
the exact capability bounds.
As all the padding is after the strings, if there was padding needed for
alignment the next argument will start in the padding. This is slow, but
just go through the zeroes until we find the real start of the argument
and go from there.
This is not really properly done in this revision, it is a remnant of
the first draft and as such will improperly flag arguments as needing
adjustment after one that needed.
This is known but should be simply fixable by moving this looping
through the zeroes after allocating the capability.
In fs/binfmt_elf.c:create_elf_tables argv and envp are handled exactly the
same but in different loops. I removed the comments from the envp section
for conciseness.
# Testing
To generate a big enough argument, you can use this dirty bash script:
bigarg=""; count=0; while [ $(echo $bigarg | wc -c) -le 20000 ]; do
bigarg=${bigarg}"==${count}==This_should_be_a_really_big_arg_string"
count=$(($count + 1))
done
It is a bit unnecessary and slow but it does allow easily checking that
it starts and ends in the proper places.
Without the patch, the kernel log should show a CPU fault if $bigarg is
passed as an argument or in the environment. With the patch, it should
work fine.
Hopefully this is useful for reviewing, and correct in the first place !
Review branch:
https://git.morello-project.org/Teo-CD/linux/-/tree/review-arg-str-resticte…
Commit itself:
https://git.morello-project.org/Teo-CD/linux/-/commit/a4d9fe880d098e9f9fe1e…
Thanks very much in advance !
Téo
Teo Couprie Diaz (1):
fs: Handle exact bounds for argv and envp
fs/binfmt_elf.c | 111 ++++++++++++++++++++++++++++++++++++++++++--
fs/exec.c | 66 ++++++++++++++++++++++++++
include/linux/elf.h | 4 ++
3 files changed, 178 insertions(+), 3 deletions(-)
--
2.34.1