This series enables purecap apps to use the io_uring subsystem.
With these patches, all io_uring LTP tests pass in both Purecap and plain AArch64 modes. Note that the LTP tests only exercise the basic functionality of io_uring; a significant portion of the multiplexed functionality remains untested in LTP.
I have finished investigating the Purecap and plain AArch64 liburing tests and examples, and the series has been updated accordingly.
v5:
 - Revert changes in trace/events/io_uring.h
 - Add new header trace/events/io_uring.h for compat structs
 - Change cqe_cached/cqe_sentinel to indices
 - Move the print_sqe and print_cqe macros outside of the function
 - Rename is_compat64_io_ring_ctx to io_in_compat64
 - Add a helper for comparing user_data values
 - Don't convert the addr field to a compat_ptr for opcodes where it stores
   a user_data value
 - Other small fixes suggested by Kevin
v4:
 - Rebase on top of morello/next
 - Remove the union for flags in struct compat_io_uring_sqe and keep only a
   single member
 - Improve formatting and move functions as per feedback on v3
 - Add a new helper for checking if the context is compat
 - Remove the struct conversion in fdinfo and just use macros
 - Remove the union from struct io_overflow_cqe and just leave the native
   struct
 - Fix the cqe_cached/cqe_sentinel mechanism
 - Separate the fix for the shared ring size's off-by-one error into a new
   PATCH 6
 - Remove the compat_ptr for addr fields that represent user_data values
 - Extend the trace events accordingly to propagate capabilities
 - Use the copy*_with_ptr routine for copy_msghdr_from_user in a new PATCH 1
 - Fix the misuse of addr2 and off in IORING_OP_CONNECT and
   IORING_OP_POLL_REMOVE
v3:
 - Introduce Patch 5, which exposes the compat handling logic for
   epoll_event; this is then used in io_uring/epoll.c
 - Introduce Patch 6, which makes sure that the capability tags are
   preserved when struct iovec is copied from userspace
 - Fix a few sizeof(var) to sizeof(*var)
 - Use iovec_from_user so that the compat handling logic is applied instead
   of copying directly from userspace
 - Add a few missing copy_from_user_with_ptr where suitable
v2:
 - Rebase on top of release 6.1
 - Remove the VM_READ_CAPS/VM_LOAD_CAPS patches as they are already merged
 - Update the commit message in PATCH 1
 - Add the generic changes PATCH 2 and PATCH 3 to avoid copying user
   pointers from/to userspace unnecessarily; these could be upstreamable
 - Split the "pulling the cqes member out" change into PATCH 4
 - The changes for PATCH 5 and 6 are now split into their respective files
   after the rebase
 - Format and reorganize based on the feedback on the previous version,
   including creating copy_*_from_* helpers for various uAPI structs
 - Add comments related to the handling of the setup flags
   IORING_SETUP_SQE128 and IORING_SETUP_CQE32
 - Add handling for the new uAPI structs: io_uring_buf, io_uring_buf_ring,
   io_uring_buf_reg, io_uring_sync_cancel_reg
Gitlab issue: https://git.morello-project.org/morello/kernel/linux/-/issues/2
Review branch: https://git.morello-project.org/tudcre01/linux/-/commits/morello/io_uring_v5
Tudor Cretu (10):
  net: socket: use copy_from_user_with_ptr for struct user_msghdr
  io_uring/rw: Restrict copy to only uiov->len from userspace
  io_uring/tctx: Copy only the offset field back to user
  io_uring: Pull cqes member out from rings struct
  epoll: Expose compat handling logic of epoll_event
  io_uring/kbuf: Fix size for shared buffer ring
  io_uring: Make cqe_cached and cqe_sentinel indices instead of pointers
  io_uring: Implement compat versions of uAPI structs and handle them
  io_uring: Allow capability tag access on the shared memory
  io_uring: Use user pointer type in the uAPI structs
 fs/eventpoll.c                  |  38 ++--
 include/linux/eventpoll.h       |   4 +
 include/linux/io_uring_compat.h | 129 +++++++++++++
 include/linux/io_uring_types.h  |  35 ++--
 include/uapi/linux/io_uring.h   |  76 ++++----
 io_uring/advise.c               |   7 +-
 io_uring/cancel.c               |  32 +++-
 io_uring/cancel.h               |   2 +-
 io_uring/epoll.c                |   4 +-
 io_uring/fdinfo.c               |  82 +++++---
 io_uring/fs.c                   |  16 +-
 io_uring/io_uring.c             | 321 +++++++++++++++++++++++---------
 io_uring/io_uring.h             | 147 +++++++++++++--
 io_uring/kbuf.c                 | 111 +++++++++--
 io_uring/kbuf.h                 |   8 +-
 io_uring/msg_ring.c             |   4 +-
 io_uring/net.c                  |  25 +--
 io_uring/openclose.c            |   4 +-
 io_uring/poll.c                 |   8 +-
 io_uring/rsrc.c                 | 138 +++++++++++---
 io_uring/rw.c                   |  22 +--
 io_uring/statx.c                |   4 +-
 io_uring/tctx.c                 |  56 +++++-
 io_uring/timeout.c              |  14 +-
 io_uring/uring_cmd.c            |   5 +
 io_uring/uring_cmd.h            |   4 +
 io_uring/xattr.c                |  12 +-
 net/socket.c                    |   2 +-
 28 files changed, 1000 insertions(+), 310 deletions(-)
 create mode 100644 include/linux/io_uring_compat.h
struct user_msghdr contains user pointers, so use the copy routine that preserves them.
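For reference, struct user_msghdr (include/linux/socket.h) carries several
__user pointers; on purecap these are capabilities, and a plain
copy_from_user() would strip their tags:

	struct user_msghdr {
		void		__user *msg_name;	/* ptr to socket address structure */
		int		msg_namelen;		/* size of socket address structure */
		struct iovec	__user *msg_iov;	/* scatter/gather array */
		__kernel_size_t	msg_iovlen;		/* # elements in msg_iov */
		void		__user *msg_control;	/* ancillary data */
		__kernel_size_t	msg_controllen;		/* ancillary data buffer length */
		unsigned int	msg_flags;		/* flags on received message */
	};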
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 net/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/socket.c b/net/socket.c
index 741086ceff95d..0ac6d2a16808e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2418,7 +2418,7 @@ static int copy_msghdr_from_user(struct msghdr *kmsg,
 	struct user_msghdr msg;
 	ssize_t err;
-	if (copy_from_user(&msg, umsg, sizeof(*umsg)))
+	if (copy_from_user_with_ptr(&msg, umsg, sizeof(*umsg)))
 		return -EFAULT;
err = __copy_msghdr(kmsg, &msg, save_addr);
Only the len member is needed, so restrict the copy from userspace to just that field.
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 io_uring/rw.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1393cdae75854..2edca190450ee 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -55,7 +55,6 @@ static int io_iov_compat_buffer_select_prep(struct io_rw *rw)
 static int io_iov_buffer_select_prep(struct io_kiocb *req)
 {
 	struct iovec __user *uiov;
-	struct iovec iov;
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
 	if (rw->len != 1)
@@ -67,9 +66,8 @@ static int io_iov_buffer_select_prep(struct io_kiocb *req)
 #endif
 	uiov = u64_to_user_ptr(rw->addr);
-	if (copy_from_user(&iov, uiov, sizeof(*uiov)))
+	if (get_user(rw->len, &uiov->iov_len))
 		return -EFAULT;
-	rw->len = iov.iov_len;
 	return 0;
 }
Upon successful return of the io_uring_register system call, the offset field contains the value of the registered file descriptor to be used in future io_uring_enter system calls. The rest of the struct doesn't need to be copied back to userspace, so restrict the copy_to_user call to the offset field only.
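For context, a minimal userspace sketch (using hypothetical raw-syscall
wrappers) showing how the returned offset is consumed; after registration
the index is passed to io_uring_enter() together with
IORING_ENTER_REGISTERED_RING:

	struct io_uring_rsrc_update reg = {
		.offset = -1U,			/* let the kernel pick a free slot */
		.data	= (__u64)ring_fd,	/* the real ring fd to register */
	};

	if (io_uring_register(ring_fd, IORING_REGISTER_RING_FDS, &reg, 1) < 0)
		exit(1);
	/* reg.offset now holds the registered index, written back via put_user() */
	io_uring_enter(reg.offset, to_submit, 0, IORING_ENTER_REGISTERED_RING, NULL);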
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 io_uring/tctx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/tctx.c b/io_uring/tctx.c
index 4324b1cf1f6af..96f77450cf4e2 100644
--- a/io_uring/tctx.c
+++ b/io_uring/tctx.c
@@ -289,7 +289,7 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
 			break;
 		reg.offset = ret;
-		if (copy_to_user(&arg[i], &reg, sizeof(reg))) {
+		if (put_user(reg.offset, &arg[i].offset)) {
 			fput(tctx->registered_rings[reg.offset]);
 			tctx->registered_rings[reg.offset] = NULL;
 			ret = -EFAULT;
Pull the cqes member out of the rings struct so that a union between cqes and cqes_compat becomes possible. This is done in a similar way to commit 75b28affdd6a ("io_uring: allocate the two rings together"), where sq_array was pulled out of the rings struct.
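For clarity, the resulting layout of the shared allocation (a sketch; offsets
as computed by the updated rings_size() below):

	+------------------------+ <- ctx->rings
	| struct io_rings        |   heads, tails, masks, flags, ...
	+------------------------+ <- ctx->cqes (rings + cq_offset)
	| cqes[cq_entries]       |   doubled when IORING_SETUP_CQE32 is set
	+------------------------+ <- ctx->sq_array (rings + sq_array_offset)
	| sq_array[sq_entries]   |
	+------------------------+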
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 include/linux/io_uring_types.h | 18 +++++++++--------
 io_uring/fdinfo.c              |  2 +-
 io_uring/io_uring.c            | 35 ++++++++++++++++++++++++----------
 3 files changed, 36 insertions(+), 19 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index df7d4febc38a4..440179029a8f0 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -141,14 +141,6 @@ struct io_rings {
 	 * ordered with any other data.
 	 */
 	u32			cq_overflow;
-	/*
-	 * Ring buffer of completion events.
-	 *
-	 * The kernel writes completion events fresh every time they are
-	 * produced, so the application is allowed to modify pending
-	 * entries.
-	 */
-	struct io_uring_cqe	cqes[] ____cacheline_aligned_in_smp;
 };
 struct io_restriction {
@@ -270,7 +262,17 @@ struct io_ring_ctx {
 		struct xarray		personalities;
 		u32			pers_next;
+	/* completion data */
 	struct {
+		/*
+		 * Ring buffer of completion events.
+		 *
+		 * The kernel writes completion events fresh every time they are
+		 * produced, so the application is allowed to modify pending
+		 * entries.
+		 */
+		struct io_uring_cqe	*cqes;
+
 		/*
 		 * We cache a range of free CQEs we can use, once exhausted it
 		 * should go through a slower range setup, see __io_get_cqe()
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index 2e04850a657b0..bc8c9d764bc13 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -119,7 +119,7 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 	cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
 	for (i = 0; i < cq_entries; i++) {
 		unsigned int entry = i + cq_head;
-		struct io_uring_cqe *cqe = &r->cqes[(entry & cq_mask) << cq_shift];
+		struct io_uring_cqe *cqe = &ctx->cqes[(entry & cq_mask) << cq_shift];
seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x", entry & cq_mask, cqe->user_data, cqe->res, diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index df41a63c642c1..707229ae04dc8 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -743,7 +743,6 @@ bool io_req_cqe_overflow(struct io_kiocb *req) */ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow) { - struct io_rings *rings = ctx->rings; unsigned int off = ctx->cached_cq_tail & (ctx->cq_entries - 1); unsigned int free, queued, len;
@@ -768,14 +767,14 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow)
 		len <<= 1;
 	}
-	ctx->cqe_cached = &rings->cqes[off];
+	ctx->cqe_cached = &ctx->cqes[off];
 	ctx->cqe_sentinel = ctx->cqe_cached + len;
 	ctx->cached_cq_tail++;
 	ctx->cqe_cached++;
 	if (ctx->flags & IORING_SETUP_CQE32)
 		ctx->cqe_cached++;
-	return &rings->cqes[off];
+	return &ctx->cqes[off];
 }
 bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags,
@@ -2476,13 +2475,28 @@ static void *io_mem_alloc(size_t size)
 }
 static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries,
-				unsigned int cq_entries, size_t *sq_offset)
+				unsigned int cq_entries, size_t *sq_offset,
+				size_t *cq_offset)
 {
 	struct io_rings *rings;
-	size_t off, sq_array_size;
+	size_t off, cq_array_size, sq_array_size;
+
+	off = sizeof(*rings);
+
+#ifdef CONFIG_SMP
+	off = ALIGN(off, SMP_CACHE_BYTES);
+	if (off == 0)
+		return SIZE_MAX;
+#endif
+
+	if (cq_offset)
+		*cq_offset = off;
+
+	cq_array_size = array_size(sizeof(struct io_uring_cqe), cq_entries);
+	if (cq_array_size == SIZE_MAX)
+		return SIZE_MAX;
-	off = struct_size(rings, cqes, cq_entries);
-	if (off == SIZE_MAX)
+	if (check_add_overflow(off, cq_array_size, &off))
 		return SIZE_MAX;
 	if (ctx->flags & IORING_SETUP_CQE32) {
 		if (check_shl_overflow(off, 1, &off))
@@ -3314,13 +3328,13 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 					 struct io_uring_params *p)
 {
 	struct io_rings *rings;
-	size_t size, sq_array_offset;
+	size_t size, cqes_offset, sq_array_offset;
 	/* make sure these are sane, as we already accounted them */
 	ctx->sq_entries = p->sq_entries;
 	ctx->cq_entries = p->cq_entries;
-	size = rings_size(ctx, p->sq_entries, p->cq_entries, &sq_array_offset);
+	size = rings_size(ctx, p->sq_entries, p->cq_entries, &sq_array_offset, &cqes_offset);
 	if (size == SIZE_MAX)
 		return -EOVERFLOW;
@@ -3329,6 +3343,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 		return -ENOMEM;
 	ctx->rings = rings;
+	ctx->cqes = (struct io_uring_cqe *)((char *)rings + cqes_offset);
 	ctx->sq_array = (u32 *)((char *)rings + sq_array_offset);
 	rings->sq_ring_mask = p->sq_entries - 1;
 	rings->cq_ring_mask = p->cq_entries - 1;
@@ -3533,7 +3548,7 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	p->cq_off.ring_mask = offsetof(struct io_rings, cq_ring_mask);
 	p->cq_off.ring_entries = offsetof(struct io_rings, cq_ring_entries);
 	p->cq_off.overflow = offsetof(struct io_rings, cq_overflow);
-	p->cq_off.cqes = offsetof(struct io_rings, cqes);
+	p->cq_off.cqes = (char *)ctx->cqes - (char *)ctx->rings;
 	p->cq_off.flags = offsetof(struct io_rings, cq_flags);
p->features = IORING_FEAT_SINGLE_MMAP | IORING_FEAT_NODROP |
Move the logic that copies an epoll_event from userspace into its own function and expose it in the eventpoll.h header. This allows other subsystems, such as io_uring, to handle epoll_events.
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 fs/eventpoll.c            | 38 +++++++++++++++++++++++++-------------
 include/linux/eventpoll.h |  4 ++++
 2 files changed, 29 insertions(+), 13 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 7e33a2781dec8..c6afc25b1d4ee 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2197,6 +2197,27 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 	return error;
 }
+static int get_compat_epoll_event(struct epoll_event *epds,
+				  const void __user *user_epds)
+{
+	struct compat_epoll_event compat_epds;
+
+	if (unlikely(copy_from_user(&compat_epds, user_epds, sizeof(compat_epds))))
+		return -EFAULT;
+	epds->events = compat_epds.events;
+	epds->data = (__kernel_uintptr_t)as_user_ptr(compat_epds.data);
+	return 0;
+}
+
+int copy_epoll_event_from_user(struct epoll_event *epds,
+			       const void __user *user_epds,
+			       bool compat)
+{
+	if (compat)
+		return get_compat_epoll_event(epds, user_epds);
+	return copy_from_user_with_ptr(epds, user_epds, sizeof(*epds));
+}
+
 /*
  * The following function implements the controller interface for
  * the eventpoll file that enables the insertion/removal/change of
@@ -2211,20 +2232,11 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	struct epoll_event epds;
 	if (ep_op_has_event(op)) {
-		if (in_compat_syscall()) {
-			struct compat_epoll_event compat_epds;
-
-			if (copy_from_user(&compat_epds, event,
-					   sizeof(struct compat_epoll_event)))
-				return -EFAULT;
+		int ret;
-			epds.events = compat_epds.events;
-			epds.data = (__kernel_uintptr_t)as_user_ptr(compat_epds.data);
-		} else {
-			if (copy_from_user_with_ptr(&epds, event,
-						    sizeof(struct epoll_event)))
-				return -EFAULT;
-		}
+		ret = copy_epoll_event_from_user(&epds, event, in_compat_syscall());
+		if (ret)
+			return -EFAULT;
 	}
 	return do_epoll_ctl(epfd, op, fd, &epds, false);
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 457811d82ff20..62b0829354a0e 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -103,4 +103,8 @@ epoll_put_uevent(__poll_t revents, __kernel_uintptr_t data,
 }
 #endif
+int copy_epoll_event_from_user(struct epoll_event *epds,
+			       const void __user *user_epds,
+			       bool compat);
+
 #endif /* #ifndef _LINUX_EVENTPOLL_H */
The size of the ring is the product of ring_entries and the size of struct io_uring_buf. Since struct io_uring_buf_ring is a union whose header fields overlay bufs[0], using struct_size is equivalent to (ring_entries + 1) * sizeof(struct io_uring_buf) and generates an off-by-one error. Fix it by using size_mul directly.
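To spell out the arithmetic (the kernel's struct_size(p, member, n) expands
to sizeof(*p) + n * sizeof(*p->member)):

	sizeof(struct io_uring_buf)      == 16
	sizeof(struct io_uring_buf_ring) == 16	/* header overlays bufs[0] */

	struct_size(br, bufs, ring_entries)
		== 16 + ring_entries * 16
		== (ring_entries + 1) * 16	/* one entry too many */
	size_mul(sizeof(struct io_uring_buf), ring_entries)
		== ring_entries * 16		/* what userspace actually mapped */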
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 io_uring/kbuf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index e2c46889d5fab..182e594b56c6e 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -509,7 +509,7 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
 	}
 	pages = io_pin_pages(reg.ring_addr,
-			     struct_size(br, bufs, reg.ring_entries),
+			     size_mul(sizeof(struct io_uring_buf), reg.ring_entries),
 			     &nr_pages);
 	if (IS_ERR(pages)) {
 		kfree(free_bl);
This change refactors the caching mechanism; it doesn't change the implementation's behaviour. Making cqe_cached and cqe_sentinel indices instead of pointers allows easier bookkeeping in the upcoming patches that enable compat handling at the io_uring_cqe level.
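To illustrate the motivation (a sketch anticipating the helper introduced
later in this series): a cached pointer bakes in one element size, whereas an
index lets the element size be resolved at access time:

	/* index-based lookup: pick the right array and stride per access */
	static inline void *cqe_at(struct io_ring_ctx *ctx, unsigned int idx)
	{
		return io_in_compat64(ctx) ? (void *)&ctx->cqes_compat[idx] :
					     (void *)&ctx->cqes[idx];
	}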
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 include/linux/io_uring_types.h | 4 ++--
 io_uring/io_uring.c            | 2 +-
 io_uring/io_uring.h            | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 440179029a8f0..3d14c6feb51b6 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -277,8 +277,8 @@ struct io_ring_ctx {
 		 * We cache a range of free CQEs we can use, once exhausted it
 		 * should go through a slower range setup, see __io_get_cqe()
 		 */
-		struct io_uring_cqe	*cqe_cached;
-		struct io_uring_cqe	*cqe_sentinel;
+		unsigned int		cqe_cached;
+		unsigned int		cqe_sentinel;
 		unsigned		cached_cq_tail;
 		unsigned		cq_entries;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 707229ae04dc8..fb6d07e1e7358 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -767,7 +767,7 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow)
 		len <<= 1;
 	}
-	ctx->cqe_cached = &ctx->cqes[off];
+	ctx->cqe_cached = off;
 	ctx->cqe_sentinel = ctx->cqe_cached + len;
 	ctx->cached_cq_tail++;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 50bc3af449534..6d9720dd8f469 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -97,7 +97,7 @@ static inline struct io_uring_cqe *io_get_cqe_overflow(struct io_ring_ctx *ctx,
						       bool overflow)
 {
 	if (likely(ctx->cqe_cached < ctx->cqe_sentinel)) {
-		struct io_uring_cqe *cqe = ctx->cqe_cached;
+		struct io_uring_cqe *cqe = &ctx->cqes[ctx->cqe_cached];
 		ctx->cached_cq_tail++;
 		ctx->cqe_cached++;
Introduce compat versions of the structs exposed in the uAPI headers that might contain pointers as members. Also, implement functions that convert the compat versions to the native versions of the structs.
A subsequent patch is going to change the io_uring structs so that they support new architectures. On such architectures, the current struct layout still needs to be supported for compat tasks.
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 include/linux/io_uring_compat.h | 129 ++++++++++++++++++
 include/linux/io_uring_types.h  |  11 +-
 io_uring/cancel.c               |  28 +++-
 io_uring/epoll.c                |   2 +-
 io_uring/fdinfo.c               |  81 ++++++-----
 io_uring/io_uring.c             | 229 ++++++++++++++++++++++----------
 io_uring/io_uring.h             | 108 ++++++++++++---
 io_uring/kbuf.c                 |  98 ++++++++++++--
 io_uring/kbuf.h                 |   6 +-
 io_uring/net.c                  |   5 +-
 io_uring/rsrc.c                 | 110 +++++++++++++--
 io_uring/tctx.c                 |  56 +++++++-
 io_uring/uring_cmd.h            |   4 +
 13 files changed, 716 insertions(+), 151 deletions(-)
 create mode 100644 include/linux/io_uring_compat.h
diff --git a/include/linux/io_uring_compat.h b/include/linux/io_uring_compat.h
new file mode 100644
index 0000000000000..3e91babe2e2ba
--- /dev/null
+++ b/include/linux/io_uring_compat.h
@@ -0,0 +1,129 @@
+#ifndef IO_URING_COMPAT_H
+#define IO_URING_COMPAT_H
+
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/fs.h>
+
+struct compat_io_uring_sqe {
+	__u8	opcode;
+	__u8	flags;
+	__u16	ioprio;
+	__s32	fd;
+	union {
+		__u64	off;
+		__u64	addr2;
+		struct {
+			__u32	cmd_op;
+			__u32	__pad1;
+		};
+	};
+	union {
+		__u64	addr;
+		__u64	splice_off_in;
+	};
+	__u32	len;
+	/* This member is actually a union in the native struct */
+	__kernel_rwf_t	rw_flags;
+	__u64	user_data;
+	union {
+		__u16	buf_index;
+		__u16	buf_group;
+	} __packed;
+	__u16	personality;
+	union {
+		__s32	splice_fd_in;
+		__u32	file_index;
+		struct {
+			__u16	addr_len;
+			__u16	__pad3[1];
+		};
+	};
+	union {
+		struct {
+			__u64	addr3;
+			__u64	__pad2[1];
+		};
+		__u8	cmd[0];
+	};
+};
+
+struct compat_io_uring_cqe {
+	__u64	user_data;
+	__s32	res;
+	__u32	flags;
+	__u64	big_cqe[];
+};
+
+struct compat_io_uring_files_update {
+	__u32	offset;
+	__u32	resv;
+	__aligned_u64	fds;
+};
+
+struct compat_io_uring_rsrc_register {
+	__u32	nr;
+	__u32	flags;
+	__u64	resv2;
+	__aligned_u64	data;
+	__aligned_u64	tags;
+};
+
+struct compat_io_uring_rsrc_update {
+	__u32	offset;
+	__u32	resv;
+	__aligned_u64	data;
+};
+
+struct compat_io_uring_rsrc_update2 {
+	__u32	offset;
+	__u32	resv;
+	__aligned_u64	data;
+	__aligned_u64	tags;
+	__u32	nr;
+	__u32	resv2;
+};
+
+struct compat_io_uring_buf {
+	__u64	addr;
+	__u32	len;
+	__u16	bid;
+	__u16	resv;
+};
+
+struct compat_io_uring_buf_ring {
+	union {
+		struct {
+			__u64	resv1;
+			__u32	resv2;
+			__u16	resv3;
+			__u16	tail;
+		};
+		struct compat_io_uring_buf	bufs[0];
+	};
+};
+
+struct compat_io_uring_buf_reg {
+	__u64	ring_addr;
+	__u32	ring_entries;
+	__u16	bgid;
+	__u16	pad;
+	__u64	resv[3];
+};
+
+struct compat_io_uring_getevents_arg {
+	__u64	sigmask;
+	__u32	sigmask_sz;
+	__u32	pad;
+	__u64	ts;
+};
+
+struct compat_io_uring_sync_cancel_reg {
+	__u64	addr;
+	__s32	fd;
+	__u32	flags;
+	struct __kernel_timespec	timeout;
+	__u64	pad[4];
+};
+
+#endif
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3d14c6feb51b6..9506a8858f0ff 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -5,6 +5,7 @@
 #include <linux/task_work.h>
 #include <linux/bitmap.h>
 #include <linux/llist.h>
+#include <linux/io_uring_compat.h>
 #include <uapi/linux/io_uring.h>
 struct io_wq_work_node {
@@ -216,7 +217,10 @@ struct io_ring_ctx {
 	 * array.
 	 */
 	u32			*sq_array;
-	struct io_uring_sqe	*sq_sqes;
+	union {
+		struct compat_io_uring_sqe	*sq_sqes_compat;
+		struct io_uring_sqe		*sq_sqes;
+	};
 	unsigned		cached_sq_head;
 	unsigned		sq_entries;
@@ -271,7 +275,10 @@ struct io_ring_ctx {
 		 * produced, so the application is allowed to modify pending
 		 * entries.
 		 */
-		struct io_uring_cqe	*cqes;
+		union {
+			struct compat_io_uring_cqe	*cqes_compat;
+			struct io_uring_cqe		*cqes;
+		};
 		/*
 		 * We cache a range of free CQEs we can use, once exhausted it
diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 2291a53cdabd1..0f942da7455b5 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -27,6 +27,32 @@ struct io_cancel {
 #define CANCEL_FLAGS	(IORING_ASYNC_CANCEL_ALL | IORING_ASYNC_CANCEL_FD | \
			 IORING_ASYNC_CANCEL_ANY | IORING_ASYNC_CANCEL_FD_FIXED)
+static int get_compat64_io_uring_sync_cancel_reg(struct io_uring_sync_cancel_reg *sc,
+						 const void __user *user_sc)
+{
+	struct compat_io_uring_sync_cancel_reg compat_sc;
+
+	if (copy_from_user(&compat_sc, user_sc, sizeof(compat_sc)))
+		return -EFAULT;
+	sc->addr = compat_sc.addr;
+	sc->fd = compat_sc.fd;
+	sc->flags = compat_sc.flags;
+	sc->timeout = compat_sc.timeout;
+	memcpy(sc->pad, compat_sc.pad, sizeof(sc->pad));
+	return 0;
+}
+
+static int copy_io_uring_sync_cancel_reg_from_user(struct io_ring_ctx *ctx,
+						   struct io_uring_sync_cancel_reg *sc,
+						   const void __user *arg)
+{
+	if (io_in_compat64(ctx))
+		return get_compat64_io_uring_sync_cancel_reg(sc, arg);
+	if (copy_from_user(sc, arg, sizeof(*sc)))
+		return -EFAULT;
+	return 0;
+}
+
 static bool io_cancel_cb(struct io_wq_work *work, void *data)
 {
 	struct io_kiocb *req = container_of(work, struct io_kiocb, work);
@@ -243,7 +269,7 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg)
 	DEFINE_WAIT(wait);
 	int ret;
-	if (copy_from_user(&sc, arg, sizeof(sc)))
+	if (copy_io_uring_sync_cancel_reg_from_user(ctx, &sc, arg))
 		return -EFAULT;
 	if (sc.flags & ~CANCEL_FLAGS)
 		return -EINVAL;
diff --git a/io_uring/epoll.c b/io_uring/epoll.c
index 9aa74d2c80bc4..d5580ff465c3e 100644
--- a/io_uring/epoll.c
+++ b/io_uring/epoll.c
@@ -40,7 +40,7 @@ int io_epoll_ctl_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		struct epoll_event __user *ev;
 		ev = u64_to_user_ptr(READ_ONCE(sqe->addr));
-		if (copy_from_user(&epoll->event, ev, sizeof(*ev)))
+		if (copy_epoll_event_from_user(&epoll->event, ev, req->ctx->compat))
 			return -EFAULT;
 	}
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index bc8c9d764bc13..fd02317627ae7 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -48,6 +48,38 @@ static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id,
 	return 0;
 }
+#define print_sqe(m, sqe, sq_idx, sq_shift)					\
+	do {									\
+		seq_printf(m, "%5u: opcode:%s, fd:%d, flags:%x, off:%llu, "	\
+			   "addr:0x%llx, rw_flags:0x%x, buf_index:%d "		\
+			   "user_data:%llu",					\
+			   sq_idx, io_uring_get_opcode((sqe)->opcode), (sqe)->fd, \
+			   (sqe)->flags, (unsigned long long) (sqe)->off,	\
+			   (unsigned long long) (sqe)->addr, (sqe)->rw_flags,	\
+			   (sqe)->buf_index, (sqe)->user_data);			\
+		if (sq_shift) {							\
+			u64 *sqeb = (void *) ((sqe) + 1);			\
+			int size = sizeof(*(sqe)) / sizeof(u64);		\
+			int j;							\
+									\
+			for (j = 0; j < size; j++) {				\
+				seq_printf(m, ", e%d:0x%llx", j,		\
+					   (unsigned long long) *sqeb);		\
+				sqeb++;						\
+			}							\
+		}								\
+	} while (0)
+
+#define print_cqe(m, cqe, cq_idx, cq_shift)					\
+	do {									\
+		seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x",		\
+			   cq_idx, (cqe)->user_data, (cqe)->res,		\
+			   (cqe)->flags);					\
+		if (cq_shift)							\
+			seq_printf(m, ", extra1:%llu, extra2:%llu\n",		\
+				   (cqe)->big_cqe[0], (cqe)->big_cqe[1]);	\
+	} while (0)
+
 static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
					  struct seq_file *m)
 {
@@ -88,45 +120,32 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 	sq_entries = min(sq_tail - sq_head, ctx->sq_entries);
 	for (i = 0; i < sq_entries; i++) {
 		unsigned int entry = i + sq_head;
-		struct io_uring_sqe *sqe;
-		unsigned int sq_idx;
+		unsigned int sq_idx, sq_off;
 		sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
 		if (sq_idx > sq_mask)
 			continue;
-		sqe = &ctx->sq_sqes[sq_idx << sq_shift];
-		seq_printf(m, "%5u: opcode:%s, fd:%d, flags:%x, off:%llu, "
-			      "addr:0x%llx, rw_flags:0x%x, buf_index:%d "
-			      "user_data:%llu",
-			   sq_idx, io_uring_get_opcode(sqe->opcode), sqe->fd,
-			   sqe->flags, (unsigned long long) sqe->off,
-			   (unsigned long long) sqe->addr, sqe->rw_flags,
-			   sqe->buf_index, sqe->user_data);
-		if (sq_shift) {
-			u64 *sqeb = (void *) (sqe + 1);
-			int size = sizeof(struct io_uring_sqe) / sizeof(u64);
-			int j;
-
-			for (j = 0; j < size; j++) {
-				seq_printf(m, ", e%d:0x%llx", j,
-					   (unsigned long long) *sqeb);
-				sqeb++;
-			}
-		}
+		sq_off = sq_idx << sq_shift;
+
+		if (io_in_compat64(ctx))
+			print_sqe(m, &ctx->sq_sqes_compat[sq_off], sq_idx, sq_shift);
+		else
+			print_sqe(m, &ctx->sq_sqes[sq_off], sq_idx, sq_shift);
+		seq_printf(m, "\n");
 	}
 	seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
 	cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
 	for (i = 0; i < cq_entries; i++) {
 		unsigned int entry = i + cq_head;
-		struct io_uring_cqe *cqe = &ctx->cqes[(entry & cq_mask) << cq_shift];
-
-		seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x",
-			   entry & cq_mask, cqe->user_data, cqe->res,
-			   cqe->flags);
-		if (cq_shift)
-			seq_printf(m, ", extra1:%llu, extra2:%llu\n",
-				   cqe->big_cqe[0], cqe->big_cqe[1]);
+		unsigned int cq_idx = entry & cq_mask;
+		unsigned int cq_off = cq_idx << cq_shift;
+
+		if (io_in_compat64(ctx))
+			print_cqe(m, &ctx->cqes_compat[cq_off], cq_idx, cq_shift);
+		else
+			print_cqe(m, &ctx->cqes[cq_off], cq_idx, cq_shift);
+		seq_printf(m, "\n");
 	}
@@ -192,12 +211,14 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
seq_printf(m, " user_data=%llu, res=%d, flags=%x\n", cqe->user_data, cqe->res, cqe->flags); - }
 	spin_unlock(&ctx->completion_lock);
 }
+#undef print_sqe
+#undef print_cqe
+
 __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 {
 	struct io_ring_ctx *ctx = f->private_data;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index fb6d07e1e7358..a355f2a2e7ac3 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -152,6 +152,37 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
static struct kmem_cache *req_cachep;
+static int get_compat64_io_uring_getevents_arg(struct io_uring_getevents_arg *arg,
+					       const void __user *user_arg)
+{
+	struct compat_io_uring_getevents_arg compat_arg;
+
+	if (copy_from_user(&compat_arg, user_arg, sizeof(compat_arg)))
+		return -EFAULT;
+	arg->sigmask = compat_arg.sigmask;
+	arg->sigmask_sz = compat_arg.sigmask_sz;
+	arg->pad = compat_arg.pad;
+	arg->ts = compat_arg.ts;
+	return 0;
+}
+
+static int copy_io_uring_getevents_arg_from_user(struct io_ring_ctx *ctx,
+						 struct io_uring_getevents_arg *arg,
+						 const void __user *argp,
+						 size_t size)
+{
+	if (io_in_compat64(ctx)) {
+		if (size != sizeof(struct compat_io_uring_getevents_arg))
+			return -EINVAL;
+		return get_compat64_io_uring_getevents_arg(arg, argp);
+	}
+	if (size != sizeof(*arg))
+		return -EINVAL;
+	if (copy_from_user(arg, argp, sizeof(*arg)))
+		return -EFAULT;
+	return 0;
+}
+
 struct sock *io_uring_get_socket(struct file *file)
 {
 #if defined(CONFIG_UNIX)
@@ -604,14 +635,10 @@ void io_cq_unlock_post(struct io_ring_ctx *ctx)
 static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 {
 	bool all_flushed;
-	size_t cqe_size = sizeof(struct io_uring_cqe);
 	if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
 		return false;
-	if (ctx->flags & IORING_SETUP_CQE32)
-		cqe_size <<= 1;
-
 	io_cq_lock(ctx);
 	while (!list_empty(&ctx->cq_overflow_list)) {
 		struct io_uring_cqe *cqe = io_get_cqe_overflow(ctx, true);
@@ -621,9 +648,18 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 			break;
 		ocqe = list_first_entry(&ctx->cq_overflow_list,
					struct io_overflow_cqe, list);
-		if (cqe)
-			memcpy(cqe, &ocqe->cqe, cqe_size);
-		else
+		if (cqe) {
+			u64 extra1 = 0;
+			u64 extra2 = 0;
+
+			if (ctx->flags & IORING_SETUP_CQE32) {
+				extra1 = ocqe->cqe.big_cqe[0];
+				extra2 = ocqe->cqe.big_cqe[1];
+			}
+
+			__io_fill_cqe(ctx, cqe, ocqe->cqe.user_data, ocqe->cqe.res,
+				      ocqe->cqe.flags, extra1, extra2);
+		} else
 			io_account_cq_overflow(ctx);
 		list_del(&ocqe->list);
@@ -735,6 +771,15 @@ bool io_req_cqe_overflow(struct io_kiocb *req)
				req->cqe.res, req->cqe.flags,
				req->extra1, req->extra2);
 }
+
+/*
+ * Retrieves the pointer of the ith CQE
+ */
+struct io_uring_cqe *__io_get_ith_cqe(struct io_ring_ctx *ctx, unsigned int i)
+{
+	return io_in_compat64(ctx) ?
+		(struct io_uring_cqe *)&ctx->cqes_compat[i] :
+		&ctx->cqes[i];
+}
 /*
  * writes to the cq entry need to come after reading head; the
@@ -774,7 +819,7 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow)
 	ctx->cqe_cached++;
 	if (ctx->flags & IORING_SETUP_CQE32)
 		ctx->cqe_cached++;
-	return &ctx->cqes[off];
+	return __io_get_ith_cqe(ctx, off);
 }
 bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags,
@@ -793,14 +838,7 @@ bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags
 	if (likely(cqe)) {
 		trace_io_uring_complete(ctx, NULL, user_data, res, cflags, 0, 0);
-		WRITE_ONCE(cqe->user_data, user_data);
-		WRITE_ONCE(cqe->res, res);
-		WRITE_ONCE(cqe->flags, cflags);
-
-		if (ctx->flags & IORING_SETUP_CQE32) {
-			WRITE_ONCE(cqe->big_cqe[0], 0);
-			WRITE_ONCE(cqe->big_cqe[1], 0);
-		}
+		__io_fill_cqe(ctx, cqe, user_data, res, cflags, 0, 0);
 		return true;
 	}
@@ -2240,7 +2278,9 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
 		/* double index for 128-byte SQEs, twice as long */
 		if (ctx->flags & IORING_SETUP_SQE128)
 			head <<= 1;
-		return &ctx->sq_sqes[head];
+		return io_in_compat64(ctx) ?
+			(struct io_uring_sqe *)&ctx->sq_sqes_compat[head] :
+			&ctx->sq_sqes[head];
 	}
 	/* drop invalid entries */
@@ -2267,6 +2307,7 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 	do {
 		const struct io_uring_sqe *sqe;
 		struct io_kiocb *req;
+		struct io_uring_sqe native_sqe[2];
 		if (unlikely(!io_alloc_req_refill(ctx)))
 			break;
@@ -2276,6 +2317,11 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 			io_req_add_to_cache(req, ctx);
 			break;
 		}
+		if (io_in_compat64(ctx)) {
+			convert_compat64_io_uring_sqe(ctx, native_sqe,
+						      (struct compat_io_uring_sqe *)sqe);
+			sqe = native_sqe;
+		}
 		/*
 		 * Continue submitting even for sqe failure if the
@@ -2480,6 +2526,9 @@ static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries
 {
 	struct io_rings *rings;
 	size_t off, cq_array_size, sq_array_size;
+	size_t cqe_size = io_in_compat64(ctx) ?
+				sizeof(struct compat_io_uring_cqe) :
+				sizeof(struct io_uring_cqe);
off = sizeof(*rings);
@@ -2492,7 +2541,7 @@ static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries
 	if (cq_offset)
 		*cq_offset = off;
-	cq_array_size = array_size(sizeof(struct io_uring_cqe), cq_entries);
+	cq_array_size = array_size(cqe_size, cq_entries);
 	if (cq_array_size == SIZE_MAX)
 		return SIZE_MAX;
@@ -3120,20 +3169,19 @@ static unsigned long io_uring_nommu_get_unmapped_area(struct file *file,
#endif /* !CONFIG_MMU */
-static int io_validate_ext_arg(unsigned flags, const void __user *argp, size_t argsz)
+static int io_validate_ext_arg(struct io_ring_ctx *ctx, unsigned int flags,
+			       const void __user *argp, size_t argsz)
 {
 	if (flags & IORING_ENTER_EXT_ARG) {
 		struct io_uring_getevents_arg arg;
-		if (argsz != sizeof(arg))
-			return -EINVAL;
-		if (copy_from_user(&arg, argp, sizeof(arg)))
-			return -EFAULT;
+		return copy_io_uring_getevents_arg_from_user(ctx, &arg, argp, argsz);
 	}
 	return 0;
 }
-static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz,
+static int io_get_ext_arg(struct io_ring_ctx *ctx, unsigned int flags,
+			  const void __user *argp, size_t *argsz,
 #ifdef CONFIG_CHERI_PURECAP_UABI
			  struct __kernel_timespec * __capability *ts,
			  const sigset_t * __capability *sig)
@@ -3143,6 +3191,7 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz
 #endif
 {
 	struct io_uring_getevents_arg arg;
+	int ret;
 	/*
 	 * If EXT_ARG isn't set, then we have no timespec and the argp pointer
@@ -3158,10 +3207,9 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz
 	 * EXT_ARG is set - ensure we agree on the size of it and copy in our
 	 * timespec and sigset_t pointers if good.
 	 */
-	if (*argsz != sizeof(arg))
-		return -EINVAL;
-	if (copy_from_user(&arg, argp, sizeof(arg)))
-		return -EFAULT;
+	ret = copy_io_uring_getevents_arg_from_user(ctx, &arg, argp, *argsz);
+	if (ret)
+		return ret;
 	if (arg.pad)
 		return -EINVAL;
 	*sig = u64_to_user_ptr(arg.sigmask);
@@ -3268,7 +3316,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		 */
 		mutex_lock(&ctx->uring_lock);
 iopoll_locked:
-		ret2 = io_validate_ext_arg(flags, argp, argsz);
+		ret2 = io_validate_ext_arg(ctx, flags, argp, argsz);
 		if (likely(!ret2)) {
 			min_complete = min(min_complete,
					   ctx->cq_entries);
-			ret2 = io_get_ext_arg(flags, argp, &argsz, &ts, &sig);
+			ret2 = io_get_ext_arg(ctx, flags, argp, &argsz, &ts, &sig);
 			if (likely(!ret2)) {
 				min_complete = min(min_complete,
						   ctx->cq_entries);
@@ -3329,6 +3377,9 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 {
 	struct io_rings *rings;
 	size_t size, cqes_offset, sq_array_offset;
+	size_t sqe_size = io_in_compat64(ctx) ?
+				sizeof(struct compat_io_uring_sqe) :
+				sizeof(struct io_uring_sqe);
 	/* make sure these are sane, as we already accounted them */
 	ctx->sq_entries = p->sq_entries;
@@ -3351,9 +3402,9 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 	rings->cq_ring_entries = p->cq_entries;
 	if (p->flags & IORING_SETUP_SQE128)
-		size = array_size(2 * sizeof(struct io_uring_sqe), p->sq_entries);
+		size = array_size(2 * sqe_size, p->sq_entries);
 	else
-		size = array_size(sizeof(struct io_uring_sqe), p->sq_entries);
+		size = array_size(sqe_size, p->sq_entries);
 	if (size == SIZE_MAX) {
 		io_mem_free(ctx->rings);
 		ctx->rings = NULL;
@@ -4107,48 +4158,48 @@ static int __init io_uring_init(void)
 #define BUILD_BUG_SQE_ELEM_SIZE(eoffset, esize, ename) \
	__BUILD_BUG_VERIFY_OFFSET_SIZE(struct io_uring_sqe, eoffset, esize, ename)
 	BUILD_BUG_ON(sizeof(struct io_uring_sqe) != 64);
-	BUILD_BUG_SQE_ELEM(0, __u8, opcode);
-	BUILD_BUG_SQE_ELEM(1, __u8, flags);
-	BUILD_BUG_SQE_ELEM(2, __u16, ioprio);
-	BUILD_BUG_SQE_ELEM(4, __s32, fd);
-	BUILD_BUG_SQE_ELEM(8, __u64, off);
-	BUILD_BUG_SQE_ELEM(8, __u64, addr2);
-	BUILD_BUG_SQE_ELEM(8, __u32, cmd_op);
+	BUILD_BUG_SQE_ELEM(0,  __u8,   opcode);
+	BUILD_BUG_SQE_ELEM(1,  __u8,   flags);
+	BUILD_BUG_SQE_ELEM(2,  __u16,  ioprio);
+	BUILD_BUG_SQE_ELEM(4,  __s32,  fd);
+	BUILD_BUG_SQE_ELEM(8,  __u64,  off);
+	BUILD_BUG_SQE_ELEM(8,  __u64,  addr2);
+	BUILD_BUG_SQE_ELEM(8,  __u32,  cmd_op);
 	BUILD_BUG_SQE_ELEM(12, __u32, __pad1);
-	BUILD_BUG_SQE_ELEM(16, __u64, addr);
-	BUILD_BUG_SQE_ELEM(16, __u64, splice_off_in);
-	BUILD_BUG_SQE_ELEM(24, __u32, len);
+	BUILD_BUG_SQE_ELEM(16, __u64,  addr);
+	BUILD_BUG_SQE_ELEM(16, __u64,  splice_off_in);
+	BUILD_BUG_SQE_ELEM(24, __u32,  len);
 	BUILD_BUG_SQE_ELEM(28, __kernel_rwf_t, rw_flags);
 	BUILD_BUG_SQE_ELEM(28, /* compat */ int, rw_flags);
 	BUILD_BUG_SQE_ELEM(28, /* compat */ __u32, rw_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32, fsync_flags);
-	BUILD_BUG_SQE_ELEM(28, /* compat */ __u16, poll_events);
-	BUILD_BUG_SQE_ELEM(28, __u32, poll32_events);
-	BUILD_BUG_SQE_ELEM(28, __u32, sync_range_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32, msg_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32, timeout_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32, accept_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32, cancel_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32, open_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32, statx_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32, fadvise_advice);
-	BUILD_BUG_SQE_ELEM(28, __u32, splice_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32, rename_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32, unlink_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32, hardlink_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32, xattr_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32, msg_ring_flags);
-	BUILD_BUG_SQE_ELEM(32, __u64, user_data);
-	BUILD_BUG_SQE_ELEM(40, __u16, buf_index);
-	BUILD_BUG_SQE_ELEM(40, __u16, buf_group);
-	BUILD_BUG_SQE_ELEM(42, __u16, personality);
-	BUILD_BUG_SQE_ELEM(44, __s32, splice_fd_in);
-	BUILD_BUG_SQE_ELEM(44, __u32, file_index);
-	BUILD_BUG_SQE_ELEM(44, __u16, addr_len);
-	BUILD_BUG_SQE_ELEM(46, __u16, __pad3[0]);
-	BUILD_BUG_SQE_ELEM(48, __u64, addr3);
+	BUILD_BUG_SQE_ELEM(28, __u32,  fsync_flags);
+	BUILD_BUG_SQE_ELEM(28, /* compat */ __u16,  poll_events);
+	BUILD_BUG_SQE_ELEM(28, __u32,  poll32_events);
+	BUILD_BUG_SQE_ELEM(28, __u32,  sync_range_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32,  msg_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32,  timeout_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32,  accept_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32,  cancel_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32,  open_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32,  statx_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32,  fadvise_advice);
+	BUILD_BUG_SQE_ELEM(28, __u32,  splice_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32,  rename_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32,  unlink_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32,  hardlink_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32,  xattr_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32,  msg_ring_flags);
+	BUILD_BUG_SQE_ELEM(32, __u64,  user_data);
+	BUILD_BUG_SQE_ELEM(40, __u16,  buf_index);
+	BUILD_BUG_SQE_ELEM(40, __u16,  buf_group);
+	BUILD_BUG_SQE_ELEM(42, __u16,  personality);
+	BUILD_BUG_SQE_ELEM(44, __s32,  splice_fd_in);
+	BUILD_BUG_SQE_ELEM(44, __u32,  file_index);
+	BUILD_BUG_SQE_ELEM(44, __u16,  addr_len);
+	BUILD_BUG_SQE_ELEM(46, __u16,  __pad3[0]);
+	BUILD_BUG_SQE_ELEM(48, __u64,  addr3);
 	BUILD_BUG_SQE_ELEM_SIZE(48, 0, cmd);
-	BUILD_BUG_SQE_ELEM(56, __u64, __pad2);
+	BUILD_BUG_SQE_ELEM(56, __u64,  __pad2);
 	BUILD_BUG_ON(sizeof(struct io_uring_files_update) !=
		     sizeof(struct io_uring_rsrc_update));
@@ -4160,6 +4211,46 @@ static int __init io_uring_init(void)
 	BUILD_BUG_ON(offsetof(struct io_uring_buf, resv) !=
		     offsetof(struct io_uring_buf_ring, tail));
+#ifdef CONFIG_COMPAT64
+#define BUILD_BUG_COMPAT_SQE_ELEM(eoffset, etype, ename) \
+	__BUILD_BUG_VERIFY_OFFSET_SIZE(struct compat_io_uring_sqe, eoffset, sizeof(etype), ename)
+#define BUILD_BUG_COMPAT_SQE_ELEM_SIZE(eoffset, esize, ename) \
+	__BUILD_BUG_VERIFY_OFFSET_SIZE(struct compat_io_uring_sqe, eoffset, esize, ename)
+	BUILD_BUG_ON(sizeof(struct compat_io_uring_sqe) != 64);
+	BUILD_BUG_COMPAT_SQE_ELEM(0,  __u8,   opcode);
+	BUILD_BUG_COMPAT_SQE_ELEM(1,  __u8,   flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(2,  __u16,  ioprio);
+	BUILD_BUG_COMPAT_SQE_ELEM(4,  __s32,  fd);
+	BUILD_BUG_COMPAT_SQE_ELEM(8,  __u64,  off);
+	BUILD_BUG_COMPAT_SQE_ELEM(8,  __u64,  addr2);
+	BUILD_BUG_COMPAT_SQE_ELEM(8,  __u32,  cmd_op);
+	BUILD_BUG_COMPAT_SQE_ELEM(12, __u32,  __pad1);
+	BUILD_BUG_COMPAT_SQE_ELEM(16, __u64,  addr);
+	BUILD_BUG_COMPAT_SQE_ELEM(16, __u64,  splice_off_in);
+	BUILD_BUG_COMPAT_SQE_ELEM(24, __u32,  len);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __kernel_rwf_t, rw_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(32, __u64,  user_data);
+	BUILD_BUG_COMPAT_SQE_ELEM(40, __u16,  buf_index);
+	BUILD_BUG_COMPAT_SQE_ELEM(40, __u16,  buf_group);
+	BUILD_BUG_COMPAT_SQE_ELEM(42, __u16,  personality);
+	BUILD_BUG_COMPAT_SQE_ELEM(44, __s32,  splice_fd_in);
+	BUILD_BUG_COMPAT_SQE_ELEM(44, __u32,  file_index);
+	BUILD_BUG_COMPAT_SQE_ELEM(44, __u16,  addr_len);
+	BUILD_BUG_COMPAT_SQE_ELEM(46, __u16,  __pad3[0]);
+	BUILD_BUG_COMPAT_SQE_ELEM(48, __u64,  addr3);
+	BUILD_BUG_COMPAT_SQE_ELEM_SIZE(48, 0, cmd);
+	BUILD_BUG_COMPAT_SQE_ELEM(56, __u64,  __pad2);
+
+	BUILD_BUG_ON(sizeof(struct compat_io_uring_files_update) !=
+		     sizeof(struct compat_io_uring_rsrc_update));
+	BUILD_BUG_ON(sizeof(struct compat_io_uring_rsrc_update) >
+		     sizeof(struct compat_io_uring_rsrc_update2));
+
+	BUILD_BUG_ON(offsetof(struct compat_io_uring_buf_ring, bufs) != 0);
+	BUILD_BUG_ON(offsetof(struct compat_io_uring_buf, resv) !=
+		     offsetof(struct compat_io_uring_buf_ring, tail));
+#endif /* CONFIG_COMPAT64 */
+
 	/* should fit into one byte */
 	BUILD_BUG_ON(SQE_VALID_FLAGS >= (1 << 8));
 	BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8));
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 6d9720dd8f469..1c97583fa281a 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -5,6 +5,7 @@
 #include <linux/lockdep.h>
 #include <linux/io_uring_types.h>
 #include "io-wq.h"
+#include "uring_cmd.h"
 #include "slist.h"
 #include "filetable.h"
@@ -24,6 +25,8 @@ enum {
 	IOU_STOP_MULTISHOT = -ECANCELED,
 };
+
+struct io_uring_cqe *__io_get_ith_cqe(struct io_ring_ctx *ctx, unsigned int i);
 struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow);
 bool io_req_cqe_overflow(struct io_kiocb *req);
 int io_run_task_work_sig(struct io_ring_ctx *ctx);
@@ -93,11 +96,61 @@ static inline void io_cq_lock(struct io_ring_ctx *ctx)
void io_cq_unlock_post(struct io_ring_ctx *ctx);
+static inline bool io_in_compat64(struct io_ring_ctx *ctx)
+{
+	return IS_ENABLED(CONFIG_COMPAT64) && ctx->compat;
+}
+
+static inline void convert_compat64_io_uring_sqe(struct io_ring_ctx *ctx,
+						 struct io_uring_sqe *sqe,
+						 const struct compat_io_uring_sqe *compat_sqe)
+{
+/*
+ * The struct io_uring_sqe contains anonymous unions and there is no field
+ * keeping track of which union's member is active. Because in all the cases,
+ * the unions are between integral types and the types are compatible, use the
+ * largest member of each union to perform the copy. Use this compile-time
+ * check to ensure that the union's members are not truncated during the
+ * conversion.
+ */
+#define BUILD_BUG_COMPAT_SQE_UNION_ELEM(elem1, elem2) \
+	BUILD_BUG_ON(sizeof_field(struct compat_io_uring_sqe, elem1) != \
+		     (offsetof(struct compat_io_uring_sqe, elem2) - \
+		      offsetof(struct compat_io_uring_sqe, elem1)))
+
+	sqe->opcode = READ_ONCE(compat_sqe->opcode);
+	sqe->flags = READ_ONCE(compat_sqe->flags);
+	sqe->ioprio = READ_ONCE(compat_sqe->ioprio);
+	sqe->fd = READ_ONCE(compat_sqe->fd);
+	BUILD_BUG_COMPAT_SQE_UNION_ELEM(addr2, addr);
+	sqe->addr2 = READ_ONCE(compat_sqe->addr2);
+	BUILD_BUG_COMPAT_SQE_UNION_ELEM(addr, len);
+	sqe->addr = READ_ONCE(compat_sqe->addr);
+	sqe->len = READ_ONCE(compat_sqe->len);
+	BUILD_BUG_COMPAT_SQE_UNION_ELEM(rw_flags, user_data);
+	sqe->rw_flags = READ_ONCE(compat_sqe->rw_flags);
+	sqe->user_data = READ_ONCE(compat_sqe->user_data);
+	BUILD_BUG_COMPAT_SQE_UNION_ELEM(buf_index, personality);
+	sqe->buf_index = READ_ONCE(compat_sqe->buf_index);
+	sqe->personality = READ_ONCE(compat_sqe->personality);
+	BUILD_BUG_COMPAT_SQE_UNION_ELEM(splice_fd_in, addr3);
+	sqe->splice_fd_in = READ_ONCE(compat_sqe->splice_fd_in);
+	if (sqe->opcode == IORING_OP_URING_CMD) {
+		size_t compat_cmd_size = compat_uring_cmd_pdu_size(ctx->flags &
+								   IORING_SETUP_SQE128);
+
+		memcpy(sqe->cmd, compat_sqe->cmd, compat_cmd_size);
+	} else {
+		sqe->addr3 = READ_ONCE(compat_sqe->addr3);
+		sqe->__pad2[0] = READ_ONCE(compat_sqe->__pad2[0]);
+	}
+#undef BUILD_BUG_COMPAT_SQE_UNION_ELEM
+}
+
 static inline struct io_uring_cqe *io_get_cqe_overflow(struct io_ring_ctx *ctx,
						       bool overflow)
 {
 	if (likely(ctx->cqe_cached < ctx->cqe_sentinel)) {
-		struct io_uring_cqe *cqe = &ctx->cqes[ctx->cqe_cached];
+		struct io_uring_cqe *cqe = __io_get_ith_cqe(ctx, ctx->cqe_cached);
 		ctx->cached_cq_tail++;
 		ctx->cqe_cached++;
@@ -114,10 +167,40 @@ static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
 	return io_get_cqe_overflow(ctx, false);
 }
+static inline void __io_fill_cqe(struct io_ring_ctx *ctx, struct io_uring_cqe *cqe,
+				 u64 user_data, s32 res, u32 cflags,
+				 u64 extra1, u64 extra2)
+{
+	if (io_in_compat64(ctx)) {
+		struct compat_io_uring_cqe *compat_cqe = (struct compat_io_uring_cqe *)cqe;
+
+		WRITE_ONCE(compat_cqe->user_data, user_data);
+		WRITE_ONCE(compat_cqe->res, res);
+		WRITE_ONCE(compat_cqe->flags, cflags);
+
+		if (ctx->flags & IORING_SETUP_CQE32) {
+			WRITE_ONCE(compat_cqe->big_cqe[0], extra1);
+			WRITE_ONCE(compat_cqe->big_cqe[1], extra2);
+		}
+		return;
+	}
+
+	WRITE_ONCE(cqe->user_data, user_data);
+	WRITE_ONCE(cqe->res, res);
+	WRITE_ONCE(cqe->flags, cflags);
+
+	if (ctx->flags & IORING_SETUP_CQE32) {
+		WRITE_ONCE(cqe->big_cqe[0], extra1);
+		WRITE_ONCE(cqe->big_cqe[1], extra2);
+	}
+}
+
 static inline bool __io_fill_cqe_req(struct io_ring_ctx *ctx,
				     struct io_kiocb *req)
 {
 	struct io_uring_cqe *cqe;
+	u64 extra1 = 0;
+	u64 extra2 = 0;
 	/*
 	 * If we can't get a cq entry, userspace overflowed the
@@ -128,24 +211,17 @@ static inline bool __io_fill_cqe_req(struct io_ring_ctx *ctx,
 	if (unlikely(!cqe))
 		return io_req_cqe_overflow(req);
+	if (ctx->flags & IORING_SETUP_CQE32 && req->flags & REQ_F_CQE32_INIT) {
+		extra1 = req->extra1;
+		extra2 = req->extra2;
+	}
+
 	trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
				req->cqe.res, req->cqe.flags,
-				(req->flags & REQ_F_CQE32_INIT) ? req->extra1 : 0,
-				(req->flags & REQ_F_CQE32_INIT) ? req->extra2 : 0);
+				extra1, extra2);
-	memcpy(cqe, &req->cqe, sizeof(*cqe));
-
-	if (ctx->flags & IORING_SETUP_CQE32) {
-		u64 extra1 = 0, extra2 = 0;
-
-		if (req->flags & REQ_F_CQE32_INIT) {
-			extra1 = req->extra1;
-			extra2 = req->extra2;
-		}
-
-		WRITE_ONCE(cqe->big_cqe[0], extra1);
-		WRITE_ONCE(cqe->big_cqe[1], extra2);
-	}
+	__io_fill_cqe(ctx, cqe, req->cqe.user_data, req->cqe.res,
+		      req->cqe.flags, extra1, extra2);
 	return true;
 }
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 182e594b56c6e..110edd1cb84e0 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -16,6 +16,7 @@
 #include "kbuf.h"
 #define IO_BUFFER_LIST_BUF_PER_PAGE (PAGE_SIZE / sizeof(struct io_uring_buf))
+#define IO_BUFFER_LIST_COMPAT_BUF_PER_PAGE (PAGE_SIZE / sizeof(struct compat_io_uring_buf))
#define BGID_ARRAY 64
@@ -28,6 +29,32 @@ struct io_provide_buf {
 	__u16 bid;
 };
+static int get_compat64_io_uring_buf_reg(struct io_uring_buf_reg *reg,
+					 const void __user *user_reg)
+{
+	struct compat_io_uring_buf_reg compat_reg;
+
+	if (copy_from_user(&compat_reg, user_reg, sizeof(compat_reg)))
+		return -EFAULT;
+	reg->ring_addr = compat_reg.ring_addr;
+	reg->ring_entries = compat_reg.ring_entries;
+	reg->bgid = compat_reg.bgid;
+	reg->pad = compat_reg.pad;
+	memcpy(reg->resv, compat_reg.resv, sizeof(reg->resv));
+	return 0;
+}
+
+static int copy_io_uring_buf_reg_from_user(struct io_ring_ctx *ctx,
+					   struct io_uring_buf_reg *reg,
+					   const void __user *arg)
+{
+	if (io_in_compat64(ctx))
+		return get_compat64_io_uring_buf_reg(reg, arg);
+	if (copy_from_user(reg, arg, sizeof(*reg)))
+		return -EFAULT;
+	return 0;
+}
+
 static inline struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx,
							unsigned int bgid)
 {
@@ -125,6 +152,35 @@ static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
 	return NULL;
 }
+static void __user *io_ring_buffer_select_compat64(struct io_kiocb *req, size_t *len,
+						   struct io_buffer_list *bl,
+						   unsigned int issue_flags)
+{
+	struct compat_io_uring_buf_ring *br = bl->buf_ring_compat;
+	struct compat_io_uring_buf *buf;
+	__u16 head = bl->head;
+
+	if (unlikely(smp_load_acquire(&br->tail) == head))
+		return NULL;
+
+	head &= bl->mask;
+	if (head < IO_BUFFER_LIST_COMPAT_BUF_PER_PAGE) {
+		buf = &br->bufs[head];
+	} else {
+		int off = head & (IO_BUFFER_LIST_COMPAT_BUF_PER_PAGE - 1);
+		int index = head / IO_BUFFER_LIST_COMPAT_BUF_PER_PAGE;
+		buf = page_address(bl->buf_pages[index]);
+		buf += off;
+	}
+	if (*len == 0 || *len > buf->len)
+		*len = buf->len;
+	req->flags |= REQ_F_BUFFER_RING;
+	req->buf_list = bl;
+	req->buf_index = buf->bid;
+
+	return compat_ptr(buf->addr);
+}
+
 static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
					  struct io_buffer_list *bl,
					  unsigned int issue_flags)
@@ -151,6 +207,23 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
 	req->buf_list = bl;
 	req->buf_index = buf->bid;
+	return u64_to_user_ptr(buf->addr);
+}
+
+static void __user *io_ring_buffer_select_any(struct io_kiocb *req, size_t *len,
+					      struct io_buffer_list *bl,
+					      unsigned int issue_flags)
+{
+	void __user *ret;
+
+	if (io_in_compat64(req->ctx))
+		ret = io_ring_buffer_select_compat64(req, len, bl, issue_flags);
+	else
+		ret = io_ring_buffer_select(req, len, bl, issue_flags);
+
+	if (!ret)
+		return ret;
+
 	if (issue_flags & IO_URING_F_UNLOCKED || !file_can_poll(req->file)) {
 		/*
 		 * If we came in unlocked, we have no choice but to consume the
@@ -165,7 +238,7 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
 		req->buf_list = NULL;
 		bl->head++;
 	}
-	return u64_to_user_ptr(buf->addr);
+	return ret;
 }
 void __user *io_buffer_select(struct io_kiocb *req, size_t *len,
@@ -180,7 +253,7 @@ void __user *io_buffer_select(struct io_kiocb *req, size_t *len,
 	bl = io_buffer_get_list(ctx, req->buf_index);
 	if (likely(bl)) {
 		if (bl->buf_nr_pages)
-			ret = io_ring_buffer_select(req, len, bl, issue_flags);
+			ret = io_ring_buffer_select_any(req, len, bl, issue_flags);
 		else
 			ret = io_provided_buffer_select(req, len, bl);
 	}
@@ -215,9 +288,12 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx,
 		return 0;
 	if (bl->buf_nr_pages) {
+		__u16 tail = io_in_compat64(ctx) ?
+				bl->buf_ring_compat->tail :
+				bl->buf_ring->tail;
 		int j;
-		i = bl->buf_ring->tail - bl->head;
+		i = tail - bl->head;
 		for (j = 0; j < bl->buf_nr_pages; j++)
 			unpin_user_page(bl->buf_pages[j]);
 		kvfree(bl->buf_pages);
@@ -469,13 +545,13 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
 int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
 {
-	struct io_uring_buf_ring *br;
 	struct io_uring_buf_reg reg;
 	struct io_buffer_list *bl, *free_bl = NULL;
 	struct page **pages;
+	size_t pages_size;
 	int nr_pages;
-	if (copy_from_user(&reg, arg, sizeof(reg)))
+	if (copy_io_uring_buf_reg_from_user(ctx, &reg, arg))
 		return -EFAULT;
 	if (reg.pad || reg.resv[0] || reg.resv[1] || reg.resv[2])
@@ -508,19 +584,19 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
 		return -ENOMEM;
 	}
-	pages = io_pin_pages(reg.ring_addr,
-			     size_mul(sizeof(struct io_uring_buf), reg.ring_entries),
-			     &nr_pages);
+	pages_size = io_in_compat64(ctx) ?
+			size_mul(sizeof(struct compat_io_uring_buf), reg.ring_entries) :
+			size_mul(sizeof(struct io_uring_buf), reg.ring_entries);
+	pages = io_pin_pages(reg.ring_addr, pages_size, &nr_pages);
 	if (IS_ERR(pages)) {
 		kfree(free_bl);
 		return PTR_ERR(pages);
 	}
-	br = page_address(pages[0]);
 	bl->buf_pages = pages;
 	bl->buf_nr_pages = nr_pages;
 	bl->nr_entries = reg.ring_entries;
-	bl->buf_ring = br;
+	bl->buf_ring = page_address(pages[0]);
 	bl->mask = reg.ring_entries - 1;
 	io_buffer_add_list(ctx, bl, reg.bgid);
 	return 0;
@@ -531,7 +607,7 @@ int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
 	struct io_uring_buf_reg reg;
 	struct io_buffer_list *bl;
-	if (copy_from_user(&reg, arg, sizeof(reg)))
+	if (copy_io_uring_buf_reg_from_user(ctx, &reg, arg))
 		return -EFAULT;
 	if (reg.pad || reg.resv[0] || reg.resv[1] || reg.resv[2])
 		return -EINVAL;
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index c23e15d7d3caf..1aa5bbbc5d628 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -2,6 +2,7 @@
 #ifndef IOU_KBUF_H
 #define IOU_KBUF_H
+#include <linux/io_uring_types.h>
 #include <uapi/linux/io_uring.h>
 struct io_buffer_list {
@@ -13,7 +14,10 @@ struct io_buffer_list {
 		struct list_head buf_list;
 		struct {
 			struct page **buf_pages;
-			struct io_uring_buf_ring *buf_ring;
+			union {
+				struct io_uring_buf_ring *buf_ring;
+				struct compat_io_uring_buf_ring *buf_ring_compat;
+			};
 		};
 	};
 	__u16 bgid;
diff --git a/io_uring/net.c b/io_uring/net.c
index c586278858e7e..4c133bc6f9d1d 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -4,6 +4,7 @@
 #include <linux/file.h>
 #include <linux/slab.h>
 #include <linux/net.h>
+#include <linux/uio.h>
 #include <linux/compat.h>
 #include <net/compat.h>
 #include <linux/io_uring.h>
@@ -435,7 +436,9 @@ static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
 	} else if (msg.msg_iovlen > 1) {
 		return -EINVAL;
 	} else {
-		if (copy_from_user(iomsg->fast_iov, msg.msg_iov, sizeof(*msg.msg_iov)))
+		void *iov = iovec_from_user(msg.msg_iov, 1, 1, iomsg->fast_iov,
+					    req->ctx->compat);
+		if (IS_ERR(iov))
 			return -EFAULT;
 		sr->len = iomsg->fast_iov[0].iov_len;
 		iomsg->free_iov = NULL;
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 41e192de9e8a7..8a2b5891f1030 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -23,6 +23,95 @@ struct io_rsrc_update {
 	u32 offset;
 };
+static int get_compat64_io_uring_rsrc_update(struct io_uring_rsrc_update2 *up2,
+					     const void __user *user_up)
+{
+	struct compat_io_uring_rsrc_update compat_up;
+
+	if (copy_from_user(&compat_up, user_up, sizeof(compat_up)))
+		return -EFAULT;
+	up2->offset = compat_up.offset;
+	up2->resv = compat_up.resv;
+	up2->data = compat_up.data;
+	return 0;
+}
+
+static int get_compat64_io_uring_rsrc_update2(struct io_uring_rsrc_update2 *up2,
+					      const void __user *user_up2)
+{
+	struct compat_io_uring_rsrc_update2 compat_up2;
+
+	if (copy_from_user(&compat_up2, user_up2, sizeof(compat_up2)))
+		return -EFAULT;
+	up2->offset = compat_up2.offset;
+	up2->resv = compat_up2.resv;
+	up2->data = compat_up2.data;
+	up2->tags = compat_up2.tags;
+	up2->nr = compat_up2.nr;
+	up2->resv2 = compat_up2.resv2;
+	return 0;
+}
+
+static int get_compat64_io_uring_rsrc_register(struct io_uring_rsrc_register *rr,
+					       const void __user *user_rr)
+{
+	struct compat_io_uring_rsrc_register compat_rr;
+
+	if (copy_from_user(&compat_rr, user_rr, sizeof(compat_rr)))
+		return -EFAULT;
+	rr->nr = compat_rr.nr;
+	rr->flags = compat_rr.flags;
+	rr->resv2 = compat_rr.resv2;
+	rr->data = compat_rr.data;
+	rr->tags = compat_rr.tags;
+	return 0;
+}
+
+static int copy_io_uring_rsrc_update_from_user(struct io_ring_ctx *ctx,
+					       struct io_uring_rsrc_update2 *up2,
+					       const void __user *arg)
+{
+	if (io_in_compat64(ctx))
+		return get_compat64_io_uring_rsrc_update(up2, arg);
+	if (copy_from_user(up2, arg, sizeof(struct io_uring_rsrc_update)))
+		return -EFAULT;
+	return 0;
+}
+
+static int copy_io_uring_rsrc_update2_from_user(struct io_ring_ctx *ctx,
+						struct io_uring_rsrc_update2 *up2,
+						const void __user *arg,
+						size_t size)
+{
+	if (io_in_compat64(ctx)) {
+		if (size != sizeof(struct compat_io_uring_rsrc_update2))
+			return -EINVAL;
+		return get_compat64_io_uring_rsrc_update2(up2, arg);
+	}
+	if (size != sizeof(*up2))
+		return -EINVAL;
+	if (copy_from_user(up2, arg, sizeof(*up2)))
+		return -EFAULT;
+	return 0;
+}
+
+static int copy_io_uring_rsrc_register_from_user(struct io_ring_ctx *ctx,
+						 struct io_uring_rsrc_register *rr,
+						 const void __user *arg,
+						 size_t size)
+{
+	if (io_in_compat64(ctx)) {
+		if (size != sizeof(struct compat_io_uring_rsrc_register))
+			return -EINVAL;
+		return get_compat64_io_uring_rsrc_register(rr, arg);
+	}
+	if (size != sizeof(*rr))
+		return -EINVAL;
+	if (copy_from_user(rr, arg, size))
+		return -EFAULT;
+	return 0;
+}
+
 static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
				  struct io_mapped_ubuf **pimu,
				  struct page **last_hpage);
@@ -601,7 +690,7 @@ int io_register_files_update(struct io_ring_ctx *ctx, void __user *arg,
 	if (!nr_args)
 		return -EINVAL;
 	memset(&up, 0, sizeof(up));
-	if (copy_from_user(&up, arg, sizeof(struct io_uring_rsrc_update)))
+	if (copy_io_uring_rsrc_update_from_user(ctx, &up, arg))
 		return -EFAULT;
 	if (up.resv || up.resv2)
 		return -EINVAL;
@@ -612,11 +701,11 @@ int io_register_rsrc_update(struct io_ring_ctx *ctx, void __user *arg,
			    unsigned size, unsigned type)
 {
 	struct io_uring_rsrc_update2 up;
+	int ret;
-	if (size != sizeof(up))
-		return -EINVAL;
-	if (copy_from_user(&up, arg, sizeof(up)))
-		return -EFAULT;
+	ret = copy_io_uring_rsrc_update2_from_user(ctx, &up, arg, size);
+	if (ret)
+		return ret;
 	if (!up.nr || up.resv || up.resv2)
 		return -EINVAL;
 	return __io_register_rsrc_update(ctx, type, &up, up.nr);
@@ -626,14 +715,11 @@ __cold int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg,
			    unsigned int size, unsigned int type)
 {
 	struct io_uring_rsrc_register rr;
+	int ret;
-	/* keep it extendible */
-	if (size != sizeof(rr))
-		return -EINVAL;
-
-	memset(&rr, 0, sizeof(rr));
-	if (copy_from_user(&rr, arg, size))
-		return -EFAULT;
+	ret = copy_io_uring_rsrc_register_from_user(ctx, &rr, arg, size);
+	if (ret)
+		return ret;
 	if (!rr.nr || rr.resv2)
 		return -EINVAL;
 	if (rr.flags & ~IORING_RSRC_REGISTER_SPARSE)
diff --git a/io_uring/tctx.c b/io_uring/tctx.c
index 96f77450cf4e2..20d045b0dd831 100644
--- a/io_uring/tctx.c
+++ b/io_uring/tctx.c
@@ -12,6 +12,30 @@
 #include "io_uring.h"
 #include "tctx.h"
+static int get_compat64_io_uring_rsrc_update(struct io_uring_rsrc_update *up, + const void __user *user_up) +{ + struct compat_io_uring_rsrc_update compat_up; + + if (copy_from_user(&compat_up, user_up, sizeof(compat_up))) + return -EFAULT; + up->offset = compat_up.offset; + up->resv = compat_up.resv; + up->data = compat_up.data; + return 0; +} + +static int copy_io_uring_rsrc_update_ringfd_from_user(struct io_ring_ctx *ctx, + struct io_uring_rsrc_update *up, + const void __user *arg) +{ + if (io_in_compat64(ctx)) + return get_compat64_io_uring_rsrc_update(up, arg); + if (copy_from_user(up, arg, sizeof(struct io_uring_rsrc_update))) + return -EFAULT; + return 0; +} + static struct io_wq *io_init_wq_offload(struct io_ring_ctx *ctx, struct task_struct *task) { @@ -233,6 +257,15 @@ static int io_ring_add_registered_fd(struct io_uring_task *tctx, int fd, return -EBUSY; }
+static void __user *get_ith_io_uring_rsrc_update(struct io_ring_ctx *ctx, + void __user *__arg, + int i) +{ + if (io_in_compat64(ctx)) + return &((struct compat_io_uring_rsrc_update __user *)__arg)[i]; + return &((struct io_uring_rsrc_update __user *)__arg)[i]; +} + /* * Register a ring fd to avoid fdget/fdput for each io_uring_enter() * invocation. User passes in an array of struct io_uring_rsrc_update @@ -244,8 +277,6 @@ static int io_ring_add_registered_fd(struct io_uring_task *tctx, int fd, int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg, unsigned nr_args) { - struct io_uring_rsrc_update __user *arg = __arg; - struct io_uring_rsrc_update reg; struct io_uring_task *tctx; int ret, i;
@@ -260,9 +291,14 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
tctx = current->io_uring; for (i = 0; i < nr_args; i++) { + void __user *arg; + __u32 __user *arg_offset; + struct io_uring_rsrc_update reg; int start, end;
- if (copy_from_user(®, &arg[i], sizeof(reg))) { + arg = get_ith_io_uring_rsrc_update(ctx, __arg, i); + + if (copy_io_uring_rsrc_update_ringfd_from_user(ctx, ®, arg)) { ret = -EFAULT; break; } @@ -289,7 +325,10 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg, break;
reg.offset = ret; - if (put_user(reg.offset, &arg[i].offset)) { + arg_offset = io_in_compat64(ctx) ? + &((struct compat_io_uring_rsrc_update __user *)arg)->offset : + &((struct io_uring_rsrc_update __user *)arg)->offset; + if (put_user(reg.offset, arg_offset)) { fput(tctx->registered_rings[reg.offset]); tctx->registered_rings[reg.offset] = NULL; ret = -EFAULT; @@ -303,9 +342,7 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg, int io_ringfd_unregister(struct io_ring_ctx *ctx, void __user *__arg, unsigned nr_args) { - struct io_uring_rsrc_update __user *arg = __arg; struct io_uring_task *tctx = current->io_uring; - struct io_uring_rsrc_update reg; int ret = 0, i;
if (!nr_args || nr_args > IO_RINGFD_REG_MAX) @@ -314,7 +351,12 @@ int io_ringfd_unregister(struct io_ring_ctx *ctx, void __user *__arg, return 0;
for (i = 0; i < nr_args; i++) { - if (copy_from_user(®, &arg[i], sizeof(reg))) { + void __user *arg; + struct io_uring_rsrc_update reg; + + arg = get_ith_io_uring_rsrc_update(ctx, __arg, i); + + if (copy_io_uring_rsrc_update_ringfd_from_user(ctx, ®, arg)) { ret = -EFAULT; break; } diff --git a/io_uring/uring_cmd.h b/io_uring/uring_cmd.h index 7c6697d13cb2e..96d8686db8342 100644 --- a/io_uring/uring_cmd.h +++ b/io_uring/uring_cmd.h @@ -11,3 +11,7 @@ int io_uring_cmd_prep_async(struct io_kiocb *req); #define uring_cmd_pdu_size(is_sqe128) \ ((1 + !!(is_sqe128)) * sizeof(struct io_uring_sqe) - \ offsetof(struct io_uring_sqe, cmd)) + +#define compat_uring_cmd_pdu_size(is_sqe128) \ + ((1 + !!(is_sqe128)) * sizeof(struct compat_io_uring_sqe) - \ + offsetof(struct compat_io_uring_sqe, cmd))
On 29/03/2023 17:11, Tudor Cretu wrote:
Introduce compat versions of the structs exposed in the uAPI headers that might contain pointers as a member. Also, implement functions that convert the compat versions to the native versions of the struct.
A subsequent patch is going to change the io_uring structs to enable them to support new architectures. On such architectures, the current struct layout still needs to be supported for compat tasks.
Signed-off-by: Tudor Cretu tudor.cretu@arm.com
include/linux/io_uring_compat.h | 129 ++++++++++++++++++ include/linux/io_uring_types.h | 11 +- io_uring/cancel.c | 28 +++- io_uring/epoll.c | 2 +- io_uring/fdinfo.c | 81 ++++++----- io_uring/io_uring.c | 229 ++++++++++++++++++++++---------- io_uring/io_uring.h | 108 ++++++++++++--- io_uring/kbuf.c | 98 ++++++++++++-- io_uring/kbuf.h | 6 +- io_uring/net.c | 5 +- io_uring/rsrc.c | 110 +++++++++++++-- io_uring/tctx.c | 56 +++++++- io_uring/uring_cmd.h | 4 + 13 files changed, 716 insertions(+), 151 deletions(-) create mode 100644 include/linux/io_uring_compat.h
diff --git a/include/linux/io_uring_compat.h b/include/linux/io_uring_compat.h new file mode 100644 index 0000000000000..3e91babe2e2ba --- /dev/null +++ b/include/linux/io_uring_compat.h @@ -0,0 +1,129 @@
I have no idea why include/linux/io_uring.h has no license header, but in any case we should include one. The most common in recently added files in include/linux seems to be:
/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef IO_URING_COMPAT_H +#define IO_URING_COMPAT_H
[...]
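[Editor's note: for readers unfamiliar with the pattern these conversion helpers follow, here is a userspace-compilable sketch of the same field-by-field widening copy. Names and layouts are simplified stand-ins, not the kernel structs:]

#include <stdint.h>
#include <stdio.h>

/* Legacy 64-bit layout, as a compat task would lay it out. */
struct compat_arg {
	uint64_t sigmask;
	uint32_t sigmask_sz;
	uint32_t pad;
	uint64_t ts;
};

/* Native layout; pointer fields may be wider (capabilities in PCuABI). */
struct native_arg {
	uintptr_t sigmask;
	uint32_t sigmask_sz;
	uint32_t pad;
	uintptr_t ts;
};

/* Mirrors the shape of get_compat64_io_uring_getevents_arg(): copy every
 * member explicitly so pointer-sized fields are widened one by one. */
static void convert_compat_arg(struct native_arg *dst,
			       const struct compat_arg *src)
{
	dst->sigmask = (uintptr_t)src->sigmask;
	dst->sigmask_sz = src->sigmask_sz;
	dst->pad = src->pad;
	dst->ts = (uintptr_t)src->ts;
}

int main(void)
{
	struct compat_arg c = { .sigmask = 0x1000, .sigmask_sz = 8 };
	struct native_arg n;

	convert_compat_arg(&n, &c);
	printf("sigmask=%#llx sz=%u\n",
	       (unsigned long long)n.sigmask, n.sigmask_sz);
	return 0;
}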
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c index bc8c9d764bc13..fd02317627ae7 100644 --- a/io_uring/fdinfo.c +++ b/io_uring/fdinfo.c @@ -48,6 +48,38 @@ static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id, return 0; } +#define print_sqe(m, sqe, sq_idx, sq_shift) \
+		do { \
+			seq_printf(m, "%5u: opcode:%s, fd:%d, flags:%x, off:%llu, " \
+				      "addr:0x%llx, rw_flags:0x%x, buf_index:%d " \
+				      "user_data:%llu", \
+				   sq_idx, io_uring_get_opcode((sqe)->opcode), (sqe)->fd, \

Nit: could reduce the indentation level by one (starting from one tab is enough).

Missing parentheses around sq_idx, also around cq_idx below.

+				   (sqe)->flags, (unsigned long long) (sqe)->off, \
+				   (unsigned long long) (sqe)->addr, (sqe)->rw_flags, \
+				   (sqe)->buf_index, (sqe)->user_data); \
+			if (sq_shift) { \
+				u64 *sqeb = (void *) ((sqe) + 1); \
+				int size = sizeof(*(sqe)) / sizeof(u64); \
+				int j; \
+				\
+				for (j = 0; j < size; j++) { \
+					seq_printf(m, ", e%d:0x%llx", j, \
+						(unsigned long long) *sqeb); \

Nit: that line should be indented further.

+					sqeb++; \
+				} \
+			} \
+		} while (0)
+
+#define print_cqe(m, cqe, cq_idx, cq_shift) \
+		do { \
+			seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x", \
+				   cq_idx, (cqe)->user_data, (cqe)->res, \
+				   (cqe)->flags); \
+			if (cq_shift) \
+				seq_printf(m, ", extra1:%llu, extra2:%llu\n", \
+					   (cqe)->big_cqe[0], (cqe)->big_cqe[1]); \
+		} while (0)
 static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 					  struct seq_file *m)
 {
@@ -88,45 +120,32 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 	sq_entries = min(sq_tail - sq_head, ctx->sq_entries);
 	for (i = 0; i < sq_entries; i++) {
 		unsigned int entry = i + sq_head;
-		struct io_uring_sqe *sqe;
-		unsigned int sq_idx;
+		unsigned int sq_idx, sq_off;
 
 		sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
 		if (sq_idx > sq_mask)
 			continue;
-		sqe = &ctx->sq_sqes[sq_idx << sq_shift];
-		seq_printf(m, "%5u: opcode:%s, fd:%d, flags:%x, off:%llu, "
-			      "addr:0x%llx, rw_flags:0x%x, buf_index:%d "
-			      "user_data:%llu",
-			   sq_idx, io_uring_get_opcode(sqe->opcode), sqe->fd,
-			   sqe->flags, (unsigned long long) sqe->off,
-			   (unsigned long long) sqe->addr, sqe->rw_flags,
-			   sqe->buf_index, sqe->user_data);
-		if (sq_shift) {
-			u64 *sqeb = (void *) (sqe + 1);
-			int size = sizeof(struct io_uring_sqe) / sizeof(u64);
-			int j;
-
-			for (j = 0; j < size; j++) {
-				seq_printf(m, ", e%d:0x%llx", j,
-						(unsigned long long) *sqeb);
-				sqeb++;
-			}
-		}
+		sq_off = sq_idx << sq_shift;
+		if (io_in_compat64(ctx))
+			print_sqe(m, &ctx->sq_sqes_compat[sq_off], sq_idx, sq_shift);
+		else
+			print_sqe(m, &ctx->sq_sqes[sq_off], sq_idx, sq_shift);
 		seq_printf(m, "\n");
 	}
 	seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
 	cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
 	for (i = 0; i < cq_entries; i++) {
 		unsigned int entry = i + cq_head;
-		struct io_uring_cqe *cqe = &ctx->cqes[(entry & cq_mask) << cq_shift];
-
-		seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x",
-			   entry & cq_mask, cqe->user_data, cqe->res,
-			   cqe->flags);
-		if (cq_shift)
-			seq_printf(m, ", extra1:%llu, extra2:%llu\n",
-					cqe->big_cqe[0], cqe->big_cqe[1]);
+		unsigned int cq_idx = entry & cq_mask;
+		unsigned int cq_off = cq_idx << cq_shift;
+
+		if (io_in_compat64(ctx))
+			print_cqe(m, &ctx->cqes_compat[cq_off], cq_idx, cq_shift);
+		else
+			print_cqe(m, &ctx->cqes[cq_off], cq_idx, cq_shift);
 		seq_printf(m, "\n");
 	}
@@ -192,12 +211,14 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, seq_printf(m, " user_data=%llu, res=%d, flags=%x\n", cqe->user_data, cqe->res, cqe->flags);
Looks like another leftover.
} spin_unlock(&ctx->completion_lock); } +#undef print_sqe +#undef print_cqe
I think it's OK to leave the macros defined in the rest of the file - they are defined outside of any function and they could in principle be used by another function.
__cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f) { struct io_ring_ctx *ctx = f->private_data; diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index fb6d07e1e7358..a355f2a2e7ac3 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -152,6 +152,37 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx); static struct kmem_cache *req_cachep; +static int get_compat64_io_uring_getevents_arg(struct io_uring_getevents_arg *arg,
+						  const void __user *user_arg)
+{
+	struct compat_io_uring_getevents_arg compat_arg;
+
+	if (copy_from_user(&compat_arg, user_arg, sizeof(compat_arg)))
+		return -EFAULT;
+	arg->sigmask = compat_arg.sigmask;
+	arg->sigmask_sz = compat_arg.sigmask_sz;
+	arg->pad = compat_arg.pad;
+	arg->ts = compat_arg.ts;
+	return 0;
+}
+
+static int copy_io_uring_getevents_arg_from_user(struct io_ring_ctx *ctx,
+						 struct io_uring_getevents_arg *arg,
+						 const void __user *argp,
+						 size_t size)
+{
+	if (io_in_compat64(ctx)) {
+		if (size != sizeof(struct compat_io_uring_getevents_arg))
+			return -EINVAL;
+		return get_compat64_io_uring_getevents_arg(arg, argp);
+	}
+	if (size != sizeof(*arg))
+		return -EINVAL;
+	if (copy_from_user(arg, argp, sizeof(*arg)))
+		return -EFAULT;
+	return 0;
+}
struct sock *io_uring_get_socket(struct file *file) { #if defined(CONFIG_UNIX) @@ -604,14 +635,10 @@ void io_cq_unlock_post(struct io_ring_ctx *ctx) static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force) { bool all_flushed;
-	size_t cqe_size = sizeof(struct io_uring_cqe);
 
 	if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
 		return false;
 
-	if (ctx->flags & IORING_SETUP_CQE32)
-		cqe_size <<= 1;
-
 	io_cq_lock(ctx);
 	while (!list_empty(&ctx->cq_overflow_list)) {
 		struct io_uring_cqe *cqe = io_get_cqe_overflow(ctx, true);
@@ -621,9 +648,18 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force) break; ocqe = list_first_entry(&ctx->cq_overflow_list, struct io_overflow_cqe, list);
-		if (cqe)
-			memcpy(cqe, &ocqe->cqe, cqe_size);
-		else
+		if (cqe) {
+			u64 extra1 = 0;
+			u64 extra2 = 0;
+
+			if (ctx->flags & IORING_SETUP_CQE32) {
+				extra1 = ocqe->cqe.big_cqe[0];
+				extra2 = ocqe->cqe.big_cqe[1];
+			}
+
+			__io_fill_cqe(ctx, cqe, ocqe->cqe.user_data, ocqe->cqe.res,
+				      ocqe->cqe.flags, extra1, extra2);
+		} else
 			io_account_cq_overflow(ctx);
Nit: if the "if" branch uses braces, so should the "else" branch too (see the coding style). This is not always respected in practice, but I think it is more readable (and less error-prone).
list_del(&ocqe->list); @@ -735,6 +771,15 @@ bool io_req_cqe_overflow(struct io_kiocb *req) req->cqe.res, req->cqe.flags, req->extra1, req->extra2); } +/*
+ * Retrieves the pointer of the ith CQE

Nit: "a pointer to..."

+ */
+struct io_uring_cqe *__io_get_ith_cqe(struct io_ring_ctx *ctx, unsigned int i)
Very minor: would make a bit more sense to add it *after* __io_get_cqe() (in the header too), since the latter is the more usual interface.
+{
+	return io_in_compat64(ctx) ?
+	       (struct io_uring_cqe *)&ctx->cqes_compat[i] :
+	       &ctx->cqes[i];
+}
 
 /*
  * writes to the cq entry need to come after reading head; the
@@ -774,7 +819,7 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow) ctx->cqe_cached++; if (ctx->flags & IORING_SETUP_CQE32) ctx->cqe_cached++;
- return &ctx->cqes[off];
+	return __io_get_ith_cqe(ctx, off);
} bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, @@ -793,14 +838,7 @@ bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags if (likely(cqe)) { trace_io_uring_complete(ctx, NULL, user_data, res, cflags, 0, 0);
-		WRITE_ONCE(cqe->user_data, user_data);
-		WRITE_ONCE(cqe->res, res);
-		WRITE_ONCE(cqe->flags, cflags);
-
-		if (ctx->flags & IORING_SETUP_CQE32) {
-			WRITE_ONCE(cqe->big_cqe[0], 0);
-			WRITE_ONCE(cqe->big_cqe[1], 0);
-		}
+		__io_fill_cqe(ctx, cqe, user_data, res, cflags, 0, 0);
 		return true;
 	}
@@ -2240,7 +2278,9 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx) /* double index for 128-byte SQEs, twice as long */ if (ctx->flags & IORING_SETUP_SQE128) head <<= 1;
-	return &ctx->sq_sqes[head];
+	return io_in_compat64(ctx) ?
+	       (struct io_uring_sqe *)&ctx->sq_sqes_compat[head] :
+	       &ctx->sq_sqes[head];
 }
/* drop invalid entries */ @@ -2267,6 +2307,7 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr) do { const struct io_uring_sqe *sqe; struct io_kiocb *req;
+		struct io_uring_sqe native_sqe[2];
if (unlikely(!io_alloc_req_refill(ctx))) break; @@ -2276,6 +2317,11 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr) io_req_add_to_cache(req, ctx); break; }
+		if (io_in_compat64(ctx)) {
+			convert_compat64_io_uring_sqe(ctx, native_sqe,
+						      (struct compat_io_uring_sqe *)sqe);
+			sqe = native_sqe;
+		}
/* * Continue submitting even for sqe failure if the @@ -2480,6 +2526,9 @@ static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries { struct io_rings *rings; size_t off, cq_array_size, sq_array_size;
+	size_t cqe_size = io_in_compat64(ctx) ?
+			  sizeof(struct compat_io_uring_cqe) :
+			  sizeof(struct io_uring_cqe);
off = sizeof(*rings); @@ -2492,7 +2541,7 @@ static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries if (cq_offset) *cq_offset = off;
- cq_array_size = array_size(sizeof(struct io_uring_cqe), cq_entries);
+	cq_array_size = array_size(cqe_size, cq_entries);
 	if (cq_array_size == SIZE_MAX)
 		return SIZE_MAX;
@@ -3120,20 +3169,19 @@ static unsigned long io_uring_nommu_get_unmapped_area(struct file *file, #endif /* !CONFIG_MMU */ -static int io_validate_ext_arg(unsigned flags, const void __user *argp, size_t argsz) +static int io_validate_ext_arg(struct io_ring_ctx *ctx, unsigned int flags,
+				const void __user *argp, size_t argsz)
 {
 	if (flags & IORING_ENTER_EXT_ARG) {
 		struct io_uring_getevents_arg arg;
 
-		if (argsz != sizeof(arg))
-			return -EINVAL;
-		if (copy_from_user(&arg, argp, sizeof(arg)))
-			return -EFAULT;
+		return copy_io_uring_getevents_arg_from_user(ctx, &arg, argp, argsz);
 	}
 	return 0;
 }
 
-static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz,
+static int io_get_ext_arg(struct io_ring_ctx *ctx, unsigned int flags,
+			const void __user *argp, size_t *argsz,
Nit: should be aligned on the opening parenthesis.
Kevin
On 04-04-2023 08:05, Kevin Brodsky wrote:
> On 29/03/2023 17:11, Tudor Cretu wrote:
>> [...]
>
> I have no idea why include/linux/io_uring.h has no license header, but
> in any case we should include one. The most common in recently added
> files in include/linux seems to be:
>
> /* SPDX-License-Identifier: GPL-2.0-only */

Thank you!

> [...]
>
> Nit: could reduce the indentation level by one (starting from one tab
> is enough).

Done!

> [...]
>
> Missing parentheses around sq_idx, also around cq_idx below.

Done!

> [...]
>
> Nit: that line should be indented further.

Done!

> [...]
>
> Looks like another leftover.

Done!

> I think it's OK to leave the macros defined in the rest of the file -
> they are defined outside of any function and they could in principle
> be used by another function.

Done!

> [...]
>
> Nit: if the "if" branch uses braces, so should the "else" branch too
> (see the coding style). This is not always respected in practice, but
> I think it is more readable (and less error-prone).

Thanks for pointing it out! I wasn't aware of it. Done!

> [...]
>
> Nit: "a pointer to..."

Done!

> Very minor: would make a bit more sense to add it *after*
> __io_get_cqe() (in the header too), since the latter is the more
> usual interface.

Done!

> [...]
>
> Nit: should be aligned on the opening parenthesis.

Done!

> Kevin
Many thanks, Tudor
The io_uring shared memory region hosts the io_uring_sqe and io_uring_cqe arrays. These structs may contain user pointers, so the memory region must be allowed to store and load capability pointers.
Signed-off-by: Tudor Cretu tudor.cretu@arm.com --- io_uring/io_uring.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index a355f2a2e7ac3..a98f0013e47c1 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -3138,6 +3138,11 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma) if (IS_ERR(ptr)) return PTR_ERR(ptr);
+#ifdef CONFIG_CHERI_PURECAP_UABI + vma->vm_flags |= VM_READ_CAPS | VM_WRITE_CAPS; + vma_set_page_prot(vma); +#endif + pfn = virt_to_phys(ptr) >> PAGE_SHIFT; return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot); }
Some members of the io_uring uAPI structs may contain user pointers. In the PCuABI, a user pointer is a 129-bit capability, so the __u64 type is not big enough to hold it. Use the __kernel_uintptr_t type instead, which is big enough on the affected architectures while remaining 64-bit on others.
The user_data field must be passed unchanged from the submission queue to the completion queue. As it is standard practice to store a pointer in user_data, expand the field to __kernel_uintptr_t. However, the kernel doesn't dereference the user_data, so don't convert it in the compat case.
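[Editor's note: to make the idiom concrete, here is a minimal liburing sketch of a pointer round-tripping through user_data. Illustrative only — it assumes liburing is installed and omits error handling:]

#include <liburing.h>
#include <stdio.h>

struct request_state { int id; };

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct request_state st = { .id = 42 };

	io_uring_queue_init(8, &ring, 0);

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_nop(sqe);
	io_uring_sqe_set_data(sqe, &st);  /* pointer stored in user_data */

	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);

	/* The same value comes back untouched in the CQE. */
	struct request_state *done = io_uring_cqe_get_data(cqe);
	printf("completed request %d\n", done->id);

	io_uring_cqe_seen(&ring, cqe);
	io_uring_queue_exit(&ring);
	return 0;
}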
In addition, for the io_uring structs containing user pointers, use the special copy routines when copying user pointers from/to userspace.
To ensure that a full capability comparison is performed in the cases where two user data values are compared, a new helper is introduced.
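[Editor's note: as background, under CHERI the == operator on two capabilities compares only their addresses, so a plain comparison could match two user_data values whose addresses are equal but whose tag, bounds or permissions differ. A hedged sketch of the check, not part of the patch; the fallback branch lets it build and run on non-CHERI toolchains:]

#include <stdbool.h>
#include <stdint.h>

static bool user_data_is_same(uintptr_t d1, uintptr_t d2)
{
#ifdef __CHERI__
	/* Full capability compare: tag, bounds, permissions and address. */
	return __builtin_cheri_equal_exact(d1, d2);
#else
	/* Plain integer compare on non-CHERI targets. */
	return d1 == d2;
#endif
}

int main(void)
{
	uintptr_t a = (uintptr_t)&a;

	return user_data_is_same(a, a) ? 0 : 1;
}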
In the case of operation IORING_OP_POLL_REMOVE, if IORING_POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the addr2 field, instead of the off field. This is required because the off field is not large enough to fit a user_data value.
Note that the structs io_uring_sqe and io_uring_cqe double in size in PCuABI. The setup flags IORING_SETUP_SQE128 and IORING_SETUP_CQE32 double the sizes of the two structs, bringing them to 128 bytes and 32 bytes respectively on other architectures. In PCuABI, the two flags still double the struct sizes, but, as the base structs are larger, the doubled sizes are 256 bytes and 64 bytes, respectively.
Signed-off-by: Tudor Cretu tudor.cretu@arm.com --- include/linux/io_uring_types.h | 4 +- include/uapi/linux/io_uring.h | 76 ++++++++++++++++++---------------- io_uring/advise.c | 7 ++-- io_uring/cancel.c | 8 ++-- io_uring/cancel.h | 2 +- io_uring/epoll.c | 2 +- io_uring/fdinfo.c | 9 ++-- io_uring/fs.c | 16 +++---- io_uring/io_uring.c | 62 +++++++++++++++++++++++---- io_uring/io_uring.h | 51 ++++++++++++++++++----- io_uring/kbuf.c | 19 +++++---- io_uring/kbuf.h | 2 +- io_uring/msg_ring.c | 4 +- io_uring/net.c | 20 ++++----- io_uring/openclose.c | 4 +- io_uring/poll.c | 8 ++-- io_uring/rsrc.c | 44 ++++++++++---------- io_uring/rw.c | 18 ++++---- io_uring/statx.c | 4 +- io_uring/timeout.c | 14 +++---- io_uring/uring_cmd.c | 5 +++ io_uring/xattr.c | 12 +++--- 22 files changed, 240 insertions(+), 151 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 9506a8858f0ff..1f33293426c94 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -480,8 +480,8 @@ struct io_task_work { };
struct io_cqe { - __u64 user_data; - __s32 res; + __kernel_uintptr_t user_data; + __s32 res; /* fd initially, then cflags for completion */ union { __u32 flags; diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 2df3225b562fa..121c9aef5ad00 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -11,6 +11,11 @@ #include <linux/fs.h> #include <linux/types.h> #include <linux/time_types.h> +#ifdef __KERNEL__ +#include <linux/stddef.h> /* for offsetof */ +#else +#include <stddef.h> /* for offsetof */ +#endif
#ifdef __cplusplus extern "C" { @@ -25,16 +30,16 @@ struct io_uring_sqe { __u16 ioprio; /* ioprio for the request */ __s32 fd; /* file descriptor to do IO on */ union { - __u64 off; /* offset into file */ - __u64 addr2; + __u64 off; /* offset into file */ + __kernel_uintptr_t addr2; struct { __u32 cmd_op; __u32 __pad1; }; }; union { - __u64 addr; /* pointer to buffer or iovecs */ - __u64 splice_off_in; + __kernel_uintptr_t addr; /* pointer to buffer or iovecs */ + __u64 splice_off_in; }; __u32 len; /* buffer size or number of iovecs */ union { @@ -58,7 +63,7 @@ struct io_uring_sqe { __u32 msg_ring_flags; __u32 uring_cmd_flags; }; - __u64 user_data; /* data to be passed back at completion time */ + __kernel_uintptr_t user_data; /* data to be passed back at completion time */ /* pack this to avoid bogus arm OABI complaints */ union { /* index into fixed buffers, if used */ @@ -78,12 +83,14 @@ struct io_uring_sqe { }; union { struct { - __u64 addr3; - __u64 __pad2[1]; + __kernel_uintptr_t addr3; + __kernel_uintptr_t __pad2[1]; }; /* * If the ring is initialized with IORING_SETUP_SQE128, then - * this field is used for 80 bytes of arbitrary command data + * this field is used to double the size of the + * struct io_uring_sqe to store bytes of arbitrary + * command data, i.e. 80 bytes or 160 bytes in PCuABI */ __u8 cmd[0]; }; @@ -326,13 +333,14 @@ enum { * IO completion data structure (Completion Queue Entry) */ struct io_uring_cqe { - __u64 user_data; /* sqe->data submission passed back */ - __s32 res; /* result code for this event */ - __u32 flags; + __kernel_uintptr_t user_data; /* sqe->data submission passed back */ + __s32 res; /* result code for this event */ + __u32 flags;
/* * If the ring is initialized with IORING_SETUP_CQE32, then this field - * contains 16-bytes of padding, doubling the size of the CQE. + * doubles the size of the CQE, i.e. contains 16 bytes, or in PCuABI, + * 32 bytes of padding. */ __u64 big_cqe[]; }; @@ -504,7 +512,7 @@ enum { struct io_uring_files_update { __u32 offset; __u32 resv; - __aligned_u64 /* __s32 * */ fds; + __kernel_aligned_uintptr_t /* __s32 * */ fds; };
/* @@ -517,21 +525,21 @@ struct io_uring_rsrc_register { __u32 nr; __u32 flags; __u64 resv2; - __aligned_u64 data; - __aligned_u64 tags; + __kernel_aligned_uintptr_t data; + __kernel_aligned_uintptr_t tags; };
struct io_uring_rsrc_update { __u32 offset; __u32 resv; - __aligned_u64 data; + __kernel_aligned_uintptr_t data; };
struct io_uring_rsrc_update2 { __u32 offset; __u32 resv; - __aligned_u64 data; - __aligned_u64 tags; + __kernel_aligned_uintptr_t data; + __kernel_aligned_uintptr_t tags; __u32 nr; __u32 resv2; }; @@ -581,10 +589,10 @@ struct io_uring_restriction { };
struct io_uring_buf { - __u64 addr; - __u32 len; - __u16 bid; - __u16 resv; + __kernel_uintptr_t addr; + __u32 len; + __u16 bid; + __u16 resv; };
struct io_uring_buf_ring { @@ -594,9 +602,7 @@ struct io_uring_buf_ring { * ring tail is overlaid with the io_uring_buf->resv field. */ struct { - __u64 resv1; - __u32 resv2; - __u16 resv3; + __u8 resv[offsetof(struct io_uring_buf, resv)]; __u16 tail; }; struct io_uring_buf bufs[0]; @@ -605,11 +611,11 @@ struct io_uring_buf_ring {
/* argument for IORING_(UN)REGISTER_PBUF_RING */ struct io_uring_buf_reg { - __u64 ring_addr; - __u32 ring_entries; - __u16 bgid; - __u16 pad; - __u64 resv[3]; + __kernel_uintptr_t ring_addr; + __u32 ring_entries; + __u16 bgid; + __u16 pad; + __u64 resv[3]; };
/* @@ -632,17 +638,17 @@ enum { };
struct io_uring_getevents_arg { - __u64 sigmask; - __u32 sigmask_sz; - __u32 pad; - __u64 ts; + __kernel_uintptr_t sigmask; + __u32 sigmask_sz; + __u32 pad; + __kernel_uintptr_t ts; };
/* * Argument for IORING_REGISTER_SYNC_CANCEL */ struct io_uring_sync_cancel_reg { - __u64 addr; + __kernel_uintptr_t addr; __s32 fd; __u32 flags; struct __kernel_timespec timeout; diff --git a/io_uring/advise.c b/io_uring/advise.c index 449c6f14649f7..05fd3bbaf8090 100644 --- a/io_uring/advise.c +++ b/io_uring/advise.c @@ -23,7 +23,7 @@ struct io_fadvise {
struct io_madvise { struct file *file; - u64 addr; + void __user *addr; u32 len; u32 advice; }; @@ -36,7 +36,7 @@ int io_madvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (sqe->buf_index || sqe->off || sqe->splice_fd_in) return -EINVAL;
- ma->addr = READ_ONCE(sqe->addr); + ma->addr = (void __user *)READ_ONCE(sqe->addr); ma->len = READ_ONCE(sqe->len); ma->advice = READ_ONCE(sqe->fadvise_advice); return 0; @@ -54,7 +54,8 @@ int io_madvise(struct io_kiocb *req, unsigned int issue_flags) if (issue_flags & IO_URING_F_NONBLOCK) return -EAGAIN;
- ret = do_madvise(current->mm, ma->addr, ma->len, ma->advice); + /* TODO [PCuABI] - capability checks for uaccess */ + ret = do_madvise(current->mm, user_ptr_addr(ma->addr), ma->len, ma->advice); io_req_set_res(req, ret, 0); return IOU_OK; #else diff --git a/io_uring/cancel.c b/io_uring/cancel.c index 0f942da7455b5..fca32501dd838 100644 --- a/io_uring/cancel.c +++ b/io_uring/cancel.c @@ -19,7 +19,7 @@
struct io_cancel { struct file *file; - u64 addr; + __kernel_uintptr_t addr; u32 flags; s32 fd; }; @@ -34,7 +34,7 @@ static int get_compat64_io_uring_sync_cancel_reg(struct io_uring_sync_cancel_reg
if (copy_from_user(&compat_sc, user_sc, sizeof(compat_sc))) return -EFAULT; - sc->addr = compat_sc.addr; + sc->addr = (__kernel_uintptr_t)compat_sc.addr; sc->fd = compat_sc.fd; sc->flags = compat_sc.flags; sc->timeout = compat_sc.timeout; @@ -48,7 +48,7 @@ static int copy_io_uring_sync_cancel_reg_from_user(struct io_ring_ctx *ctx, { if (io_in_compat64(ctx)) return get_compat64_io_uring_sync_cancel_reg(sc, arg); - if (copy_from_user(sc, arg, sizeof(*sc))) + if (copy_from_user_with_ptr(sc, arg, sizeof(*sc))) return -EFAULT; return 0; } @@ -66,7 +66,7 @@ static bool io_cancel_cb(struct io_wq_work *work, void *data) if (req->file != cd->file) return false; } else { - if (req->cqe.user_data != cd->data) + if (!io_user_data_is_same(req->cqe.user_data, cd->data)) return false; } if (cd->flags & (IORING_ASYNC_CANCEL_ALL|IORING_ASYNC_CANCEL_ANY)) { diff --git a/io_uring/cancel.h b/io_uring/cancel.h index 6a59ee484d0cc..7c1249d61bf25 100644 --- a/io_uring/cancel.h +++ b/io_uring/cancel.h @@ -5,7 +5,7 @@ struct io_cancel_data { struct io_ring_ctx *ctx; union { - u64 data; + __kernel_uintptr_t data; struct file *file; }; u32 flags; diff --git a/io_uring/epoll.c b/io_uring/epoll.c index d5580ff465c3e..d9d5983f823c2 100644 --- a/io_uring/epoll.c +++ b/io_uring/epoll.c @@ -39,7 +39,7 @@ int io_epoll_ctl_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (ep_op_has_event(epoll->op)) { struct epoll_event __user *ev;
- ev = u64_to_user_ptr(READ_ONCE(sqe->addr)); + ev = (struct epoll_event __user *)READ_ONCE(sqe->addr); if (copy_epoll_event_from_user(&epoll->event, ev, req->ctx->compat)) return -EFAULT; } diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c index fd02317627ae7..ba15996382bef 100644 --- a/io_uring/fdinfo.c +++ b/io_uring/fdinfo.c @@ -56,7 +56,7 @@ static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id, sq_idx, io_uring_get_opcode((sqe)->opcode), (sqe)->fd, \ (sqe)->flags, (unsigned long long) (sqe)->off, \ (unsigned long long) (sqe)->addr, (sqe)->rw_flags, \ - (sqe)->buf_index, (sqe)->user_data); \ + (sqe)->buf_index, (unsigned long long)(sqe)->user_data); \ if (sq_shift) { \ u64 *sqeb = (void *) ((sqe) + 1); \ int size = sizeof(*(sqe)) / sizeof(u64); \ @@ -73,8 +73,8 @@ static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id, #define print_cqe(m, cqe, cq_idx, cq_shift) \ do { \ seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x", \ - cq_idx, (cqe)->user_data, (cqe)->res, \ - (cqe)->flags); \ + cq_idx, (unsigned long long) (cqe)->user_data, \ + (cqe)->res, (cqe)->flags); \ if (cq_shift) \ seq_printf(m, ", extra1:%llu, extra2:%llu\n", \ (cqe)->big_cqe[0], (cqe)->big_cqe[1]); \ @@ -210,7 +210,8 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct io_uring_cqe *cqe = &ocqe->cqe;
seq_printf(m, " user_data=%llu, res=%d, flags=%x\n", - cqe->user_data, cqe->res, cqe->flags); + (unsigned long long) cqe->user_data, cqe->res, + cqe->flags); }
spin_unlock(&ctx->completion_lock); diff --git a/io_uring/fs.c b/io_uring/fs.c index 7100c293c13a8..2e01e7da1d4ba 100644 --- a/io_uring/fs.c +++ b/io_uring/fs.c @@ -58,8 +58,8 @@ int io_renameat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) return -EBADF;
ren->old_dfd = READ_ONCE(sqe->fd); - oldf = u64_to_user_ptr(READ_ONCE(sqe->addr)); - newf = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + oldf = (char __user *)READ_ONCE(sqe->addr); + newf = (char __user *)READ_ONCE(sqe->addr2); ren->new_dfd = READ_ONCE(sqe->len); ren->flags = READ_ONCE(sqe->rename_flags);
@@ -117,7 +117,7 @@ int io_unlinkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (un->flags & ~AT_REMOVEDIR) return -EINVAL;
- fname = u64_to_user_ptr(READ_ONCE(sqe->addr)); + fname = (char __user *)READ_ONCE(sqe->addr); un->filename = getname(fname); if (IS_ERR(un->filename)) return PTR_ERR(un->filename); @@ -164,7 +164,7 @@ int io_mkdirat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) mkd->dfd = READ_ONCE(sqe->fd); mkd->mode = READ_ONCE(sqe->len);
- fname = u64_to_user_ptr(READ_ONCE(sqe->addr)); + fname = (char __user *)READ_ONCE(sqe->addr); mkd->filename = getname(fname); if (IS_ERR(mkd->filename)) return PTR_ERR(mkd->filename); @@ -206,8 +206,8 @@ int io_symlinkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) return -EBADF;
sl->new_dfd = READ_ONCE(sqe->fd); - oldpath = u64_to_user_ptr(READ_ONCE(sqe->addr)); - newpath = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + oldpath = (char __user *)READ_ONCE(sqe->addr); + newpath = (char __user *)READ_ONCE(sqe->addr2);
sl->oldpath = getname(oldpath); if (IS_ERR(sl->oldpath)) @@ -250,8 +250,8 @@ int io_linkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
lnk->old_dfd = READ_ONCE(sqe->fd); lnk->new_dfd = READ_ONCE(sqe->len); - oldf = u64_to_user_ptr(READ_ONCE(sqe->addr)); - newf = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + oldf = (char __user *)READ_ONCE(sqe->addr); + newf = (char __user *)READ_ONCE(sqe->addr2); lnk->flags = READ_ONCE(sqe->hardlink_flags);
lnk->oldpath = getname(oldf); diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index a98f0013e47c1..ce2b87bdaddf7 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -159,10 +159,10 @@ static int get_compat64_io_uring_getevents_arg(struct io_uring_getevents_arg *ar
if (copy_from_user(&compat_arg, user_arg, sizeof(compat_arg))) return -EFAULT; - arg->sigmask = compat_arg.sigmask; + arg->sigmask = (__kernel_uintptr_t)compat_ptr(compat_arg.sigmask); arg->sigmask_sz = compat_arg.sigmask_sz; arg->pad = compat_arg.pad; - arg->ts = compat_arg.ts; + arg->ts = (__kernel_uintptr_t)compat_ptr(compat_arg.ts); return 0; }
@@ -178,7 +178,7 @@ static int copy_io_uring_getevents_arg_from_user(struct io_ring_ctx *ctx, } if (size != sizeof(*arg)) return -EINVAL; - if (copy_from_user(arg, argp, sizeof(*arg))) + if (copy_from_user_with_ptr(arg, argp, sizeof(*arg))) return -EFAULT; return 0; } @@ -723,7 +723,7 @@ static __cold void io_uring_drop_tctx_refs(struct task_struct *task) } }
-static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data, +static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, __kernel_uintptr_t user_data, s32 res, u32 cflags, u64 extra1, u64 extra2) { struct io_overflow_cqe *ocqe; @@ -822,8 +822,8 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow) return __io_get_ith_cqe(ctx, off); }
-bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, - bool allow_overflow) +bool io_fill_cqe_aux(struct io_ring_ctx *ctx, __kernel_uintptr_t user_data, + s32 res, u32 cflags, bool allow_overflow) { struct io_uring_cqe *cqe;
@@ -849,7 +849,7 @@ bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags }
bool io_post_aux_cqe(struct io_ring_ctx *ctx, - u64 user_data, s32 res, u32 cflags, + __kernel_uintptr_t user_data, s32 res, u32 cflags, bool allow_overflow) { bool filled; @@ -3217,9 +3217,9 @@ static int io_get_ext_arg(struct io_ring_ctx *ctx, unsigned int flags, return ret; if (arg.pad) return -EINVAL; - *sig = u64_to_user_ptr(arg.sigmask); + *sig = (sigset_t __user *)arg.sigmask; *argsz = arg.sigmask_sz; - *ts = u64_to_user_ptr(arg.ts); + *ts = (struct __kernel_timespec __user *)arg.ts; return 0; }
@@ -4162,6 +4162,49 @@ static int __init io_uring_init(void) __BUILD_BUG_VERIFY_OFFSET_SIZE(struct io_uring_sqe, eoffset, sizeof(etype), ename) #define BUILD_BUG_SQE_ELEM_SIZE(eoffset, esize, ename) \ __BUILD_BUG_VERIFY_OFFSET_SIZE(struct io_uring_sqe, eoffset, esize, ename) +#ifdef CONFIG_CHERI_PURECAP_UABI + BUILD_BUG_ON(sizeof(struct io_uring_sqe) != 128); + BUILD_BUG_SQE_ELEM(0, __u8, opcode); + BUILD_BUG_SQE_ELEM(1, __u8, flags); + BUILD_BUG_SQE_ELEM(2, __u16, ioprio); + BUILD_BUG_SQE_ELEM(4, __s32, fd); + BUILD_BUG_SQE_ELEM(16, __u64, off); + BUILD_BUG_SQE_ELEM(16, __uintcap_t, addr2); + BUILD_BUG_SQE_ELEM(16, __u32, cmd_op); + BUILD_BUG_SQE_ELEM(20, __u32, __pad1); + BUILD_BUG_SQE_ELEM(32, __uintcap_t, addr); + BUILD_BUG_SQE_ELEM(32, __u64, splice_off_in); + BUILD_BUG_SQE_ELEM(48, __u32, len); + BUILD_BUG_SQE_ELEM(52, __kernel_rwf_t, rw_flags); + BUILD_BUG_SQE_ELEM(52, __u32, fsync_flags); + BUILD_BUG_SQE_ELEM(52, __u16, poll_events); + BUILD_BUG_SQE_ELEM(52, __u32, poll32_events); + BUILD_BUG_SQE_ELEM(52, __u32, sync_range_flags); + BUILD_BUG_SQE_ELEM(52, __u32, msg_flags); + BUILD_BUG_SQE_ELEM(52, __u32, timeout_flags); + BUILD_BUG_SQE_ELEM(52, __u32, accept_flags); + BUILD_BUG_SQE_ELEM(52, __u32, cancel_flags); + BUILD_BUG_SQE_ELEM(52, __u32, open_flags); + BUILD_BUG_SQE_ELEM(52, __u32, statx_flags); + BUILD_BUG_SQE_ELEM(52, __u32, fadvise_advice); + BUILD_BUG_SQE_ELEM(52, __u32, splice_flags); + BUILD_BUG_SQE_ELEM(52, __u32, rename_flags); + BUILD_BUG_SQE_ELEM(52, __u32, unlink_flags); + BUILD_BUG_SQE_ELEM(52, __u32, hardlink_flags); + BUILD_BUG_SQE_ELEM(52, __u32, xattr_flags); + BUILD_BUG_SQE_ELEM(52, __u32, msg_ring_flags); + BUILD_BUG_SQE_ELEM(64, __uintcap_t, user_data); + BUILD_BUG_SQE_ELEM(80, __u16, buf_index); + BUILD_BUG_SQE_ELEM(80, __u16, buf_group); + BUILD_BUG_SQE_ELEM(82, __u16, personality); + BUILD_BUG_SQE_ELEM(84, __s32, splice_fd_in); + BUILD_BUG_SQE_ELEM(84, __u32, file_index); + BUILD_BUG_SQE_ELEM(84, __u16, addr_len); + BUILD_BUG_SQE_ELEM(86, __u16, __pad3[0]); + BUILD_BUG_SQE_ELEM(96, __uintcap_t, addr3); + BUILD_BUG_SQE_ELEM_SIZE(96, 0, cmd); + BUILD_BUG_SQE_ELEM(112, __uintcap_t, __pad2); +#else /* !CONFIG_CHERI_PURECAP_UABI */ BUILD_BUG_ON(sizeof(struct io_uring_sqe) != 64); BUILD_BUG_SQE_ELEM(0, __u8, opcode); BUILD_BUG_SQE_ELEM(1, __u8, flags); @@ -4205,6 +4248,7 @@ static int __init io_uring_init(void) BUILD_BUG_SQE_ELEM(48, __u64, addr3); BUILD_BUG_SQE_ELEM_SIZE(48, 0, cmd); BUILD_BUG_SQE_ELEM(56, __u64, __pad2); +#endif /* !CONFIG_CHERI_PURECAP_UABI */
BUILD_BUG_ON(sizeof(struct io_uring_files_update) != sizeof(struct io_uring_rsrc_update)); diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 1c97583fa281a..31b80f8ff5935 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -36,10 +36,10 @@ void io_req_complete_failed(struct io_kiocb *req, s32 res); void __io_req_complete(struct io_kiocb *req, unsigned issue_flags); void io_req_complete_post(struct io_kiocb *req); void __io_req_complete_post(struct io_kiocb *req); -bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, - bool allow_overflow); -bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, - bool allow_overflow); +bool io_post_aux_cqe(struct io_ring_ctx *ctx, __kernel_uintptr_t user_data, + s32 res, u32 cflags, bool allow_overflow); +bool io_fill_cqe_aux(struct io_ring_ctx *ctx, __kernel_uintptr_t user_data, + s32 res, u32 cflags, bool allow_overflow); void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages); @@ -101,6 +101,16 @@ static inline bool io_in_compat64(struct io_ring_ctx *ctx) return IS_ENABLED(CONFIG_COMPAT64) && ctx->compat; }
+static inline bool io_user_data_is_same(const __kernel_uintptr_t d1, + const __kernel_uintptr_t d2) +{ +#ifdef CONFIG_CHERI_PURECAP_UABI + return __builtin_cheri_equal_exact(d1, d2); +#else + return d1 == d2; +#endif +} + static inline void convert_compat64_io_uring_sqe(struct io_ring_ctx *ctx, struct io_uring_sqe *sqe, const struct compat_io_uring_sqe *compat_sqe) @@ -122,13 +132,29 @@ static inline void convert_compat64_io_uring_sqe(struct io_ring_ctx *ctx, sqe->ioprio = READ_ONCE(compat_sqe->ioprio); sqe->fd = READ_ONCE(compat_sqe->fd); BUILD_BUG_COMPAT_SQE_UNION_ELEM(addr2, addr); - sqe->addr2 = READ_ONCE(compat_sqe->addr2); + sqe->addr2 = (__kernel_uintptr_t)compat_ptr(READ_ONCE(compat_sqe->addr2)); BUILD_BUG_COMPAT_SQE_UNION_ELEM(addr, len); - sqe->addr = READ_ONCE(compat_sqe->addr); + + /* + * Some opcodes set a user_data value in the addr field to be matched + * with a pre-existing IO event's user_data. It's not dereferenced by + * the kernel, so don't modify it. + */ + switch (sqe->opcode) { + case IORING_OP_POLL_REMOVE: + case IORING_OP_TIMEOUT_REMOVE: + case IORING_OP_ASYNC_CANCEL: + sqe->addr = (__kernel_uintptr_t)READ_ONCE(compat_sqe->addr); + break; + default: + sqe->addr = (__kernel_uintptr_t)compat_ptr(READ_ONCE(compat_sqe->addr)); + break; + } + sqe->len = READ_ONCE(compat_sqe->len); BUILD_BUG_COMPAT_SQE_UNION_ELEM(rw_flags, user_data); sqe->rw_flags = READ_ONCE(compat_sqe->rw_flags); - sqe->user_data = READ_ONCE(compat_sqe->user_data); + sqe->user_data = (__kernel_uintptr_t)READ_ONCE(compat_sqe->user_data); BUILD_BUG_COMPAT_SQE_UNION_ELEM(buf_index, personality); sqe->buf_index = READ_ONCE(compat_sqe->buf_index); sqe->personality = READ_ONCE(compat_sqe->personality); @@ -138,9 +164,14 @@ static inline void convert_compat64_io_uring_sqe(struct io_ring_ctx *ctx, size_t compat_cmd_size = compat_uring_cmd_pdu_size(ctx->flags & IORING_SETUP_SQE128);
+ /* + * Note that sqe->cmd is bigger than compat_sqe->cmd, but + * uring_cmd handlers are not using that extra data in the + * compat mode, so the end of sqe->cmd is left uninitialised. + */ memcpy(sqe->cmd, compat_sqe->cmd, compat_cmd_size); } else { - sqe->addr3 = READ_ONCE(compat_sqe->addr3); + sqe->addr3 = (__kernel_uintptr_t)compat_ptr(READ_ONCE(compat_sqe->addr3)); sqe->__pad2[0] = READ_ONCE(compat_sqe->__pad2[0]); } #undef BUILD_BUG_COMPAT_SQE_UNION_ELEM @@ -168,13 +199,13 @@ static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx) }
static inline void __io_fill_cqe(struct io_ring_ctx *ctx, struct io_uring_cqe *cqe, - u64 user_data, s32 res, u32 cflags, + __kernel_uintptr_t user_data, s32 res, u32 cflags, u64 extra1, u64 extra2) { if (io_in_compat64(ctx)) { struct compat_io_uring_cqe *compat_cqe = (struct compat_io_uring_cqe *)cqe;
- WRITE_ONCE(compat_cqe->user_data, user_data); + WRITE_ONCE(compat_cqe->user_data, (__u64)user_data); WRITE_ONCE(compat_cqe->res, res); WRITE_ONCE(compat_cqe->flags, cflags);
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 110edd1cb84e0..70056c27d7780 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -22,7 +22,7 @@
struct io_provide_buf { struct file *file; - __u64 addr; + void __user *addr; __u32 len; __u32 bgid; __u16 nbufs; @@ -36,7 +36,7 @@ static int get_compat64_io_uring_buf_reg(struct io_uring_buf_reg *reg,
if (copy_from_user(&compat_reg, user_reg, sizeof(compat_reg))) return -EFAULT; - reg->ring_addr = compat_reg.ring_addr; + reg->ring_addr = (__kernel_uintptr_t)compat_ptr(compat_reg.ring_addr); reg->ring_entries = compat_reg.ring_entries; reg->bgid = compat_reg.bgid; reg->pad = compat_reg.pad; @@ -50,7 +50,7 @@ static int copy_io_uring_buf_reg_from_user(struct io_ring_ctx *ctx, { if (io_in_compat64(ctx)) return get_compat64_io_uring_buf_reg(reg, arg); - if (copy_from_user(reg, arg, sizeof(*reg))) + if (copy_from_user_with_ptr(reg, arg, sizeof(*reg))) return -EFAULT; return 0; } @@ -147,7 +147,7 @@ static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len, req->flags |= REQ_F_BUFFER_SELECTED; req->kbuf = kbuf; req->buf_index = kbuf->bid; - return u64_to_user_ptr(kbuf->addr); + return kbuf->addr; } return NULL; } @@ -207,7 +207,7 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len, req->buf_list = bl; req->buf_index = buf->bid;
- return u64_to_user_ptr(buf->addr); + return (void __user *)buf->addr; }
static void __user *io_ring_buffer_select_any(struct io_kiocb *req, size_t *len, @@ -405,17 +405,17 @@ int io_provide_buffers_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe if (!tmp || tmp > USHRT_MAX) return -E2BIG; p->nbufs = tmp; - p->addr = READ_ONCE(sqe->addr); + p->addr = (void __user *)READ_ONCE(sqe->addr); p->len = READ_ONCE(sqe->len);
if (check_mul_overflow((unsigned long)p->len, (unsigned long)p->nbufs, &size)) return -EOVERFLOW; - if (check_add_overflow((unsigned long)p->addr, size, &tmp_check)) + if (check_add_overflow(user_ptr_addr(p->addr), size, &tmp_check)) return -EOVERFLOW;
size = (unsigned long)p->len * p->nbufs; - if (!access_ok(u64_to_user_ptr(p->addr), size)) + if (!access_ok(p->addr, size)) return -EFAULT;
p->bgid = READ_ONCE(sqe->buf_group); @@ -475,7 +475,7 @@ static int io_add_buffers(struct io_ring_ctx *ctx, struct io_provide_buf *pbuf, struct io_buffer_list *bl) { struct io_buffer *buf; - u64 addr = pbuf->addr; + void __user *addr = pbuf->addr; int i, bid = pbuf->bid;
for (i = 0; i < pbuf->nbufs; i++) { @@ -587,6 +587,7 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) pages_size = io_in_compat64(ctx) ? size_mul(sizeof(struct compat_io_uring_buf), reg.ring_entries) : size_mul(sizeof(struct io_uring_buf), reg.ring_entries); + /* TODO [PCuABI] - capability checks for uaccess */ pages = io_pin_pages(reg.ring_addr, pages_size, &nr_pages); if (IS_ERR(pages)) { kfree(free_bl); diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h index 1aa5bbbc5d628..1977c13ccf3ff 100644 --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -31,7 +31,7 @@ struct io_buffer_list {
struct io_buffer { struct list_head list; - __u64 addr; + void __user *addr; __u32 len; __u16 bid; __u16 bgid; diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c index 90d2fc6fd80e4..654f5ad0b11c0 100644 --- a/io_uring/msg_ring.c +++ b/io_uring/msg_ring.c @@ -15,7 +15,7 @@
struct io_msg { struct file *file; - u64 user_data; + __kernel_uintptr_t user_data; u32 len; u32 cmd; u32 src_fd; @@ -130,7 +130,7 @@ int io_msg_ring_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (unlikely(sqe->buf_index || sqe->personality)) return -EINVAL;
- msg->user_data = READ_ONCE(sqe->off); + msg->user_data = READ_ONCE(sqe->addr2); msg->len = READ_ONCE(sqe->len); msg->cmd = READ_ONCE(sqe->addr); msg->src_fd = READ_ONCE(sqe->addr3); diff --git a/io_uring/net.c b/io_uring/net.c index 4c133bc6f9d1d..6fd28a49b6715 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -243,13 +243,13 @@ int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (req->opcode == IORING_OP_SEND) { if (READ_ONCE(sqe->__pad3[0])) return -EINVAL; - sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + sr->addr = (void __user *)READ_ONCE(sqe->addr2); sr->addr_len = READ_ONCE(sqe->addr_len); } else if (sqe->addr2 || sqe->file_index) { return -EINVAL; }
- sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr)); + sr->umsg = (struct user_msghdr __user *)READ_ONCE(sqe->addr); sr->len = READ_ONCE(sqe->len); sr->flags = READ_ONCE(sqe->ioprio); if (sr->flags & ~IORING_RECVSEND_POLL_FIRST) @@ -421,7 +421,7 @@ static int __io_recvmsg_copy_hdr(struct io_kiocb *req, struct user_msghdr msg; int ret;
- if (copy_from_user(&msg, sr->umsg, sizeof(*sr->umsg))) + if (copy_from_user_with_ptr(&msg, sr->umsg, sizeof(*sr->umsg))) return -EFAULT;
ret = __copy_msghdr(&iomsg->msg, &msg, &iomsg->uaddr); @@ -549,7 +549,7 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (unlikely(sqe->file_index || sqe->addr2)) return -EINVAL;
- sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr)); + sr->umsg = (struct user_msghdr __user *)READ_ONCE(sqe->addr); sr->len = READ_ONCE(sqe->len); sr->flags = READ_ONCE(sqe->ioprio); if (sr->flags & ~(RECVMSG_FLAGS)) @@ -966,7 +966,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (req->opcode == IORING_OP_SEND_ZC) { if (READ_ONCE(sqe->__pad3[0])) return -EINVAL; - zc->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + zc->addr = (void __user *)READ_ONCE(sqe->addr2); zc->addr_len = READ_ONCE(sqe->addr_len); } else { if (unlikely(sqe->addr2 || sqe->file_index)) @@ -975,7 +975,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) return -EINVAL; }
- zc->buf = u64_to_user_ptr(READ_ONCE(sqe->addr)); + zc->buf = (void __user *)READ_ONCE(sqe->addr); zc->len = READ_ONCE(sqe->len); zc->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL; if (zc->msg_flags & MSG_DONTWAIT) @@ -1242,8 +1242,8 @@ int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (sqe->len || sqe->buf_index) return -EINVAL;
-	accept->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
-	accept->addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+	accept->addr = (void __user *)READ_ONCE(sqe->addr);
+	accept->addr_len = (int __user *)READ_ONCE(sqe->addr2);
 	accept->flags = READ_ONCE(sqe->accept_flags);
 	accept->nofile = rlimit(RLIMIT_NOFILE);
 	flags = READ_ONCE(sqe->ioprio);
@@ -1392,8 +1392,8 @@ int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (sqe->len || sqe->buf_index || sqe->rw_flags || sqe->splice_fd_in)
 		return -EINVAL;

-	conn->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
-	conn->addr_len = READ_ONCE(sqe->addr2);
+	conn->addr = (void __user *)READ_ONCE(sqe->addr);
+	conn->addr_len = READ_ONCE(sqe->off);
 	conn->in_progress = false;
 	return 0;
 }
diff --git a/io_uring/openclose.c b/io_uring/openclose.c
index 67178e4bb282d..0a5c838885306 100644
--- a/io_uring/openclose.c
+++ b/io_uring/openclose.c
@@ -47,7 +47,7 @@ static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
 		open->how.flags |= O_LARGEFILE;

 	open->dfd = READ_ONCE(sqe->fd);
-	fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	fname = (char __user *)READ_ONCE(sqe->addr);
 	open->filename = getname(fname);
 	if (IS_ERR(open->filename)) {
 		ret = PTR_ERR(open->filename);
@@ -81,7 +81,7 @@ int io_openat2_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	size_t len;
 	int ret;

-	how = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+	how = (struct open_how __user *)READ_ONCE(sqe->addr2);
 	len = READ_ONCE(sqe->len);
 	if (len < OPEN_HOW_SIZE_VER0)
 		return -EINVAL;
diff --git a/io_uring/poll.c b/io_uring/poll.c
index d9bf1767867e6..647b775a15ea4 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -22,8 +22,8 @@

 struct io_poll_update {
 	struct file *file;
-	u64 old_user_data;
-	u64 new_user_data;
+	__kernel_uintptr_t old_user_data;
+	__kernel_uintptr_t new_user_data;
 	__poll_t events;
 	bool update_events;
 	bool update_user_data;
@@ -764,7 +764,7 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,

 	spin_lock(&hb->lock);
 	hlist_for_each_entry(req, &hb->list, hash_node) {
-		if (cd->data != req->cqe.user_data)
+		if (!io_user_data_is_same(cd->data, req->cqe.user_data))
 			continue;
 		if (poll_only && req->opcode != IORING_OP_POLL_ADD)
 			continue;
@@ -890,7 +890,7 @@ int io_poll_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	upd->update_events = flags & IORING_POLL_UPDATE_EVENTS;
 	upd->update_user_data = flags & IORING_POLL_UPDATE_USER_DATA;

-	upd->new_user_data = READ_ONCE(sqe->off);
+	upd->new_user_data = READ_ONCE(sqe->addr2);
 	if (!upd->update_user_data && upd->new_user_data)
 		return -EINVAL;
 	if (upd->update_events)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 8a2b5891f1030..9e716fef91d79 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -18,7 +18,7 @@

 struct io_rsrc_update {
 	struct file *file;
-	u64 arg;
+	__s32 __user *arg;
 	u32 nr_args;
 	u32 offset;
 };
@@ -32,7 +32,7 @@ static int get_compat64_io_uring_rsrc_update(struct io_uring_rsrc_update2 *up2,
 		return -EFAULT;
 	up2->offset = compat_up.offset;
 	up2->resv = compat_up.resv;
-	up2->data = compat_up.data;
+	up2->data = (__kernel_uintptr_t)compat_ptr(compat_up.data);
 	return 0;
 }

@@ -45,8 +45,8 @@ static int get_compat64_io_uring_rsrc_update2(struct io_uring_rsrc_update2 *up2,
 		return -EFAULT;
 	up2->offset = compat_up2.offset;
 	up2->resv = compat_up2.resv;
-	up2->data = compat_up2.data;
-	up2->tags = compat_up2.tags;
+	up2->data = (__kernel_uintptr_t)compat_ptr(compat_up2.data);
+	up2->tags = (__kernel_uintptr_t)compat_ptr(compat_up2.tags);
 	up2->nr = compat_up2.nr;
 	up2->resv2 = compat_up2.resv2;
 	return 0;
@@ -62,8 +62,8 @@ static int get_compat64_io_uring_rsrc_register(struct io_uring_rsrc_register *rr
 	rr->nr = compat_rr.nr;
 	rr->flags = compat_rr.flags;
 	rr->resv2 = compat_rr.resv2;
-	rr->data = compat_rr.data;
-	rr->tags = compat_rr.tags;
+	rr->data = (__kernel_uintptr_t)compat_ptr(compat_rr.data);
+	rr->tags = (__kernel_uintptr_t)compat_ptr(compat_rr.tags);
 	return 0;
 }

@@ -73,7 +73,7 @@ static int copy_io_uring_rsrc_update_from_user(struct io_ring_ctx *ctx,
 {
 	if (io_in_compat64(ctx))
 		return get_compat64_io_uring_rsrc_update(up2, arg);
-	if (copy_from_user(up2, arg, sizeof(struct io_uring_rsrc_update)))
+	if (copy_from_user_with_ptr(up2, arg, sizeof(struct io_uring_rsrc_update)))
 		return -EFAULT;
 	return 0;
 }
@@ -90,7 +90,7 @@ static int copy_io_uring_rsrc_update2_from_user(struct io_ring_ctx *ctx,
 	}
 	if (size != sizeof(*up2))
 		return -EINVAL;
-	if (copy_from_user(up2, arg, sizeof(*up2)))
+	if (copy_from_user_with_ptr(up2, arg, sizeof(*up2)))
 		return -EFAULT;
 	return 0;
 }
@@ -107,7 +107,7 @@ static int copy_io_uring_rsrc_register_from_user(struct io_ring_ctx *ctx,
 	}
 	if (size != sizeof(*rr))
 		return -EINVAL;
-	if (copy_from_user(rr, arg, size))
+	if (copy_from_user_with_ptr(rr, arg, size))
 		return -EFAULT;
 	return 0;
 }
@@ -190,13 +190,13 @@ static int io_copy_iov(struct io_ring_ctx *ctx, struct iovec *dst,
 		if (copy_from_user(&ciov, &ciovs[index], sizeof(ciov)))
 			return -EFAULT;

-		dst->iov_base = u64_to_user_ptr((u64)ciov.iov_base);
+		dst->iov_base = compat_ptr(ciov.iov_base);
 		dst->iov_len = ciov.iov_len;
 		return 0;
 	}
 #endif
 	src = (struct iovec __user *) arg;
-	if (copy_from_user(dst, &src[index], sizeof(*dst)))
+	if (copy_from_user_with_ptr(dst, &src[index], sizeof(*dst)))
 		return -EFAULT;
 	return 0;
 }
@@ -523,8 +523,8 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 				 struct io_uring_rsrc_update2 *up,
 				 unsigned nr_args)
 {
-	u64 __user *tags = u64_to_user_ptr(up->tags);
-	__s32 __user *fds = u64_to_user_ptr(up->data);
+	u64 __user *tags = (u64 __user *)up->tags;
+	__s32 __user *fds = (__s32 __user *)up->data;
 	struct io_rsrc_data *data = ctx->file_data;
 	struct io_fixed_file *file_slot;
 	struct file *file;
@@ -603,9 +603,9 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
 				   struct io_uring_rsrc_update2 *up,
 				   unsigned int nr_args)
 {
-	u64 __user *tags = u64_to_user_ptr(up->tags);
+	u64 __user *tags = (u64 __user *)up->tags;
 	struct iovec iov;
-	struct iovec __user *iovs = u64_to_user_ptr(up->data);
+	struct iovec __user *iovs = (struct iovec __user *)up->data;
 	struct page *last_hpage = NULL;
 	bool needs_switch = false;
 	__u32 done;
@@ -729,13 +729,13 @@ __cold int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg,
 	case IORING_RSRC_FILE:
 		if (rr.flags & IORING_RSRC_REGISTER_SPARSE && rr.data)
 			break;
-		return io_sqe_files_register(ctx, u64_to_user_ptr(rr.data),
-					     rr.nr, u64_to_user_ptr(rr.tags));
+		return io_sqe_files_register(ctx, (void __user *)rr.data,
+					     rr.nr, (u64 __user *)rr.tags);
 	case IORING_RSRC_BUFFER:
 		if (rr.flags & IORING_RSRC_REGISTER_SPARSE && rr.data)
 			break;
-		return io_sqe_buffers_register(ctx, u64_to_user_ptr(rr.data),
-					       rr.nr, u64_to_user_ptr(rr.tags));
+		return io_sqe_buffers_register(ctx, (void __user *)rr.data,
+					       rr.nr, (u64 __user *)rr.tags);
 	}
 	return -EINVAL;
 }
@@ -753,7 +753,7 @@ int io_files_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	up->nr_args = READ_ONCE(sqe->len);
 	if (!up->nr_args)
 		return -EINVAL;
-	up->arg = READ_ONCE(sqe->addr);
+	up->arg = (__s32 __user *)READ_ONCE(sqe->addr);
 	return 0;
 }

@@ -761,7 +761,7 @@ static int io_files_update_with_index_alloc(struct io_kiocb *req,
 					    unsigned int issue_flags)
 {
 	struct io_rsrc_update *up = io_kiocb_to_cmd(req, struct io_rsrc_update);
-	__s32 __user *fds = u64_to_user_ptr(up->arg);
+	__s32 __user *fds = up->arg;
 	unsigned int done;
 	struct file *file;
 	int ret, fd;
@@ -804,7 +804,7 @@ int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
 	int ret;

 	up2.offset = up->offset;
-	up2.data = up->arg;
+	up2.data = (__kernel_uintptr_t)up->arg;
 	up2.nr = 0;
 	up2.tags = 0;
 	up2.resv = 0;
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 2edca190450ee..229c0d778c9d6 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -23,7 +23,7 @@ struct io_rw {
 	/* NOTE: kiocb has the file as the first member, so don't do it here */
 	struct kiocb kiocb;
-	u64 addr;
+	void __user *addr;
 	u32 len;
 	rwf_t flags;
 };
@@ -39,7 +39,7 @@ static int io_iov_compat_buffer_select_prep(struct io_rw *rw)
 	struct compat_iovec __user *uiov;
 	compat_ssize_t clen;

-	uiov = u64_to_user_ptr(rw->addr);
+	uiov = rw->addr;
 	if (!access_ok(uiov, sizeof(*uiov)))
 		return -EFAULT;
 	if (__get_user(clen, &uiov->iov_len))
@@ -65,7 +65,7 @@ static int io_iov_buffer_select_prep(struct io_kiocb *req)
 		return io_iov_compat_buffer_select_prep(rw);
 #endif

-	uiov = u64_to_user_ptr(rw->addr);
+	uiov = rw->addr;
 	if (get_user(rw->len, &uiov->iov_len))
 		return -EFAULT;
 	return 0;
@@ -104,7 +104,7 @@ int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		rw->kiocb.ki_ioprio = get_current_ioprio();
 	}

-	rw->addr = READ_ONCE(sqe->addr);
+	rw->addr = (void __user *)READ_ONCE(sqe->addr);
 	rw->len = READ_ONCE(sqe->len);
 	rw->flags = READ_ONCE(sqe->rw_flags);

@@ -364,13 +364,14 @@ static struct iovec *__io_import_iovec(int ddir, struct io_kiocb *req,
 	ssize_t ret;

 	if (opcode == IORING_OP_READ_FIXED || opcode == IORING_OP_WRITE_FIXED) {
-		ret = io_import_fixed(ddir, iter, req->imu, rw->addr, rw->len);
+		ret = io_import_fixed(ddir, iter, req->imu,
+				      user_ptr_addr(rw->addr), rw->len);
 		if (ret)
 			return ERR_PTR(ret);
 		return NULL;
 	}

-	buf = u64_to_user_ptr(rw->addr);
+	buf = rw->addr;
 	sqe_len = rw->len;

 	if (opcode == IORING_OP_READ || opcode == IORING_OP_WRITE ||
@@ -379,8 +380,7 @@ static struct iovec *__io_import_iovec(int ddir, struct io_kiocb *req,
 		buf = io_buffer_select(req, &sqe_len, issue_flags);
 		if (!buf)
 			return ERR_PTR(-ENOBUFS);
-		/* TODO [PCuABI] - capability checks for uaccess */
-		rw->addr = user_ptr_addr(buf);
+		rw->addr = buf;
 		rw->len = sqe_len;
 	}

@@ -446,7 +446,7 @@ static ssize_t loop_rw_iter(int ddir, struct io_rw *rw, struct iov_iter *iter)
 		if (!iov_iter_is_bvec(iter)) {
 			iovec = iov_iter_iovec(iter);
 		} else {
-			iovec.iov_base = u64_to_user_ptr(rw->addr);
+			iovec.iov_base = rw->addr;
 			iovec.iov_len = rw->len;
 		}

diff --git a/io_uring/statx.c b/io_uring/statx.c
index d8fc933d3f593..d2604fdbcbe33 100644
--- a/io_uring/statx.c
+++ b/io_uring/statx.c
@@ -32,8 +32,8 @@ int io_statx_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)

 	sx->dfd = READ_ONCE(sqe->fd);
 	sx->mask = READ_ONCE(sqe->len);
-	path = u64_to_user_ptr(READ_ONCE(sqe->addr));
-	sx->buffer = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+	path = (char __user *)READ_ONCE(sqe->addr);
+	sx->buffer = (struct statx __user *)READ_ONCE(sqe->addr2);
 	sx->flags = READ_ONCE(sqe->statx_flags);

 	sx->filename = getname_flags(path,
diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index e8a8c20994805..7ea112e13f659 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -26,7 +26,7 @@ struct io_timeout {

 struct io_timeout_rem {
 	struct file *file;
-	u64 addr;
+	__kernel_uintptr_t addr;

 	/* timeout update */
 	struct timespec64 ts;
@@ -229,7 +229,7 @@ static struct io_kiocb *io_timeout_extract(struct io_ring_ctx *ctx,
 		struct io_kiocb *tmp = cmd_to_io_kiocb(timeout);

 		if (!(cd->flags & IORING_ASYNC_CANCEL_ANY) &&
-		    cd->data != tmp->cqe.user_data)
+		    !io_user_data_is_same(cd->data, tmp->cqe.user_data))
 			continue;
 		if (cd->flags & (IORING_ASYNC_CANCEL_ALL|IORING_ASYNC_CANCEL_ANY)) {
 			if (cd->seq == tmp->work.cancel_seq)
@@ -337,7 +337,7 @@ static clockid_t io_timeout_get_clock(struct io_timeout_data *data)
 	}
 }

-static int io_linked_timeout_update(struct io_ring_ctx *ctx, __u64 user_data,
+static int io_linked_timeout_update(struct io_ring_ctx *ctx, __kernel_uintptr_t user_data,
 				    struct timespec64 *ts, enum hrtimer_mode mode)
 	__must_hold(&ctx->timeout_lock)
 {
@@ -348,7 +348,7 @@ static int io_linked_timeout_update(struct io_ring_ctx *ctx, __u64 user_data,
 	list_for_each_entry(timeout, &ctx->ltimeout_list, list) {
 		struct io_kiocb *tmp = cmd_to_io_kiocb(timeout);

-		if (user_data == tmp->cqe.user_data) {
+		if (io_user_data_is_same(user_data, tmp->cqe.user_data)) {
 			req = tmp;
 			break;
 		}
@@ -365,7 +365,7 @@ static int io_linked_timeout_update(struct io_ring_ctx *ctx, __u64 user_data,
 	return 0;
 }

-static int io_timeout_update(struct io_ring_ctx *ctx, __u64 user_data,
+static int io_timeout_update(struct io_ring_ctx *ctx, __kernel_uintptr_t user_data,
 			     struct timespec64 *ts, enum hrtimer_mode mode)
 	__must_hold(&ctx->timeout_lock)
 {
@@ -405,7 +405,7 @@ int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		tr->ltimeout = true;
 	if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS))
 		return -EINVAL;
-	if (get_timespec64(&tr->ts, u64_to_user_ptr(sqe->addr2)))
+	if (get_timespec64(&tr->ts, (struct __kernel_timespec __user *)sqe->addr2))
 		return -EFAULT;
 	if (tr->ts.tv_sec < 0 || tr->ts.tv_nsec < 0)
 		return -EINVAL;
@@ -490,7 +490,7 @@ static int __io_timeout_prep(struct io_kiocb *req,
 	data->req = req;
 	data->flags = flags;

-	if (get_timespec64(&data->ts, u64_to_user_ptr(sqe->addr)))
+	if (get_timespec64(&data->ts, (struct __kernel_timespec __user *)sqe->addr))
 		return -EFAULT;

 	if (data->ts.tv_sec < 0 || data->ts.tv_nsec < 0)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index e50de0b6b9f84..4d2d2e3f885ee 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -65,8 +65,13 @@ int io_uring_cmd_prep_async(struct io_kiocb *req)
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
 	size_t cmd_size;

+#ifdef CONFIG_CHERI_PURECAP_UABI
+	BUILD_BUG_ON(uring_cmd_pdu_size(0) != 32);
+	BUILD_BUG_ON(uring_cmd_pdu_size(1) != 160);
+#else
 	BUILD_BUG_ON(uring_cmd_pdu_size(0) != 16);
 	BUILD_BUG_ON(uring_cmd_pdu_size(1) != 80);
+#endif

 	cmd_size = uring_cmd_pdu_size(req->ctx->flags & IORING_SETUP_SQE128);

diff --git a/io_uring/xattr.c b/io_uring/xattr.c
index 99df641594d74..1f13032e59536 100644
--- a/io_uring/xattr.c
+++ b/io_uring/xattr.c
@@ -53,8 +53,8 @@ static int __io_getxattr_prep(struct io_kiocb *req,

 	ix->filename = NULL;
 	ix->ctx.kvalue = NULL;
-	name = u64_to_user_ptr(READ_ONCE(sqe->addr));
-	ix->ctx.cvalue = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+	name = (char __user *)READ_ONCE(sqe->addr);
+	ix->ctx.cvalue = (void __user *)READ_ONCE(sqe->addr2);
 	ix->ctx.size = READ_ONCE(sqe->len);
 	ix->ctx.flags = READ_ONCE(sqe->xattr_flags);

@@ -93,7 +93,7 @@ int io_getxattr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (ret)
 		return ret;

-	path = u64_to_user_ptr(READ_ONCE(sqe->addr3));
+	path = (char __user *)READ_ONCE(sqe->addr3);

 	ix->filename = getname_flags(path, LOOKUP_FOLLOW, NULL);
 	if (IS_ERR(ix->filename)) {
@@ -159,8 +159,8 @@ static int __io_setxattr_prep(struct io_kiocb *req,
 		return -EBADF;

 	ix->filename = NULL;
-	name = u64_to_user_ptr(READ_ONCE(sqe->addr));
-	ix->ctx.cvalue = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+	name = (char __user *)READ_ONCE(sqe->addr);
+	ix->ctx.cvalue = (void __user *)READ_ONCE(sqe->addr2);
 	ix->ctx.kvalue = NULL;
 	ix->ctx.size = READ_ONCE(sqe->len);
 	ix->ctx.flags = READ_ONCE(sqe->xattr_flags);
@@ -189,7 +189,7 @@ int io_setxattr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (ret)
 		return ret;

-	path = u64_to_user_ptr(READ_ONCE(sqe->addr3));
+	path = (char __user *)READ_ONCE(sqe->addr3);

 	ix->filename = getname_flags(path, LOOKUP_FOLLOW, NULL);
 	if (IS_ERR(ix->filename)) {
On 29/03/2023 17:11, Tudor Cretu wrote:
> +static inline bool io_user_data_is_same(const __kernel_uintptr_t d1,
> +					 const __kernel_uintptr_t d2)
Nit: there is generally not much point in having const arguments, as the only thing it achieves is preventing the function from modifying what are effectively local variables. Note that this is very different from passing a _pointer to_ const (const T *), because in that case the constness actually prevents the function from modifying data it doesn't own.
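For instance, a self-contained sketch with hypothetical helpers (not code from the patch):

#include <stdbool.h>

/* Top-level const on a by-value parameter only constrains the callee:
 * d1 is the callee's own copy, so no caller can tell the difference. */
static bool data_is_same(const unsigned long d1, const unsigned long d2)
{
	/* "d1 = 0;" would not compile here, yet dropping the const
	 * would change nothing for any caller. */
	return d1 == d2;
}

/* Pointer-to-const is a genuine contract: the callee must not modify
 * the caller's data through p. */
static unsigned long read_value(const unsigned long *p)
{
	/* "*p = 0;" would not compile: the pointee belongs to the caller. */
	return *p;
}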
Kevin
On 04-04-2023 08:06, Kevin Brodsky wrote:
> On 29/03/2023 17:11, Tudor Cretu wrote:
>> +static inline bool io_user_data_is_same(const __kernel_uintptr_t d1,
>> +					 const __kernel_uintptr_t d2)
>
> Nit: there is generally not much point in having const arguments, as the only thing it achieves is preventing the function from modifying what are effectively local variables. Note that this is very different from passing a _pointer to_ const (const T *), because in that case the constness actually prevents the function from modifying data it doesn't own.
>
> Kevin
Oops, forgot to turn my brain on when I copy-pasted this. I have removed the const in the next version.
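That is, something along these lines (a sketch of the upcoming revision, not the final diff):

-static inline bool io_user_data_is_same(const __kernel_uintptr_t d1,
-					const __kernel_uintptr_t d2)
+static inline bool io_user_data_is_same(__kernel_uintptr_t d1,
+					__kernel_uintptr_t d2)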
Many thanks for the review!
Tudor