This series makes it possible for purecap apps to use the io_uring subsystem.
With these patches, all io_uring LTP tests pass in both purecap and compat modes. Note that the LTP tests only exercise the basic functionality of io_uring; a significant portion of the multiplexed functionality is not covered by LTP.
I have started running and investigating the liburing tests and examples, so this is not a final version; review is nonetheless very much appreciated while I work through the liburing tests.
v2:
 - Rebase on top of release 6.1
 - Remove VM_READ_CAPS/VM_LOAD_CAPS patches as they are already merged
 - Update commit message in PATCH 1
 - Add the generic changes PATCH 2 and PATCH 3 to avoid copying user
   pointers from/to userspace unnecessarily. These could be upstreamable.
 - Split the "pulling the cqes member out" change into PATCH 4
 - The changes for PATCH 5 and 6 are now split into their respective
   files after the rebase
 - Formatting and organization changes based on the feedback on the
   previous version, including creating copy_*_from_* helpers for
   various uAPI structs
 - Add comments related to the handling of the setup flags
   IORING_SETUP_SQE128 and IORING_SETUP_CQE32
 - Add handling for new uAPI structs: io_uring_buf, io_uring_buf_ring,
   io_uring_buf_reg, io_uring_sync_cancel_reg
Gitlab issue: https://git.morello-project.org/morello/kernel/linux/-/issues/2
Review branch: https://git.morello-project.org/tudcre01/linux/-/commits/morello/io_uring_v2
Tudor Cretu (7):
  compiler_types: Add (u)intcap_t to native_words
  io_uring/rw: Restrict copy to only uiov->len from userspace
  io_uring/tctx: Copy only the offset field back to user
  io_uring: Pull cqes member out from rings struct
  io_uring: Implement compat versions of uAPI structs and handle them
  io_uring: Use user pointer type in the uAPI structs
  io_uring: Allow capability tag access on the shared memory
 include/linux/compiler_types.h |   7 +
 include/linux/io_uring_types.h | 160 ++++++++++++++-
 include/uapi/linux/io_uring.h  |  62 ++++---
 io_uring/advise.c              |   2 +-
 io_uring/cancel.c              |  40 +++-
 io_uring/cancel.h              |   2 +-
 io_uring/epoll.c               |   2 +-
 io_uring/fdinfo.c              |  64 ++++++-
 io_uring/fs.c                  |  16 +-
 io_uring/io_uring.c            | 329 +++++++++++++++++++++-------
 io_uring/io_uring.h            | 126 ++++++++++---
 io_uring/kbuf.c                | 119 ++++++++++--
 io_uring/kbuf.h                |   8 +-
 io_uring/msg_ring.c            |   4 +-
 io_uring/net.c                 |  18 +-
 io_uring/openclose.c           |   4 +-
 io_uring/poll.c                |   4 +-
 io_uring/rsrc.c                | 150 ++++++++++++---
 io_uring/rw.c                  |  17 +-
 io_uring/statx.c               |   4 +-
 io_uring/tctx.c                |  57 +++++-
 io_uring/timeout.c             |  10 +-
 io_uring/uring_cmd.c           |   5 +
 io_uring/uring_cmd.h           |   7 +
 io_uring/xattr.c               |  12 +-
 25 files changed, 977 insertions(+), 252 deletions(-)
On CHERI architectures, stores and loads of capabilities must be atomic. Add the (u)intcap_t types to the __native_word() check so that capability stores and loads pass the checks for atomic operations.
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 include/linux/compiler_types.h | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 365fda8e7e424..19a85e9490ff3 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -328,9 +328,16 @@ struct ftrace_likely_data {
 			default: (x)))

 /* Is this type a native word size -- useful for atomic operations */
+#ifdef __CHERI__
+#define __native_word(t) \
+	(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
+	 sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long) || \
+	 __same_type(t, __intcap_t) || __same_type(t, __uintcap_t))
+#else
 #define __native_word(t) \
 	(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
 	 sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
+#endif

 #ifdef __OPTIMIZE__
 # define __compiletime_assert(condition, msg, prefix, suffix)		\
Only the iov_len member is needed, so replace the full-struct copy_from_user() with a get_user() of that single field.
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 io_uring/rw.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1393cdae75854..2edca190450ee 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -55,7 +55,6 @@ static int io_iov_compat_buffer_select_prep(struct io_rw *rw)
 static int io_iov_buffer_select_prep(struct io_kiocb *req)
 {
 	struct iovec __user *uiov;
-	struct iovec iov;
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);

 	if (rw->len != 1)
@@ -67,9 +66,8 @@ static int io_iov_buffer_select_prep(struct io_kiocb *req)
 #endif

 	uiov = u64_to_user_ptr(rw->addr);
-	if (copy_from_user(&iov, uiov, sizeof(*uiov)))
+	if (get_user(rw->len, &uiov->iov_len))
 		return -EFAULT;
-	rw->len = iov.iov_len;
 	return 0;
 }
Upon successful return of the io_uring_register system call, the offset field will contain the value of the registered file descriptor to be used for future io_uring_enter system calls. The rest of the struct doesn't need to be copied back to userspace, so restrict the copy_to_user call only to the offset field.
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 io_uring/tctx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/tctx.c b/io_uring/tctx.c
index 4324b1cf1f6af..96f77450cf4e2 100644
--- a/io_uring/tctx.c
+++ b/io_uring/tctx.c
@@ -289,7 +289,7 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
 			break;

 		reg.offset = ret;
-		if (copy_to_user(&arg[i], &reg, sizeof(reg))) {
+		if (put_user(reg.offset, &arg[i].offset)) {
 			fput(tctx->registered_rings[reg.offset]);
 			tctx->registered_rings[reg.offset] = NULL;
 			ret = -EFAULT;
Pull cqes member out from rings struct so that we are able to have a union between cqes and cqes_compat. This is done in a similar way to commit 75b28affdd6a ("io_uring: allocate the two rings together"), where sq_array was pulled out from the rings struct.
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 include/linux/io_uring_types.h | 18 +++++++++--------
 io_uring/fdinfo.c              |  2 +-
 io_uring/io_uring.c            | 35 ++++++++++++++++++++++++----------
 3 files changed, 36 insertions(+), 19 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index df7d4febc38a4..440179029a8f0 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -141,14 +141,6 @@ struct io_rings {
 	 * ordered with any other data.
 	 */
 	u32			cq_overflow;
-
-	/*
-	 * Ring buffer of completion events.
-	 *
-	 * The kernel writes completion events fresh every time they are
-	 * produced, so the application is allowed to modify pending
-	 * entries.
-	 */
-	struct io_uring_cqe	cqes[] ____cacheline_aligned_in_smp;
 };

 struct io_restriction {
@@ -270,7 +262,17 @@ struct io_ring_ctx {
 		struct xarray		personalities;
 		u32			pers_next;

+	/* completion data */
 	struct {
+		/*
+		 * Ring buffer of completion events.
+		 *
+		 * The kernel writes completion events fresh every time they are
+		 * produced, so the application is allowed to modify pending
+		 * entries.
+		 */
+		struct io_uring_cqe	*cqes;
+
 		/*
 		 * We cache a range of free CQEs we can use, once exhausted it
 		 * should go through a slower range setup, see __io_get_cqe()
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index 2e04850a657b0..bc8c9d764bc13 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -119,7 +119,7 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 	cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
 	for (i = 0; i < cq_entries; i++) {
 		unsigned int entry = i + cq_head;
-		struct io_uring_cqe *cqe = &r->cqes[(entry & cq_mask) << cq_shift];
+		struct io_uring_cqe *cqe = &ctx->cqes[(entry & cq_mask) << cq_shift];
 		seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x",
 			   entry & cq_mask, cqe->user_data, cqe->res,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index df41a63c642c1..707229ae04dc8 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -743,7 +743,6 @@ bool io_req_cqe_overflow(struct io_kiocb *req)
  */
 struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow)
 {
-	struct io_rings *rings = ctx->rings;
 	unsigned int off = ctx->cached_cq_tail & (ctx->cq_entries - 1);
 	unsigned int free, queued, len;
@@ -768,14 +767,14 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow)
 		len <<= 1;
 	}

-	ctx->cqe_cached = &rings->cqes[off];
+	ctx->cqe_cached = &ctx->cqes[off];
 	ctx->cqe_sentinel = ctx->cqe_cached + len;

 	ctx->cached_cq_tail++;
 	ctx->cqe_cached++;
 	if (ctx->flags & IORING_SETUP_CQE32)
 		ctx->cqe_cached++;
-	return &rings->cqes[off];
+	return &ctx->cqes[off];
 }
 bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags,
@@ -2476,13 +2475,28 @@ static void *io_mem_alloc(size_t size)
 }

 static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries,
-				unsigned int cq_entries, size_t *sq_offset)
+				unsigned int cq_entries, size_t *sq_offset,
+				size_t *cq_offset)
 {
 	struct io_rings *rings;
-	size_t off, sq_array_size;
+	size_t off, cq_array_size, sq_array_size;
+
+	off = sizeof(*rings);
+
+#ifdef CONFIG_SMP
+	off = ALIGN(off, SMP_CACHE_BYTES);
+	if (off == 0)
+		return SIZE_MAX;
+#endif
+
+	if (cq_offset)
+		*cq_offset = off;
+
+	cq_array_size = array_size(sizeof(struct io_uring_cqe), cq_entries);
+	if (cq_array_size == SIZE_MAX)
+		return SIZE_MAX;

-	off = struct_size(rings, cqes, cq_entries);
-	if (off == SIZE_MAX)
+	if (check_add_overflow(off, cq_array_size, &off))
 		return SIZE_MAX;
 	if (ctx->flags & IORING_SETUP_CQE32) {
 		if (check_shl_overflow(off, 1, &off))
@@ -3314,13 +3328,13 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 					 struct io_uring_params *p)
 {
 	struct io_rings *rings;
-	size_t size, sq_array_offset;
+	size_t size, cqes_offset, sq_array_offset;
 	/* make sure these are sane, as we already accounted them */
 	ctx->sq_entries = p->sq_entries;
 	ctx->cq_entries = p->cq_entries;
-	size = rings_size(ctx, p->sq_entries, p->cq_entries, &sq_array_offset);
+	size = rings_size(ctx, p->sq_entries, p->cq_entries, &sq_array_offset, &cqes_offset);
 	if (size == SIZE_MAX)
 		return -EOVERFLOW;

@@ -3329,6 +3343,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 		return -ENOMEM;

 	ctx->rings = rings;
+	ctx->cqes = (struct io_uring_cqe *)((char *)rings + cqes_offset);
 	ctx->sq_array = (u32 *)((char *)rings + sq_array_offset);
 	rings->sq_ring_mask = p->sq_entries - 1;
 	rings->cq_ring_mask = p->cq_entries - 1;
@@ -3533,7 +3548,7 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	p->cq_off.ring_mask = offsetof(struct io_rings, cq_ring_mask);
 	p->cq_off.ring_entries = offsetof(struct io_rings, cq_ring_entries);
 	p->cq_off.overflow = offsetof(struct io_rings, cq_overflow);
-	p->cq_off.cqes = offsetof(struct io_rings, cqes);
+	p->cq_off.cqes = (char *)ctx->cqes - (char *)ctx->rings;
 	p->cq_off.flags = offsetof(struct io_rings, cq_flags);
p->features = IORING_FEAT_SINGLE_MMAP | IORING_FEAT_NODROP |
Introduce compat versions of the structs exposed in the uAPI headers that contain pointers as members. Also, implement functions that convert the compat versions of these structs to the native versions.
A subsequent patch is going to change the io_uring structs to enable them to support new architectures. On such architectures, the current struct layout still needs to be supported for compat tasks.
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 include/linux/io_uring_types.h | 140 ++++++++++++++-
 io_uring/cancel.c              |  38 ++++-
 io_uring/fdinfo.c              |  55 +++++++-
 io_uring/io_uring.c            | 248 +++++++++++++++++++----------
 io_uring/io_uring.h            | 122 +++++++++++---
 io_uring/kbuf.c                | 108 ++++++++++--
 io_uring/kbuf.h                |   6 +-
 io_uring/rsrc.c                | 125 +++++++++++++--
 io_uring/tctx.c                |  57 ++++++--
 io_uring/uring_cmd.h           |   7 +
 10 files changed, 777 insertions(+), 129 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 440179029a8f0..737bc1aa67306 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -7,6 +7,130 @@
 #include <linux/llist.h>
 #include <uapi/linux/io_uring.h>

+struct compat_io_uring_sqe {
+	__u8	opcode;
+	__u8	flags;
+	__u16	ioprio;
+	__s32	fd;
+	union {
+		__u64	off;
+		__u64	addr2;
+		struct {
+			__u32	cmd_op;
+			__u32	__pad1;
+		};
+	};
+	union {
+		__u64	addr;
+		__u64	splice_off_in;
+	};
+	__u32	len;
+	union {
+		__kernel_rwf_t	rw_flags;
+		__u32		fsync_flags;
+		__u16		poll_events;
+		__u32		poll32_events;
+		__u32		sync_range_flags;
+		__u32		msg_flags;
+		__u32		timeout_flags;
+		__u32		accept_flags;
+		__u32		cancel_flags;
+		__u32		open_flags;
+		__u32		statx_flags;
+		__u32		fadvise_advice;
+		__u32		splice_flags;
+		__u32		rename_flags;
+		__u32		unlink_flags;
+		__u32		hardlink_flags;
+		__u32		xattr_flags;
+		__u32		msg_ring_flags;
+		__u32		uring_cmd_flags;
+	};
+	__u64	user_data;
+	union {
+		__u16	buf_index;
+		__u16	buf_group;
+	} __packed;
+	__u16	personality;
+	union {
+		__s32	splice_fd_in;
+		__u32	file_index;
+		struct {
+			__u16	addr_len;
+			__u16	__pad3[1];
+		};
+	};
+	union {
+		struct {
+			__u64	addr3;
+			__u64	__pad2[1];
+		};
+		__u8	cmd[0];
+	};
+};
+
+struct compat_io_uring_cqe {
+	__u64	user_data;
+	__s32	res;
+	__u32	flags;
+	__u64	big_cqe[];
+};
+
+struct compat_io_uring_files_update {
+	__u32	offset;
+	__u32	resv;
+	__aligned_u64	fds;
+};
+
+struct compat_io_uring_rsrc_register {
+	__u32	nr;
+	__u32	flags;
+	__u64	resv2;
+	__aligned_u64	data;
+	__aligned_u64	tags;
+};
+
+struct compat_io_uring_rsrc_update {
+	__u32	offset;
+	__u32	resv;
+	__aligned_u64	data;
+};
+
+struct compat_io_uring_rsrc_update2 {
+	__u32	offset;
+	__u32	resv;
+	__aligned_u64	data;
+	__aligned_u64	tags;
+	__u32	nr;
+	__u32	resv2;
+};
+
+struct compat_io_uring_buf {
+	__u64	addr;
+	__u32	len;
+	__u16	bid;
+	__u16	resv;
+};
+
+struct compat_io_uring_buf_ring {
+	union {
+		struct {
+			__u64	resv1;
+			__u32	resv2;
+			__u16	resv3;
+			__u16	tail;
+		};
+		struct compat_io_uring_buf	bufs[0];
+	};
+};
+
+struct compat_io_uring_getevents_arg {
+	__u64	sigmask;
+	__u32	sigmask_sz;
+	__u32	pad;
+	__u64	ts;
+};
+
 struct io_wq_work_node {
 	struct io_wq_work_node *next;
 };
@@ -216,7 +340,11 @@ struct io_ring_ctx {
 	 * array.
 	 */
 	u32			*sq_array;
-	struct io_uring_sqe	*sq_sqes;
+
+	union {
+		struct compat_io_uring_sqe	*sq_sqes_compat;
+		struct io_uring_sqe		*sq_sqes;
+	};
 	unsigned		cached_sq_head;
 	unsigned		sq_entries;
@@ -271,7 +399,10 @@ struct io_ring_ctx {
 		 * produced, so the application is allowed to modify pending
 		 * entries.
 		 */
-		struct io_uring_cqe	*cqes;
+		union {
+			struct compat_io_uring_cqe	*cqes_compat;
+			struct io_uring_cqe		*cqes;
+		};
 		/*
 		 * We cache a range of free CQEs we can use, once exhausted it
@@ -581,7 +712,10 @@ struct io_kiocb {
 struct io_overflow_cqe {
 	struct list_head list;
-	struct io_uring_cqe cqe;
+	union {
+		struct compat_io_uring_cqe compat_cqe;
+		struct io_uring_cqe cqe;
+	};
 };
 #endif
diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 2291a53cdabd1..befcb4aeae914 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -27,6 +27,42 @@ struct io_cancel {
 #define CANCEL_FLAGS	(IORING_ASYNC_CANCEL_ALL | IORING_ASYNC_CANCEL_FD | \
			 IORING_ASYNC_CANCEL_ANY | IORING_ASYNC_CANCEL_FD_FIXED)
+struct compat_io_uring_sync_cancel_reg {
+	__u64	addr;
+	__s32	fd;
+	__u32	flags;
+	struct __kernel_timespec timeout;
+	__u64	pad[4];
+};
+
+#ifdef CONFIG_COMPAT64
+static int get_compat_io_uring_sync_cancel_reg(struct io_uring_sync_cancel_reg *sc,
+					       const void __user *user_sc)
+{
+	struct compat_io_uring_sync_cancel_reg compat_sc;
+
+	if (unlikely(copy_from_user(&compat_sc, user_sc, sizeof(compat_sc))))
+		return -EFAULT;
+	sc->addr = compat_sc.addr;
+	sc->fd = compat_sc.fd;
+	sc->flags = compat_sc.flags;
+	sc->timeout = compat_sc.timeout;
+	memcpy(sc->pad, compat_sc.pad, sizeof(sc->pad));
+	return 0;
+}
+#endif
+
+static int copy_io_uring_sync_cancel_reg_from_user(struct io_ring_ctx *ctx,
+						   struct io_uring_sync_cancel_reg *sc,
+						   const void __user *arg)
+{
+#ifdef CONFIG_COMPAT64
+	if (ctx->compat)
+		return get_compat_io_uring_sync_cancel_reg(sc, arg);
+#endif
+	return copy_from_user(sc, arg, sizeof(*sc));
+}
+
 static bool io_cancel_cb(struct io_wq_work *work, void *data)
 {
 	struct io_kiocb *req = container_of(work, struct io_kiocb, work);
@@ -243,7 +279,7 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg)
 	DEFINE_WAIT(wait);
 	int ret;
-	if (copy_from_user(&sc, arg, sizeof(sc)))
+	if (copy_io_uring_sync_cancel_reg_from_user(ctx, &sc, arg))
 		return -EFAULT;
 	if (sc.flags & ~CANCEL_FLAGS)
 		return -EINVAL;
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index bc8c9d764bc13..c5bd669081c98 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -89,12 +89,25 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 	for (i = 0; i < sq_entries; i++) {
 		unsigned int entry = i + sq_head;
 		struct io_uring_sqe *sqe;
-		unsigned int sq_idx;
+		unsigned int sq_idx, sq_off;
+#ifdef CONFIG_COMPAT64
+		struct io_uring_sqe *native_sqe = NULL;
+#endif

 		sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
 		if (sq_idx > sq_mask)
 			continue;
-		sqe = &ctx->sq_sqes[sq_idx << sq_shift];
+		sq_off = sq_idx << sq_shift;
+		sqe = ctx->compat ? (void *)&ctx->sq_sqes_compat[sq_off] : &ctx->sq_sqes[sq_off];
+#ifdef CONFIG_COMPAT64
+		if (ctx->compat) {
+			native_sqe = kmalloc(sizeof(struct io_uring_sqe) << sq_shift, GFP_KERNEL);
+			convert_compat_io_uring_sqe(ctx, native_sqe,
+						    (struct compat_io_uring_sqe *)sqe);
+			sqe = native_sqe;
+		}
+#endif
+
 		seq_printf(m, "%5u: opcode:%s, fd:%d, flags:%x, off:%llu, "
			      "addr:0x%llx, rw_flags:0x%x, buf_index:%d "
			      "user_data:%llu",
@@ -104,7 +117,8 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 			   sqe->buf_index, sqe->user_data);
 		if (sq_shift) {
 			u64 *sqeb = (void *) (sqe + 1);
-			int size = sizeof(struct io_uring_sqe) / sizeof(u64);
+			int size = (ctx->compat ? sizeof(struct compat_io_uring_sqe)
+						: sizeof(struct io_uring_sqe)) / sizeof(u64);
 			int j;
 			for (j = 0; j < size; j++) {
@@ -114,12 +128,29 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 			}
 		}
 		seq_printf(m, "\n");
+#ifdef CONFIG_COMPAT64
+		kfree(native_sqe);
+#endif
 	}
 	seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
 	cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
 	for (i = 0; i < cq_entries; i++) {
 		unsigned int entry = i + cq_head;
-		struct io_uring_cqe *cqe = &ctx->cqes[(entry & cq_mask) << cq_shift];
+		struct io_uring_cqe *cqe;
+		unsigned int cq_off = (entry & cq_mask) << cq_shift;
+#ifdef CONFIG_COMPAT64
+		struct io_uring_cqe *native_cqe = NULL;
+#endif
+
+		cqe = ctx->compat ? (void *)&ctx->cqes_compat[cq_off] : &ctx->cqes[cq_off];
+#ifdef CONFIG_COMPAT64
+		if (ctx->compat) {
+			native_cqe = kmalloc(sizeof(struct io_uring_cqe) << cq_shift, GFP_KERNEL);
+			convert_compat_io_uring_cqe(ctx, native_cqe,
+						    (struct compat_io_uring_cqe *)cqe);
+			cqe = native_cqe;
+		}
+#endif
 		seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x",
 			   entry & cq_mask, cqe->user_data, cqe->res,
@@ -128,6 +159,9 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 			seq_printf(m, ", extra1:%llu, extra2:%llu\n",
 				   cqe->big_cqe[0], cqe->big_cqe[1]);
 		seq_printf(m, "\n");
+#ifdef CONFIG_COMPAT64
+		kfree(native_cqe);
+#endif
 	}
 	/*
@@ -189,10 +223,23 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 	spin_lock(&ctx->completion_lock);
 	list_for_each_entry(ocqe, &ctx->cq_overflow_list, list) {
 		struct io_uring_cqe *cqe = &ocqe->cqe;
+#ifdef CONFIG_COMPAT64
+		struct io_uring_cqe *native_cqe = NULL;
+
+		if (ctx->compat) {
+			native_cqe = kmalloc(sizeof(struct io_uring_cqe) << cq_shift, GFP_KERNEL);
+			convert_compat_io_uring_cqe(ctx, native_cqe,
+						    (struct compat_io_uring_cqe *)cqe);
+			cqe = native_cqe;
+		}
+#endif
 		seq_printf(m, "  user_data=%llu, res=%d, flags=%x\n",
 			   cqe->user_data, cqe->res, cqe->flags);
+#ifdef CONFIG_COMPAT64
+		kfree(native_cqe);
+#endif
 	}
 	spin_unlock(&ctx->completion_lock);
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 707229ae04dc8..2bd48cff83c3f 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -152,6 +152,43 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
static struct kmem_cache *req_cachep;
+#ifdef CONFIG_COMPAT64
+static int get_compat_io_uring_getevents_arg(struct io_uring_getevents_arg *arg,
+					     const void __user *user_arg)
+{
+	struct compat_io_uring_getevents_arg compat_arg;
+
+	if (unlikely(copy_from_user(&compat_arg, user_arg, sizeof(compat_arg))))
+		return -EFAULT;
+	arg->sigmask = (__kernel_uintptr_t)compat_ptr(compat_arg.sigmask);
+	arg->sigmask_sz = compat_arg.sigmask_sz;
+	arg->pad = compat_arg.pad;
+	arg->ts = (__kernel_uintptr_t)compat_ptr(compat_arg.ts);
+	return 0;
+}
+#endif /* CONFIG_COMPAT64 */
+
+static int copy_io_uring_getevents_arg_from_user(struct io_ring_ctx *ctx,
+						 struct io_uring_getevents_arg *arg,
+						 const void __user *argp,
+						 size_t size)
+{
+#ifdef CONFIG_COMPAT64
+	if (ctx->compat) {
+		if (size != sizeof(struct compat_io_uring_getevents_arg))
+			return -EINVAL;
+		if (get_compat_io_uring_getevents_arg(arg, argp))
+			return -EFAULT;
+		return 0;
+	}
+#endif
+	if (size != sizeof(*arg))
+		return -EINVAL;
+	if (copy_from_user(arg, argp, sizeof(*arg)))
+		return -EFAULT;
+	return 0;
+}
+
 struct sock *io_uring_get_socket(struct file *file)
 {
 #if defined(CONFIG_UNIX)
@@ -604,7 +641,9 @@ void io_cq_unlock_post(struct io_ring_ctx *ctx)
 static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 {
 	bool all_flushed;
-	size_t cqe_size = sizeof(struct io_uring_cqe);
+	size_t cqe_size = ctx->compat ?
+			  sizeof(struct compat_io_uring_cqe) :
+			  sizeof(struct io_uring_cqe);
 	if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
 		return false;
@@ -741,10 +780,11 @@ bool io_req_cqe_overflow(struct io_kiocb *req)
  * control dependency is enough as we're using WRITE_ONCE to
  * fill the cq entry
  */
-struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow)
+void *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow)
 {
 	unsigned int off = ctx->cached_cq_tail & (ctx->cq_entries - 1);
 	unsigned int free, queued, len;
+	void *cqe;

 	/*
 	 * Posting into the CQ when there are pending overflowed CQEs may break
@@ -767,14 +807,15 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow)
 		len <<= 1;
 	}

-	ctx->cqe_cached = &ctx->cqes[off];
+	cqe = ctx->compat ? (void *)&ctx->cqes_compat[off] : (void *)&ctx->cqes[off];
+	ctx->cqe_cached = cqe;
 	ctx->cqe_sentinel = ctx->cqe_cached + len;

 	ctx->cached_cq_tail++;
 	ctx->cqe_cached++;
 	if (ctx->flags & IORING_SETUP_CQE32)
 		ctx->cqe_cached++;
-	return &ctx->cqes[off];
+	return cqe;
 }
 bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags,
@@ -793,14 +834,7 @@ bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags
 	if (likely(cqe)) {
 		trace_io_uring_complete(ctx, NULL, user_data, res, cflags, 0, 0);

-		WRITE_ONCE(cqe->user_data, user_data);
-		WRITE_ONCE(cqe->res, res);
-		WRITE_ONCE(cqe->flags, cflags);
-
-		if (ctx->flags & IORING_SETUP_CQE32) {
-			WRITE_ONCE(cqe->big_cqe[0], 0);
-			WRITE_ONCE(cqe->big_cqe[1], 0);
-		}
+		__io_fill_cqe_any(ctx, cqe, user_data, res, cflags, 0, 0);
 		return true;
 	}
@@ -2222,7 +2256,7 @@ static void io_commit_sqring(struct io_ring_ctx *ctx)
  * used, it's important that those reads are done through READ_ONCE() to
  * prevent a re-load down the line.
  */
-static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
+static const void *io_get_sqe(struct io_ring_ctx *ctx)
 {
 	unsigned head, mask = ctx->sq_entries - 1;
 	unsigned sq_idx = ctx->cached_sq_head++ & mask;
@@ -2240,7 +2274,8 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
 	/* double index for 128-byte SQEs, twice as long */
 	if (ctx->flags & IORING_SETUP_SQE128)
 		head <<= 1;
-	return &ctx->sq_sqes[head];
+	return ctx->compat ? (void *)&ctx->sq_sqes_compat[head]
			   : (void *)&ctx->sq_sqes[head];
 }
 /* drop invalid entries */
@@ -2265,8 +2300,11 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 	io_submit_state_start(&ctx->submit_state, left);

 	do {
-		const struct io_uring_sqe *sqe;
+		const void *sqe;
 		struct io_kiocb *req;
+#ifdef CONFIG_COMPAT64
+		struct io_uring_sqe native_sqe;
+#endif

 		if (unlikely(!io_alloc_req_refill(ctx)))
 			break;
@@ -2276,6 +2314,12 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 			io_req_add_to_cache(req, ctx);
 			break;
 		}
+#ifdef CONFIG_COMPAT64
+		if (ctx->compat) {
+			convert_compat_io_uring_sqe(ctx, &native_sqe, sqe);
+			sqe = &native_sqe;
+		}
+#endif
 		/*
 		 * Continue submitting even for sqe failure if the
@@ -2480,6 +2524,9 @@ static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries
 {
 	struct io_rings *rings;
 	size_t off, cq_array_size, sq_array_size;
+	size_t cqe_size = ctx->compat ?
+			  sizeof(struct compat_io_uring_cqe) :
+			  sizeof(struct io_uring_cqe);
off = sizeof(*rings);
@@ -2492,7 +2539,7 @@ static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries
 	if (cq_offset)
 		*cq_offset = off;

-	cq_array_size = array_size(sizeof(struct io_uring_cqe), cq_entries);
+	cq_array_size = array_size(cqe_size, cq_entries);
 	if (cq_array_size == SIZE_MAX)
 		return SIZE_MAX;
@@ -3120,20 +3167,22 @@ static unsigned long io_uring_nommu_get_unmapped_area(struct file *file,
#endif /* !CONFIG_MMU */
-static int io_validate_ext_arg(unsigned flags, const void __user *argp, size_t argsz)
+static int io_validate_ext_arg(struct io_ring_ctx *ctx, unsigned int flags,
+			       const void __user *argp, size_t argsz)
 {
 	if (flags & IORING_ENTER_EXT_ARG) {
 		struct io_uring_getevents_arg arg;
+		int ret;

-		if (argsz != sizeof(arg))
-			return -EINVAL;
-		if (copy_from_user(&arg, argp, sizeof(arg)))
-			return -EFAULT;
+		ret = copy_io_uring_getevents_arg_from_user(ctx, &arg, argp, argsz);
+		if (ret)
+			return ret;
 	}
 	return 0;
 }
-static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz,
+static int io_get_ext_arg(struct io_ring_ctx *ctx, unsigned int flags,
+			  const void __user *argp, size_t *argsz,
 #ifdef CONFIG_CHERI_PURECAP_UABI
 			  struct __kernel_timespec * __capability *ts,
 			  const sigset_t * __capability *sig)
@@ -3143,6 +3192,7 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz
 #endif
 {
 	struct io_uring_getevents_arg arg;
+	int ret;

 	/*
 	 * If EXT_ARG isn't set, then we have no timespec and the argp pointer
@@ -3158,10 +3208,9 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz
 	 * EXT_ARG is set - ensure we agree on the size of it and copy in our
 	 * timespec and sigset_t pointers if good.
 	 */
-	if (*argsz != sizeof(arg))
-		return -EINVAL;
-	if (copy_from_user(&arg, argp, sizeof(arg)))
-		return -EFAULT;
+	ret = copy_io_uring_getevents_arg_from_user(ctx, &arg, argp, *argsz);
+	if (ret)
+		return ret;
 	if (arg.pad)
 		return -EINVAL;
 	*sig = u64_to_user_ptr(arg.sigmask);
@@ -3268,7 +3317,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	 */
 	mutex_lock(&ctx->uring_lock);
 iopoll_locked:
-	ret2 = io_validate_ext_arg(flags, argp, argsz);
+	ret2 = io_validate_ext_arg(ctx, flags, argp, argsz);
 	if (likely(!ret2)) {
 		min_complete = min(min_complete, ctx->cq_entries);
@@ -3279,7 +3328,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		const sigset_t __user *sig;
 		struct __kernel_timespec __user *ts;
-		ret2 = io_get_ext_arg(flags, argp, &argsz, &ts, &sig);
+		ret2 = io_get_ext_arg(ctx, flags, argp, &argsz, &ts, &sig);
 		if (likely(!ret2)) {
 			min_complete = min(min_complete, ctx->cq_entries);
@@ -3329,6 +3378,9 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 {
 	struct io_rings *rings;
 	size_t size, cqes_offset, sq_array_offset;
+	size_t sqe_size = ctx->compat ?
+			  sizeof(struct compat_io_uring_sqe) :
+			  sizeof(struct io_uring_sqe);

 	/* make sure these are sane, as we already accounted them */
 	ctx->sq_entries = p->sq_entries;
@@ -3351,9 +3403,9 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 	rings->cq_ring_entries = p->cq_entries;
 	if (p->flags & IORING_SETUP_SQE128)
-		size = array_size(2 * sizeof(struct io_uring_sqe), p->sq_entries);
+		size = array_size(2 * sqe_size, p->sq_entries);
 	else
-		size = array_size(sizeof(struct io_uring_sqe), p->sq_entries);
+		size = array_size(sqe_size, p->sq_entries);
 	if (size == SIZE_MAX) {
 		io_mem_free(ctx->rings);
 		ctx->rings = NULL;
@@ -4107,48 +4159,45 @@ static int __init io_uring_init(void)
 #define BUILD_BUG_SQE_ELEM_SIZE(eoffset, esize, ename) \
 	__BUILD_BUG_VERIFY_OFFSET_SIZE(struct io_uring_sqe, eoffset, esize, ename)
 	BUILD_BUG_ON(sizeof(struct io_uring_sqe) != 64);
-	BUILD_BUG_SQE_ELEM(0,  __u8,   opcode);
-	BUILD_BUG_SQE_ELEM(1,  __u8,   flags);
-	BUILD_BUG_SQE_ELEM(2,  __u16,  ioprio);
-	BUILD_BUG_SQE_ELEM(4,  __s32,  fd);
-	BUILD_BUG_SQE_ELEM(8,  __u64,  off);
-	BUILD_BUG_SQE_ELEM(8,  __u64,  addr2);
-	BUILD_BUG_SQE_ELEM(8,  __u32,  cmd_op);
+	BUILD_BUG_SQE_ELEM(0,  __u8,  opcode);
+	BUILD_BUG_SQE_ELEM(1,  __u8,  flags);
+	BUILD_BUG_SQE_ELEM(2,  __u16, ioprio);
+	BUILD_BUG_SQE_ELEM(4,  __s32, fd);
+	BUILD_BUG_SQE_ELEM(8,  __u64, off);
+	BUILD_BUG_SQE_ELEM(8,  __u64, addr2);
+	BUILD_BUG_SQE_ELEM(8,  __u32, cmd_op);
 	BUILD_BUG_SQE_ELEM(12, __u32, __pad1);
-	BUILD_BUG_SQE_ELEM(16, __u64,  addr);
-	BUILD_BUG_SQE_ELEM(16, __u64,  splice_off_in);
-	BUILD_BUG_SQE_ELEM(24, __u32,  len);
-	BUILD_BUG_SQE_ELEM(28, __kernel_rwf_t, rw_flags);
-	BUILD_BUG_SQE_ELEM(28, /* compat */   int, rw_flags);
-	BUILD_BUG_SQE_ELEM(28, /* compat */ __u32, rw_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32,  fsync_flags);
-	BUILD_BUG_SQE_ELEM(28, /* compat */ __u16,  poll_events);
-	BUILD_BUG_SQE_ELEM(28, __u32,  poll32_events);
-	BUILD_BUG_SQE_ELEM(28, __u32,  sync_range_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32,  msg_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32,  timeout_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32,  accept_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32,  cancel_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32,  open_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32,  statx_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32,  fadvise_advice);
-	BUILD_BUG_SQE_ELEM(28, __u32,  splice_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32,  rename_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32,  unlink_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32,  hardlink_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32,  xattr_flags);
-	BUILD_BUG_SQE_ELEM(28, __u32,  msg_ring_flags);
-	BUILD_BUG_SQE_ELEM(32, __u64,  user_data);
-	BUILD_BUG_SQE_ELEM(40, __u16,  buf_index);
-	BUILD_BUG_SQE_ELEM(40, __u16,  buf_group);
-	BUILD_BUG_SQE_ELEM(42, __u16,  personality);
-	BUILD_BUG_SQE_ELEM(44, __s32,  splice_fd_in);
-	BUILD_BUG_SQE_ELEM(44, __u32,  file_index);
-	BUILD_BUG_SQE_ELEM(44, __u16,  addr_len);
-	BUILD_BUG_SQE_ELEM(46, __u16,  __pad3[0]);
-	BUILD_BUG_SQE_ELEM(48, __u64,  addr3);
+	BUILD_BUG_SQE_ELEM(16, __u64, addr);
+	BUILD_BUG_SQE_ELEM(16, __u64, splice_off_in);
+	BUILD_BUG_SQE_ELEM(24, __u32, len);
+	BUILD_BUG_SQE_ELEM(28, __kernel_rwf_t, rw_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, fsync_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, poll32_events);
+	BUILD_BUG_SQE_ELEM(28, __u32, sync_range_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, msg_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, timeout_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, accept_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, cancel_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, open_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, statx_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, fadvise_advice);
+	BUILD_BUG_SQE_ELEM(28, __u32, splice_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, rename_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, unlink_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, hardlink_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, xattr_flags);
+	BUILD_BUG_SQE_ELEM(28, __u32, msg_ring_flags);
+	BUILD_BUG_SQE_ELEM(32, __u64, user_data);
+	BUILD_BUG_SQE_ELEM(40, __u16, buf_index);
+	BUILD_BUG_SQE_ELEM(40, __u16, buf_group);
+	BUILD_BUG_SQE_ELEM(42, __u16, personality);
+	BUILD_BUG_SQE_ELEM(44, __s32, splice_fd_in);
+	BUILD_BUG_SQE_ELEM(44, __u32, file_index);
+	BUILD_BUG_SQE_ELEM(44, __u16, addr_len);
+	BUILD_BUG_SQE_ELEM(46, __u16, __pad3[0]);
+	BUILD_BUG_SQE_ELEM(48, __u64, addr3);
 	BUILD_BUG_SQE_ELEM_SIZE(48, 0, cmd);
-	BUILD_BUG_SQE_ELEM(56, __u64,  __pad2);
+	BUILD_BUG_SQE_ELEM(56, __u64, __pad2);
 	BUILD_BUG_ON(sizeof(struct io_uring_files_update) !=
 		     sizeof(struct io_uring_rsrc_update));
@@ -4160,6 +4209,65 @@ static int __init io_uring_init(void)
 	BUILD_BUG_ON(offsetof(struct io_uring_buf, resv) !=
 		     offsetof(struct io_uring_buf_ring, tail));
+#ifdef CONFIG_COMPAT64
+#define BUILD_BUG_COMPAT_SQE_ELEM(eoffset, etype, ename) \
+	__BUILD_BUG_VERIFY_OFFSET_SIZE(struct compat_io_uring_sqe, eoffset, sizeof(etype), ename)
+#define BUILD_BUG_COMPAT_SQE_ELEM_SIZE(eoffset, esize, ename) \
+	__BUILD_BUG_VERIFY_OFFSET_SIZE(struct compat_io_uring_sqe, eoffset, esize, ename)
+	BUILD_BUG_ON(sizeof(struct compat_io_uring_sqe) != 64);
+	BUILD_BUG_COMPAT_SQE_ELEM(0, __u8, opcode);
+	BUILD_BUG_COMPAT_SQE_ELEM(1, __u8, flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(2, __u16, ioprio);
+	BUILD_BUG_COMPAT_SQE_ELEM(4, __s32, fd);
+	BUILD_BUG_COMPAT_SQE_ELEM(8, __u64, off);
+	BUILD_BUG_COMPAT_SQE_ELEM(8, __u64, addr2);
+	BUILD_BUG_COMPAT_SQE_ELEM(8, __u32, cmd_op);
+	BUILD_BUG_COMPAT_SQE_ELEM(12, __u32, __pad1);
+	BUILD_BUG_COMPAT_SQE_ELEM(16, __u64, addr);
+	BUILD_BUG_COMPAT_SQE_ELEM(16, __u64, splice_off_in);
+	BUILD_BUG_COMPAT_SQE_ELEM(24, __u32, len);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __kernel_rwf_t, rw_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, /* compat */ int, rw_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, /* compat */ __u32, rw_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, fsync_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, /* compat */ __u16, poll_events);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, poll32_events);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, sync_range_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, msg_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, timeout_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, accept_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, cancel_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, open_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, statx_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, fadvise_advice);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, splice_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, rename_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, unlink_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, hardlink_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, xattr_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(28, __u32, msg_ring_flags);
+	BUILD_BUG_COMPAT_SQE_ELEM(32, __u64, user_data);
+	BUILD_BUG_COMPAT_SQE_ELEM(40, __u16, buf_index);
+	BUILD_BUG_COMPAT_SQE_ELEM(40, __u16, buf_group);
+	BUILD_BUG_COMPAT_SQE_ELEM(42, __u16, personality);
+	BUILD_BUG_COMPAT_SQE_ELEM(44, __s32, splice_fd_in);
+	BUILD_BUG_COMPAT_SQE_ELEM(44, __u32, file_index);
+	BUILD_BUG_COMPAT_SQE_ELEM(44, __u16, addr_len);
+	BUILD_BUG_COMPAT_SQE_ELEM(46, __u16, __pad3[0]);
+	BUILD_BUG_COMPAT_SQE_ELEM(48, __u64, addr3);
+	BUILD_BUG_COMPAT_SQE_ELEM_SIZE(48, 0, cmd);
+	BUILD_BUG_COMPAT_SQE_ELEM(56, __u64, __pad2);
+
+	BUILD_BUG_ON(sizeof(struct compat_io_uring_files_update) !=
+		     sizeof(struct compat_io_uring_rsrc_update));
+	BUILD_BUG_ON(sizeof(struct compat_io_uring_rsrc_update) >
+		     sizeof(struct compat_io_uring_rsrc_update2));
+
+	BUILD_BUG_ON(offsetof(struct compat_io_uring_buf_ring, bufs) != 0);
+	BUILD_BUG_ON(offsetof(struct compat_io_uring_buf, resv) !=
+		     offsetof(struct compat_io_uring_buf_ring, tail));
+#endif /* CONFIG_COMPAT64 */
+
	/* should fit into one byte */
	BUILD_BUG_ON(SQE_VALID_FLAGS >= (1 << 8));
	BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8));
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 50bc3af449534..fb2711770bfb0 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -5,6 +5,7 @@
 #include <linux/lockdep.h>
 #include <linux/io_uring_types.h>
 #include "io-wq.h"
+#include "uring_cmd.h"
 #include "slist.h"
 #include "filetable.h"
@@ -24,7 +25,7 @@ enum { IOU_STOP_MULTISHOT = -ECANCELED, };
-struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow); +void *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow); bool io_req_cqe_overflow(struct io_kiocb *req); int io_run_task_work_sig(struct io_ring_ctx *ctx); int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked); @@ -93,8 +94,67 @@ static inline void io_cq_lock(struct io_ring_ctx *ctx)
void io_cq_unlock_post(struct io_ring_ctx *ctx);
-static inline struct io_uring_cqe *io_get_cqe_overflow(struct io_ring_ctx *ctx,
-						       bool overflow)
+#ifdef CONFIG_COMPAT64
+static inline void convert_compat_io_uring_cqe(struct io_ring_ctx *ctx,
+					       struct io_uring_cqe *cqe,
+					       const struct compat_io_uring_cqe *compat_cqe)
+{
+	cqe->user_data = READ_ONCE(compat_cqe->user_data);
+	cqe->res = READ_ONCE(compat_cqe->res);
+	cqe->flags = READ_ONCE(compat_cqe->flags);
+
+	if (ctx->flags & IORING_SETUP_CQE32) {
+		cqe->big_cqe[0] = READ_ONCE(compat_cqe->big_cqe[0]);
+		cqe->big_cqe[1] = READ_ONCE(compat_cqe->big_cqe[1]);
+	}
+}
+
+static inline void convert_compat_io_uring_sqe(struct io_ring_ctx *ctx,
+					       struct io_uring_sqe *sqe,
+					       const struct compat_io_uring_sqe *compat_sqe)
+{
+/*
+ * The struct io_uring_sqe contains anonymous unions and there is no field
+ * keeping track of which union's member is active. Because in all the cases,
+ * the unions are between integral types and the types are compatible, use the
+ * largest member of each union to perform the copy. Use this compile-time check
+ * to ensure that the union's members are not truncated during the conversion.
+ */
+#define BUILD_BUG_COMPAT_SQE_UNION_ELEM(elem1, elem2) \
+	BUILD_BUG_ON(sizeof_field(struct compat_io_uring_sqe, elem1) != \
+		     (offsetof(struct compat_io_uring_sqe, elem2) - \
+		      offsetof(struct compat_io_uring_sqe, elem1)))
+
+	sqe->opcode = READ_ONCE(compat_sqe->opcode);
+	sqe->flags = READ_ONCE(compat_sqe->flags);
+	sqe->ioprio = READ_ONCE(compat_sqe->ioprio);
+	sqe->fd = READ_ONCE(compat_sqe->fd);
+	BUILD_BUG_COMPAT_SQE_UNION_ELEM(addr2, addr);
+	sqe->addr2 = READ_ONCE(compat_sqe->addr2);
+	BUILD_BUG_COMPAT_SQE_UNION_ELEM(addr, len);
+	sqe->addr = READ_ONCE(compat_sqe->addr);
+	sqe->len = READ_ONCE(compat_sqe->len);
+	BUILD_BUG_COMPAT_SQE_UNION_ELEM(rw_flags, user_data);
+	sqe->rw_flags = READ_ONCE(compat_sqe->rw_flags);
+	sqe->user_data = READ_ONCE(compat_sqe->user_data);
+	BUILD_BUG_COMPAT_SQE_UNION_ELEM(buf_index, personality);
+	sqe->buf_index = READ_ONCE(compat_sqe->buf_index);
+	sqe->personality = READ_ONCE(compat_sqe->personality);
+	BUILD_BUG_COMPAT_SQE_UNION_ELEM(splice_fd_in, addr3);
+	sqe->splice_fd_in = READ_ONCE(compat_sqe->splice_fd_in);
+	if (sqe->opcode == IORING_OP_URING_CMD) {
+		size_t compat_cmd_size = compat_uring_cmd_pdu_size(ctx->flags &
+								   IORING_SETUP_SQE128);
+
+		memcpy(sqe->cmd, compat_sqe->cmd, compat_cmd_size);
+	} else
+		sqe->addr3 = READ_ONCE(compat_sqe->addr3);
+#undef BUILD_BUG_COMPAT_SQE_UNION_ELEM
+}
+#endif /* CONFIG_COMPAT64 */
+
+static inline void *io_get_cqe_overflow(struct io_ring_ctx *ctx,
+					bool overflow)
 {
 	if (likely(ctx->cqe_cached < ctx->cqe_sentinel)) {
 		struct io_uring_cqe *cqe = ctx->cqe_cached;
@@ -109,15 +169,46 @@ static inline struct io_uring_cqe *io_get_cqe_overflow(struct io_ring_ctx *ctx,
 	return __io_get_cqe(ctx, overflow);
 }
-static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx) +static inline void *io_get_cqe(struct io_ring_ctx *ctx) { return io_get_cqe_overflow(ctx, false); }
+static inline void __io_fill_cqe_any(struct io_ring_ctx *ctx, struct io_uring_cqe *cqe, + u64 user_data, s32 res, u32 cflags, + u64 extra1, u64 extra2) +{ +#ifdef CONFIG_COMPAT64 + if (ctx->compat) { + struct compat_io_uring_cqe *compat_cqe = (struct compat_io_uring_cqe *)cqe; + + WRITE_ONCE(compat_cqe->user_data, user_data); + WRITE_ONCE(compat_cqe->res, res); + WRITE_ONCE(compat_cqe->flags, cflags); + + if (ctx->flags & IORING_SETUP_CQE32) { + WRITE_ONCE(compat_cqe->big_cqe[0], extra1); + WRITE_ONCE(compat_cqe->big_cqe[1], extra2); + } + return; + } +#endif + WRITE_ONCE(cqe->user_data, user_data); + WRITE_ONCE(cqe->res, res); + WRITE_ONCE(cqe->flags, cflags); + + if (ctx->flags & IORING_SETUP_CQE32) { + WRITE_ONCE(cqe->big_cqe[0], extra1); + WRITE_ONCE(cqe->big_cqe[1], extra2); + } +} + static inline bool __io_fill_cqe_req(struct io_ring_ctx *ctx, struct io_kiocb *req) { struct io_uring_cqe *cqe; + u64 extra1 = 0; + u64 extra2 = 0;
/* * If we can't get a cq entry, userspace overflowed the @@ -128,24 +219,17 @@ static inline bool __io_fill_cqe_req(struct io_ring_ctx *ctx, if (unlikely(!cqe)) return io_req_cqe_overflow(req);
+ if (ctx->flags & IORING_SETUP_CQE32 && req->flags & REQ_F_CQE32_INIT) { + extra1 = req->extra1; + extra2 = req->extra2; + } + trace_io_uring_complete(req->ctx, req, req->cqe.user_data, req->cqe.res, req->cqe.flags, - (req->flags & REQ_F_CQE32_INIT) ? req->extra1 : 0, - (req->flags & REQ_F_CQE32_INIT) ? req->extra2 : 0); + extra1, extra2);
- memcpy(cqe, &req->cqe, sizeof(*cqe)); - - if (ctx->flags & IORING_SETUP_CQE32) { - u64 extra1 = 0, extra2 = 0; - - if (req->flags & REQ_F_CQE32_INIT) { - extra1 = req->extra1; - extra2 = req->extra2; - } - - WRITE_ONCE(cqe->big_cqe[0], extra1); - WRITE_ONCE(cqe->big_cqe[1], extra2); - } + __io_fill_cqe_any(ctx, cqe, req->cqe.user_data, req->cqe.res, + req->cqe.flags, extra1, extra2); return true; }
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index e2c46889d5fab..0fec941644260 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -16,6 +16,9 @@ #include "kbuf.h"
 #define IO_BUFFER_LIST_BUF_PER_PAGE (PAGE_SIZE / sizeof(struct io_uring_buf))
+#ifdef CONFIG_COMPAT64
+#define IO_BUFFER_LIST_COMPAT_BUF_PER_PAGE (PAGE_SIZE / sizeof(struct compat_io_uring_buf))
+#endif
#define BGID_ARRAY 64
@@ -28,6 +31,42 @@ struct io_provide_buf { __u16 bid; };
+#ifdef CONFIG_COMPAT64
+struct compat_io_uring_buf_reg {
+	__u64	ring_addr;
+	__u32	ring_entries;
+	__u16	bgid;
+	__u16	pad;
+	__u64	resv[3];
+};
+
+static int get_compat_io_uring_buf_reg(struct io_uring_buf_reg *reg,
+				       const void __user *user_reg)
+{
+	struct compat_io_uring_buf_reg compat_reg;
+
+	if (unlikely(copy_from_user(&compat_reg, user_reg, sizeof(compat_reg))))
+		return -EFAULT;
+	reg->ring_addr = compat_reg.ring_addr;
+	reg->ring_entries = compat_reg.ring_entries;
+	reg->bgid = compat_reg.bgid;
+	reg->pad = compat_reg.pad;
+	memcpy(reg->resv, compat_reg.resv, sizeof(reg->resv));
+	return 0;
+}
+#endif
+
+static int copy_io_uring_buf_reg_from_user(struct io_ring_ctx *ctx,
+					   struct io_uring_buf_reg *reg,
+					   const void __user *arg)
+{
+#ifdef CONFIG_COMPAT64
+	if (ctx->compat)
+		return get_compat_io_uring_buf_reg(reg, arg);
+#endif
+	return copy_from_user(reg, arg, sizeof(*reg));
+}
+
 static inline struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx,
 							unsigned int bgid)
 {
@@ -125,6 +164,41 @@ static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
 	return NULL;
 }
+#ifdef CONFIG_COMPAT64 +static void __user *io_ring_buffer_select_compat(struct io_kiocb *req, size_t *len, + struct io_buffer_list *bl, + unsigned int issue_flags) +{ + struct compat_io_uring_buf_ring *br = bl->buf_ring_compat; + struct compat_io_uring_buf *buf; + __u16 head = bl->head; + + if (unlikely(smp_load_acquire(&br->tail) == head)) + return NULL; + + head &= bl->mask; + if (head < IO_BUFFER_LIST_COMPAT_BUF_PER_PAGE) { + buf = &br->bufs[head]; + } else { + int off = head & (IO_BUFFER_LIST_COMPAT_BUF_PER_PAGE - 1); + int index = head / IO_BUFFER_LIST_COMPAT_BUF_PER_PAGE; + buf = page_address(bl->buf_pages[index]); + buf += off; + } + if (*len == 0 || *len > buf->len) + *len = buf->len; + req->flags |= REQ_F_BUFFER_RING; + req->buf_list = bl; + req->buf_index = buf->bid; + + if (issue_flags & IO_URING_F_UNLOCKED || !file_can_poll(req->file)) { + req->buf_list = NULL; + bl->head++; + } + return compat_ptr(buf->addr); +} +#endif /* CONFIG_COMPAT64 */ + static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len, struct io_buffer_list *bl, unsigned int issue_flags) @@ -168,6 +242,17 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len, return u64_to_user_ptr(buf->addr); }
+static void __user *io_ring_buffer_select_any(struct io_kiocb *req, size_t *len, + struct io_buffer_list *bl, + unsigned int issue_flags) +{ +#ifdef CONFIG_COMPAT64 + if (req->ctx->compat) + return io_ring_buffer_select_compat(req, len, bl, issue_flags); +#endif + return io_ring_buffer_select(req, len, bl, issue_flags); +} + void __user *io_buffer_select(struct io_kiocb *req, size_t *len, unsigned int issue_flags) { @@ -180,7 +265,7 @@ void __user *io_buffer_select(struct io_kiocb *req, size_t *len, bl = io_buffer_get_list(ctx, req->buf_index); if (likely(bl)) { if (bl->buf_nr_pages) - ret = io_ring_buffer_select(req, len, bl, issue_flags); + ret = io_ring_buffer_select_any(req, len, bl, issue_flags); else ret = io_provided_buffer_select(req, len, bl); } @@ -215,9 +300,12 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx, return 0;
if (bl->buf_nr_pages) { + __u16 tail = ctx->compat ? + bl->buf_ring_compat->tail : + bl->buf_ring->tail; int j;
- i = bl->buf_ring->tail - bl->head; + i = tail - bl->head; for (j = 0; j < bl->buf_nr_pages; j++) unpin_user_page(bl->buf_pages[j]); kvfree(bl->buf_pages); @@ -469,13 +557,13 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) { - struct io_uring_buf_ring *br; struct io_uring_buf_reg reg; struct io_buffer_list *bl, *free_bl = NULL; struct page **pages; + size_t pages_size; int nr_pages;
-	if (copy_from_user(&reg, arg, sizeof(reg)))
+	if (copy_io_uring_buf_reg_from_user(ctx, &reg, arg))
 		return -EFAULT;
if (reg.pad || reg.resv[0] || reg.resv[1] || reg.resv[2]) @@ -508,19 +596,19 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) return -ENOMEM; }
- pages = io_pin_pages(reg.ring_addr, - struct_size(br, bufs, reg.ring_entries), - &nr_pages); + pages_size = ctx->compat ? + size_mul(sizeof(struct compat_io_uring_buf), reg.ring_entries) : + size_mul(sizeof(struct io_uring_buf), reg.ring_entries); + pages = io_pin_pages(reg.ring_addr, pages_size, &nr_pages); if (IS_ERR(pages)) { kfree(free_bl); return PTR_ERR(pages); }
- br = page_address(pages[0]); bl->buf_pages = pages; bl->buf_nr_pages = nr_pages; bl->nr_entries = reg.ring_entries; - bl->buf_ring = br; + bl->buf_ring = page_address(pages[0]); bl->mask = reg.ring_entries - 1; io_buffer_add_list(ctx, bl, reg.bgid); return 0; @@ -531,7 +619,7 @@ int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) struct io_uring_buf_reg reg; struct io_buffer_list *bl;
-	if (copy_from_user(&reg, arg, sizeof(reg)))
+	if (copy_io_uring_buf_reg_from_user(ctx, &reg, arg))
 		return -EFAULT;
 	if (reg.pad || reg.resv[0] || reg.resv[1] || reg.resv[2])
 		return -EINVAL;
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index c23e15d7d3caf..1aa5bbbc5d628 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -2,6 +2,7 @@
 #ifndef IOU_KBUF_H
 #define IOU_KBUF_H
+#include <linux/io_uring_types.h>
 #include <uapi/linux/io_uring.h>
struct io_buffer_list { @@ -13,7 +14,10 @@ struct io_buffer_list { struct list_head buf_list; struct { struct page **buf_pages; - struct io_uring_buf_ring *buf_ring; + union { + struct io_uring_buf_ring *buf_ring; + struct compat_io_uring_buf_ring *buf_ring_compat; + }; }; }; __u16 bgid; diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 41e192de9e8a7..7e6428a44d625 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -23,6 +23,106 @@ struct io_rsrc_update { u32 offset; };
+#ifdef CONFIG_COMPAT64
+static int get_compat_io_uring_rsrc_update(struct io_uring_rsrc_update2 *up2,
+					   const void __user *user_up)
+{
+	struct compat_io_uring_rsrc_update compat_up;
+
+	if (unlikely(copy_from_user(&compat_up, user_up, sizeof(compat_up))))
+		return -EFAULT;
+	up2->offset = compat_up.offset;
+	up2->resv = compat_up.resv;
+	up2->data = compat_up.data;
+	return 0;
+}
+
+static int get_compat_io_uring_rsrc_update2(struct io_uring_rsrc_update2 *up2,
+					    const void __user *user_up2)
+{
+	struct compat_io_uring_rsrc_update2 compat_up2;
+
+	if (unlikely(copy_from_user(&compat_up2, user_up2, sizeof(compat_up2))))
+		return -EFAULT;
+	up2->offset = compat_up2.offset;
+	up2->resv = compat_up2.resv;
+	up2->data = compat_up2.data;
+	up2->tags = compat_up2.tags;
+	up2->nr = compat_up2.nr;
+	up2->resv2 = compat_up2.resv2;
+	return 0;
+}
+
+static int get_compat_io_uring_rsrc_register(struct io_uring_rsrc_register *rr,
+					     const void __user *user_rr)
+{
+	struct compat_io_uring_rsrc_register compat_rr;
+
+	if (unlikely(copy_from_user(&compat_rr, user_rr, sizeof(compat_rr))))
+		return -EFAULT;
+	rr->nr = compat_rr.nr;
+	rr->flags = compat_rr.flags;
+	rr->resv2 = compat_rr.resv2;
+	rr->data = compat_rr.data;
+	rr->tags = compat_rr.tags;
+	return 0;
+}
+#endif /* CONFIG_COMPAT64 */
+
+static int copy_io_uring_rsrc_update_from_user(struct io_ring_ctx *ctx,
+					       struct io_uring_rsrc_update2 *up2,
+					       const void __user *arg)
+{
+#ifdef CONFIG_COMPAT64
+	if (ctx->compat)
+		return get_compat_io_uring_rsrc_update(up2, arg);
+#endif
+	return copy_from_user(up2, arg, sizeof(struct io_uring_rsrc_update));
+}
+
+static int copy_io_uring_rsrc_update2_from_user(struct io_ring_ctx *ctx,
+						struct io_uring_rsrc_update2 *up2,
+						const void __user *arg,
+						size_t size)
+{
+#ifdef CONFIG_COMPAT64
+	if (ctx->compat) {
+		if (size != sizeof(struct compat_io_uring_rsrc_update2))
+			return -EINVAL;
+		if (get_compat_io_uring_rsrc_update2(up2, arg))
+			return -EFAULT;
+		return 0;
+	}
+#endif
+	if (size != sizeof(*up2))
+		return -EINVAL;
+	if (copy_from_user(up2, arg, sizeof(*up2)))
+		return -EFAULT;
+	return 0;
+}
+
+static int copy_io_uring_rsrc_register_from_user(struct io_ring_ctx *ctx,
+						 struct io_uring_rsrc_register *rr,
+						 const void __user *arg,
+						 size_t size)
+{
+#ifdef CONFIG_COMPAT64
+	if (ctx->compat) {
+		if (size != sizeof(struct compat_io_uring_rsrc_register))
+			return -EINVAL;
+		if (get_compat_io_uring_rsrc_register(rr, arg))
+			return -EFAULT;
+		return 0;
+	}
+#endif
+	/* keep it extendible */
+	if (size != sizeof(*rr))
+		return -EINVAL;
+	if (copy_from_user(rr, arg, size))
+		return -EFAULT;
+	return 0;
+}
+
 static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 				  struct io_mapped_ubuf **pimu,
 				  struct page **last_hpage);
@@ -597,12 +697,14 @@ int io_register_files_update(struct io_ring_ctx *ctx, void __user *arg,
 			     unsigned nr_args)
 {
 	struct io_uring_rsrc_update2 up;
+	int ret;
if (!nr_args) return -EINVAL; memset(&up, 0, sizeof(up)); - if (copy_from_user(&up, arg, sizeof(struct io_uring_rsrc_update))) - return -EFAULT; + ret = copy_io_uring_rsrc_update_from_user(ctx, &up, arg); + if (ret) + return ret; if (up.resv || up.resv2) return -EINVAL; return __io_register_rsrc_update(ctx, IORING_RSRC_FILE, &up, nr_args); @@ -612,11 +714,11 @@ int io_register_rsrc_update(struct io_ring_ctx *ctx, void __user *arg, unsigned size, unsigned type) { struct io_uring_rsrc_update2 up; + int ret;
- if (size != sizeof(up)) - return -EINVAL; - if (copy_from_user(&up, arg, sizeof(up))) - return -EFAULT; + ret = copy_io_uring_rsrc_update2_from_user(ctx, &up, arg, size); + if (ret) + return ret; if (!up.nr || up.resv || up.resv2) return -EINVAL; return __io_register_rsrc_update(ctx, type, &up, up.nr); @@ -626,14 +728,11 @@ __cold int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg, unsigned int size, unsigned int type) { struct io_uring_rsrc_register rr; + int ret;
- /* keep it extendible */ - if (size != sizeof(rr)) - return -EINVAL; - - memset(&rr, 0, sizeof(rr)); - if (copy_from_user(&rr, arg, size)) - return -EFAULT; + ret = copy_io_uring_rsrc_register_from_user(ctx, &rr, arg, size); + if (ret) + return ret; if (!rr.nr || rr.resv2) return -EINVAL; if (rr.flags & ~IORING_RSRC_REGISTER_SPARSE) diff --git a/io_uring/tctx.c b/io_uring/tctx.c index 96f77450cf4e2..6ab9916ed3844 100644 --- a/io_uring/tctx.c +++ b/io_uring/tctx.c @@ -12,6 +12,32 @@ #include "io_uring.h" #include "tctx.h"
+#ifdef CONFIG_COMPAT64 +static int get_compat_io_uring_rsrc_update(struct io_uring_rsrc_update *up, + const void __user *user_up) +{ + struct compat_io_uring_rsrc_update compat_up; + + if (unlikely(copy_from_user(&compat_up, user_up, sizeof(compat_up)))) + return -EFAULT; + up->offset = compat_up.offset; + up->resv = compat_up.resv; + up->data = compat_up.data; + return 0; +} +#endif /* CONFIG_COMPAT64 */ + +static int copy_io_uring_rsrc_update_from_user(struct io_ring_ctx *ctx, + struct io_uring_rsrc_update *up, + const void __user *arg) +{ +#ifdef CONFIG_COMPAT64 + if (ctx->compat) + return get_compat_io_uring_rsrc_update(up, arg); +#endif + return copy_from_user(up, arg, sizeof(struct io_uring_rsrc_update)); +} + static struct io_wq *io_init_wq_offload(struct io_ring_ctx *ctx, struct task_struct *task) { @@ -244,8 +270,6 @@ static int io_ring_add_registered_fd(struct io_uring_task *tctx, int fd, int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg, unsigned nr_args) { - struct io_uring_rsrc_update __user *arg = __arg; - struct io_uring_rsrc_update reg; struct io_uring_task *tctx; int ret, i;
@@ -260,9 +284,17 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
tctx = current->io_uring; for (i = 0; i < nr_args; i++) { + void __user *arg; + __u32 __user *arg_offset; + struct io_uring_rsrc_update reg; int start, end;
-		if (copy_from_user(&reg, &arg[i], sizeof(reg))) {
+		if (ctx->compat)
+			arg = &((struct compat_io_uring_rsrc_update __user *)__arg)[i];
+		else
+			arg = &((struct io_uring_rsrc_update __user *)__arg)[i];
+
+		if (copy_io_uring_rsrc_update_from_user(ctx, &reg, arg)) {
 			ret = -EFAULT;
 			break;
 		}
@@ -288,8 +320,10 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
 		if (ret < 0)
 			break;
 		reg.offset = ret;
-		if (put_user(reg.offset, &arg[i].offset)) {
+		arg_offset = ctx->compat ?
+			&((struct compat_io_uring_rsrc_update __user *)arg)->offset :
+			&((struct io_uring_rsrc_update __user *)arg)->offset;
+		if (put_user(reg.offset, arg_offset)) {
 			fput(tctx->registered_rings[reg.offset]);
 			tctx->registered_rings[reg.offset] = NULL;
 			ret = -EFAULT;
@@ -303,9 +337,7 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
 int io_ringfd_unregister(struct io_ring_ctx *ctx, void __user *__arg,
 			 unsigned nr_args)
 {
-	struct io_uring_rsrc_update __user *arg = __arg;
 	struct io_uring_task *tctx = current->io_uring;
-	struct io_uring_rsrc_update reg;
 	int ret = 0, i;
if (!nr_args || nr_args > IO_RINGFD_REG_MAX) @@ -314,10 +346,19 @@ int io_ringfd_unregister(struct io_ring_ctx *ctx, void __user *__arg, return 0;
 	for (i = 0; i < nr_args; i++) {
-		if (copy_from_user(&reg, &arg[i], sizeof(reg))) {
+		void __user *arg;
+		struct io_uring_rsrc_update reg;
+
+		if (ctx->compat)
+			arg = &((struct compat_io_uring_rsrc_update __user *)__arg)[i];
+		else
+			arg = &((struct io_uring_rsrc_update __user *)__arg)[i];
+
+		if (copy_io_uring_rsrc_update_from_user(ctx, &reg, arg)) {
 			ret = -EFAULT;
 			break;
 		}
+
 		if (reg.resv || reg.data || reg.offset >= IO_RINGFD_REG_MAX) {
 			ret = -EINVAL;
 			break;
diff --git a/io_uring/uring_cmd.h b/io_uring/uring_cmd.h
index 7c6697d13cb2e..d67bb30ad543b 100644
--- a/io_uring/uring_cmd.h
+++ b/io_uring/uring_cmd.h
@@ -11,3 +11,10 @@ int io_uring_cmd_prep_async(struct io_kiocb *req);
 #define uring_cmd_pdu_size(is_sqe128) \
 	((1 + !!(is_sqe128)) * sizeof(struct io_uring_sqe) - \
 	 offsetof(struct io_uring_sqe, cmd))
+
+#ifdef CONFIG_COMPAT64
+#define compat_uring_cmd_pdu_size(is_sqe128) \
+	((1 + !!(is_sqe128)) * sizeof(struct compat_io_uring_sqe) - \
+	 offsetof(struct compat_io_uring_sqe, cmd))
+#endif
+
Some members of the io_uring uAPI structs may contain user pointers. In the PCuABI, a user pointer is a 129-bit capability, so the __u64 type is not big enough to hold it. Use the __kernel_uintptr_t type instead, which is big enough on the affected architectures while remaining 64-bit on others.
The user_data field must be passed unchanged from the submission queue to the completion queue. As it is standard practice to store a pointer in user_data, expand the field to __kernel_uintptr_t. However, since the kernel never dereferences user_data, it is not converted with compat_ptr() in compat mode.
In addition, for the io_uring structs containing user pointers, use the special copy routines when copying user pointers from/to userspace.
Note that the structs io_uring_sqe and io_uring_cqe double in size in PCuABI. The setup flags IORING_SETUP_SQE128 and IORING_SETUP_CQE32 previously doubled the sizes of the two structs to 128 bytes and 32 bytes respectively. In PCuABI, the two flags still double the struct sizes, but, as the base structs have grown, the doubled sizes are now 256 bytes and 64 bytes respectively.
Signed-off-by: Tudor Cretu tudor.cretu@arm.com --- include/linux/io_uring_types.h | 4 +-- include/uapi/linux/io_uring.h | 62 +++++++++++++++++++--------------- io_uring/advise.c | 2 +- io_uring/cancel.c | 6 ++-- io_uring/cancel.h | 2 +- io_uring/epoll.c | 2 +- io_uring/fdinfo.c | 9 ++--- io_uring/fs.c | 16 ++++----- io_uring/io_uring.c | 49 +++++++++++++++++++++++---- io_uring/io_uring.h | 16 ++++----- io_uring/kbuf.c | 15 ++++---- io_uring/kbuf.h | 2 +- io_uring/msg_ring.c | 4 +-- io_uring/net.c | 18 +++++----- io_uring/openclose.c | 4 +-- io_uring/poll.c | 4 +-- io_uring/rsrc.c | 41 +++++++++++----------- io_uring/rw.c | 13 ++++--- io_uring/statx.c | 4 +-- io_uring/tctx.c | 4 +-- io_uring/timeout.c | 10 +++--- io_uring/uring_cmd.c | 5 +++ io_uring/xattr.c | 12 +++---- 23 files changed, 177 insertions(+), 127 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 737bc1aa67306..1407ccadf0575 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -604,8 +604,8 @@ struct io_task_work { };
struct io_cqe { - __u64 user_data; - __s32 res; + __kernel_uintptr_t user_data; + __s32 res; /* fd initially, then cflags for completion */ union { __u32 flags; diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 2df3225b562fa..14aa30151c5f9 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -11,6 +11,11 @@ #include <linux/fs.h> #include <linux/types.h> #include <linux/time_types.h> +#ifdef __KERNEL__ +#include <linux/stddef.h> +#else +#include <stddef.h> +#endif
#ifdef __cplusplus extern "C" { @@ -25,16 +30,16 @@ struct io_uring_sqe { __u16 ioprio; /* ioprio for the request */ __s32 fd; /* file descriptor to do IO on */ union { - __u64 off; /* offset into file */ - __u64 addr2; + __u64 off; /* offset into file */ + __kernel_uintptr_t addr2; struct { __u32 cmd_op; __u32 __pad1; }; }; union { - __u64 addr; /* pointer to buffer or iovecs */ - __u64 splice_off_in; + __kernel_uintptr_t addr; /* pointer to buffer or iovecs */ + __u64 splice_off_in; }; __u32 len; /* buffer size or number of iovecs */ union { @@ -58,7 +63,7 @@ struct io_uring_sqe { __u32 msg_ring_flags; __u32 uring_cmd_flags; }; - __u64 user_data; /* data to be passed back at completion time */ + __kernel_uintptr_t user_data; /* data to be passed back at completion time */ /* pack this to avoid bogus arm OABI complaints */ union { /* index into fixed buffers, if used */ @@ -78,12 +83,14 @@ struct io_uring_sqe { }; union { struct { - __u64 addr3; - __u64 __pad2[1]; + __kernel_uintptr_t addr3; + __u64 __pad2[1]; }; /* * If the ring is initialized with IORING_SETUP_SQE128, then - * this field is used for 80 bytes of arbitrary command data + * this field is used to double the size of the + * struct io_uring_sqe to store bytes of arbitrary + * command data, i.e. 80 bytes or 160 bytes in PCuABI */ __u8 cmd[0]; }; @@ -326,13 +333,14 @@ enum { * IO completion data structure (Completion Queue Entry) */ struct io_uring_cqe { - __u64 user_data; /* sqe->data submission passed back */ - __s32 res; /* result code for this event */ - __u32 flags; + __kernel_uintptr_t user_data; /* sqe->data submission passed back */ + __s32 res; /* result code for this event */ + __u32 flags;
/* * If the ring is initialized with IORING_SETUP_CQE32, then this field - * contains 16-bytes of padding, doubling the size of the CQE. + * doubles the size of the CQE, i.e. contains 16 bytes, or in PCuABI, + * 32 bytes of padding. */ __u64 big_cqe[]; }; @@ -504,7 +512,7 @@ enum { struct io_uring_files_update { __u32 offset; __u32 resv; - __aligned_u64 /* __s32 * */ fds; + __kernel_aligned_uintptr_t /* __s32 * */ fds; };
/* @@ -517,21 +525,21 @@ struct io_uring_rsrc_register { __u32 nr; __u32 flags; __u64 resv2; - __aligned_u64 data; - __aligned_u64 tags; + __kernel_aligned_uintptr_t data; + __kernel_aligned_uintptr_t tags; };
struct io_uring_rsrc_update { __u32 offset; __u32 resv; - __aligned_u64 data; + __kernel_aligned_uintptr_t data; };
struct io_uring_rsrc_update2 { __u32 offset; __u32 resv; - __aligned_u64 data; - __aligned_u64 tags; + __kernel_aligned_uintptr_t data; + __kernel_aligned_uintptr_t tags; __u32 nr; __u32 resv2; }; @@ -581,7 +589,7 @@ struct io_uring_restriction { };
struct io_uring_buf { - __u64 addr; + __kernel_uintptr_t addr; __u32 len; __u16 bid; __u16 resv; @@ -594,9 +602,7 @@ struct io_uring_buf_ring { * ring tail is overlaid with the io_uring_buf->resv field. */ struct { - __u64 resv1; - __u32 resv2; - __u16 resv3; + __u8 resv1[offsetof(struct io_uring_buf, resv)]; __u16 tail; }; struct io_uring_buf bufs[0]; @@ -605,7 +611,7 @@ struct io_uring_buf_ring {
/* argument for IORING_(UN)REGISTER_PBUF_RING */ struct io_uring_buf_reg { - __u64 ring_addr; + __kernel_uintptr_t ring_addr; __u32 ring_entries; __u16 bgid; __u16 pad; @@ -632,17 +638,17 @@ enum { };
struct io_uring_getevents_arg { - __u64 sigmask; - __u32 sigmask_sz; - __u32 pad; - __u64 ts; + __kernel_uintptr_t sigmask; + __u32 sigmask_sz; + __u32 pad; + __kernel_uintptr_t ts; };
/* * Argument for IORING_REGISTER_SYNC_CANCEL */ struct io_uring_sync_cancel_reg { - __u64 addr; + __kernel_uintptr_t addr; __s32 fd; __u32 flags; struct __kernel_timespec timeout; diff --git a/io_uring/advise.c b/io_uring/advise.c index 449c6f14649f7..5bb8094204979 100644 --- a/io_uring/advise.c +++ b/io_uring/advise.c @@ -23,7 +23,7 @@ struct io_fadvise {
struct io_madvise { struct file *file; - u64 addr; + __kernel_uintptr_t addr; u32 len; u32 advice; }; diff --git a/io_uring/cancel.c b/io_uring/cancel.c index befcb4aeae914..67c608983f0cf 100644 --- a/io_uring/cancel.c +++ b/io_uring/cancel.c @@ -19,7 +19,7 @@
struct io_cancel { struct file *file; - u64 addr; + __kernel_uintptr_t addr; u32 flags; s32 fd; }; @@ -43,7 +43,7 @@ static int get_compat_io_uring_sync_cancel_reg(struct io_uring_sync_cancel_reg *
 	if (unlikely(copy_from_user(&compat_sc, user_sc, sizeof(compat_sc))))
 		return -EFAULT;
-	sc->addr = compat_sc.addr;
+	sc->addr = (__kernel_uintptr_t)compat_ptr(compat_sc.addr);
 	sc->fd = compat_sc.fd;
 	sc->flags = compat_sc.flags;
 	sc->timeout = compat_sc.timeout;
@@ -60,7 +60,7 @@ static int copy_io_uring_sync_cancel_reg_from_user(struct io_ring_ctx *ctx,
 	if (ctx->compat)
 		return get_compat_io_uring_sync_cancel_reg(sc, arg);
 #endif
-	return copy_from_user(sc, arg, sizeof(sc));
+	return copy_from_user_with_ptr(sc, arg, sizeof(*sc));
 }
static bool io_cancel_cb(struct io_wq_work *work, void *data) diff --git a/io_uring/cancel.h b/io_uring/cancel.h index 6a59ee484d0cc..7c1249d61bf25 100644 --- a/io_uring/cancel.h +++ b/io_uring/cancel.h @@ -5,7 +5,7 @@ struct io_cancel_data { struct io_ring_ctx *ctx; union { - u64 data; + __kernel_uintptr_t data; struct file *file; }; u32 flags; diff --git a/io_uring/epoll.c b/io_uring/epoll.c index 9aa74d2c80bc4..a9f59f863b5c0 100644 --- a/io_uring/epoll.c +++ b/io_uring/epoll.c @@ -39,7 +39,7 @@ int io_epoll_ctl_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (ep_op_has_event(epoll->op)) { struct epoll_event __user *ev;
- ev = u64_to_user_ptr(READ_ONCE(sqe->addr)); + ev = (struct epoll_event __user *)READ_ONCE(sqe->addr); if (copy_from_user(&epoll->event, ev, sizeof(*ev))) return -EFAULT; } diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c index c5bd669081c98..41628302d1f8b 100644 --- a/io_uring/fdinfo.c +++ b/io_uring/fdinfo.c @@ -114,7 +114,7 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, sq_idx, io_uring_get_opcode(sqe->opcode), sqe->fd, sqe->flags, (unsigned long long) sqe->off, (unsigned long long) sqe->addr, sqe->rw_flags, - sqe->buf_index, sqe->user_data); + sqe->buf_index, (unsigned long long) sqe->user_data); if (sq_shift) { u64 *sqeb = (void *) (sqe + 1); int size = (ctx->compat ? sizeof(struct compat_io_uring_sqe) @@ -153,8 +153,8 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, #endif
seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x", - entry & cq_mask, cqe->user_data, cqe->res, - cqe->flags); + entry & cq_mask, (unsigned long long) cqe->user_data, + cqe->res, cqe->flags); if (cq_shift) seq_printf(m, ", extra1:%llu, extra2:%llu\n", cqe->big_cqe[0], cqe->big_cqe[1]); @@ -235,7 +235,8 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, #endif
seq_printf(m, " user_data=%llu, res=%d, flags=%x\n", - cqe->user_data, cqe->res, cqe->flags); + (unsigned long long) cqe->user_data, cqe->res, + cqe->flags);
#ifdef CONFIG_COMPAT64 kfree(native_cqe); diff --git a/io_uring/fs.c b/io_uring/fs.c index 7100c293c13a8..2e01e7da1d4ba 100644 --- a/io_uring/fs.c +++ b/io_uring/fs.c @@ -58,8 +58,8 @@ int io_renameat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) return -EBADF;
ren->old_dfd = READ_ONCE(sqe->fd); - oldf = u64_to_user_ptr(READ_ONCE(sqe->addr)); - newf = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + oldf = (char __user *)READ_ONCE(sqe->addr); + newf = (char __user *)READ_ONCE(sqe->addr2); ren->new_dfd = READ_ONCE(sqe->len); ren->flags = READ_ONCE(sqe->rename_flags);
@@ -117,7 +117,7 @@ int io_unlinkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (un->flags & ~AT_REMOVEDIR) return -EINVAL;
- fname = u64_to_user_ptr(READ_ONCE(sqe->addr)); + fname = (char __user *)READ_ONCE(sqe->addr); un->filename = getname(fname); if (IS_ERR(un->filename)) return PTR_ERR(un->filename); @@ -164,7 +164,7 @@ int io_mkdirat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) mkd->dfd = READ_ONCE(sqe->fd); mkd->mode = READ_ONCE(sqe->len);
- fname = u64_to_user_ptr(READ_ONCE(sqe->addr)); + fname = (char __user *)READ_ONCE(sqe->addr); mkd->filename = getname(fname); if (IS_ERR(mkd->filename)) return PTR_ERR(mkd->filename); @@ -206,8 +206,8 @@ int io_symlinkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) return -EBADF;
sl->new_dfd = READ_ONCE(sqe->fd); - oldpath = u64_to_user_ptr(READ_ONCE(sqe->addr)); - newpath = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + oldpath = (char __user *)READ_ONCE(sqe->addr); + newpath = (char __user *)READ_ONCE(sqe->addr2);
sl->oldpath = getname(oldpath); if (IS_ERR(sl->oldpath)) @@ -250,8 +250,8 @@ int io_linkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
lnk->old_dfd = READ_ONCE(sqe->fd); lnk->new_dfd = READ_ONCE(sqe->len); - oldf = u64_to_user_ptr(READ_ONCE(sqe->addr)); - newf = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + oldf = (char __user *)READ_ONCE(sqe->addr); + newf = (char __user *)READ_ONCE(sqe->addr2); lnk->flags = READ_ONCE(sqe->hardlink_flags);
lnk->oldpath = getname(oldf); diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 2bd48cff83c3f..8db8ff2349aed 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -184,7 +184,7 @@ static int copy_io_uring_getevents_arg_from_user(struct io_ring_ctx *ctx, #endif if (size != sizeof(arg)) return -EINVAL; - if (copy_from_user(arg, argp, sizeof(arg))) + if (copy_from_user_with_ptr(arg, argp, sizeof(arg))) return -EFAULT; return 0; } @@ -726,7 +726,7 @@ static __cold void io_uring_drop_tctx_refs(struct task_struct *task) } }
-static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data, +static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, __kernel_uintptr_t user_data, s32 res, u32 cflags, u64 extra1, u64 extra2) { struct io_overflow_cqe *ocqe; @@ -818,7 +818,7 @@ void *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow) return cqe; }
-bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, +bool io_fill_cqe_aux(struct io_ring_ctx *ctx, __kernel_uintptr_t user_data, s32 res, u32 cflags, bool allow_overflow) { struct io_uring_cqe *cqe; @@ -845,7 +845,7 @@ bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags }
 bool io_post_aux_cqe(struct io_ring_ctx *ctx,
-		     u64 user_data, s32 res, u32 cflags,
+		     __kernel_uintptr_t user_data, s32 res, u32 cflags,
 		     bool allow_overflow)
 {
 	bool filled;
@@ -3213,9 +3213,9 @@ static int io_get_ext_arg(struct io_ring_ctx *ctx, unsigned int flags,
 		return ret;
 	if (arg.pad)
 		return -EINVAL;
-	*sig = u64_to_user_ptr(arg.sigmask);
+	*sig = (sigset_t __user *)arg.sigmask;
 	*argsz = arg.sigmask_sz;
-	*ts = u64_to_user_ptr(arg.ts);
+	*ts = (struct __kernel_timespec __user *)arg.ts;
 	return 0;
 }
@@ -4158,6 +4158,42 @@ static int __init io_uring_init(void)
 	__BUILD_BUG_VERIFY_OFFSET_SIZE(struct io_uring_sqe, eoffset, sizeof(etype), ename)
 #define BUILD_BUG_SQE_ELEM_SIZE(eoffset, esize, ename) \
 	__BUILD_BUG_VERIFY_OFFSET_SIZE(struct io_uring_sqe, eoffset, esize, ename)
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	BUILD_BUG_ON(sizeof(struct io_uring_sqe) != 128);
+	BUILD_BUG_SQE_ELEM(0, __u8, opcode);
+	BUILD_BUG_SQE_ELEM(1, __u8, flags);
+	BUILD_BUG_SQE_ELEM(2, __u16, ioprio);
+	BUILD_BUG_SQE_ELEM(4, __s32, fd);
+	BUILD_BUG_SQE_ELEM(16, __u64, off);
+	BUILD_BUG_SQE_ELEM(16, __uintcap_t, addr2);
+	BUILD_BUG_SQE_ELEM(32, __uintcap_t, addr);
+	BUILD_BUG_SQE_ELEM(32, __u64, splice_off_in);
+	BUILD_BUG_SQE_ELEM(48, __u32, len);
+	BUILD_BUG_SQE_ELEM(52, __kernel_rwf_t, rw_flags);
+	BUILD_BUG_SQE_ELEM(52, __u32, fsync_flags);
+	BUILD_BUG_SQE_ELEM(52, __u16, poll_events);
+	BUILD_BUG_SQE_ELEM(52, __u32, poll32_events);
+	BUILD_BUG_SQE_ELEM(52, __u32, sync_range_flags);
+	BUILD_BUG_SQE_ELEM(52, __u32, msg_flags);
+	BUILD_BUG_SQE_ELEM(52, __u32, timeout_flags);
+	BUILD_BUG_SQE_ELEM(52, __u32, accept_flags);
+	BUILD_BUG_SQE_ELEM(52, __u32, cancel_flags);
+	BUILD_BUG_SQE_ELEM(52, __u32, open_flags);
+	BUILD_BUG_SQE_ELEM(52, __u32, statx_flags);
+	BUILD_BUG_SQE_ELEM(52, __u32, fadvise_advice);
+	BUILD_BUG_SQE_ELEM(52, __u32, splice_flags);
+	BUILD_BUG_SQE_ELEM(64, __uintcap_t, user_data);
+	BUILD_BUG_SQE_ELEM(80, __u16, buf_index);
+	BUILD_BUG_SQE_ELEM(80, __u16, buf_group);
+	BUILD_BUG_SQE_ELEM(82, __u16, personality);
+	BUILD_BUG_SQE_ELEM(84, __s32, splice_fd_in);
+	BUILD_BUG_SQE_ELEM(84, __u32, file_index);
+	BUILD_BUG_SQE_ELEM(84, __u16, addr_len);
+	BUILD_BUG_SQE_ELEM(86, __u16, __pad3[0]);
+	BUILD_BUG_SQE_ELEM(96, __uintcap_t, addr3);
+	BUILD_BUG_SQE_ELEM_SIZE(96, 0, cmd);
+	BUILD_BUG_SQE_ELEM(112, __u64, __pad2);
+#else /* !CONFIG_CHERI_PURECAP_UABI */
 	BUILD_BUG_ON(sizeof(struct io_uring_sqe) != 64);
 	BUILD_BUG_SQE_ELEM(0, __u8, opcode);
 	BUILD_BUG_SQE_ELEM(1, __u8, flags);
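These compile-time layout checks have a direct userspace analogue: C11 `_Static_assert` with `offsetof` catches ABI drift the same way `BUILD_BUG_SQE_ELEM` does. The sketch below uses a hypothetical mock covering the first 40 bytes of the classic 64-bit SQE layout — the field names match the uAPI, but the struct itself is illustrative, not the real `<linux/io_uring.h>` definition:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative mock of the first 40 bytes of the classic 64-bit SQE layout;
 * NOT the real uAPI definition from <linux/io_uring.h>. */
struct mock_sqe {
	uint8_t  opcode;    /* byte 0 */
	uint8_t  flags;     /* byte 1 */
	uint16_t ioprio;    /* byte 2 */
	int32_t  fd;        /* byte 4 */
	uint64_t off;       /* byte 8 */
	uint64_t addr;      /* byte 16 */
	uint32_t len;       /* byte 24 */
	uint32_t rw_flags;  /* byte 28 */
	uint64_t user_data; /* byte 32 */
};

/* Userspace analogue of BUILD_BUG_SQE_ELEM: refuse to compile if the
 * ABI-visible layout ever drifts. */
_Static_assert(offsetof(struct mock_sqe, fd) == 4, "fd at byte 4");
_Static_assert(offsetof(struct mock_sqe, addr) == 16, "addr at byte 16");
_Static_assert(offsetof(struct mock_sqe, user_data) == 32, "user_data at byte 32");
_Static_assert(sizeof(struct mock_sqe) == 40, "mock prefix is 40 bytes");
```

On a purecap build the same fields move (`addr` to offset 32, `user_data` to 64, total size 128) because `__uintcap_t` members are 16 bytes and 16-byte aligned — which is exactly what the `CONFIG_CHERI_PURECAP_UABI` branch above asserts.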
@@ -4198,6 +4234,7 @@ static int __init io_uring_init(void)
 	BUILD_BUG_SQE_ELEM(48, __u64, addr3);
 	BUILD_BUG_SQE_ELEM_SIZE(48, 0, cmd);
 	BUILD_BUG_SQE_ELEM(56, __u64, __pad2);
+#endif /* !CONFIG_CHERI_PURECAP_UABI */
BUILD_BUG_ON(sizeof(struct io_uring_files_update) != sizeof(struct io_uring_rsrc_update)); diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index fb2711770bfb0..f8c4e6d61124b 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -34,9 +34,9 @@ void io_req_complete_failed(struct io_kiocb *req, s32 res); void __io_req_complete(struct io_kiocb *req, unsigned issue_flags); void io_req_complete_post(struct io_kiocb *req); void __io_req_complete_post(struct io_kiocb *req); -bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, +bool io_post_aux_cqe(struct io_ring_ctx *ctx, __kernel_uintptr_t user_data, s32 res, u32 cflags, bool allow_overflow); -bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, +bool io_fill_cqe_aux(struct io_ring_ctx *ctx, __kernel_uintptr_t user_data, s32 res, u32 cflags, bool allow_overflow); void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
@@ -99,7 +99,7 @@ static inline void convert_compat_io_uring_cqe(struct io_ring_ctx *ctx, struct io_uring_cqe *cqe, const struct compat_io_uring_cqe *compat_cqe) { - cqe->user_data = READ_ONCE(compat_cqe->user_data); + cqe->user_data = (__kernel_uintptr_t)READ_ONCE(compat_cqe->user_data); cqe->res = READ_ONCE(compat_cqe->res); cqe->flags = READ_ONCE(compat_cqe->flags);
@@ -130,13 +130,13 @@ static inline void convert_compat_io_uring_sqe(struct io_ring_ctx *ctx, sqe->ioprio = READ_ONCE(compat_sqe->ioprio); sqe->fd = READ_ONCE(compat_sqe->fd); BUILD_BUG_COMPAT_SQE_UNION_ELEM(addr2, addr); - sqe->addr2 = READ_ONCE(compat_sqe->addr2); + sqe->addr2 = (__kernel_uintptr_t)compat_ptr(READ_ONCE(compat_sqe->addr2)); BUILD_BUG_COMPAT_SQE_UNION_ELEM(addr, len); - sqe->addr = READ_ONCE(compat_sqe->addr); + sqe->addr = (__kernel_uintptr_t)compat_ptr(READ_ONCE(compat_sqe->addr)); sqe->len = READ_ONCE(compat_sqe->len); BUILD_BUG_COMPAT_SQE_UNION_ELEM(rw_flags, user_data); sqe->rw_flags = READ_ONCE(compat_sqe->rw_flags); - sqe->user_data = READ_ONCE(compat_sqe->user_data); + sqe->user_data = (__kernel_uintptr_t)READ_ONCE(compat_sqe->user_data); BUILD_BUG_COMPAT_SQE_UNION_ELEM(buf_index, personality); sqe->buf_index = READ_ONCE(compat_sqe->buf_index); sqe->personality = READ_ONCE(compat_sqe->personality); @@ -148,7 +148,7 @@ static inline void convert_compat_io_uring_sqe(struct io_ring_ctx *ctx,
memcpy(sqe->cmd, compat_sqe->cmd, compat_cmd_size); } else - sqe->addr3 = READ_ONCE(compat_sqe->addr3); + sqe->addr3 = (__kernel_uintptr_t)compat_ptr(READ_ONCE(compat_sqe->addr3)); #undef BUILD_BUG_COMPAT_SQE_UNION_ELEM } #endif /* CONFIG_COMPAT64 */ @@ -175,7 +175,7 @@ static inline void *io_get_cqe(struct io_ring_ctx *ctx) }
static inline void __io_fill_cqe_any(struct io_ring_ctx *ctx, struct io_uring_cqe *cqe, - u64 user_data, s32 res, u32 cflags, + __kernel_uintptr_t user_data, s32 res, u32 cflags, u64 extra1, u64 extra2) { #ifdef CONFIG_COMPAT64 diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 0fec941644260..59b178f4964ed 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -24,7 +24,7 @@
struct io_provide_buf { struct file *file; - __u64 addr; + __kernel_uintptr_t addr; __u32 len; __u32 bgid; __u16 nbufs; @@ -47,7 +47,7 @@ static int get_compat_io_uring_buf_reg(struct io_uring_buf_reg *reg,
if (unlikely(copy_from_user(&compat_reg, user_reg, sizeof(compat_reg)))) return -EFAULT; - reg->ring_addr = compat_reg.ring_addr; + reg->ring_addr = (__kernel_uintptr_t)compat_ptr(compat_reg.ring_addr); reg->ring_entries = compat_reg.ring_entries; reg->bgid = compat_reg.bgid; reg->pad = compat_reg.pad; @@ -64,7 +64,7 @@ static int copy_io_uring_buf_reg_from_user(struct io_ring_ctx *ctx, if (ctx->compat) return get_compat_io_uring_buf_reg(reg, arg); #endif - return copy_from_user(reg, arg, sizeof(reg)); + return copy_from_user_with_ptr(reg, arg, sizeof(reg)); }
static inline struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx, @@ -159,7 +159,7 @@ static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len, req->flags |= REQ_F_BUFFER_SELECTED; req->kbuf = kbuf; req->buf_index = kbuf->bid; - return u64_to_user_ptr(kbuf->addr); + return (void __user *)kbuf->addr; } return NULL; } @@ -239,7 +239,7 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len, req->buf_list = NULL; bl->head++; } - return u64_to_user_ptr(buf->addr); + return (void __user *)buf->addr; }
static void __user *io_ring_buffer_select_any(struct io_kiocb *req, size_t *len, @@ -427,7 +427,7 @@ int io_provide_buffers_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe return -EOVERFLOW;
size = (unsigned long)p->len * p->nbufs; - if (!access_ok(u64_to_user_ptr(p->addr), size)) + if (!access_ok(p->addr, size)) return -EFAULT;
p->bgid = READ_ONCE(sqe->buf_group); @@ -487,7 +487,7 @@ static int io_add_buffers(struct io_ring_ctx *ctx, struct io_provide_buf *pbuf, struct io_buffer_list *bl) { struct io_buffer *buf; - u64 addr = pbuf->addr; + __kernel_uintptr_t addr = pbuf->addr; int i, bid = pbuf->bid;
for (i = 0; i < pbuf->nbufs; i++) { @@ -599,6 +599,7 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) pages_size = ctx->compat ? size_mul(sizeof(struct compat_io_uring_buf), reg.ring_entries) : size_mul(sizeof(struct io_uring_buf), reg.ring_entries); + /* TODO [PCuABI] - capability checks for uaccess */ pages = io_pin_pages(reg.ring_addr, pages_size, &nr_pages); if (IS_ERR(pages)) { kfree(free_bl); diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h index 1aa5bbbc5d628..50a0a7524e6b6 100644 --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -31,7 +31,7 @@ struct io_buffer_list {
struct io_buffer { struct list_head list; - __u64 addr; + __kernel_uintptr_t addr; __u32 len; __u16 bid; __u16 bgid; diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c index 90d2fc6fd80e4..f929c4f98a2d5 100644 --- a/io_uring/msg_ring.c +++ b/io_uring/msg_ring.c @@ -15,7 +15,7 @@
struct io_msg { struct file *file; - u64 user_data; + __kernel_uintptr_t user_data; u32 len; u32 cmd; u32 src_fd; @@ -130,7 +130,7 @@ int io_msg_ring_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (unlikely(sqe->buf_index || sqe->personality)) return -EINVAL;
-	msg->user_data = READ_ONCE(sqe->off);
+	msg->user_data = READ_ONCE(sqe->addr);
 	msg->len = READ_ONCE(sqe->len);
 	msg->cmd = READ_ONCE(sqe->addr);
 	msg->src_fd = READ_ONCE(sqe->addr3);
diff --git a/io_uring/net.c b/io_uring/net.c
index c586278858e7e..d6440dfdf8e1a 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -242,13 +242,13 @@ int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (req->opcode == IORING_OP_SEND) {
 		if (READ_ONCE(sqe->__pad3[0]))
 			return -EINVAL;
-		sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+		sr->addr = (void __user *)READ_ONCE(sqe->addr2);
 		sr->addr_len = READ_ONCE(sqe->addr_len);
 	} else if (sqe->addr2 || sqe->file_index) {
 		return -EINVAL;
 	}
- sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr)); + sr->umsg = (struct user_msghdr __user *)READ_ONCE(sqe->addr); sr->len = READ_ONCE(sqe->len); sr->flags = READ_ONCE(sqe->ioprio); if (sr->flags & ~IORING_RECVSEND_POLL_FIRST) @@ -546,7 +546,7 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (unlikely(sqe->file_index || sqe->addr2)) return -EINVAL;
- sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr)); + sr->umsg = (struct user_msghdr __user *)READ_ONCE(sqe->addr); sr->len = READ_ONCE(sqe->len); sr->flags = READ_ONCE(sqe->ioprio); if (sr->flags & ~(RECVMSG_FLAGS)) @@ -963,7 +963,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (req->opcode == IORING_OP_SEND_ZC) { if (READ_ONCE(sqe->__pad3[0])) return -EINVAL; - zc->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + zc->addr = (void __user *)READ_ONCE(sqe->addr2); zc->addr_len = READ_ONCE(sqe->addr_len); } else { if (unlikely(sqe->addr2 || sqe->file_index)) @@ -972,7 +972,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) return -EINVAL; }
- zc->buf = u64_to_user_ptr(READ_ONCE(sqe->addr)); + zc->buf = (void __user *)READ_ONCE(sqe->addr); zc->len = READ_ONCE(sqe->len); zc->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL; if (zc->msg_flags & MSG_DONTWAIT) @@ -1239,8 +1239,8 @@ int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (sqe->len || sqe->buf_index) return -EINVAL;
- accept->addr = u64_to_user_ptr(READ_ONCE(sqe->addr)); - accept->addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + accept->addr = (void __user *)READ_ONCE(sqe->addr); + accept->addr_len = (int __user *)READ_ONCE(sqe->addr2); accept->flags = READ_ONCE(sqe->accept_flags); accept->nofile = rlimit(RLIMIT_NOFILE); flags = READ_ONCE(sqe->ioprio); @@ -1389,8 +1389,8 @@ int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (sqe->len || sqe->buf_index || sqe->rw_flags || sqe->splice_fd_in) return -EINVAL;
- conn->addr = u64_to_user_ptr(READ_ONCE(sqe->addr)); - conn->addr_len = READ_ONCE(sqe->addr2); + conn->addr = (void __user *)READ_ONCE(sqe->addr); + conn->addr_len = READ_ONCE(sqe->addr2); conn->in_progress = false; return 0; } diff --git a/io_uring/openclose.c b/io_uring/openclose.c index 67178e4bb282d..0a5c838885306 100644 --- a/io_uring/openclose.c +++ b/io_uring/openclose.c @@ -47,7 +47,7 @@ static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe open->how.flags |= O_LARGEFILE;
open->dfd = READ_ONCE(sqe->fd); - fname = u64_to_user_ptr(READ_ONCE(sqe->addr)); + fname = (char __user *)READ_ONCE(sqe->addr); open->filename = getname(fname); if (IS_ERR(open->filename)) { ret = PTR_ERR(open->filename); @@ -81,7 +81,7 @@ int io_openat2_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) size_t len; int ret;
- how = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + how = (struct open_how __user *)READ_ONCE(sqe->addr2); len = READ_ONCE(sqe->len); if (len < OPEN_HOW_SIZE_VER0) return -EINVAL; diff --git a/io_uring/poll.c b/io_uring/poll.c index d9bf1767867e6..4914048fed0f8 100644 --- a/io_uring/poll.c +++ b/io_uring/poll.c @@ -22,8 +22,8 @@
struct io_poll_update { struct file *file; - u64 old_user_data; - u64 new_user_data; + __kernel_uintptr_t old_user_data; + __kernel_uintptr_t new_user_data; __poll_t events; bool update_events; bool update_user_data; diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 7e6428a44d625..5e974afba7b1f 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -18,7 +18,7 @@
struct io_rsrc_update { struct file *file; - u64 arg; + __kernel_uintptr_t arg; u32 nr_args; u32 offset; }; @@ -33,7 +33,7 @@ static int get_compat_io_uring_rsrc_update(struct io_uring_rsrc_update2 *up2, return -EFAULT; up2->offset = compat_up.offset; up2->resv = compat_up.resv; - up2->data = compat_up.data; + up2->data = (__kernel_uintptr_t)compat_ptr(compat_up.data); return 0; }
@@ -46,8 +46,8 @@ static int get_compat_io_uring_rsrc_update2(struct io_uring_rsrc_update2 *up2, return -EFAULT; up2->offset = compat_up2.offset; up2->resv = compat_up2.resv; - up2->data = compat_up2.data; - up2->tags = compat_up2.tags; + up2->data = (__kernel_uintptr_t)compat_ptr(compat_up2.data); + up2->tags = (__kernel_uintptr_t)compat_ptr(compat_up2.tags); up2->nr = compat_up2.nr; up2->resv2 = compat_up2.resv2; return 0; @@ -63,8 +63,8 @@ static int get_compat_io_uring_rsrc_register(struct io_uring_rsrc_register *rr, rr->nr = compat_rr.nr; rr->flags = compat_rr.flags; rr->resv2 = compat_rr.resv2; - rr->data = compat_rr.data; - rr->tags = compat_rr.tags; + rr->data = (__kernel_uintptr_t)compat_ptr(compat_rr.data); + rr->tags = (__kernel_uintptr_t)compat_ptr(compat_rr.tags); return 0; } #endif /* CONFIG_COMPAT64 */ @@ -77,7 +77,7 @@ static int copy_io_uring_rsrc_update_from_user(struct io_ring_ctx *ctx, if (ctx->compat) return get_compat_io_uring_rsrc_update(up2, arg); #endif - return copy_from_user(up2, arg, sizeof(struct io_uring_rsrc_update)); + return copy_from_user_with_ptr(up2, arg, sizeof(struct io_uring_rsrc_update)); }
static int copy_io_uring_rsrc_update2_from_user(struct io_ring_ctx *ctx, @@ -96,7 +96,7 @@ static int copy_io_uring_rsrc_update2_from_user(struct io_ring_ctx *ctx, #endif if (size != sizeof(up2)) return -EINVAL; - if (copy_from_user(up2, arg, sizeof(*up2))) + if (copy_from_user_with_ptr(up2, arg, sizeof(*up2))) return -EFAULT; return 0; } @@ -118,7 +118,7 @@ static int copy_io_uring_rsrc_register_from_user(struct io_ring_ctx *ctx, /* keep it extendible */ if (size != sizeof(rr)) return -EINVAL; - if (copy_from_user(rr, arg, size)) + if (copy_from_user_with_ptr(rr, arg, size)) return -EFAULT; return 0; } @@ -201,13 +201,14 @@ static int io_copy_iov(struct io_ring_ctx *ctx, struct iovec *dst, if (copy_from_user(&ciov, &ciovs[index], sizeof(ciov))) return -EFAULT;
- dst->iov_base = u64_to_user_ptr((u64)ciov.iov_base); + dst->iov_base = compat_ptr(ciov.iov_base); + dst->iov_len = ciov.iov_len; return 0; } #endif src = (struct iovec __user *) arg; - if (copy_from_user(dst, &src[index], sizeof(*dst))) + if (copy_from_user_with_ptr(dst, &src[index], sizeof(*dst))) return -EFAULT; return 0; } @@ -534,8 +535,8 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx, struct io_uring_rsrc_update2 *up, unsigned nr_args) { - u64 __user *tags = u64_to_user_ptr(up->tags); - __s32 __user *fds = u64_to_user_ptr(up->data); + u64 __user *tags = (u64 __user *)up->tags; + __s32 __user *fds = (__s32 __user *)up->data; struct io_rsrc_data *data = ctx->file_data; struct io_fixed_file *file_slot; struct file *file; @@ -614,9 +615,9 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx, struct io_uring_rsrc_update2 *up, unsigned int nr_args) { - u64 __user *tags = u64_to_user_ptr(up->tags); + u64 __user *tags = (u64 __user *)up->tags; struct iovec iov; - struct iovec __user *iovs = u64_to_user_ptr(up->data); + struct iovec __user *iovs = (struct iovec __user *)up->data; struct page *last_hpage = NULL; bool needs_switch = false; __u32 done; @@ -742,13 +743,13 @@ __cold int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg, case IORING_RSRC_FILE: if (rr.flags & IORING_RSRC_REGISTER_SPARSE && rr.data) break; - return io_sqe_files_register(ctx, u64_to_user_ptr(rr.data), - rr.nr, u64_to_user_ptr(rr.tags)); + return io_sqe_files_register(ctx, (void __user *)rr.data, + rr.nr, (u64 __user *)rr.tags); case IORING_RSRC_BUFFER: if (rr.flags & IORING_RSRC_REGISTER_SPARSE && rr.data) break; - return io_sqe_buffers_register(ctx, u64_to_user_ptr(rr.data), - rr.nr, u64_to_user_ptr(rr.tags)); + return io_sqe_buffers_register(ctx, (void __user *)rr.data, + rr.nr, (u64 __user *)rr.tags); } return -EINVAL; } @@ -774,7 +775,7 @@ static int io_files_update_with_index_alloc(struct io_kiocb *req, unsigned int issue_flags) { struct io_rsrc_update *up 
= io_kiocb_to_cmd(req, struct io_rsrc_update); - __s32 __user *fds = u64_to_user_ptr(up->arg); + __s32 __user *fds = (__s32 __user *)up->arg; unsigned int done; struct file *file; int ret, fd; diff --git a/io_uring/rw.c b/io_uring/rw.c index 2edca190450ee..424ee773c95aa 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -23,7 +23,7 @@ struct io_rw { /* NOTE: kiocb has the file as the first member, so don't do it here */ struct kiocb kiocb; - u64 addr; + __kernel_uintptr_t addr; u32 len; rwf_t flags; }; @@ -39,7 +39,7 @@ static int io_iov_compat_buffer_select_prep(struct io_rw *rw) struct compat_iovec __user *uiov; compat_ssize_t clen;
- uiov = u64_to_user_ptr(rw->addr); + uiov = (struct compat_iovec __user *)rw->addr; if (!access_ok(uiov, sizeof(*uiov))) return -EFAULT; if (__get_user(clen, &uiov->iov_len)) @@ -65,7 +65,7 @@ static int io_iov_buffer_select_prep(struct io_kiocb *req) return io_iov_compat_buffer_select_prep(rw); #endif
- uiov = u64_to_user_ptr(rw->addr); + uiov = (struct iovec __user *)rw->addr; if (get_user(rw->len, &uiov->iov_len)) return -EFAULT; return 0; @@ -370,7 +370,7 @@ static struct iovec *__io_import_iovec(int ddir, struct io_kiocb *req, return NULL; }
- buf = u64_to_user_ptr(rw->addr); + buf = (void __user *)rw->addr; sqe_len = rw->len;
if (opcode == IORING_OP_READ || opcode == IORING_OP_WRITE || @@ -379,8 +379,7 @@ static struct iovec *__io_import_iovec(int ddir, struct io_kiocb *req, buf = io_buffer_select(req, &sqe_len, issue_flags); if (!buf) return ERR_PTR(-ENOBUFS); - /* TODO [PCuABI] - capability checks for uaccess */ - rw->addr = user_ptr_addr(buf); + rw->addr = (__kernel_uintptr_t)buf; rw->len = sqe_len; }
@@ -446,7 +445,7 @@ static ssize_t loop_rw_iter(int ddir, struct io_rw *rw, struct iov_iter *iter) if (!iov_iter_is_bvec(iter)) { iovec = iov_iter_iovec(iter); } else { - iovec.iov_base = u64_to_user_ptr(rw->addr); + iovec.iov_base = (void __user *)rw->addr; iovec.iov_len = rw->len; }
diff --git a/io_uring/statx.c b/io_uring/statx.c index d8fc933d3f593..d2604fdbcbe33 100644 --- a/io_uring/statx.c +++ b/io_uring/statx.c @@ -32,8 +32,8 @@ int io_statx_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
sx->dfd = READ_ONCE(sqe->fd); sx->mask = READ_ONCE(sqe->len); - path = u64_to_user_ptr(READ_ONCE(sqe->addr)); - sx->buffer = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + path = (char __user *)READ_ONCE(sqe->addr); + sx->buffer = (struct statx __user *)READ_ONCE(sqe->addr2); sx->flags = READ_ONCE(sqe->statx_flags);
sx->filename = getname_flags(path, diff --git a/io_uring/tctx.c b/io_uring/tctx.c index 6ab9916ed3844..828383907b3b9 100644 --- a/io_uring/tctx.c +++ b/io_uring/tctx.c @@ -22,7 +22,7 @@ static int get_compat_io_uring_rsrc_update(struct io_uring_rsrc_update *up, return -EFAULT; up->offset = compat_up.offset; up->resv = compat_up.resv; - up->data = compat_up.data; + up->data = (__kernel_uintptr_t)compat_ptr(compat_up.data); return 0; } #endif /* CONFIG_COMPAT64 */ @@ -35,7 +35,7 @@ static int copy_io_uring_rsrc_update_from_user(struct io_ring_ctx *ctx, if (ctx->compat) return get_compat_io_uring_rsrc_update(up, arg); #endif - return copy_from_user(up, arg, sizeof(struct io_uring_rsrc_update)); + return copy_from_user_with_ptr(up, arg, sizeof(struct io_uring_rsrc_update)); }
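The `get_compat_io_uring_*`/`copy_*_from_user` helpers above all follow one pattern: in COMPAT64 mode, read the u64-based compat struct and widen each address field into a capability with `compat_ptr()`; otherwise, copy the native struct with `copy_from_user_with_ptr()` so pointer tags survive. A standalone sketch of that conversion step, with hypothetical stand-in types and `compat_ptr()` modelled as a plain cast:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types. In COMPAT64 the uAPI struct
 * carries plain 64-bit integers; the native struct carries full pointers
 * (__kernel_uintptr_t, a capability on purecap). compat_ptr(), which would
 * re-derive a valid capability from the 64-bit address, is modelled here
 * as a plain cast. */
struct compat_rsrc_update {
	uint32_t offset;
	uint32_t resv;
	uint64_t data;
};

struct rsrc_update {
	uint32_t offset;
	uint32_t resv;
	uintptr_t data; /* kernel: __kernel_uintptr_t */
};

static int get_compat_rsrc_update(struct rsrc_update *up,
				  const struct compat_rsrc_update *compat_up)
{
	up->offset = compat_up->offset;
	up->resv = compat_up->resv;
	up->data = (uintptr_t)compat_up->data; /* kernel: compat_ptr(...) */
	return 0;
}
```

The point of splitting the helpers this way is that every syscall entry point does one call and stays oblivious to which ABI the task is running.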
static struct io_wq *io_init_wq_offload(struct io_ring_ctx *ctx, diff --git a/io_uring/timeout.c b/io_uring/timeout.c index e8a8c20994805..5a0fe53c13329 100644 --- a/io_uring/timeout.c +++ b/io_uring/timeout.c @@ -26,7 +26,7 @@ struct io_timeout {
struct io_timeout_rem { struct file *file; - u64 addr; + __kernel_uintptr_t addr;
/* timeout update */ struct timespec64 ts; @@ -337,7 +337,7 @@ static clockid_t io_timeout_get_clock(struct io_timeout_data *data) } }
-static int io_linked_timeout_update(struct io_ring_ctx *ctx, __u64 user_data, +static int io_linked_timeout_update(struct io_ring_ctx *ctx, __kernel_uintptr_t user_data, struct timespec64 *ts, enum hrtimer_mode mode) __must_hold(&ctx->timeout_lock) { @@ -365,7 +365,7 @@ static int io_linked_timeout_update(struct io_ring_ctx *ctx, __u64 user_data, return 0; }
-static int io_timeout_update(struct io_ring_ctx *ctx, __u64 user_data, +static int io_timeout_update(struct io_ring_ctx *ctx, __kernel_uintptr_t user_data, struct timespec64 *ts, enum hrtimer_mode mode) __must_hold(&ctx->timeout_lock) { @@ -405,7 +405,7 @@ int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) tr->ltimeout = true; if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS)) return -EINVAL; - if (get_timespec64(&tr->ts, u64_to_user_ptr(sqe->addr2))) + if (get_timespec64(&tr->ts, (struct __kernel_timespec __user *)sqe->addr2)) return -EFAULT; if (tr->ts.tv_sec < 0 || tr->ts.tv_nsec < 0) return -EINVAL; @@ -490,7 +490,7 @@ static int __io_timeout_prep(struct io_kiocb *req, data->req = req; data->flags = flags;
- if (get_timespec64(&data->ts, u64_to_user_ptr(sqe->addr))) + if (get_timespec64(&data->ts, (struct __kernel_timespec __user *)sqe->addr)) return -EFAULT;
if (data->ts.tv_sec < 0 || data->ts.tv_nsec < 0) diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index e50de0b6b9f84..4d2d2e3f885ee 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -65,8 +65,13 @@ int io_uring_cmd_prep_async(struct io_kiocb *req) struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd); size_t cmd_size;
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	BUILD_BUG_ON(uring_cmd_pdu_size(0) != 32);
+	BUILD_BUG_ON(uring_cmd_pdu_size(1) != 160);
+#else
 	BUILD_BUG_ON(uring_cmd_pdu_size(0) != 16);
 	BUILD_BUG_ON(uring_cmd_pdu_size(1) != 80);
+#endif
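The doubled constants fall directly out of the purecap SQE layout: `uring_cmd_pdu_size()` is essentially `(1 + !!sqe128) * sizeof(struct io_uring_sqe) - offsetof(struct io_uring_sqe, cmd)`, and under PCuABI the SQE grows from 64 to 128 bytes with `cmd` moving from offset 48 to 96. A quick arithmetic check — the helper below is illustrative, not the kernel macro:

```c
#include <stddef.h>

/* Illustrative model of uring_cmd_pdu_size(): the async PDU covers one SQE
 * (or two, with IORING_SETUP_SQE128) minus everything before the cmd field.
 * Not the kernel macro itself. */
static size_t pdu_size(size_t sqe_size, size_t cmd_offset, int sqe128)
{
	return (1 + !!sqe128) * sqe_size - cmd_offset;
}
```

Plugging in both layouts reproduces all four asserted values: 16 and 80 for the native 64-byte SQE, 32 and 160 for the 128-byte purecap SQE.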
cmd_size = uring_cmd_pdu_size(req->ctx->flags & IORING_SETUP_SQE128);
diff --git a/io_uring/xattr.c b/io_uring/xattr.c index 99df641594d74..1f13032e59536 100644 --- a/io_uring/xattr.c +++ b/io_uring/xattr.c @@ -53,8 +53,8 @@ static int __io_getxattr_prep(struct io_kiocb *req,
ix->filename = NULL; ix->ctx.kvalue = NULL; - name = u64_to_user_ptr(READ_ONCE(sqe->addr)); - ix->ctx.cvalue = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + name = (char __user *)READ_ONCE(sqe->addr); + ix->ctx.cvalue = (void __user *)READ_ONCE(sqe->addr2); ix->ctx.size = READ_ONCE(sqe->len); ix->ctx.flags = READ_ONCE(sqe->xattr_flags);
@@ -93,7 +93,7 @@ int io_getxattr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (ret) return ret;
- path = u64_to_user_ptr(READ_ONCE(sqe->addr3)); + path = (char __user *)READ_ONCE(sqe->addr3);
ix->filename = getname_flags(path, LOOKUP_FOLLOW, NULL); if (IS_ERR(ix->filename)) { @@ -159,8 +159,8 @@ static int __io_setxattr_prep(struct io_kiocb *req, return -EBADF;
ix->filename = NULL; - name = u64_to_user_ptr(READ_ONCE(sqe->addr)); - ix->ctx.cvalue = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + name = (char __user *)READ_ONCE(sqe->addr); + ix->ctx.cvalue = (void __user *)READ_ONCE(sqe->addr2); ix->ctx.kvalue = NULL; ix->ctx.size = READ_ONCE(sqe->len); ix->ctx.flags = READ_ONCE(sqe->xattr_flags); @@ -189,7 +189,7 @@ int io_setxattr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (ret) return ret;
- path = u64_to_user_ptr(READ_ONCE(sqe->addr3)); + path = (char __user *)READ_ONCE(sqe->addr3);
ix->filename = getname_flags(path, LOOKUP_FOLLOW, NULL); if (IS_ERR(ix->filename)) {
The io_uring shared memory region hosts the io_uring_sqe and io_uring_cqe arrays. These structs may contain user pointers, so the memory region must be allowed to store and load capability pointers.
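To see why this matters, consider how completions are consumed: applications routinely stash a context pointer in `user_data` and dereference it when the CQE arrives. The mock below (plain `uintptr_t` standing in for `__kernel_uintptr_t`; illustrative, not the liburing API) shows that round trip — on a purecap target it only works if the ring mapping permits capability loads and stores, which is what this patch enables:

```c
#include <stdint.h>

/* Mock CQE: uintptr_t stands in for __kernel_uintptr_t, which is a tagged
 * capability under PCuABI. Storing the pointer through a u64, or through a
 * mapping without VM_READ_CAPS/VM_WRITE_CAPS, would strip the tag and make
 * the dereference below fault on CHERI. */
struct mock_cqe {
	uintptr_t user_data;
	int32_t res;
	uint32_t flags;
};

/* Completion side: recover the request context from user_data. */
static void *cqe_context(const struct mock_cqe *cqe)
{
	return (void *)cqe->user_data;
}
```

Without tag access on the shared pages, the capability would round-trip as a bare address and every such dereference in a purecap application would trap.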
Signed-off-by: Tudor Cretu <tudor.cretu@arm.com>
---
 io_uring/io_uring.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8db8ff2349aed..035fa37e6fab7 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3136,6 +3136,11 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 	if (IS_ERR(ptr))
 		return PTR_ERR(ptr);
 
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	vma->vm_flags |= VM_READ_CAPS | VM_WRITE_CAPS;
+	vma_set_page_prot(vma);
+#endif
+
 	pfn = virt_to_phys(ptr) >> PAGE_SHIFT;
 	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
 }
Hi Tudor,
On 13/02/2023 17:37, Tudor Cretu wrote:
This series makes it possible for purecap apps to use the io_uring system.
With these patches, all io_uring LTP tests pass in both Purecap and compat modes. Note that the LTP tests only address the basic functionality of the io_uring system and a significant portion of the multiplexed functionality is untested in LTP.
I have started running and investigating liburing tests and examples, so this is not a final version, but review is still very much appreciated while I go through the liburing tests.
I went through the whole series and I don't have much to say: it looks great!
The changes make sense to me and feel consistent, well done on untangling this hairy subject.
Not the most helpful review but at least nothing jumped out to me :)
Cheers, Téo
v2:
- Rebase on top of release 6.1
- Remove VM_READ_CAPS/VM_LOAD_CAPS patches as they are already merged
- Update commit message in PATCH 1
- Add the generic changes PATCH 2 and PATCH 3 to avoid copying user pointers from/to userspace unnecessarily. These could be upstreamable.
- Split "pulling the cqes member out" change into PATCH 4
- The changes for PATCH 5 and 6 are now split into their respective files after the rebase.
- Format and change organization based on the feedback on the previous version, including creating helpers copy_*_from_* for various uAPI structs
- Add comments related to handling of setup flags IORING_SETUP_SQE128 and IORING_SETUP_CQE32
- Add handling for new uAPI structs: io_uring_buf, io_uring_buf_ring, io_uring_buf_reg, io_uring_sync_cancel_reg.
Gitlab issue: https://git.morello-project.org/morello/kernel/linux/-/issues/2
Review branch: https://git.morello-project.org/tudcre01/linux/-/commits/morello/io_uring_v2
Tudor Cretu (7):
  compiler_types: Add (u)intcap_t to native_words
  io_uring/rw: Restrict copy to only uiov->len from userspace
  io_uring/tctx: Copy only the offset field back to user
  io_uring: Pull cqes member out from rings struct
  io_uring: Implement compat versions of uAPI structs and handle them
  io_uring: Use user pointer type in the uAPI structs
  io_uring: Allow capability tag access on the shared memory
 include/linux/compiler_types.h |   7 +
 include/linux/io_uring_types.h | 160 ++++++++++++++--
 include/uapi/linux/io_uring.h  |  62 ++++---
 io_uring/advise.c              |   2 +-
 io_uring/cancel.c              |  40 +++-
 io_uring/cancel.h              |   2 +-
 io_uring/epoll.c               |   2 +-
 io_uring/fdinfo.c              |  64 ++++++-
 io_uring/fs.c                  |  16 +-
 io_uring/io_uring.c            | 329 +++++++++++++++++++++++++--------
 io_uring/io_uring.h            | 126 ++++++++++---
 io_uring/kbuf.c                | 119 ++++++++++--
 io_uring/kbuf.h                |   8 +-
 io_uring/msg_ring.c            |   4 +-
 io_uring/net.c                 |  18 +-
 io_uring/openclose.c           |   4 +-
 io_uring/poll.c                |   4 +-
 io_uring/rsrc.c                | 150 ++++++++++++--
 io_uring/rw.c                  |  17 +-
 io_uring/statx.c               |   4 +-
 io_uring/tctx.c                |  57 +++++-
 io_uring/timeout.c             |  10 +-
 io_uring/uring_cmd.c           |   5 +
 io_uring/uring_cmd.h           |   7 +
 io_uring/xattr.c               |  12 +-
 25 files changed, 977 insertions(+), 252 deletions(-)
On 22-02-2023 17:48, Teo Couprie Diaz wrote:
Hi Tudor,
On 13/02/2023 17:37, Tudor Cretu wrote:
This series makes it possible for purecap apps to use the io_uring system.
With these patches, all io_uring LTP tests pass in both Purecap and compat modes. Note that the LTP tests only address the basic functionality of the io_uring system and a significant portion of the multiplexed functionality is untested in LTP.
I have started running and investigating liburing tests and examples, so this is not a final version, but review is still very much appreciated while I go through the liburing tests.
I went through the whole series and I don't have much to say: it looks great!
The changes make sense to me and feel consistent, well done on untangling this hairy subject.
Not the most helpful review but at least nothing jumped out to me :)
Cheers, Téo
Thank you for the review, Téo! It's very much appreciated!
Thanks, Tudor