On 02/11/2022 19:25, Tudor Cretu wrote:

@@ -9389,13 +9561,29 @@ static void *io_mem_alloc(size_t size)
      return (void *) __get_free_pages(gfp, get_order(size));
}
-static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
-                size_t *sq_offset)
+static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries,
+                unsigned int cq_entries, size_t *cq_offset, size_t *sq_offset)
{
      struct io_rings *rings;
-    size_t off, sq_array_size;
+    size_t off, cq_array_size, sq_array_size;
+
+    off = sizeof(*rings);
+
+#ifdef CONFIG_SMP
+    off = ALIGN(off, SMP_CACHE_BYTES);
+    if (off == 0)
+        return SIZE_MAX;
+#endif
+
+    if (cq_offset)
+        *cq_offset = off;
+
+    cq_array_size = array_size(ctx->compat ? sizeof(struct compat_io_uring_cqe)
+                           : sizeof(struct io_uring_cqe), cq_entries);
+    if (cq_array_size == SIZE_MAX)
+        return SIZE_MAX;
-    off = struct_size(rings, cqes, cq_entries);
+    off = size_add(off, cq_array_size);
      if (off == SIZE_MAX)
          return SIZE_MAX;
@@ -9412,7 +9600,8 @@ static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
      if (sq_array_size == SIZE_MAX)
          return SIZE_MAX;
-    if (check_add_overflow(off, sq_array_size, &off))
+    off = size_add(off, sq_array_size);
+    if (off == SIZE_MAX)

Could you elaborate on why we need this change?

It's not needed, just wanted to align it with the rest of the checks in the function and I find it more readable. I can make it again check_add_overflow, as well with the check at "size_add(off, cq_array_size)" if you prefer.

check_add_overflow() is already used in many places in this file, and size_add() is not, so yes I think it's better to use check_add_overflow(). Not that I disagree that size_add() is clearer, but consistency first :)

          return SIZE_MAX;
      return off;
@@ -9972,7 +10161,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
      WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));
      io_mem_free(ctx->rings);
-    io_mem_free(ctx->sq_sqes);
+    io_mem_free(ctx->compat ? (void *)ctx->sq_sqes_compat : (void *)ctx->sq_sqes);

I find this sort of diff slightly frustrating as sq_sqes and sq_sqes_compat are both pointers, simply to a different type. If I got this correctly, you are making use of the type in two cases, while it is irrelevant in three (like here). However it is technically UB to refer to the inactive member of the union, so just getting rid of this diff wouldn't be recommended.

Fortunately there is an alternative: instead of having a union of pointers, make it a pointer to a union, e.g.:

   union {
     struct io_uring_cqe native;
     struct compat_io_uring_cqe compat;
   } *sq_sqes;

This way sq_sqes can still be viewed as a void * without change, and the cases where the type matters have to be changed anyway.

Might as well do the same with cqes/cqes_compat to be consistent.

I think this change would be misleading. sq_sqes is not a pointer to an element which is either native or compat, but rather is an array of {compat_}io_uring_sqe structs, so it's not really the same thing. The compat member of the union would never be used. In the compat case, I would need to cast sq_sqes to an array of struct compat_io_uring_sqe and then get the element by index. Is this what intended to suggest?

You are absolutely right, I somehow missed this "detail"... I don't think it is possible to have a different approach to the union then.

That said I have realised that I was also wrong on the fact that using the inactive member is UB: it is in C++ but not in C, where it is explicitly supported since C99 (this is known as "type punning", see [1]). Therefore we can have the best of both worlds: use ctx->sq_sqes unconditionally if the actual type doesn't matter, like here or when initialising the member, and otherwise use the right member.

      percpu_ref_exit(&ctx->refs);
      free_uid(ctx->user);

[...]

@@ -11799,6 +12158,13 @@ static int __init io_uring_init(void)
      BUILD_BUG_ON(sizeof(etype) != sizeof_field(stype, ename)); \
} while (0)
+    BUILD_BUG_ON(__builtin_types_compatible_p(struct io_uring_sqe, struct compat_io_uring_sqe));

The condition is reversed 😄

Actually, this __builtin_types_compatible_p returns 0 😅, so it doesn't do what I expected. Copying all the layout checks for the struct compat_io_uring_sqe doesn't feel right, so I am unsure what to do here. Any suggestion?

Right, that makes sense in fact, as two distinct structs are incompatible regardless of their layout (you don't want to inadvertently allow assigning a struct to another just because the number and types of their members happen to match).

I think you might as well copy all the checks. [2] did that for siginfo_t and I think being explicit makes sense here, it doesn't really hurt to bloat this function a bit.

+
+    BUILD_BUG_ON(sizeof(struct compat_io_uring_files_update) !=
+             sizeof(struct compat_io_uring_rsrc_update));
+    BUILD_BUG_ON(sizeof(struct compat_io_uring_rsrc_update) >
+             sizeof(struct compat_io_uring_rsrc_update2));

Nit: I'd rather have the compat checks after the native ones.

Ack

Kevin

Many thanks for the in-depth review and suggestions!

You are very welcome :)

Kevin

[1] https://en.cppreference.com/w/c/language/union
[2] https://lkml.kernel.org/r/20210429190734.624918-3-elver@google.com