On 11/05/2023 16:34, Luca Vizzarro wrote:
On 03/05/2023 18:12, Tudor Cretu wrote:
Related to this instance and the io_uring/kbuf.c, I think it would be better to add the check in io_pin_pages (a function defined by io_uring). Only that function calls pin_user_pages, the actual GUP function. Also, I don't think that to check for iov_len is sufficient if it doesn't represent an integer number of pages. What if the last page grabbed contains area outside of the capability bounds...
This is a rather complicated one. The point of view I am following here is that we are only checking that the address space for the user data is correctly delimited by the user capability. While in this case the kernel effectively grabs all the interested pages, the user space won't certainly gain access to them. Therefore, this is a matter of making sure that the kernel itself does not access parts of pages which are not covered by the provided capability bounds. If we care about this now, how could we tackle it then?
On where to do the checking: as a rule, we should do it as generically as possible, so in the helper and not the caller if possible. Whether we can do that here is still unclear to me.
On the bound checking: I do think that checking iov_len is sufficient. What matters is that only the [iov_base, iov_base + iov_len] area is accessed by the kernel on behalf of userspace, and AFAICT this is the case. The fact that the kernel needs to pin entire pages is an implementation detail that should remain invisible to userspace.
I can see this comes from __iov_iter_get_pages_alloc, but it still gives me a headache figuring out why it it like this...
I am there with you. I personally don't understand it either. I was indeed hoping that somebody would shed some light on it.
That one made me twitch too. Took me a while to figure it out, but in fact it makes perfect sense, if you think about it from the right angle! iov_iter_rw() returns the direction of *data_source*. What the iterator does is on the other side, hence the direction is reversed. Take for instance this line in new_sync_read():
iov_iter_ubuf(&iter, READ, buf, len);
Here READ means "we're reading a file", and as a consequence we are *writing* to the specified buffer. For that reason even simple syscalls like read/write are confusing in terms of data direction...
Kevin