On Tue, Oct 25, 2022 at 04:13:37PM +0100, Vincenzo Frascino wrote:
On 10/11/22 11:45, Arnd Bergmann wrote:
=== Integer types ===
- I understand that __user pointers, __kernel_uintptr_t, and __uintcap_t are 128 bit wide in a purecap kernel. However I'm constantly confused about the size of the basic types (void *, long, long long) in userspace, in a purecap kernel and in a kernel without CONFIG_CHERI_PURECAP_UABI. While those are probably completely obvious to anyone working on Morello, maybe you can repeat how these are defined in the ABI documents and/or in the Documentation/arm64/morello.rst files.
I agree, it is not a straight forward concept and requires further clarification. To fix this we are currently working on a new document (will be placed in Documentation/cheri/) that will cover various aspects of the PCuABI including sizes of the basic types. It should be ready in the coming months. I will make sure you are Cc'ed once it is posted for review.
We decided to go for a separate document because we think that the information contained might be relevant even for other architectures/platforms and that it makes it easier to extend it as we proceed with the implementation of the kernel.
- The use of uintcap_t and __kernel_uintcap_t type names makes perfect sense in the context of Morello, but there is a risk that anything using those will have to be reworked again in order to deal with architectures that have >64bit addresses without using Cheri-style capabilities.
The risk is there and IMHO it depends on whether or not we decide to support capability based architectures in the kernel together with 128 bit ones.
If we do, we will always have the necessity to distinguish in between what it is a pointer (allows deref) and what it is an address (no deref). Another dimension to keep into account is how we distinguish in between what belongs to the kernel to what belongs to the user.
Keeping this in mind, probably we require a matrix of four types to make sure we capture all the possibilies:
- For the kernel:
- ptraddr_t identifies an address
- uintptr_t identifies a pointer (capability)
- For the user:
- user_ptraddr_t identifies an address
- user_uintptr_t identifies a pointer (capability)
This approach seems generic and should work transparently for architectures that have >64bit addresses without using Cheri-style capabilities.
For what concerns the provenance (user vs kernel), we could use __user to detect it. Whereas tools like sparse should be OK with it, I am not sure the same can be applied to compilers (we need to discuss it with compiler guys). In fact I am not currently aware of any annotation that modifies directly the size of an integer type and there might be good reasons for that.
The good thing is that, in the current implementation, we are not using/planning to use uintcap_t (__kernel_uintcap_t) in many places, hence it should not be hard to replace them once we agree on the naming. Our implementation relies mosty on user_uintptr_t which is aligned with what I was mentioning above.
Note: I do not have a good eye for naming of the types, so suggestions are welcome.
- I assume we'll eventually need to do a mass-conversion from 'unsigned long' to 'uintptr_t' (or similar) inside the kernel at some point. This seems entirely uncontroversial but will need some interesting scripting using coccinelle.
Keeping in mind the distinction in between addresses and pointers, I think that having an automated way to convert the kernel would definitely make our life easier. Something I played in past with is combining python with coccinelle which might be useful in this situation as well.
When I last spoke to Torvalds about uintptr_t, he's completely opposed. In fact, we already do make a distinction inside the kernel between addresses (unsigned long) and pointers (void * / void __user *). Yes, there are places where we do confuse the two, but I think they're fixable.
This was what I was trying to propose in my talk, that we widen both long and pointer to 128-bit at the same time. It means we don't need the mass-conversion of long -> intptr_t.