Hi,
The following is a breakdown (as best I can figure) of the work needed
to demonstrate VirtIO backends in Rust on the Xen hypervisor. It
requires work across a number of projects, most notably core Rust and
VirtIO enabling in the Xen project (building on the work EPAM has
already done) and the start of enabling the rust-vmm crates to work with Xen.
The first demo is a fairly simple toy to exercise the direct hypercall
approach for a unikernel backend. On its own it isn't super impressive
but hopefully serves as a proof of concept for the idea of having
backends running in a single exception level where latency will be
important.
The second is a much more ambitious bridge between Xen and vhost-user to
allow for re-use of the existing vhost-user backends with the bridge
acting as a proxy for what would usually be a full VMM in the type-2
hypervisor case. With that in mind the rust-vmm work is only aimed at
doing the device emulation and doesn't address the larger question of
how type-1 hypervisors can be integrated into the rust-vmm hypervisor
model.
A quick note about the estimates. They are exceedingly rough guesses
plucked out of the air and I would be grateful for feedback from the
appropriate domain experts on whether I'm being overly optimistic or
pessimistic.
The links to the Stratos JIRA should be at least read-accessible to all,
although they contain the same information as the attached document
(albeit with nicer PNG renderings of my ASCII art ;-). There is a
Stratos sync-up call next Thursday:
https://calendar.google.com/event?action=TEMPLATE&tmeid=MWpidm5lbzM5NjlydnA…
and I'm sure there will also be discussion in the various projects
(hence the wide CC list). The Stratos calls are open to anyone who wants
to attend and we welcome feedback from all who are interested.
So on with the work breakdown:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STRATOS PLANNING FOR 21 TO 22
Alex Bennée
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Table of Contents
─────────────────
1. Xen Rust Bindings ([STR-51])
.. 1. Upstream an "official" rust crate for Xen ([STR-52])
.. 2. Basic Hypervisor Interactions hypercalls ([STR-53])
.. 3. [#10] Access to XenStore service ([STR-54])
.. 4. VirtIO support hypercalls ([STR-55])
2. Xen Hypervisor Support for Stratos ([STR-56])
.. 1. Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
.. 2. Tweaks to tooling to launch VirtIO guests
3. rust-vmm support for Xen VirtIO ([STR-59])
.. 1. Make vm-memory Xen aware ([STR-60])
.. 2. Xen IO notification and IRQ injections ([STR-61])
4. Stratos Demos
.. 1. Rust based stubdomain monitor ([STR-62])
.. 2. Xen aware vhost-user master ([STR-63])
1 Xen Rust Bindings ([STR-51])
══════════════════════════════
There exists a [placeholder repository] with the start of a set of
x86_64 bindings for Xen and a very basic hello world unikernel
example. This forms the basis of the initial Xen Rust work and will be
available as a [xen-sys crate] via cargo.
[STR-51] <https://linaro.atlassian.net/browse/STR-51>
[placeholder repository] <https://gitlab.com/cardoe/oxerun.git>
[xen-sys crate] <https://crates.io/crates/xen-sys>
1.1 Upstream an "official" rust crate for Xen ([STR-52])
────────────────────────────────────────────────────────
To start with we will want an upstream location for future work to be
based upon. The intention is that the crate is independent of the
version of Xen it runs on (above the chosen baseline version). This will
entail:
• ☐ agreeing with upstream the name/location for the source
• ☐ documenting the rules for the "stable" hypercall ABI
• ☐ establish an internal interface to switch between ioctl-mediated
and direct hypercalls
• ☐ ensure the crate is multi-arch and has feature parity for arm64
As such we expect the implementation to be standalone, i.e. not
wrapping the existing Xen libraries for mediation. There should be a
close (1-to-1) mapping between the interfaces in the crate and the
eventual hypercall made to the hypervisor.
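As a sketch of what the internal interface mentioned above could look
like (every name here is invented for illustration and is not existing
xen-sys API), a small transport trait would let the same call sites work
over either the privcmd ioctl path from userspace or a direct hypercall
from a unikernel:
#+begin_src rust
// Hypothetical sketch only - neither the trait nor the types below
// are existing xen-sys API.
#[derive(Debug)]
pub struct Error(pub i64);

pub trait HypercallTransport {
    /// Issue hypercall `op` with up to five arguments and return the
    /// raw value from the hypervisor.
    fn hypercall(&self, op: u64, args: &[u64; 5]) -> Result<u64, Error>;
}

/// dom0/domU userspace: mediate the call via the privcmd ioctl.
pub struct IoctlTransport {
    pub privcmd_fd: std::os::unix::io::RawFd,
}

/// Unikernel/bare-metal: issue the hypercall instruction directly.
pub struct DirectTransport;
#+end_src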
Estimate: 4w (elapsed likely longer due to discussion)
[STR-52] <https://linaro.atlassian.net/browse/STR-52>
1.2 Basic Hypervisor Interactions hypercalls ([STR-53])
───────────────────────────────────────────────────────
These are the bare minimum hypercalls implemented as both ioctl and
direct calls. These allow for a very basic binary to:
• ☐ console_io - output IO via the Xen console
• ☐ domctl stub - basic stub for domain control (different API?)
• ☐ sysctl stub - basic stub for system control (different API?)
The idea would be that this provides enough of the hypercall interface
to query the list of domains and output their status via the Xen
console. There is an open question about whether the domctl and sysctl
hypercalls are the way to go.
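As an entirely illustrative example of the shape of these wrappers,
console output over the hypothetical transport sketched in 1.1 might
look like the following; the hypercall and sub-op numbers are the ones
from xen/include/public/xen.h, everything else is invented:
#+begin_src rust
// Sketch only: builds on the hypothetical HypercallTransport above.
const HYPERVISOR_CONSOLE_IO: u64 = 18; // __HYPERVISOR_console_io
const CONSOLEIO_WRITE: u64 = 0;

pub fn console_write<T: HypercallTransport>(xen: &T, msg: &str) -> Result<(), Error> {
    let args = [
        CONSOLEIO_WRITE,
        msg.len() as u64,
        msg.as_ptr() as u64, // guest virtual address of the buffer
        0,
        0,
    ];
    xen.hypercall(HYPERVISOR_CONSOLE_IO, &args).map(|_| ())
}
#+end_src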
Estimate: 6w
[STR-53] <https://linaro.atlassian.net/browse/STR-53>
1.3 [#10] Access to XenStore service ([STR-54])
───────────────────────────────────────────────
This is a shared configuration storage space accessed either via Unix
sockets (on dom0) or via the Xenbus. It is used to access
configuration information for the domain.
Is this needed for a backend though? Can everything just be passed
directly on the command line?
Estimate: 4w
[STR-54] <https://linaro.atlassian.net/browse/STR-54>
1.4 VirtIO support hypercalls ([STR-55])
────────────────────────────────────────
These are the hypercalls that need to be implemented to support a
VirtIO backend. This includes the ability to map another guest's memory
into the current domain's address space, register to receive IOREQ
events when the guest knocks on the doorbell, and inject kicks into the
guest. The hypercalls we need to support would be:
• ☐ dmop - device model ops (*_ioreq_server, setirq, nr_vcpus)
• ☐ foreignmemory - map and unmap guest memory
The DMOP space is larger than what we need for an IOREQ backend so
I've based it just on what arch/arm/dm.c exports, which is the subset
introduced for EPAM's virtio work.
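A hypothetical Rust shape for that subset (names invented here, reusing
the Error type from the sketch in 1.1) might be:
#+begin_src rust
// Invented sketch of the DMOP/foreignmemory subset an IOREQ backend
// needs; not actual xen-sys API.
pub struct IoreqServerId(pub u16);

pub trait VirtioBackendOps {
    /// XEN_DMOP_create_ioreq_server and friends.
    fn create_ioreq_server(&self, domid: u16) -> Result<IoreqServerId, Error>;
    fn map_io_range(&self, srv: &IoreqServerId, start: u64, end: u64)
        -> Result<(), Error>;
    fn set_ioreq_server_state(&self, srv: &IoreqServerId, enabled: bool)
        -> Result<(), Error>;
    /// Inject/clear a guest interrupt (set_irq_level style).
    fn set_irq_level(&self, domid: u16, irq: u32, level: bool)
        -> Result<(), Error>;
    /// foreignmemory: map/unmap FE guest pages into our address space.
    fn map_foreign_pages(&self, domid: u16, gfns: &[u64])
        -> Result<*mut u8, Error>;
    fn unmap_foreign_pages(&self, ptr: *mut u8, npages: usize)
        -> Result<(), Error>;
}
#+end_src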
Estimate: 12w
[STR-55] <https://linaro.atlassian.net/browse/STR-55>
2 Xen Hypervisor Support for Stratos ([STR-56])
═══════════════════════════════════════════════
These tasks cover the work needed to support the various different
deployments of Stratos components on Xen.
[STR-56] <https://linaro.atlassian.net/browse/STR-56>
2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
───────────────────────────────────────────────────────────────
Currently the foreign memory mapping support only works for dom0 due
to reference counting issues. If we are to support backends running in
their own domains this will need to get fixed.
Estimate: 8w
[STR-57] <https://linaro.atlassian.net/browse/STR-57>
2.2 Tweaks to tooling to launch VirtIO guests
─────────────────────────────────────────────
There might not be too much to do here. The EPAM work already did
something similar for their PoC for virtio-block. Essentially we need
to ensure:
• ☐ DT bindings are passed to the guest for virtio-mmio device
discovery
• ☐ Our rust backend can be instantiated before the domU is launched
This currently assumes the tools and the backend are running in dom0.
Estimate: 4w
3 rust-vmm support for Xen VirtIO ([STR-59])
════════════════════════════════════════════
This encompasses the tasks required to get a vhost-user server up and
running while interfacing to the Xen hypervisor. This will require the
xen-sys crate for the actual interface to the hypervisor.
We need to work out how a Xen configuration option would be passed to
the various bits of rust-vmm when something is being built.
[STR-59] <https://linaro.atlassian.net/browse/STR-59>
3.1 Make vm-memory Xen aware ([STR-60])
───────────────────────────────────────
The vm-memory crate is the root crate for abstracting access to the
guest's memory. It currently has multiple configuration builds to
handle the differences between mmap on Windows and Unix. Although mmap
isn't directly exposed, the public interfaces support an mmap-like
interface. We would need to:
• ☐ work out how to expose foreign memory via the vm-memory mechanism
I'm not sure if this just means implementing the GuestMemory trait for
a GuestMemoryXen type or if we need to present an mmap-like interface.
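As a purely illustrative sketch of the first interpretation (nothing
here is real vm-memory API, which has many more required methods), the
Xen flavour might hold foreignmemory-mapped regions and translate guest
physical addresses locally:
#+begin_src rust
// Invented sketch: regions of the FE guest's memory mapped via Xen
// foreignmemory rather than a host mmap. Whether this hides behind
// vm-memory's GuestMemory trait or a new one is the open question.
pub struct XenForeignRegion {
    pub guest_base: u64,    // FE guest physical address of the region
    pub host_addr: *mut u8, // where foreignmemory mapped it for us
    pub size: usize,
}

pub struct GuestMemoryXen {
    pub regions: Vec<XenForeignRegion>,
}

impl GuestMemoryXen {
    /// Translate an FE guest physical address into a local pointer,
    /// which is the core service vm-memory users rely on.
    pub fn get_host_address(&self, gpa: u64) -> Option<*mut u8> {
        self.regions.iter().find_map(|r| {
            let off = gpa.checked_sub(r.guest_base)?;
            if (off as usize) < r.size {
                Some(unsafe { r.host_addr.add(off as usize) })
            } else {
                None
            }
        })
    }
}
#+end_src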
Estimate: 8w
[STR-60] <https://linaro.atlassian.net/browse/STR-60>
3.2 Xen IO notification and IRQ injections ([STR-61])
─────────────────────────────────────────────────────
The KVM world provides for ioeventfd (notifications) and irqfd
(injection) to signal asynchronously between the guest and the
backend. As far as I can tell this is currently handled inside the
various VMMs, which assume a KVM backend.
While the vhost-user slave code doesn't see the
register_ioevent/register_irqfd events, it does deal with EventFDs
throughout the code. Perhaps the best approach here would be to create
an IOREQ crate that can create EventFD descriptors which can then be
passed to the slaves to use for notification and injection.
Otherwise there might be an argument for a new crate that can
encapsulate this behaviour for both KVM/ioeventfd and Xen/IOREQ setups?
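A rough sketch of the EventFD idea (the EventFd type is the one from
the rust-vmm vmm-sys-util crate; the Xen handle trait and everything
else is invented here):
#+begin_src rust
// Sketch of the "IOREQ crate" idea: the slave only ever sees plain
// EventFds, a small bridge translates them to and from Xen IOREQ
// events. XenIoreqHandle and its methods are invented.
use vmm_sys_util::eventfd::EventFd;

pub trait XenIoreqHandle {
    fn raise_guest_irq(&self) -> std::io::Result<()>;
}

pub struct IoreqEventBridge {
    /// signalled towards the slave when the guest kicks a queue
    pub ioeventfd: EventFd,
    /// written by the slave when it wants an interrupt injected
    pub irqfd: EventFd,
}

impl IoreqEventBridge {
    pub fn new() -> std::io::Result<Self> {
        Ok(Self {
            ioeventfd: EventFd::new(0)?,
            irqfd: EventFd::new(0)?,
        })
    }

    /// Called when Xen delivers an IOREQ for one of our registered
    /// doorbell ranges: kick the slave.
    pub fn notify_slave(&self) -> std::io::Result<()> {
        self.ioeventfd.write(1)
    }

    /// Called after polling shows irqfd readable: consume the slave's
    /// signal and inject the interrupt via the Xen handle.
    pub fn inject_irq(&self, xen: &impl XenIoreqHandle) -> std::io::Result<()> {
        self.irqfd.read()?;
        xen.raise_guest_irq()
    }
}
#+end_src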
Estimate: 8w?
[STR-61] <https://linaro.atlassian.net/browse/STR-61>
4 Stratos Demos
═══════════════
These tasks cover the creation of demos that bring together all the
previous bits of work to demonstrate a new area of capability that has
been opened up by Stratos work.
4.1 Rust based stubdomain monitor ([STR-62])
────────────────────────────────────────────
This is a basic demo that is a proof of concept for a unikernel style
backend written in pure Rust. This work would be a useful precursor
for things such as the RTOS Dom0 on a safety island ([STR-11]) or as a
carrier for the virtio-scmi backend.
The monitor program will periodically poll the state of the other
domains and echo their status to the Xen console.
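In terms of the earlier sketches, the core loop is little more than the
following (again hypothetical; MonitorOps stands in for whatever the
domctl stub from 1.2 ends up exposing, and console_write is the wrapper
sketched there):
#+begin_src rust
// Hypothetical monitor loop built on the sketches from sections 1.1-1.2.
pub struct DomainInfo {
    pub domid: u16,
    pub running: bool,
}

pub trait MonitorOps: HypercallTransport {
    fn list_domains(&self) -> Result<Vec<DomainInfo>, Error>;
    fn wait_for_timer(&self) -> Result<(), Error>;
}

fn monitor_loop<T: MonitorOps>(xen: &T) -> Result<(), Error> {
    loop {
        for dom in xen.list_domains()? {
            let line = format!("dom{}: running={}\n", dom.domid, dom.running);
            console_write(xen, &line)?;
        }
        // unikernel equivalent of sleeping for the poll interval
        xen.wait_for_timer()?;
    }
}
#+end_src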
Estimate: 4w
#+name: stub-domain-example
#+begin_src ditaa :cmdline -o :file stub_domain_example.png
Dom0 | DomU | DomStub
| |
: /-------------\ :
| |cPNK | |
| | | |
| | | |
/------------------------------------\ | | GuestOS | |
|cPNK | | | | |
EL0 | Dom0 Userspace (xl tools, QEMU) | | | | | /---------------\
| | | | | | |cYEL |
\------------------------------------/ | | | | | |
+------------------------------------+ | | | | | Rust Monitor |
EL1 |cA1B Dom0 Kernel | | | | | | |
+------------------------------------+ | \-------------/ | \---------------/
-------------------------------------------------------------------------------=------------------
+-------------------------------------------------------------------------------------+
EL2 |cC02 Xen Hypervisor |
+-------------------------------------------------------------------------------------+
#+end_src
[STR-62] <https://linaro.atlassian.net/browse/STR-62>
[STR-11] <https://linaro.atlassian.net/browse/STR-11>
4.2 Xen aware vhost-user master ([STR-63])
──────────────────────────────────────────
Usually the master side of a vhost-user system is embedded directly in
the VMM itself. However in a Xen deployment there is no overarching
VMM but a series of utility programs that query the hypervisor
directly. The Xen tooling is also responsible for setting up any
support processes that are responsible for emulating HW for the guest.
The task aims to bridge the gap between Xen's normal HW emulation path
(ioreq) and VirtIO's userspace device emulation (vhost-user). The
process would be started with some information on where the
virtio-mmio address space is and what the slave binary will be. It will
then (a rough sketch follows the list):
• map the guest into Dom0 userspace and attach to a MemFD
• register the appropriate memory regions as IOREQ regions with Xen
• create EventFD channels for the virtio kick notifications (one each
way)
• spawn the vhost-user slave process and mediate the notifications and
kicks between the slave and Xen itself
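A very rough outline of that flow, expressed as a hypothetical trait
(every name is invented; real code would sit on the xen-sys crate and
the rust-vmm vhost crates, and the messages named in the comments are
the standard vhost-user protocol requests):
#+begin_src rust
// Hypothetical outline of the Xen-aware vhost-user master's job.
use vmm_sys_util::eventfd::EventFd;

pub trait XenVhostMaster {
    /// 1. map the FE guest into Dom0 userspace, backed by a memfd
    ///    that can be handed to the slave over the vhost-user socket
    fn map_guest_memory(&mut self, domid: u16) -> std::io::Result<()>;
    /// 2. register the virtio-mmio window as an IOREQ region with Xen
    fn register_ioreq_region(&mut self, base: u64, size: u64)
        -> std::io::Result<()>;
    /// 3. create the kick/call EventFd pair shared with the slave
    fn create_event_channels(&mut self)
        -> std::io::Result<(EventFd, EventFd)>;
    /// 4. spawn the existing vhost-user slave and run the handshake
    ///    (VHOST_USER_SET_MEM_TABLE, SET_VRING_KICK/CALL, ...)
    fn spawn_and_configure_slave(&mut self, cmd: &str)
        -> std::io::Result<()>;
    /// 5. main loop: turn guest IOREQ accesses into kick writes and
    ///    config-space replies, and call events into Xen interrupts
    fn mediate(&mut self) -> std::io::Result<()>;
}
#+end_src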
#+name: xen-vhost-user-master
#+begin_src ditaa :cmdline -o :file xen_vhost_user_master.png
Dom0 DomU
|
|
|
|
|
|
+-------------------+ +-------------------+ |
| |----------->| | |
| vhost-user | vhost-user | vhost-user | : /------------------------------------\
| slave | protocol | master | | | |
| (existing) |<-----------| (rust) | | | |
+-------------------+ +-------------------+ | | |
^ ^ | ^ | | Guest Userspace |
| | | | | | |
| | | IOREQ | | | |
| | | | | | |
v v V | | \------------------------------------/
+---------------------------------------------------+ | +------------------------------------+
| ^ ^ | ioctl ^ | | | |
| | iofd/irqfd eventFD | | | | | | Guest Kernel |
| +---------------------------+ | | | | | +-------------+ |
| | | | | | | virtio-dev | |
| Host Kernel V | | | | +-------------+ |
+---------------------------------------------------+ | +------------------------------------+
| ^ | | ^
| hyper | | |
----------------------=------------- | -=--- | ----=------ | -----=- | --------=------------------
| call | Trap | | IRQ
V | V |
+-------------------------------------------------------------------------------------+
| | ^ | ^ |
| | +-------------+ | |
EL2 | Xen Hypervisor | | |
| +-------------------------------+ |
| |
+-------------------------------------------------------------------------------------+
#+end_src
[STR-63] <https://linaro.atlassian.net/browse/STR-63>
--
Alex Bennée
Hi,
One of the goals of Project Stratos is to enable hypervisor agnostic
backends so we can enable as much re-use of code as possible and avoid
repeating ourselves. This is the flip side of the front end where
multiple front-end implementations are required - one per OS, assuming
you don't just want Linux guests. The resultant guests are trivially
movable between hypervisors modulo any abstracted paravirt type
interfaces.
In my original thumbnail sketch of a solution I envisioned vhost-user
daemons running in a broadly POSIX like environment. The interface to
the daemon is fairly simple requiring only some mapped memory and some
sort of signalling for events (on Linux this is eventfd). The idea was a
stub binary would be responsible for any hypervisor specific setup and
then launch a common binary to deal with the actual virtqueue requests
themselves.
Since that original sketch we've seen an expansion in the sort of ways
backends could be created. There is interest in encapsulating backends
in RTOSes or unikernels for solutions like SCMI. The interest in Rust
has prompted ideas of using the trait interface to abstract differences
away as well as the idea of bare-metal Rust backends.
We have a card (STR-12) called "Hypercall Standardisation" which
calls for a description of the APIs needed from the hypervisor side to
support VirtIO guests and their backends. However we are some way off
from that at the moment as I think we need to at least demonstrate one
portable backend before we start codifying requirements. To that end I
want to think about what we need for a backend to function.
Configuration
=============
In the type-2 setup this is typically fairly simple because the host
system can orchestrate the various modules that make up the complete
system. In the type-1 case (or even type-2 with delegated service VMs)
we need some sort of mechanism to inform the backend VM about key
details about the system:
- where virt queue memory is in its address space
- how it's going to receive (interrupt) and trigger (kick) events
- what (if any) resources the backend needs to connect to
Obviously you can elide configuration issues by having static
configurations and baking the assumptions into your guest images;
however this isn't scalable in the long term. The obvious solution seems to be
extending a subset of Device Tree data to user space but perhaps there
are other approaches?
Before any virtio transactions can take place the appropriate memory
mappings need to be made between the FE guest and the BE guest.
Currently the whole of the FE guest's address space needs to be visible
to whatever is serving the virtio requests. I can envision 3 approaches:
* BE guest boots with memory already mapped
This would entail the guest OS knowing which parts of its Guest Physical
Address space are already taken up and avoiding clashes. I would assume
in this case you would want a standard interface to userspace to then
make that address space visible to the backend daemon.
* BE guest boots with a hypervisor handle to memory
The BE guest is then free to map the FE's memory to where it wants in
the BE's guest physical address space. To activate the mapping will
require some sort of hypercall to the hypervisor. I can see two options
at this point:
- expose the handle to userspace for daemon/helper to trigger the
mapping via existing hypercall interfaces. If using a helper you
would have a hypervisor specific one to avoid the daemon having to
care too much about the details or push that complexity into a
compile time option for the daemon which would result in different
binaries although a common source base.
- expose a new kernel ABI to abstract the hypercall differences away
in the guest kernel. In this case the userspace would essentially
ask for an abstract "map guest N memory to userspace ptr" and let
the kernel deal with the different hypercall interfaces. This of
course assumes the majority of BE guests would be Linux kernels and
leaves the bare-metal/unikernel approaches to their own devices (a
rough sketch of the userspace side of this option follows).
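To make that second option a bit more concrete, the userspace side might
reduce to something like the sketch below. Everything here - the device
node, the ioctl number, the struct layout - is made up purely to show
the shape such an ABI could take (no such interface exists today), and it
leans on the libc crate for the raw ioctl call.

// Purely illustrative: a possible userspace view of an abstract
// "map guest N memory to userspace ptr" kernel ABI.
use std::fs::OpenOptions;
use std::os::unix::io::AsRawFd;

#[repr(C)]
struct MapGuestReq {
    guest_id: u64, // hypervisor-agnostic handle for the FE guest
    gpa: u64,      // start of the FE region to map
    len: u64,
    addr: u64,     // filled in by the kernel: userspace address
}

const HYPOTHETICAL_MAP_GUEST: u64 = 0xc020_5600; // made-up ioctl number

fn map_guest(guest_id: u64, gpa: u64, len: u64) -> std::io::Result<*mut u8> {
    // hypothetical device node exposed by the abstraction layer
    let f = OpenOptions::new().read(true).write(true).open("/dev/virtio-be")?;
    let mut req = MapGuestReq { guest_id, gpa, len, addr: 0 };
    // SAFETY: sketch only; a real ABI would have a generated ioctl
    // number and the kernel would validate the request.
    let ret = unsafe {
        libc::ioctl(f.as_raw_fd(), HYPOTHETICAL_MAP_GUEST as _, &mut req)
    };
    if ret < 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(req.addr as *mut u8)
}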
Operation
=========
The core of the operation of VirtIO is fairly simple. Once the
vhost-user feature negotiation is done it's a case of receiving update
events and parsing the resultant virt queue for data. The vhost-user
specification handles a bunch of setup before that point, mostly to
detail where the virt queues are and to set up FDs for memory and event
communication. This is where the envisioned stub process would be
responsible for getting the daemon up and ready to run. This is
currently done inside a big VMM like QEMU but I suspect a modern
approach would be to use the rust-vmm vhost crate. It would then either
communicate with the kernel's abstracted ABI or be re-targeted as a
build option for the various hypervisors.
One question is how to best handle notification and kicks. The existing
vhost-user framework uses eventfd to signal the daemon (although QEMU
is quite capable of simulating them when you use TCG). Xen has its own
IOREQ mechanism. However latency is an important factor and having
events go through the stub would add quite a lot.
Could we consider the kernel internally converting IOREQ messages from
the Xen hypervisor to eventfd events? Would this scale with other kernel
hypercall interfaces?
So any thoughts on what directions are worth experimenting with?
--
Alex Bennée
The I2C protocol allows zero-length requests with no data, like the
SMBus Quick command, where the command is inferred based on the
read/write flag itself.
In order to allow such a request, allocate another bit,
VIRTIO_I2C_FLAGS_M_RD(1), in the flags to pass the request type, as read
or write. This was earlier done using the read/write permission to the
buffer itself.
This still won't work well if multiple buffers are passed for the same
request, i.e. the write-read requests, as the VIRTIO_I2C_FLAGS_M_RD flag
can only be used with a single buffer.
Coming back to it, there is no need to send multiple buffers with a
single request. All we need is a way to group several requests
together, which we can already do based on the
VIRTIO_I2C_FLAGS_FAIL_NEXT flag.
Remove support for multiple buffers within a single request.
Since we are at a very early stage of development, we can make these
modifications without adding new features or versioning the
protocol.
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
V2->V3:
- Add conformance clauses that require that the flag is consistent with the
buffer.
V1->V2:
- Name the buffer-less request as zero-length request.
Hi Guys,
I did try to follow the discussion you guys had during V4, where we added
support for multiple buffers for the same request, which I think is unnecessary
now, after introduction of the VIRTIO_I2C_FLAGS_FAIL_NEXT flag.
https://lists.oasis-open.org/archives/virtio-comment/202011/msg00005.html
And so starting this discussion again, because we need to support stuff
like: i2cdetect -q <i2c-bus-number>, which issues a zero-length SMBus
Quick command.
---
virtio-i2c.tex | 66 +++++++++++++++++++++++++++-----------------------
1 file changed, 36 insertions(+), 30 deletions(-)
diff --git a/virtio-i2c.tex b/virtio-i2c.tex
index 949d75f44158..c7335372a8bb 100644
--- a/virtio-i2c.tex
+++ b/virtio-i2c.tex
@@ -54,8 +54,7 @@ \subsubsection{Device Operation: Request Queue}\label{sec:Device Types / I2C Ada
\begin{lstlisting}
struct virtio_i2c_req {
struct virtio_i2c_out_hdr out_hdr;
- u8 write_buf[];
- u8 read_buf[];
+ u8 buf[];
struct virtio_i2c_in_hdr in_hdr;
};
\end{lstlisting}
@@ -84,16 +83,16 @@ \subsubsection{Device Operation: Request Queue}\label{sec:Device Types / I2C Ada
and sets it on the other requests. If this bit is set and a device fails
to process the current request, it needs to fail the next request instead
of attempting to execute it.
+
+\item[VIRTIO_I2C_FLAGS_M_RD(1)] is used to mark the request as READ or WRITE.
\end{description}
Other bits of \field{flags} are currently reserved as zero for future feature
extensibility.
-The \field{write_buf} of the request contains one segment of an I2C transaction
-being written to the device.
-
-The \field{read_buf} of the request contains one segment of an I2C transaction
-being read from the device.
+The \field{buf} of the request is optional and contains one segment of an I2C
+transaction being read from or written to the device, based on the value of the
+\field{VIRTIO_I2C_FLAGS_M_RD} bit in the \field{flags} field.
The final \field{status} byte of the request is written by the device: either
VIRTIO_I2C_MSG_OK for success or VIRTIO_I2C_MSG_ERR for error.
@@ -103,27 +102,27 @@ \subsubsection{Device Operation: Request Queue}\label{sec:Device Types / I2C Ada
#define VIRTIO_I2C_MSG_ERR 1
\end{lstlisting}
-If ``length of \field{read_buf}''=0 and ``length of \field{write_buf}''>0,
-the request is called write request.
+If \field{VIRTIO_I2C_FLAGS_M_RD} bit is set in the \field{flags}, then the
+request is called a read request.
-If ``length of \field{read_buf}''>0 and ``length of \field{write_buf}''=0,
-the request is called read request.
+If \field{VIRTIO_I2C_FLAGS_M_RD} bit is unset in the \field{flags}, then the
+request is called a write request.
-If ``length of \field{read_buf}''>0 and ``length of \field{write_buf}''>0,
-the request is called write-read request. It means an I2C write segment followed
-by a read segment. Usually, the write segment provides the number of an I2C
-controlled device register to be read.
+The \field{buf} is optional and will not be present for a zero-length request,
+like SMBus Quick.
-The case when ``length of \field{write_buf}''=0, and at the same time,
-``length of \field{read_buf}''=0 doesn't make any sense.
+The virtio I2C protocol supports write-read requests, i.e. an I2C write segment
+followed by a read segment (usually, the write segment provides the number of an
+I2C controlled device register to be read), by grouping a list of requests
+together using the \field{VIRTIO_I2C_FLAGS_FAIL_NEXT} flag.
\subsubsection{Device Operation: Operation Status}\label{sec:Device Types / I2C Adapter Device / Device Operation: Operation Status}
-\field{addr}, \field{flags}, ``length of \field{write_buf}'' and ``length of \field{read_buf}''
-are determined by the driver, while \field{status} is determined by the processing
-of the device. A driver puts the data written to the device into \field{write_buf}, while
-a device puts the data of the corresponding length into \field{read_buf} according to the
-request of the driver.
+\field{addr}, \field{flags}, and ``length of \field{buf}'' are determined by the
+driver, while \field{status} is determined by the processing of the device. A
+driver, for a write request, puts the data to be written to the device into the
+\field{buf}, while a device, for a read request, puts the data read from device
+into the \field{buf} according to the request from the driver.
A driver may send one request or multiple requests to the device at a time.
The requests in the virtqueue are both queued and processed in order.
@@ -141,11 +140,16 @@ \subsubsection{Device Operation: Operation Status}\label{sec:Device Types / I2C
A driver MUST set the reserved bits of \field{flags} to be zero.
-The driver MUST NOT send a request with ``length of \field{write_buf}''=0 and
-``length of \field{read_buf}''=0 at the same time.
+A driver MUST NOT send the \field{buf}, for a zero-length request.
+
+A driver MUST NOT use \field{buf}, for a read request, if the final
+\field{status} returned from the device is VIRTIO_I2C_MSG_ERR.
-A driver MUST NOT use \field{read_buf} if the final \field{status} returned
-from the device is VIRTIO_I2C_MSG_ERR.
+A driver MUST set the \field{VIRTIO_I2C_FLAGS_M_RD} flag for a read operation,
+where the buffer is write-only for the device.
+
+A driver MUST NOT set the \field{VIRTIO_I2C_FLAGS_M_RD} flag for a write
+operation, where the buffer is read-only for the device.
A driver MUST queue the requests in order if multiple requests are going to
be sent at a time.
@@ -160,11 +164,13 @@ \subsubsection{Device Operation: Operation Status}\label{sec:Device Types / I2C
A device SHOULD keep consistent behaviors with the hardware as described in
\hyperref[intro:I2C]{I2C}.
-A device MUST NOT change the value of \field{addr}, reserved bits of \field{flags}
-and \field{write_buf}.
+A device MUST NOT change the value of \field{addr}, and reserved bits of
+\field{flags}.
+
+A device MUST not change the value of the \field{buf} for a write request.
-A device MUST place one I2C segment of the corresponding length into \field{read_buf}
-according the driver's request.
+A device MUST place one I2C segment of the ``length of \field{buf}'', for the
+read request, into the \field{buf} according the driver's request.
A device MUST guarantee the requests in the virtqueue being processed in order
if multiple requests are received at a time.
--
2.31.1.272.g89b43f80a514
Hi,
As we consider the next cycle for Project Stratos I would like to make
some more progress on hypervisor agnosticism for our virtio backends.
While we have implemented a number of virtio vhost-user backends using C,
we've rapidly switched to using rust-vmm based ones for virtio-i2c,
virtio-rng and virtio-gpio. Given the interest in Rust for implementing
backends, does it make sense to do some enabling work in rust-vmm to
support Xen?
There are two chunks of work I can think of:
1. Enough of libxl/hypervisor interface to implement an IOREQ end point.
This would require supporting enough of the hypervisor interface to
support the implementation of an IOREQ server. We would also need to
think about how we would map the IOREQ view of the world into the
existing vhost-user interface so we can re-use the current vhost-user
backends code base. The two approaches I can think of are:
a) implement a vhost-user master that speaks IOREQ to the hypervisor
and vhost-user to the vhost-user slave. In this case the bridge
would be standing in for something like QEMU.
b) implement some variants of the vhost-user slave traits that can
talk directly to the hypervisor to get/send the equivalent
kick/notify events. I don't know if this might be too complex, as the
impedance mismatch between the two interfaces might be too great.
This assumes most of the setup is done by the existing toolstack, so
the existing libxl tools are used to create, connect and configure the
domains before the backend is launched.
which leads to:
2. The rest of the libxl/hypervisor interface.
This would be the rest of the interface to allow rust-vmm tools to be
written that could create, configure and manage Xen domains with pure
rust tools. My main concern about this is how rust-vmm's current model
(which is very much KVM influenced) will be able to handle the
differences for a type-1 hypervisor. Wei's pointed me to the Linux
support that was added to expose a Hyper-V control interface via the
Linux kernel. While I can see support has been merged in other Rust
based projects, I think the rust-vmm crate is still outstanding:
https://github.com/rust-vmm/community/issues/50
and I guess this would need revisiting for Xen to see if the proposed
abstraction would scale across other hypervisors.
Finally there is the question of how/if any of this would relate to the
concept of bare-metal rust backends? We've talked about bare metal
backends before but I wonder if the programming model for them is going
to be outside the scope of rust-vmm? Would the program just be hardwired
to IRQs and be presented a doorbell port to kick or would we want to
have at least some of the higher level rust-vmm abstractions for dealing
with navigating the virtqueues and responding and filling in data?
Thoughts?
--
Alex Bennée
Hello,
This patchset adds vhost-user-i2c device support in Qemu. Initially I tried to
add the backend implementation as well into Qemu, but as I was looking for a
hypervisor agnostic backend implementation, I decided to keep it outside of
Qemu. Eventually I implemented it in Rust and it works very well with this
patchset, and it is under review [1] to be merged in common rust vhost devices
crate.
The kernel virtio I2C driver [2] is fully reviewed and is ready to be merged soon.
V1->V2:
- Dropped the backend support from qemu and minor cleanups.
I2C Testing:
------------
I didn't have access to real hardware where I could play with an I2C
client device (like RTC, eeprom, etc) to verify the working of the
backend daemon, so I decided to test it on my x86 box itself with
hierarchy of two ARM64 guests.
The first ARM64 guest was passed "-device ds1338,address=0x20" option,
so it could emulate a ds1338 RTC device, which connects to an I2C bus.
Once the guest came up, ds1338 device instance was created within the
guest kernel by doing:
echo ds1338 0x20 > /sys/bus/i2c/devices/i2c-0/new_device
[
Note that this may end up binding the ds1338 device to its driver,
which won't let our i2c daemon talk to the device. For that we need to
manually unbind the device from the driver:
echo 0-0020 > /sys/bus/i2c/devices/0-0020/driver/unbind
]
After this is done, you will get /dev/rtc1. This is the device we wanted
to emulate, which will be accessed by the vhost-user-i2c backend daemon
via the /dev/i2c-0 file present in the guest VM.
At this point we need to start the backend daemon and give it a
socket-path to talk to from qemu (you can pass -v to it to get more
detailed messages):
vhost-user-i2c --socket-path=vi2c.sock -l 0:32
[ Here, 0:32 is the bus/device mapping, 0 for /dev/i2c-0 and 32 (i.e.
0x20) is client address of ds1338 that we used while creating the
device. ]
Now we need to start the second level ARM64 guest (from within the first
guest) to get the i2c-virtio.c Linux driver up. The second level guest
is passed the following options to connect to the same socket:
-chardev socket,path=vi2c.sock0,id=vi2c \
-device vhost-user-i2c-pci,chardev=vi2c,id=i2c
Once the second level guest boots up, we will see the i2c-virtio bus at
/sys/bus/i2c/devices/i2c-X/. From there we can now make it emulate the
ds1338 device again by doing:
echo ds1338 0x20 > /sys/bus/i2c/devices/i2c-0/new_device
[ This time we want ds1338's driver to be bound to the device, so it
should be enabled in the kernel as well. ]
And we will get /dev/rtc1 device again here in the second level guest.
Now we can play with the rtc device with help of hwclock utility and we
can see the following sequence of transfers happening if we try to
update rtc's time from system time.
hwclock -w -f /dev/rtc1 (in guest2) ->
Reaches i2c-virtio.c (Linux bus driver in guest2) ->
transfer over virtio ->
Reaches the qemu's vhost-i2c device emulation (running over guest1) ->
Reaches the backend daemon vhost-user-i2c started earlier (in guest1) ->
ioctl(/dev/i2c-0, I2C_RDWR, ..); (in guest1) ->
reaches qemu's hw/rtc/ds1338.c (running over host)
SMBUS Testing:
--------------
I didn't need such a tedious setup for testing with SMBus
devices. I was able to emulate an SMBus device on my x86 machine
using the i2c-stub driver.
$ modprobe i2c-stub chip_addr=0x20
//Boot the arm64 guest now with i2c-virtio driver and then do:
$ echo al3320a 0x20 > /sys/class/i2c-adapter/i2c-0/new_device
$ cat /sys/bus/iio/devices/iio:device0/in_illuminance_raw
That's it.
I hope I was able to give a clear picture of my test setup here :)
--
Viresh
Viresh Kumar (3):
hw/virtio: add boilerplate for vhost-user-i2c device
hw/virtio: add vhost-user-i2c-pci boilerplate
MAINTAINERS: Add entry for virtio-i2c
MAINTAINERS | 7 +
hw/virtio/Kconfig | 5 +
hw/virtio/meson.build | 2 +
hw/virtio/vhost-user-i2c-pci.c | 69 +++++++
hw/virtio/vhost-user-i2c.c | 288 +++++++++++++++++++++++++++++
include/hw/virtio/vhost-user-i2c.h | 28 +++
6 files changed, 399 insertions(+)
create mode 100644 hw/virtio/vhost-user-i2c-pci.c
create mode 100644 hw/virtio/vhost-user-i2c.c
create mode 100644 include/hw/virtio/vhost-user-i2c.h
--
2.31.1.272.g89b43f80a514
Hi
I believe there is a hidden problem in the virtio implementation of Qemu
(up to 6.1.0) in calculating the offset of the "used" split vring, and the
spec needs some clarification. Should anyone decide it deserves
upstream/spec changes, feel free to do so.
Cheers
FF
According to the specification in 2.6.2:
#define ALIGN(x) (((x) + qalign) & ~qalign)
static inline unsigned virtq_size(unsigned int qsz)
{
return ALIGN(sizeof(struct virtq_desc)*qsz + sizeof(u16)*(3 + qsz))
}
And more specifically, "used" starts after "avail" as defined in 2.6.6
struct virtq_avail {
#define VIRTQ_AVAIL_F_NO_INTERRUPT 1
le16 flags;
le16 idx;
le16 ring[ /* Queue Size */ ];
le16 used_event; /* Only if VIRTIO_F_EVENT_IDX */
};
Linux and kvmtool calculate the offset with the formula:
LINUX: vring_init @ include/uapi/linux/virtio_ring.h
vr->avail = (struct vring_avail *)((char *)p + num * sizeof(struct
vring_desc));
vr->used = (void *)(((uintptr_t)&vr->avail->ring[num] + sizeof(__virtio16)
+ align-1) & ~(align - 1));
The "+ sizeof(__virtio16)" properly accounts for the "used_event" in struct
virtue_avail.
Hypervisor ACRN uses a similar scheme: virtio_vq_init @
/devicemodel/hw/pci/virtio/virtio.c
vq->avail = (struct vring_avail *)vb;
vb += (2 + vq->qsize + 1) * sizeof(uint16_t);
/* Then it's rounded up to the next page... */
vb = (char *)roundup2((uintptr_t)vb, VIRTIO_PCI_VRING_ALIGN);
/* ... and the last page(s) are the used ring. */
vq->used = (struct vring_used *)vb;
But Qemu uses: QEMU: virtio_queue_update_rings @ hw/virtio/virtio.c
vring->used = vring_align(vring->avail +
offsetof(VRingAvail, ring[vring->num]),
vring->align);
Linux alignment policies end up having vring->align values either 4096 (for
MMIO) or 64 (PCI), and thus there are no visible issues.
If you use a different OS that chooses an alignment of 4 (valid as per
section 2.6) then Qemu does not calculate the same location for the used
ring and virtio does not work anymore. The OS actually works fine with the alignment
of 4 with kvmtool and ACRN.
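To make the divergence concrete, here is a small stand-alone Rust check of
the two calculations (offsets measured from the start of the avail ring;
desc entries are 16 bytes, the avail header 4 bytes, each ring entry 2
bytes and used_event 2 bytes, per the spec):

fn align_up(x: usize, align: usize) -> usize {
    (x + align - 1) & !(align - 1)
}

// Linux/kvmtool: &avail->ring[qsz] + sizeof(__virtio16), then align.
fn used_offset_linux(qsz: usize, align: usize) -> usize {
    align_up(4 + 2 * qsz + 2, align)
}

// Qemu: offsetof(VRingAvail, ring[qsz]), then align - no used_event.
fn used_offset_qemu(qsz: usize, align: usize) -> usize {
    align_up(4 + 2 * qsz, align)
}

fn main() {
    let qsz = 256;
    for align in [4, 64, 4096] {
        println!("align {:4}: linux {} qemu {}", align,
                 used_offset_linux(qsz, align),
                 used_offset_qemu(qsz, align));
    }
    // align    4: linux 520 qemu 516   <- the two sides disagree
    // align   64: linux 576 qemu 576
    // align 4096: linux 4096 qemu 4096
}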
There are two other problems:
In the spec, the comment "/* Only if VIRTIO_F_EVENT_IDX */" on the avail
structure is not clear on whether:
- the field is always there but its contents are only valid if...
- the field may be absent altogether.
Inferring from the calculation formulae the field is always present, but some
language would help clarify this.
On 2.6.2, the alignment formula "#define ALIGN(x) (((x) + qalign) &
~qalign)" is only correct if "qalign" is actually a mask, whereas in many
parts of the spec align is referred to as a power of 2. It may be good to
change the text to something like: #define ALIGN(x) (((x) + (qalign - 1)) &
~(qalign - 1)) /* where qalign is a power of 2 */
--
François-Frédéric Ozog | *Director Business Development*
T: +33.67221.6485
francois.ozog(a)linaro.org | Skype: ffozog
Hi
I was asked by AGL Virtualization AG if there was any implementation of the
virtio-video driver: could anyone let me know what I should answer?
Cheers
FF
--
François-Frédéric Ozog | *Director Business Development*
T: +33.67221.6485
francois.ozog(a)linaro.org | Skype: ffozog
I top post as I find it difficult to identify where to make the comments.
1) BE acceleration
Network and storage backends may actually be executed in SmartNICs. As
virtio 1.1 is hardware friendly, there may be SmartNICs with virtio 1.1 PCI
VFs. Is it a valid use case for the generic BE framework to be used in this
context?
DPDK is used in some BEs to significantly accelerate switching. DPDK is also
sometimes used in guests. In that case, there is no event injection, just a
high-performance shared-memory scheme. Is this considered a use case?
2) Virtio as OS HAL
Panasonic's CTO has been calling for a virtio-based HAL and, based on the
teachings of Google GKI, an internal HAL seems inevitable in the long term.
Virtio is then a contender to the Google-promoted Android HAL. Could the
framework be used in that context?
On Wed, 11 Aug 2021 at 08:28, AKASHI Takahiro via Stratos-dev <
stratos-dev(a)op-lists.linaro.org> wrote:
> On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:
> > CCing people working on Xen+VirtIO and IOREQs. Not trimming the original
> > email to let them read the full context.
> >
> > My comments below are related to a potential Xen implementation, not
> > because it is the only implementation that matters, but because it is
> > the one I know best.
>
> Please note that my proposal (and hence the working prototype)[1]
> is based on Xen's virtio implementation (i.e. IOREQ) and particularly
> EPAM's virtio-disk application (backend server).
> It has been, I believe, well generalized but is still a bit biased
> toward this original design.
>
> So I hope you like my approach :)
>
> [1]
> https://op-lists.linaro.org/pipermail/stratos-dev/2021-August/000546.html
>
> Let me take this opportunity to explain a bit more about my approach below.
>
> > Also, please see this relevant email thread:
> > https://marc.info/?l=xen-devel&m=162373754705233&w=2
> >
> >
> > On Wed, 4 Aug 2021, Alex Bennée wrote:
> > > Hi,
> > >
> > > One of the goals of Project Stratos is to enable hypervisor agnostic
> > > backends so we can enable as much re-use of code as possible and avoid
> > > repeating ourselves. This is the flip side of the front end where
> > > multiple front-end implementations are required - one per OS, assuming
> > > you don't just want Linux guests. The resultant guests are trivially
> > > movable between hypervisors modulo any abstracted paravirt type
> > > interfaces.
> > >
> > > In my original thumb nail sketch of a solution I envisioned vhost-user
> > > daemons running in a broadly POSIX like environment. The interface to
> > > the daemon is fairly simple requiring only some mapped memory and some
> > > sort of signalling for events (on Linux this is eventfd). The idea was
> a
> > > stub binary would be responsible for any hypervisor specific setup and
> > > then launch a common binary to deal with the actual virtqueue requests
> > > themselves.
> > >
> > > Since that original sketch we've seen an expansion in the sort of ways
> > > backends could be created. There is interest in encapsulating backends
> > > in RTOSes or unikernels for solutions like SCMI. There interest in Rust
> > > has prompted ideas of using the trait interface to abstract differences
> > > away as well as the idea of bare-metal Rust backends.
> > >
> > > We have a card (STR-12) called "Hypercall Standardisation" which
> > > calls for a description of the APIs needed from the hypervisor side to
> > > support VirtIO guests and their backends. However we are some way off
> > > from that at the moment as I think we need to at least demonstrate one
> > > portable backend before we start codifying requirements. To that end I
> > > want to think about what we need for a backend to function.
> > >
> > > Configuration
> > > =============
> > >
> > > In the type-2 setup this is typically fairly simple because the host
> > > system can orchestrate the various modules that make up the complete
> > > system. In the type-1 case (or even type-2 with delegated service VMs)
> > > we need some sort of mechanism to inform the backend VM about key
> > > details about the system:
> > >
> > > - where virt queue memory is in it's address space
> > > - how it's going to receive (interrupt) and trigger (kick) events
> > > - what (if any) resources the backend needs to connect to
> > >
> > > Obviously you can elide over configuration issues by having static
> > > configurations and baking the assumptions into your guest images
> however
> > > this isn't scalable in the long term. The obvious solution seems to be
> > > extending a subset of Device Tree data to user space but perhaps there
> > > are other approaches?
> > >
> > > Before any virtio transactions can take place the appropriate memory
> > > mappings need to be made between the FE guest and the BE guest.
> >
> > > Currently the whole of the FE guests address space needs to be visible
> > > to whatever is serving the virtio requests. I can envision 3
> approaches:
> > >
> > > * BE guest boots with memory already mapped
> > >
> > > This would entail the guest OS knowing where in it's Guest Physical
> > > Address space is already taken up and avoiding clashing. I would
> assume
> > > in this case you would want a standard interface to userspace to then
> > > make that address space visible to the backend daemon.
>
> Yet another way here is that we would have well known "shared memory"
> between
> VMs. I think that Jailhouse's ivshmem gives us good insights on this matter
> and that it can even be an alternative for hypervisor-agnostic solution.
>
> (Please note memory regions in ivshmem appear as a PCI device and can be
> mapped locally.)
>
> I want to add this shared memory aspect to my virtio-proxy, but
> the resultant solution would eventually look similar to ivshmem.
>
> > > * BE guests boots with a hypervisor handle to memory
> > >
> > > The BE guest is then free to map the FE's memory to where it wants in
> > > the BE's guest physical address space.
> >
> > I cannot see how this could work for Xen. There is no "handle" to give
> > to the backend if the backend is not running in dom0. So for Xen I think
> > the memory has to be already mapped
>
> In Xen's IOREQ solution (virtio-blk), the following information is expected
> to be exposed to BE via Xenstore:
> (I know that this is a tentative approach though.)
> - the start address of configuration space
> - interrupt number
> - file path for backing storage
> - read-only flag
> And the BE server have to call a particular hypervisor interface to
> map the configuration space.
>
> In my approach (virtio-proxy), all those Xen (or hypervisor)-specific
> stuffs are contained in virtio-proxy, yet another VM, to hide all details.
>
> # My point is that a "handle" is not mandatory for executing mapping.
>
> > and the mapping probably done by the
> > toolstack (also see below.) Or we would have to invent a new Xen
> > hypervisor interface and Xen virtual machine privileges to allow this
> > kind of mapping.
>
> > If we run the backend in Dom0 that we have no problems of course.
>
> One of difficulties on Xen that I found in my approach is that calling
> such hypervisor intefaces (registering IOREQ, mapping memory) is only
> allowed on BE servers themselvies and so we will have to extend those
> interfaces.
> This, however, will raise some concern on security and privilege
> distribution
> as Stefan suggested.
> >
> >
> > > To activate the mapping will
> > > require some sort of hypercall to the hypervisor. I can see two
> options
> > > at this point:
> > >
> > > - expose the handle to userspace for daemon/helper to trigger the
> > > mapping via existing hypercall interfaces. If using a helper you
> > > would have a hypervisor specific one to avoid the daemon having to
> > > care too much about the details or push that complexity into a
> > > compile time option for the daemon which would result in different
> > > binaries although a common source base.
> > >
> > > - expose a new kernel ABI to abstract the hypercall differences away
> > > in the guest kernel. In this case the userspace would essentially
> > > ask for an abstract "map guest N memory to userspace ptr" and let
> > > the kernel deal with the different hypercall interfaces. This of
> > > course assumes the majority of BE guests would be Linux kernels and
> > > leaves the bare-metal/unikernel approaches to their own devices.
> > >
> > > Operation
> > > =========
> > >
> > > The core of the operation of VirtIO is fairly simple. Once the
> > > vhost-user feature negotiation is done it's a case of receiving update
> > > events and parsing the resultant virt queue for data. The vhost-user
> > > specification handles a bunch of setup before that point, mostly to
> > > detail where the virt queues are set up FD's for memory and event
> > > communication. This is where the envisioned stub process would be
> > > responsible for getting the daemon up and ready to run. This is
> > > currently done inside a big VMM like QEMU but I suspect a modern
> > > approach would be to use the rust-vmm vhost crate. It would then either
> > > communicate with the kernel's abstracted ABI or be re-targeted as a
> > > build option for the various hypervisors.
> >
> > One thing I mentioned before to Alex is that Xen doesn't have VMMs the
> > way they are typically envisioned and described in other environments.
> > Instead, Xen has IOREQ servers. Each of them connects independently to
> > Xen via the IOREQ interface. E.g. today multiple QEMUs could be used as
> > emulators for a single Xen VM, each of them connecting to Xen
> > independently via the IOREQ interface.
> >
> > The component responsible for starting a daemon and/or setting up shared
> > interfaces is the toolstack: the xl command and the libxl/libxc
> > libraries.
>
> I think that VM configuration management (or orchestration in Startos
> jargon?) is a subject to debate in parallel.
> Otherwise, is there any good assumption to avoid it right now?
>
> > Oleksandr and others I CCed have been working on ways for the toolstack
> > to create virtio backends and setup memory mappings. They might be able
> > to provide more info on the subject. I do think we miss a way to provide
> > the configuration to the backend and anything else that the backend
> > might require to start doing its job.
> >
> >
> > > One question is how to best handle notification and kicks. The existing
> > > vhost-user framework uses eventfd to signal the daemon (although QEMU
> > > is quite capable of simulating them when you use TCG). Xen has it's own
> > > IOREQ mechanism. However latency is an important factor and having
> > > events go through the stub would add quite a lot.
> >
> > Yeah I think, regardless of anything else, we want the backends to
> > connect directly to the Xen hypervisor.
>
> In my approach,
> a) BE -> FE: interrupts triggered by BE calling a hypervisor interface
> via virtio-proxy
> b) FE -> BE: MMIO to config raises events (in event channels), which is
> converted to a callback to BE via virtio-proxy
> (Xen's event channel is internnally implemented by
> interrupts.)
>
> I don't know what "connect directly" means here, but sending interrupts
> to the opposite side would be best efficient.
> Ivshmem, I suppose, takes this approach by utilizing PCI's msi-x mechanism.
>
> >
> > > Could we consider the kernel internally converting IOREQ messages from
> > > the Xen hypervisor to eventfd events? Would this scale with other
> kernel
> > > hypercall interfaces?
> > >
> > > So any thoughts on what directions are worth experimenting with?
> >
> > One option we should consider is for each backend to connect to Xen via
> > the IOREQ interface. We could generalize the IOREQ interface and make it
> > hypervisor agnostic. The interface is really trivial and easy to add.
>
> As I said above, my proposal does the same thing that you mentioned here :)
> The difference is that I do call hypervisor interfaces via virtio-proxy.
>
> > The only Xen-specific part is the notification mechanism, which is an
> > event channel. If we replaced the event channel with something else the
> > interface would be generic. See:
> >
> https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/io…
> >
> > I don't think that translating IOREQs to eventfd in the kernel is a
> > good idea: if feels like it would be extra complexity and that the
> > kernel shouldn't be involved as this is a backend-hypervisor interface.
>
> Given that we may want to implement BE as a bare-metal application
> as I did on Zephyr, I don't think that the translation would not be
> a big issue, especially on RTOS's.
> It will be some kind of abstraction layer of interrupt handling
> (or nothing but a callback mechanism).
>
> > Also, eventfd is very Linux-centric and we are trying to design an
> > interface that could work well for RTOSes too. If we want to do
> > something different, both OS-agnostic and hypervisor-agnostic, perhaps
> > we could design a new interface. One that could be implementable in the
> > Xen hypervisor itself (like IOREQ) and of course any other hypervisor
> > too.
> >
> >
> > There is also another problem. IOREQ is probably not be the only
> > interface needed. Have a look at
> > https://marc.info/?l=xen-devel&m=162373754705233&w=2. Don't we also need
> > an interface for the backend to inject interrupts into the frontend? And
> > if the backend requires dynamic memory mappings of frontend pages, then
> > we would also need an interface to map/unmap domU pages.
>
> My proposal document might help here; All the interfaces required for
> virtio-proxy (or hypervisor-related interfaces) are listed as
> RPC protocols :)
>
> > These interfaces are a lot more problematic than IOREQ: IOREQ is tiny
> > and self-contained. It is easy to add anywhere. A new interface to
> > inject interrupts or map pages is more difficult to manage because it
> > would require changes scattered across the various emulators.
>
> Exactly. I have no confident yet that my approach will also apply
> to other hypervisors than Xen.
> Technically, yes, but whether people can accept it or not is a different
> matter.
>
> Thanks,
> -Takahiro Akashi
>
> --
> Stratos-dev mailing list
> Stratos-dev(a)op-lists.linaro.org
> https://op-lists.linaro.org/mailman/listinfo/stratos-dev
>
--
François-Frédéric Ozog | *Director Business Development*
T: +33.67221.6485
francois.ozog(a)linaro.org | Skype: ffozog
The I2C protocol allows zero-length requests with no data, like the
SMBus Quick command, where the command is inferred based on the
read/write flag itself.
In order to allow such a request, allocate another bit,
VIRTIO_I2C_FLAGS_M_RD(1), in the flags to pass the request type, as read
or write. This was earlier done using the read/write permission to the
buffer itself.
This still won't work well if multiple buffers are passed for the same
request, i.e. the write-read requests, as the VIRTIO_I2C_FLAGS_M_RD flag
can only be used with a single buffer.
Coming back to it, there is no need to send multiple buffers with a
single request. All we need is a way to group several requests
together, which we can already do based on the
VIRTIO_I2C_FLAGS_FAIL_NEXT flag.
Remove support for multiple buffers within a single request.
Since we are at a very early stage of development, we can make these
modifications without adding new features or versioning the
protocol.
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
V1->V2:
- Name the buffer-less request as zero-length request.
Hi Guys,
I did try to follow the discussion you guys had during V4, where we added
support for multiple buffers for the same request, which I think is unnecessary
now, after introduction of the VIRTIO_I2C_FLAGS_FAIL_NEXT flag.
https://lists.oasis-open.org/archives/virtio-comment/202011/msg00005.html
And so starting this discussion again, because we need to support stuff
like: i2cdetect -q <i2c-bus-number>, which issues a zero-length SMBus
Quick command.
---
virtio-i2c.tex | 60 +++++++++++++++++++++++++-------------------------
1 file changed, 30 insertions(+), 30 deletions(-)
diff --git a/virtio-i2c.tex b/virtio-i2c.tex
index 949d75f44158..ae344b2bc822 100644
--- a/virtio-i2c.tex
+++ b/virtio-i2c.tex
@@ -54,8 +54,7 @@ \subsubsection{Device Operation: Request Queue}\label{sec:Device Types / I2C Ada
\begin{lstlisting}
struct virtio_i2c_req {
struct virtio_i2c_out_hdr out_hdr;
- u8 write_buf[];
- u8 read_buf[];
+ u8 buf[];
struct virtio_i2c_in_hdr in_hdr;
};
\end{lstlisting}
@@ -84,16 +83,16 @@ \subsubsection{Device Operation: Request Queue}\label{sec:Device Types / I2C Ada
and sets it on the other requests. If this bit is set and a device fails
to process the current request, it needs to fail the next request instead
of attempting to execute it.
+
+\item[VIRTIO_I2C_FLAGS_M_RD(1)] is used to mark the request as READ or WRITE.
\end{description}
Other bits of \field{flags} are currently reserved as zero for future feature
extensibility.
-The \field{write_buf} of the request contains one segment of an I2C transaction
-being written to the device.
-
-The \field{read_buf} of the request contains one segment of an I2C transaction
-being read from the device.
+The \field{buf} of the request is optional and contains one segment of an I2C
+transaction being read from or written to the device, based on the value of the
+\field{VIRTIO_I2C_FLAGS_M_RD} bit in the \field{flags} field.
The final \field{status} byte of the request is written by the device: either
VIRTIO_I2C_MSG_OK for success or VIRTIO_I2C_MSG_ERR for error.
@@ -103,27 +102,27 @@ \subsubsection{Device Operation: Request Queue}\label{sec:Device Types / I2C Ada
#define VIRTIO_I2C_MSG_ERR 1
\end{lstlisting}
-If ``length of \field{read_buf}''=0 and ``length of \field{write_buf}''>0,
-the request is called write request.
+If \field{VIRTIO_I2C_FLAGS_M_RD} bit is set in the \field{flags}, then the
+request is called a read request.
-If ``length of \field{read_buf}''>0 and ``length of \field{write_buf}''=0,
-the request is called read request.
+If \field{VIRTIO_I2C_FLAGS_M_RD} bit is unset in the \field{flags}, then the
+request is called a write request.
-If ``length of \field{read_buf}''>0 and ``length of \field{write_buf}''>0,
-the request is called write-read request. It means an I2C write segment followed
-by a read segment. Usually, the write segment provides the number of an I2C
-controlled device register to be read.
+The \field{buf} is optional and will not be present for a zero-length request,
+like SMBus Quick.
-The case when ``length of \field{write_buf}''=0, and at the same time,
-``length of \field{read_buf}''=0 doesn't make any sense.
+The virtio I2C protocol supports write-read requests, i.e. an I2C write segment
+followed by a read segment (usually, the write segment provides the number of an
+I2C controlled device register to be read), by grouping a list of requests
+together using the \field{VIRTIO_I2C_FLAGS_FAIL_NEXT} flag.
\subsubsection{Device Operation: Operation Status}\label{sec:Device Types / I2C Adapter Device / Device Operation: Operation Status}
-\field{addr}, \field{flags}, ``length of \field{write_buf}'' and ``length of \field{read_buf}''
-are determined by the driver, while \field{status} is determined by the processing
-of the device. A driver puts the data written to the device into \field{write_buf}, while
-a device puts the data of the corresponding length into \field{read_buf} according to the
-request of the driver.
+\field{addr}, \field{flags}, and ``length of \field{buf}'' are determined by the
+driver, while \field{status} is determined by the processing of the device. A
+driver, for a write request, puts the data to be written to the device into the
+\field{buf}, while a device, for a read request, puts the data read from device
+into the \field{buf} according to the request from the driver.
A driver may send one request or multiple requests to the device at a time.
The requests in the virtqueue are both queued and processed in order.
@@ -141,11 +140,10 @@ \subsubsection{Device Operation: Operation Status}\label{sec:Device Types / I2C
A driver MUST set the reserved bits of \field{flags} to be zero.
-The driver MUST NOT send a request with ``length of \field{write_buf}''=0 and
-``length of \field{read_buf}''=0 at the same time.
+A driver MUST NOT send the \field{buf}, for a zero-length request.
-A driver MUST NOT use \field{read_buf} if the final \field{status} returned
-from the device is VIRTIO_I2C_MSG_ERR.
+A driver MUST NOT use \field{buf}, for a read request, if the final
+\field{status} returned from the device is VIRTIO_I2C_MSG_ERR.
A driver MUST queue the requests in order if multiple requests are going to
be sent at a time.
@@ -160,11 +158,13 @@ \subsubsection{Device Operation: Operation Status}\label{sec:Device Types / I2C
A device SHOULD keep consistent behaviors with the hardware as described in
\hyperref[intro:I2C]{I2C}.
-A device MUST NOT change the value of \field{addr}, reserved bits of \field{flags}
-and \field{write_buf}.
+A device MUST NOT change the value of \field{addr}, and reserved bits of
+\field{flags}.
+
+A device MUST not change the value of the \field{buf} for a write request.
-A device MUST place one I2C segment of the corresponding length into \field{read_buf}
-according the driver's request.
+A device MUST place one I2C segment of the ``length of \field{buf}'', for the
+read request, into the \field{buf} according the driver's request.
A device MUST guarantee the requests in the virtqueue being processed in order
if multiple requests are received at a time.
--
2.31.1.272.g89b43f80a514