Hi Ilias/Akashi-san,
So after the call earlier this week Mike asked me to write up an EPIC for the measurement work. I think this covers the deployments we want to measure but any feedback is welcome. The Xen documentation talks rather euphemistically about Open vSwitch but I don't know if that is just the management name for using the same network routing internals as the other eBPF ones.
Anyway any thoughts?
━━━━━━━━━━━━━━━━━━━━━━━
  STRATOS XDP MEASURING

       Alex Bennée
━━━━━━━━━━━━━━━━━━━━━━━
Table of Contents
─────────────────

1. Abstract
2. Networking Setups
.. 1. Test Setup
.. 2. Potential Packet Paths
.. 3. Host Networking
.. 4. KVM Guest with vhost networking
.. 5. Pass-through (SR-IOV or virtualised HW)
.. 6. Open vSwitch routing (Xen)
1 Abstract
══════════
As we move network endpoints into different VM configurations we need to understand the costs and latency effects those choices will have. With that understanding we can then consider various approaches to optimising packet flow through virtual machines.
2 Networking Setups
═══════════════════
2.1 Test Setup
──────────────
The test setup will require two machines. The test controller will be the source of the test packets and will measure the round-trip latency of getting a reply from the test client. The test client will be set up in multiple configurations so the latency of each can be compared.
[Diagram: Test Control and Test Client machines, each with eth0 on a shared LAN and eth1 connected back-to-back over a dedicated 10GbE test link.]
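To make the measurement concrete, the controller side could be as simple as a timestamped UDP ping-pong loop like the sketch below; the reflector address (192.168.1.2) and port (9000) are illustrative assumptions rather than part of the plan.

  /* latency_probe.c - minimal round-trip probe for the test controller,
   * assuming a UDP echo reflector is listening on the test client */
  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <stdio.h>
  #include <sys/socket.h>
  #include <time.h>
  #include <unistd.h>

  int main(void)
  {
          int sock = socket(AF_INET, SOCK_DGRAM, 0);
          struct sockaddr_in client = {
                  .sin_family = AF_INET,
                  .sin_port = htons(9000),         /* assumed reflector port */
          };
          inet_pton(AF_INET, "192.168.1.2", &client.sin_addr); /* test link */

          char buf[64] = "ping";
          struct timespec t0, t1;

          for (int i = 0; i < 1000; i++) {
                  clock_gettime(CLOCK_MONOTONIC, &t0);
                  sendto(sock, buf, sizeof(buf), 0,
                         (struct sockaddr *)&client, sizeof(client));
                  recv(sock, buf, sizeof(buf), 0); /* blocks for the reply */
                  clock_gettime(CLOCK_MONOTONIC, &t1);

                  long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L +
                            (t1.tv_nsec - t0.tv_nsec);
                  printf("%d %ld\n", i, ns);       /* sample, round-trip ns */
          }
          close(sock);
          return 0;
  }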
2.2 Potential Packet Paths
──────────────────────────
For each experiment we need to measure the latency of 3 different packet reflectors, a simple ping-pong running via either:
• xdp_xmit - lowest-latency turnaround, reflected directly at the driver
• xdp_redir - bypass the Linux networking stack and deliver to user space (AF_XDP)
• xdp_pass - normal packet path to a conventional AF_INET socket
[Diagram: in user space, a Test Program on an AF_INET socket and a Test Program on an AF_XDP socket; in kernel space, an XDP program at the driver either (1) turns the packet around with xdp_xmit, (2) redirects it to the AF_XDP socket with xdp_redir, or (3) passes it up the Linux networking stack to the AF_INET socket with xdp_pass.]
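For reference, a minimal XDP reflector covering the three paths above might look like the sketch below. This assumes a clang/libbpf build environment; the program and map names are purely illustrative, and in practice each path would be a separate program (or selected via configuration) rather than commented in and out.

  /* xdp_reflector.c - illustrative sketch, not the actual test program */
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* AF_XDP sockets register themselves here, keyed by RX queue index */
  struct {
          __uint(type, BPF_MAP_TYPE_XSKMAP);
          __uint(max_entries, 64);
          __uint(key_size, sizeof(int));
          __uint(value_size, sizeof(int));
  } xsks_map SEC(".maps");

  SEC("xdp")
  int xdp_reflector(struct xdp_md *ctx)
  {
          /* 1. xdp_xmit: turn the frame around at the driver without it
           *    ever leaving the NIC's receive path (a real reflector
           *    would also swap the Ethernet/IP addresses first):
           *        return XDP_TX;
           *
           * 2. xdp_redir: hand the frame to the AF_XDP socket bound to
           *    this RX queue, bypassing the kernel networking stack:
           *        return bpf_redirect_map(&xsks_map,
           *                                ctx->rx_queue_index, XDP_PASS);
           *
           * 3. xdp_pass: let the frame continue up the normal stack to a
           *    conventional AF_INET socket: */
          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";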
2.3 Host Networking
───────────────────
This is the default case with no virtualisation involved.
[Diagram: a Network Test Program in host user space; packet routing (SW) and the driver in host kernel space; the Network Card in HW.]
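In this configuration the reflector is attached directly to the host's test interface; a hedged libbpf sketch (assuming libbpf v0.8+ and the illustrative object/program names from the earlier sketch) would be:

  /* attach_xdp.c - load and attach the reflector, illustrative only */
  #include <bpf/libbpf.h>
  #include <linux/if_link.h>      /* XDP_FLAGS_* */
  #include <net/if.h>
  #include <stdio.h>

  int main(void)
  {
          struct bpf_object *obj =
                  bpf_object__open_file("xdp_reflector.o", NULL);
          if (!obj || bpf_object__load(obj)) {
                  fprintf(stderr, "failed to open/load BPF object\n");
                  return 1;
          }

          struct bpf_program *prog =
                  bpf_object__find_program_by_name(obj, "xdp_reflector");
          int ifindex = if_nametoindex("eth1");  /* test-link interface */

          /* prefer native (driver) mode, fall back to generic SKB mode */
          if (bpf_xdp_attach(ifindex, bpf_program__fd(prog),
                             XDP_FLAGS_DRV_MODE, NULL) &&
              bpf_xdp_attach(ifindex, bpf_program__fd(prog),
                             XDP_FLAGS_SKB_MODE, NULL)) {
                  fprintf(stderr, "failed to attach XDP program\n");
                  return 1;
          }
          return 0;
  }

Note that generic SKB-mode numbers would not be representative of driver-mode latency, so which mode actually attached should be recorded with each measurement.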
2.4 KVM Guest with vhost networking
───────────────────────────────────
This is a KVM-only case where the vhost device allows packets to be delivered directly from the guest kernel's address space. It still relies on the host kernel's networking stack though.
[Diagram: the guest user-space test program sits on a virtio-net device in the guest kernel; vhost-virtio delivers the packets into the host kernel's packet routing, which hands them to the driver and the physical Network Card.]
2.5 Pass-through (SR-IOV or virtualised HW)
───────────────────────────────────────────
This uses either direct pass-through of a discrete Ethernet device or a virtualised function. Control of the packet starts and ends in the guest's kernel.
[Diagram: the Guest VM/DomU kernel contains the driver and talks to the passed-through NIC or virtual function (vNIC) directly through the hypervisor; the Host System/Dom0 is not on the data path.]
2.6 Open vSwitch routing (Xen)
──────────────────────────────
Here the packets are switched into paravirtualised Xen interfaces by the Dom0 kernel. I'm a little unsure what Open vSwitch uses to route traffic and whether it is the same as the existing eBPF infrastructure.
[Diagram: packets arriving at the physical Network Card are switched by Open vSwitch in the Dom0 kernel to the XenPV backend (BE) driver, which passes them across the hypervisor to the XenPV frontend (FE) in the DomU guest kernel and up to the guest user-space program.]
On Thu, Nov 18, 2021 at 4:05 PM Alex Bennée alex.bennee@linaro.org wrote:
Open vSwitch and eBPF are different ways of achieving similar goals: with eBPF, a user application can provide a hardcoded fast path for packet manipulation that gets compiled into binary Linux kernel code using a JIT, while Open vSwitch provides a standard way to describe forwarding rules, which can be implemented in different ways, including a Linux kernel module or a DPDK application. In the Xen context I think this always implies using the Linux kernel module to integrate with the xenback driver. As far as I understand, this means you are not using XDP or eBPF in Dom0.
Arnd
Arnd Bergmann arnd@arndb.de writes:
So many ways to route packets in the kernel!
Practically though, should we measure the current "native Xen" approach, or do we want to look at extending the eBPF routing to be able to deliver packets to the Xen PV drivers? My hunch would be that, as this is a measuring project, we should look at the current approach as it can then inform our next steps.
Arnd
Alex Bennée alex.bennee@linaro.org writes:
Ilias/Akashi,
Any comments before I put this into JIRA cards?
On Wed, Nov 24, 2021 at 05:49:59AM +0000, Alex Bennée wrote:
1) Can we learn any lessons from past studies like https://connect.linaro.org/resources/bkk19/bkk19-504/ "BKK19-504 XDP Offload for OPC UA", either on test methods/scenarios or performance results?
2) You expect that the measurement will be done on the network between a guest VM and an outer (physical) machine (or sensor/camera?). What about VM-to-VM performance?
3) If the network between a VM and an outside machine does matter, we should also evaluate the case where multiple guests are running and accessing the same (physical) NIC (at least, virtually).
4) Should we take a TSN switch on the system into consideration?
-Takahiro Akashi
-- Alex Bennée
Hi,
I have just spotted this conversation. It looks very timely as we are involved in a 4-way NDA PoC with {Arm, Schneider, Cap Gemini} on RT workloads on top of Xen that consume TSN.
Here is a mind dump on this:
- exposing TSN to VMs in a hardware independent way will need at the very least virtio-net evolution (much more on this below)
- key measures are latency/jitter from wire to application, app-to-app latency/jitter (for a particular framework, OPC UA or DDS), TSN contract implementation accuracy
- external influences on the previous (noisy neighbor, network congestion or DDoS)
In the context of the PoC with Schneider, we are going to receive equipment (a Xilinx ZCU102 and one NXP board) to measure wire-to-app latency (a replay of the BKK19 PoC). Other parties in the PoC will implement all the measures listed above. Linaro may end up being contracted (paid) to participate more concretely.
Cheers
FF
TSN in VMs:
- Which VM has the right to request TSN priorities? How to signal them?
- How to implement traffic shapers such as the Time-Based Shaper?
- How to deal with frame preemption?
- Can eBPF be "deported" from the VM to the host or even to a SmartNIC?
On Thu, 25 Nov 2021 at 07:43, AKASHI Takahiro via Stratos-dev <stratos-dev@op-lists.linaro.org> wrote:
AKASHI Takahiro takahiro.akashi@linaro.org writes:
I think the main takeaway from that is that XDP is a useful optimisation. While we want to test the XDP paths, we are mainly interested in what effect the virtualisation has.
For the latency tests I'm suggesting a fixed master machine running Linux directly and a second machine with various configurations as the reflector.
VM-to-VM performance could be an interesting number, but for trying to identify where the bottlenecks are, keeping one end of the test connection unchanged will hopefully give more solid numbers.
That would certainly be a good additional test case.
Wouldn't we need support we don't yet have for that?
Alex,
On Thu, Nov 18, 2021 at 03:05:27PM +0000, Alex Bennée wrote:
I don't understand what this diagram means. It seems that the Test Program (on AF_INET) and the Test Program (on AF_XDP) talk to each other. Is that even possible if they speak different address families? I think we'd better mention AF_XDP and XDP in different contexts.
-Takahiro Akashi
-- Alex Bennée
On Thu, Nov 18, 2021 at 03:05:27PM +0000, Alex Bennée via Stratos-dev wrote:
One problem I've had with network benchmarks was reaching the link limit. On a recent Arm server the bottleneck was 10GbE, so I was unable to compare different software optimizations (measuring mainly bandwidth, though, not latency). If that happens and you do need a physical link (e.g. testing VFIO or HW acceleration) then 50/100+ GbE will be needed, and that's a different budget. Currently I do VM <-> host userspace for network benchmarks so everything is CPU-bound.
Thanks, Jean
On Fri, Nov 26, 2021 at 9:28 AM Jean-Philippe Brucker jean-philippe@linaro.org wrote:
I would think that latency testing is more useful than throughput testing then. I'd focus on the maximum latency in a bufferbloat scenario here, flooding the link with large packets while measuring the round-trip ping time.
Arnd
On Fri, Nov 26, 2021 at 10:33 AM AKASHI Takahiro takahiro.akashi@linaro.org wrote:
Round-trip latency is what you can actually measure easily, and if everything goes well, it will be consistently low. If you see unexpected latency here, you can then spend additional effort to find out which side introduced it.
Arnd