Alex,
On Thu, Nov 18, 2021 at 03:05:27PM +0000, Alex Bennée wrote:
Hi Ilias/Akashi-san,
So after the call earlier this week Mike asked me to write up an EPIC for the measurement work. I think this covers the deployments we want to measure, but any feedback is welcome. The Xen documentation talks rather euphemistically about Open vSwitch, but I don't know if that is just the management name for using the same network routing internals as the other eBPF ones.
Anyway any thoughts?
━━━━━━━━━━━━━━━━━━━━━━━
 STRATOS XDP MEASURING

      Alex Bennée
━━━━━━━━━━━━━━━━━━━━━━━

Table of Contents
─────────────────
- Abstract
- Networking Setups
.. 1. Test Setup
.. 2. Potential Packet Paths
.. 3. Host Networking
.. 4. KVM Guest with vhost networking
.. 5. Pass-through (SR-IOV or virtualised HW)
.. 6. Open vSwitch routing (Xen)
1 Abstract
══════════
As we move network endpoints into different VM configurations we need to understand the costs and latency effects those choices will have. With that understanding we can then consider various approaches to optimising packet flow through virtual machines.
2 Networking Setups
═══════════════════
2.1 Test Setup
──────────────
The test setup will require two machines. The test controller will be the source of the test packets and will measure the round-trip latency to get a reply from the test client. The test client will be set up in multiple configurations so the latency can be checked.
+-----------------------+             +-----------------------+
|c1AB                   |             |c1AB                   |
|     Test Control      |             |      Test Client      |
|                       |             |                       |
+-+--------+-+--------+-+             +-+--------+-+--------+-+
  |{mo}    | |{mo}    |                 |{mo}    | |{mo}    |
  |  eth0  | |  eth1  |                 |  eth1  | |  eth0  |
  |cRED    | |cPNK    |                 |cPNK    | |cRED    |
  +--------+ +--------+                 +--------+ +--------+
      |          ^                           ^          |
      |          |         test link         |          |
      :          +---------------------------+          :
      |                      10GbE                      |
      |                                                 |
/--------------------------------------------=----------------\
|                             LAN                             |
\--------------------------------------------=----------------/
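As a rough illustration of the controller side described above, the sketch below times UDP probes against a reflector and summarises the round-trip latency. It is only a sketch under assumptions: the function names (`start_echo_reflector`, `measure_rtt`, `summarise`), the 4-byte sequence-number payload and the loopback reflector standing in for the test client are all hypothetical, not part of the proposed test suite.

```python
import socket
import statistics
import threading
import time


def start_echo_reflector(host="127.0.0.1"):
    """Hypothetical stand-in for the test client: echo each UDP datagram back."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))  # port 0 lets the OS pick a free port

    def serve():
        while True:
            try:
                data, addr = sock.recvfrom(2048)
            except OSError:
                return  # socket was closed
            sock.sendto(data, addr)

    threading.Thread(target=serve, daemon=True).start()
    return sock


def measure_rtt(target, count=100, timeout=1.0):
    """Send `count` probes to `target`; return round-trip times in microseconds."""
    rtts_us = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for seq in range(count):
            payload = seq.to_bytes(4, "big")  # assumed probe format
            start = time.perf_counter_ns()
            sock.sendto(payload, target)
            try:
                reply, _ = sock.recvfrom(2048)
            except socket.timeout:
                continue  # treat as a lost probe
            elapsed_ns = time.perf_counter_ns() - start
            if reply == payload:  # ignore stale replies from earlier probes
                rtts_us.append(elapsed_ns / 1000)
    return rtts_us


def summarise(rtts_us):
    """Reduce a list of round-trip times to a few headline numbers."""
    return {
        "count": len(rtts_us),
        "min": min(rtts_us),
        "median": statistics.median(rtts_us),
        "max": max(rtts_us),
    }


if __name__ == "__main__":
    reflector = start_echo_reflector()
    print(summarise(measure_rtt(reflector.getsockname(), count=50)))
```

User-space timestamps like these include scheduling and interpreter overhead, so they only give an upper bound on the latency of the path under test.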
2.2 Potential Packet Paths
──────────────────────────
For each experiment we need to measure the latency of 3 different packet reflectors. A simple ping pong running either via:
• xdp_xmit - lowest latency turnaround at the driver
• xdp_redir - bypass the Linux networking stack to user-space
• xdp_pass - normal packet path to a conventional AF_INET socket
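To make the "ping pong" turnaround concrete, here is a minimal model of what a reflector might do to each frame before sending it back: swap the Ethernet addresses and, for IPv4/UDP, the IP addresses and ports. This is an assumption about how the reflectors would be written (the proposal does not specify it), and it is expressed in Python rather than as an actual XDP program; `reflect_frame` is a hypothetical name.

```python
import struct

ETH_HLEN = 14        # Ethernet header: dst MAC (6) + src MAC (6) + ethertype (2)
ETH_P_IP = 0x0800    # ethertype for IPv4
IPPROTO_UDP = 17     # IPv4 protocol number for UDP


def reflect_frame(frame: bytes) -> bytes:
    """Return a copy of `frame` addressed back to its sender (illustrative only)."""
    if len(frame) < ETH_HLEN:
        raise ValueError("frame shorter than an Ethernet header")
    out = bytearray(frame)

    # Ethernet: destination MAC is bytes 0-5, source MAC is bytes 6-11.
    out[0:6], out[6:12] = frame[6:12], frame[0:6]

    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP or len(frame) < ETH_HLEN + 20:
        return bytes(out)  # non-IPv4 traffic only gets the MAC swap

    ip = ETH_HLEN
    ihl = (frame[ip] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20:
        return bytes(out)  # malformed header, leave the rest untouched

    # IPv4: source address at offset 12, destination address at offset 16.
    out[ip + 12:ip + 16] = frame[ip + 16:ip + 20]
    out[ip + 16:ip + 20] = frame[ip + 12:ip + 16]

    l4 = ip + ihl
    if frame[ip + 9] == IPPROTO_UDP and len(frame) >= l4 + 4:
        # UDP: source port at offset 0, destination port at offset 2.
        out[l4:l4 + 2] = frame[l4 + 2:l4 + 4]
        out[l4 + 2:l4 + 4] = frame[l4:l4 + 2]
    return bytes(out)
```

Because the IPv4 and UDP checksums are one's-complement sums over 16-bit words, swapping the source and destination fields leaves them valid, so this model does not recompute them.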
+--------------------------------------------------------------------------+
|cDED                                                                      |
|  /---------------\                /---------------\                      |
|  |cGRE           |                |cGRE           |                      |
|  |     Test      |                |     Test      |                      |
|  |    Program    |                |    Program    |                      |
|  |               |                |               |                      |
|  +---------------+                +---------------+     user-space       |
|  |cYEL           |                |cCEF           |                      |
|  |    AF_INET    |                |    AF_XDP     |                      |
|  |               |                |               |                      |
|  +---------------+                +---------------+                      |
|          ^                                ^                              |
|          |                                |                              |
|          :                                :                              |
+--------------------------------------------------------------------------+
           :                                :
           |                                |
+--------------------------------------------------------------------------+
|cBAD      |                                |                              |
|  +---------------+                        :   2. xdp_redir               |
|  |cYEL           |                        |               kernel-space   |
|  |               |   3. xdp_pass     +---------+                         |
|  |     Linux     | <-----=------     |   XDP   | ---=----+               |
|  |  Networking   |              +----+---------+----+    | 1. xdp_xmit   |
|  |     stack     |              |cRED               |    |               |
|  |               |              |      Driver       | <--+               |
|  |               |              |                   |                    |
|  \---------------/              \-------------------/                    |
|                                                                          |
+--------------------------------------------------------------------------+
I don't understand what this diagram means. It seems that the test program (on AF_INET) and the test program (on AF_XDP) talk to each other. Is that ever possible if they speak different address families? I think we had better discuss AF_XDP and XDP in different contexts.
-Takahiro Akashi
2.3 Host Networking
───────────────────
This is the default case with no virtualisation involved.
         /---------------------\
         |cGRE                 |
         |    Network Test     |   Host User Space
         |       Program       |
         |                     |
         +---------------------+
     +------------------------------------------------+
     |   +---------------------+                      |
     |   |cYEL                 |                      |
     |   |   Packet Routing    |                      |
 SW  |   |                     |                      |
     |   +---------------------+  Host Kernel Space   |
     |   |cRED                 |                      |
     |   |       Driver        |                      |
     |   |                     |                      |
     |   \---------------------/                      |
     +------------------------------------------------+

----=-------------------------------------------------------
         +---------------------+
         |{mo}                 |
 HW      |       Network       |
         |        Card         |
         |c444                 |
         +---------------------+
2.4 KVM Guest with vhost networking
───────────────────────────────────
This is a KVM-only case where the vhost device allows packets to be delivered directly from the guest kernel's address space. It still relies on the host kernel's networking stack though.
                            |
                            |     /------------\
                            |     |cGRE        |
                            |     |            |
                            |     | Guest User |
                                  +------------+
                            | ----------------=------
                            |     +------------+
                            |     |cYEL        |
                            :     +------------+  Guest Kernel
                            |     |cRED        |
                            |     | virtio-net |
                            |     +------------+
     +------------------------------------------------+
     |  +---------------------+---------------------+ |
     |  |cYEL                 |cPNK  vhost-virtio   | |
     |  |   Packet Routing    +---------------------/ |
 SW  |  |                     |                       |
     |  +---------------------+  Host Kernel Space    |
     |  |cRED                 |                       |
     |  |       Driver        |                       |
     |  |                     |                       |
     |  \---------------------/                       |
     +------------------------------------------------+

----=-------------------------------------------------------
        +---------------------+
        |{mo}                 |
 HW     |       Network       |
        |        Card         |
        |c444                 |
        +---------------------+
2.5 Pass-through (SR-IOV or virtualised HW)
───────────────────────────────────────────
This uses either direct pass-through to a discrete Ethernet device or a virtualised function. The control of the packet starts and ends in the guest's kernel.
        Host System/Dom0              Guest VM/DomU
                              |              /---------\
                              |              |cGRE     |
     /---------------------\  |  /-----------+---------+
     |cBAD     Kernel      |  :  |cBAD       |cYEL     |
     \---------------------/  |  \-----------+---------+
     +-------------------------------------+ |cRED     |
     |cB9C           Hypervisor            | | Driver  |
     +-------------------------------------+ +---------/

----=-------------------------------------------------------
        +---------------+                 +---------------+
        |{mo}           |                 |{mo}           |
 HW     |      NIC      |                 |     vNIC      |
        |c444           |                 |c544           |
        +---------------+                 +---------------+
2.6 Open vSwitch routing (Xen)
──────────────────────────────
Here the packets are switched into paravirtualized Xen interfaces by the Dom0 kernel. I'm a little unsure as to what Open vSwitch uses to route stuff and if it's the same as the existing eBPF stuff.
                                                        |
                                                        |
                                             Dom0/Host  :  DomU Guest
                                                        |
                                                        |
                                                        |    /-----------\
                                                        |    |cGRE       |  Guest
                                                        |    |           |  User
    +------------------------------------------------+  |  +-+-----------+---------------+
    |  +-----------------------------------------+   |  |  |   +-------+                 |
SW  |  |cYEL          Open vSwitch               |   |  |  |   |cYEL   |                 |
    |  +---------------------+-----------+-------+   |  |  |   +-------+  Guest          |
    |  |cRED                 |           |cPNK   |   |  |  |   |cRED   |  Kernel         |
    |  |       Driver        |           | XenPV |<-=|-=|=-|=->| XenPV |                 |
    |  |                     |           |  BE   |   |  |  |   |  FE   |                 |
    |  \---------------------/           \-------+   |  |  |   +-------/                 |
    +------------------------------------------------+  |  +-----------------------------+
    +------------------------------------------------------------------------------------+
    |cB9C                                 Hypervisor                                     |
    +------------------------------------------------------------------------------------+

---------------------------------------------------------------------------------=------------
       +---------------------+
       |{mo}                 |
HW     |       Network       |
       |        Card         |
       |c444                 |
       +---------------------+
-- Alex Bennée