Triton networking and fabric operations guide
This document covers the architecture and debugging of Triton's network virtualization components. It is written for operators of the system. If you have not read the Triton networking and fabric user guide, please do so before continuing.
Triton supports network virtualization at various layers of the software stack. The core platform has supported aspects of network virtualization since its inception, and that support forms the basic building blocks for Triton's broader orchestration.
The heart of network virtualization lies in the compute nodes (servers) running the SmartOS platform. The platform supports several features that Triton combines:
- Virtual NICs
- Per-Instance TCP/IP stacks
- Antispoof Protection
- Overlay Devices
On top of this, we've built the user-facing notion of a fabric -- the idea that every customer can have their own private network where they never have to worry about another customer seeing their private traffic. See the Triton networking and fabric user guide for more information on the capabilities.
For an operator of a Triton environment, the most important thing that we've done is to build this from a software-first, hardware-accelerated perspective. This means that you do not require any special networking or server hardware to run these environments: standard commodity components that have been on the market for years are sufficient. However, if you do have hardware with network virtualization support, Triton can take advantage of it.
For each port on a physical server, there is one physical network interface created in the operating system. Every instance that is provisioned, on the other hand, has one or more virtual NICs (VNICs) associated with it. A VNIC is one of the basic building blocks of networking in containers running SmartOS. Each VNIC has its own hardware identifier, called a MAC address. To everything else in the system, a VNIC looks identical to a physical network interface: it can have many of the same properties set on it, it can be independently tagged with a VLAN id, and so on.
VNICs can be created in the operating system on top of other data links such as physical devices, devices that represent a link aggregation, an etherstub -- a virtual switch that only exists on a single machine -- or an overlay device. The operating system takes care of multiplexing these devices and utilizing the resources of the underlying device. For example, when VNICs are created over a physical device, the operating system will program the physical device's packet filters with the additional MAC addresses that it should receive. However, if the hardware runs out of physical filters, the operating system will program the device to ensure it still receives the remaining packets and will do packet classification for them in software.
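The filter fallback described above can be sketched as a toy model. This is only an illustration, not how the platform is implemented: the `PhysicalNic` class, the `filter_slots` limit, and the `receive_all` flag (a stand-in for whatever mode the driver uses to receive the remaining packets) are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalNic:
    """Toy model of a NIC with a fixed number of hardware MAC filter slots."""

    filter_slots: int                                # assumed hardware limit
    hw_filters: set = field(default_factory=set)     # MACs matched by the NIC
    sw_filters: set = field(default_factory=set)     # MACs matched by the OS
    receive_all: bool = False                        # hypothetical fallback mode

    def add_vnic(self, mac: str) -> str:
        """Register a VNIC's MAC and report where it will be classified."""
        if len(self.hw_filters) < self.filter_slots:
            self.hw_filters.add(mac)
            return "hardware"
        # Out of hardware filters: have the device hand up the remaining
        # packets and classify them in software instead.
        self.receive_all = True
        self.sw_filters.add(mac)
        return "software"

    def accepts(self, dst_mac: str) -> bool:
        """Return True if a frame for dst_mac would reach one of the VNICs."""
        return dst_mac in self.hw_filters or dst_mac in self.sw_filters


nic = PhysicalNic(filter_slots=2)
placements = [
    nic.add_vnic(mac)
    for mac in ("02:08:20:00:00:01", "02:08:20:00:00:02", "02:08:20:00:00:03")
]
```

With two filter slots, the first two VNICs are classified in hardware and the third falls back to software; all three still receive their traffic.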
For every network interface that an instance has, whether it is a normal container zone, an lx-branded zone, or a hardware virtual machine, there is a corresponding VNIC that belongs to the zone. The assignment and management of these VNICs and of the IP addresses used on top of them are owned by NAPI, with VMAPI as the starting point for manipulating them.
Every zone on the system has an exclusive IP stack -- a unique virtualized view of the networking world that only it can see and manipulate. This is what allows different instances to have their own networking information without having to worry about what other tenants on the compute node are doing.
The IP stack consists of the information required for networking to work, including:
- Interfaces and their state
- Lists of assigned IP addresses
- Routing Table
- TCP, IP, UDP, and SCTP state
- Network tunables
- IPsec state
- Firewall Rules
Every VNIC is tagged with various protections that we refer to as antispoof, as they protect against a given instance pretending to be someone else on the network. These checks are all done before a packet can go on the wire, so a spoofed packet will be noticed regardless of whether it comes from a KVM guest or from any other kind of instance.
The various antispoof protections are broken into the following different categories:
- mac-nospoof: This prevents an instance from sending a layer 2 frame with a source MAC address that doesn't match the data link.
- ip-nospoof: This prevents an instance from sending an IPv4 or IPv6 packet whose source address doesn't match one of its assigned addresses. It also applies to ARP and NDP packets by enforcing that the source addresses match the sender's.
- dhcp-nospoof: This prevents an instance from making a DHCP or DHCPv6 request using a DHCP client ID that isn't in an allowed list.
- restricted: This prevents an instance from sending traffic whose Ethertype doesn't correspond to IPv4, IPv6, or ARP traffic. This protects the physical infrastructure.
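The outbound checks above can be sketched as a small function. This is a simplified illustration under stated assumptions, not the platform's code: the `Vnic` and `Frame` types and the `outbound_violations` helper are hypothetical, the Ethertype check is labelled `restricted` after the illumos link protection of that name, and the DHCP client ID check is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional

# Standard Ethertype values for the protocols the text allows.
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV6 = 0x86DD
ALLOWED_ETHERTYPES = {ETHERTYPE_IPV4, ETHERTYPE_ARP, ETHERTYPE_IPV6}


@dataclass(frozen=True)
class Vnic:
    """Hypothetical view of a VNIC: its MAC and the IPs it may use."""
    mac: str
    allowed_ips: frozenset


@dataclass(frozen=True)
class Frame:
    """Hypothetical outbound frame; src_ip is None for non-IP traffic."""
    src_mac: str
    ethertype: int
    src_ip: Optional[str] = None


def outbound_violations(vnic: Vnic, frame: Frame) -> List[str]:
    """Return the names of the protections an outbound frame would trip."""
    violations = []
    if frame.src_mac.lower() != vnic.mac.lower():
        violations.append("mac-nospoof")
    if frame.src_ip is not None and frame.src_ip not in vnic.allowed_ips:
        violations.append("ip-nospoof")
    if frame.ethertype not in ALLOWED_ETHERTYPES:
        violations.append("restricted")
    return violations


vnic = Vnic(mac="90:b8:d0:00:00:01", allowed_ips=frozenset({"10.0.0.5"}))
good = Frame(src_mac=vnic.mac, ethertype=ETHERTYPE_IPV4, src_ip="10.0.0.5")
spoofed = Frame(src_mac="90:b8:d0:00:00:02", ethertype=ETHERTYPE_IPV4, src_ip="10.0.0.9")
lldp = Frame(src_mac=vnic.mac, ethertype=0x88CC)  # LLDP is neither IP nor ARP
```

A frame that passes every check yields an empty list and may go on the wire; any non-empty result means the frame would be dropped before leaving the compute node.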
VXLAN is a protocol for encapsulating network traffic. It wraps up a given network packet and allows it to be tagged with a 24-bit id that is, in many ways, analogous to a VLAN id. VXLAN works by taking a complete Ethernet frame and wrapping it up in a UDP packet with a VXLAN header.
The idea of encapsulating network traffic is not new. It's commonly used in some VPN technologies and, in fact, mirrors how a packet is generally constructed. That is, a TCP segment is encapsulated in an IP packet, which in turn is wrapped up in an Ethernet frame.
The following image shows how a normal TCP packet gets wrapped up with VXLAN and actually sent on the wire:

      Original                      Encapsulated
       Packet                          Packet

    +----------+                 +----------+--------+
    |          |================>|          |        |
    | TCP Data |================>| TCP Data |        |
    | Payload  |================>| Payload  |        |
    |          |================>|          |        |
    +----------+                 +----------+        |
    |   TCP    |================>|   TCP    |        |
    |  Header  |================>|  Header  |        |
    +----------+                 +----------+        |
    |    IP    |================>|    IP    |        |
    |  Header  |================>|  Header  |        |
    +----------+                 +----------+        |
    |   MAC    |================>|   MAC    |        |
    |  Header  |================>|  Header  |        |
    +----------+                 +==========+        |
                                 |  VXLAN   |        |
                                 |  Header  |        |
                                 +----------+        |
                                 |  UDP Data Payload |
                                 |                   |
                                 +-------------------+
                                 |        IP         |
                                 |      Header       |
                                 +-------------------+
                                 |        MAC        |
                                 |      Header       |
                                 +-------------------+

The VXLAN frame format is documented in RFC 7348, and it is also understood by newer network cards and switches, which influenced our decision to use VXLAN as part of the general architecture for network virtualization in Triton. We'll discuss more about how VXLAN gets used in the section Triton Virtual Networking Design and Architecture.
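To make the encapsulation concrete, here is a minimal sketch of building and parsing the 8-byte VXLAN header that sits in front of the inner Ethernet frame inside the UDP payload. It follows the published header layout (a flags byte with the "VNI valid" bit set, 24 reserved bits, the 24-bit id, and 8 more reserved bits). The function names are illustrative, and the outer UDP, IP, and MAC headers are deliberately left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # the "I" flag: the VXLAN id field is valid
VXLAN_HEADER_LEN = 8


def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Prefix an inner Ethernet frame with a VXLAN header for the given id."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VXLAN id must fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: the 24-bit id in the top three bytes, then 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decap(udp_payload: bytes):
    """Split a UDP payload into (VXLAN id, inner Ethernet frame)."""
    if len(udp_payload) < VXLAN_HEADER_LEN:
        raise ValueError("payload shorter than a VXLAN header")
    word1, word2 = struct.unpack("!II", udp_payload[:VXLAN_HEADER_LEN])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VXLAN id flag is not set")
    return word2 >> 8, udp_payload[VXLAN_HEADER_LEN:]


payload = vxlan_encap(0x123456, b"inner-ethernet-frame")
vni, frame = vxlan_decap(payload)
```

The result of `vxlan_encap` is what would be handed to UDP as its data payload; the receiving side uses the decoded id to decide which virtual network the inner frame belongs to.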
The idea of an overlay network is to have one physical network, but to run multiple independent logical networks multiplexed on top of it. That physical network is often referred to as an underlay network. None of these independent networks can see one another. A classic example of an overlay network is the use of VLANs on a physical network, where each VLAN identifier can be used to create a unique network.
These are implemented in the operating system by what we call an overlay device. An overlay device represents the combination of three different things:
- An encapsulation method
- A search and lookup method
- An identifier
The encapsulation method describes how a packet is transformed to be put on the wire. If you consider the VLAN example, the Ethernet header is modified and a VLAN header is inserted.
The search method is in charge of determining where packets should go. If someone were configuring a simple point-to-point tunnel, it would instruct packets to always go to the same host. On the other hand, if you want to have a virtual network, then the search plugin can query a centralized database or look up where the node is in some other way.
Finally, each overlay device needs to have its own identifier. This is what allows us to distinguish between different users on the same underlay network. For example, in Triton, each customer in a DC has a different identifier.
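The three parts of an overlay device can be sketched as a small data structure. This is an illustrative model only: `OverlayDevice`, `point_to_point`, and `table_lookup` are hypothetical names, and the dictionary stands in for whatever centralized database a real search plugin would query.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A search method maps an inner destination MAC address to the underlay
# address of the host that should receive the packet, or None if unknown.
SearchMethod = Callable[[str], Optional[str]]


def point_to_point(remote: str) -> SearchMethod:
    """Search method for a simple tunnel: always send to the same host."""
    return lambda _dst_mac: remote


def table_lookup(table: Dict[str, str]) -> SearchMethod:
    """Search method backed by a lookup table (stand-in for a database)."""
    return lambda dst_mac: table.get(dst_mac)


@dataclass(frozen=True)
class OverlayDevice:
    encapsulation: str      # how packets are transformed, e.g. "vxlan"
    search: SearchMethod    # where packets should go
    identifier: int         # which logical network this device belongs to

    def destination_for(self, dst_mac: str) -> Optional[str]:
        """Ask the search method where a frame for dst_mac should be sent."""
        return self.search(dst_mac)


tunnel = OverlayDevice("vxlan", point_to_point("192.0.2.10"), identifier=100)
fabric = OverlayDevice(
    "vxlan",
    table_lookup({"90:b8:d0:00:00:01": "192.0.2.21"}),
    identifier=200,
)
```

Keeping the search method as a pluggable function mirrors the text: the same encapsulation can serve either a fixed tunnel or a virtual network simply by swapping how destinations are found.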
While a network on a fabric allows for a private network, users may still want to be able to access the broader Internet to download software updates, use external DNS names, and perform other administrative tasks.
What most users want is for their network to have a gateway to the broader Internet that does not allow incoming traffic. In terms of IPv4, this is traditionally implemented with a NAT. Triton gives each network on a fabric the option of having an Internet gateway. When this option is selected, a simple NAT will be provisioned automatically for the network when there are machines on it that require one.
At this time, these NAT instances are not highly available, but they should still be adequate for most needs.
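The outbound-only behavior of such a gateway can be illustrated with a toy source NAT. This sketch only shows the general technique; it says nothing about how Triton's NAT instances are actually implemented. The `SimpleNat` class is hypothetical, and a real NAT would also track the protocol, the remote endpoint, and mapping timeouts.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]   # (IP address, port)


class SimpleNat:
    """Toy source NAT: private endpoints share one public IP address."""

    def __init__(self, public_ip: str, first_port: int = 1024):
        self.public_ip = public_ip
        self._next_port = first_port
        self._out: Dict[Endpoint, int] = {}   # private endpoint -> public port
        self._in: Dict[int, Endpoint] = {}    # public port -> private endpoint

    def outbound(self, private_src: Endpoint) -> Endpoint:
        """Translate an outgoing source, creating a mapping on first use."""
        port = self._out.get(private_src)
        if port is None:
            port = self._next_port
            self._next_port += 1
            self._out[private_src] = port
            self._in[port] = private_src
        return (self.public_ip, port)

    def inbound(self, public_port: int) -> Optional[Endpoint]:
        """Translate a reply; None means no mapping exists, so it is dropped."""
        return self._in.get(public_port)


nat = SimpleNat("203.0.113.10")
first = nat.outbound(("10.0.0.5", 40000))
again = nat.outbound(("10.0.0.5", 40000))
```

Because mappings are created only by outgoing traffic, a packet arriving on a port that no instance has used finds no entry and is dropped, which is what keeps unsolicited incoming traffic out.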