- Why the name Serval?
- So what benefits does Serval offer clients like smartphones?
- So what benefits does Serval offer operators deploying Internet services?
- Does Serval require changing the entire network?
- Does Serval involve rewriting entire applications?
- Can I deploy Serval without recompiling my kernel and disrupting existing applications?
- I don’t want to modify applications, can I still use Serval?
- So where does the service access layer sit in the stack?
- Is Serval fast?
- Is your implementation open source? If so, what license is it released under?
- So why not use DNS for server selection?
- So why not use IP anycast?
- So why not use a cluster load balancer?
- Serval sounds a lot like [fill in your favorite network architecture or layer 3.5]. How is Serval different and why should I care?
Serval comes from Service Access Layer, but refers to our larger architecture built around this new layer, which we abbreviate SAL. The serval is also an African wild cat whose speed, agility, and elegance match the goals of our architecture well.
- Intelligent interface selection for multi-homed endpoints (for performance, battery life, cost, policy, …)
- Transparent and automated connection migration without disruption (e.g., between Wi-Fi and 3G), without the need for home agents, tunneling, or triangle routing
- Service registration and resolution in both ad-hoc and infrastructure modes
- … and more
- Service-aware stack that provides automated (un)registration for service instances
- Intelligent, flexible load-balancing for replicated services (without explicit layer 2-7 load balancers), both within layer-2 domains and across the wide-area
- Direct connections between end-hosts after connection resolution and establishment (no need for traffic to pass through load balancers after initial SYN resolution)
- Ability to migrate flows, without service disruption, between local interfaces or across physical hardware (allowing for VM migration across layer-3 domains)
- … and more
No, Serval is a “layer 3.5” solution, and sits on top of existing IP networks.
No, the Serval stack exposes the standard BSD sockets API to applications; only the address family and sockaddr type change. So instead of opening a PF_INET socket and connecting to an IP address and port (set in a sockaddr_in), you open a PF_SERVAL socket and pass a serviceID in the sockaddr. The following C-like pseudocode compares IPv4 and Serval sockets:
// IPv4, unconnected datagram:
s = socket(PF_INET)
sendto(s, IP:port, data)

// Serval, unconnected datagram:
s = socket(PF_SERVAL)
sendto(s, srvID, data)
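For connection-oriented use the pattern is analogous. A hedged C-like sketch follows; the exact struct and field names (here sockaddr_sv and sv_srvid) are illustrative and may differ slightly from the stack's actual headers:

// Connected stream (sketch; struct/field names are illustrative):
s = socket(PF_SERVAL, SOCK_STREAM)
addr.sv_family = AF_SERVAL
addr.sv_srvid = srvID          // a service identifier, not an IP:port
connect(s, &addr, sizeof(addr))
send(s, data)

The key difference is simply that the application names the service it wants, and the stack resolves which instance and address to bind the connection to.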
Serval re-implements all layers above the network layer (IP) and can therefore co-exist with the existing stack and BSD sockets API without affecting legacy applications. This also means that the Serval kernel module can be dynamically inserted and removed on a running operating system. Existing applications can continue using AF_INET sockets without disruption.
Serval works well with a high-speed translator that allows legacy TCP/IP clients to access a Serval-running network. The translator uses Linux’s splice mechanism, which moves data between sockets without copying it through user space. Thus, a data center or larger network can run Serval, capitalizing on its load-balancing and migration features, while still allowing legacy clients efficient access to its services.
The service-access layer (SAL) is a new “layer 3.5”: the network layer (IP) is unmodified, and data delivery is still handled by transport. Connection establishment and management, typically part of TCP and other transport protocols, are handled by the service-access layer instead (as is migration), leaving transport to worry only about actual data delivery. This is akin to TCP operating only in its ESTABLISHED state.
Our service-access layer does not add any appreciable overhead to existing connections. Benchmarks show that TCP throughput on a GigE connection reaches 99.925% of the Linux kernel stack’s. In fact, our “TCP-to-Serval” translator, which allows unmodified clients or client applications to communicate with Serval hosts, also performs equivalently to the unmodified stack. The translator’s high performance is due to in-kernel zero-copying between sockets using Linux’s splice system call.
[Table: TCP throughput per stack — mean and standard deviation in Mbit/s]
Our Serval implementation is indeed open source. The end-host stack, which has been tested on both Linux and Android, is licensed under the GPLv2. This license is necessary given that it runs as a kernel module and its transport-layer implementation (and in particular, its TCP-like reliable stream protocol) significantly reuses and refactors code from the Linux kernel.
DNS early-binds a domain name to an IP address, which both client resolvers and client applications cache. This requires the DNS service to have more global knowledge of server load, and caching also prevents fine-grained load balancing and fast failover. Thus, most replicated services don’t rely on DNS alone; it is typically combined with load balancers, complicating deployments and still leading to failures when (typically stateful) load balancers crash or become overloaded. Serval, on the other hand, uses late binding, which lets service operators make decisions close to where the final server selection must take place. Such decisions can incorporate more recent and fine-grained information, and can react more quickly to system dynamics.
IP anycast does not provide connection affinity: it operates only at the packet level, not at the flow or connection level. Hence, an upstream routing change can cause packets of a flow to be routed to a different anycasted instance — breaking the connection. Thus, IP anycast is typically used only for stateless traffic (e.g., DNS UDP requests), while Serval’s service-layer anycast provides connection affinity in order to support stateful connections and applications. The “anycast” choice is made only on the first packet of a new flow; subsequent packets of that flow are bound to the two end-host attachment points (unless and until there is a flow migration from either side).
Hardware load balancers (which commonly NAT between VIP/DIP addresses) face scalability limitations: they typically cache per-connection client state to ensure that all packets belonging to the same transport connection map to the same internal destination as the pool of servers changes (otherwise, TCP would break). They are also a common source of failures in datacenters. Finally, these load balancers are typically used only within a single local-area network, and thus require some other means for wide-area selection (e.g., DNS). Serval’s service routers replace the need for separate domain resolution and VIP/DIP load balancers with an architecture that explicitly supports service registration and resolution.
Many of the ideas and problems addressed by Serval have been kicking around the networking research community for years. However, few prior works have fully understood the problems and issues at hand, and fewer still have realized and studied their solutions in working systems. We believe service access is too important a problem to declare victory on prematurely, when in fact little progress has been made in real systems. Figuring out how Serval’s underlying ideas (late binding, service-level anycast, migration, etc.) fit together in a coherent networking stack is non-trivial. We believe that Serval provides a unique take on these ideas and applies new solutions to problems in new areas, such as data centers.