A couple of weeks ago Google presented their paper Snap: a Microkernel Approach to Host Networking at the ACM SIGOPS 27th Symposium on Operating Systems Principles (SOSP '19). It's definitely worth reading, especially if you develop your own fast network I/O subsystem. This review is limited to the performance aspects of the approach and doesn't cover the upgrade questions raised in the paper.
It was certainly interesting for us to review the paper in the context of our Tempesta FW, so below are not only the key points about Snap, but also comparisons with the Tempesta FW approach.
- First of all, Snap implements its own proprietary protocol stack, not TCP/IP and not QUIC, so it's suitable for the local data center only. The paper doesn't describe the protocol details, but since the protocol algorithms do impact performance, the performance comparison of the stack with the Linux TCP/IP looks like comparing apples with oranges: different protocols for different applications.
- While Snap is about user-space networking, Google made plenty of Linux kernel modifications: the custom MicroQuanta scheduling class, custom NIC driver(s), CPU and memory accounting, and DMA operations to pass network data in a zero-copy fashion.
- A definitely nice feature of Snap is that it can put a network thread to sleep, so networking threads don't burn CPU in idle polling as other kernel-bypass approaches do.
- Application and network I/O (Pony Express) threads typically run on different CPUs, so data caching is worse for the network data, since the same packet data must be transferred between CPUs (if the application and Pony Express run on hyperthreads of the same core, there is no such issue), but the instruction cache works better since less code runs on each CPU.
- The paper discusses a specific operation mode, "one-sided operations": "Avoiding the invocation of the application thread scheduler (i.e., Linux CFS) to dispatch, notify, and schedule a thread substantially improves CPU efficiency and tail latency". This is actually what Tempesta FW does all the time: it performs full network data processing immediately as a packet arrives at a NIC, and only one context is used, with no dispatching, notification, scheduling, copying, and so on.
- OS preemption of the Pony Express threads is described quite vaguely in the paper. Since the system scheduler is modified, we can imagine that the threads are never preempted (we can dedicate a CPU to a particular task in baseline Linux, after all). But if the threads actually can be preempted, then there is no opportunity to use cheap RCU synchronization, and spin locks become unsafe.
- Meltdown-like attacks are addressed by the paper, but since an application and Snap work in different process contexts, there are still context switches with full TLB flushes.
- The future work section says that there is some rudimentary support for zero-copy data transfer from an application to the NIC through Pony Express threads and data structures, but there is still no general solution working with unlimited memory.
- As with any user-space networking stack, the unavailability of handy and common Linux tools like the netfilter firewall, eBPF, tc, and many others remains a concern.