Formal specification and testing of QUIC

Kenneth L. McMillan, Lenore D. Zuck
Read: 17 April 2021

Proceedings of the ACM Special Interest Group on Data Communication
SIGCOMM '19
Beijing, China
Association for Computing Machinery
New York, NY, USA
Pages 227-240
2019
Note(s): test-generation, Ivy verifier, symbolic execution, weakest precondition, QUIC protocol
Papers: rath:epiq:2018

QUIC is a Google-designed protocol providing a secure, authenticated replacement for TLS/TCP (i.e., to replace https). The goal of this paper is to improve on interoperability testing as a way of testing the QUIC specification and implementations. They create a “substantial” specification of the QUIC standard in the Ivy verifier language, from which they generate tests for implementations using a compositional approach (assume/guarantee reasoning) and constrained random generation. They found bugs in implementations and a potential DoS in the protocol itself. Some parts of the informally specified standard were hard to formalize effectively. They focus on safety but discuss liveness.
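
A rough sketch (mine, not the paper's Ivy specification) of the assume/guarantee idea: the tester plays the client role, so the packets it sends must satisfy the client-side spec (the assumption), while every packet from the implementation under test is checked against the server-side spec (the guarantee). The `Packet` type, `server.handle`, and both spec predicates below are placeholders.

```python
from dataclasses import dataclass
import random

@dataclass
class Packet:
    pkt_num: int
    frames: tuple = ()

def assume_client_ok(trace, pkt):
    """Assumption: constraints the tester's own (client-role) packets must
    satisfy, e.g. packet numbers strictly increase. Stands in for the spec."""
    prior = [p.pkt_num for role, p in trace if role == 'client']
    return not prior or pkt.pkt_num > max(prior)

def guarantee_server_ok(trace, pkt):
    """Guarantee: constraints checked on every packet emitted by the
    implementation under test. Stands in for the spec."""
    prior = [p.pkt_num for role, p in trace if role == 'server']
    return not prior or pkt.pkt_num > max(prior)

def run_test(server, n_events=100):
    """Drive the server with random but assumption-conforming client packets,
    checking each server response against its guarantee."""
    trace, next_num = [], 0
    for _ in range(n_events):
        pkt = Packet(pkt_num=next_num + random.randint(1, 3))  # random choice ...
        assert assume_client_ok(trace, pkt)                     # ... within the assumption
        next_num = pkt.pkt_num
        trace.append(('client', pkt))
        for reply in server.handle(pkt):                        # implementation under test
            assert guarantee_server_ok(trace, reply), "guarantee violated"
            trace.append(('server', reply))
```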

Tests are generated by producing random sequences of protocol events while accumulating constraints on event parameters, in a way that can be seen as symbolic execution or weakest-precondition computation.
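
A toy illustration of that generation loop (my own, not the Ivy generator): each randomly chosen event contributes a constraint on the still-symbolic parameters, the accumulated constraint set plays the role of a weakest precondition, and concrete values are then drawn from whatever still satisfies it.

```python
import random

def generate_events(n_events=10, domain=range(100)):
    """Pick an event kind at random, accumulate constraints on its symbolic
    parameters, then choose concrete values satisfying every constraint so far."""
    constraints = []          # predicates over the next packet number
    events, last_pkt = [], -1
    for _ in range(n_events):
        kind = random.choice(['initial', 'handshake', 'one_rtt'])
        # each event contributes a constraint, analogous to pushing a weakest
        # precondition back through the specification
        constraints.append(lambda n, lo=last_pkt: n > lo)
        candidates = [n for n in domain if all(c(n) for c in constraints)]
        if not candidates:    # accumulated constraints became unsatisfiable
            break
        pkt_num = random.choice(candidates)
        events.append((kind, pkt_num))
        last_pkt = pkt_num
    return events

print(generate_events())
```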

Their specification is mostly extensional (it talks about externally visible behaviour), whereas the QUIC standard is intensional (it talks about internal implementation details). Intensional specs can be a bit more tractable because they are less non-deterministic. Their spec is partly intensional in that it talks about events occurring between layers of the network stack. They found that some of the intensional parts of the standard were not effectively testable, although this does not always mean that the requirement is superfluous.
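
My own toy contrast of the two spec styles (not examples taken from the paper): an extensional requirement constrains only what is observable on the wire, while an intensional one refers to internal implementation state that a black-box tester cannot see.

```python
def extensional_ok(sent_pkt_nums, acked_pkt_nums):
    """Extensional: every acknowledged packet number was actually sent.
    Checkable from the observed wire trace alone."""
    return set(acked_pkt_nums) <= set(sent_pkt_nums)

def intensional_ok(impl_state):
    """Intensional: refers to internal state (here a hypothetical congestion
    window variable) that a black-box tester cannot observe directly."""
    return impl_state.get('cwnd', 0) > 0
```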

Test events are generated at around 10 Hz. They found bugs at a rate of about one per day. Triaging errors, refining the specs, and working with implementors to diagnose and fix errors was the time-consuming part.

An interesting limitation of this general approach is that tests are generated in advance (i.e., offline). This prevented them from exploring adaptive behaviour and, in particular, recovery and congestion control. (To work online, they suggest running tests in virtual environments.) They also don’t (can’t) explore liveness. And, although they could potentially use their tests as fuzzing seeds to test error detection, handling and robustness, they did not do so.


  • Interoperability-guided testing of QUIC implementations using symbolic execution [rath:epiq:2018]