Like galea:arxiv:2018 (which is also worth a read), this paper looks at how to find verification bottlenecks in symbolic evaluation. The authors propose a symbolic profiling approach that works for any forward symbolic evaluator (symbolic execution, BMC, or hybrids) and, using a simple but effective metric to rank hotspots, highlights the root cause of performance problems.
Symbolic profiling (SymPro)
- Two resources: symbolic heap and symbolic evaluation graph
- Heap = constants, terms, etc. (with hash-consing used to avoid duplicates)
- Graph = DAG of explored paths and merge points, determined by the evaluation strategy
- Track their evolution: where values are created, accessed, and sent to the solver; how paths are merged at control-flow joins.
- Rank procedure calls by metrics about evolution
- Model is general (to any forward symbolic evaluator: symbolic execution, BMC and hybrids), explainable (independent of internal details) and actionable (finds root causes)
- Key insight: “effective symbolic evaluation involves maximizing (opportunities for) concrete evaluation while minimizing path explosion”.
- Challenge: naïve time-profiling highlights the wrong part of the code: the consumption of symbolic terms, not the construction. (This reminds me of profiling lazy evaluation in Haskell.)
List of antipatterns in solver-aided code
- Algorithmic
- Representation
- Irregular data structures that cause divergence in control flow. (IIRC, Rosette groups data structures by shape (they call it “type”) and merges flows that have the same shape.)
- Concreteness
- Failing to keep values concrete leads to large evaluation graphs with many infeasible paths.
- Introducing case splits can concretize symbolic values. Eg “if x == 0 { … 0 … } else if x == 1 { … 1 … } else { … x … }” makes x concrete in the first two branches (and maybe the third branch is an error case?). (Rosette’s for/all form does this; see section 5.1.)
Profiling
- Considered input-sensitive profiling [Coppa et al., 2012] but correlation between input size and performance is often poor for symbolic evaluation: inaccurate and noisy.
- Considered path-based profiling kuznetsov:pldi:2012: rank functions by the number of infeasible paths explored and the size of path conditions. But infeasible paths are not always the problem, measuring feasibility is expensive (it uses the solver), and the approach does not work for BMC.
- These lead them to heap + graph model (above).
- Symbolic profiler interface (I have transliterated the interface into pseudo-Rust):
  - mkTerm(Vec<term>) -> term
  - step(state, Vec<term>) -> Vec<state>
  - merge(srcloc, Vec<state>) -> state
  - solve(srcloc, expr) -> bool
Visualization
- Flamegraph: time vs. call stack colour-coded by a score
- Score computed from five statistics
- wall-clock time; #terms created; #unused terms; union size (sum of out-degrees); merge cases (sum of in-degrees). (See figure 3.)
- Calculation of overall score: normalize the five statistics to range 0.0 .. 1.0; sum results
- Experiment with machine-learning ranking scheme had high recall but poor precision (many false positives) - so manual scheme is default.
Extensive empirical evaluation
- Ferrite crash-safe file system
- Cosette SQL query system
- Neutrons (radiotherapy system)
- Also Bonsai (checks soundness of type systems), Quivela (verifies security of crypto protocols), Fluidics (synthesizes microfluidics protocols), RTR (Refinement typechecker for Ruby).
- Evaluates which of the metrics are most reliable in ranking bottlenecks.
- User study: time taken to identify performance issue: 4 benchmarks, with/without SymPro. (A bit small for statistical significance.)
- Implemented on both Rosette and Jalangi. Tried on red-black tree, calculator parser and BDD.
- Highlights idea of using SymPro to develop efficient formal models of libraries and frameworks. (Because optimizing for reasoning and optimizing for concrete execution can require conflicting changes.)
Notes related to Finding code that explodes under symbolic evaluation
Bounded model-checking, Case splitting, Symbolic evaluation, Symbolic execution, Verification performance of code, Verification profiling of code
Papers related to Finding code that explodes under symbolic evaluation
- Evaluating manual intervention to address the challenges of bug finding with KLEE [galea:arxiv:2018]