Indexed streams: A formal intermediate representation for fused contraction programs

Scott Kovach, Praneeth Kolichala, Tiancheng Gu, Fredrik Kjolstad
[doi] [Google Scholar] [DBLP] [Citeseer] [url]

Proc. ACM Program. Lang. 7(PLDI)
Association for Computing Machinery
New York, NY, USA
jun 2023
Note(s): tensor, sparse model, loop fusion, Lean theorem prover

Takes operators from “positive algebra” (that includes relational algebra), shows that they work for tensor algebra, uses to perform operator fusion on access patterns that are hierarchical (ie multidimensional) and may skip value (ie sparse).

Compilation is in two steps: (1) a “contraction language” expression is compiled to the indexed stream intermediate representation; and (2) the indexed stream representation is compiled to imperative C code.

The result is concise (the transformations are implemented in only 540 lines of Lean code); matches (for performance) TACO (sparse operations), SQLite (relational operators) and DuckDB (relational operators); and is proved correct in the Lean theorem prover.

The contraction language consists of variables, addition, multiplication, contraction, expansion and renaming. Contraction aggregates (sums) values across a dimension/attribute. Expansion repeats a value across a dimension/attribute.

Indexed streams have four operators:

ready - is there a value at the current index
skip - advance index according to an index value
index - current index value. This is a lower bound on the next ready index state.
value - index at current index

Streams are monotone: “index(q) <= index(skip(q, (i,r)))”