A whirlwind introduction to dataflow graphs (2018) (fgiesen.wordpress.com)

🤖 AI Summary
This post presents a compact, practical introduction to using dataflow graphs as a predictive performance model for low-level code (C++ lowered to a simple IR). Rather than relying on descriptive profiling or ad-hoc microbenchmarks, the author builds a simplified machine model: unlimited 64-bit registers, explicit load/store memory, one-instruction-per-line pseudo-assembly, and control flow flattened into basic blocks. Timing assumptions are explicit: loads cost 4 cycles, most other ops 1 cycle, branches have variable cost, and a tunable issue width W sets how many new instructions can start per cycle. The model follows an "as-if" rule (results must match sequential semantics) but otherwise permits aggressive reordering and speculative execution, like an out-of-order CPU or a wide dataflow engine.

Why this matters to AI/ML engineers and compiler/tooling folks: dataflow graphs make the shape of dependencies explicit (nodes are operations, edges carry latencies), so you can reason quantitatively about throughput, pipeline stalls, and how instruction-level parallelism or memory latency limits performance. The approach approximates GPUs, superscalar CPUs, and software-pipelined schedules, and it can guide design and optimization decisions (kernel reorganization, tiling, auto-tuning, scheduling) before time is sunk into full implementations or misleading microbenchmarks. The post illustrates the method with a simple array-sum loop, showing how ranks and latencies reveal the true bottlenecks.
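The scheduling model the summary describes can be sketched in a few lines: each instruction starts once its operands are ready and an issue slot is free, with at most W new instructions issued per cycle. The following is a minimal illustrative sketch (not the author's code); the instruction list, latencies (loads 4 cycles, adds 1 cycle, per the post's assumptions), and the greedy slot-finding loop are all simplifications for this example.

```python
def schedule(instrs, W):
    """instrs: list of (name, latency, deps), deps indexing earlier instrs.
    Returns the cycle at which each instruction's result is ready."""
    finish = []            # completion cycle of each instruction
    issued_in_cycle = {}   # issue-width bookkeeping: cycle -> count issued
    for name, latency, deps in instrs:
        # earliest start: all operands must be ready
        start = max((finish[d] for d in deps), default=0)
        # push the start forward until a cycle has a free issue slot
        while issued_in_cycle.get(start, 0) >= W:
            start += 1
        issued_in_cycle[start] = issued_in_cycle.get(start, 0) + 1
        finish.append(start + latency)
    return finish

# Array-sum loop body, unrolled 4 iterations: each iteration loads a[i]
# (4 cycles) and adds it into the running sum (1 cycle). The loads are
# independent, but the adds form a serial dependency chain.
LOAD, ADD = 4, 1
instrs = []
for i in range(4):
    instrs.append((f"load a[{i}]", LOAD, []))        # independent loads
    deps = [2 * i] + ([2 * i - 1] if i > 0 else [])  # this load + prior add
    instrs.append((f"sum += a[{i}]", ADD, deps))     # chained adds

finish = schedule(instrs, W=4)
print(finish[-1])  # → 8: one load latency (4) plus four chained adds
```

Running it shows what the post's ranks-and-latencies analysis predicts: with a wide enough issue width, the loads all start immediately and overlap, so total time is governed by the first load's latency plus the serial add chain, not by the number of loads.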