Jax and OpenXLA Part 1: Run Process and Underlying Logic (www.intel.com)

🤖 AI Summary
This article maps the end-to-end pipeline that turns a Python JAX program into GPU-executable code using OpenXLA and Intel's OpenXLA extension. Using a concrete JAX example run on Intel GPUs, it traces each stage: JAX parses Python into StableHLO (MLIR), OpenXLA converts StableHLO into HLO, and HLO is optimized, scheduled, and lowered to LLVM IR and finally to device code (SPIR-V).

The piece highlights practical debug hooks and dumps: export JAX_DUMP_IR_TO to capture the StableHLO MLIR, and set XLA_FLAGS (e.g., --xla_dump_hlo_as_text, --xla_dump_to, and pass regexes) to emit initial/final HLO, per-pass logs, and a hardware pbtxt describing GPU limits.

Intel's PJRT plug-in integrates at the PJRT API layer and consumes the public StableHLO module without extra transformations, enabling relatively fast integration for JAX and initial XLA support for TensorFlow/PyTorch on Intel GPUs.

Technically, the article walks through example MLIR/HLO artifacts (jax_ir*.mlir → module_0000.*.txt), explaining JIT-compiled modules, mhlo attributes (mhlo.num_partitions, mhlo.num_replicas), tensor types (tensor&lt;f32&gt;), and the location markers used for debugging. It contrasts StableHLO (high-level, framework-facing) with HLO (optimization/scheduling, SPMD/shard propagation, is_scheduled=true) and shows how operations like stablehlo.multiply become wrapped computations in the final HLO. The guide is practical for developers debugging JAX/OpenXLA transforms or contributing compiler backends for Intel GPUs, offering a reproducible trace of how high-level Python math becomes optimized GPU kernels.
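The dump hooks described above can be sketched in a few lines of JAX. This is a minimal illustration, not the article's own code: the function name `mul`, the scalar inputs, and the dump directory `/tmp/jax_dump` are arbitrary choices; `JAX_DUMP_IR_TO`, `--xla_dump_to`, and `--xla_dump_hlo_as_text` are the real knobs the article discusses.

```python
import os

# Dump settings must be in place before JAX compiles anything.
os.environ["JAX_DUMP_IR_TO"] = "/tmp/jax_dump"   # writes jax_ir*.mlir (StableHLO)
os.environ["XLA_FLAGS"] = (
    "--xla_dump_to=/tmp/jax_dump "               # writes module_0000.*.txt (HLO)
    "--xla_dump_hlo_as_text"
)

import jax
import jax.numpy as jnp

@jax.jit
def mul(x, y):
    return x * y  # lowers to a stablehlo.multiply on tensor<f32> operands

# The StableHLO module can also be inspected directly, without the dump files.
stablehlo_text = mul.lower(jnp.float32(2.0), jnp.float32(3.0)).as_text()
print(stablehlo_text)
```

Printing the lowered module shows the framework-facing StableHLO form (including the `stablehlo.multiply` op and `tensor<f32>` types) before OpenXLA converts it to HLO and optimizes it.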