Pickle isn't slow, it's a protocol (2018) (matthewrocklin.com)

🤖 AI Summary
Dask engineers traced a performance problem to PyTorch model serialization: sending models between workers was painfully slow (about 1 MB/s for GPU-backed tensors and ~50 MB/s for CPU tensors) because PyTorch's `__reduce__` implementation converted tensors to Python lists via `.tolist()`. Rather than special-casing PyTorch inside Dask, the team opened an upstream issue, and a small change was made in PyTorch to serialize tensors by writing them with `torch.save` into a bytes buffer (`io.BytesIO`) and returning a loader that calls `torch.load`. That five-line fix raised serialization throughput to roughly 1 GB/s, removing the need for consumers to implement bespoke workarounds.

The broader point is architectural: pickle is a protocol, not an inherently slow library; poor implementations of the protocol are what introduce bottlenecks. By implementing `__reduce__` efficiently (returning a loader and a compact byte payload), libraries can interoperate with high performance across the Python ecosystem, sparing other projects from writing special-case code.

Specialized serializers still make sense for advanced scenarios (zero-copy shared memory, RDMA/NVLink, preserving tensor-view relationships in multiprocessing), but default protocol implementations should be fast and correct so that ecosystem-level integrations (Dask, Ray, PySpark, IPython parallel, etc.) work well out of the box.
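The `__reduce__` pattern the summary describes can be sketched with the standard library alone. This is a minimal illustration of the protocol shape, not PyTorch's actual code: the `Tensor` class and `_load_tensor` helper are hypothetical stand-ins, and the comments note where `torch.save`/`torch.load` would appear in the real fix.

```python
import io
import pickle


def _load_tensor(payload: bytes) -> "Tensor":
    # Loader invoked by the unpickler. In the real PyTorch fix this
    # would be roughly torch.load(io.BytesIO(payload)).
    return Tensor(io.BytesIO(payload).read())


class Tensor:
    """Hypothetical stand-in for a large array-like object."""

    def __init__(self, data: bytes):
        self.data = data

    def __reduce__(self):
        # Efficient path: return (callable, args) where args is a
        # compact bytes payload, instead of converting the data to a
        # Python list. The real fix writes with torch.save(self, buf).
        buf = io.BytesIO()
        buf.write(self.data)
        return (_load_tensor, (buf.getvalue(),))


t = Tensor(b"\x00" * 1024)
roundtrip = pickle.loads(pickle.dumps(t))
assert roundtrip.data == t.data
```

Because `pickle.dumps` simply calls `__reduce__` and embeds the returned bytes, the payload stays a single contiguous buffer end to end, which is why a fix of this shape recovers near-memory-bandwidth throughput.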