I created a lib for turning PyTorch training scripts into event driven systems (github.com)

0 points | 1 day ago

🤖 AI Summary

TorchSystem is a new open-source Python framework (pip install torchsystem) for building event-driven, domain-driven PyTorch systems that decouple model state from orchestration and infrastructure. It formalizes a "model aggregate" (a neural net together with its loss, optimizer, metrics, etc.) and pairs it with stateless service handlers that perform domain tasks such as training or evaluation. Services emit typed events (e.g., Trained, Evaluated) via a Producer, and Consumers react to those events to handle side effects such as logging, checkpointing, or metrics, which keeps the core logic testable and infrastructure-agnostic.

Technically, TorchSystem provides a small DSL: Aggregate base classes, a registry (getname/gethash, register) to identify models, dependency injection primitives (Provider, Depends) for wiring resources, Service/Consumer decorators for handlers, Producer.dispatch for events, and a Compiler pattern to build/compile aggregates (device movement, weight restore, multiprocessing concerns). It is pure Python, requires no infrastructure, and supports union and generic types for flexible event routing.

The framework is aimed at ML engineers building complex training pipelines or production model lifecycles: it enforces separation of concerns, improves reproducibility and testability, and makes it easier to plug in logging, tracking, or persistence systems without changing the core training logic.
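The producer/consumer flow described in the summary can be illustrated with a minimal, library-free sketch. This is not TorchSystem's API: the `Producer`, `Consumer`, `handler`, and `attach` names below are hypothetical stand-ins written in plain Python, and a dict takes the place of a real model aggregate, so the example runs without PyTorch installed. It only shows the general idea of a stateless service emitting typed events that side-effect handlers react to.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Trained:
    """Event emitted after a training step (hypothetical fields)."""
    epoch: int
    loss: float


@dataclass(frozen=True)
class Evaluated:
    """Event emitted after an evaluation step (hypothetical fields)."""
    epoch: int
    accuracy: float


class Consumer:
    """Routes each event to the handlers registered for its exact type.

    Hypothetical stand-in, not the torchsystem Consumer class.
    """

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable[[Any], None]]] = defaultdict(list)

    def handler(self, event_type: type) -> Callable:
        # Decorator that registers a function for one event type.
        def decorator(fn: Callable[[Any], None]) -> Callable[[Any], None]:
            self._handlers[event_type].append(fn)
            return fn
        return decorator

    def consume(self, event: Any) -> None:
        # Events with no registered handler are silently ignored.
        for fn in self._handlers.get(type(event), []):
            fn(event)


class Producer:
    """Forwards dispatched events to every attached consumer.

    Hypothetical stand-in, not the torchsystem Producer class.
    """

    def __init__(self) -> None:
        self._consumers: list[Consumer] = []

    def attach(self, consumer: Consumer) -> None:
        self._consumers.append(consumer)

    def dispatch(self, event: Any) -> None:
        for consumer in self._consumers:
            consumer.consume(event)


def train_one_epoch(state: dict, producer: Producer) -> None:
    """Stateless service: updates the aggregate-like state, then emits an event."""
    state["epoch"] += 1
    state["loss"] *= 0.5  # stand-in for a real optimization step
    producer.dispatch(Trained(epoch=state["epoch"], loss=state["loss"]))


log: list[str] = []
consumer = Consumer()


@consumer.handler(Trained)
def log_trained(event: Trained) -> None:
    # Side effect lives here, outside the training logic.
    log.append(f"epoch {event.epoch}: loss={event.loss:.3f}")


producer = Producer()
producer.attach(consumer)

state = {"epoch": 0, "loss": 1.0}
train_one_epoch(state, producer)
train_one_epoch(state, producer)
```

Because `train_one_epoch` only knows about the producer, the logging handler can be swapped for checkpointing or experiment tracking without touching the training function, which is the separation of concerns the summary attributes to the framework.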
