Show HN: SyGra – Graph-oriented Synthetic data generation Pipeline for LLMs (github.com)

0 points 4 hours ago | visit original

🤖 AI Summary

SyGra is a new open-source framework from ServiceNow for building complex synthetic-data generation pipelines for LLM workflows. Built on LangGraph, it lets you design end-to-end data flows as YAML-configured computational graphs, or run them via a Python library or CLI. Nodes represent actions such as LLM calls, load-balanced calls across multiple LLMs, lambdas, agents, and samplers, and each node can include preprocessing, postprocessing, and prompt templates that inject seed data. Edges support conditional logic written in Python, parallel and one-to-many flows, and loop-like behavior. Final outputs are assembled from per-record graph state and written to a configured sink (a file or Hugging Face).

SyGra emphasizes modularity and scale. It streams or batches Hugging Face datasets and supports multiple inference backends (TGI, vLLM, Azure/Azure OpenAI, Ollama, Triton) through pluggable clients, with inference itself running outside SyGra. Model runtime parameters are centralized in models.yaml. Tasks combine a YAML graph config with Python processors for custom logic, which makes it straightforward to add new node types.

For the AI/ML community, SyGra simplifies reproducible synthetic data generation and the orchestration of multi-step LLM interactions, which is useful for dataset augmentation, fine-tuning, and evaluation pipelines. The project is available under Apache 2.0.
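To make the graph model in the summary concrete, here is a minimal standard-library sketch of nodes, a conditional edge that loops back, and per-record state. It is not SyGra's or LangGraph's API: `Graph`, `generate`, `critique`, `route_after_critique`, and the prompt text are all hypothetical names invented for illustration, and the "LLM call" is a plain string stub.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, object]
END = "END"  # sentinel node name that stops the run (hypothetical)


@dataclass
class Graph:
    """Toy graph runner; an illustration only, not SyGra's implementation."""
    nodes: Dict[str, Callable[[State], State]] = field(default_factory=dict)
    edges: Dict[str, Callable[[State], str]] = field(default_factory=dict)

    def add_node(self, name: str, fn: Callable[[State], State]) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, dst: str) -> None:
        # An unconditional edge is a router that always returns dst.
        self.edges[src] = lambda _state: dst

    def add_conditional_edge(self, src: str, router: Callable[[State], str]) -> None:
        # The router inspects the state and returns the next node name.
        self.edges[src] = router

    def run(self, state: State, start: str, max_steps: int = 20) -> State:
        current, steps = start, 0
        while current != END:
            if steps >= max_steps:
                raise RuntimeError("step limit reached; possible endless loop")
            # Each node returns a partial update that is merged into the state.
            state = {**state, **self.nodes[current](state)}
            current = self.edges[current](state)
            steps += 1
        return state


PROMPT = "Write a question about {topic}."  # template filled from seed data


def generate(state: State) -> State:
    attempts = int(state.get("attempts", 0)) + 1
    prompt = PROMPT.format(topic=state["topic"])
    # Stand-in for an LLM call: a real node would send `prompt` to a model.
    return {"attempts": attempts, "draft": f"[draft {attempts}] {prompt}"}


def critique(state: State) -> State:
    # Toy acceptance rule (assumption): accept from the second attempt on.
    return {"accepted": int(state["attempts"]) >= 2}


def route_after_critique(state: State) -> str:
    # Conditional edge: loop back to "generate" until the draft is accepted.
    return END if state["accepted"] else "generate"


graph = Graph()
graph.add_node("generate", generate)
graph.add_node("critique", critique)
graph.add_edge("generate", "critique")
graph.add_conditional_edge("critique", route_after_critique)

# Each seed record gets its own state; the final states form the output rows.
seeds: List[State] = [{"topic": "graphs"}, {"topic": "YAML"}]
results = [graph.run(dict(seed), start="generate") for seed in seeds]

for row in results:
    print(row["topic"], row["attempts"], row["draft"])
```

In this sketch the loop is just a conditional edge pointing back at an earlier node, with `max_steps` as a guard against graphs that never reach `END`. A real pipeline would replace the stub in `generate` with a model client and write `results` to a sink instead of printing.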
