🤖 AI Summary
Genesis Preflight is an open-source, zero-dependency Rust CLI released to help researchers prepare scientific datasets for the U.S. Department of Energy’s Genesis Mission by validating FAIR (Findable, Accessible, Interoperable, Reusable) compliance and generating AI-ready documentation. Targeted at labs, universities, and data stewards, it scans datasets, produces README/metadata/DATACARD/MANIFEST and schema templates, and returns a numerical compliance score (0–100) plus actionable issues. The tool is MIT-licensed, audit-friendly (no unsafe code, no network or telemetry), and designed for use in sensitive or air-gapped environments common in national labs.
Technically, Genesis Preflight is built only from the Rust standard library. It analyzes CSV files in full by streaming them (memory is O(columns), not O(rows)), inferring column types with an 80% threshold and tracking type distributions across multi-gigabyte files. It validates required files and their content (README longer than 200 characters, recognized license text, metadata.json fields), verifies SHA‑256 manifests (FIPS 180‑4), auto-detects CSV delimiters, outputs JSON reports for CI/CD, and exposes exit codes for gating. Scoring splits FAIR into four 0–25 buckets, with penalties for critical and warning issues. The combination of reproducible, explainable checks, offline operation, and CI-friendly outputs makes it immediately useful for automating dataset quality gates and producing AI-ready corpora for ML workflows and large-scale DOE data catalogs.
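To make the streaming idea concrete, here is a minimal sketch of column type inference that uses only the Rust standard library. It is not Genesis Preflight's actual code: the names `ColumnType`, `ColumnStats`, and `scan_csv` are hypothetical, the split on a single delimiter ignores quoted fields, and it assumes the 80% threshold means "a column gets a type when at least 80% of its non-empty values parse as that type".

```rust
use std::io::{self, BufRead};

/// Value categories tracked per column (hypothetical names, not taken from the tool).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnType {
    Integer,
    Float,
    Boolean,
    Text,
    Empty,
}

/// Running counters for one column. One of these exists per column,
/// so memory use is O(columns) no matter how many rows are read.
#[derive(Debug, Default, Clone)]
pub struct ColumnStats {
    pub integer: u64,
    pub float: u64,
    pub boolean: u64,
    pub text: u64,
    pub empty: u64,
}

impl ColumnStats {
    /// Classify a single cell and bump the matching counter.
    /// Note: Rust's f64 parser also accepts strings such as "NaN" and "inf".
    fn observe(&mut self, cell: &str) {
        let cell = cell.trim();
        if cell.is_empty() {
            self.empty += 1;
        } else if cell.parse::<i64>().is_ok() {
            self.integer += 1;
        } else if cell.parse::<f64>().is_ok() {
            self.float += 1;
        } else if cell.eq_ignore_ascii_case("true") || cell.eq_ignore_ascii_case("false") {
            self.boolean += 1;
        } else {
            self.text += 1;
        }
    }

    /// Pick a type once its share of non-empty values reaches `threshold`
    /// (assumed reading of the 80% rule). Integers also count toward Float.
    pub fn inferred(&self, threshold: f64) -> ColumnType {
        let non_empty = self.integer + self.float + self.boolean + self.text;
        if non_empty == 0 {
            return ColumnType::Empty;
        }
        let share = |n: u64| n as f64 / non_empty as f64;
        if share(self.integer) >= threshold {
            ColumnType::Integer
        } else if share(self.integer + self.float) >= threshold {
            ColumnType::Float
        } else if share(self.boolean) >= threshold {
            ColumnType::Boolean
        } else {
            ColumnType::Text
        }
    }
}

/// Stream a CSV line by line: the first line is treated as the header,
/// every later line updates the per-column counters. No row is retained.
pub fn scan_csv<R: BufRead>(reader: R, delimiter: char) -> io::Result<Vec<ColumnStats>> {
    let mut lines = reader.lines();
    let header = match lines.next() {
        Some(line) => line?,
        None => return Ok(Vec::new()),
    };
    let mut stats = vec![ColumnStats::default(); header.split(delimiter).count()];
    for line in lines {
        let line = line?;
        // Cells beyond the header width, or missing from short rows, are skipped.
        for (slot, cell) in stats.iter_mut().zip(line.split(delimiter)) {
            slot.observe(cell);
        }
    }
    Ok(stats)
}
```

Calling `scan_csv` with a `BufReader` over a file (or `std::io::Cursor` over a string) and then `inferred(0.8)` on each entry yields one inferred type per column. Because the counters are kept rather than discarded, the same pass also gives the per-column type distribution that a report could surface alongside the inferred type.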