Show HN: Realistic malicious encrypted traffic datasets for ML (maltracer.com)

0 points | 247 days ago

🤖 AI Summary

MalTracer is a Show HN release that supplies a continuously updated dataset of realistic malicious encrypted traffic, aimed at ML researchers and threat hunters. The project runs live malware samples in sandboxes and captures only the malicious TLS traffic, producing labeled flow-level records (not full payloads) that reflect contemporary adversary behavior. The authors argue that existing public datasets are stale or unrealistic, and MalTracer’s automated, ongoing execution pipeline is intended to give models fresher, higher-fidelity training and evaluation data.

For the AI/ML community this matters because encrypted-traffic classification and threat detection increasingly depend on metadata and flow characteristics rather than payloads: timing, packet sizes, byte and packet counts, TLS fingerprints such as JA3/JA3S, SNI, and certificate fields. MalTracer’s flow-level focus preserves privacy while providing the features ML systems need to train detectors, anomaly models, and threat attribution tools.

Practical implications include easier benchmarking of encrypted-traffic classifiers and improved robustness to current malware C2 patterns. Caveats include potential dataset bias from sandboxed execution, class imbalance if only malicious flows are stored, and the need to verify labeling and representativeness before real-world deployment. The project is soliciting feedback from practitioners.
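To make the flow-level features mentioned above concrete, here is a minimal Python sketch of how packet metadata could be summarized into a flow record and how a JA3 fingerprint is derived from ClientHello fields. It is not MalTracer's pipeline or schema: the `Packet` record, the `flow_features` helper, and the feature names are hypothetical and chosen only for illustration. The JA3 part follows the published JA3 convention (five comma-separated fields, values joined by hyphens, then MD5), but it assumes the ClientHello values are already parsed and that GREASE values have been filtered out beforehand.

```python
import hashlib
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass
class Packet:
    """Hypothetical per-packet record holding metadata only (no payload)."""
    ts: float       # capture timestamp in seconds
    size: int       # packet size in bytes
    outbound: bool  # True if sent by the sandboxed host


def flow_features(packets: Sequence[Packet]) -> Dict[str, float]:
    """Summarize one flow from packet metadata (illustrative feature set)."""
    if not packets:
        raise ValueError("flow has no packets")
    ordered = sorted(packets, key=lambda p: p.ts)
    # Inter-arrival times between consecutive packets.
    gaps = [b.ts - a.ts for a, b in zip(ordered, ordered[1:])]
    sizes = [p.size for p in ordered]
    return {
        "packet_count": len(ordered),
        "byte_count": sum(sizes),
        "bytes_out": sum(p.size for p in ordered if p.outbound),
        "bytes_in": sum(p.size for p in ordered if not p.outbound),
        "duration": ordered[-1].ts - ordered[0].ts,
        "mean_packet_size": mean(sizes),
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
    }


def ja3_string(version: int, ciphers: Sequence[int], extensions: Sequence[int],
               curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """Build the JA3 input string from already-parsed ClientHello values.

    Assumes GREASE values were removed by the caller; that step is omitted here.
    """
    lists = (ciphers, extensions, curves, point_formats)
    fields = [str(version)] + ["-".join(str(v) for v in lst) for lst in lists]
    return ",".join(fields)


def ja3_hash(version: int, ciphers: Sequence[int], extensions: Sequence[int],
             curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """JA3 fingerprint: MD5 hex digest of the JA3 string."""
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode("ascii")).hexdigest()


if __name__ == "__main__":
    flow = [Packet(0.0, 100, True), Packet(0.5, 200, False), Packet(1.5, 300, True)]
    print(flow_features(flow))
    print(ja3_string(769, [47, 53], [0, 10, 11], [23, 24], [0]))
    print(ja3_hash(769, [47, 53], [0, 10, 11], [23, 24], [0]))
```

Features like these are computed without touching payload bytes, which is why a flow-level dataset can remain useful for classifiers while limiting privacy exposure. Because the dataset described here stores malicious flows, a benign class would have to come from another source before training a binary detector.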
