An Empirical Study on Why LLMs Struggle with Password Cracking (arxiv.org)

🤖 AI Summary
Researchers conducted an empirical study testing whether pre-trained, open-source LLMs can guess passwords from structured user attributes. They prompted models such as TinyLLaMA, Falcon‑RW‑1B, and Flan‑T5 to generate plausible passwords from synthetic user profiles (name, birthdate, hobbies, etc.) and evaluated outputs against plaintext and SHA‑256 hashed targets using Hit@1/5/10. The results were stark: all models achieved under 1.5% accuracy at Hit@10, far behind traditional rule‑based and combinator cracking tools. Detailed analysis and visualizations attribute the failures to weaknesses in LLMs' generative reasoning for this task and an inability to memorize or reproduce the highly domain‑specific patterns present in real leaked password datasets.

The study is significant because it tempers concerns that off‑the‑shelf LLMs are immediate, effective threats for automated password guessing, while highlighting important caveats: without supervised fine‑tuning on password corpora or specialized domain adaptation, LLMs remain ill‑suited to this adversarial use case. For the AI/ML community, the paper pinpoints concrete limitations (generalization, memorization, and task framing) and calls attention to future directions: secure, privacy‑preserving password modeling, risk assessment for domain fine‑tuning, and hybrid approaches that combine linguistic models with rule/combinator heuristics.
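The Hit@k evaluation described above is straightforward to sketch. The following is a minimal illustration (not the paper's actual harness) of scoring a ranked list of model-generated guesses against a SHA‑256 hashed target; the profile, guess list, and function name are hypothetical:

```python
import hashlib

def hit_at_k(candidates, target_hash, k):
    """Return True if any of the top-k candidates hashes to the target.

    candidates: model-generated guesses, ranked most-likely first.
    target_hash: hex SHA-256 digest of the true password.
    """
    return any(
        hashlib.sha256(guess.encode()).hexdigest() == target_hash
        for guess in candidates[:k]
    )

# Toy example with a hypothetical profile-derived guess list.
target = hashlib.sha256(b"alice1990").hexdigest()
guesses = ["alice123", "alice1990", "bob2000"]  # ranked model outputs

print(hit_at_k(guesses, target, 1))   # top-1 guess misses
print(hit_at_k(guesses, target, 5))   # correct guess appears within top-5
```

A real harness would average this indicator over many (profile, password) pairs to produce the Hit@1/5/10 percentages reported in the study.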