I built a vulnerable app and spent $1,500 seeing if LLMs could hack it (kasra.blog)

0 points, 2 hours ago

🤖 AI Summary

A security researcher built a deliberately vulnerable book review app with React Native and FastAPI to test whether large language models (LLMs) could exploit common vulnerabilities, particularly Firebase-related ones. The researcher spent $1,500 testing whether the models could find a way to sign up directly as users and then read private data in the Firestore database. This is a textbook case of Broken Access Control, also called Missing Object-Level Authorization. Results varied widely across models. GPT-5.5 was the most effective, succeeding 70% of the time, while others, such as DeepSeek V4 Pro and Claude, struggled to stay focused on the Firebase exploit.

The takeaway for the AI/ML community is that LLMs can be proficient in some areas yet approach security exploitation inconsistently, which may point to gaps in their training or focus. The research underscores the need for better understanding and better tooling to secure applications against such attacks, especially applications that rely on third-party services like Firebase. The findings also encourage further study of LLMs as security research tools and point to room for improving AI-based security measures.
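
The summary does not show the app's actual Firestore data or rules. As a minimal sketch, assuming a hypothetical collection of book reviews where some entries are marked private, the difference between a check that only asks "is the caller signed in?" and an object-level authorization check can be illustrated in plain Python. `Review`, `REVIEWS`, `PermissionDenied`, and both read functions are hypothetical names for illustration, not the author's code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Review:
    owner_uid: str  # UID of the account that wrote the review
    text: str
    private: bool


# Hypothetical in-memory stand-in for a Firestore "reviews" collection
# (document ID -> Review); not the author's actual schema.
REVIEWS: Dict[str, Review] = {
    "r1": Review(owner_uid="alice", text="Loved it", private=False),
    "r2": Review(owner_uid="alice", text="My private notes", private=True),
}


class PermissionDenied(Exception):
    """Raised when the requester may not read a document."""


def read_review_vulnerable(requester_uid: Optional[str], doc_id: str) -> Review:
    # Only checks that the caller is signed in. If anyone can create an
    # account, this check admits every self-registered user.
    if requester_uid is None:
        raise PermissionDenied("sign-in required")
    return REVIEWS[doc_id]


def read_review_checked(requester_uid: Optional[str], doc_id: str) -> Review:
    if requester_uid is None:
        raise PermissionDenied("sign-in required")
    review = REVIEWS[doc_id]
    # Object-level authorization: a private document is readable
    # only by the account that owns it.
    if review.private and review.owner_uid != requester_uid:
        raise PermissionDenied("not the owner of this document")
    return review


if __name__ == "__main__":
    # "mallory" is a freshly signed-up account with no relation to "alice".
    print(read_review_vulnerable("mallory", "r2").text)  # leaks the private review
    try:
        read_review_checked("mallory", "r2")
    except PermissionDenied as exc:
        print("denied:", exc)
```

In a real Firebase app the equivalent check would live in Firestore Security Rules (for example, comparing `request.auth.uid` against an owner field on the document) rather than in client code, because a client that talks to Firestore directly can skip any check the app's own UI performs.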
