Open-sourced 3k human computer-use tasks dataset for training GUI agents (huggingface.co)

đŸ¤– AI Summary
Paradigm Shift AI has open-sourced a large multimodal dataset for training and evaluating GUI agents on Hugging Face: anaisleila/computer-use-data-psai. The release contains 3,167 completed human-computer tasks with full-screen video recordings (100% coverage, 16.9 GB), DOM snapshots for 1,766 tasks (55.8%, 24.4 GB), 14,740 embedded screenshots in a 7.87 GB Parquet file, and detailed timestamped interaction event logs. Tasks span browser (2,220) and desktop (947) workflows across ~294 websites and 173 applications, with metadata on task category, difficulty (Easy 79.4% / Medium 16.7% / Hard 3.9%), platform, and subcategories such as Search & Research, Shopping, Document Editing, and more. The whole dataset is ~49.2 GB and is downloadable via the Hugging Face datasets API or hf_hub_download.

This is significant for AI/ML teams building GUI agents, web automation systems, and multimodal imitation- or reinforcement-learning models because it pairs pixel-level video with DOM structure and fine-grained action traces, enabling supervised action cloning, visual + DOM state alignment, reward shaping from human demonstrations, and robustness testing across many real apps and sites. The dataset's mix of videos, DOM trees, screenshots, and event timelines makes it especially useful for research into action grounding, intent inference, and generalizable UI policies, and it includes code-friendly access patterns for fast prototyping and targeted downloads.
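As a minimal sketch of the two access patterns mentioned above (the standard Hugging Face datasets API and hf_hub_download), the snippet below loads the tabular portion and fetches a single file from the repo. The split name and the Parquet file name are hypothetical placeholders; inspect the repo's file listing to find the real ones.

```python
# Sketch of the access patterns described in the summary.
# Assumptions: default config/split names and the example Parquet path are
# illustrative, not confirmed by the dataset card.
from datasets import load_dataset
from huggingface_hub import hf_hub_download, list_repo_files

REPO_ID = "anaisleila/computer-use-data-psai"

# Option 1: load the tabular portion (screenshots + event metadata) with the
# datasets API. Streaming avoids downloading all ~49 GB up front.
ds = load_dataset(REPO_ID, split="train", streaming=True)  # split name assumed
print(next(iter(ds)).keys())

# Option 2: targeted download of one file. First list what the repo contains...
files = list_repo_files(REPO_ID, repo_type="dataset")
print(files[:10])

# ...then fetch only the file you need (file name below is a placeholder).
local_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="data/train-00000-of-00001.parquet",  # hypothetical file name
    repo_type="dataset",
)
print(local_path)
```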