The First Fully General Computer Action Model (si.inc)

🤖 AI Summary
Researchers have unveiled FDM-1, which they describe as the first fully general computer action model, aimed at automating computer work in fields such as CAD, finance, and engineering. Unlike previous models, which relied on small annotated datasets and short-context training, FDM-1 learns directly from an 11-million-hour video dataset of computer use. It can process long contexts and operate on high-frame-rate video, which widens the range of tasks it can attempt to include complex 3D modeling and web navigation. A key element is its novel video encoder, which compresses nearly two hours of video into one million tokens, a 50x improvement in token efficiency over prior models. An inverse dynamics model (IDM) labels the footage by predicting the user actions shown in each video, which greatly increases the volume of usable training data without the prohibitive cost of human annotation. Because the model can capture long-horizon workflows, it is positioned to assist with intricate, multi-step computer tasks in both professional and everyday settings.
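To make the token-efficiency claim concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted in the summary and rounds "nearly two hours" up to exactly two hours, which is an assumption. The variable and function names are illustrative and do not come from the FDM-1 authors.

```python
# Back-of-the-envelope check of the quoted encoder figures.
# Assumption: "nearly two hours" is treated as exactly 2.0 hours.
VIDEO_HOURS = 2.0
TOKEN_BUDGET = 1_000_000   # tokens used for that much video
EFFICIENCY_GAIN = 50       # claimed improvement over prior models


def tokens_per_second(hours: float, tokens: int) -> float:
    """Average number of tokens spent per second of video."""
    return tokens / (hours * 3600)


# Token rate implied for FDM-1's encoder.
fdm_rate = tokens_per_second(VIDEO_HOURS, TOKEN_BUDGET)

# A prior model that is 50x less efficient spends 50x more tokens per second.
baseline_rate = fdm_rate * EFFICIENCY_GAIN

# Tokens such a baseline would need for the same two hours of video.
baseline_tokens_for_same_video = TOKEN_BUDGET * EFFICIENCY_GAIN

# Seconds of video the baseline could fit into the same one-million-token budget.
baseline_seconds_in_budget = TOKEN_BUDGET / baseline_rate

print(f"FDM-1 rate: {fdm_rate:.1f} tokens/s")
print(f"Baseline rate: {baseline_rate:.1f} tokens/s")
print(f"Baseline tokens for the same video: {baseline_tokens_for_same_video:,}")
print(f"Baseline video in the same budget: {baseline_seconds_in_budget / 60:.1f} min")
```

Under these assumptions the encoder spends roughly 139 tokens per second of video, while a model that is 50x less efficient would fit only about 2.4 minutes of footage into the same one-million-token context. That gap is why the summary ties the encoder to long-horizon workflows.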