Benchmarking Humans and AI in Contract Drafting (www.legalbenchmarks.ai)

🤖 AI Summary
A benchmarking report (July–August 2025) evaluated 13 AI systems against a human in-house lawyer baseline, drawing on 450 task outputs, 72 survey responses, and 12 interviews. It finds that modern AI can match or exceed lawyers at producing reliable first drafts of contracts.

The study scores each output along three dimensions: Output Reliability (instruction compliance, factual accuracy, and legal adequacy, graded pass/fail), Output Usefulness (clarity, helpfulness, and length, each rated 1–3, for a maximum of 9), and Platform Workflow Support (generation plus quality-assurance features, maximum 10).

On this framework, the top models (Gemini 2.5 Pro at 73.3% reliability, plus GPT-5, GC AI, Brackets, August, and SimpleDocs) beat the human baseline of 56.7% reliability, which rose to 61.5% when lawyers worked with AI assistance. Notably, legal-specific AI flagged material risks far more often than general-purpose models (83% vs. 55%), while the human reviewers flagged none. Among the technically significant findings: general-purpose LLMs slightly edged legal-specific tools on raw output reliability, but legal platforms scored higher on usefulness and, crucially, on workflow integration (66.7% of tools integrate into Microsoft Word). Platform Workflow Support (context handling, template/playbook grounding, and verification features) emerged as the main differentiator for adoption, not pure model accuracy.

Implications: legal teams should prioritize tools that combine strong reliability with seamless Word integration and QA features. AI can materially reduce drafting time and surface overlooked risks, but outputs still require lawyer oversight and continuous verification as capabilities evolve.
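The report doesn't publish its scoring code, so as a rough illustration only, here is a minimal sketch of how the three-dimension rubric described above could be tallied. The class and field names, and the assumption that an output counts as reliable only if all three pass/fail checks pass, are mine, not the study's implementation:

```python
from dataclasses import dataclass

@dataclass
class DraftScore:
    # Output Reliability: three pass/fail checks. Treating a draft as
    # reliable only when all three pass is an assumption; the report
    # just describes the dimension as pass/fail.
    instruction_compliance: bool
    factual_accuracy: bool
    legal_adequacy: bool
    # Output Usefulness: each sub-criterion rated 1-3, max total 9.
    clarity: int
    helpfulness: int
    length: int

    def reliable(self) -> bool:
        return (self.instruction_compliance
                and self.factual_accuracy
                and self.legal_adequacy)

    def usefulness(self) -> int:
        for v in (self.clarity, self.helpfulness, self.length):
            assert 1 <= v <= 3, "each usefulness sub-score is 1-3"
        return self.clarity + self.helpfulness + self.length


def reliability_rate(scores: list[DraftScore]) -> float:
    """Share of outputs passing all reliability checks, i.e. the
    figure reported as 73.3% for Gemini 2.5 Pro and 56.7% for the
    human baseline."""
    return sum(s.reliable() for s in scores) / len(scores)
```

Under this reading, a system's headline reliability percentage is simply the fraction of its 450-task outputs that clear all three pass/fail gates, while usefulness is averaged separately on the 9-point scale.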