VCBench: Benchmarking LLMs in Venture Capital (arxiv.org)

0 points 2 hours ago | visit original

🤖 AI Summary

VCBench is introduced as the first public benchmark for predicting founder success in venture capital, a domain characterized by sparse signals and highly uncertain outcomes. The dataset contains 9,000 anonymized founder profiles, standardized to retain predictive features while resisting identity leakage; adversarial tests reportedly reduce re-identification risk by more than 90%. Baseline precision is very low: the market index achieves 1.9%, Y Combinator outperforms that index by 1.7×, and tier‑1 firms outperform it by 2.9×. The paper evaluates nine state‑of‑the‑art LLMs: DeepSeek‑V3 achieves over six times the baseline precision, GPT‑4o attains the highest F0.5 (a metric that weights precision more heavily than recall), and most models surpass the human benchmarks on this task. VCBench's significance lies in providing a reproducible, privacy‑preserving, community‑driven standard for testing AGI-style forecasting on real‑world, low‑signal problems. Technical contributions include the anonymization and standardization pipeline, adversarial privacy evaluations, and a public, evolving dataset and evaluation suite. For the AI/ML community, VCBench offers a new, high‑stakes testbed to compare models on economically relevant prediction, probe calibration and precision-focused metrics, and study how LLMs extract weak signals. It also raises practical and ethical questions about automated decision support in venture investing.
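
For readers unfamiliar with F0.5, the sketch below shows the standard F-beta formula and the arithmetic behind the precision multiples quoted above. It is a minimal illustration, not the paper's evaluation code: the function names `f_beta` and `implied_precision` are hypothetical, the precision/recall pairs are made-up inputs, and the implied precisions are simply 1.9% times the rounded multipliers from the summary, so they may differ slightly from figures reported in the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denom = beta**2 * precision + recall
    if denom == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta**2) * precision * recall / denom


def implied_precision(baseline: float, multiple: float) -> float:
    """Precision implied by 'outperforms the baseline by N times' (assumed reading)."""
    return baseline * multiple


MARKET_INDEX_PRECISION = 0.019  # 1.9%, as quoted in the summary

# Hypothetical precision/recall pairs, chosen only to show the asymmetry.
precise_model = f_beta(precision=0.20, recall=0.10)   # high precision, low recall
broad_model = f_beta(precision=0.10, recall=0.20)     # low precision, high recall

# Multiples quoted in the summary (rounded), applied to the 1.9% baseline.
yc_precision = implied_precision(MARKET_INDEX_PRECISION, 1.7)
tier1_precision = implied_precision(MARKET_INDEX_PRECISION, 2.9)

if __name__ == "__main__":
    print(f"F0.5, precision-heavy model: {precise_model:.3f}")
    print(f"F0.5, recall-heavy model:    {broad_model:.3f}")
    print(f"Implied YC precision:        {yc_precision:.2%}")
    print(f"Implied tier-1 precision:    {tier1_precision:.2%}")
```

With these made-up inputs, the precision-heavy model scores about 0.167 and the recall-heavy one about 0.111, even though the two would tie under F1. That asymmetry is why a precision-weighted metric suits a setting where false positives (bad investments) are costly.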
