🤖 AI Summary
A short thread sparked a reframing of drug discovery as a decision-theory problem: many approved drugs (aspirin, metformin, cyclosporin, etc.) would have been discarded by conventional "druglike" heuristics, and natural products, which account for over half of FDA-approved drugs, are often chemically out-of-distribution. The author argues that predicting successful candidates is a high-cost, high-reward problem whose wins are concentrated in regions defined by unobserved covariates, so ML and heuristic screens must be judged not just by accuracy but by how their errors map to real costs and rewards.
To make this concrete, the post builds a simple expected-utility model using sensitivity (se), specificity (sp), prevalence of true positives (p), go-cost (c), reward (r), follow-up cost (fup), and a utility transform f (linear, log, isoelastic, or high-risk/high-reward shapes). It enumerates the four outcome scenarios (a true positive pursued, a true positive missed, a true negative correctly ignored, and a false positive wrongly pursued), weights each scenario's utility by its probability, and sums them to get expected utility, visualized as a heatmap over se and sp with values clipped to [-1, 1]. Practical implications for ML: thresholding and model choice should be driven by prevalence and asymmetric costs, not just AUC; sensitivity matters more when true positives are rare and specificity when they are common; and richer approaches (including causal decision theory) may be needed when interventions change downstream outcomes.
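A minimal sketch of this kind of model, assuming simple payoffs since the post's exact formulas aren't reproduced here: pursuing a true positive nets r - c, a missed positive and a correctly ignored negative net 0, and a pursued false positive loses c + fup. The parameter names (se, sp, p, c, r, fup) mirror the summary; the payoff structure and example values are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

def expected_utility(se, sp, p, c, r, fup, f=lambda x: x):
    """Probability-weighted utility over the four screening outcomes.

    Assumed payoffs (not the post's exact numbers): a pursued true positive
    nets r - c, a missed positive and an ignored negative net 0, and a
    pursued false positive loses c + fup. `f` is the utility transform.
    """
    tp = p * se              # real candidate flagged and pursued
    fn = p * (1 - se)        # real candidate missed
    tn = (1 - p) * sp        # dud correctly ignored
    fp = (1 - p) * (1 - sp)  # dud wrongly pursued
    return tp * f(r - c) + fn * f(0.0) + tn * f(0.0) + fp * f(-(c + fup))

# Heatmap over sensitivity and specificity, clipped to [-1, 1] as in the post.
se, sp = np.linspace(0, 1, 101), np.linspace(0, 1, 101)
SE, SP = np.meshgrid(se, sp)
eu = expected_utility(SE, SP, p=0.01, c=0.1, r=5.0, fup=0.5)
plt.imshow(np.clip(eu, -1, 1), origin="lower", extent=[0, 1, 0, 1],
           aspect="auto", cmap="RdBu")
plt.xlabel("sensitivity (se)")
plt.ylabel("specificity (sp)")
plt.colorbar(label="expected utility (clipped)")
plt.show()
```

Swapping `f` (for example `np.cbrt`, which is concave for gains and convex for losses) stands in for the log, isoelastic, and high-risk/high-reward shapes the post mentions.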
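The thresholding implication can be made concrete the same way: rather than maximizing accuracy or AUC, sweep the score cutoff and keep the one that maximizes expected utility at your prevalence and cost structure. The helper below is hypothetical (the function names and toy data are not from the post) and reuses the same illustrative payoffs with a linear utility.

```python
import numpy as np

def screen_utility(se, sp, p, c, r, fup):
    # Same assumed payoffs as above, linear utility: pursued true positives
    # net r - c, pursued false positives lose c + fup, everything else nets 0.
    return p * se * (r - c) - (1 - p) * (1 - sp) * (c + fup)

def pick_threshold(scores, labels, p, c, r, fup):
    """Return the score cutoff with the highest expected utility."""
    labels = labels.astype(bool)
    candidates = np.unique(scores)
    utilities = []
    for t in candidates:
        pred = scores >= t
        se = (pred & labels).sum() / labels.sum()       # sensitivity at cutoff t
        sp = (~pred & ~labels).sum() / (~labels).sum()  # specificity at cutoff t
        utilities.append(screen_utility(se, sp, p, c, r, fup))
    return candidates[int(np.argmax(utilities))]

# Toy data: 1% prevalence, positives score higher on average.
rng = np.random.default_rng(0)
labels = np.arange(1000) < 10
scores = np.where(labels, rng.normal(1.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000))
cutoff = pick_threshold(scores, labels, p=0.01, c=0.1, r=5.0, fup=0.5)
print(f"utility-optimal cutoff: {cutoff:.2f}")
```

With rare positives and a large reward, the utility-optimal cutoff sits well below the accuracy-optimal one, which is the summary's point about prevalence and asymmetric costs.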