Bringing NumPy's type-completeness score to nearly 90% – Pyrefly (pyrefly.org)

0 points | 5 hours ago

🤖 AI Summary

Quansight Labs, with support from Meta’s Pyrefly team, raised NumPy’s Pyright “type-completeness” score from an initial 33% to about 88% by fixing typos, adding annotations, and fully typing the MaskedArray API. The team first removed measurement noise (excluding external untyped stdlib imports and test modules) to get an accurate baseline. It then applied a one-line fix, replacing a mistyped CanIndex with SupportsIndex in ndarray.setfield, which alone moved coverage above 80%. The biggest win came from typing numpy.ma.MaskedArray, whose own score rose from ~20% to 100% and accounted for a large share of the remaining improvement. The effort also shows why typing scientific libraries is painstaking: many NumPy APIs have return types that depend on the combination of arguments, so accurate stubs require multiple overloads (some methods needed up to nine). Pyright was used for coverage reporting. Type-completeness measures only whether annotations are present, not whether they are correct, so running a type checker (mypy, Pyright, or others) is a separate step. Remaining gaps include the top-level numpy.ma functions, more precise shape-preserving overloads, missing defaults in stubs, and, critically, the absence of an enforced type checker in NumPy’s CI. Adding a CI type check and refining overloads further are the main openings for contributors who want to improve developer ergonomics across the Python data ecosystem.
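To illustrate why argument-dependent return types call for several overloads, here is a minimal sketch in plain Python. The function `total` and its `axis` parameter are hypothetical stand-ins chosen for illustration; they are not taken from NumPy's stubs, which are considerably more involved.

```python
from typing import List, Optional, Union, overload

Matrix = List[List[float]]


# Overload 1: no axis given, so the whole matrix collapses to a single float.
@overload
def total(rows: Matrix, axis: None = None) -> float: ...


# Overload 2: an integer axis is given, so one sum per row or column is returned.
@overload
def total(rows: Matrix, axis: int) -> List[float]: ...


def total(rows: Matrix, axis: Optional[int] = None) -> Union[float, List[float]]:
    """Hypothetical reduction whose return type depends on `axis`."""
    if axis is None:
        return sum(sum(row) for row in rows)
    if axis == 0:
        # Sum down each column.
        return [sum(col) for col in zip(*rows)]
    if axis == 1:
        # Sum across each row.
        return [sum(row) for row in rows]
    raise ValueError("axis must be None, 0, or 1")


data = [[1.0, 2.0], [3.0, 4.0]]
grand_total = total(data)       # a type checker infers float
column_sums = total(data, 0)    # a type checker infers List[float]
```

With only the final `Union` signature, a type checker would have to treat every call as possibly returning either type. The two overloads let it pick the precise one per call site, and each additional argument that affects the result tends to multiply the number of overloads needed.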
