Smarter Adaptive Graph Sampling for More Accurate Graph Learning (kumo.ai)

🤖 AI Summary
Graph neural networks are hard to run in production because real-world relational graphs can have billions of nodes, so practitioners use neighbor sampling (e.g., GraphSAGE) to build small computation graphs. GraphSAGE samples a fixed number of neighbors per hop, which is memory-efficient but brittle on heterogeneous relational schemas: for users with very short or very long histories, a fixed fanout either under-utilizes or wastes the sampling budget, and it can over-sample less informative relations (like location). Because the heuristic is static, it cannot adapt to per-example graph structure, which can hurt link-prediction accuracy. Kumo's adaptive metapath-aware sampler addresses this by making the sampling budget metapath-aware and redistributable: when a planned neighbor type is undersampled, the algorithm oversamples nodes that preserve the same metapath (or children along that path), so the final sampled subgraph keeps the intended semantic balance. This lets you safely allocate large budgets to the first hop (e.g., 1000 transactions, 200 locations) and rely on redistribution downstream. On RelBench link-prediction tasks it yields large gains (e.g., rel-amazon purchase MAP@10 from 0.014 to 0.021, +50%; other rel-amazon tasks +35–46%; smaller gains on avito and h&m), though some tasks (one StackOverflow relation) saw a small regression. Overall, the approach improves accuracy and resource utilization for heterogeneous relational GNNs in production, with the caveats that metapath alignment matters and that the benefits are dataset-dependent.
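To make the redistribution idea concrete, here is a minimal Python sketch that compares a fixed per-hop fanout with a budget that is carried forward along a single metapath. It is not Kumo's implementation. The function names, the second-hop fanout of 2, and the assumption that every node on a hop has the same number of available neighbors are all hypothetical simplifications chosen for illustration.

```python
def fixed_sample_counts(fanouts, degrees):
    """GraphSAGE-style sampling: every node takes at most fanouts[h] neighbors.

    fanouts[h] is the planned per-node fanout at hop h; degrees[h] is the
    (assumed uniform) number of neighbors each node actually has at hop h.
    Returns the number of nodes sampled at each hop.
    """
    counts, frontier = [], 1  # start from a single seed node
    for fanout, degree in zip(fanouts, degrees):
        frontier *= min(fanout, degree)
        counts.append(frontier)
    return counts


def adaptive_sample_counts(fanouts, degrees):
    """Hypothetical redistribution along one metapath.

    The planned node budget per hop is the product of the fanouts so far.
    If an earlier hop came up short, later hops on the same path may take
    more neighbors per node, as long as the planned budget is not exceeded.
    """
    counts, frontier, planned = [], 1, 1
    for fanout, degree in zip(fanouts, degrees):
        planned *= fanout  # planned total number of nodes at this hop
        if frontier == 0:  # nothing left to expand from
            counts.append(0)
            continue
        per_node = min(planned // frontier, degree)  # stay within budget
        frontier *= per_node
        counts.append(frontier)
    return counts


# A user with only 10 transactions, each linked to 50 further nodes,
# sampled with a planned fanout of 1000 transactions and 2 children each.
short_history = ([1000, 2], [10, 50])
print(fixed_sample_counts(*short_history))     # [10, 20]
print(adaptive_sample_counts(*short_history))  # [10, 500]
```

In this toy setting the fixed sampler uses 20 of the 2000 nodes planned for the second hop, while the redistributing variant uses 500, limited only by how many neighbors exist. For a user with 5000 transactions the two functions return the same counts, since no budget is left unused at the first hop.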