AI-powered search engines rely on “less popular” sources, researchers find (arstechnica.com)

0 points 3 days ago

🤖 AI Summary

Researchers from Ruhr University Bochum and the Max Planck Institute for Software Systems published a pre-print, “Characterizing Web Search in The Age of Generative AI,” showing that AI-powered search tools cite much less popular websites than Google’s traditional organic results. They compared Google’s organic search, Google’s AI Overviews, Gemini-2.5-Flash, GPT-4o’s web search mode, and GPT-4o with a Search Tool, using queries drawn from the WildChat dataset, AllSides political topics, and the 100 most-searched Amazon products.

The AI engines’ citations diverge sharply from organic rankings: 53% of the sources cited in Google’s AI Overviews didn’t appear among the Top 10 organic results, and 40% weren’t in the Top 100. Measured by Tranco domain rankings, the cited domains are also often far less popular. Gemini was especially skewed: the median cited domain fell outside Tranco’s Top 1,000, and many citations came from domains below even the Top 1,000,000.

This matters for the AI/ML community because it quantifies how retrieval and citation behavior in generative search diverges from traditional ranking signals, with implications for credibility, misinformation risk, the distribution of web traffic, and SEO. The findings suggest that LLM-based search weighs other signals, such as direct answerability, snippet match, or cached sources, above domain popularity, which can surface niche or low-quality sites. For researchers and engineers, the study highlights the need to audit retrieval pipelines, refine citation policies, and develop evaluation metrics that balance authoritative sourcing against coverage and freshness.
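A minimal sketch of the two measurements summarized above: the share of cited sources missing from a query’s top-k organic results, and the median Tranco rank of cited domains. It assumes citations and organic results are compared at the domain level, that the Tranco list is available as `rank,domain` CSV rows, and that domains absent from the list get a rank just past 1,000,000. These choices, and all function names, are hypothetical and not taken from the paper.

```python
import csv
import io
from statistics import median
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Lowercase host of a URL with any leading 'www.' removed (domain-level matching is an assumption)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def load_tranco(csv_text: str) -> dict[str, int]:
    """Parse 'rank,domain' rows into a domain -> rank mapping (assumed list format)."""
    ranks: dict[str, int] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:
            ranks[row[1].strip().lower()] = int(row[0])
    return ranks


def share_not_in_top_k(cited_urls: list[str], organic_urls: list[str], k: int) -> float:
    """Fraction of citations whose domain is absent from the first k organic results.

    Counts every citation, not unique domains (hypothetical choice).
    """
    top_k = {domain_of(u) for u in organic_urls[:k]}
    cited = [domain_of(u) for u in cited_urls]
    if not cited:
        return 0.0
    missing = sum(1 for d in cited if d not in top_k)
    return missing / len(cited)


def median_tranco_rank(
    cited_urls: list[str], tranco_rank: dict[str, int], unranked: int = 1_000_001
) -> float:
    """Median Tranco rank of cited domains; unlisted domains get the `unranked` placeholder."""
    ranks = [tranco_rank.get(domain_of(u), unranked) for u in cited_urls]
    return median(ranks)
```

Run per query on an engine’s citations and the matching organic result list, then aggregate across queries. How the paper aggregates its figures is not described in the summary.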
