Powerset's natural language search system (2012) (brenocon.com)

🤖 AI Summary
Powerset was an early attempt (2005–2008) to bring deep natural-language understanding to web search. Built on Xerox PARC's NLP engine, it indexed semantic relations and entities (often mapped to WordNet/Freebase nodes), performing constituent/unification parses, coreference resolution, and lexical lookups at index time, then matched semantic fragments of a user query against that rich index. The result was less a classic QA system than query-focused summarization: snippets highlighted matching answers, with heavy keyword/ngram fallbacks when deep matches failed. Visible products included relation browsers ("Factz"/"Powermouse") and a Freebase-oriented query UI.

Its significance lies less in commercial success than in the engineering lessons and partial legacy. Doing full deep parses at web scale dramatically raised indexing cost (the author estimates ~100x versus keyword indexing), introduced brittleness (odd technology choices such as an early unweighted FST NER and segfault-prone parsers on early Hadoop), and drew skepticism from NLP/IR researchers. Still, Powerset helped seed ideas around structured answers in search and contributed infrastructure work (an early BigTable clone that evolved into HBase). Microsoft acquired the company and later filed patents on the technology. Powerset stands as a cautionary but influential experiment: deep semantic analysis can improve answer quality, but cost, robustness, and alternative shallow/ML approaches ultimately shaped mainstream search (e.g., Google's later QA/knowledge work and IBM Watson's factoid systems).
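The summary describes a two-layer design: a semantic index of relations built at index time, queried by matching fragments of the question, with keyword retrieval as a fallback. Below is a minimal Python sketch of that shape only; it is not Powerset's actual pipeline. The naive subject-verb-object splitter, the `SemanticIndex` class, and the toy documents are hypothetical stand-ins for the real parser and index.

```python
# Minimal illustrative sketch (not Powerset's code): build a semantic index
# of (subject, relation, object) triples at index time, match query
# fragments against it, and fall back to keywords when no deep match hits.
from collections import defaultdict

def naive_svo(sentence):
    """Toy stand-in for a deep parse: treat the first three tokens as
    (subject, relation, object). The real system used full parses."""
    tokens = sentence.lower().strip(".").split()
    return tuple(tokens[:3]) if len(tokens) >= 3 else None

class SemanticIndex:
    def __init__(self):
        self.triples = defaultdict(set)   # (subj, rel, obj) -> doc ids
        self.keywords = defaultdict(set)  # token -> doc ids (fallback)

    def add(self, doc_id, text):
        for sentence in text.split("."):
            if not sentence.strip():
                continue
            triple = naive_svo(sentence)
            if triple:
                self.triples[triple].add(doc_id)
            for tok in sentence.lower().split():
                self.keywords[tok].add(doc_id)

    def query(self, question):
        # Try to match a semantic fragment of the query first ...
        frag = naive_svo(question)
        if frag and frag in self.triples:
            return ("semantic", sorted(self.triples[frag]))
        # ... otherwise fall back to keyword-style retrieval.
        hits = set()
        for tok in question.lower().strip("?").split():
            hits |= self.keywords.get(tok, set())
        return ("keyword-fallback", sorted(hits))

if __name__ == "__main__":
    idx = SemanticIndex()
    idx.add(1, "einstein developed relativity. he worked in bern.")
    idx.add(2, "darwin proposed evolution by natural selection.")
    print(idx.query("einstein developed relativity"))  # semantic match
    print(idx.query("who proposed evolution?"))        # keyword fallback
```

The point of the sketch is the cost asymmetry the summary notes: the expensive analysis (here trivial, in Powerset a full parse plus coreference and lexical lookup) happens per document at index time, which is what drove the ~100x indexing cost estimate.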