Tuesday, March 3, 2015

Applying WebTables in Practice via Google

Applying WebTables in Practice
Sreeram Balakrishnan, Alon Halevy, Boulos Harb, Hongrae Lee, Jayant Madhavan, Afshin Rostamizadeh, Warren Shen, Kenneth Wilder, Fei Wu, Cong Yu
Google Research
{sreevb,halevy,harb,hrlee,jayant,rostami,whshen,wilder,wufei,congyu}@google.com




We started investigating the collection of HTML tables on the Web and developed the WebTables system a few years ago [4]. Since then, our work has been motivated by applying WebTables in a broad set of applications at Google, resulting in several product launches. In this paper, we describe the challenges faced, lessons learned, and new insights that we gained from our efforts. The main challenges we faced in our efforts were (1) identifying tables that are likely to contain high-quality data (as opposed to tables used for navigation, layout, or formatting), and (2) recovering the semantics of these tables or signals that hint at their semantics. The result is a semantically enriched table corpus that we used to develop several services. First, we created a search engine for structured data whose index includes over a hundred million HTML tables. Second, we enabled users of Google Docs (through its Research Panel) to find relevant data tables and to insert such data into their documents as needed. Most recently, we brought WebTables to a much broader audience by using the table corpus to provide richer tabular snippets for fact-seeking web search queries on Google.com.



Many pages on the Web are filled with data in the form of tables. If you weren't paying attention, you may have missed Google Table Search entirely; as far as I can tell, it hasn't gotten much press. If you include tabular data on the pages of your site, though, you may find tables from your site in the results of a Google Table Search query.
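
To get a rough feel for the first challenge the abstract mentions, telling data tables apart from tables used for navigation or layout, here is a minimal Python sketch. It is not the WebTables method, which the abstract does not describe in detail. The class `TableStats`, the functions `looks_like_data_table` and `data_tables`, and the thresholds `min_rows` and `min_cols` are all my own assumptions for illustration. It uses only the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class TableStats(HTMLParser):
    """Collects simple per-table counts: rows, cells, header cells, nested tables."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, in the order they were closed
        self._stack = []   # tables currently open (handles nesting)

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._stack:
                # A table inside a table is a common sign of layout markup.
                self._stack[-1]["nested"] += 1
            self._stack.append({"rows": 0, "cells": 0, "headers": 0, "nested": 0})
        elif self._stack:
            current = self._stack[-1]
            if tag == "tr":
                current["rows"] += 1
            elif tag == "td":
                current["cells"] += 1
            elif tag == "th":
                current["cells"] += 1
                current["headers"] += 1

    def handle_endtag(self, tag):
        if tag == "table" and self._stack:
            self.tables.append(self._stack.pop())


def looks_like_data_table(stats, min_rows=2, min_cols=2):
    """Hypothetical heuristic: enough rows and columns, and no nested tables."""
    if stats["rows"] < min_rows or stats["nested"] > 0:
        return False
    avg_cols = stats["cells"] / stats["rows"]
    return avg_cols >= min_cols


def data_tables(html):
    """Return the stats of every table in `html` that passes the heuristic."""
    parser = TableStats()
    parser.feed(html)
    parser.close()
    return [t for t in parser.tables if looks_like_data_table(t)]


sample = (
    "<table><tr><td>menu</td></tr></table>"
    "<table><tr><th>City</th><th>Pop</th></tr>"
    "<tr><td>A</td><td>1</td></tr><tr><td>B</td><td>2</td></tr></table>"
)
print(len(data_tables(sample)))  # prints 1
```

A filter like this only weeds out the most obvious layout tables. The second challenge, recovering what a table's columns actually mean, is not something a few counts can address.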

