Wed 10 Jan 2018 10:55 - 11:20 at Bunker Hill - Strings Chair(s): Zachary Tatlock

Data integration between web sources and relational data is a key challenge faced by data scientists and spreadsheet users. There are two main challenges in programmatically joining web data with relational data. First, most websites do not expose a direct interface for obtaining tabular data, so the user needs to formulate logic that reaches a different webpage for each input row in the relational table. Second, after reaching the desired webpage, the user needs to write complex scripts to extract the relevant data, which is often conditioned on the input data. Since many data scientists and end-users come from diverse backgrounds, writing such complex regular-expression-based scripts to perform data integration tasks is often beyond their programming expertise.

We present WebRelate, a system that allows users to join semi-structured web data with relational data in spreadsheets using input-output examples. WebRelate decomposes the web data integration task into two sub-tasks. The first sub-task generates the URLs for the webpages containing the desired data for all rows in the relational table. WebRelate achieves this by learning a string transformation program from a few example URLs. The second sub-task uses examples of the desired data to be extracted from the corresponding webpages and learns a program to extract the data for the other rows. We design expressive domain-specific languages (DSLs) for URL generation and web data extraction, and present efficient synthesis algorithms for learning programs in these DSLs from a few input-output examples. We evaluate WebRelate on 88 real-world data integration tasks taken from online help forums and other anonymous sources, and show that WebRelate can learn the programs to perform the desired web data integration tasks within a few seconds using only one example for the majority of the tasks.
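To make the first sub-task concrete, the following is a minimal sketch of learning a URL-generating string program from a single example. It is an illustration only and not WebRelate's DSL or synthesis algorithm: the `TRANSFORMS` table, the greedy matching strategy, the function names, the column names, and the `example.com` URL are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
# A program part is ("const", text) or ("col", column_name, transform_name).
Part = Tuple[str, ...]

# Hypothetical cell transformations a learned program may apply.
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "identity": lambda s: s,
    "lower": str.lower,
    "dash": lambda s: s.replace(" ", "-"),
    "lower_dash": lambda s: s.lower().replace(" ", "-"),
}


def learn_url_program(row: Row, url: str) -> List[Part]:
    """Greedily explain `url` as constants plus transformed cells of `row`."""
    program: List[Part] = []
    const = ""
    i = 0
    while i < len(url):
        best = None  # (match length, column, transform name)
        for col, cell in row.items():
            for name, fn in TRANSFORMS.items():
                value = fn(cell)
                if value and url.startswith(value, i):
                    if best is None or len(value) > best[0]:
                        best = (len(value), col, name)
        if best is None:
            # No cell explains this position: treat the character as constant.
            const += url[i]
            i += 1
        else:
            if const:
                program.append(("const", const))
                const = ""
            program.append(("col", best[1], best[2]))
            i += best[0]
    if const:
        program.append(("const", const))
    return program


def run_url_program(program: List[Part], row: Row) -> str:
    """Apply a learned program to another row of the table."""
    pieces = []
    for part in program:
        if part[0] == "const":
            pieces.append(part[1])
        else:
            _, col, name = part
            pieces.append(TRANSFORMS[name](row[col]))
    return "".join(pieces)


rows = [
    {"city": "New York", "year": "2017"},
    {"city": "San Diego", "year": "2016"},
]
# One example URL for the first row (hypothetical site).
program = learn_url_program(rows[0], "https://example.com/weather/new-york/2017")
print(run_url_program(program, rows[1]))
# prints "https://example.com/weather/san-diego/2016"
```

A single example can be ambiguous in this sketch. A one-word city such as "Boston" matches both `lower` and `lower_dash`, so a realistic synthesizer would need additional examples or a ranking over candidate programs to choose between them.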

Conference Day
Wed 10 Jan

Displayed time zone: Tijuana, Baja California

10:30 - 12:10
Strings (Research Papers) at Bunker Hill
Chair(s): Zachary Tatlock (University of Washington, Seattle)
10:30
25m
Talk
Synthesizing Bijective Lenses
Research Papers
Anders Miltner (Princeton University), Kathleen Fisher (Tufts University), Benjamin C. Pierce (University of Pennsylvania), David Walker (Princeton University), Steve Zdancewic (University of Pennsylvania)
10:55
25m
Talk
WebRelate: Integrating Web Data with Spreadsheets using Examples
Research Papers
Jeevana Priya Inala (MIT), Rishabh Singh (Microsoft Research)
11:20
25m
Talk
What's Decidable About String Constraints with ReplaceAll Function?
Research Papers
Taolue Chen (Birkbeck, University of London), Yan Chen (State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences & University of Chinese Academy of Sciences), Matthew Hague (Royal Holloway, University of London), Anthony Widjaja Lin (Oxford University), Zhilin Wu (State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences)
11:45
25m
Talk
String Constraints with Concatenation and Transducers Solved Efficiently
Research Papers
Lukáš Holík (Brno University of Technology), Anthony Widjaja Lin (Oxford University), Petr Janků (Brno University of Technology), Philipp Ruemmer (Uppsala University), Tomáš Vojnar (Brno University of Technology)