The projects below reflect my main research interests, but I enjoy working on other topics too. See my Google Scholar profile for my full publication record.
Dataset Search [WIP, 2024]
1) Survey results surfacing why, what, and how people search for data, key open challenges, and system desiderata.
2) System (tbc).
1) paper survey
GitSchemas [DBML@ICDE 2022, SIGMOD 2024]
A dataset of approximately 50K real-world database schemas extracted from SQL files from GitHub.
paper | code/dataset
Observatory [PVLDB, NeurIPS, 2023]
1) Framework for analyzing table embeddings based on the relational model, and desiderata for TRL models.
2) Library for extracting table embeddings at the row, column, and cell level.
1) analysis paper | 2) library paper | code
GitTables [SIGMOD, 2023]
Corpus of 1.7M relational tables extracted from GitHub CSVs. Columns annotated w/ semantic types.
paper | website | dataset | code | video presentation | slides | podcast
AdaTyper [CIDR, 2022]
Adaptive semantic column type detection system focusing on productization in industry contexts.
paper | video presentation
Sherlock [KDD, 2019]
DL method for semantic data type detection of table columns (among the top 5 MIT Media Lab repos as of 2 Aug 2023).
paper | website | code
VizNet [CHI, 2019]
Corpus of over 31 million datasets from open data repositories, for benchmarking visualization studies.
paper | website