Hi, I am Madelon Hulsebos, a PhD student at the UvA and PhD Student Researcher at Sigma.
I am interested in Intelligent Data Systems. So far, my research has been on learned table representations and their applications like data preparation, search, and analysis.

At the MIT Media Lab, I got the opportunity to develop Sherlock, a deep learning method for detecting table semantics at scale, enabling applications like data validation. The massive interest from industry in Sherlock inspired me to start a PhD in 2020 at the INDE Lab (UvA), focused on improving table models and their applicability in practice. One piece of this puzzle is GitTables: a continuously growing dataset of 1.7M tables extracted from CSV files on GitHub and enriched with table semantics.

As part of the research community, I support JSys as Assistant Editor, co-organize the SemTab '21/'22 challenge, and have reviewed for various tracks at, e.g., NeurIPS '21, WWW '22, AIDB@VLDB '22, and EDBT '22. Outside academia, I am a member of the supervisory board of a student consulting firm and worked as a data scientist for 2+ years on automating ML-driven analyses. You can read more in my resume.

Selected projects

The projects below are close to my main research interest, but I enjoy working on other topics too. Check my profile on Google Scholar for my full publication record.

GitTables [Hulsebos et al. (to appear), SIGMOD, 2023]
Corpus of 1.7M relational tables extracted from GitHub CSVs. Columns annotated with semantic types.
paper | website | dataset | code | video presentation | slides

GitSchemas [Döhmen et al., DBML@ICDE, 2022]
A dataset of approximately 50K real-world database schemas extracted from SQL files from GitHub.
paper | code/dataset

AdaTyper [Hulsebos et al. (abstract), CIDR, 2022]
Adaptive semantic column type detection system focusing on productization in industry contexts.
paper | video presentation

Sato [Zhang et al., PVLDB, 2020]
Method for semantic data type detection that takes column context into account, extends Sherlock.
paper | code

Sherlock [Hulsebos et al., KDD, 2019]
DL method for semantic data type detection of table columns (top-10 MIT Media Lab repos, 3/10/22).
paper | website | code

VizNet [Hu et al., CHI, 2019]
Corpus of over 31 million datasets from open data repositories, for benchmarking visualization studies.
paper | website

