👋

Hi, I am Madelon Hulsebos, a PhD student at the UvA and PhD Student Researcher at Sigma.
I am interested in Intelligent Data Systems. So far, my research has been on learned table representations and their applications like data preparation, search, and analysis.

At the MIT Media Lab, I got the opportunity to develop Sherlock, a deep learning method for detecting table semantics at scale, enabling applications like data validation. The massive interest from industry in Sherlock inspired me to start a PhD (2020) at the INDE Lab (UvA) to focus on improving table models and their applicability in practice. A piece of this puzzle is GitTables: a dataset of 1.7M tables (and continuously growing) extracted from CSV files on GitHub, enriched with table semantics.

As part of the research community, I support JSys as Assistant Editor, co-organize the SemTab '21/'22 challenge, and have reviewed for various tracks at, e.g., NeurIPS '21, WWW '22, AIDB@VLDB '22, and EDBT '22. Besides academia, I am a member of the supervisory board of a student consulting firm and was a data scientist for 2+ years, working on automating ML-driven analyses. You can read more in my resume.

Feel welcome to reach out (click on any channel below)!

Selected projects

The projects below are close to my main research interest. But I enjoy working on other topics too. Check my profile on Google Scholar for my full publication record.

GitSchemas [Döhmen et al., DBML@ICDE, 2022]
A dataset of approximately 50K real-world database schemas extracted from SQL files from GitHub.
paper | code/dataset

GitTables [Hulsebos et al., 2021]
Corpus of 1.7M relational tables extracted from GitHub CSVs. Columns annotated w/ semantic types.
paper | website | dataset | code | video presentation | slides

AdaTyper [Hulsebos et al., CIDR (abstract), 2022]
Adaptive semantic column type detection system focusing on productization in industry contexts.
paper | video presentation

Sato [Zhang et al., PVLDB, 2020]
Method for semantic data type detection that takes column context into account, extends Sherlock.
paper | code

Sherlock [Hulsebos et al., KDD, 2019]
DL method for semantic data type detection of table columns (top-10 MIT Media Lab repos, 3/10/22).
paper | website | code

VizNet [Hu et al., CHI, 2019]
Corpus of over 31 million datasets from open data repositories, for benchmarking visualization studies.
paper | website

Recent news

  • Co-organizing Table Representation Learning workshop @ NeurIPS 2022.

    Jul 14, 2022

    I’m co-organizing a workshop on Table Representation Learning (TRL) at NeurIPS 2022. Read more here, and I hope to see you in New Orleans very soon!

  • Gave a talk at the KomPAKI seminar at TU Darmstadt.

    Jun 03, 2022

    I gave a talk about “Large Table Models for enterprise data management” at the KomPAKI seminar at TU Darmstadt, which brings together AI and DB researchers. The slides can be found here.

  • Supporting JSys as Assistant Editor.

    Apr 28, 2022

    I am excited to support the Journal of Systems Research (JSys) as an Assistant Editor. The strong principles around open science, transparency, and rigor motivated me to apply for this position.

  • Joined Sigma Computing (again) as a PhD student researcher.

    Apr 16, 2022

    I am thrilled to join Sigma Computing again to continue my work on table representation models, making them work in practice, and deploying them in various applications.

  • Released improved code of Sherlock.

    Feb 10, 2022

    We released an improved version of the Sherlock codebase; check it out on GitHub. Many of these improvements were contributed by Chris Lowe. Thanks, Chris!