Hi, I am Madelon Hulsebos, a PhD student at the UvA and PhD Student Researcher at Sigma.
I am interested in Intelligent Data Systems. So far, my research has been on learned table representations and their applications like data preparation, search, and analysis.
At the MIT Media Lab, I got the opportunity to develop Sherlock, a deep learning method for detecting table semantics at scale, enabling applications like data validation. The massive interest from industry in Sherlock inspired me to start a PhD (2020) at the INDE Lab (UvA) to focus on improving table models and their applicability in practice. A piece of this puzzle is GitTables: a dataset of 1.7M tables (and continuously growing) extracted from CSV files on GitHub, enriched with table semantics.
As part of the research community, I support JSys as Assistant Editor, co-organize the SemTab '21/'22 challenge, and have reviewed for various tracks at e.g. NeurIPS '21, WWW '22, AIDB@VLDB '22, and EDBT '22. Besides academia, I am a member of the supervisory board of a student consulting firm and was a data scientist for 2+ years, working on automating ML-driven analyses. You can read more in my resume.
Feel welcome to reach out (click on any channel below)!
Selected projects
The projects below are close to my main research interest. But I enjoy working on other topics too. Check my profile on Google Scholar for my full publication record.
A dataset of approximately 50K real-world database schemas extracted from SQL files from GitHub.
paper | code/dataset
Corpus of 1.7M relational tables extracted from GitHub CSVs. Columns annotated w/ semantic types.
paper | website | dataset | code | video presentation | slides
Adaptive semantic column type detection system focusing on productization in industry contexts.
paper | video presentation
Method for semantic data type detection that takes column context into account, extends Sherlock.
paper | code
DL method for semantic data type detection of table columns (top-10 MIT Media Lab repos, 3/10/22).
paper | website | code
Corpus of over 31 million datasets from open data repositories, for benchmarking visualization studies.
paper | website
Recent news
Jul 14, 2022
I’m co-organizing a workshop on Table Representation Learning (TRL) at NeurIPS 2022. Read more here, and I hope to see you in New Orleans very soon!
Jun 03, 2022
I gave a talk about “Large Table Models for enterprise data management” at the KomPAKI seminar at TU Darmstadt, which brings together AI and DB researchers. The slides can be found here.
Apr 28, 2022
I am excited to support the Journal of Systems Research (JSys) as an Assistant Editor. The strong principles around open science, transparency, and rigor motivated me to apply for this position.
Apr 16, 2022
I am thrilled to join Sigma Computing again to continue my work on table representation models, making them work in practice, and deploying them in various applications.
Feb 10, 2022
We released an improved version of the Sherlock codebase. Check it out on GitHub. Many of these improvements were contributed by Chris Lowe, thanks Chris!