Our paper on data privacy was accepted at the DBML workshop

Will Sharing Metadata Leak Privacy?

By Danning Zhan and Rihan Hai

Abstract—In the dynamic field of data management and machine learning, achieving a balance between effective data use and privacy preservation is increasingly crucial. Federated learning exemplifies this challenge by training machine learning models on data distributed across isolated silos while adhering to privacy regulations such as GDPR. A key aspect of this process involves sharing metadata, such as feature names, that is essential for model accuracy. Yet, the privacy implications of this metadata exchange have been largely unexplored.

This paper examines the potential privacy risks of communicating detailed metadata in federated learning frameworks. While metadata is critical for enhancing data utility and supporting advanced analytics, we address the paradox that it might inadvertently lead to privacy violations. We focus on functional dependencies (FDs) and relaxed functional dependencies (RFDs), which are crucial metadata types in database design and data quality. We aim to define data privacy formally and investigate how sharing these dependencies affects privacy preservation, using probabilistic methods and analytical discussions to understand their impact.
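To make the paradox concrete, the following sketch (not taken from the paper; all table contents, attribute names, and the ZIP → City dependency are hypothetical) shows how publishing even a single functional dependency as metadata can let an outside party infer a hidden attribute value from an observed one:

```python
# Illustrative sketch: how a shared functional dependency (FD) can
# narrow an attacker's uncertainty about hidden values. The table,
# attribute names, and FD below are hypothetical examples.

# A private table held by one silo, in which the FD zip -> city holds.
private_rows = [
    {"zip": "2628", "city": "Delft"},
    {"zip": "2628", "city": "Delft"},
    {"zip": "1012", "city": "Amsterdam"},
]

def fd_holds(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs holds in the given rows."""
    mapping = {}
    for row in rows:
        if row[lhs] in mapping and mapping[row[lhs]] != row[rhs]:
            return False
        mapping[row[lhs]] = row[rhs]
    return True

# Metadata shared during federated training: "zip -> city holds".
assert fd_holds(private_rows, "zip", "city")

# An attacker who knows the FD and has observed one (zip, city) pair
# (say, from a public record) can rule out every other city for that
# zip: the FD collapses their uncertainty to a single value.
observed = {"2628": "Delft"}
candidate_cities = {"Delft", "Amsterdam", "Rotterdam"}
consistent = {c for c in candidate_cities
              if observed.get("2628", c) == c}
print(consistent)  # only one city remains consistent with the FD
```

The same reasoning extends to relaxed functional dependencies, where the dependency holds approximately; there the attacker obtains a probability distribution over values rather than a single certainty, which is one way the probabilistic analysis mentioned above can be framed.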

Rihan Hai
Assistant professor

My research focuses on data integration and related dataset discovery in large-scale data lakes.