Data Integration in Data Lakes

Name: Data Integration in Data Lakes
Start: 2022-06-27T14:53:33Z
End: 2022-06-28T22:00:00Z
Location: Basel, Switzerland (HYBRID EVENT)

Abstract

Although big data has been discussed for some years, it still poses many research challenges, such as the variety of data. Diverse data sources often sit in information silos, which are collections of non-integrated data management systems with heterogeneous schemas, query languages, and data models. This makes it very difficult to efficiently integrate, access, and query the large volume of diverse data in these silos using traditional ‘schema-on-write’ approaches such as data warehouses. Data lake systems, which are repositories that store raw data in its original formats and provide a common access interface, have been proposed as a solution to this problem. In this talk, I will discuss the landscape of existing data lake problems and our solutions for integrating multiple heterogeneous data sources in data lakes. I will also introduce recent advances in supporting AI in data lakes.

Event

PASC22 Conference Minisymposium “Leveraging Data Lakes to Manage and Process Scientific Data”

Slides: https://drive.google.com/file/d/18yJPuC1_8wJFC8emXFfGGcOLQjGDqm9p/view?usp=sharing

Rihan Hai

Assistant professor