Although big data has been discussed for some years, it still poses many research challenges, such as the variety of data. Diverse data sources often sit in information silos, which are collections of non-integrated data management systems with heterogeneous schemas, query languages, and data models. Efficiently integrating, accessing, and querying the large volume of diverse data in these silos is very difficult with traditional ‘schema-on-write’ approaches such as data warehouses. Data lake systems, which are repositories that store raw data in its original formats and provide a common access interface, have been proposed as a solution to this problem. In this talk, I will discuss the landscape of existing data lake problems and our solutions for integrating multiple heterogeneous data sources in data lakes. I will also introduce recent advances in supporting AI in data lakes.