Session

Minisymposium: MS2A - Leveraging Data Lakes to Manage and Process Scientific Data
Event Type: Minisymposium
Domains
Computer Science and Applied Mathematics
Time: Monday, June 27, 16:00 - 18:00 CEST
Location: Osaka Room
Description: In recent years, data lakes have become increasingly popular as central storage, particularly for unstructured data. Generally, data lakes aim to integrate heterogeneous data from diverse sources into a unified information management system in which data is retained in its original format. Storing data in raw format, as opposed to imposing a schema on write as is commonly done in a data warehouse, supports the reuse and sharing of already collected data. The idea is to dump the data into the lake and later fish for knowledge using sophisticated analysis tools. This approach is challenging, however, because it must be ensured that all data, regardless of the number or size of the data sets, can be found and accessed later on. In addition, especially for domain researchers in public research institutions, a research data management solution should not only ensure the preservation of the data but also support and guide scientists in complying with good scientific practice from the very beginning. To discuss current challenges and possible solutions, and to share personal insights into data lakes, we bring together different experts to discuss the potential of data lakes and the technical approaches to them with the scientific community.