Resources
Overarching Data Management Ecosystem HELIPORT
26th April 2024
Oliver Knodel
Physical Sciences in NFDI Workshop: FAIR Data Principles in Physical Sciences in NFDI
Facilitating Research Data Management with HELIPORT
11th April 2024
Oliver Knodel, Stefan E. Müller, David Pape
28th HiRSE Seminar
Abstract
Researchers rely on a variety of systems and tools when it comes to administering their research data. Processes involving research data management include proposal submission, data management planning, simulation campaigns, documentation during the experiment, and the creation and submission of journal and data publications. HELIPORT is a data management solution that aims at making all steps of the research experiment’s life cycle discoverable, accessible, interoperable and reusable according to the FAIR principles. This is done by linking to and interfacing with established tools and solutions, and exchanging metadata between systems involved in a project. The metadata are presented to the researchers through a web interface, but they are also accessible to computational agents via API and machine-readable landing pages. In this presentation, we will introduce the metadata project HELIPORT and what provided the impulse for the project, discuss the documentation of a real experiment in HELIPORT, and outline current developments and challenges.
Documenting ML Experiments in HELIPORT
March 2024
David Pape, Oliver Knodel, Sebastian Starke
deRSE24 - Conference for Research Software Engineering in Germany
Abstract
HELIPORT is a data management guidance system that aims at making the components and steps of the entire research experiment's life cycle findable, accessible, interoperable and reusable according to the FAIR principles. It integrates documentation, computational workflows, data sets, the final publication of the research results, and many more resources. This is achieved by gathering metadata from established tools and platforms and passing along relevant information to the next step in the experiment's life cycle. HELIPORT's high-level overview of the project allows researchers to keep all aspects of their experiment in mind.
Machine learning projects are a particularly interesting use case. They are often prototypical in nature and driven by iterative development, so reproducibility and transparency are a great concern. It is essential to keep track of the relationship between input data, choices in model parameters, the code version in use, and performance measures and generated outputs at all times. This requires a data management platform that automatically records the changes made and their effects. Existing MLOps tools (such as Weights and Biases, MLflow) live entirely in the ML domain and start their workflow with the assumption that data is available. HELIPORT, on the other hand, takes care of the data lifecycle as well. Our envisioned platform interoperates with the domain-specific tools already used by the scientists, and is able to extract relevant metadata (e.g. provenance). It can also make persistent any additional information such as papers the work was based on, documentation of software components, workflows, or failure cases. Moreover, it should be possible to publish these metadata in machine-readable formats.
The challenge arising from these aspects consists in integrating ML workflows into HELIPORT in such a way that they work on the provided data and metadata. The goal is also to enable the comprehensible development of ML models alongside the experiment documented in HELIPORT. This allows different teams (e.g. experimentalists and AI specialists) to work together on the same project in a seamless manner, and helps generate FAIRer outcomes. In the long term we hope to aid in establishing digital twins of facilities, and in making their maintenance a part of the data management process.
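The relationship the abstract asks to keep track of (input data, model parameters, code version, performance measures) can be illustrated with a minimal sketch. This is not HELIPORT's data model or API; the names `RunRecord`, `fingerprint` and `record_run` are hypothetical and chosen only for illustration, and the record is serialized to JSON as one possible machine-readable format.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class RunRecord:
    """Hypothetical record tying one ML run to what is needed to reproduce it."""
    input_sha256: str   # fingerprint of the input data
    code_version: str   # e.g. a Git commit hash
    parameters: dict    # model and training parameters
    metrics: dict       # performance measures produced by the run

    def to_json(self) -> str:
        # Sorted keys give a stable, machine-readable serialization.
        return json.dumps(asdict(self), sort_keys=True)


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw input data."""
    return hashlib.sha256(data).hexdigest()


def record_run(data: bytes, code_version: str,
               parameters: dict, metrics: dict) -> RunRecord:
    """Build a record for one run; copies the dicts so later edits do not leak in."""
    return RunRecord(fingerprint(data), code_version,
                     dict(parameters), dict(metrics))


if __name__ == "__main__":
    run = record_run(b"example input", "abc1234",
                     {"learning_rate": 0.01}, {"accuracy": 0.9})
    print(run.to_json())
```

Because the fingerprint changes whenever the input data changes, two records with the same parameters and code version but different data remain distinguishable.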
HELIPORT: An overarching Data Management System at HZDR
March 2024
Stefan E. Müller, Thomas Gruber, Oliver Knodel, Jeffrey Kelling, Mani Lokamani, David Pape, Martin Voigt, Guido Juckeland
DPG-Frühjahrstagung 2024
Abstract
Researchers at the Helmholtz-Zentrum Dresden-Rossendorf rely on a variety of systems and tools when it comes to administering their research data. Processes involving research data management include the project planning phase (proposal submission to the beamtime proposal management system, the creation of data management plans and data policies), the documentation during the experiment or simulation campaign (electronic laboratory notebooks, wiki pages), backup and archival systems, and the final journal and data publications (collaborative authoring tools, metadata catalogs, software and data repositories, publication systems). In addition, modern research projects are often required to interact with a variety of software stacks and workflow management systems to allow reproducibility on the underlying IT infrastructure. The "HELmholtz ScIentific Project WORkflow PlaTform" (HELIPORT), which is currently being developed by researchers at HZDR and their collaborators, tries to facilitate the management of research data and metadata by providing an overarching guidance system which combines all the information by interfacing with the underlying processes, and even includes a workflow engine which can be used to automate processes like data analysis or data retrieval.
Pioneering Digital Research Landscapes: Innovations at HZDR
February 2024
Oliver Knodel
Helmholtz Open Science Forum: Towards Open Digital Research Ecosystems – Interconnecting Infrastructures
Abstract
Digital infrastructures have become indispensable in the field of modern research and science. These technological frameworks play a crucial role for the entire research cycle, supporting literature searches, aiding in data collection and analysis, facilitating the creation and publication of scholarly works, and ensuring the thorough documentation and long-term storage of research findings. Additionally, these infrastructures serve as a vital means for networking and communication among peers, creating the essential foundation of an open and transparent science and research ecosystem. In this lecture, the entire digital research landscape at the HZDR will be presented and illustrated using a representative experiment.
Open Research Project Guidance System: HELIPORT
February 2024
Tobias Huste, Oliver Knodel, Thomas Gruber, Jeffrey Kelling, Mani Lokamani, Stefan E. Müller, David Pape, Martin Voigt, Guido Juckeland, Malte C. Kaluza, Joachim Hein, Alexander Kessler, Chien-Li Lee and Bernd Schuller
Helmholtz Open Science Forum: Research Software
Abstract
This presentation outlines HELIPORT with a focus on HELIPORT itself as a research software product. The project was initially funded by the Helmholtz Metadata Collaboration (HMC).
Overarching Data Management Ecosystem at HZDR
September 2023
Oliver Knodel, Thomas Gruber, Jeffrey Kelling, Mani Lokamani, Stefan E. Müller, David Pape, Martin Voigt and Guido Juckeland
Vol. 1 (2023): 1st Conference on Research Data Infrastructure (CoRDI) - Connecting Communities
Abstract
When dealing with research data management, researchers at Helmholtz-Zentrum Dresden-Rossendorf (HZDR) face a variety of systems and tools. These range from the project planning phase (proposal management, data management plans and policies), over documentation during the experiment or simulation campaign, to the publication (collaborative authoring tools, metadata catalogs, publication systems, data repositories). In addition, modern research projects are usually required to interact with a variety of software stacks and workflow management systems to allow comprehensible and FAIR science on the underlying IT infrastructure (HPC, data storage, network file systems, archival). This article first demonstrates the data management systems and services provided at HZDR, followed by an overview of a self-developed guidance system. It concludes with a real-world example.
First HELIPORT Workshop: Book of Abstracts
05. - 06. October 2022
First HELIPORT Community Workshop 2023
Alexander Kessler, Alexey Ponomaryov, Andrew K. Mistry, Anton Barty, Arie Irman, Astrid Schneidewind, Bernd Schuller, Boxing Gou, Brian Edward Marre, Bridget Murphy, Carina Becker, Carolin Hundt, Chien-Li Lee, Christian Gutt, Christiane Schneide, Claudia Engelhardt, David Pape, Florian Rau, Frank Maas, Frank Schreiber, Friedrich Bethke, Gerrit Guenther, Guido Juckeland, Gunnar Pruß, Hans-Peter Schlenvoigt, Jan-Christoph Deinert, Jan-Dierk Grunwaldt, Jeffrey Kelling, Joachim Hein, Johannes Sperling, Kilian Schwarz, Kristin Elizabeth Tippey, Leon Steinmeier, Lisa Amelung, Malte Christoph Kaluza, Mani Lokamani, Marc Hanisch, Martin Voigt, Michael Bussmann, Moritz Kurzweil, Nico Hoffmann, Nicole Wagner, Oliver Knodel, Oonagh Mannix, Patrick Ufer, Peter Baumgärtel, Ralph Müller-Pfefferkorn, Sebastian Baunack, Sebastian Busch, Sebastian Sachse, Sebastian Starke, Sergey Kovalev, Simone Vadilonga, Stefan Bock, Stefan Mueller, Susanne Schoebel, Thomas Gruber, Thomas Kluge, Tobias Unruh, Wiebke Lohstroh, Wolfgang Horn
Abstract
In our HELIPORT workshop, we will provide insights into our project and share our results. In addition, we would like to provide a platform for the presentation of similar projects, as well as extensions or integrations from the surrounding research areas. The overall goal of the workshop is bringing together different institutions with similar challenges and establishing a community around our HELIPORT project. We welcome submissions on related projects, metadata in our scientific field in general or workflows, in the form of talks or posters. We also welcome first or future HELIPORT use-cases from within our community!
Project HELIPORT: The Integrated Research Data Lifecycle of the HELIPORT Project
05. - 06. October 2022
Helmholtz Metadata Collaboration | Conference 2022
Oliver Knodel, Martin Voigt, Robert Ufer, David Pape, Mani Lokamani, Jeffrey Kelling, Stefan E. Müller, Thomas Gruber, Guido Juckeland, Malte C. Kaluza, Joachim Hein, Alexander Kessler, Chien-Li Lee and Bernd Schuller
Abstract
The HELIPORT project aims to make the components or steps of the entire life cycle of a research project at the Helmholtz-Zentrum Dresden-Rossendorf (HZDR) and the Helmholtz-Institute Jena (HIJ) discoverable, accessible, interoperable and reusable according to the FAIR principles. In particular, this data management solution deals with the entire lifecycle of research experiments, from the generation of the first digital objects through the workflows carried out to the actual publication of research results. For this purpose, a concept was developed that identifies the different systems involved and their connections. By integrating computational workflows (CWL and others), HELIPORT can automate calculations that work with metadata from different internal systems (application management, Labbook, GitLab, and others). This presentation will cover the first year of the project, the current status and the path taken so far in the life cycle of the project.
Integrated Data Workflow using HELIPORT at TELBE
05. - 06. October 2022
Helmholtz Metadata Collaboration | Conference 2022
Lokamani, David Pape, Thomas Gruber, Jan-Christoph Deinert, Martin Voigt, Oliver Knodel, Jeffrey Kelling, Stefan E. Müller and Guido Juckeland
Abstract
At the High-Field High-Repetition-Rate Terahertz facility @ELBE (TELBE), ultrafast terahertz-induced dynamics can be probed in various states of matter with the highest precision. The TELBE sources offer both stable, tunable narrowband THz radiation with pulse energies of several microjoules at high repetition rates and a synchronized coherent diffraction radiator that provides broadband single-cycle pulses. The measurements at TELBE are data-intensive: a single experiment can produce as much as 20 GB of data and last up to several minutes. As a result, the current data acquisition and data analysis stages are decoupled: in a first step the primary data is processed and stored at HZDR, and in a later step restricted data access is made available to the user for post-processing. In this poster contribution, we present an integrated workflow for post-processing of the experimental data at TELBE with built-in exchange of metadata between the experiment control software LabVIEW and the workflow execution engine UNICORE. We also present the guidance system HELIPORT [3], which manages the metadata of the associated project proposal and job information from UNICORE, and integrates with the electronic lab notebook (MediaWiki), providing a user-friendly interface for monitoring the actively running experiments at TELBE.
HELIPORT — An Integrated Research Data Lifecycle
05. - 06. October 2022
Helmholtz Metadata Collaboration | Conference 2022
26. September 2022
8th Annual "Matter and Technologies" Meeting
22. September 2022
3. Sächsische FDM-Tagung - Forschungsdatenmanagement im Spannungsfeld zwischen Idealen, Anforderungen und Praxis
05. - 07. September 2022
German Conference for Research with Synchrotron Radiation, Neutrons and Ion Beams at Large Facilities
Oliver Knodel, David Pape, Martin Voigt, Thomas Gruber, Jeffrey Kelling, Mani Lokamani, Stefan E. Müller, Guido Juckeland, Alexander Kessler, Joachim Hein, Malte C. Kaluza and Bernd Schuller
Abstract
HELIPORT is a data management solution that aims at making the components and steps of the entire research experiment’s life cycle discoverable, accessible, interoperable and reusable according to the FAIR principles. Among other information, HELIPORT integrates documentation, scientific workflows, and the final publication of the research results - all via already established solutions for proposal management, electronic lab notebooks, software development and devops tools, and other additional data sources. The integration is accomplished by presenting the researchers with a high-level overview to keep all aspects of the experiment in mind, and automatically exchanging relevant metadata between the experiment’s life cycle steps. Computational agents can interact with HELIPORT via a REST API that allows access to all components, and landing pages that allow for export of digital objects in various standardized formats and schemas. An overall digital object graph combining the metadata harvested from all sources provides scientists with a visual representation of interactions and relations between their digital objects, as well as their existence in the first place. Through the integrated computational workflow systems, HELIPORT can automate calculations using the collected metadata. By visualizing all aspects of large-scale research experiments, HELIPORT enables deeper insights into a comprehensible data provenance with the chance of raising awareness for data management.
A FAIRly Integrated Scientific Project Lifecycle
15. July 2022
Oliver Knodel, Martin Voigt, Robert Ufer, David Pape, Mani Lokamani, Jeffrey Kelling, Stefan E. Müller, Thomas Gruber, Guido Juckeland, Malte C. Kaluza, Joachim Hein, Alexander Kessler and Bernd Schuller
HMC Dialogue
Abstract
The talk introduces the general idea behind the HELIPORT project, which aims to make the entire life cycle of a scientific experiment or project discoverable, accessible, interoperable and reusable by providing an overview from a top-level perspective. Specifically, our data management solution addresses the areas from data generation to publication of primary research data, computing workflows performed and the actual research results.
HELIPORT - An Integrated Research Data Lifecycle
5. May 2022
Oliver Knodel, Martin Voigt, Robert Ufer, David Pape, Mani Lokamani, Jeffrey Kelling, Stefan E. Müller, Thomas Gruber, Guido Juckeland, Malte C. Kaluza, Joachim Hein, Alexander Kessler and Bernd Schuller
ZIH colloquium at TU Dresden
Abstract
The guidance system HELIPORT aims to make the components or steps of the entire life cycle of a research project at Helmholtz-Zentrum Dresden-Rossendorf (HZDR) discoverable, accessible, interoperable and reusable according to the FAIR principles. In particular, this data management solution deals with the entire lifecycle of research experiments, from the generation of the first digital objects through the workflows carried out to the actual publication of research results. For this purpose, a concept was developed that identifies the different systems involved and their connections. By integrating computational workflows (CWL and others), HELIPORT can automate calculations that work with metadata from different internal systems (application management, Labbook, GitLab, and others). In this lecture, the overall system will be presented using a practical example.
Presentation for ELBE Beamline Scientists at HZDR
4. February 2022
Oliver Knodel, Stefan E. Müller
Abstract
The presentation gives an overview of the HELIPORT project. It also gives insight into our motivation for developing the guidance system now known as HELIPORT.
Project Poster
24. January 2022 (updated)
Oliver Knodel, Martin Voigt, Robert Ufer, David Pape, Mani Lokamani, Jeffrey Kelling, Stefan E. Müller, Thomas Gruber, Guido Juckeland, Malte C. Kaluza, Joachim Hein, Alexander Kessler and Bernd Schuller
Abstract
The HELIPORT poster provides a short overview of the project and introduces the usage of Handles, the workflow integration, the top-level project plan and the project metadata schema.
Full Integrated Research Data Lifecycle – The Project HELIPORT
16. December 2021
Oliver Knodel
SaxFDM Digital Kitchen
Abstract
Scientific experiments use a wide range of software tools across the phases of a project, from proposal submission through data acquisition to the final publication. A major challenge for research institutions is to provide scientists with additional metadata, in line with the FAIR principles, for documenting the tools used in every phase of the research project. The goal of the HELmholtz ScIentific Project WORkflow PlaTform (HELIPORT) is therefore to register the complete life cycle of a scientific project and to link the associated programs and systems with one another. The machine-readable documentation of all work steps carried out in a research project, together with the associated metadata, makes every step transparent, comprehensible and citable, and thus contributes to good scientific practice. In this presentation, Dr. Oliver Knodel of HZDR introduces the HMC-funded project HELIPORT (2021-2023) and places it within the data management structure of HZDR.
HELIPORT use case POLARIS: Integration of a High Intensity Laser in a complete data life cycle workflow
28. October 2021
Oliver Knodel, Joachim Hein, Alexander Kessler
Laserlab-Europe, ELI and CASUS Workshop "Better Data for Better Science - Research Data Management Workshop"
Abstract
The presentation outlines the POLARIS experiment at Helmholtz Institute Jena, including experimental chain, setup and first ideas regarding the description of the POLARIS experiment with HELIPORT.
HELIPORT (HELmholtz ScIentific Project WORkflow PlaTform)
28. October 2021
Oliver Knodel, Martin Voigt, Robert Ufer, David Pape, Mani Lokamani, Jeffrey Kelling, Stefan E. Müller, Thomas Gruber, Guido Juckeland, Malte C. Kaluza, Joachim Hein, Alexander Kessler and Bernd Schuller
Laserlab-Europe, ELI and CASUS Workshop "Better Data for Better Science - Research Data Management Workshop"
Abstract
The presentation outlines the HELIPORT project. The HELIPORT project aims at developing a platform which accommodates the complete life cycle of a scientific project and links all corresponding programs, systems and workflows to create a more FAIR and comprehensible project description.
HELIPORT: A Portable Platform for {FAIR Workflow | Metadata | Scientific Project Lifecycle} Management and Everything
June 2021
Oliver Knodel, Martin Voigt, Robert Ufer, David Pape, Mani Lokamani, Jeffrey Kelling, Stefan E. Müller, Thomas Gruber and Guido Juckeland
P-RECS '21: Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems
Abstract
Modern scientific collaborations and projects (MSCPs) employ various processing stages, starting with the proposal submission, continuing with data acquisition and concluding with final publications. The realization of such MSCPs poses a huge challenge due to (1) the complexity and diversity of the tools, (2) the heterogeneity of the various computing and experimental platforms involved, (3) the flexibility of analysis targets towards data acquisition and (4) data throughput. Another challenge for MSCPs is to provide additional metadata according to the FAIR principles for all processing stages for internal and external use. Consequently, the demand has risen considerably for a system that assists the scientist in all project stages and archives all processes on the basis of metadata standards like DataCite, to make everything transparent, understandable and citable. The aim of this project is the development of the HELmholtz ScIentific Project WORkflow PlaTform (HELIPORT), which ensures data provenance by accommodating the complete life cycle of a scientific project and linking all employed programs and systems. The modular structure of HELIPORT enables the deployment of the core applications to different Helmholtz centers (HZs), and the platform can be adapted to center-specific needs simply by adding or replacing individual components. HELIPORT is based on modern web technologies and can be used on different platforms.