by mark doerr (mark.doerr@uni-greifswald.de)

The RDM System LARA:
semantics through automation from bottom up

mark doerr, stefan maak & uwe t. bornscheuer
institute for biochemistry, university greifswald
Karlsruhe, 2023-09-14

uni-logo
greifswald-map
* lara intro

the big vision

requirements_steps

let's build ....

KIWI_project_detail

we made a plan ....

KIWI_project_detail

planning & control

*

the project and experiment planning module

KIWI_project_detail
*

the (zotero) literature module

KIWI_project_detail
*

process execution:
pythonLab, labOrchestrator, pythonLabScheduler

*

process description language : pythonLab

universal, python based, automation language
gitlab.com/opensourcelab/pythonLab

pythonLab
*

pythonLabOrchestrator & Scheduler

gitlab.com/opensourcelab/pythonlabscheduler

LARA-workflow
*

pythonLabScheduler in the wild (stefan maak)

gitlab.com/opensourcelab/pythonlabscheduler

LARA-workflow
*

(meta-) data:
data-transfer, storage, ontologies

*

what is SiLA ?

sila_logo

sila-standard.org

* feature

which data shall be stored ?

KIWI_project

the data module

KIWI_project_detail
*

LabDataReader

https://gitlab.com/opensourcelab/LabDataReader

*
leaves

ontology - development - for semantic search / ML

KIWI_project
*

EMMO - European Multiperspective Material Ontology

*

EMMOntoPy (github.com/emmo-repo/EMMOntoPy)

*

ontology development pipeline

KIWI_project
*

ontologies @ OpenSourceLab

KIWI_project
*

example: OSO measurement

KIWI_project
*

SciDat

gitlab.com/opensourcelab/ScientificData/SciDat

- data and metadata storage for scientific / machine learning needs (semantic annotation, based on ontologies, derivatives of owlready2)
- proper nullable data / missing data handling (pyarrow / parquet)
- data modalities, like range / limits, type (continuous / categorical), variable treatment in case of range violation (parquet metadata)
- cardinality (parquet metadata)
- efficient storage (parquet)
- metadata and data stored in one place (parquet)
- metadata conservation when saving / loading / processing (parquet -> arrow)
- fast data exchange (arrow flight, MinIO active replication)
- fast loading (fastparquet, pyarrow)
- fast data processing without in-memory re-writing after loading (pandas with pyarrow backend, arrow flight, polars)
- "modalities" for the machine learning models
- semantic annotations / metadata in RDF compliant format, for creating instances of ontology classes and SPARQL reasoning (JSON-LD, rdflib, owlready2)
- fast data processing (direct loading into pyarrow driven dataframe)
- programming language agnostic / independent (parquet)
- easy to use (SciDat / labDataReader framework, currently being implemented by me)
- commonly used in ETL pipelines (Apache Spark, prefect, ...)
- suitable for S3 file storage systems (MinIO)
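
To make the "data modalities" and "nullable data" points above concrete, here is a minimal, dependency-free sketch. It is not the SciDat API: `ColumnModality`, `apply_modality`, `to_jsonld`, the column name, the limits and the `example.org` IRI are all hypothetical names chosen for illustration. In the design listed above this information would live in Parquet metadata; plain Python and the standard `json` module stand in for that here, and the schema.org `PropertyValue` terms are only one possible vocabulary.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnModality:
    """Hypothetical per-column metadata: type, unit, allowed range, violation policy."""
    name: str
    kind: str                      # "continuous" or "categorical"
    unit: Optional[str] = None
    lower: Optional[float] = None  # lower limit, None means unbounded
    upper: Optional[float] = None  # upper limit, None means unbounded
    on_violation: str = "null"     # "null", "clip" or "raise"


def apply_modality(values, modality):
    """Return a cleaned copy of values; None marks missing data and is preserved."""
    cleaned = []
    for value in values:
        if value is None:
            cleaned.append(None)
            continue
        too_low = modality.lower is not None and value < modality.lower
        too_high = modality.upper is not None and value > modality.upper
        if not (too_low or too_high):
            cleaned.append(value)
        elif modality.on_violation == "null":
            cleaned.append(None)   # treat out-of-range readings as missing
        elif modality.on_violation == "clip":
            cleaned.append(modality.lower if too_low else modality.upper)
        else:
            raise ValueError(f"{modality.name}: {value} outside allowed range")
    return cleaned


def to_jsonld(modality, iri):
    """Serialise the modality as a small JSON-LD document (assumed vocabulary: schema.org)."""
    doc = {
        "@context": {"schema": "https://schema.org/"},
        "@id": iri,
        "@type": "schema:PropertyValue",
        "schema:name": modality.name,
        "schema:unitText": modality.unit,
        "schema:minValue": modality.lower,
        "schema:maxValue": modality.upper,
    }
    return json.dumps(doc, indent=2)


# Hypothetical absorbance column with assumed limits of 0.0 to 4.0
absorbance = ColumnModality("absorbance_600nm", "continuous",
                            unit="AU", lower=0.0, upper=4.0)
readings = [0.12, None, 5.3, -0.1, 1.8]
cleaned = apply_modality(readings, absorbance)
annotation = to_jsonld(absorbance, "https://example.org/columns/absorbance_600nm")
```

Keeping the violation policy next to the limits means a downstream machine learning pipeline can decide per column whether an out-of-range reading becomes missing, is clipped, or stops the run, without consulting a separate document.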

NFDI4Cat

National Research Data Infrastructure - for Catalysis

  • TA1 ontology workgroup
  • vocabulary / thesaurus for (bio-) catalysis
  • vocabulary building pipeline

see the talk by Alexander Behr et al. (Wed, 11:00h, Enabling RDM I)

working with (meta-) data:
SPARQL, jupyter

*

built-in SPARQL interface

lara
*

working with jupyter

lara
*

architecture:
LARA

*
lara
* lara intro

sharing data:
collaborations, repositories, publications

*
lara
* lara intro

implementation:
all open source, python, gitlab

*

building homogeneous infrastructure from ground up

lara
*
leaves
KIWI_project

... what software components are required ?

*

SiLA servers/devices of LARA

lara
*

what are we trying to build ?

*

the challenge

  • protein screening engineering
  • finding the right enzyme in 1E5 to 1E9 variants
  • lara movie

benefits for machine learning applications

*

challenges to overcome

*

what is semantics enabled machine learning (ML) ?

*
leaves

outlook

*

summary

*

acknowledgements

  • Stefan Maak
  • project partners

    • Stefan Born (TU Berlin)
    • Peter Neubauer's group (TU Berlin)
    • Johannes Kabisch's group and associates (Uni Trondheim)
    • Egon Heuson (Uni Lille)
    • Uwe Bornscheuer and our group (Univ. Greifswald)

    KIWI-UG / NFDI4Cat

    SiLA team

    AnIML team

    This work was supported by the German Federal Ministry of Education and Research through the Program "International Future Labs for Artificial Intelligence" (Grant number 01DD20002A)

    We are grateful to the Deutsche Forschungsgemeinschaft (DFG, INST 292/118-1 FUGG) and the federal state Mecklenburg-Vorpommern for financing the robotic platform.