LARA - semantically annotated experimentation from ground up &
robotic enzyme screening for Machine Learning applications.

mark doerr & uwe bornscheuer & KIWI / SiLA / AnIML / NFDI4Cat teams
institute for biochemistry, university greifswald
greifswald/göteborg, 2024-03-27


* lara intro

ML biocatalysis - e.g. transamination reaction

- highly selective enzymes replace conventional chemistry

Structure- and Data-Driven Protein Engineering of Transaminases for Improving Activity and Stereoselectivity
Yu-Fei Ao et al., Angewandte Chemie 2023. https://doi.org/10.1002/anie.202301660

* lara intro

3FCR transaminase substrate screening

* lara intro

"classical" ML approaches

* lara intro

the greifswald protein screening platform LARA

* lara intro

what is semantics enabled machine learning (ML) ?

better "understanding" of the data by the ML algorithms
semantics-guided ML model building by automated feature extraction
improved consistency checking, because the ML algorithm can extract physical limits
improved error handling, because the ML algorithm "knows" more about the predicted system
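
The consistency-checking point above can be illustrated with a minimal sketch, assuming every measured quantity carries a unit and physical limits in its metadata. The class `QuantityMeta`, its field names and the absorbance limits are hypothetical and only serve the illustration; they are not taken from LARA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantityMeta:
    """Hypothetical semantic annotation for one measured quantity."""
    name: str
    unit: str
    minimum: float  # physical lower limit
    maximum: float  # physical upper limit


def check_consistency(value: float, meta: QuantityMeta) -> bool:
    """Return True if the value lies within the annotated physical limits."""
    return meta.minimum <= value <= meta.maximum


# Assumed example: plate reader absorbance with illustrative limits.
absorbance = QuantityMeta("absorbance", "1", 0.0, 4.0)
readings = [0.12, 1.7, -0.3, 5.2]
valid = [r for r in readings if check_consistency(r, absorbance)]
rejected = [r for r in readings if not check_consistency(r, absorbance)]
```

Because the limits travel with the data, a model or a preprocessing step can reject implausible values without hard-coded, device-specific rules.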

*

requirements for semantics enabled machine learning

what are we trying to build ?

generic software platform for simple closed-loop designs:
autonomous experiment design
autonomous experiment execution
autonomous experiment evaluation
AI and machine learning
redesign of the experiment
continue this process until a desired outcome is reached or resources are used up
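
The closed loop listed above can be sketched in a few lines, assuming a single numeric design parameter and a simulated assay in place of a real experiment. The function names, the random-search redesign step and the quadratic response are assumptions made for illustration; a real setup would plug in an ML model and the robotic platform.

```python
import random
from typing import Callable


def closed_loop(
    run_experiment: Callable[[float], float],
    target: float,
    budget: int,
    start: float = 0.0,
    step: float = 1.0,
    seed: int = 0,
) -> tuple[float, float, int]:
    """Design -> execute -> evaluate -> redesign until target or budget is hit."""
    rng = random.Random(seed)
    best_x = start
    best_y = run_experiment(best_x)  # first experiment
    used = 1
    while best_y < target and used < budget:
        candidate = best_x + rng.uniform(-step, step)  # design
        y = run_experiment(candidate)                  # execution
        used += 1
        if y > best_y:                                 # evaluation
            best_x, best_y = candidate, y              # redesign from new best
    return best_x, best_y, used


def simulated_assay(x: float) -> float:
    """Stand-in for a real measurement; the optimum is at x = 2."""
    return -(x - 2.0) ** 2


best_x, best_y, used = closed_loop(simulated_assay, target=-0.01, budget=200)
```

The loop stops on exactly the two conditions named above: the desired outcome (`target`) is reached or the resources (`budget`) are used up.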

*

the challenge

full SiLA communication of all devices in all platforms
platform-independent process description language
flexible programming language for complex processes
dynamic and error tolerant scheduling
full experimental metadata for machine learning applications
storage and exchange of data between labs (KIWI biolab)
fully semantic annotation of the metadata (ontologies)

* protein screening engineering
* finding the right enzyme among 1E5 to 1E9 variants
* lara movie

benefits for machine learning applications

rich metadata from ground up
ontology based (see ontology development workflow)
allowing autonomous feature extraction
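
As a minimal sketch of what autonomous feature extraction could look like, assume every value in a record is annotated with a unit and a role. The record layout, the role names `condition` and `response`, and the sample values are hypothetical, not a LARA data format.

```python
# Hypothetical annotated records: each value carries a unit and a role.
records = [
    {
        "sample": "variant_001",
        "temperature": {"value": 30.0, "unit": "degC", "role": "condition"},
        "pH": {"value": 7.5, "unit": "1", "role": "condition"},
        "activity": {"value": 1.8, "unit": "U/mg", "role": "response"},
    },
    {
        "sample": "variant_002",
        "temperature": {"value": 37.0, "unit": "degC", "role": "condition"},
        "pH": {"value": 8.0, "unit": "1", "role": "condition"},
        "activity": {"value": 2.4, "unit": "U/mg", "role": "response"},
    },
]


def extract(recs: list[dict], role: str) -> tuple[list[str], list[list[float]]]:
    """Select all fields annotated with the given role and build a matrix."""
    names = sorted(
        {
            key
            for rec in recs
            for key, val in rec.items()
            if isinstance(val, dict) and val.get("role") == role
        }
    )
    matrix = [[rec[name]["value"] for name in names] for rec in recs]
    return names, matrix


feature_names, X = extract(records, "condition")
target_names, y = extract(records, "response")
```

No column names are hard-coded: which fields become features is decided by the annotation alone.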

*

challenges to overcome

heterogeneous data
non-standardised data structure
non-standardised metadata - no semantics
non-standardised device communication
black-box software - closed source
no advanced, comprehensive data storage

*

building homogeneous infrastructure from ground up

*

* lara intro

... what software components are required ?

device control
data readout
data transfer
data storage
semantic understanding of the data by the machines
machine learning algorithms to "work on the data"
feedback software to the instruments
common description language
process execution / scheduler

*

SiLA servers/devices of LARA

*

LARA, SiLA, AnIML/JSON-LD
pythonLab, pythonLabScheduler,
LabDataReader

*

pythonLab

https://gitlab.com/opensourcelab/pythonLab
universal, python based, automation language

*

pythonLabOrchestrator & Scheduler

https://gitlab.com/opensourcelab/pythonlabscheduler
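
To give an idea of dependency-aware, error-tolerant scheduling, here is a minimal sketch based on the standard library's `graphlib`. It is not the pythonLabScheduler algorithm or API; the step names and the `schedule` function are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical process: each step maps to the set of steps it depends on.
process = {
    "dispense_substrate": set(),
    "add_enzyme": {"dispense_substrate"},
    "incubate": {"add_enzyme"},
    "read_absorbance": {"incubate"},
    "seal_reference_plate": set(),
}


def schedule(
    steps: dict[str, set[str]],
    failing: frozenset[str] = frozenset(),
) -> tuple[list[str], list[str]]:
    """Run steps in dependency order; skip everything downstream of a failure."""
    done: list[str] = []
    skipped: list[str] = []
    for step in TopologicalSorter(steps).static_order():
        blocked = step in failing or any(dep in skipped for dep in steps[step])
        (skipped if blocked else done).append(step)
    return done, skipped


# Simulate a failing incubator: independent steps still run.
done, skipped = schedule(process, failing=frozenset({"incubate"}))
```

A failed step only blocks the steps that depend on it, so unrelated branches of the process continue.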

*

pythonLabScheduler in the wild (stefan maak)

https://gitlab.com/opensourcelab/pythonlabscheduler

*

LabDataReader

https://gitlab.com/opensourcelab/LabDataReader

generic data reader framework for proprietary (text based) data formats
new data formats can be added as plugins
rich metadata support
automatic semantic annotation of the data
fully written in python
output formats: pandas data frame, JSON-LD, csv, (AnIML/JSON-LD - under development)
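
The plugin idea above can be sketched with a small reader registry. This is an illustration only and not the LabDataReader API: the registry `READERS`, the decorator `register_reader`, the format name `plate_csv`, the device name and the `example.org` vocabulary are all assumptions.

```python
import csv
import io
import json
from typing import Callable

# Registry of reader plugins, keyed by a format name (names are hypothetical).
READERS: dict[str, Callable[[str], list[dict]]] = {}


def register_reader(fmt: str):
    """Decorator that adds a parser function to the registry."""
    def decorator(func):
        READERS[fmt] = func
        return func
    return decorator


@register_reader("plate_csv")
def read_plate_csv(text: str) -> list[dict]:
    """Parse a simple 'well,absorbance' text export into records."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"well": r["well"], "absorbance": float(r["absorbance"])} for r in rows]


def to_jsonld(records: list[dict], device: str) -> dict:
    """Wrap records in a JSON-LD document; the vocabulary URL is a placeholder."""
    return {
        "@context": {"@vocab": "https://example.org/lab#"},
        "@type": "Measurement",
        "device": device,
        "results": records,
    }


raw = "well,absorbance\nA1,0.12\nA2,0.87\n"
records = READERS["plate_csv"](raw)
document = to_jsonld(records, device="hypothetical-reader-01")
serialized = json.dumps(document, indent=2)
```

Supporting a new device format then means registering one more parser function; the annotation and export steps stay unchanged.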

*

holistic approach of the LARA suite

planning of experiments
storing all required data for the planning, like literature, substances, material, devices, experimentalists ...
generating the processes
execution of the processes, communication with the lab devices
collection of the data (very structured, well prepared to learn from it)
evaluation and visualisation of the data (also DoE and machine learning)
reporting / publishing / exchange between labs

* In the very early days of personal computing, I was wondering, why the computer was not used

overview of final architecture

fully open sourced and python based

*

ontology - development - for semantic search / ML

*

EMMO - Elementary Multiperspective Material Ontology

top- & mid level ontology
sound theoretical foundation
rooted in historical philosophy (mereology), topology, physics and quantum physics
all ITEMS are, e.g., SpaceTime Objects
multiperspective and multidisciplinary
modelling and experiments
small -> fast reasoning
python representation (EMMOntoPy)

*

EMMOntoPy (github.com/emmo-repo/EMMOntoPy)

all OWL classes and Properties are modeled as Python Objects
generation of an ontology and reasoning can be done completely in python
modular / object oriented modelling possible
easy interaction / integration in own python applications
SPARQL endpoint
fast SQLite triple store
code managed in git repository
tools for validation, documentation and visualisation
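
To show why a SQLite-backed triple store is a practical basis for simple reasoning, here is a minimal sketch using only the standard library. It does not use EMMOntoPy and does not reflect its internals; the class names are made up for the example and are not EMMO classes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
con.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Measurement", "subClassOf", "Process"),
        ("AbsorbanceMeasurement", "subClassOf", "Measurement"),
        ("Process", "subClassOf", "Item"),
        ("run_42", "type", "AbsorbanceMeasurement"),
    ],
)


def superclasses(cls: str) -> list[str]:
    """Follow subClassOf transitively with a recursive query."""
    rows = con.execute(
        """
        WITH RECURSIVE up(c) AS (
            SELECT o FROM triples WHERE s = ? AND p = 'subClassOf'
            UNION
            SELECT t.o FROM triples t JOIN up ON t.s = up.c
            WHERE t.p = 'subClassOf'
        )
        SELECT c FROM up
        """,
        (cls,),
    ).fetchall()
    return [row[0] for row in rows]


ancestors = superclasses("AbsorbanceMeasurement")
```

The same pattern answers a semantic search such as "find all runs that are some kind of Process" by resolving the class hierarchy first.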

*

ontology development pipeline

*

ontologies @ OpenSourceLab

*

example: OSO measurement

*

NFDI4Cat

National Research Data Infrastructure - for Catalysis

*

summary

building a large, community driven open source infrastructure
no black box, everything is adjustable and extendible
advanced queries and automated feature extraction
feedback loop driven protein engineering is possible with the tools in place
machine learning applications will improve dramatically when the right metadata is in place
we need a modern mind set (biochemistry and machine learning)
we are not alone, please join and contribute !

*

acknowledgements

Stefan Maak

project partners

Stefan Born (TU Berlin)
Peter Neubauer's group (TU Berlin)
Johannes Kabisch's group and associates (Uni Trondheim)
Egon Heuson (Uni Lille)
Uwe Bornscheuer and our group (Univ. Greifswald)

KIWI-UG / NFDI4Cat

SiLA team

AnIML team

This work was supported by the German Federal Ministry of Education and Research through the Program “International Future Labs for Artificial Intelligence” (Grant number 01DD20002A)

We are grateful to the Deutsche Forschungsgemeinschaft (DFG, INST 292/118-1 FUGG) and the federal state Mecklenburg-Vorpommern for financing the robotic platform.
