Novel-Enzymes2023 Satellite Workshop

Automation in enzymatic research: Software for the high-throughput lab and computational tools (machine learning)

(for those who arrive in Greifswald early)

The workshop is free and open to all, but because space is limited (10-15 people) we kindly ask you to register: simply write an e-mail to mark.doerr@uni-greifswald.de

 

Date/Time: 2023-03-28, 9:00-15:00h

Location: Institute for Biochemistry, Felix-Hausdorff 4, 2nd floor, seminar room D213

Organisers:

Stefan Born (stefan.born@tu-berlin)

Stefan Maak (stefan.maak@uni-greifswald.de)

Mark Doerr (mark.doerr@uni-greifswald.de)

The workshop will be divided into two parts:

Part I. Sub-Workshop: Statistical models/machine learning as a complement to directed evolution in the search for new enzymes

Stefan Born, (Mark Doerr)

Start: Tue, 28.3.2023, 9:00h, Inst. f. Biochemistry, seminar room D213, 2nd floor.

Preliminary Program:

  • 20 min. general introduction
  • 1h interactive presentation of selected computational tools
  • 20 min. tea/coffee break for networking and exchange of experience
  • 1h hands-on experiments using the tools

Use cases

How useful are statistical models or machine learning tools as a complement to directed evolution? Each variant in a random library that is screened for a property of interest (enzymatic activity, stereoselectivity, stability) is cheap compared with the synthesis of a specific variant. However, if new variants are designed with sufficiently good predictive models, each one is more likely to be an interesting hit, which may compensate for the higher cost of individual synthesis. Unlike rational design from a well-understood chemical or quantum-mechanical description, design by statistical models will only succeed if the data, together with the inductive bias of the models, provide enough information.

Let us distinguish a few use cases, which differ in the available data and in the task of the model:

  • Given a small library of annotated mutants of a wild type (WT), mutated at a few sites, propose new mutants to synthesize and investigate. (N.B. We call a variant annotated once the properties of interest have been determined for it.)
  • Rank mutants proposed by other considerations, using a small library of annotated mutants. Note that the mutations may involve many sites, not just a few.
  • Propose new mutants to synthesize, using a large library of annotated mutants (obtained by microfluidics).

Note that in the first case the model operates on a comparatively small combinatorial library, whereas in the third case the large amount of data may convey enough information to propose new variants from the whole sequence space. The second case delegates the suggestion of possible new variants to other considerations and only ranks these variants in a kind of ‘virtual screening’.
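
To give a feeling for the difference in scale between the first and the third use case, here is a minimal Python sketch that enumerates the combinatorial library for a few mutated sites and compares its size with the full sequence space. The wild-type sequence, the chosen sites and the function name combinatorial_library are made up for illustration and are not part of the workshop material.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def combinatorial_library(wild_type, sites):
    """Yield every variant that differs from wild_type only at the given 0-based sites."""
    for residues in product(AMINO_ACIDS, repeat=len(sites)):
        variant = list(wild_type)
        for site, residue in zip(sites, residues):
            variant[site] = residue
        yield "".join(variant)


# Hypothetical toy example: an 8-residue "enzyme" mutated at two sites.
wild_type = "MKTAYIAK"
sites = [1, 4]

variants = list(combinatorial_library(wild_type, sites))
print("variants at", len(sites), "sites:", len(variants))    # 20**2, wild type included
print("full sequence space:", len(AMINO_ACIDS) ** len(wild_type))  # 20**8
```

The library grows as 20 to the power of the number of mutated sites, so even this short toy sequence has a full sequence space many orders of magnitude larger than the two-site library.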

Generic components by example

An infrastructure that makes machine learning tools available for these use cases needs to provide a range of components. In the workshop we will look at an example workflow and focus on the components of a generic architecture.

  1. Input format for enzymes, substrates and measured properties
  2. Representations of amino acid sequences
  3. Representations of substrates (possibly)
  4. Model classes for the prediction of properties from representations
  5. Composition of models and representations
  6. Training and validation of models
  7. Generation and ranking of proposals
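
How some of these components could fit together can be sketched in a few lines of Python. This is only an illustrative toy under strong assumptions, not the workflow shown in the workshop: the input format (a list of sequence/value pairs), the one-hot representation, the naive additive model and all function names are chosen for this example, and the four annotated variants carry invented activity values.

```python
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Representation: flat one-hot vector with 20 entries per sequence position."""
    return [1.0 if aa == a else 0.0 for aa in seq for a in AMINO_ACIDS]


def fit_additive(library):
    """Naive additive model (an assumption of this sketch): overall mean plus,
    per one-hot feature, the mean deviation of the variants carrying it."""
    base = mean(y for _, y in library)
    n_features = len(one_hot(library[0][0]))
    sums, counts = [0.0] * n_features, [0] * n_features
    for seq, y in library:
        for i, x in enumerate(one_hot(seq)):
            if x:
                sums[i] += y - base
                counts[i] += 1
    weights = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return base, weights


def predict(model, seq):
    """Predicted property: intercept plus the weights of the active features."""
    base, weights = model
    return base + sum(w * x for w, x in zip(weights, one_hot(seq)))


def leave_one_out_mae(library):
    """Validation: mean absolute error when each variant is held out in turn."""
    errors = []
    for i, (seq, y) in enumerate(library):
        model = fit_additive(library[:i] + library[i + 1:])
        errors.append(abs(predict(model, seq) - y))
    return mean(errors)


def rank(model, candidates):
    """Ranking: candidates sorted by predicted property, best first."""
    return sorted(candidates, key=lambda s: predict(model, s), reverse=True)


# Input format: (sequence, measured property). Values are invented toy data.
library = [("ACDE", 1.0), ("GCDE", 2.0), ("ACDK", 1.5), ("AWDE", 0.5)]
candidates = ["AWDK", "GWDE", "GCDK"]

model = fit_additive(library)
print("leave-one-out MAE:", leave_one_out_mae(library))
for seq in rank(model, candidates):
    print(seq, round(predict(model, seq), 3))
```

In a real setting each piece would be swapped for something stronger (learned sequence embeddings instead of one-hot vectors, a regularised or probabilistic regressor instead of site-wise means), but the interfaces between representation, model, validation and ranking would stay much the same.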


Part II. Sub-Workshop: Planning and running experiments in robotic high-throughput enzyme screening labs, LARA-suite

Stefan Maak, Mark Doerr

Start: Tue, 28.3.2023, 13:00h, Inst. f. Biochemistry, seminar room D213, 2nd floor.

Preliminary Program:

  • 20 min. general introduction
  • 1h interactive presentation of selected computational tools
  • 20 min. tea/coffee break for networking and exchange of experience
  • 1h hands-on experiments using the tools

Use cases

Open-source robotic lab automation with SiLA (sila-standard.org) and LARA (gitlab.com/larasuite/lara) for enzyme screening: standardised communication between lab devices, process scheduling and data management.

Introduction to the LARA open-source infrastructure

  • process planning with LARA
  • lab automation process description language pythonLab
  • scheduling and orchestration of processes
  • open source data management

Running a demo process on the LARA robotic enzyme screening platform (lara.uni-greifswald.de)

  • demonstration of the Open Platform Software at the LARA robot
  • enzyme screening workflow demo
  • experiences with protein screening