Summary:
Aim: AI for Scientific Discovery with a focus on deployability
Observation
Multi-modal ML models are becoming increasingly common
Language and vision are much more common than other modalities (time series, sensor, audio, tabular)
Motivates work to incorporate the other modalities
E.g. discriminative and generative models
Generative modeling is especially capable because it can reconstruct one modality from another
Bridge between modalities
Align correspondence between them
Cross-modality representations, which are complementary
Novel multi-modal privacy risk because one can’t anonymize modalities in isolation
Models are increasingly maturing; data is the major differentiator that enables discovery
Deployment-centric AI
Consider deployment constraints and user needs
E.g. ethical considerations
Flow:
Planning: Problem definition, deployment constraints, task formulation
Multimodal AI system development
Real-world deployment
DrugBAN: Drug-target interaction prediction
Graph convolutional network to encode drug’s chemical structure
1D convolutional network encodes the amino acid sequence of the target protein
Multi-head bilinear attention module + pooling integrates the two modalities into a joint representation
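A minimal NumPy sketch of this kind of bilinear attention fusion — shapes, the weight `W`, and the pooling choice are illustrative assumptions, not DrugBAN's actual implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bilinear_attention_fusion(drug_feats, prot_feats, W):
    """Fuse atom embeddings (from a graph encoder) and residue embeddings
    (from a 1D CNN) via a bilinear pairwise interaction map, then pool
    into a single joint representation vector.

    drug_feats: (n_atoms, d), prot_feats: (n_res, d), W: (d, d) learnable.
    """
    # Score every (atom, residue) pair with a bilinear form.
    scores = drug_feats @ W @ prot_feats.T               # (n_atoms, n_res)
    attn = softmax(scores.ravel()).reshape(scores.shape)  # attention map
    # Pool: attention-weighted sum of elementwise atom-residue products.
    joint = np.einsum("ij,id,jd->d", attn, drug_feats, prot_feats)
    return joint

# Toy example with random features and weights.
rng = np.random.default_rng(0)
d = 8
drug = rng.normal(size=(5, d))    # 5 atoms
prot = rng.normal(size=(30, d))   # 30 residues
W = rng.normal(size=(d, d))
joint = bilinear_attention_fusion(drug, prot, W)
print(joint.shape)  # (8,)
```

In practice the attention map and `W` would be learned end-to-end, and multiple heads would each produce such a joint vector before pooling.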
Focus on deployability: drug companies specifically care about discovery of new drugs
Cluster data according to drug type
Train on some categories of drugs, predict on different ones
Same approach applied to cell line-drug response, mutation-drug association
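The cluster-based split above can be sketched in plain Python — cluster labels and toy pairs here are hypothetical; real pipelines would cluster drugs by chemical similarity first:

```python
def cluster_split(samples, held_out_clusters):
    """Split drug-target pairs so that entire drug clusters are held out
    for testing, forcing the model to generalise to unseen drug types."""
    train, test = [], []
    for sample in samples:
        bucket = test if sample["cluster"] in held_out_clusters else train
        bucket.append(sample)
    return train, test

# Hypothetical toy data: each pair carries its drug's cluster label.
pairs = [
    {"drug": "d1", "target": "t1", "cluster": "kinase_inhibitor"},
    {"drug": "d2", "target": "t2", "cluster": "kinase_inhibitor"},
    {"drug": "d3", "target": "t1", "cluster": "gpcr_ligand"},
    {"drug": "d4", "target": "t3", "cluster": "antibiotic"},
]
train, test = cluster_split(pairs, held_out_clusters={"antibiotic"})
train_clusters = {p["cluster"] for p in train}
test_clusters = {p["cluster"] for p in test}
print(train_clusters & test_clusters)  # set() — no cluster overlap
```

The key property is that train and test share no drug clusters, unlike a random split where near-duplicate drugs leak across the boundary.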
MapDiff: Inverse protein folding
Design protein with a particular function/structure
Mask prior-guided denoising diffusion
Mask-prior pretraining:
Protein 3D structure modality
Mask part of the sequence, then predict it
Model learns a good structure-aware embedding for amino acid sequences
Mask-guided denoising network
Equivariant graph neural network: generates the sequence conditioned on the 3D structure
Entropy-based mask (more uncertain sequence components are masked and predicted)
Diffusion model to predict sequence
Using AlphaFold to validate that the generated sequence folds back into the target structure
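The entropy-based masking step above can be sketched with NumPy — the distributions and mask fraction are toy assumptions, not MapDiff's actual configuration:

```python
import numpy as np

def entropy_mask(probs, mask_fraction=0.25):
    """Select the positions whose predicted amino-acid distribution has
    the highest entropy, i.e. where the model is least certain, so they
    can be re-masked and re-predicted in the next denoising step."""
    # probs: (seq_len, n_amino_acids) per-position predicted distributions.
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)  # (seq_len,)
    n_mask = max(1, int(mask_fraction * len(entropy)))
    # Indices of the n_mask most uncertain positions.
    return np.argsort(entropy)[-n_mask:]

# Toy example: position 2 is uniform, hence maximally uncertain.
probs = np.array([
    [0.97, 0.01, 0.01, 0.01],
    [0.90, 0.05, 0.03, 0.02],
    [0.25, 0.25, 0.25, 0.25],
    [0.85, 0.10, 0.03, 0.02],
])
print(entropy_mask(probs))  # [2]
```

Confident positions are kept fixed while uncertain ones are iteratively refined by the diffusion model.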
Recommendations for deployment-centric development
Safety, reliability, interpretability
Scalability and resource efficiency
Ethical compliance and user preparedness
Recommendations for data- and model-centric development
Data scarcity and access
Balanced data representation
Multimodal fusion and modality selection
Foundation models
Recommendations for stakeholder engagement and collaboration
Stakeholder inclusion and alignment
Cross-disciplinary standards and communication
Intellectual property and workflow adaptation
PyKale: pykale.github.io
Knowledge-aware machine learning from multiple sources
UKOMAIN: UK Open Multimodal AI Network: https://multimodalai.github.io/
Builds diverse interdisciplinary network
Create knowledge exchange platform
Identify & fund key focus areas of high impact
Engage industry & policymakers
Promote sustainability & responsibility
Enhance research capability & training
OMAIB: Open Multimodal AI Benchmark (funding call)
CD3: Cancer Data Driven Detection:
Advance ability to prevent, diagnose and detect cancer
Enhance equity
Research community that federates resources
Health records, cancer multi-omics
Advanced analytics
Partnerships
Conclusion: multimodal data + GenAI = Impact