How to filter giga-sized virtual libraries in ICM?


Andrew Orry

Feb 2, 2025, 2:28:00 PM
to MolSoft ICM Knowledge Base
Q. How to filter giga-sized virtual libraries in ICM?
A. 

ICM ships with a number of scripts that are useful for this type of task. They are located in the $ICMHOME/molpipe directory of your ICM distribution. See the ICM documentation to read more about pipes.

For example, to filter the Enamine REAL database by molecular weight (MW), you could use the following scripts:

  • molrdsmi.icm - read (from SDF, SMILES, MOLT, MOLCART)
  • molfilter.icm - filter by chemical properties
  • mol2smi.icm - write/convert (SDF, SMILES, MOLT, ICB)

Please note that these are very large databases, so this type of analysis can take a while to run. On a multi-core machine you can expect it to take approximately 36 seconds per million compounds, so 1 billion compounds (1,000 x 36 s = 36,000 s) would take around 10 hours.
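To get a rough idea of the runtime for a library of a different size, the same arithmetic can be scripted. This is only a sketch: it assumes the 36 seconds per million compounds figure quoted above holds on your machine, and n_compounds is a placeholder value that in practice you would take from the line count of your input file.

```shell
# Assumed throughput from the post: ~36 seconds per million compounds.
secs_per_million=36

# Placeholder library size (1 billion); in practice use e.g. `wc -l < file`.
n_compounds=1000000000

# Integer arithmetic: millions of compounds times seconds per million.
est_secs=$(( n_compounds / 1000000 * secs_per_million ))
est_hours=$(( est_secs / 3600 ))

echo "Estimated runtime: ${est_secs} s (~${est_hours} h)"
```

Actual throughput will depend on the number of cores, disk speed and the filter expression, so treat the result as an order-of-magnitude estimate.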

Example
On Unix, you will need to add the molpipe directory (/scratch/icm/3.9-4a/molpipe in this example) to your PATH so that you can run these ICM scripts:
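One possible way to set this up in a bash-like shell is sketched below. The directory is the one from this example and will differ on your system; the MOLPIPE_DIR variable name and the check loop are illustrative additions, not part of ICM.

```shell
# Directory from the post; adjust to where ICM is installed on your system.
MOLPIPE_DIR=/scratch/icm/3.9-4a/molpipe

# Prepend it so the shell can find the pipe scripts by name.
export PATH="$MOLPIPE_DIR:$PATH"

# Report any of the three scripts that still cannot be found.
missing=""
for script in molrdsmi.icm molfilter.icm mol2smi.icm; do
    command -v "$script" >/dev/null 2>&1 || missing="$missing $script"
done

if [ -n "$missing" ]; then
    echo "Not found on PATH:$missing"
else
    echo "All molpipe scripts found"
fi
```

Running the check before starting a multi-hour job avoids a pipeline that fails immediately with a "command not found" error.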

# for chemicals with MW < 200

cat /gigadata/chem/REAL_2024/Enamine_REAL_HAC_27_872M_CXSMILES.cxsmiles | molrdsmi.icm id=id smi=smiles -header | molfilter.icm "MolWeight(mol)<200" | mol2smi.icm > small_mols1.tsv

# for chemicals with MW < 500

cat /gigadata/chem/REAL_2024/Enamine_REAL_HAC_26_766M_CXSMILES.cxsmiles | molrdsmi.icm id=id smi=smiles -header | molfilter.icm "MolWeight(mol)<500" | mol2smi.icm > small_mols2.tsv


