
How to filter giga-sized virtual libraries in ICM?


Andrew Orry


Feb 2, 2025, 2:28:00 PM

to MolSoft ICM Knowledge Base

Q. How to filter giga-sized virtual libraries in ICM?

A.

ICM includes several scripts that are useful for this kind of task. You can find them in the $ICMHOME/molpipe directory of your ICM distribution. Read more about pipes in ICM here.
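A minimal sketch of putting these scripts on your PATH in a POSIX shell follows. It assumes the ICMHOME environment variable points at your ICM installation. The fallback value is only the example install directory /scratch/icm/3.9-4a used later in this post, so adjust it for your system.

```shell
# Assumption: ICMHOME is your ICM installation directory. The fallback
# is the example path from this post; change it to match your install.
ICMHOME="${ICMHOME:-/scratch/icm/3.9-4a}"
export ICMHOME

# Append the molpipe directory so that molrdsmi.icm, molfilter.icm and
# mol2smi.icm can be called by name from the shell.
export PATH="$PATH:$ICMHOME/molpipe"

# Optional sanity check: prints the script location if it is on PATH,
# otherwise a warning on stderr.
command -v molrdsmi.icm || echo "molrdsmi.icm not found; check ICMHOME" >&2
```

To make the change permanent, the two export lines can go in your shell startup file (for example ~/.profile).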

For example, to filter the Enamine REAL database by molecular weight (MW), you could use the following scripts:

molrdsmi.icm - read (from SDF, SMILES, MOLT, MOLCART)
molfilter.icm - filter by chemical properties
mol2smi.icm - write/convert (SDF, SMILES, MOLT, ICB)

Please note that these databases are very large, so this kind of analysis can take a while to run. On a multi-core machine, expect roughly 36 seconds per million compounds, so 1 billion compounds would take around 10 hours.
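The 10-hour figure is simple arithmetic: 36 s per million × 1,000 million compounds = 36,000 s, or 10 hours. The small helper below (a hypothetical function, estimate_hours, using only awk) applies the same rate to any library size. Treat the 36 s per million rate as a rough figure, because actual throughput depends on your hardware.

```shell
# Hypothetical helper: estimated runtime in hours for N compounds,
# assuming the ~36 s per million compounds rate quoted above.
# Usage: estimate_hours N_COMPOUNDS
estimate_hours() {
  # seconds = N * 36 / 1,000,000; hours = seconds / 3600 (one decimal)
  awk -v n="$1" 'BEGIN { printf "%.1f\n", n * 36 / 1000000 / 3600 }'
}

estimate_hours 1000000000   # 1 billion compounds -> 10.0

# To estimate a specific file, count its lines minus the header line, e.g.:
# n=$(( $(wc -l < Enamine_REAL_HAC_27_872M_CXSMILES.cxsmiles) - 1 ))
# estimate_hours "$n"
```

For the 872M-compound file used in the example, this gives about 8.7 hours.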

Example

On Unix, add the molpipe directory (here /scratch/icm/3.9-4a/molpipe) to your PATH so that you can run these ICM scripts:

# for chemicals with MW < 200

cat /gigadata/chem/REAL_2024/Enamine_REAL_HAC_27_872M_CXSMILES.cxsmiles | molrdsmi.icm id=id smi=smiles -header | molfilter.icm "MolWeight(mol)<200" | mol2smi.icm > small_mols1.tsv

# for chemicals with MW < 500

cat /gigadata/chem/REAL_2024/Enamine_REAL_HAC_26_766M_CXSMILES.cxsmiles | molrdsmi.icm id=id smi=smiles -header | molfilter.icm "MolWeight(mol)<500" | mol2smi.icm > small_mols2.tsv
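If you have several REAL files to run through the same cutoff, the pipeline can be wrapped in a loop. The sketch below is a hypothetical helper (filter_dir is not part of ICM). It assumes the molpipe scripts are on PATH and that each input has a header line with id and smiles columns, as in the commands above. It feeds each file through input redirection, which is equivalent to the cat used above.

```shell
# Hypothetical helper (not an ICM script): apply one MW cutoff to every
# *.cxsmiles file in a directory, writing one TSV per input into the
# current directory. Assumes the molpipe scripts are on PATH.
# Usage: filter_dir INPUT_DIR MAX_MW
filter_dir() {
  indir=$1    # directory holding the .cxsmiles files
  maxmw=$2    # molecular-weight cutoff, e.g. 500
  for f in "$indir"/*.cxsmiles; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    out="$(basename "$f" .cxsmiles)_mw${maxmw}.tsv"
    echo "filtering $f -> $out" >&2
    # Same read | filter | write pipeline as in the example commands
    molrdsmi.icm id=id smi=smiles -header < "$f" \
      | molfilter.icm "MolWeight(mol)<${maxmw}" \
      | mol2smi.icm > "$out"
  done
}

# Example call (placeholder directory and cutoff):
# filter_dir /gigadata/chem/REAL_2024 500
```

Files are processed one after another. Because this is plain POSIX sh without pipefail, the exit status reflects only the last stage (mol2smi.icm), so check the output files if an earlier stage might have failed.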
