Chemistry Toolkit Rosetta Wiki
Register
Advertisement

This task comes from working with researchers who have a desktop tool which displays a large number (a few hundred) structures on a scrolling page. The structures are the database hits from searching with a given SMARTS query, and the common SMARTS substructures are aligned in each depictions. This makes it easier for them to identify the differences between the structures.

They want to replace it with something newer, and they would like more control over how the structure is displayed. (For example, some people want the depiction tool to first follow corporate guidelines on depiction layout. Doing that is outside the scope of this task.)

==Implementation==[1]

Use the first 16 structures of the benzodiazepine SD file to make a 4x4 grid of depictions as a single image. The first structure is in the upper-left corner, the second is to its right, and so on. Each depiction should include the title field of the corresponding record, which in this case is the PubChem identifier.

Use "[#7]~1~[#6]~[#6]~[#7]~[#6]~[#6]~2~[#6]~[#6]~[#6]~[#6]~[#6]12" as the common SMARTS substructure. This is the fused ring of the benzodiazepine system but without bond type or atom aromaticity information. Use the first molecule as the reference depiction. All other depictions must have the depiction of their common substructure aligned to the reference.

The final image should be 400x400 pixels, which means each structure depiction should be 100x100 pixels. If the toolkit doesn't support multiple depictions in the same image then depict each to a different image. A desktop tool would need a way to write the image data to a GUI widget, but for now just write the result to a PNG or GIF file.

The point here is to show how to do the 2D structure alignment. The other details are less important. Use your own preferences for coloring scheme (eg, color on white background or on black background). You may assume the input structure has the correct MDL chemistry. (This is only relevant for toolkits which support multiple chemistry models.)


Cactvs/Tcl[]

Aligned depictions with the Cactvs toolkit.

Output from Cactvs/Tcl

prop setparam D_GIF ncols 4 nrows 4 boxwidth 100 boxheight 100 filename aligned4x4.png format png padding 0
prop setparam E_GIF headerproperty E_NAME bgcolor white atomcolor element bonds 10
set fh [molfile open benzodiazepine.sdf.gz r hydrogens add]
set dh [molfile read $fh [dataset create] 16]
set ss [ens create {[#7]~1~[#6]~[#6]~[#7]~[#6]~[#6]~2~[#6]~[#6]~[#6]~[#6]~[#6]12} smarts]
foreach eh [dataset ens $dh] {
    match ss -align besteffort $ss $eh
}
dataset get $dh D_GIF

Cactvs/Python[]

The Python version is even more compact, with exactly the same result:

Prop.Setparam('D_GIF','ncols',4,'nrows',4,'boxwidth',100,'boxheight',100,'filename','aligned4x4.png','format','png','padding',0)
Prop.Setparam('E_GIF','headerproperty','E_NAME','bgcolor','white','atomcolor','element','bonds',10)
d=Molfile('benzodiazepine.sdf.gz','r',{'hydrogens':'add'}).read(Dataset(),16)
ss=Ens('[#7]~1~[#6]~[#6]~[#7]~[#6]~[#6]~2~[#6]~[#6]~[#6]~[#6]~[#6]12','smarts')
for e in d.ens(): match('ss',ss,e,align='besteffort')
d.get('D_GIF')

Indigo/C[]

Indigo grid white

Output from Indigo

#include <stdlib.h>
#include <string.h>

#include "indigo.h"
#include "indigo-renderer.h"

int main (void)
{
   int query = indigoLoadSmartsFromString("[#7]1~[#6]~[#6]~[#7]~[#6]~[#6]2~[#6]~[#6]~[#6]~[#6]~[#6]~1~2");
   int structure, sdf = indigoIterateSDFile("benzodiazepine.sdf.gz");
   int *atoms = (int *)malloc(indigoCountAtoms(query) * sizeof(int));
   float *xyz  = (float *)malloc(indigoCountAtoms(query) * 3 * sizeof(float));
   int collection = indigoCreateArray();
   int ref_atoms[16];

   while ((structure = indigoNext(sdf)))
   {
      int index, matcher, match, atom, aiter;

      index = indigoIndex(structure);
      indigoFoldHydrogens(structure);
      matcher = indigoSubstructureMatcher(structure, 0);
      match = indigoMatch(matcher, query);
      aiter = indigoIterateAtoms(query);

      if (index == 0)
      {
         float *ixyz = xyz;
         while ((atom = indigoNext(aiter)))
         {
            memcpy(ixyz, indigoXYZ(indigoMapAtom(match, atom)), 3 * sizeof(float));
            ixyz += 3;
            indigoFree(atom);
         }
      }
      else
      {
         int *iatoms = atoms;

         while ((atom = indigoNext(aiter)))
         {
            *iatoms++ = indigoIndex(indigoMapAtom(match, atom));
            indigoFree(atom);
         }

         indigoAlignAtoms(structure, indigoCountAtoms(query), atoms, xyz);
      }

      indigoArrayAdd(collection, structure);
      ref_atoms[index] = indigoIndex(indigoMapAtom(match, indigoGetAtom(query, 0)));

      indigoFree(aiter);
      indigoFree(match);
      indigoFree(matcher);
      indigoFree(structure);
      if (index == 15)
         break;
   }
   indigoSetOption("render-output-format", "png");
   indigoSetOption("render-image-size", "400, 400");
   indigoSetOption("render-grid-title-property", "PUBCHEM_COMPOUND_CID");
   indigoSetOption("render-grid-title-font-size", "10");
   indigoSetOption("render-grid-title-offset", "2");
   indigoSetOption("render-relative-thickness", "1.5");
   indigoSetOption("render-coloring", "true");
   indigoSetOptionColor("render-coloring", 1.0, 1.0, 1.0);
   indigoRenderGridToFile(collection, ref_atoms, 4, "indigo_grid_white.png");

   indigoFree(collection);
   indigoFree(sdf);
   indigoFree(query);
   free(atoms);
   free(xyz);
}

Indigo/C#[]

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using System.Drawing;
using com.gga.indigo;

namespace test
{
   class Program
   {
      [STAThread]
      static void Main (string[] args)
      {
         Indigo indigo = new Indigo();
         IndigoRenderer renderer = new IndigoRenderer(indigo);
         IndigoObject query = indigo.loadSmarts("[#7]1~[#6]~[#6]~[#7]~[#6]~[#6]2~[#6]~[#6]~[#6]~[#6]~[#6]~1~2");
         int[] atoms = new int[query.countAtoms()];
         float[] xyz = new float[query.countAtoms() * 3];
         int[] refatoms = new int[16];
         IndigoObject collection = indigo.createArray();

         foreach (IndigoObject structure in indigo.iterateSDFile("benzodiazepine.sdf.gz"))
         {
            structure.foldHydrogens();
            IndigoObject match = indigo.substructureMatcher(structure).match(query);
            int i = 0;

            if (structure.index() == 0)
               foreach (IndigoObject atom in query.iterateAtoms())
                  Array.Copy(match.mapAtom(atom).xyz(), 0, xyz, i++ * 3, 3);
            else
            {
               foreach (IndigoObject atom in query.iterateAtoms())
                  atoms[i++] = match.mapAtom(atom).index();

               structure.alignAtoms(atoms, xyz);
            }

            refatoms[structure.index()] = match.mapAtom(query.getAtom(0)).index();
            collection.arrayAdd(structure);

            if (structure.index() == 15)
               break;
         }

         indigo.setOption("render-output-format", "png");
         indigo.setOption("render-image-size", "400, 400");
         indigo.setOption("render-grid-title-property", "PUBCHEM_COMPOUND_CID");
         indigo.setOption("render-grid-title-font-size", "10");
         indigo.setOption("render-grid-title-offset", "2");
         indigo.setOption("render-relative-thickness", "1.5");
         indigo.setOption("render-coloring", "true");
         indigo.setOption("render-background-color", Color.White);

         renderer.renderGridToFile(collection, refatoms, 4, "indigo_grid_white.png");
      }
   }
}

Indigo/Java[]

package test;
import com.gga.indigo.*;
import java.io.*;
import java.util.*;
        
public class Main
{
   public static void main (String[] args) throws java.io.IOException
   {
      Indigo indigo = new Indigo();
      IndigoRenderer renderer = new IndigoRenderer(indigo);
     
      IndigoObject query = indigo.loadSmarts("[#7]1~[#6]~[#6]~[#7]~[#6]~[#6]2~[#6]~[#6]~[#6]~[#6]~[#6]~1~2");
      int[] atoms = new int[query.countAtoms()];
      float[] xyz = new float[query.countAtoms() * 3];
      IndigoObject collection = indigo.createArray();
      IndigoRenderer renderer = new IndigoRenderer(indigo);
      int[] refatoms = new int[16];

      for (IndigoObject structure : indigo.iterateSDFile("benzodiazepine.sdf.gz"))
      {
         structure.foldHydrogens();
         IndigoObject match = indigo.substructureMatcher(structure).match(query);
         int i = 0;

         if (structure.index() == 0)
            for (IndigoObject atom : query.iterateAtoms())
               System.arraycopy(match.mapAtom(atom).xyz(), 0, xyz, i++ * 3, 3);
         else
         {
            for (IndigoObject atom : query.iterateAtoms())
               atoms[i++] = match.mapAtom(atom).index();

            structure.alignAtoms(atoms, xyz);
         }

         collection.arrayAdd(structure);
         refatoms[structure.index()] = match.mapAtom(query.getAtom(0)).index();

         if (structure.index() == 15)
            break;
      }

      indigo.setOption("render-output-format", "png");
      indigo.setOption("render-image-size", "400, 400");
      indigo.setOption("render-grid-title-property", "PUBCHEM_COMPOUND_CID");
      indigo.setOption("render-grid-title-font-size", "10");
      indigo.setOption("render-grid-title-offset", "2");
      indigo.setOption("render-relative-thickness", 1.5);
      indigo.setOption("render-coloring", "true");
      indigo.setOption("render-background-color", 1.f, 1.f, 1.f);
      renderer.renderGridToFile(collection, refatoms, 4, "indigo_grid_white.png");
   }
}

Indigo/Python[]

import indigo
import indigo_renderer
indigo = indigo.Indigo()
renderer = indigo_renderer.IndigoRenderer(indigo)

query = indigo.loadSmarts("[#7]1~[#6]~[#6]~[#7]~[#6]~[#6]2~[#6]~[#6]~[#6]~[#6]~[#6]~1~2");
xyz = []
collection = indigo.createArray()
refatoms = []

for structure in indigo.iterateSDFile("benzodiazepine.sdf.gz"):
  structure.foldHydrogens()
  match = indigo.substructureMatcher(structure).match(query)
  if structure.index() == 0:
    for atom in query.iterateAtoms():
      xyz.extend(match.mapAtom(atom).xyz())
  else:
    atoms = [match.mapAtom(atom).index() for atom in query.iterateAtoms()]
    structure.alignAtoms(atoms, xyz);

  refatoms.append(match.mapAtom(query.getAtom(0)).index())
  collection.arrayAdd(structure)

  if structure.index() == 15:
    break;

indigo.setOption("render-output-format", "png");
indigo.setOption("render-image-size", "400, 400");
indigo.setOption("render-grid-title-property", "PUBCHEM_COMPOUND_CID");
indigo.setOption("render-grid-title-font-size", "10");
indigo.setOption("render-grid-title-offset", "2");
indigo.setOption("render-coloring", "true");
indigo.setOption("render-coloring", "true");
indigo.setOption("render-background-color", 1.0, 1.0, 1.0);
renderer.renderGridToFile(collection, refatoms, 4, "indigo_grid_white.png")

OpenEye/Python[]

Aligned depictions with the OpenEye toolkits.

Output from OpenEye/Python


from openeye.oechem import *
from openeye.oedepict import *

CELL_WIDTH = CELL_HEIGHT = 100

def make_view():
    view = OEDepictView(CELL_WIDTH, CELL_HEIGHT)
    view.SetSuperAtoms(False)
    view.SetAromaticCircles(False)
    view.SetAromaticDashes(True)
    view.SetColorOnBlack(True)
    view.SetBackColor(0, 0, 0)
    view.SetForeColor(255, 255, 255)
    view.SetShowHydrogens(True)
    view.SetDativeBonds(False)
    return view

# This assumes the input was in proper SD format already
def prepare_mol_for_depiction(mol):
    if mol.GetDimension() == 3:
        OEPerceiveChiral(mol)
        OE3DToBondStereo(mol)
        OE3DToAtomStereo(mol)
    OEAddDepictionHydrogens(mol)

core_subsearch = OESubSearch("[#7]~1~[#6]~[#6]~[#7]~[#6]~[#6]~2~[#6]~[#6]~[#6]~[#6]~[#6]12")
ifs = oemolistream("benzodiazepine.sdf.gz")
img = OE8BitImage(CELL_WIDTH*4, CELL_HEIGHT*4)
view = make_view()

at_first_structure = True
for record_num, mol in enumerate(ifs.GetOEGraphMols()):
    prepare_mol_for_depiction(mol)

    for match in core_subsearch.Match(mol):
        if at_first_structure:
            OEDepictCoordinates(mol)
            # save the first structure's depiction coordinates
            coords = [mol.GetCoords(atom) for atom in match.GetTargetAtoms()]
        else:
            # Align to the first structure's coordinates
            fixed = [0] * mol.GetMaxAtomIdx()
            for atom, coord in zip(match.GetTargetAtoms(), coords):
                mol.SetCoords(atom, coord)
                fixed[atom.GetIdx()] = 1
            OEDepictFixedCoordinates(mol, fixed)

        view.SetMolecule(mol)
        view.RenderImage(img, at_first_structure,
                         record_num//4 * CELL_WIDTH, record_num %4 * CELL_HEIGHT)
        at_first_structure = False
        
        # Only align the first match
        break

    # Stop after 16 structures
    if record_num == 15:
        break

# Save the final image
ofs = oeofstream("align_oe.gif")
OEWriteGIF(ofs, img)

RDKit/Python[]

Aligned depictions with the RDKit toolkit.

Output from RDKit/Python

from rdkit import Chem,Geometry
from rdkit.Chem import AllChem

# read in the first 16 molecules:
suppl = Chem.SDMolSupplier('benzodiazepine.sdf')
first16 = [suppl[x] for x in range(16)]

# do the substructure matching:
patt = Chem.MolFromSmarts("[#7]~1~[#6]~[#6]~[#7]~[#6]~[#6]~2~[#6]~[#6]~[#6]~[#6]~[#6]12")
matchVs= [x.GetSubstructMatch(patt) for x in first16]

# compute the reference coordinates:
AllChem.Compute2DCoords(first16[0])
coords = [first16[0].GetConformer().GetAtomPosition(x) for x in matchVs[0]]
coords2D = [Geometry.Point2D(pt.x,pt.y) for pt in coords]

# now generate coords for the other molecules using that reference:
for molIdx in range(1,16):
    mol = first16[molIdx]
    coordDict={}
    for i,coord in enumerate(coords2D):
        coordDict[matchVs[molIdx][i]] = coord
    AllChem.Compute2DCoords(mol,coordMap=coordDict)

# generate 100x100 images for each molecule:
from rdkit.Chem import Draw
imgs = [Draw.MolToImage(x,size=(100,100)) for x in first16]

# and use PIL to combine them:
import Image
img = Image.new('RGBA',(400,400))
for i in range(16): img.paste(imgs[i],(100*(i%4),100*(i//4),100*(i%4+1),100*(i//4+1)))
img.save('combined.rdkit.png')
Advertisement