GGA Software Services

Concepts

File Formats

Indigo supports the following molecule and reaction formats:

Daylight Formats with ChemAxon and CurlySMILES Extensions

Almost all features of the original Daylight SMILES format are supported, including:

The only features that are not supported are:

Almost all features of the original Daylight SMARTS format are supported, including:

The only features that are not supported are:

The following ChemAxon SMILES extensions are supported:

The following extensions from the CurlySMILES format are supported:

MDL Formats

MDL (Symyx) Molfiles and Rxnfiles are supported by Indigo. Both format versions (2000 and 3000) are supported. Almost all format features are supported, including:

The only features that are not supported are:

Molecules and Query Molecules

Indigo treats "real" molecules differently from query molecules in certain ways.

Similar rules apply to "real" reactions and query reactions:

There is never any ambiguity about whether a particular object is a molecule/reaction or a query molecule/reaction. While most input formats (Molfile/Rxnfile, SMILES) can store both types, different calls are provided for loading them as "real" or query objects. SMARTS strings are always loaded as query molecules, because SMARTS is a query notation by design.

Substructure and SMARTS Matching

Indigo provides full capabilities for substructure matching of query molecules, including ones loaded from SMARTS expressions. In the Bingo User Manual, you can find examples of substructure matches.
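To make the idea of a substructure match concrete, here is a minimal pure-Python sketch of a backtracking search over labeled graphs. It is an illustration only and not Indigo's matcher: the function name find_substructure and the graph encoding are assumptions made for this example, and bond orders, aromaticity, stereochemistry, and query features are all ignored.

```python
def find_substructure(query, target):
    """Return one mapping {query atom index -> target atom index}, or None.

    Each graph is a pair (labels, bonds): labels is a list of element
    symbols and bonds is a set of frozenset({i, j}) atom-index pairs.
    """
    q_labels, q_bonds = query
    t_labels, t_bonds = target

    def extend(mapping):
        if len(mapping) == len(q_labels):
            return dict(mapping)
        q_atom = len(mapping)  # query atoms are mapped in index order
        for t_atom in range(len(t_labels)):
            if t_atom in mapping.values() or t_labels[t_atom] != q_labels[q_atom]:
                continue
            # Every query bond to an already mapped atom must exist in the target.
            bonds_ok = all(
                frozenset((t_atom, mapping[prev])) in t_bonds
                for prev in mapping
                if frozenset((q_atom, prev)) in q_bonds
            )
            if bonds_ok:
                mapping[q_atom] = t_atom
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[q_atom]  # backtrack and try the next candidate
        return None

    return extend({})


# Toy heavy-atom graphs: ethanol (C-C-O) as the target, C-O as the query.
ethanol = (["C", "C", "O"], {frozenset((0, 1)), frozenset((1, 2))})
c_o_query = (["C", "O"], {frozenset((0, 1))})
nitrogen_query = (["N"], set())

match = find_substructure(c_o_query, ethanol)
no_match = find_substructure(nitrogen_query, ethanol)
```

The search first tries the terminal carbon, fails because it is not bonded to the oxygen, and then backtracks to the second carbon. Extra bonds in the target never prevent a match, which is what separates substructure matching from exact matching.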

Differences between SMARTS and query SMILES

While a lot of SMARTS notation is allowed when loading SMILES as a query, there are differences between SMARTS and query SMILES:

Exact Matching

Indigo can perform exact matching of pairs of molecules, or pairs of reactions.

Molecule Similarity

Indigo provides various molecule similarity measures, all based on its own original fingerprints. In the Bingo User Manual, you can find a detailed description and examples.
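As a rough illustration of how a fingerprint-based similarity measure works, the sketch below computes a Tanimoto coefficient over two sets of "on" bit positions. The choice of the Tanimoto metric, the function name, and the toy bit sets are assumptions made for this example; they are not Indigo's fingerprints or its implementation.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / union


# Hypothetical fingerprints: 3 shared bits out of 6 distinct bits in total.
fp_first = {1, 4, 7, 9}
fp_second = {1, 4, 8, 9, 12}
score = tanimoto(fp_first, fp_second)
print(score)
```

The score ranges from 0.0 (no shared bits) to 1.0 (identical bit sets), so a threshold on it can serve as a simple similarity filter.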

Canonical SMILES

Canonical SMILES generated by Indigo are, according to Daylight and ChemAxon terminology, unique SMILES with isomeric information, or absolute SMILES. All significant molecular features, such as isotopes, charges, radicals, stereocenters, stereogroups, cis-trans bonds, and aromaticity, are encoded into SMILES in a canonical form. A canonical SMILES string defines the molecule independently of any particular representation (atom renumbering, stereogroup renumbering, explicit/implicit hydrogens). So, the equality of the canonical SMILES of two molecules guarantees that these molecules are the same, and vice versa.
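Because equal canonical SMILES imply the same molecule and vice versa, the string can be used directly as a lookup key, for example to find duplicate records. The sketch below assumes the canonical SMILES have already been computed elsewhere; the record ids and the SMILES strings are placeholders invented for illustration, not verified Indigo output.

```python
from collections import defaultdict


def group_by_canonical_smiles(records):
    """Group record ids by canonical SMILES; each group is one distinct molecule."""
    groups = defaultdict(list)
    for record_id, canonical_smiles in records:
        groups[canonical_smiles].append(record_id)
    return dict(groups)


# Placeholder (id, canonical SMILES) pairs standing in for canonicalizer output.
records = [
    ("reg-001", "CCO"),
    ("reg-002", "c1ccccc1"),
    ("reg-003", "CCO"),
]

groups = group_by_canonical_smiles(records)
# Keep only the molecules that were registered more than once.
duplicates = {smiles: ids for smiles, ids in groups.items() if len(ids) > 1}
print(duplicates)
```

This only works if every string comes from the same canonicalization procedure; strings produced by different toolkits are generally not comparable.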

'Useless' stereocenters

A stereocenter is not considered useful when it does not provide any information for distinguishing stereoisomers. Such useless stereocenters are ignored in canonical SMILES generated by Indigo.

From the pictures below, you can see that all three molecules specify the same mixture. This is reflected in the fact that Indigo gives identical canonical SMILES for them.

Canonical SMILES:
C[C@@H]1CC(C(=O)N1)1N2CC(C)3CN1CC(C)(C2)C3=O
Canonical SMILES:
C[C@@H]1CC(C(=O)N1)1N2CC(C)3CN1CC(C)(C2)C3=O
Canonical SMILES:
C[C@@H]1CC(C(=O)N1)1N2CC(C)3CN1CC(C)(C2)C3=O

Note: Query features are not supported for canonicalization.

Scaffold Detection

Indigo incorporates two algorithms (exact and approximate) for maximum common substructure (MCS) computation. Each of the algorithms can operate on an arbitrary number of input structures. The scaffold that is found can then be passed to the R-Group deconvolution procedure.

Moreover, if the scaffold detection procedure has found more than one MCS, it is possible to obtain all of them.

R-Group Deconvolution

With a collection of structures and a scaffold that is common for these structures, it is possible to perform the R-Group deconvolution (R-Group decomposition). The result of this procedure will be a scaffold with marked R-sites (R1, R2, ...), and the actual substituents for these R-sites for each of the input structures.

Examples are available on a separate page.

Layout

Indigo is capable of performing layout (cleanup) of molecules and reactions. After the layout procedure, the average length of the bonds in a molecule will always be around 1.0. The procedure does not depend on the coordinates the molecule had before layout.
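The bond-length convention matters when coordinates from different sources are mixed. The sketch below measures the average bond length of a 2D structure and rescales the coordinates so that the average becomes 1.0. It is not Indigo's layout algorithm, which computes new coordinates; the function names and the square-ring coordinates are assumptions made for this example.

```python
import math


def mean_bond_length(coords, bonds):
    """Average Euclidean length of the bonds, given 2D atom coordinates."""
    return sum(math.dist(coords[i], coords[j]) for i, j in bonds) / len(bonds)


def rescale_to_unit_bonds(coords, bonds):
    """Scale all coordinates uniformly so the average bond length is 1.0."""
    scale = 1.0 / mean_bond_length(coords, bonds)
    return [(x * scale, y * scale) for x, y in coords]


# Hypothetical four-membered ring drawn with bonds of length 1.5.
coords = [(0.0, 0.0), (1.5, 0.0), (1.5, 1.5), (0.0, 1.5)]
bonds = [(0, 1), (1, 2), (2, 3), (3, 0)]

before = mean_bond_length(coords, bonds)
normalized = rescale_to_unit_bonds(coords, bonds)
after = mean_bond_length(normalized, bonds)
print(before, after)
```

Uniform scaling preserves angles and relative positions, so it changes only the size of the drawing.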

Rendering

Indigo provides high-quality 2D rendering capabilities for molecules and reactions. All the chemical features (including query features) are rendered properly following the IUPAC recommendations (1, 2) for graphical representation. The features that are not covered by IUPAC (mostly, query features) are drawn in such a way that they do not overlay the primary structure.

With Indigo, it is possible to display highlighted bonds and atoms with specified color and/or with thick lines and bold characters.

The full list of options is available on the options page.

The following output formats are supported for rendering:

On Windows platforms, Indigo is also able to:

Produced PNGs and Bitmaps are transparent unless the background is set explicitly. Produced SVGs, PDFs, and Metafiles contain no raster fragments.

In the Bingo User Manual, you can find examples of rendered molecules and reactions. All the pictures in this manual were rendered to SVG by Indigo.

Combinatorial Chemistry

Indigo provides a reaction products enumerator, which has the following features:

The full list of options is available on the options page.

The examples are available on a separate page.