Bingo Chemistry Cartridge
Solution
Cheminformatics applications often need to search for molecules in large chemical databases. GGA developed Bingo, a molecular search cartridge that works with molecules presented in MDL Molfile format and stored in Oracle CLOB (Character Large Object) columns. After Bingo is installed into an Oracle or MS SQL Server database, any user can call a set of search operators that accept CLOB values, and can index any table with a CLOB column so that the same operators run faster on that table's rows. Index access is transparent to the user. Bingo supports multi-user parallel work with any number of indexed tables, and any number of columns in a table can be indexed. The cartridge implements a comprehensive set of graph theory algorithms and provides high search performance.
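Bingo's internal index format is not described in this overview. Purely as an illustration of why an index can speed up the same operator transparently, the Python sketch below shows the general screen-then-match idea used by chemical search engines: a fingerprint computed once at indexing time lets the substructure operator reject most rows cheaply (screening) before running an exact graph match (matching). The `Mol` class, the feature scheme, the 64-bit folded fingerprint, and every function name are hypothetical and are not Bingo's API; hydrogens are assumed to be implicit.

```python
import zlib
from dataclasses import dataclass

FP_BITS = 64  # hypothetical fingerprint width


@dataclass(frozen=True)
class Mol:
    """Hydrogen-suppressed molecular graph (hypothetical representation)."""
    atoms: tuple  # element symbol per atom index
    bonds: tuple  # (i, j, order) triples

    def adjacency(self) -> dict:
        adj = {i: {} for i in range(len(self.atoms))}
        for i, j, order in self.bonds:
            adj[i][j] = order
            adj[j][i] = order
        return adj


def features(mol: Mol) -> set:
    """Atom symbols plus labelled bonds such as 'C:2:O'."""
    feats = set(mol.atoms)
    for i, j, order in mol.bonds:
        a, b = sorted((mol.atoms[i], mol.atoms[j]))
        feats.add(f"{a}:{order}:{b}")
    return feats


def fingerprint(mol: Mol) -> int:
    """Fold the features into a FP_BITS-wide bit mask (computed at index time)."""
    fp = 0
    for feat in features(mol):
        fp |= 1 << (zlib.crc32(feat.encode()) % FP_BITS)
    return fp


def is_substructure(query: Mol, target: Mol) -> bool:
    """Backtracking subgraph match: element symbols and bond orders must agree."""
    qadj, tadj = query.adjacency(), target.adjacency()
    mapping = {}  # query atom index -> target atom index
    used = set()

    def extend(k: int) -> bool:
        if k == len(query.atoms):
            return True
        for t, symbol in enumerate(target.atoms):
            if t in used or symbol != query.atoms[k]:
                continue
            # Every bond from k to an already-mapped query atom must exist in the target.
            if all(tadj[t].get(mapping[q]) == order
                   for q, order in qadj[k].items() if q in mapping):
                mapping[k] = t
                used.add(t)
                if extend(k + 1):
                    return True
                del mapping[k]
                used.discard(t)
        return False

    return extend(0)


def build_index(rows: dict) -> list:
    """'Index' a table: store each row's fingerprint next to its structure."""
    return [(row_id, mol, fingerprint(mol)) for row_id, mol in rows.items()]


def substructure_search(index: list, query: Mol) -> list:
    qfp = fingerprint(query)
    hits = []
    for row_id, mol, fp in index:
        if (qfp & fp) != qfp:  # screening: a required feature bit is missing
            continue
        if is_substructure(query, mol):  # matching: exact graph check
            hits.append(row_id)
    return hits


rows = {
    "ethanol": Mol(("C", "C", "O"), ((0, 1, 1), (1, 2, 1))),
    "acetic_acid": Mol(("C", "C", "O", "O"), ((0, 1, 1), (1, 2, 2), (1, 3, 1))),
    "benzene": Mol(("C",) * 6, tuple((i, (i + 1) % 6, 1 + i % 2) for i in range(6))),
}
index = build_index(rows)
print(substructure_search(index, Mol(("C", "O"), ((0, 1, 1),))))  # ['ethanol', 'acetic_acid']
```

Screening can only produce false positives (folded bits may collide), never false negatives, because every feature of a true substructure is also a feature of the target; the exact match then removes any false positives.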
The solution provides the features required by modern cheminformatics applications, including some not found in other cartridges, such as advanced tautomer search, resonance substructure search, and fast index updates when new structures are added. Bingo is an open-source product that makes it possible to build a fully relational registration system: records can be inserted, deleted, and updated in a relational structure or reaction database. It is also available on a commercial basis.
Technologies
- Environments: Windows 32/64 bit, Linux 32/64 bit, Mac OS X, Solaris.
- Databases: Oracle, MS SQL Server.
- Languages: C++, PL/SQL.
Features
- Interface to the Oracle cost-based optimizer where possible, particularly for SMILES queries.
- Reliability, with no legacy code to maintain.
- Industry-leading performance among search cartridges in both the screening and matching phases of the various search types, especially substructure search.
- Effective memory management without unnecessary reallocations. Communication with the underlying database, especially LOB handling, is also optimized. During substructure searches, molecules and reactions are kept in shared memory for faster access.
- Searching in a variety of ways, all available through extensions to the SQL syntax:
  - Molecule searches: 2D and 3D exact and substructure search, as well as similarity, tautomer, Markush, formula, molecular weight, and flexmatch search. Canonical SMILES with isomeric information is also available.
  - Reaction searches: reaction substructure search (RSS), with optional automatic generation of atom-to-atom mapping.
- Support for fragment highlighting in substructure, tautomer, and reaction substructure searches.
- Several indexing options for balancing storage space requirements against registration speed.
- Flexibility and scalability of a true data cartridge:
  - Fully compliant with Oracle and MS SQL Server; structure and reaction tables can be placed anywhere in the database, and the data can be accessed by any SQL-compliant application.
  - Native binaries for Windows 32/64 bit, Linux 32/64 bit, Mac OS X, and Solaris.
  - A combined structure and reaction cartridge, with 2D and 3D search features supported by the same index.
  - Support for large databases (up to 28 million structures).
- Simple installation procedure.
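Similarity search is commonly implemented by comparing structural fingerprints. This overview does not say which fingerprints or metrics Bingo uses, so the Python sketch below simply assumes bit-vector fingerprints stored as integers and scores them with the Tanimoto coefficient, |A ∩ B| / |A ∪ B|, one widely used measure. The function names, the `threshold` parameter, and the sample fingerprints are hypothetical.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit-vector fingerprints."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # convention: two empty fingerprints count as identical
    return popcount(a & b) / union


def similarity_search(fingerprints: dict, query: int, threshold: float) -> list:
    """Return (row_id, score) pairs with score >= threshold, best first."""
    scored = [(row_id, tanimoto(query, fp)) for row_id, fp in fingerprints.items()]
    hits = [(row_id, score) for row_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


table = {"m1": 0b1011, "m2": 0b0111, "m3": 0b1000}  # hypothetical fingerprints
print(similarity_search(table, 0b0111, 0.5))  # [('m2', 1.0), ('m1', 0.5)]
```

In a real cartridge the fingerprints would come from the index rather than being recomputed per query, and screening by bit counts can skip rows whose score cannot reach the threshold.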
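Formula and molecular-weight searches compare a value derived from each stored structure with the query. As a hedged sketch, not Bingo's implementation or query syntax, the Python code below assumes each structure has already been reduced to element counts that include hydrogens. It builds a Hill-order formula string (carbon first, then hydrogen, then the remaining elements alphabetically; purely alphabetical when there is no carbon) and filters rows by formula or by a molecular-weight range. The element table is a small illustrative subset of standard atomic weights, and all names are hypothetical.

```python
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                  "S": 32.06, "Cl": 35.45}  # illustrative subset only


def hill_formula(counts: dict) -> str:
    """Formula string in Hill order, omitting counts of 1."""
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def molecular_weight(counts: dict) -> float:
    return sum(ATOMIC_WEIGHTS[e] * n for e, n in counts.items())


def formula_search(rows: dict, formula: str) -> list:
    return [row_id for row_id, c in rows.items() if hill_formula(c) == formula]


def weight_search(rows: dict, low: float, high: float) -> list:
    return [row_id for row_id, c in rows.items() if low <= molecular_weight(c) <= high]


rows = {
    "ethanol": {"C": 2, "H": 6, "O": 1},
    "dimethyl_ether": {"C": 2, "H": 6, "O": 1},
    "acetic_acid": {"C": 2, "H": 4, "O": 2},
    "water": {"H": 2, "O": 1},
}
print(formula_search(rows, "C2H6O"))    # ['ethanol', 'dimethyl_ether']
print(hill_formula(rows["water"]))      # H2O
print(weight_search(rows, 40.0, 50.0))  # ['ethanol', 'dimethyl_ether']
```

Note that a formula search cannot distinguish isomers such as ethanol and dimethyl ether; that is what exact and substructure searches are for.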