chi_soft_smiles

SMILES (Simplified Molecular Input Line Entry System) is widely used throughout the Daylight Toolkit. SMILES is a line notation (a typographical method using printable characters) for entering and representing molecules. Some examples are:

SMILES       Name                SMILES              Name
CC           ethane              [OH3+]              hydronium ion
O=C=O        carbon dioxide      [2H]O[2H]           deuterium oxide
C#N          hydrogen cyanide    [235U]              uranium-235
CCN(CC)CC    triethylamine       F/C=C/F             E-difluoroethene
CC(=O)O      acetic acid         F/C=C\F             Z-difluoroethene
C1CCCCC1     cyclohexane         N[C@@H](C)C(=O)O    L-alanine
c1ccccc1     benzene             N[C@H](C)C(=O)O     D-alanine

SMILES contains the same information as might be found in an extended connection table. The primary reason SMILES is more useful than a connection table is that it is a linguistic construct, rather than a computer data structure. SMILES is a true language, albeit with a simple vocabulary (atom and bond symbols) and only a few grammar rules. SMILES representations of structure can in turn be used as "words" in the vocabulary of other languages designed for storage of chemical information (information about chemicals) and chemical intelligence (information about chemistry).
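To make the idea of a small vocabulary concrete, the following is a minimal sketch of a tokenizer that splits a SMILES string into atom, bond, ring-closure and branch symbols. The token names and the `tokenize` and `count_atoms` helpers are assumptions made for illustration; the pattern covers only the constructs that appear in the examples above and is not a complete SMILES grammar.

```python
import re

# Hypothetical token classes, covering only the constructs shown in the
# example table (not the full SMILES specification).
TOKEN_SPEC = [
    ("bracket_atom", r"\[[^\]]+\]"),          # e.g. [OH3+], [235U], [C@@H]
    ("atom", r"Cl|Br|[BCNOPSFI]|[bcnops]"),   # lowercase means aromatic
    ("bond", r"[-=#:/\\]"),                   # explicit and directional bonds
    ("ring_closure", r"%\d\d|\d"),            # ring-closure labels
    ("branch", r"[()]"),                      # branch open and close
    ("dot", r"\."),                           # disconnected parts
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(smiles):
    """Split a SMILES string into (kind, text) tokens, left to right."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN_RE.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def count_atoms(smiles):
    """Count atom tokens (explicitly written atoms only)."""
    return sum(1 for kind, _ in tokenize(smiles) if kind in ("atom", "bracket_atom"))


if __name__ == "__main__":
    for example in ["CC(=O)O", "c1ccccc1", "N[C@@H](C)C(=O)O", "F/C=C\\F"]:
        print(example, tokenize(example))
```

A tokenizer of this kind only recognizes the vocabulary. Checking the grammar (balanced branches, matched ring-closure digits, valid valences) would need a parser on top of it.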

Part of the power of SMILES comes from the fact that an algorithm exists (available in the Daylight Toolkit) to produce a unique SMILES. With standard SMILES, the name of a molecule is synonymous with its structure; with unique SMILES, the name is universal. Anyone in the world who uses unique SMILES to name a molecule will choose the exact same name.
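The need for a unique form can be illustrated with a small sketch. One molecule can be written as several valid SMILES strings (ethanol as CCO, OCC or C(O)C), so a database key has to be derived from the structure rather than from the string. The code below is not the Daylight algorithm, which this text does not describe. It is a brute-force stand-in that parses a small SMILES subset (unbracketed atoms, -, =, # and : bonds, branches, single-digit ring closures) and picks the smallest relabelling of the resulting graph. The names `parse` and `canonical_key` are hypothetical, and the approach is only practical for molecules of fewer than about eight atoms.

```python
from itertools import permutations

BOND_CHARS = "-=#:"


def parse(smiles):
    """Parse a small SMILES subset into (atom symbols, {(i, j): bond symbol})."""
    atoms, bonds = [], {}
    branch_stack, open_rings = [], {}
    prev, pending = None, None

    def add_bond(a, b, symbol):
        if symbol is None:
            # Implicit bond: aromatic between two lowercase atoms, else single.
            symbol = ":" if atoms[a].islower() and atoms[b].islower() else "-"
        bonds[(min(a, b), max(a, b))] = symbol

    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch in BOND_CHARS:
            pending = ch
        elif ch.isdigit():
            if ch in open_rings:
                other, opening_symbol = open_rings.pop(ch)
                add_bond(other, prev, pending or opening_symbol)
            else:
                open_rings[ch] = (prev, pending)
            pending = None
        elif ch.isalpha():
            two = smiles[i:i + 2]
            symbol = two if two in ("Cl", "Br") else ch
            i += len(symbol) - 1
            atoms.append(symbol)
            current = len(atoms) - 1
            if prev is not None:
                add_bond(prev, current, pending)
            prev, pending = current, None
        else:
            raise ValueError(f"unsupported character {ch!r}")
        i += 1
    return atoms, bonds


def canonical_key(smiles):
    """Return a relabelling-independent key by trying every atom ordering."""
    atoms, bonds = parse(smiles)
    best = None
    for perm in permutations(range(len(atoms))):
        # perm[old] is the new index assigned to original atom `old`.
        labels = [None] * len(atoms)
        for old, new in enumerate(perm):
            labels[new] = atoms[old]
        edges = sorted(
            (min(perm[a], perm[b]), max(perm[a], perm[b]), symbol)
            for (a, b), symbol in bonds.items()
        )
        key = (tuple(labels), tuple(edges))
        if best is None or key < best:
            best = key
    return best


if __name__ == "__main__":
    index = {}
    for name, smiles in [("ethanol", "CCO"), ("ethanol again", "C(O)C"),
                         ("dimethyl ether", "COC")]:
        index.setdefault(canonical_key(smiles), []).append(name)
    print(list(index.values()))
```

In this sketch the two spellings of ethanol fall under one dictionary key while dimethyl ether gets its own, which is the property that makes a unique name usable as a database key. A production canonicalizer would also have to handle hydrogens, charges, isotopes and stereochemistry, and would use a far more efficient ordering scheme than exhaustive search.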

One other important property of SMILES is that it is quite compact compared to most other methods of representing structure. A typical SMILES will take 50% to 70% less space than an equivalent connection table, even binary connection tables. For example, one database of 23,137 structures, with an average of 20 atoms per structure, uses only 1.6 bytes per atom when represented with SMILES. In addition, ordinary compression of SMILES, such as Ziv-Lempel (used by UNIX's compress(1) utility), is extremely effective: The same database cited above was reduced to 27% of its original size by Ziv-Lempel compression (i.e. 0.42 bytes per atom).
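The figures in this paragraph can be checked with a few lines of arithmetic, and the compression claim can be tried on one's own data. The sketch below makes two assumptions: the variable names are illustrative, and `zlib` (DEFLATE, a different member of the Lempel-Ziv family from the LZW scheme used by compress(1)) stands in for the compressor named in the text, so ratios obtained with it will not match the quoted 27% exactly. Multiplying the rounded figures gives 1.6 × 0.27 ≈ 0.43 bytes per atom, which agrees with the quoted 0.42 to within rounding.

```python
import zlib


def compression_ratio(smiles_list):
    """Return compressed size / raw size for newline-separated SMILES."""
    raw = "\n".join(smiles_list).encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


# Figures quoted in the text.
structures = 23_137
atoms_per_structure = 20
raw_bytes_per_atom = 1.6
quoted_ratio = 0.27

total_atoms = structures * atoms_per_structure
raw_size = total_atoms * raw_bytes_per_atom
compressed_bytes_per_atom = raw_size * quoted_ratio / total_atoms

if __name__ == "__main__":
    print(f"total atoms:               {total_atoms}")
    print(f"raw SMILES size (bytes):   {raw_size:.0f}")
    print(f"compressed bytes per atom: {compressed_bytes_per_atom:.3f}")
```

`compression_ratio` is meant for a real collection of SMILES. On a handful of short strings the fixed overhead of the compressed stream dominates, so the result says little about what a large database would achieve.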

These properties, of being unique, compact, human understandable, machine readable, and universal, open many doors to the chemical information programmer. Examples of uses for SMILES are: keys for database access; a mechanism for researchers to exchange chemical information; an entry system for chemical data; and part of languages for Artificial Intelligence or Expert Systems in chemistry.

The rest of this chapter is a concise exposition of the SMILES encoding rules. For further information, the reader is referred to "SMILES 1. Introduction and Encoding Rules", Weininger, D., J. Chem. Inf. Comput. Sci. 1988, 28, 31.

For further information and the full course material, please go to DAYLIGHT.COM.

Courtesy: http://www.daylight.com