|
SMILES (Simplified Molecular Input Line Entry
System) is widely used throughout the Daylight Toolkit. SMILES is a line
notation (a typographical method using printable characters) for entering
and representing molecules. Some examples are:
| SMILES | Name | SMILES | Name |
| CC | ethane | [OH3+] | hydronium ion |
| O=C=O | carbon dioxide | [2H]O[2H] | deuterium oxide |
| C#N | hydrogen cyanide | [235U] | uranium-235 |
| CCN(CC)CC | triethylamine | F/C=C/F | E-difluoroethene |
| CC(=O)O | acetic acid | F/C=C\F | Z-difluoroethene |
| C1CCCCC1 | cyclohexane | N[C@@H](C)C(=O)O | L-alanine |
| c1ccccc1 | benzene | N[C@H](C)C(=O)O | D-alanine |
SMILES contains the same information as might be
found in an extended connection table. The primary reason SMILES is more useful
than a connection table is that it is a linguistic construct, rather than a
computer data structure. SMILES is a true language, albeit with a simple
vocabulary (atom and bond symbols) and only a few grammar rules. SMILES
representations of structure can in turn be used as "words" in the vocabulary of
other languages designed for storage of chemical information (information about
chemicals) and chemical intelligence (information about chemistry).
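
As a rough illustration of how small the SMILES vocabulary is, the sketch below splits a SMILES string into atom, bond, branch, ring-closure, and disconnection tokens. It is a simplified lexer, not the Daylight Toolkit parser: the names `TOKEN_RE` and `tokenize` are hypothetical, only bracket atoms and the organic-subset atoms are recognized, and no grammar checks (balanced parentheses, paired ring closures) are performed.

```python
import re

# One named alternative per kind of token. Two-letter atoms (Br, Cl) are
# listed before the single letters so that "Cl" is not read as "C" + "l".
TOKEN_RE = re.compile(
    r"(?P<atom>\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops])"  # bracket or organic-subset atom
    r"|(?P<bond>[-=#:/\\])"                            # single, double, triple, aromatic, directional
    r"|(?P<branch>[()])"                               # branch open / close
    r"|(?P<ring>%\d\d|\d)"                             # ring-closure label
    r"|(?P<dot>\.)"                                    # disconnected parts
)


def tokenize(smiles):
    """Return a list of (kind, text) pairs for a SMILES string.

    Assumption: this is only a lexer for illustration; it does not
    validate that the tokens form a chemically meaningful SMILES.
    """
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN_RE.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at index {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for example in ["CC(=O)O", "C1CCCCC1", "N[C@@H](C)C(=O)O"]:
        print(example, tokenize(example))
```

Keeping the lexer to a single regular expression mirrors the point made above: the whole vocabulary fits in a handful of alternatives.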
Part of the power of SMILES comes from the fact
that an algorithm exists (available in the Daylight Toolkit) to produce a
unique SMILES. With standard SMILES, the name of a molecule is synonymous
with its structure; with unique SMILES, the name is universal. Anyone in the
world who uses unique SMILES to name a molecule will choose the exact same
name.
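
To see why a unique form matters, note that one molecule usually has several valid SMILES: ethanol can be written `CCO` or `OCC`. The sketch below is a deliberately tiny stand-in for a unique-SMILES algorithm, and it is not the Daylight algorithm. It assumes the input is an unbranched, single-bonded chain spelled only with single-letter organic-subset atoms, in which case the only two spellings are the string and its reverse, and it picks the lexicographically smaller one. The names `toy_unique_chain` and `registry` are hypothetical.

```python
# Single-letter organic-subset atoms only; Cl and Br are excluded because
# reversing a string would scramble two-letter symbols.
SINGLE_LETTER_ATOMS = set("BCNOPSFI")


def toy_unique_chain(smiles):
    """Choose one spelling for an unbranched chain of single-letter atoms.

    Assumption: no branches, rings, charges, or explicit bonds. Real
    unique SMILES generation handles arbitrary molecular graphs.
    """
    if not smiles or any(ch not in SINGLE_LETTER_ATOMS for ch in smiles):
        raise ValueError("toy canonicalizer only handles chains of single-letter atoms")
    return min(smiles, smiles[::-1])


# Using the unique form as a dictionary key merges different spellings
# of the same molecule into a single entry.
registry = {}
for spelling, note in [("CCO", "entered by user A"), ("OCC", "entered by user B")]:
    registry.setdefault(toy_unique_chain(spelling), []).append(note)

if __name__ == "__main__":
    print(registry)
```

Both spellings land under the same key, which is the property that makes a unique SMILES usable as a universal name.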
One other important property of SMILES is that it
is quite compact compared to most other methods of representing structure. A
typical SMILES takes 50% to 70% less space than an equivalent connection
table, even a binary one. For example, one database of 23,137
structures, with an average of 20 atoms per structure, uses only 1.6 bytes per
atom when represented with SMILES. In addition, ordinary compression of SMILES,
such as Ziv-Lempel (used by UNIX's compress(1) utility), is extremely effective:
the same database was reduced to 27% of its original size by Ziv-Lempel
compression (i.e., 0.42 bytes per atom).
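
The bytes-per-atom and compression figures can be estimated for any list of SMILES with a few lines of Python. This is a minimal sketch under stated assumptions: atoms are counted with a simplified pattern (bracket atoms plus organic-subset symbols, so implicit hydrogens are not counted), and `zlib` is used as a stand-in compressor. `zlib` implements DEFLATE, a different member of the Ziv-Lempel family than the LZW method used by compress(1), so its ratios will not match the quoted ones exactly. The function names are hypothetical.

```python
import re
import zlib

# Simplified atom pattern: a bracket atom, a two-letter halogen, or a
# single-letter aliphatic / aromatic organic-subset atom.
ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def count_atoms(smiles):
    """Count explicitly written atoms (implicit hydrogens are not counted)."""
    return len(ATOM_RE.findall(smiles))


def bytes_per_atom(smiles_list):
    """Average storage cost per atom for uncompressed SMILES."""
    total_bytes = sum(len(s.encode("ascii")) for s in smiles_list)
    total_atoms = sum(count_atoms(s) for s in smiles_list)
    return total_bytes / total_atoms


def compressed_fraction(smiles_list):
    """Compressed size divided by raw size, one SMILES per line."""
    raw = "\n".join(smiles_list).encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    examples = ["CC", "O=C=O", "C#N", "CCN(CC)CC", "CC(=O)O", "C1CCCCC1", "c1ccccc1"]
    print("bytes per atom:", bytes_per_atom(examples))
    # Arithmetic behind the quoted figures: 27% of 1.6 bytes per atom.
    # With these rounded inputs the product is about 0.43, close to the
    # 0.42 quoted in the text.
    print("compressed bytes per atom:", round(1.6 * 0.27, 2))
```

A handful of short strings will not compress well; `compressed_fraction` is only meaningful on a database-sized input, where repeated substructures give the compressor something to exploit.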
These properties (being unique, compact, human-understandable,
machine-readable, and universal) open many doors to the chemical
information programmer. Examples of uses for SMILES are: keys for database
access; a mechanism for researchers to exchange chemical information; an entry
system for chemical data; and part of languages for artificial intelligence or
expert systems in chemistry.
The rest of this chapter is a concise exposition
of the SMILES encoding rules. For further information, the reader is referred to "SMILES
1. Introduction and Encoding Rules", Weininger, D., J. Chem. Inf. Comput.
Sci. 1988, 28, 31.
For further information and full course material, see DAYLIGHT.COM
(courtesy: http://www.daylight.com).
|