|
SMARTS...
...means SMiles ARbitrary Target Specification
...is a language used for describing molecular patterns and properties
...rules are straightforward extensions of SMILES
- All SMILES symbols and properties are legal in SMARTS.
- SMARTS includes logical operators and additional molecular descriptors
...can describe structural patterns with varying degrees of specificity and
generality:
- SMILES for methane: C or [CH4]
- High specificity SMARTS describing a pattern consistent with methane: [CH4]
    Only matches aliphatic carbon atoms that have 4 hydrogens.
    Won't match ethane, ethene, or cyclopentane.
- Low specificity SMARTS describing a pattern consistent with methane: C
    Matches aliphatic carbon atoms that have any number of hydrogens.
    Will match ethane, ethene, and cyclopentane.
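The difference between the two patterns can be sketched in a few lines of Python. This is a toy model and not a SMARTS parser: the atoms are hand-written records (element, aromatic flag, hydrogen count), and the names Atom, matches_CH4, matches_C and hits are hypothetical helpers that simply hard-code what [CH4] and C mean.

```python
from collections import namedtuple

# Toy atom record: element symbol, aromatic flag, attached hydrogen count.
Atom = namedtuple("Atom", "element aromatic h_count")

# Hand-built carbon atoms for four molecules (assumed data, hydrogens as counts).
molecules = {
    "methane": [Atom("C", False, 4)],
    "ethane": [Atom("C", False, 3)] * 2,
    "ethene": [Atom("C", False, 2)] * 2,
    "cyclopentane": [Atom("C", False, 2)] * 5,
}

def matches_CH4(atom):
    """Meaning of SMARTS [CH4]: aliphatic carbon with exactly 4 hydrogens."""
    return atom.element == "C" and not atom.aromatic and atom.h_count == 4

def matches_C(atom):
    """Meaning of SMARTS C: aliphatic carbon, any number of hydrogens."""
    return atom.element == "C" and not atom.aromatic

def hits(predicate):
    """Names of molecules with at least one atom satisfying the predicate."""
    return [name for name, atoms in molecules.items()
            if any(predicate(a) for a in atoms)]

high_specificity = hits(matches_CH4)   # only methane qualifies
low_specificity = hits(matches_C)      # every molecule has an aliphatic carbon
print(high_specificity)
print(low_specificity)
```

A real toolkit would derive the hydrogen counts from the SMILES input instead of taking them as given.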
Substructure searching, the process of
finding a particular pattern (subgraph) in a molecule (graph), is one of the
most important tasks for computers in chemistry. It is used in virtually every
application that employs a digital representation of a molecule, including
depiction (to highlight a particular functional group), drug design (searching a
database for similar structures and activity), analytical chemistry (looking for
previously-characterized structures and comparing their data to that of an
unknown), and a host of other problems.
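To make the "pattern in a graph" idea concrete, here is a minimal backtracking sketch in Python. It is an illustration under simplifying assumptions, not the algorithm any particular toolkit uses: the graph layout (a list of element symbols plus a dict of bond orders) and the function name find_substructure are made up for this example, and atoms match only on exact element and bond order.

```python
def find_substructure(pattern, molecule):
    """Return one mapping (pattern atom index -> molecule atom index) or None.

    Both graphs are dicts of the form
    {"atoms": [element, ...], "bonds": {(i, j): order, ...}}.
    """
    def bond(graph, i, j):
        # Bonds are undirected, so look the pair up in both orientations.
        bonds = graph["bonds"]
        return bonds.get((i, j)) or bonds.get((j, i))

    def extend(mapping):
        k = len(mapping)                      # next pattern atom to place
        if k == len(pattern["atoms"]):
            return mapping                    # every pattern atom is placed
        for cand in range(len(molecule["atoms"])):
            if cand in mapping:
                continue                      # molecule atom already used
            if molecule["atoms"][cand] != pattern["atoms"][k]:
                continue                      # node labels differ
            # Each pattern bond to an already-placed atom must exist in the
            # molecule with the same order (extra molecule bonds are fine).
            ok = all(
                bond(pattern, k, p) is None
                or bond(pattern, k, p) == bond(molecule, cand, mapping[p])
                for p in range(k)
            )
            if ok:
                result = extend(mapping + [cand])
                if result is not None:
                    return result
        return None                           # dead end, backtrack

    return extend([])

# Heavy-atom graphs only; hydrogens are left implicit.
ethanol = {"atoms": ["C", "C", "O"], "bonds": {(0, 1): 1, (1, 2): 1}}
propane = {"atoms": ["C", "C", "C"], "bonds": {(0, 1): 1, (1, 2): 1}}
c_single_o = {"atoms": ["C", "O"], "bonds": {(0, 1): 1}}

print(find_substructure(c_single_o, ethanol))
print(find_substructure(c_single_o, propane))
```

The search first tries the terminal carbon of ethanol, finds that it has no bond to the oxygen, backtracks, and succeeds with the second carbon. Propane has no oxygen at all, so no mapping exists.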
SMARTS is a language that allows you to specify
substructures using rules that are straightforward extensions of SMILES. For
example, to search a database for phenol-containing structures, one would use
the SMARTS string "[OH]c1ccccc1", which should be familiar to those acquainted
with SMILES. In fact, almost all SMILES specifications are valid SMARTS targets
(see "SMARTS Exceptions," below). Using SMARTS, flexible and efficient
substructure-search specifications can be made in terms that are meaningful to
chemists.
In the SMILES language, there are two fundamental
types of symbols: atoms and bonds. Using these SMILES symbols,
one can specify a molecule's graph (its "nodes" and "edges") and assign
"labels" to the components of the graph (that is, say what type of atom each
node represents, and what type of bond each edge represents).
The same is true in SMARTS: One uses atomic and
bond symbols to specify a graph. However, in SMARTS the labels for the graph's
nodes and edges (its "atoms" and "bonds") are extended to include "logical
operators" and special atomic and bond symbols; these allow SMARTS atoms and
bonds to be more general. For example, the SMARTS atomic symbol [C,N] is an atom
that can be aliphatic C or aliphatic N; the SMARTS bond symbol "~" (tilde)
matches any bond.
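One way to picture these extended labels is as predicates: a SMILES label names exactly one kind of atom or bond, while a SMARTS label is a test that several kinds can pass. The Python sketch below models that idea only. The names Atom, aliphatic, any_of and any_bond are hypothetical, and no SMARTS string is actually parsed.

```python
from collections import namedtuple

# Toy atom record: element symbol and aromatic flag.
Atom = namedtuple("Atom", "element aromatic")

def aliphatic(element):
    """Primitive such as SMARTS C or N: that element, not aromatic."""
    return lambda atom: atom.element == element and not atom.aromatic

def any_of(*primitives):
    """Logical OR over primitives, the role the comma plays in [C,N]."""
    return lambda atom: any(p(atom) for p in primitives)

def any_bond(order):
    """The role of '~': accept a bond whatever its type."""
    return True

c_or_n = any_of(aliphatic("C"), aliphatic("N"))   # stands in for [C,N]

print(c_or_n(Atom("C", False)))   # aliphatic carbon
print(c_or_n(Atom("N", False)))   # aliphatic nitrogen
print(c_or_n(Atom("O", False)))   # oxygen is neither alternative
print(c_or_n(Atom("C", True)))    # aromatic carbon (SMILES 'c')
print(all(any_bond(o) for o in ("single", "double", "triple", "aromatic")))
```

Building patterns out of small composable tests like these is what lets one query cover a whole family of structures.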
For further information and the full course material,
please go to
DAYLIGHT.COM
courtesy:http://www.daylight.com |