InChI Overview
The following text briefly describes the most important features of the InChI format.
Derived directly from structure
Unlike other chemical identifiers, such as the CAS RN, an InChI string is derived entirely from the structure of a compound. This means that no registration authority is required to assign an InChI to a compound. The price of this advantage is length: because the format has to accommodate every possible compound, an InChI string grows in proportion to the size of the structure and is therefore considerably longer than, for example, the CAS RN mentioned above.
Unique
The InChI format is designed to assign the same InChI to a compound regardless of the way it is drawn. This ensures that the InChI generated for one compound by different users with different drawing habits will be the same (see the part on layers below for more details).
Layered
Different purposes require different levels of detail in the description of a chemical structure. InChI addresses this by splitting the information contained in the identifier into several layers, each describing a different feature of the structure. There are, for instance, a connectivity layer, a charge layer and an isotopic layer.
This approach has several advantages. Structures can be compared at different levels of detail, and information irrelevant to the problem at hand can be safely ignored by the processing software. It is thus possible, for instance, to search for the InChI of lactic acid without specifying the stereochemistry or, when needed, for one specific stereoisomer. The InChIs of the stereoisomers differ only in the stereochemical layer.
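The lactic acid comparison can be sketched in a few lines of Python. This is only an illustration, not part of the InChI software: the two InChI strings are assumed values for the lactic acid enantiomers and should be regenerated with an InChI tool before being relied on, the helper names split_layers and without_stereo are hypothetical, and treating the prefixes b, t, m and s as "the stereochemical layer" is a simplification.

```python
# Illustrative InChI strings for the two lactic acid enantiomers (assumed
# values; regenerate them with InChI software before relying on them).
L_LACTIC = "InChI=1S/C3H6O3/c1-2(4)3(5)6/h2,4H,1H3,(H,5,6)/t2-/m0/s1"
D_LACTIC = "InChI=1S/C3H6O3/c1-2(4)3(5)6/h2,4H,1H3,(H,5,6)/t2-/m1/s1"

# Layer prefixes treated as stereochemical in this sketch (a simplification).
STEREO_PREFIXES = ("b", "t", "m", "s")


def split_layers(inchi):
    """Split an InChI string into (version, formula, remaining layers)."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    version = parts[0]
    formula = parts[1] if len(parts) > 1 else ""
    return version, formula, parts[2:]


def without_stereo(inchi):
    """Rebuild the string with the stereochemical layers left out."""
    version, formula, layers = split_layers(inchi)
    kept = [layer for layer in layers if not layer.startswith(STEREO_PREFIXES)]
    return "InChI=" + "/".join([version, formula] + kept)


# The full strings differ, but only in the stereochemical part.
print(L_LACTIC == D_LACTIC)
print(without_stereo(L_LACTIC) == without_stereo(D_LACTIC))
```

Comparing the truncated strings corresponds to a search that ignores stereochemistry, while comparing the full strings distinguishes the two stereoisomers.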
Tautomer friendly
Many compounds can be drawn in different tautomeric forms, depending on the external conditions or the habits of the person drawing them. Because InChI aims to describe the same compound with the same InChI, it has to handle these cases as well. Although there are some limitations, it detects possible tautomeric forms reasonably well and uses this information to describe all of them with a single InChI.
Forgetful
As already mentioned, the InChI software tries hard to assign the same InChI to every possible drawing of the same compound. To do so, it has to forget some details of the original structure. An InChI does not, for instance, contain atom coordinates or even bond orders. The bond orders are encoded indirectly in the hydrogen count of each atom. The position of a charge is likewise not specified in the InChI.
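The indirect encoding of bond orders can be illustrated with ethane, ethene and ethyne, which have single, double and triple carbon-carbon bonds respectively. The sketch below is an illustration only: the three InChI strings are assumed values that should be checked against InChI software, and get_layer is a hypothetical helper rather than a function of any InChI library.

```python
# Illustrative InChI strings (assumed values; verify with InChI software).
ETHANE = "InChI=1S/C2H6/c1-2/h1-2H3"  # C-C single bond
ETHENE = "InChI=1S/C2H4/c1-2/h1-2H2"  # C=C double bond
ETHYNE = "InChI=1S/C2H2/c1-2/h1-2H"   # C#C triple bond


def get_layer(inchi, prefix):
    """Return the first layer with the given prefix (prefix removed), or None."""
    # Skip the version ("InChI=1S") and the formula, which carry no prefix.
    for part in inchi.split("/")[2:]:
        if part.startswith(prefix):
            return part[len(prefix):]
    return None


for name, inchi in (("ethane", ETHANE), ("ethene", ETHENE), ("ethyne", ETHYNE)):
    print(name, "connectivity:", get_layer(inchi, "c"),
          "hydrogens:", get_layer(inchi, "h"))
```

In all three strings the connectivity layer only states that atom 1 is bonded to atom 2. The bond order has to be inferred from the hydrogen layer, which gives three, two or one hydrogen per carbon.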