Using cclib

How to parse log files

Basics

Here is an interactive Python session that shows how to use cclib to extract the number of atoms (natom) from the output of a GAMESS single-point energy calculation. A large number of attributes are available for each log file (see Parsed Data). You can use the "help" command, as in the following example, to find out the names and meanings of all the attributes.

>>> from cclib.parser import ccopen
>>> myfile = ccopen("dvb_sp.out")
>>> data = myfile.parse() # The following lines are log messages
[GAMESS dvb_sp.out INFO] Creating attribute atomcoords[]
...
[GAMESS dvb_sp.out INFO] Creating attribute natom: 20
[GAMESS dvb_sp.out INFO] Creating attribute aooverlaps[]
...
>>> help(data)
 |  Description of cclib attributes:
 |      aooverlaps -- atomic orbital overlap matrix (array[2])
 |      atomcoords -- atom coordinates (array[3], angstroms)
...
 |      natom -- number of atoms (integer)
...
>>> print "The number of atoms is %d." % data.natom
The number of atoms is 20.

The convenience function ccopen() attempts to guess the type (e.g. ADF or Gaussian) of a particular log file and create a parser instance for it; it returns None if the type cannot be determined. If you find that ccopen() does not work for a particular file, please let us know. You can also open the file directly with the appropriate parser class, as in the following example of opening a Gaussian file:

>>> from cclib.parser import ADF, GAMESS, GAMESSUK, Gaussian, Molpro, Jaguar
>>> myfile = Gaussian("myfile.out")
>>> data = myfile.parse()
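
Since ccopen() returns None when it cannot identify a file, it can be worth checking the result before calling parse(). The helper below is a minimal sketch of such a check. The names parse_logfile and opener are hypothetical and introduced only for illustration; they are not part of cclib.

```python
def parse_logfile(filename, opener):
    """Open and parse a log file, failing loudly if its type is unknown.

    `opener` is assumed to behave like cclib's ccopen(): it returns an
    object with a parse() method, or None if the type cannot be guessed.
    """
    logfile = opener(filename)
    if logfile is None:
        raise ValueError("Could not determine the type of %r" % filename)
    return logfile.parse()
```

In real use you would pass cclib's ccopen as the opener, for example parse_logfile("dvb_sp.out", ccopen).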

Multiple log files

In some situations, it is desirable to parse several output files; for example, Molpro typically saves the output from a geometry optimization in an additional log file. This can be achieved by passing several log files as a list to ccopen(), causing it to parse them sequentially. Note that the order of this list is important.

Compressed files

Compressed log files are automatically processed by cclib, based on their extensions. Currently .zip, .gz and .bz2 are supported. Pass the compressed file name to ccopen as above, and the output will look the same.
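
To illustrate what extension-based handling means, here is a minimal sketch that uses only the Python standard library. It is not cclib's actual implementation: the function name open_by_extension is hypothetical, and .zip archives are left out for brevity.

```python
import bz2
import gzip
import os


def open_by_extension(filename):
    """Return a binary file object, decompressing .gz and .bz2 transparently."""
    extension = os.path.splitext(filename)[1].lower()
    if extension == ".gz":
        return gzip.open(filename, "rb")
    if extension == ".bz2":
        return bz2.BZ2File(filename, "rb")
    # Anything else is treated as an ordinary, uncompressed file.
    return open(filename, "rb")
```

The caller reads from the returned object in the same way whether or not the file on disk was compressed, which is why the parsed output looks the same in both cases.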

Remote files

cclib is able to handle any file-like object including, for example, a file accessed over HTTP using urllib2.urlopen(). Simply pass the file-like object to ccopen, and cclib will handle it as if it were a regular file. Here is an example of parsing a cclib regression test located on the SourceForge server:

>>> import cclib
>>> import urllib2
>>> remote = urllib2.urlopen("http://cclib.sf.net/data/Gaussian/Gaussian03/QVGXLLKOCUKJST-UHFFFAOYAJmult3Fixed.out")
>>> parser = cclib.parser.ccopen(remote)
>>> data = parser.parse()
[Gaussian stream <type 'instance'> INFO] Creating attribute charge: 0
[Gaussian stream <type 'instance'> INFO] Creating attribute mult: 3
[Gaussian stream <type 'instance'> INFO] Creating attribute atomnos[]
[Gaussian stream <type 'instance'> INFO] Creating attribute natom: 1
[Gaussian stream <type 'instance'> INFO] Creating attribute atomcoords[]
[Gaussian stream <type 'instance'> INFO] Creating attribute scftargets[]
[Gaussian stream <type 'instance'> INFO] Creating attribute scfvalues[]
[Gaussian stream <type 'instance'> INFO] Creating attribute scfenergies[]
[Gaussian stream <type 'instance'> INFO] Creating attribute mosyms[]
[Gaussian stream <type 'instance'> INFO] Creating attribute homos[]
[Gaussian stream <type 'instance'> INFO] Creating attribute moenergies[]
[Gaussian stream <type 'instance'> INFO] Creating attribute grads[]
[Gaussian stream <type 'instance'> INFO] Creating attribute geovalues[]
[Gaussian stream <type 'instance'> INFO] Creating attribute geotargets[]
[Gaussian stream <type 'instance'> INFO] Creating attribute coreelectrons[]
>>>

Additional information

The previous examples used the Python logger to display information relating to the parsing. If you would like to reduce the amount of information displayed, before parsing you can set the logger to only display error messages:

import logging
myfile.logger.setLevel(logging.ERROR)
myfile.parse()
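
The effect of the logging level can be seen without cclib at all. The following sketch uses a stand-in logger (the name "demo_parser" and the ListHandler class are hypothetical) to show that INFO messages are dropped once the level is raised to ERROR:

```python
import logging


class ListHandler(logging.Handler):
    """Collect log messages in a list so they can be inspected afterwards."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("demo_parser")  # stand-in for a parser's logger
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

logger.setLevel(logging.INFO)
logger.info("Creating attribute natom: 20")  # recorded

logger.setLevel(logging.ERROR)
logger.info("Creating attribute charge: 0")  # dropped: below ERROR
logger.error("Something went wrong")         # recorded

print(handler.messages)
# ['Creating attribute natom: 20', 'Something went wrong']
```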

If instead you would like the progress of parsing the file to be displayed, use the following:

from cclib.parser import ccopen
from cclib.progress import TextProgress
import logging

progress = TextProgress()
myfile = ccopen("mycalc.out", progress, logging.ERROR)
data = myfile.parse()
print "The number of atoms is %d" % data.natom

Custom progress classes can also be created by implementing a class that contains initialize and update methods. This allows graphical progress dialogs (or any other kind of feedback) to be designed that show the progress of parsing the files.
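
As a rough sketch of what such a class might look like, the example below reports progress in steps of 25%. The class name QuarterProgress is hypothetical, and the method signatures (initialize(nstep, text) and update(step, text)) are an assumption modelled on TextProgress; check them against the version of cclib you have installed.

```python
class QuarterProgress(object):
    """A hypothetical progress class that reports every 25% of the work."""

    def __init__(self):
        self.nstep = 1
        self.text = None
        self.last_quarter = -1
        self.reports = []

    def initialize(self, nstep, text=None):
        # Assumed to be called once, with the total number of steps.
        self.nstep = max(int(nstep), 1)
        self.text = text
        self.last_quarter = -1

    def update(self, step, text=None):
        # Assumed to be called repeatedly, with the current step.
        quarter = (100 * step // self.nstep) // 25
        if quarter > self.last_quarter:
            self.last_quarter = quarter
            label = text or self.text or "progress"
            message = "%s: %d%%" % (label, quarter * 25)
            self.reports.append(message)
            print(message)


# Drive the class by hand to show the calls a parser would make.
progress = QuarterProgress()
progress.initialize(200, "parsing")
for step in range(0, 201, 10):
    progress.update(step)
```

A graphical version would replace the print call with an update to a dialog or progress bar widget.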

ccget - A command-line interface to cclib

ccget is a simple command-line interface to cclib that allows you to quickly display the information extracted by cclib from a particular log file or log files. It is included in the cclib distribution in the scripts directory. For information on use, specify --help:

C:\Tools>python C:\Python26\Scripts\ccget --help
Usage:  ccget <attribute> [<attribute>] <compchemlogfile> [<compchemlogfile>]
     where <attribute> is one of the attributes to be parsed by cclib
     from each of the compchemlogfiles.
For a list of attributes available in a file, type:
     ccget --list <compchemlogfile>   [or -l]

To see what attributes are present in a file, use --list:

C:\Tools>python C:\Python26\Scripts\ccget --list AM1_SP.out.gz
Attempting to parse AM1_SP.out.gz
cclib can parse the following attributes from AM1_SP.out.gz:
  atomcoords
  atomnos
  charge
  ...

To see the values of some attributes, just specify the attribute names:

C:\Tools>python C:\Python26\Scripts\ccget atomnos charge AM1_SP.out.gz
Attempting to parse AM1_SP.out.gz
atomnos:
[6 8 6 6 6 6 7 1 6 6 6 6 8 6 6 6 6 7 1 6 6 6 1 1 1 1 1 1 1 1 1 1]
charge:
0

Finally, note that multiple filenames can be specified explicitly or using wildcards:

C:\Tools>python C:\Python26\Scripts\ccget atomnos charge *.gz
Attempting to parse AM1_SP.out.gz
atomnos:
[6 8 6 6 6 6 7 1 6 6 6 6 8 6 6 6 6 7 1 6 6 6 1 1 1 1 1 1 1 1 1 1]
charge:
0
Attempting to parse anthracene.log.gz
atomnos:
[6 6 6 6 6 6 6 6 6 6 6 1 6 6 6 1 1 1 1 1 1 1 1 1]
charge:
0
Attempting to parse chn1.log.gz
atomnos:
[6 6 6 6 7 1 1 1 1 1]
charge:
0
Attempting to parse maheshkumar.log.gz
atomnos:
[ 6  6 16  6  6  6  6 16  6  6  8  8  8  8  6  6  6  6  1  1  1  1  1  1  1  1 -2]
charge:
0
Attempting to parse tms_nmr.log.gz
atomnos:
[ 6 14  6  6  6  1  1  1  1  1  1  1  1  1  1  1  1]
charge:
0

How to perform population analysis

Population analyses such as Mulliken, C squared, and overlap can be calculated using classes provided by cclib. The general strategy is to pass a parser instance to the constructor of a method class, and then call calculate():

from cclib.parser import Gaussian
from cclib.method import MPA

parser = Gaussian("mycalc.out")
analysis = MPA(parser)
analysis.calculate()

The calculate() function returns True or False, depending on its success. If True is returned, the analysis instance should contain the attributes aoresults, fragresults, and fragcharges. aoresults is a rank 3 array containing the basis function (i.e. atomic orbital) contributions to each molecular orbital for each spin. fragresults is a rank 3 array containing the atomic contributions to each molecular orbital for each spin. fragcharges is a rank 1 array of the number of electrons on each atom.
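
To give a feel for what a basis function contribution is, here is a textbook Mulliken calculation on a toy model with two basis functions, one on each atom. This is a sketch of the general formula only, not cclib's MPA code; the function name and the overlap value 0.6 are made up for the example.

```python
import math


def mulliken_ao_contributions(coefficients, overlaps):
    """Return the Mulliken contribution of each basis function to one MO.

    contribution[i] = c[i] * sum over j of (S[i][j] * c[j])
    """
    nbasis = len(coefficients)
    return [
        coefficients[i]
        * sum(overlaps[i][j] * coefficients[j] for j in range(nbasis))
        for i in range(nbasis)
    ]


# Toy model: two basis functions with overlap s (an arbitrary example value).
s = 0.6
overlaps = [[1.0, s], [s, 1.0]]
c = 1.0 / math.sqrt(2.0 * (1.0 + s))  # normalised bonding orbital
bonding_mo = [c, c]

contributions = mulliken_ao_contributions(bonding_mo, overlaps)
# Two electrons occupy this orbital, so scale the contributions by 2.
electrons_per_atom = [2.0 * x for x in contributions]

print(contributions)       # each value is approximately 0.5
print(electrons_per_atom)  # each value is approximately 1.0
```

The contributions for a normalised orbital sum to one, so by symmetry each basis function receives half of the orbital here.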

There is a very specific reason why the names fragresults and fragcharges were chosen instead of the more obvious atomresults and atomcharges. The calculate() function optionally accepts a list of lists so that the electron density can be partitioned differently than just between atoms. For example:

from cclib.parser import ccopen
from cclib.method import MPA

parser = ccopen("mycalc.out")
analysis = MPA(parser)
analysis.calculate([[0,1,2,3,4],[12,13],[5,6,7,8],[10,11],[9]])

In this case, fragresults and fragcharges will correspond to the 5 fragments specified by the list of lists. This allows the electron density to be partitioned into anything from a subset of atomic orbitals to a group of several atoms.
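
Conceptually, partitioning into fragments amounts to summing the per-basis-function contributions within each inner list. The sketch below shows this for a single molecular orbital, assuming that the inner lists hold basis function indices. The function name and the numbers are invented for illustration and are not taken from cclib.

```python
def sum_into_fragments(ao_contributions, fragments):
    """Sum per-basis-function contributions into user-defined fragments.

    `fragments` is a list of lists of basis function indices.
    """
    return [
        sum(ao_contributions[index] for index in fragment)
        for fragment in fragments
    ]


# Contributions of five basis functions to one molecular orbital.
ao_contributions = [0.25, 0.25, 0.125, 0.125, 0.25]
fragments = [[0, 1], [2, 3], [4]]

print(sum_into_fragments(ao_contributions, fragments))
# [0.5, 0.25, 0.25]
```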

The method classes also support progress classes like the parsers do. The order and type of arguments should be the same, except for the first argument which is either a filename (parsers) or a parser instance (method).

How to interface to other Python libraries

cclib provides 'bridges' to other open source libraries such as PyQuante and Biopython. You need to install these separately from cclib if you want to use the bridges.

Here we show how to use PyQuante to carry out an RHF calculation on the final result of a geometry optimization.

# Extract the coordinates from a geometry optimization
from cclib.parser import ccopen
p = ccopen("geometryopt.out")
data = p.parse()

# Use the bridge to create a PyQuante molecule
from cclib.bridge import makepyquante
pyqmol = makepyquante(data.atomcoords[-1], data.atomnos)

# Run an RHF calculation using PyQuante
from PyQuante.hartree_fock import rhf
en, orbe, orbs = rhf(pyqmol)

Here we show how to use Biopython to calculate the RMS between the coordinates of the initial and final geometries of a geometry optimization.

# Extract the coordinates from a geometry optimization
from cclib.parser import ccopen
p = ccopen("geometryopt.out")
data = p.parse()

# Use the bridge to create two lists of Biopython atoms
from cclib.bridge import makebiopython
initial = makebiopython(data.atomcoords[0], data.atomnos)
final = makebiopython(data.atomcoords[-1], data.atomnos)

# Use Biopython to superimpose the two geometries and calculate the RMS
from Bio.PDB.Superimposer import Superimposer
superimposer = Superimposer()
superimposer.set_atoms(initial, final)
print superimposer.rms
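
For reference, the root-mean-square deviation itself is easy to write down. The sketch below computes it on two coordinate sets as given, without the superposition step that Biopython performs, so for the same input it will generally be greater than or equal to the value reported by Superimposer. The function name and the coordinates are made up for the example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("Need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two atoms; the second moves by 0.2 angstroms along z.
initial = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
final = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.2]]

print(rmsd(initial, final))  # approximately 0.141
```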

How to store the information extracted from a log file

If you use cclib to parse a log file, the result is a ccData object (in the examples below, it's called mydata). If you want to store this object on disk and read it back in from another Python script, you can use the pickle module that comes with Python:

# In program 1
import pickle
outputfile = open("output.pickle", "wb")  # binary mode is safest for pickle files
pickle.dump(mydata, outputfile)
outputfile.close()

# In program 2
import pickle
inputfile = open("output.pickle", "rb")
mydata = pickle.load(inputfile)  # load from the file that was opened for reading
inputfile.close()

Sometimes you might want to store the data in a standard text file format that can be read by many programming languages. If the simplejson module is installed, cclib can read/write ccData objects using JSON:

# In program 1
mydata.writejson("output.txt")

# In program 2
from cclib.data import readjson
mydata = readjson(filename="output.txt")
# The readjson function can also accept text...
# mydata = readjson(text='{"atomnos": [6, 6, 6, 6, 6, 6, 1, 1, 1, 1, 6]}')
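
Because the stored data is plain JSON, it can also be read without cclib. The sketch below uses Python's standard json module on the same text as in the comment above. It assumes that the file holds a single JSON object mapping attribute names to values, which is what that text suggests.

```python
import json

# The same JSON text as in the readjson example above.
text = '{"atomnos": [6, 6, 6, 6, 6, 6, 1, 1, 1, 1, 6]}'

attributes = json.loads(text)
atomnos = attributes["atomnos"]

print(len(atomnos))      # 11
print(atomnos.count(1))  # 4
```

Here the result is an ordinary dictionary rather than a ccData object, so attributes are accessed by key.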