Using cclib

Basics
Here is an interactive Python session that shows how to use cclib to extract the number of atoms (natom) from the output of a GAMESS single-point energy calculation. A large number of attributes are available for each log file (see Parsed Data). You can use the help function, as in the following example, to find out the names and meanings of all the attributes.

    >>> from cclib.parser import ccopen
    >>> myfile = ccopen("dvb_sp.out")
    >>> data = myfile.parse()  # the lines that follow are log messages
    [GAMESS dvb_sp.out INFO] Creating attribute atomcoords[]
    ...
    [GAMESS dvb_sp.out INFO] Creating attribute natom: 20
    [GAMESS dvb_sp.out INFO] Creating attribute aooverlaps[]
    ...
    >>> help(data)
     |  Description of cclib attributes:
     |      aooverlaps -- atomic orbital overlap matrix (array[2])
     |      atomcoords -- atom coordinates (array[3], angstroms)
     ...
     |      natom -- number of atoms (integer)
     ...
    >>> print("The number of atoms is %d." % data.natom)
    The number of atoms is 20.

The convenience function ccopen attempts to guess the type of a particular log file (e.g. ADF, Gaussian) and create an instance of the matching parser; if it cannot, it returns None. If you find that ccopen does not work for a particular file, please let us know. You can also create a parser directly, as in the following example of opening a Gaussian file:

    >>> from cclib.parser import ADF, GAMESS, GAMESSUK, Gaussian, Molpro, Jaguar
    >>> myfile = Gaussian("myfile.out")
    >>> data = myfile.parse()

Multiple log files
In some situations, it is desirable to parse several output files; for example, Molpro typically saves the output from a geometry optimization in an additional log file. This can be achieved by passing several log files to ccopen as a list, causing it to parse them sequentially. Note that the order of the list is important.
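cclib's internals are not shown here. As a rough, stdlib-only sketch of what sequential parsing implies, the generator below reads several files as one continuous stream of lines, in list order. The function name `iter_lines` and the file names and contents are hypothetical. Because the first file's lines always come first, reversing the list changes what a parser sees first, which is one reason the order matters.

```python
import os
import tempfile


def iter_lines(filenames):
    """Yield the lines of several files, one file after another, in list order."""
    for name in filenames:
        with open(name) as handle:
            for line in handle:
                yield line


# Two hypothetical log files standing in for a calculation split across files.
tmpdir = tempfile.mkdtemp()
first = os.path.join(tmpdir, "run.out")
second = os.path.join(tmpdir, "run_opt.log")
with open(first, "w") as f:
    f.write("first file, line 1\nfirst file, line 2\n")
with open(second, "w") as f:
    f.write("second file, line 1\n")

lines = [line.rstrip("\n") for line in iter_lines([first, second])]
print(lines)
```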

Compressed files
Compressed log files are automatically processed by cclib, based on their extensions. Currently .zip, .gz and .bz2 are supported. Pass the compressed file name to ccopen as above, and the output will look the same.
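The idea of choosing a decompressor from the file extension can be sketched with the Python 3 standard library alone. This is an illustration under stated assumptions, not cclib's own code: the helper name `open_maybe_compressed` is hypothetical, and for a .zip archive it simply reads the first member.

```python
import bz2
import gzip
import io
import os
import tempfile
import zipfile


def open_maybe_compressed(filename):
    """Return a text-mode file object, decompressing based on the extension.

    Hypothetical helper: for .zip files only the first archive member is read.
    """
    if filename.endswith(".gz"):
        return gzip.open(filename, "rt")
    if filename.endswith(".bz2"):
        return bz2.open(filename, "rt")
    if filename.endswith(".zip"):
        with zipfile.ZipFile(filename) as archive:
            member = archive.namelist()[0]
            return io.StringIO(archive.read(member).decode("utf-8"))
    return open(filename)


# Write the same sample text in each compressed format.
tmpdir = tempfile.mkdtemp()
sample = "line 1\nline 2\n"
paths = {
    "gz": os.path.join(tmpdir, "calc.out.gz"),
    "bz2": os.path.join(tmpdir, "calc.out.bz2"),
    "zip": os.path.join(tmpdir, "calc.zip"),
}
with gzip.open(paths["gz"], "wt") as f:
    f.write(sample)
with bz2.open(paths["bz2"], "wt") as f:
    f.write(sample)
with zipfile.ZipFile(paths["zip"], "w") as archive:
    archive.writestr("calc.out", sample)

# Every format reads back as the same plain text.
contents = {}
for kind, path in paths.items():
    with open_maybe_compressed(path) as handle:
        contents[kind] = handle.read()
print(contents)
```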

Remote files
cclib can handle any file-like object including, for example, a file accessed across HTTP using urllib2.urlopen (Python 2). Simply pass the file-like object to ccopen, and cclib will handle it as if it were a regular file. Here is an example of parsing a cclib regression test file located on the SourceForge server:

    >>> import cclib
    >>> import urllib2
    >>> remote = urllib2.urlopen("http://cclib.sf.net/data/Gaussian/Gaussian03/QVGXLLKOCUKJST-UHFFFAOYAJmult3Fixed.out")
    >>> parser = cclib.parser.ccopen(remote)
    >>> data = parser.parse()
    [Gaussian stream  INFO] Creating attribute charge: 0
    [Gaussian stream  INFO] Creating attribute mult: 3
    [Gaussian stream  INFO] Creating attribute atomnos[]
    [Gaussian stream  INFO] Creating attribute natom: 1
    [Gaussian stream  INFO] Creating attribute atomcoords[]
    [Gaussian stream  INFO] Creating attribute scftargets[]
    [Gaussian stream  INFO] Creating attribute scfvalues[]
    [Gaussian stream  INFO] Creating attribute scfenergies[]
    [Gaussian stream  INFO] Creating attribute mosyms[]
    [Gaussian stream  INFO] Creating attribute homos[]
    [Gaussian stream <type 'instance'> INFO] Creating attribute moenergies[]
    [Gaussian stream <type 'instance'> INFO] Creating attribute grads[]
    [Gaussian stream <type 'instance'> INFO] Creating attribute geovalues[]
    [Gaussian stream <type 'instance'> INFO] Creating attribute geotargets[]
    [Gaussian stream <type 'instance'> INFO] Creating attribute coreelectrons[]
    >>>
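The point of accepting file-like objects is duck typing: anything that can be iterated line by line works like an open file. The sketch below shows that pattern with a hypothetical helper, `read_lines`, using io.StringIO in place of a network stream so it runs offline. It is an illustration, not cclib's own logic. In Python 3, urllib2 is replaced by urllib.request, and urllib.request.urlopen returns a binary stream, so the bytes need decoding first (for example with io.TextIOWrapper); io.BytesIO plays that role below.

```python
import io


def read_lines(source):
    """Return the lines of `source`, given either a path or a file-like object.

    Illustrative duck typing: a str is treated as a path and opened;
    anything else is assumed to iterate line by line.
    """
    if isinstance(source, str):
        with open(source) as handle:
            return handle.readlines()
    return list(source)


# io.StringIO stands in for a remote text stream.
text_stream = io.StringIO("first line\nsecond line\n")
lines = read_lines(text_stream)

# A binary stream must be decoded before it can be read as text.
byte_stream = io.TextIOWrapper(io.BytesIO(b"alpha\nbeta\n"), encoding="utf-8")
decoded = read_lines(byte_stream)

print(lines, decoded)
```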

Additional information
The previous examples used the Python logger to display information about the parsing. If you would like to reduce the amount of information displayed, you can set the logger to display only error messages before parsing:

    import logging
    myfile.logger.setLevel(logging.ERROR)
    data = myfile.parse()
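The effect of setLevel comes from the standard logging module: a logger discards records below its level before any handler sees them. The stdlib-only sketch below uses a hypothetical logger and a small list-collecting handler in place of a parser's logger, so the filtering can be checked without cclib.

```python
import logging

# A hypothetical logger standing in for a parser's logger attribute.
logger = logging.getLogger("demo parser")
logger.propagate = False  # keep records away from the root logger in this demo


class ListHandler(logging.Handler):
    """Collect log messages in a list instead of printing them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)

logger.setLevel(logging.INFO)
logger.info("Creating attribute natom: 20")      # kept at INFO level

logger.setLevel(logging.ERROR)
logger.info("Creating attribute atomcoords[]")   # dropped at ERROR level
logger.error("Something went wrong")              # errors still get through

print(handler.messages)
```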

If instead you would like the progress of parsing the file to be displayed, use the following:

    from cclib.parser import ccopen
    from cclib.progress import TextProgress
    import logging

    progress = TextProgress()
    myfile = ccopen("mycalc.out", progress, logging.ERROR)
    data = myfile.parse()
    print("The number of atoms is %d" % data.natom)

Custom progress classes can also be created by implementing a class that contains initialize and update methods. This allows graphical progress dialogs (or anything else you can imagine) to be designed that show the progress of parsing the files.
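A minimal custom progress class might look like the following sketch. The signatures initialize(nstep, text=None) and update(step, text=None) are an assumption, modeled on cclib's TextProgress; check them against your cclib version. The class name `RecordingProgress` and the fake_parse driver are hypothetical, and the driver stands in for a parser so the example runs on its own.

```python
class RecordingProgress:
    """Record integer percentages instead of drawing a progress bar.

    Assumed interface: initialize(nstep, text) is called once, then
    update(step, text) is called as work proceeds.
    """

    def __init__(self):
        self.nstep = 0
        self.text = None
        self.percentages = []

    def initialize(self, nstep, text=None):
        self.nstep = nstep
        self.text = text
        self.percentages = []

    def update(self, step, text=None):
        if text is not None:
            self.text = text
        percent = int(100 * step / self.nstep) if self.nstep else 100
        # Only record a value when the integer percentage changes.
        if not self.percentages or percent != self.percentages[-1]:
            self.percentages.append(percent)


def fake_parse(lines, progress):
    """Hypothetical stand-in for a parser loop that reports its progress."""
    progress.initialize(len(lines), "Parsing")
    for number, _line in enumerate(lines, start=1):
        progress.update(number, "Parsing")


progress = RecordingProgress()
fake_parse(["line"] * 4, progress)
print(progress.percentages)  # [25, 50, 75, 100]
```

A graphical version would keep the same two methods and redraw a widget inside update.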

ccget - A command-line interface to cclib
ccget is a simple command-line interface to cclib that allows you to quickly display the information extracted by cclib from one or more log files. It is included in the cclib distribution in the scripts directory. For information on its use, specify --help:

    C:\Tools>python C:\Python26\Scripts\ccget --help
    Usage:  ccget <attribute> [<attribute>] <compchemlogfile> [<compchemlogfile>]
         where <attribute> is one of the attributes to be parsed by cclib
         from each of the compchemlogfiles.
    For a list of attributes available in a file, type:
         ccget --list <compchemlogfile>   [or -l]

To see which attributes are present in a file, use --list:

    C:\Tools>python C:\Python26\Scripts\ccget --list AM1_SP.out.gz
    Attempting to parse AM1_SP.out.gz
    cclib can parse the following attributes from AM1_SP.out.gz:
      atomcoords
      atomnos
      charge
      ...

To see the values of some attributes, just specify the attribute names:

    C:\Tools>python C:\Python26\Scripts\ccget atomnos charge AM1_SP.out.gz
    Attempting to parse AM1_SP.out.gz
    atomnos: [6 8 6 6 6 6 7 1 6 6 6 6 8 6 6 6 6 7 1 6 6 6 1 1 1 1 1 1 1 1 1 1]
    charge: 0

Finally, note that multiple filenames can be specified, either explicitly or using wildcards:

    C:\Tools>python C:\Python26\Scripts\ccget atomnos charge *.gz
    Attempting to parse AM1_SP.out.gz
    atomnos: [6 8 6 6 6 6 7 1 6 6 6 6 8 6 6 6 6 7 1 6 6 6 1 1 1 1 1 1 1 1 1 1]
    charge: 0
    Attempting to parse anthracene.log.gz
    atomnos: [6 6 6 6 6 6 6 6 6 6 6 1 6 6 6 1 1 1 1 1 1 1 1 1]
    charge: 0
    Attempting to parse chn1.log.gz
    atomnos: [6 6 6 6 7 1 1 1 1 1]
    charge: 0
    Attempting to parse maheshkumar.log.gz
    atomnos: [ 6 6 16  6  6  6  6 16  6  6  8  8  8  8  6  6  6  6  1  1  1  1  1  1  1  1 -2]
    charge: 0
    Attempting to parse tms_nmr.log.gz
    atomnos: [ 6 14 6  6  6  1  1  1  1  1  1  1  1  1  1  1  1]
    charge: 0

How to perform population analysis
Population analyses such as Mulliken, C squared, and overlap can be calculated using classes provided by cclib. The general strategy is to pass a parser instance to the constructor of a method class, and then call its calculate method:

    from cclib.parser import Gaussian
    from cclib.method import MPA

    parser = Gaussian("mycalc.out")
    analysis = MPA(parser)
    analysis.calculate()

The calculate method returns True or False, depending on whether it succeeded. If it returns True, the analysis instance should contain the attributes aoresults, fragresults, and fragcharges. aoresults is a rank 3 array holding the basis function (i.e. atomic orbital) contributions to each molecular orbital for each spin. fragresults is a rank 3 array holding the atomic contributions to each molecular orbital for each spin. fragcharges is a rank 1 array holding the number of electrons on each atom.
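To make these three quantities concrete, here is a stdlib-only sketch of the textbook Mulliken partitioning for one spin of a closed-shell molecule. It is not cclib's implementation; the names only mirror the attributes above. Each basis function mu receives the share c_i,mu * sum_nu c_i,nu S_mu,nu of molecular orbital i. These shares are summed over each atom's basis functions, then summed over the occupied orbitals and multiplied by the occupancy of 2. The toy H2 system and its overlap value of 0.66 are illustrative inputs, not computed results.

```python
import math


def mulliken_ao_contributions(mocoeffs, overlap):
    """One row per MO, one entry per basis function (like one spin of aoresults).

    Entry [i][mu] is c[i][mu] * sum_nu c[i][nu] * S[mu][nu]; each row sums
    to 1 for a normalized MO.
    """
    nbasis = len(overlap)
    return [
        [c[mu] * sum(c[nu] * overlap[mu][nu] for nu in range(nbasis))
         for mu in range(nbasis)]
        for c in mocoeffs
    ]


def atom_results(aoresults, atoms):
    """Sum basis-function contributions into atoms (lists of basis indices)."""
    return [[sum(row[mu] for mu in atom) for atom in atoms] for row in aoresults]


def atom_charges(fragresults, nocc, occupancy=2.0):
    """Electrons per atom: sum over occupied MOs, times the orbital occupancy."""
    natoms = len(fragresults[0])
    return [occupancy * sum(fragresults[i][a] for i in range(nocc))
            for a in range(natoms)]


# Toy closed-shell H2 in a minimal basis: one 1s function per atom.
s = 0.66  # illustrative overlap between the two 1s functions
overlap = [[1.0, s], [s, 1.0]]
cb = 1.0 / math.sqrt(2.0 * (1.0 + s))  # bonding MO coefficient
ca = 1.0 / math.sqrt(2.0 * (1.0 - s))  # antibonding MO coefficient
mocoeffs = [[cb, cb], [ca, -ca]]

aoresults = mulliken_ao_contributions(mocoeffs, overlap)
fragresults = atom_results(aoresults, atoms=[[0], [1]])
charges = atom_charges(fragresults, nocc=1)
print(charges)  # each close to 1.0 electron, by symmetry
```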

There is a very specific reason why the names fragresults and fragcharges were chosen instead of the more obvious atomresults and atomcharges. The calculate method optionally accepts a list of lists of basis function indices, so that the electron density can be partitioned in ways other than by atom. For example:

    from cclib.parser import ccopen
    from cclib.method import MPA

    parser = ccopen("mycalc.out")
    analysis = MPA(parser)
    analysis.calculate([[0, 1, 2, 3, 4], [12, 13], [5, 6, 7, 8], [10, 11], [9]])

In this case, fragresults and fragcharges will correspond to the 5 fragments specified by the list of lists. This allows a fragment to be anything from a subset of the atomic orbitals on one atom to a group of several atoms.
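Partitioning into fragments amounts to summing each orbital's basis-function contributions over the indices in each inner list. The sketch below uses a hypothetical helper, `partition`, and made-up contribution values, each row summing to 1. It is not cclib's code. It also shows that fragments which leave out some basis functions leave part of the density unassigned.

```python
def partition(aoresults, fragments):
    """Sum each MO's basis-function contributions into arbitrary fragments.

    `fragments` is a list of lists of basis-function indices; a fragment may
    be part of one atom or span several atoms.
    """
    return [[sum(row[mu] for mu in frag) for frag in fragments] for row in aoresults]


# Illustrative per-MO contributions for 4 basis functions (made-up values).
aoresults = [
    [0.40, 0.10, 0.30, 0.20],
    [0.05, 0.45, 0.25, 0.25],
]

# Basis functions 0 and 1 as separate fragments; 2 and 3 grouped together.
fragresults = partition(aoresults, [[0], [1], [2, 3]])

# Leaving out basis function 3 leaves its share unassigned.
partial = partition(aoresults, [[0, 1], [2]])

print(fragresults, partial)
```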

The method classes also support progress classes, just as the parsers do. The order and type of the arguments should be the same, except for the first argument, which is a filename for parsers and a parser instance for methods.

How to interface to other Python libraries
cclib provides 'bridges' to other open source libraries such as PyQuante and Biopython. If you want to use the bridges, you need to install these libraries separately from cclib.

Here we show how to use PyQuante to carry out an RHF calculation on the final geometry of a geometry optimization:

    # Extract the coordinates from a geometry optimization
    from cclib.parser import ccopen
    p = ccopen("geometryopt.out")
    data = p.parse()

    # Use the bridge to create a PyQuante molecule
    from cclib.bridge import makepyquante
    pyqmol = makepyquante(data.atomcoords[-1], data.atomnos)

    # Run an RHF calculation using PyQuante
    from PyQuante.hartree_fock import rhf
    en, orbe, orbs = rhf(pyqmol)

Here we show how to use Biopython to calculate the RMS deviation between the initial and final geometries of a geometry optimization:

    # Extract the coordinates from a geometry optimization
    from cclib.parser import ccopen
    p = ccopen("geometryopt.out")
    data = p.parse()

    # Use the bridge to create two lists of Biopython atoms
    from cclib.bridge import makebiopython
    initial = makebiopython(data.atomcoords[0], data.atomnos)
    final = makebiopython(data.atomcoords[-1], data.atomnos)

    # Use Biopython to superimpose the two geometries and calculate the RMS
    from Bio.PDB.Superimposer import Superimposer
    superimposer = Superimposer()
    superimposer.set_atoms(initial, final)
    print(superimposer.rms)
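Biopython's Superimposer first finds the rotation and translation that best overlay the two structures, and then reports the RMS deviation. The stdlib-only sketch below computes just the plain RMS deviation between two coordinate lists, with no fitting step. Because superposition minimizes this quantity, the plain value is never smaller than superimposer.rms for the same atoms. The helper name and the three-atom coordinates are hypothetical.

```python
import math


def rms_deviation(coords1, coords2):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords1) != len(coords2):
        raise ValueError("coordinate lists must have the same length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords1, coords2):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords1))


# Hypothetical initial and final geometries of a three-atom molecule (angstroms).
initial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
final = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.3)]
rms = rms_deviation(initial, final)
print(round(rms, 4))
```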

How to store the information extracted from a log file
If you use cclib to parse a log file, the result is a ccData object (in the examples below, it is called mydata). If you want to store this object on disk and read it back in another Python script, you can use the pickle module that comes with Python:

    # In program 1
    import pickle
    outputfile = open("output.pickle", "wb")
    pickle.dump(mydata, outputfile)
    outputfile.close()

    # In program 2
    import pickle
    inputfile = open("output.pickle", "rb")
    mydata = pickle.load(inputfile)
    inputfile.close()
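A pickle round trip can be tried in memory with pickle.dumps and pickle.loads, which do the same job as dump and load without a file. In the sketch below, a hypothetical FakeData class stands in for ccData so the example runs without cclib. Two caveats apply to real use. Pickle files should be opened in binary mode ("wb" and "rb"). The class of the stored object must also be importable in the program that loads it. Never unpickle data from an untrusted source, because loading a pickle can execute arbitrary code.

```python
import pickle


class FakeData:
    """Hypothetical stand-in for a ccData object, holding a few attributes."""

    def __init__(self, natom, atomnos):
        self.natom = natom
        self.atomnos = atomnos


mydata = FakeData(natom=3, atomnos=[8, 1, 1])

# dumps/loads work on bytes in memory; dump/load do the same with a binary file.
blob = pickle.dumps(mydata)
restored = pickle.loads(blob)
print(restored.natom, restored.atomnos)
```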

Sometimes you might want to store the data in a standard text file format that can be read by many programming languages. If the simplejson module is installed, cclib can read and write ccData objects using JSON:

    # In program 1
    mydata.writejson("output.txt")

    # In program 2
    from cclib.data import readjson
    mydata = readjson(filename="output.txt")

    # The readjson function can also accept text...
    # mydata = readjson(text='{"atomnos": [6, 6, 6, 6, 6, 6, 1, 1, 1, 1, 6]}')
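Since Python 2.6, the standard library has included a json module that was derived from simplejson. The sketch below shows a plain JSON round trip of a hypothetical dictionary of attributes. It is not cclib's writejson format. Real ccData attributes are often NumPy arrays, which json cannot encode directly; they would first need converting with .tolist().

```python
import json

# Hypothetical attribute dictionary using only JSON-friendly types.
attributes = {"natom": 3, "atomnos": [8, 1, 1], "charge": 0}

text = json.dumps(attributes)   # a plain string any language can read
restored = json.loads(text)     # back to a Python dict
print(text)
```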