During the course of this project, the specification of DDLm has evolved. The 2008 specification is available as DDLm: Next Generation Dictionary Definition Language by Syd Hall, Nick Spadaccini and John Westbrook (Version: 13 August 2008). As of this writing (9 December 2009), a new, and hopefully final, DDLm specification is under discussion by the Committee on the Maintenance of the CIF. It appears that the final specification will differ significantly from the DDLm specification contemplated at the start of this project.
Therefore, it has been agreed with the IUCr to deliver software now that conforms to the earlier specification, as an intermediate step toward full adaptation to the final DDLm specification in a later project, once that specification is final. We have attempted to set up the current releases so that they will be useful now, but in a way that will facilitate that future transition.
Three software packages have been produced. All three are open source packages.
This is a preliminary release of vcif with support for DDLm using dREL, working against the August 2008 specification of DDLm. As of this writing, DDLm is being revised to a new "CIF 2" specification, so this kit should not be used for production work yet. Check the SourceForge vcif site for later releases.
If you have any difficulty with this kit, contact Elena Zlateva, elitty at gmail dot com, or Herbert J. Bernstein, yaya at bernstein-plus-sons dot com.
Work on vcif 3 was funded in part by the International Union of Crystallography and is based on work on CBFlib funded in part by grants from the NIGMS of the U.S. National Institutes of Health and the Office of Science of the U. S. Department of Energy.
Our thanks to James Hester for providing the dREL parser used by vcif_3.
Assuming cif2cbf is in your PATH and csh is your shell:
Place the file to be tested, say c2ctest5.cif, and the appropriate dictionary, say, cif_expanded.dic, in the current working directory
and execute:
cif2cbf -B read -v cif_expanded.dic -w -i c2ctest5.cif -o /dev/null |& grep CBFlib
In a bash shell, the above command should be:
cif2cbf -B read -v cif_expanded.dic -w -i c2ctest5.cif -o /dev/null 2>&1 | grep CBFlib
In this case the output you should get would be (lines folded for clarity):
CBFlib: warning input line 14 (1) -- _cell.volume value provided conflicts with generated value. CBFlib generated value 1759.01678255:
CBFlib: warning input line 15 (1) -- _cell.volume value provided conflicts with generated value. CBFlib generated value 1759.01678255:
CBFlib: warning input line 15 (1) -- _cell.volume value out of dictionary range
CBFlib: warning input line 16 (1) -- _cell.volume value provided conflicts with generated value. CBFlib generated value 1759.01678255:
CBFlib: warning input line 17 (1) -- _cell.volume value missing - CBFlib generated value: 1759.01678255
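For scripted checks, the same run can be driven from Python 3. The sketch below is hypothetical: the function and class names are ours, and the warning pattern is inferred only from the sample output above, so CBFlib messages of other shapes are simply skipped. The command-line flags are exactly those of the csh and bash examples.

```python
import os
import re
import subprocess
from typing import List, NamedTuple

# Pattern inferred from the sample output only; other messages are ignored.
WARNING_RE = re.compile(
    r"CBFlib: warning input line (?P<line>\d+) \(\d+\) -- (?P<tag>_\S+) (?P<message>.*)"
)


class CbfWarning(NamedTuple):
    line: int     # input line number reported by CBFlib
    tag: str      # data name the warning refers to
    message: str  # remainder of the warning text


def run_vcif(data_file: str, dictionary: str) -> str:
    """Run cif2cbf as in the shell examples, returning stdout and stderr together."""
    result = subprocess.run(
        ["cif2cbf", "-B", "read", "-v", dictionary, "-w",
         "-i", data_file, "-o", os.devnull],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )
    return result.stdout


def parse_cbflib_warnings(output: str) -> List[CbfWarning]:
    """Keep only lines that look like CBFlib warnings and split them into fields."""
    warnings = []
    for raw in output.splitlines():
        match = WARNING_RE.match(raw.strip())
        if match:
            warnings.append(CbfWarning(int(match["line"]), match["tag"], match["message"]))
    return warnings
```

A typical call would be parse_cbflib_warnings(run_vcif("c2ctest5.cif", "cif_expanded.dic")), which needs cif2cbf in your PATH.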
You will need a Unix-like system with C, C++, Fortran and Python installed. The combination of gcc 4, g++ 4, g95 or gfortran for gcc 4, and Python 2.5 supports this package. Under Windows, MinGW and MSYS are recommended.
You will need the source of this kit:
vcif_3_9Dec09.tar.gz http://downloads.sf.net/vcif/vcif_3_9Dec09.tar.gz
For the Python portions of the package, you will need:
package                        snapshot URL
numpy-1.4.0rc1.tar.gz          downloads.sf.net/cbflib/numpy-1.4.0rc1.tar.gz
ply-3.2_7Dec09.tar.gz          downloads.sf.net/cbflib/ply-3.2_7Dec09.tar.gz
dREL-ply-0.5_9Dec09.tar.gz     downloads.sf.net/cbflib/dREL-ply-0.5_9Dec09.tar.gz
PyCifRW-3.3_6Dec09.tar.gz      downloads.sf.net/cbflib/PyCifRW-3.3_6Dec09.tar.gz
The vcif 3 kit contains copies of the last three kits in the cbflib_bleeding_edge directory. Assuming you have Python and gcc installed, install numpy, ply and PyCifRW, following the instructions in the READMEs for those three kits. The general idea is to go into the kit directory and execute
python setup.py install
Each installation requires the same level of system access as was needed for your base Python installation, so on many Unix systems you will need to run
sudo python setup.py install
instead.
In the cbflib_bleeding_edge directory, after choosing the appropriate Makefile or editing it for your system:
make install
to put cif2cbf, drel_prep.py, drel_yacc.py, drel_lex.py and drelc.py into your PATH.
12/07/09 vcif_3_9Dec09.tar.gz First release candidate for vcif_3, using a patch from J. Hester. Found a logic error in the CBFlib call that tries to get a value for a tag already present in the data file.
12/09/09 vcif_3_9Dec09.tar.gz Changed the dREL call logic to rename any use of the tag being evaluated by appending _local, so that dREL will always try to generate the value with a method. Updated the vcif package to include a preprocessing script, drel_prep.py, for the data passed on to the dREL compiler. This script changes occurrences of the item name being validated to a local name, to ensure that the dREL compiler produces calculations whether or not a value is given. Currently, validation compares against the exact value returned by the calculations; an allowance for small differences is to be implemented in the next release.
Note: Support for looped values is somewhat slow in the current implementation but will be improved in the next release.
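The renaming step can be pictured with a short Python 3 sketch. It is not the actual drel_prep.py: the function name localize_item_name is hypothetical, and we assume that data names in a method are matched case-insensitively and must not be renamed when they are only the prefix of a longer name such as _cell.volume_su.

```python
import re


def localize_item_name(method_text: str, item_name: str, suffix: str = "_local") -> str:
    """Rename whole-name uses of item_name in a dREL method by appending suffix.

    Hypothetical sketch of the idea behind drel_prep.py. Once the item being
    validated has a different name, dREL cannot simply read the supplied value
    and must compute it with the method.
    """
    # Match the name only when it is not part of a longer data name,
    # so _cell.volume is renamed but _cell.volume_su is left alone.
    pattern = re.compile(
        r"(?<![\w.])" + re.escape(item_name) + r"(?![\w.])", re.IGNORECASE
    )
    # Keep the spelling found in the method and just add the suffix.
    return pattern.sub(lambda m: m.group(0) + suffix, method_text)
```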
CIFGET is a DDLm Dictionary Import Expansion Utility, which downloads remote DDLm dictionary trees and generates expanded dictionaries that can be used when reading, writing, or validating CIF files.
make clean
make all
You may then test the script with
make tests
A successful run should produce an empty test.diff file.
If the tests are successful
make install
will install cifget_bin as cifget in the chosen bin directory. When that script is first run, it will attempt to create a copy of the install kit in the directory
./.cifget
If the directory is still available for subsequent runs, it will be used for scratch files.
PERL NOTE: If you get a complaint about a missing version.pm when running cifget, you may need your sysadmin to run
perl -MCPAN -e 'install version'
to complete your perl installation. See
www.directadmin.com/forum/showthread.php?t=27406
For use within the kit:
./cifget [URL] [expanded dictionary name]
For use from a bin directory in your PATH:
cifget [URL] [expanded dictionary name]
cifget downloads the requested dictionary and all other dictionaries that are referenced within it by following the attributes of the _import tags. A local-file mirror of the dictionary tree is created. All dictionaries are then fed to the original Xchek to create an expanded dictionary file containing all imported definitions.
Required Perl modules:
File::Fetch
File::Basename
URI::Split
For October 2008 DDLm:
./cifget http://arcib.dowling.edu/~bernsteh/.cifiucr/cif/ddlm/DDLm_30jan09/TEST_DIC/cif.dic cif_expanded.dic
For October 2009 CIF2:
./cifget http://vcif.sf.net/cif2_dicts/cif.dic cif_expanded.dic
Release 1 28 April 2009 E. Zlateva
This was the initial release of cifget
Release 1.1 25 October 2009 E. Zlateva, H. J. Bernstein
Revisions to adapt to changes for CIF 2
Refactoring for system installs
Release 1.1.1 8 December 2009 H. J. Bernstein
Patch to remove stray uncommented import list from expanded dictionary
CIFtbx is a toolbox of Fortran routines for manipulating CIF data. CYCLOPS2, cif2cif, cif2xml and cif2pdb are open-source Fortran programs based on CIFtbx. CYCLOPS2 checks STAR data names against data name dictionaries. cif2cif copies a CIF to a CIF while checking data names against dictionaries and reformatting numbers with esd's to conform to the rule of 19. A request list may be specified. cif2xml is a variant of cif2cif that produces XML output. cif2pdb converts mmCIF files to PDB or WPDB format.
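As we understand the rule of 19, an esd is quoted with two digits when its leading digits are 10 to 19 and with one digit otherwise, so the esd in parentheses always lies between 2 and 19 in units of the last digit of the value. A minimal Python 3 sketch of that formatting follows; it illustrates the convention and is not the CIFtbx code, and format_with_esd is a hypothetical name.

```python
import math


def format_with_esd(value: float, esd: float) -> str:
    """Format value(esd) so the esd, in units of the last digit, lies in 2..19."""
    if esd <= 0:
        raise ValueError("esd must be positive")
    # Decimal exponent of the esd's leading digit, e.g. 0.0123 -> -2.
    exponent = math.floor(math.log10(esd))
    # The esd's two leading digits as a number, e.g. 0.0123 -> 12.3.
    leading_two = esd / 10 ** (exponent - 1)
    # Keep two esd digits if they round to 10..19, otherwise keep only one.
    last_digit = exponent - 1 if round(leading_two) <= 19 else exponent
    if last_digit < 0:
        decimals = -last_digit
        esd_units = round(esd * 10 ** decimals)
        return f"{value:.{decimals}f}({esd_units})"
    # esd of 10 or more: round both numbers to a multiple of 10**last_digit.
    step = 10 ** last_digit
    return f"{round(value / step) * step}({round(esd / step) * step})"
```

For example, an esd of 0.0123 keeps two digits, giving 1.235(12) for a value of 1.23456, while 0.0234 keeps one, giving 1.23(2). Python's round() rounds exact halves to even, so values exactly halfway between two representations may round differently from a hand calculation.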
For more information on installation and use of these packages, see www.bernstein-plus-sons.com/software/ciftbx
Work done in the second quarter (November 2007 -- January 2008): The activities for the project in this quarter consisted of a cleanup of the base level of CBFlib (to be formally released shortly as CBFlib_0.7.9), and a start on the changes to the CBFlib C-based parser and to the ciftbx Fortran-based parser. The major work was on the design of the logic to handle parsing the "list", "array", "tuple" and "table" entities, and identification of most of the mappings from DDLm to DDL2 needed to create hooks to the existing validation code. In addition, the main server for the software development activities in the lab (arcib) was upgraded, providing a more stable development environment with more disk space.
Work done in the third quarter (February 2008 -- April 2008): The activities for the project in this quarter consisted of new code in CBFlib, starting with the release in CBFlib 0.7.9 of the code done in the prior quarter. The new work was to be released during the following quarter in CBFlib 0.8, ready for review in Osaka. The C code to parse the three DDLm bracketed constructs ((), [], and {}) was completed and tested. The approach taken is to parse the nested constructs, removing comments and extra whitespace to produce a clean string that is easy to reparse for the important single-level cases needed for dictionary validation, while preserving the full tree structure when needed. In addition, Mr. Todorov has built the regular-expression-based infrastructure to validate tag values against the single-container DDLm types. The same structure would then be applied iteratively to the multi-value types. The code for both the parse and the validation infrastructure was available in the CBFlib_bleeding_edge module in the cbflib project on blondie.dowling.edu.
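A rough Python 3 illustration of the clean-string step follows. It is not the CBFlib C code; it assumes that comments start with # outside quotes and run to the end of the line, that items are separated by commas as in the 2008 specification, and that a quote character always delimits a string (real CIF quoting rules are more subtle). normalize_bracketed is a hypothetical name.

```python
OPEN = {"(", "[", "{"}
CLOSE_OR_COMMA = {")", "]", "}", ","}


def normalize_bracketed(text: str) -> str:
    """Strip comments and redundant whitespace from a bracketed DDLm value."""
    out = []               # tokens of the clean string
    pending_space = False  # whitespace or a comment seen since the last token
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in "'\"":
            # Copy a quoted string unchanged, up to the matching quote.
            end = text.find(ch, i + 1)
            if end < 0:
                raise ValueError(f"unterminated string at offset {i}")
            token, i = text[i:end + 1], end + 1
        elif ch == "#":
            # A comment runs to the end of the line and counts as whitespace.
            end = text.find("\n", i)
            i = len(text) if end < 0 else end
            pending_space = True
            continue
        elif ch.isspace():
            pending_space, i = True, i + 1
            continue
        else:
            token, i = ch, i + 1
        # Keep a single space only where it separates two items,
        # never just inside a bracket or next to a comma.
        if (pending_space and out and out[-1][-1] not in OPEN | {","}
                and token not in CLOSE_OR_COMMA):
            out.append(" ")
        pending_space = False
        out.append(token)
    return "".join(out)
```

With these assumptions, "[ 1, 2 ,  # first pair" followed by a line holding 'a # b' ] reduces to [1,2,'a # b'], and the # inside the quoted string survives.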
Work done in the fourth quarter (May 2008 -- July 2008): The activities for the project in this quarter consisted of new code in CBFlib and CIFtbx preparing for release 0.8 of CBFlib and release 4 of CIFtbx, both with code to read and validate data items against expanded DDLm dictionaries. The necessary code infrastructure to parse bracketed items and to test data items against the DDLm regular expressions is now in CBFlib. The regular expressions themselves are being validated and adjusted as necessary and CIFTEST2 is being expanded case by case to provide test cases for likely incorrect uses of the new DDLm types. The current pre-release of CBFlib 0.8 in the repository on blondie.dowling.edu is also being used as the CIF processing component in the current pre-release of RasMol to help ensure adequate testing against existing CIF data. We will post as complete a version of CBFlib 0.8 to the web as is ready just before the Osaka IUCr meeting later this month. The CIFtbx infrastructure to parse bracketed items and to test data items is also ready, but the testing will have to be at a simpler level than in the C-version, not using regular expressions, but simply testing on a coarser level similar to the one adopted for testing against the DDL2 data types. That code will also be posted to the web before the IUCr meeting.
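The element-by-element use of single-value regular expressions can be sketched in Python 3 as follows. The three patterns are hypothetical stand-ins written for this illustration; they are not the regular expressions in CBFlib or in the DDLm specification.

```python
import re
from typing import Iterable, List

# Hypothetical patterns for a few single-container types.
TYPE_PATTERNS = {
    "Integer": re.compile(r"[+-]?\d+"),
    # Optional exponent and optional esd in parentheses, e.g. 1.5(3).
    "Real": re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?"),
    "Code": re.compile(r"[^\s\[\](){},'\"]+"),
}


def is_valid(value: str, type_name: str) -> bool:
    """Check one single-container value against the whole of its type's pattern."""
    return TYPE_PATTERNS[type_name].fullmatch(value) is not None


def invalid_elements(values: Iterable[str], type_name: str) -> List[str]:
    """Apply the single-value check to every element of a multi-value item."""
    return [v for v in values if not is_valid(v, type_name)]
```

Using fullmatch rather than search means a value passes only if the whole string fits the pattern, so 1.5( is rejected as a Real even though it starts with a valid number.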
The work on CIFtbx4 has raised an interesting issue of how not to break existing Fortran applications in the transition to DDLm. The problem is that older Fortran applications cannot accept strings of arbitrary length. To deal with this, we are following the approach already used in CIFtbx to deal with text fields -- delivering the text in chunks of limited length with a flag set true if there are more chunks to be examined. This will allow existing applications to view bracketed constructs as if they were text fields presenting one item per line. For applications that can be converted to be DDLm aware, new variables giving the depth into a bracketed construct and the index across on the current level should provide appropriate control.
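The two ideas in this paragraph, chunked delivery with a continuation flag and depth/index positions within a bracketed construct, are sketched below in Python 3 rather than Fortran, for brevity. The function names, the 1-based numbering, and the use of nested Python lists for an already-parsed construct are all assumptions of the sketch, not the CIFtbx 4 interface.

```python
from typing import Iterator, List, Tuple, Union

Value = Union[str, List["Value"]]


def chunks(text: str, width: int) -> Iterator[Tuple[str, bool]]:
    """Deliver text in pieces of at most width characters.

    The flag is True while more chunks remain, in the spirit of the
    CIFtbx convention for text fields.
    """
    pieces = [text[i:i + width] for i in range(0, len(text), width)] or [""]
    for n, piece in enumerate(pieces):
        yield piece, n < len(pieces) - 1


def items_with_position(value: Value, depth: int = 0) -> Iterator[Tuple[int, int, str]]:
    """Walk a nested list, yielding (depth, index, text) for each scalar item.

    depth counts how far into the bracketed construct the item lies and
    index is its position across the current level; both start at 1.
    """
    for index, item in enumerate(value, start=1):
        if isinstance(item, list):
            yield from items_with_position(item, depth + 1)
        else:
            yield depth + 1, index, item
```

An older application would read only the chunk text and the flag, exactly as for a text field, while a DDLm-aware one could also consult the depth and index.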
Work done in the fifth quarter (August 2008 -- October 2008): After due consideration of the discussions in the COMCIFS meeting and the imgCIF workshop in Osaka in August, and further comparison of available DDLm software in September, we have settled on the version of DDLm as implemented in James Hester's PyCIFRW as the reference implementation against which we will maintain compatibility and are making the necessary changes in CBFlib and CIFtbx 4.
James Hester's parser was obtained from http://anbf2.kek.jp/downloads/drel-ply-ez_260908.tar.gz with the working copy for this project at http://blondie.dowling.edu/projects/drel-parser/. His parser uses PLY (Python Lex-Yacc), so PLY needs to be installed. It can be downloaded from http://www.dabeaz.com/ply/. The current version of PyCIFRW, which this parser is a part of, can be obtained at http://anbf2.kek.jp/CIF/.
Work done in the sixth quarter (November 2008 -- January 2009):
The major work for this quarter was the creation by E. Zlateva of cifget, a wget-like utility to gather DDLm dictionaries to be imported from the web. The additional software discussed below should be ready for release in the upcoming quarter.
As noted above, we now have a utility similar to wget to gather DDLm imported dictionaries from the web so that applications (especially Fortran applications) do not need to have web access. A tarball of this utility is available at http://arcib.dowling.edu/cifiucr/cifget_31Jan09.tar.gz. This utility is an important subsystem for both the Fortran code and the C-code. In addition we have been working on the handling of comments and whitespace in both the Fortran code and the C code, and the handling of methods for validation in the C code.
Work done in the seventh quarter (February 2009 -- April 2009):
The major work of this quarter was continuation of work on coupling of CBFlib to James Hester's PyCIFRW-based dREL parser, the creation of a Fortran interface for CBFlib, revisions to the whitespace and comment logic, and preparations to present the DDLm-based validation at the upcoming ACA meeting in Toronto. Some of the work reported was also funded by DOE and NIH.
As noted above, we have continued work on the coupling of J. Hester's dREL parser into CBFlib for method-based validation, have added a Fortran wrapper to CBFlib to allow full support for Fortran code, and have revised the handling of comments and whitespace in the C code.
Work done in the eighth quarter (May 2009 -- July 2009):
The major work of this quarter was presentation of the validation logic by E. Zlateva at the ACA meeting in Toronto at the end of July 2009. A copy of the poster is appended. Work continued on the coupling of CBFlib to James Hester's PyCIFRW-based dREL parser and revisions to the whitespace and comment logic. Some of the work reported was also funded by DOE and NIH.
As noted above, E. Zlateva presented a poster on the approach to validation at the ACA 2009 meeting. A copy of the poster is at http://arcib.dowling.edu/cifiucr/EZPoster_final_2UP.pdf, and we will extend it to a paper for the Journal of Applied Crystallography in the Fall. The coupling with J. Hester's dREL parser is working and the necessary code has been posted in both the SourceForge.net and blondie.dowling.edu cbflib project repositories. Work on the Fortran interface and handling of comments and whitespace has continued.
Work done in the ninth quarter (August 2009 -- October 2009):
We have three packages that are useful, but we have not yet completed some portions of the project. We had intended this to be the final report for the project, and much of the work has been completed, but for the reasons explained below, including problems found in regression testing and recent drastic changes in the DDLm specification, we will be providing one more report in December.
Much of the time in October that we had intended to put into finishing code went instead into lively discussions, both on the list and face-to-face with Nick Spadaccini during his visit to Dowling. While these discussions delayed meeting the immediate goals of the project, both because of the time involved and because of major changes in the specifications, we believe that they have contributed to a better result in the end.
Three packages had been prepared: cifget 1.1, which combines a perl script with a modified version of XCHEK to download and combine dictionaries following DDLm import directives, CIFtbx4 and Cyclops 2.1.5 to check data names in a file against a DDLm dictionary, and an updated vcif using DDLm dictionaries to check a CIF for errors. While each of these was useful, during October 2009 DDLm working group discussions changed the specification of both dictionaries and data files. In addition, there were problems with the interaction between the DDLm parser from James Hester and dictionaries. Those problems still needed to be resolved. The releases then current had been partially adapted to the DDLm changes and were ready for the updated parser when it was released. We hoped to have more useful releases within a few weeks. No funding by the IUCr beyond the contract amount was needed for this work to continue. We apologize for the packages not being more complete at the end of October.
Work done in the final period (November 2009 -- 9 December 2009):
Three packages were prepared: cifget 1.1, which combines a perl script with a modified version of XCHEK to download and combine dictionaries following DDLm import directives, CIFtbx4 and Cyclops 2.1.5 to check data names in a file against a DDLm dictionary, and an updated vcif using DDLm dictionaries to check a CIF for errors. Thanks to cooperation and assistance from James Hester and programming efforts by E. Zlateva, problems in communication between the dREL parser and CBFlib have been resolved. The project is complete.
Project Status 9 December 2009.
This work depends on access to appropriate dictionaries. The necessary code for downloading DDLm-style dictionaries and merging them into expanded dictionaries has been implemented in the cifget package available on SourceForge. The latest kit can be downloaded from
http://downloads.sf.net/cifget/cifget-1.1.1.zip
or
http://downloads.sf.net/cifget/cifget-1.1.1.tar.gz
This kit includes an updated XCHEK and appears to be complete and stable working with either August 2008 DDLm dictionaries or with draft October 2009 CIF2 dictionaries. The minor bug of a stray import directive line reported last month was corrected.
The Fortran specification was implemented in CIFtbx 4. CIFtbx 4 is working and is available from the CVS repository at
http://blondie.dowling.edu/plugins/scmcvs/cvsweb.php/CIFtbx_4/?cvsroot=ciftbx
This version does a reasonable CYCLOPS name check of a document against either a 2008 DDLm style core dictionary or a draft CIF2 style core dictionary. The cif2cif utility is now able to copy 2007 and 2008 DDLm dictionaries and data files.
Work by E. Zlateva and H. J. Bernstein
Project Status 1 November 2009.
This work depends on access to appropriate dictionaries. The necessary code for downloading DDLm-style dictionaries and merging them into expanded dictionaries has been implemented in the cifget package available on SourceForge. The latest kit can be downloaded from
http://downloads.sf.net/cifget/cifget-1.1.zip or http://downloads.sf.net/cifget/cifget-1.1.tar.gz
This kit includes an updated XCHEK and appears to be complete and stable working with either August 2008 DDLm dictionaries or with draft October 2009 CIF2 dictionaries. At present we are aware of only one minor bug: On assembling an expanded dictionary, one data value from an import directive is included in raw form instead of being commented out or omitted as it should be. At present we comment out the offending line by hand after the cifget run. We are working on correcting the problem.
The Fortran specification was implemented in CIFtbx 4, which is available in its current state (except for the small portion still being debugged) from the CVS repository
http://blondie.dowling.edu/plugins/scmcvs/cvsweb.php/CIFtbx_4/?cvsroot=ciftbx
This version does a reasonable CYCLOPS name check of a document against either a 2008 DDLm style core dictionary or a draft CIF2 style core dictionary. Corrections to the detailed parse to enable full cif2cif support for DDLm and CIF2 data files are in progress.
Work by E. Zlateva and H. J. Bernstein
Project Status 1 August 2009.
Work on the Fortran interface and handling of comments and whitespace continues. The simplified logic for whitespace works well for reading. To achieve better performance, we are redoing the write logic to use an internal tree structure for all bracketed constructs in place of the pure ASCII string version we had been using.
Work by H. J. Bernstein
Project Status 2 May 2009.
Full implementation of DDLm support on the Fortran side requires an interface between Fortran code and C code. Kay Diederichs pointed out the recent availability of a new standardized interface between Fortran 2003 and C, which has been back-adapted to most Fortran 95 compilers. We have created a full set of C routines suited to this interface for the entire CBFlib API. The same code can be used directly with older Fortrans, such as g77, that do not have the new interface specification but use gcc-style C calling sequences. The code works and has been posted to the SourceForge CBFlib svn repository. We are still working on the documentation and will put out a full release before the ACA meeting. This will leave some gaps in Fortran support for environments in which neither a modern Fortran 95 compiler nor a Fortran 77 compiler compatible with C calling sequences is available. We doubt this will be a major problem, but if there are critical older systems with those characteristics for which adaptation is needed, we will be happy to contribute to such an effort even after the current project is over.
Work by H. J. Bernstein
Project Status 2 February 2009.
The most important work for this period was the creation of CIFGET. CIFGET is a DDLm dictionary import expansion utility, which downloads remote DDLm dictionary trees and generates expanded dictionaries that can be used when reading, writing, or validating CIF files. It is a shell script, called from the command line in the following format:
./cifget [url] [expanded dictionary destination]
where [url] is the web location of the source dictionary that needs to be expanded.
CIFGET runs a Perl script, called cget, which downloads the requested dictionary file. cget is largely modeled on the logic of GNU wget. It mimics the wget recursive download logic, which allows it to follow links and download the dictionary tree in the same way that wget would download a tree of HTML files. To avoid infinite recursion between files that reference each other, each URL is saved into an array after its contents have been downloaded, and the array is checked before each download to see whether the current URL is already in it. cget parses each dictionary file, finds the import tag attributes using regular expressions, and extracts the URLs from the import tags. In addition to downloading the dictionary tree, cget also copies all dictionaries to a working dictionary directory, where Xchek will run to create an expanded dictionary.
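The download loop described above can be sketched in Python 3. This is not cget itself: the fetch function is passed in so that the sketch can run against test data, and the regular expression, which treats any quoted name ending in .dic as an import, is a hypothetical simplification of the import-attribute parsing. A dictionary keyed by URL serves as the record of downloaded files. It plays the role of cget's array but gives constant-time lookup.

```python
import re
from typing import Callable, Dict
from urllib.parse import urljoin

# Hypothetical simplification: any quoted name ending in ".dic" is an import,
# as in ('Att','Cromer_Mann_coeff','com_att.dic').
DIC_NAME_RE = re.compile(r"""['"]([^'"\s]+\.dic)['"]""")


def mirror_dictionary_tree(root_url: str, fetch: Callable[[str], str]) -> Dict[str, str]:
    """Download root_url and every dictionary it imports, directly or indirectly.

    fetch(url) must return the text at url. Returns a {url: text} mirror.
    """
    mirror: Dict[str, str] = {}
    pending = [root_url]
    while pending:
        url = pending.pop()
        # Skip URLs already downloaded; this stops cycles between
        # dictionaries that import each other.
        if url in mirror:
            continue
        text = fetch(url)
        mirror[url] = text
        for name in DIC_NAME_RE.findall(text):
            # Imported names are resolved relative to the importing dictionary.
            pending.append(urljoin(url, name))
    return mirror
```

An explicit work list is used instead of recursion, so very deep import chains cannot exhaust the Python call stack; the order of downloads differs from wget's, but the set of files mirrored is the same.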
CIFGET uses Xchek to create an expanded dictionary from the local-file dictionary mirrors. It takes as a source dictionary the dictionary that was originally requested by the user from the web. Xchek is run from the working dictionary directory, and the expanded dictionary file is saved in the user-specified location. During testing, builds of some expanded dictionaries were prevented by a typographical error in one of the dictionaries on the IUCr web site. Therefore, testing of this package should be done against
http://arcib.dowling.edu/~bernsteh/.cifiucr/cif/ddlm/DDLm_30jan09/TEST_DIC/
until the typo is corrected. The change needed is to core_struc.dic on the IUCr web site, at lines 1682-1683:
_import_list.id (('Att','Cromer_Mann_coeff','com_att.dic'],
['Val','Cromer_Mann_b4','com_val.dic'))
should read
_import_list.id (('Att','Cromer_Mann_coeff','com_att.dic'),
('Val','Cromer_Mann_b4','com_val.dic'))
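A bracket-matching check with a stack catches this kind of typo mechanically. The Python 3 sketch below is illustrative only; check_brackets is a hypothetical name, and it ignores anything inside single or double quotes.

```python
def check_brackets(text: str) -> None:
    """Raise ValueError at the first unbalanced or mismatched bracket."""
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []    # (opening bracket, offset) pairs still waiting to be closed
    quote = None  # the quote character of the string we are inside, if any
    for offset, ch in enumerate(text):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in "([{":
            stack.append((ch, offset))
        elif ch in closers:
            # A closer must match the most recent unclosed opener.
            if not stack or stack[-1][0] != closers[ch]:
                raise ValueError(f"unexpected {ch!r} at offset {offset}")
            stack.pop()
    if stack:
        ch, offset = stack[-1]
        raise ValueError(f"unclosed {ch!r} opened at offset {offset}")
```

Run on the faulty text above, it stops at the ] that tries to close a (, while the corrected version passes silently.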
In this reporting period, work continued on the Fortran code, focusing on the design for handling white space and comments, especially in dealing with bracketed constructs. The existing scheme in the Fortran code in CIFtbx for dealing with comments and white space has been event driven, delivering comments and white space on the fly when scanning forward. White space is recorded as status information associated with each token in the form of a column number. This scheme is being retained and supplemented by the following scheme, derived from a scheme discussed on the imgCIF list in September 2008, but using DDLm bracketed constructs to carry the information more compactly and more intuitively:
((((coltp,prologt,colt), (colvp,prologv,colv), (((cole1p,prologe1,cole1),(cole1e,epiloge1)),…), (colve,epilogv))
where coltp is the column at which any comment before the tag begins, prologt is any comment before the tag, colt is the column at which the tag begins, colvp is the column at which any comment before the tag value begins, prologv is any comment before the tag value, colv is the column at which the tag value begins, colve is the column at which any comment after the tag value begins, and epilogv is any comment after the tag value. The elements in the middle provide the same information for a bracketed construct. If the tag is part of a loop, then ws_
Work by E. Zlateva and H. J. Bernstein
Project Status 1 November 2008.
In this period we have been working on the write logic for the bracket-delimited constructs in CIFtbx 4 and, as an unexpected consequence of the spirited exchange with James Hester on the best way to handle magic numbers for imgCIF, have worked out what seems to be a reasonable solution to the event-based parsing of bracket constructs with embedded comments. The approach in the Fortran-based software is as follows.
The software will offer the option of parsing the bracketed constructs in either of two alternate ways:
White space other than comments will not be delivered as an event. Instead, it will be marked by the column numbers of the elements that are delivered. There are cases in which detailed information about elements and comments is critical, but for such activities as extracting and copying portions of DDLm CIFs, line-by-line transfer is more efficient.
It is not practical to embed full web access into the Fortran code. Therefore, the logic for dictionary expansion in the Fortran code is being based on local files, with the gathering of dictionaries from remote locations being handled by a separate C program based on wget (see below).
Work by H. J. Bernstein
Project Status 2 August 2008.
In the prior status report (below), we wrote,
"The lessons learned in the coding for C have caused us to rethink the code currently in Fortran. In the past we have preserved all comments in a CIF while doing the Fortran parse, so that the original CIF with all comments can be recreated even when reformatted. As we have discovered in the C code, it is important to strip the embedded comments in the bracketed constructs, and it also may be necessary to have a full tree-expansion of nested bracketed constructs. Maintaining a full tree structure and three-fold replication of all the bracketed constructs is workable in C and even in Fortran-95, but is a non-trivial change in Fortran-77 if reasonable performance is to be achieved. We are exploring alternatives and will resolve the matter in the next quarter."
We have explored the alternatives, which for Fortran-77 would require extending the current use of direct access files to store the tree. The performance hit was too great and we plan not to bring the full tree-structure into the Fortran version, but to stay with the less-demanding event-based logic discussed above. We should discuss this in Osaka.
Work by H. J. Bernstein
Project Status 1 May 2008.
The lessons learned in the coding for C have caused us to rethink the code currently in Fortran. In the past we have preserved all comments in a CIF while doing the Fortran parse, so that the original CIF with all comments can be recreated even when reformatted. As we have discovered in the C code, it is important to strip the embedded comments in the bracketed constructs, and it also may be necessary to have a full tree-expansion of nested bracketed constructs. Maintaining a full tree structure and three-fold replication of all the bracketed constructs is workable in C and even in Fortran-95, but is a non-trivial change in Fortran-77 if reasonable performance is to be achieved. We are exploring alternatives and will resolve the matter in the next quarter.
Work by H. J. Bernstein
Project Status 1 February 2008.
In order to validate CIFs against DDLm dictionaries, we need to parse those dictionaries. Xchek has a minimal partial parse of DDLm, but a full check is best done using a full parse. To upgrade from the existing DDL1 and DDL2 parsers, code has to be added to handle the new syntax for values, which, in addition to the existing unquoted words, single- and double-quoted strings and "\n;" quoted text fields in DDL1 and DDL2, includes the new "{ item, item, ...}", "[ item, item, ...]", and "( item, item, ...)" constructs. These constructs may be nested, which requires some use of stacks to save state. We are adding new code and variables to CIFtbx to support these new constructs. This raises an interesting issue in the handling of existing DDL1 and DDL2 CIFs: should the new constructs be allowed when parsing such CIFs? Allowing them helps to encourage the community to move up to DDLm, but weakens strict validation of CIFs for compliance with DDL1 and DDL2. The compromise we propose is to make acceptance of the new constructs the default and to add a control flag to allow applications to revert to strict checking against DDL1 and DDL2.
We hope to have the first pass of these parser upgrades to CIFtbx ready for distribution by mid-February and the upgrade to CBFlib ready for distribution by the end of February. The code will be posted on blondie.dowling.edu, our GFORGE server.
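The stack-based handling of nested constructs, together with the proposed strict-mode control flag, can be sketched in Python 3. The names parse_value and strict_ddl2 are hypothetical. The sketch also simplifies in two ways: it treats all three bracket types alike, and it does not split table key:value pairs, both of which a fuller parser would have to handle.

```python
import re
from typing import List, Union

Node = Union[str, List["Node"]]

# One token: a quoted string, a bracket or comma, or an unquoted word.
TOKEN_RE = re.compile(r"""\s*(?:('[^']*'|"[^"]*")|([()\[\]{},])|([^\s()\[\]{},'"]+))""")
OPENERS = {"(": ")", "[": "]", "{": "}"}


def tokenize(text: str) -> List[str]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        tokens.append(match.group(match.lastindex))
        pos = match.end()
    return tokens


def parse_value(text: str, strict_ddl2: bool = False) -> Node:
    """Parse one value, turning nested (), [] and {} constructs into lists."""
    tokens = tokenize(text)
    if strict_ddl2 and any(tok in OPENERS for tok in tokens):
        raise ValueError("bracketed constructs are not allowed in strict DDL1/DDL2 mode")
    root: List[Node] = []
    # Stack of (list being filled, closing bracket it expects); the root expects none.
    stack = [(root, None)]
    for tok in tokens:
        current, expected_close = stack[-1]
        if tok in OPENERS:
            child: List[Node] = []
            current.append(child)
            stack.append((child, OPENERS[tok]))
        elif tok in (")", "]", "}"):
            if tok != expected_close:
                raise ValueError(f"unexpected {tok!r}")
            stack.pop()
        elif tok != ",":
            current.append(tok)
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    if len(root) != 1:
        raise ValueError("expected exactly one value")
    return root[0]
```

For example, [1, (2, 3), {'a'}] parses to the nested list ["1", ["2", "3"], ["'a'"]], while the same text is rejected when strict_ddl2 is True.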
Work by H. J. Bernstein and G. Todorov.
Project Status 31 October 2007.
Prior to the actual start of the project, Syd Hall had provided a release of Xcheck. That kit was unpacked and reviewed and permission was requested from Hall, Spadaccini and Westbrook to move the code under the GPL. G. Todorov set up a website at blondie.dowling.edu/projects/ddlm that contains all available information on the specifications, prototype dictionaries etc. Blondie is a collaborative development environment using GFORGE. Much of the validation functionality of Xcheck is closely related to the vcif2 validation code in CBFlib (see below).
We believe, after a review of the currently available software, that xcheck.f and indic.f are the only available tools for parsing DDLm and are a good starting point. They do not yet implement dREL.
Work by H. J. Bernstein and G. Todorov.
Project Status 9 December 2009.
We have prepared the current release of vcif 3
http://downloads.sf.net/vcif/vcif_3_09Dec09.tar.gz
vcif 3 is a special mode of execution of the program cif2cbf from the CBFlib package that makes use of the dREL parser by James Hester.
Ms. Zlateva changed dREL call logic to convert any use of the tag being evaluated to have a different name by appending _local. That way dREL will always try to generate the value with a method.
The cbflib_bleeding_edge kit included in the vcif 3 kit has been aligned to the CBFlib source code repository.
Work by E. Zlateva and H. Bernstein
Project Status 1 November 2009.
Ms. Zlateva has prepared the current best release of the vcif code at
http://arcib.dowling.edu/~zlatevae/vcif_Oct27.tar.gz
The contents are as follows: Folders:
c2c_tests - the latest CIFTEST2 additions, against which I have been testing the new code
ddlm_dicts_aug09 - dictionaries prior to changes
ddlm_dicts_oct09 - dictionaries with {}'s
doc - dREL specs
drel-non-dictionary-parse - the first dREL parser kit that we received, which parses and executes simple dREL such as:
total = 0
othertotal = 0
do jkl = 0,20,2 { total = total + jkl
do emm = 1,5 { othertotal = othertotal + 1 }
}
end_of_loop = -25.6
./python gu.py total
executes this method and writes the result to the file method_output in the same folder. The result should be 110.
dREL-ply-0.5-original - the latest dREL parser kit that James Hester has sent, which is still not part of the PyCifRW distribution. It has a setup.py, which has not been added to the Makefiles since it is anticipated that it will not be part of the package when PyCifRW-4.0 is released.
dREL-ply-0.5 - a test version of the same kit with a modified script to compile method expressions in a modified dictionary. We modified the dictionary to avoid the [] constructs, which fail because the grammar="DDLm" tag that is supposed to handle them is not yet being recognized.
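To make the expected result of the sample dREL method in the drel-non-dictionary-parse entry concrete, here is a hand transcription into Python. It is only a check of the arithmetic, not output of the dREL parser. It assumes, as the expected result of 110 implies, that dREL do loops include their upper bound, so each range() call gets a +1.

```python
def run_sample_method():
    """Hand transcription of the sample dREL method (not parser output).

    dREL 'do' loops are taken to include the upper bound, Fortran style,
    hence the +1 in each range() call.
    """
    total = 0
    othertotal = 0
    for jkl in range(0, 20 + 1, 2):      # do jkl = 0,20,2
        total = total + jkl
        for emm in range(1, 5 + 1):      # do emm = 1,5
            othertotal = othertotal + 1
    end_of_loop = -25.6
    return total, othertotal, end_of_loop


total, othertotal, end_of_loop = run_sample_method()
print(total)       # 110: 0 + 2 + ... + 20
print(othertotal)  # 55: 11 outer passes x 5 inner passes
```

If the loops excluded their upper bound, total would be 90 instead of 110, which is why the expected output is a useful quick test of the parser kit.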
In cbflib_bleeding_edge, the drel_gu.py script is included with the source, and it should be able to run once ./setup.py has been run in dREL-ply-0.5-original.
Currently, we have validation rules set up to find which values have associated methods in the dictionary, pass those methods to the dREL parser and wait for a result. If the result matches the value, the value is valid. If the result and the value do not match, a cbf_warning is issued to report that the value differs from the method-generated value. For a missing value, if there is a method to generate it, the method is passed to the dREL parser and the result is taken as the value.
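The validation rule just described can be sketched in Python as below. All names are hypothetical, and two details are assumptions rather than what CBFlib does: "matches" is taken to mean numeric agreement within a relative tolerance when both sides parse as numbers, and mismatches are collected in a list instead of being reported through cbf_warning. The cell_volume function is an orthogonal-cell stand-in, not the method from the DDLm dictionary.

```python
import math
from typing import Callable, Dict, List, Optional

Block = Dict[str, str]
Method = Callable[[Block], float]


def values_match(value: str, result: object, rel_tol: float = 1e-6) -> bool:
    """Numeric comparison when both sides parse as numbers, else text equality."""
    try:
        return math.isclose(float(value), float(result), rel_tol=rel_tol)
    except (TypeError, ValueError):
        return str(value) == str(result)


def apply_method_rule(block: Block, tag: str, methods: Dict[str, Method],
                      messages: List[str]) -> Optional[str]:
    """Check a present value against its method, or fill in a missing one."""
    method = methods.get(tag)
    value = block.get(tag)
    if method is None:
        return value                     # no method in the dictionary: nothing to do
    result = method(block)               # in vcif this step is the dREL evaluation
    if value is None:
        block[tag] = str(result)         # missing value: take the method result
    elif not values_match(value, result):
        messages.append(f"{tag}: {value} differs from method-generated value {result}")
    return block[tag]


def cell_volume(block: Block) -> float:
    # Illustration only: orthogonal cell, so V = a * b * c.
    return (float(block["_cell.length_a"]) * float(block["_cell.length_b"])
            * float(block["_cell.length_c"]))


methods = {"_cell.volume": cell_volume}
block = {"_cell.length_a": "10", "_cell.length_b": "20", "_cell.length_c": "5"}
messages: List[str] = []
filled = apply_method_rule(block, "_cell.volume", methods, messages)   # fills the value
block["_cell.volume"] = "999"
checked = apply_method_rule(block, "_cell.volume", methods, messages)  # records a mismatch
```

A tolerance-based comparison avoids false warnings when a CIF value is written with fewer digits than the computed result; a real implementation might instead use the standard uncertainty given with the value.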
drel_gu.py - the most important parts:
# declarations for the dREL lexer and parser
self.lexer = drel_lex.lexer
self.parser = drel_yacc.parser
# declaration for the dictionary to use
self.testdic = CifFile.CifDic(dictionary, grammar='DDLm')
# declaration for the test block; cbf_handle is the CIF handle being read
# and validated, and datablock_name is the name of the data block
# currently being processed
self.testblock = CifFile.CifFile(cbf_handle)[datablock_name]
Then you need to declare the dREL target_id (the item that the method is supposed to compute and return). For example, in a method for _cell.volume, _cell.volume is the target_id:
self.parser.target_id = valuename
# the expression itself also needs to be passed to the dREL parser
res = self.parser.parse(expression + "\n", lexer=self.lexer)
# and finally, we can make an executable function
realfunc = drel_yacc.make_func(res, "myfunc", valuename)
# execute it (Python 2 exec statement)
exec realfunc
# and retrieve the result
realres = myfunc(self.testdic, self.testblock)
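In drel_gu.py, `exec realfunc` is the Python 2 exec statement, which makes myfunc available as a local name. Under Python 3, exec() does not make new names available as ordinary local variables of the calling function, so the usual pattern is to run the generated source in an explicit namespace and fetch the function from it. The sketch below shows only that pattern. make_func_source is a hypothetical stand-in for drel_yacc.make_func that wraps already-translated Python statements; it does not reproduce what the real make_func generates.

```python
def make_func_source(body: str, funcname: str, result_name: str = "target") -> str:
    """Hypothetical stand-in for drel_yacc.make_func: wrap translated Python
    statements in a function taking (dic, block) and returning `result_name`."""
    lines = [f"def {funcname}(dic, block):"]
    lines += ["    " + line for line in body.splitlines()]
    lines.append(f"    return {result_name}")
    return "\n".join(lines) + "\n"


# Illustrative translated method body for an orthogonal cell.
body = (
    "a = float(block['_cell.length_a'])\n"
    "b = float(block['_cell.length_b'])\n"
    "c = float(block['_cell.length_c'])\n"
    "target = a * b * c"
)
realfunc = make_func_source(body, "myfunc")

namespace: dict = {}
exec(realfunc, namespace)            # defines myfunc inside `namespace`
myfunc = namespace["myfunc"]
realres = myfunc({}, {"_cell.length_a": "10", "_cell.length_b": "20",
                      "_cell.length_c": "5"})
```

Keeping the generated code in its own namespace also prevents a method from accidentally overwriting names in the calling module.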
We need to pass five parameters to the Python code: the dictionary, the CIF file, the data block, the tag for the value, and the expression. For now, the dictionary (cbf_dictionary), the file (cbf_data), and the expression (method_expression) are written to files and read by the parser. The data block and tag can be passed in argv[]. The conversion from file-based communication to IPC has been designed but not yet implemented.
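On the Python side, gathering those five inputs might look like the sketch below. The file names cbf_dictionary, cbf_data and method_expression come from the text; the helper name load_request and the order of the command-line arguments (data block, then tag) are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, Sequence, Union


def load_request(argv: Sequence[str], workdir: Union[str, Path] = ".") -> Dict[str, object]:
    """Collect the five inputs for one method evaluation (hypothetical helper).

    Three inputs come from files in `workdir`; the data block name and tag
    come from argv[1] and argv[2] (argument order assumed).
    """
    if len(argv) != 3:
        raise SystemExit(f"usage: {argv[0] if argv else 'drel_gu.py'} datablock tag")
    work = Path(workdir)
    return {
        "dictionary": work / "cbf_dictionary",   # path handed to the CIF reader
        "datafile": work / "cbf_data",           # path handed to the CIF reader
        "expression": (work / "method_expression").read_text(),
        "datablock": argv[1],
        "tag": argv[2],
    }


# Demonstration with throwaway files standing in for what CBFlib writes.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("cbf_dictionary", "# dictionary\n"),
                       ("cbf_data", "data_block1\n"),
                       ("method_expression", "_cell.volume = 1\n")]:
        (Path(tmp) / name).write_text(text)
    request = load_request(["drel_gu.py", "block1", "_cell.volume"], tmp)
```

Passing the short items on the command line and the bulky ones through files keeps the C side simple; replacing the files with IPC later only changes where the three file-based inputs are read from.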
Work by E. Zlateva and H. Bernstein
Project Status 1 August 2009.
As shown in the ACA 2009 poster, the coupling of the C-based CBFlib logic to J. Hester's dREL parser is working, and we are also extending it in some NIH-funded work to allow a more general function-definition/function-call model. J. Hester has advised us to wait for the next PyCIFRW release, in which the dREL parser will be a full part of PyCIFRW. If that full release proves suitable, we will adopt it.
We are still using the file system for communications between C and Python to simplify debugging, but the logic for use of pipes will be substituted before closeout of the project in October.
Work by E. Zlateva and H. Bernstein
Project Status 2 May 2009.
Now that we had a working parser for the new DDLm constructs and CIFGET to gather web-based dictionaries, the remaining major piece of the project was to evaluate methods in the course of validation. For convenience in debugging, the initial coupling had been via files and a Python script to evaluate an expression. In May 2009 the script took a method expression and the name of the relevant data item on which it ran. The script parsed the expression using James Hester's code and called drel_yacc.make_func. Executing the resulting function produced the result.
The results of the evaluation were written to a file. In CBFlib, we created a function
int cbf_drel(const char *itemname, const char *expression)
that wrote an expression to a file and ran the Python script via a system call. The function was called when the _method tag was found during the reading of a CIF. The expression itself was taken from the "_method_expression" attribute. After each call of the function, the value calculated by the method would be sitting in the method output file for easy access. Thus, this value could be used in place of a missing value, or it could be compared with a present value for validation.
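A Python analogue of that file-based round trip is sketched below; it is not the C code of cbf_drel. The helper name drel_via_files is hypothetical, and the file names method_expression and method_output are illustrative. The stand-in evaluator simply eval()s a plain Python arithmetic expression. It is not a dREL parser; it only plays the role of the Python script so that the example runs.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Optional


def drel_via_files(itemname: str, expression: str, script_cmd: List[str],
                   workdir: str) -> Optional[str]:
    """Write the expression to a file, run the evaluator as a separate process,
    then read the value it left in method_output (None if the run failed)."""
    work = Path(workdir)
    (work / "method_expression").write_text(expression + "\n")
    if subprocess.run(script_cmd + [itemname], cwd=work).returncode != 0:
        return None
    return (work / "method_output").read_text().strip()


# Stand-in evaluator: plain Python arithmetic via eval(), NOT a dREL parser.
STAND_IN = [sys.executable, "-c",
            "with open('method_expression') as f:\n"
            "    e = f.read()\n"
            "with open('method_output', 'w') as f:\n"
            "    f.write(str(eval(e)))\n"]

with tempfile.TemporaryDirectory() as tmp:
    value = drel_via_files("_demo.value", "2 * (3 + 4)", STAND_IN, tmp)
```

The files make every step easy to inspect during debugging, at the cost of a process launch and disk round trip per method call.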
Ms. Zlateva was working on eliminating the need for files by using pipes to process a method expression. At that point, feeding the method expression directly to the Python script was working. The output still went to a file, but it was intended to be upgraded to a pipe shortly.
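The intended end state, with the expression sent on the script's standard input and the value read back from its standard output, can be sketched with Python's subprocess module. The helper name is hypothetical, and the stand-in evaluator again evaluates plain Python arithmetic rather than dREL.

```python
import subprocess
import sys
from typing import List


def drel_via_pipes(itemname: str, expression: str, script_cmd: List[str]) -> str:
    """Send the method expression on the evaluator's stdin and read the value
    from its stdout, so no intermediate files are needed."""
    proc = subprocess.run(script_cmd + [itemname], input=expression + "\n",
                          capture_output=True, text=True, check=True)
    return proc.stdout.strip()


# Stand-in evaluator (plain Python arithmetic, NOT a dREL parser).
STAND_IN = [sys.executable, "-c", "import sys; print(eval(sys.stdin.read()))"]
value = drel_via_pipes("_demo.value", "2 * (3 + 4)", STAND_IN)
```

With check=True a failing evaluator raises CalledProcessError instead of silently returning an empty value.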
The system for whitespace and comments that we started in the prior quarter had been evaluated, simplified and extended. We removed one layer of bracketing. The new, and hopefully close to final, specification was as follows:
1. The prefix ws is reserved for special whitespace categories and tags.
2. For any given tag, <tag>, in a category, <category>, whitespace and comments for the tag and its value(s) will be given by <category>.ws_<tag> (for DDL2 style tag naming) or ws_tag (for DDLm and DDL1 style undotted naming). In all cases this tag will be part of the same category as <tag>.
3. For any given category, <category>, whitespace and comments for the category as a whole will be given by _ws__<category>.ws_ (note the double underscore). This category ws__<category> is distinct from <category>.
4. For any given data block or save frame, whitespace and comments for the data block or save frame as a whole will be given by _ws_.ws_.
5. Whitespace and comments may be given as a prologue (intended to be presented before the element), zero or more emlogues (intended to be presented between the initial sub-element, e.g. "loop_" or the tag name, and the rest of the element) or an epilogue (intended to be presented after the element as a whole). We use the term "-logues" for prologues, emlogues and epilogues. The -logues for an element may be given as a single string, in which case only an epilogue is intended, or as a bracketed construct (using parentheses) with multiple -logues. If only one -logue is given, it is the epilogue. If two -logues are given, the first is the prologue and the second is the epilogue. If more emlogues are given than there are breaks in the element, the extra emlogues are prepended to the epilogue. The emlogues for a bracketed construct may also be bracketed constructs, to provide whitespace and comments within bracketed constructs.
6. A prologue, emlogue or epilogue is a string of one or more lines, each starting with an optional colon-terminated column position for that line, followed by optional whitespace, followed by an optional comment. If no column position is given, the whitespace begins at the next syntactically valid location. If a column position is given, then, on writing, a new line will be started if necessary to align to that column. A column position with no whitespace and no comment simply provides alignment for the next sub-element. If a -logue line ends with a comment, whatever follows will be forced to a new line.
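Rules 5 and 6 can be sketched in Python as follows. This is an illustration, not the CBFlib code, and the names are hypothetical. Two points are assumptions: with three or more -logues the first is taken as the prologue, the last as the epilogue and the rest as emlogues; and extra emlogues are prepended to the epilogue by simple string concatenation. Nested bracketed emlogues are not handled; every -logue is treated as a flat string.

```python
import re
from typing import List, NamedTuple, Optional, Sequence, Tuple, Union


class LogueLine(NamedTuple):
    column: Optional[int]    # colon-terminated column position, if any
    whitespace: str          # whitespace following the column position
    comment: Optional[str]   # comment text starting with '#', if any

    @property
    def forces_newline(self) -> bool:
        # Rule 6: a line that ends in a comment pushes what follows to a new line.
        return self.comment is not None


_LINE = re.compile(r"(?:(\d+):)?([ \t]*)(#.*)?")


def parse_logue_line(line: str) -> LogueLine:
    """Rule 6: split one -logue line into column, whitespace and comment."""
    m = _LINE.fullmatch(line)
    if m is None:
        raise ValueError(f"not a -logue line: {line!r}")
    col = int(m.group(1)) if m.group(1) else None
    return LogueLine(col, m.group(2), m.group(3))


def assign_logues(logues: Union[str, Sequence[str]],
                  n_breaks: int) -> Tuple[str, List[str], str]:
    """Rule 5: split -logues into (prologue, emlogues, epilogue)."""
    if isinstance(logues, str):
        return "", [], logues                 # a single string is an epilogue
    items = list(logues)
    if not items:
        return "", [], ""
    if len(items) == 1:
        return "", [], items[0]
    prologue, *emlogues, epilogue = items     # assumed for three or more
    extra = emlogues[n_breaks:]               # more emlogues than breaks
    return prologue, emlogues[:n_breaks], "".join(extra) + epilogue
```

For example, parse_logue_line("1:#2345") gives column 1, no whitespace and the comment "#2345", matching the first line of the leading comments in the example that follows.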
The following example may help in understanding this approach. Consider the following CIF fragment:
#234567890123456789012345678901234567890123456789
#second line of leading comments
data_block
########
# CELL #
########
_cell.length_a 100
_cell.length_b 100
_cell.length_c 100
_cell.angle_alpha 90.
_cell.angle_beta 90.
_cell.angle_gamma 90.
If parsable whitespace is selected, CBFlib will internally treat this fragment as
data_block
_ws_.ws_
(
;\
1:#234567890123456789012345678901234567890123456789
#second line of leading comments;
:5\
; ,)
_ws__cell.ws_
(
;\
1:########
# CELL #
########
\
; ,)
_cell.length_a 100
_cell.length_b 100
_cell.length_c 100
_cell.angle_alpha 90.
_cell.angle_beta 90.
_cell.angle_gamma 90.
_cell.ws_length_a ("1:","23:",)
_cell.ws_length_b ("1:","23:",)
_cell.ws_length_c ("1:","23:",)
_cell.ws_angle_alpha ("1:","24:",)
_cell.ws_angle_beta ("1:","24:",)
_cell.ws_angle_gamma ("1:","
This will add the same control over layout of a CIF that we already have in the Fortran CIFtbx.
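The tag-naming part of the scheme (rules 2 to 4) is mechanical enough to sketch in a few lines of Python. The function names are hypothetical; the outputs can be checked against the names used in the example above, _ws__cell.ws_ and _cell.ws_length_a. Only the DDL2-style dotted form is handled, because the spelling of the undotted (DDLm and DDL1 style) form is not fully spelled out in rule 2.

```python
WS_BLOCK_TAG = "_ws_.ws_"   # rule 4: the data block or save frame as a whole


def ws_tag_for_item(tag: str) -> str:
    """Rule 2, DDL2-style dotted names: _<category>.<name> -> _<category>.ws_<name>."""
    category, dot, name = tag.partition(".")
    if not dot or not name:
        # The undotted (DDLm/DDL1-style) form is not reproduced in this sketch.
        raise ValueError(f"expected a dotted tag, got {tag!r}")
    return f"{category}.ws_{name}"


def ws_tag_for_category(category: str) -> str:
    """Rule 3: _ws__<category>.ws_ (note the double underscore)."""
    return f"_ws__{category.lstrip('_')}.ws_"


cell_tags = ["_cell.length_a", "_cell.length_b", "_cell.angle_gamma"]
print(ws_tag_for_category("cell"))                 # _ws__cell.ws_
print([ws_tag_for_item(t) for t in cell_tags])
```

Keeping each whitespace tag in the same category as the tag it describes (rule 2) means a program that ignores the ws_ tags still sees an ordinary, valid category.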
Work by E. Zlateva and H. Bernstein
Project Status 2 February 2009.
We have prepared the hooks in CBFlib to use James Hester's PyCIFRW-based DDLm parser from within CBFlib and will be working further on that code in the upcoming quarter using the expanded dictionaries provided by CIFGET. Work has continued in CBFlib on the handling of comments and whitespace in the manner described above.
Work by E. Zlateva and H. Bernstein
Project Status 1 November 2008.
As noted above, as a result of the COMCIFS discussions in Osaka and subsequent inquiries by email, we have settled on James Hester's PyCIFRW-based DDLm implementation as the reference implementation for our work. Work has begun on prototype integration with CBFlib via system calls as an interface to Python when methods need to be executed. Faster IPC calls will be substituted later. An approach similar to the one being followed in CIFtbx is being followed in CBFlib for the management of bracketed constructs with embedded comments. Comments at higher levels are being handled with the "ws_" constructs discussed in the interchange with James Hester subsequent to the COMCIFS meeting.
In order to deal with import directives for both CBFlib and CIFtbx, E. Zlateva is working on a separate utility based on wget to gather local copies of the necessary dictionaries recursively, with loop breaking, in the same way that wget handles HTML file links. Once the dictionaries are downloaded, CBFlib will parse _import tags and copy the required definitions into an expanded dictionary.
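The recursive gathering with loop breaking can be sketched in Python as below. It illustrates the traversal only; it is not the wget-based utility, and it does not perform the later step of copying definitions into an expanded dictionary. The fetch and find_imports hooks are hypothetical, and the toy "IMPORT name" lines stand in for real DDLm import definitions.

```python
from typing import Callable, Dict, List


def gather_dictionaries(root: str, fetch: Callable[[str], str],
                        find_imports: Callable[[str], List[str]]) -> Dict[str, str]:
    """Mirror `root` and every dictionary it imports, directly or indirectly.

    Each location is fetched at most once; a location already in the mirror
    is skipped, which breaks import loops the way a visited set breaks link
    loops in a web mirror.
    """
    mirror: Dict[str, str] = {}
    pending = [root]
    while pending:
        location = pending.pop()
        if location in mirror:
            continue
        text = fetch(location)
        mirror[location] = text
        pending.extend(loc for loc in find_imports(text) if loc not in mirror)
    return mirror


# Toy data: "IMPORT name" lines stand in for real DDLm import definitions.
FAKE_WEB = {
    "core.dic": "IMPORT attr.dic\nIMPORT enum.dic\n",
    "attr.dic": "IMPORT enum.dic\n",
    "enum.dic": "IMPORT core.dic\n",          # loops back to the root
}
fetch_log: List[str] = []


def fake_fetch(location: str) -> str:
    fetch_log.append(location)
    return FAKE_WEB[location]


def toy_imports(text: str) -> List[str]:
    return [ln.split()[1] for ln in text.splitlines() if ln.startswith("IMPORT ")]


mirror = gather_dictionaries("core.dic", fake_fetch, toy_imports)
```

Separating the download (the local mirror) from the expansion step lets a site refresh its mirror on its own schedule, which is one way to balance the performance and synchronization concerns.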
Version 0.8 of CBFlib was released prior to the IUCr meeting in Osaka. The current work is being integrated into the CBFlib_bleeding_edge module that will become the 0.9 release before the end of this calendar year.
Work by E. Zlateva and H. Bernstein
Project Status 2 August 2008.
In this quarter the paper on vcif2 was accepted by the Journal of Applied Crystallography [G. Todorov and H. J. Bernstein, "VCIF2: extended CIF validation software," J. Appl. Cryst. (2008). 41, 808-810]. Checking of data values against DDLm dictionaries in CBFlib began. New test cases have been added to CIFTEST2. The external specifications of the method checking are to be discussed at the COMCIFS meetings in Osaka, and we expect to implement what is decided during the coming quarter.
Work by H. J. Bernstein, G. Todorov, E. Zlateva, N. Darakev.
A note on the DDLm import logic and dictionary layering: There will be a discussion of DDLm import logic and dictionary layering at the Osaka meeting. The view has been presented that dictionaries should be handled in small segments with multi-pass real-time web-loading of multiple dictionaries to compose one virtual dictionary. An alternative approach is for applications to use complete, "expanded" dictionaries. There are problems with both approaches. In the first case, there are serious platform and performance issues. In the second case, there are serious issues of dictionary synchronization. In this project we are addressing these issues in a modular way. We have started modifications to the open source web-page mirroring program wget to allow it to handle web-based dictionary caching into local mirrors. We are adopting the logic in the original Xcheck to then work from local-file dictionary mirrors to local expanded dictionaries. This should allow appropriate local choices in balancing the performance and synchronization issues and should make it much easier to support a wide range of platforms. We will discuss this further in Osaka.
Project Status 1 May 2008.
In this quarter a paper on vcif2 was submitted to the Journal of Applied Crystallography. The draft is being revised in response to reviewer comments. The necessary infrastructure was added to CBFlib for checking of data values against DDLm regular expressions. As a practical matter, code to automatically handle methods given in DDLm dictionaries will have to be based either on Python or on Java. Python has long been an open source language. Inasmuch as Java has recently become open source, we are exploring both alternatives.
Work by H. J. Bernstein and G. Todorov.
Project Status 1 February 2008.
Xcheck and Cyclops prepare lists of names from the dictionaries and compare them to the names in documents. In the course of doing that, the syntax of the dictionaries is checked, but not the syntax of the document. In vcif3 (vcif2 adapted to DDLm and dREL) we will be checking the syntax of the document as well. Most of the checking is conceptually identical to the vcif2 checking, but the information is conveyed by different tags or in a slightly different way. The most significant difference is that DDLm and dREL allow methods to state algorithmically the relationships among values of different tags. This last aspect is most easily handled in a C-like context, such as CBFlib (the base for the current vcif2). This requires a rework of CBFlib, not only of the parser (a task similar to what is being done to the ciftbx parser) but also of the data structures, to support efficient access to the elements of lists, arrays, tuples and tables. We have begun the parser changes. The data structure and method interpretation changes will almost certainly extend into the second year of the project. In this quarter the current vcif2 has been more fully documented and the design of the new data structures has begun.
Work by H. J. Bernstein and G. Todorov.
Project Status 31 October 2007.
As noted above, a website was set up at blondie.dowling.edu/projects/ddlm that contains all available information on the specifications, prototype dictionaries etc.
The original material provided and the October update were both added to the site on blondie. G. Todorov reviewed this material, and it appears that a full DDLm dictionary is not available yet, which makes the implementation of the specifications more difficult. However, dREL (the method definition language) is well defined and is a good starting point for implementation. The main differences between DDLm and DDL1/DDL2 can be identified, and test cases generated, easily by following the provided specifications.
G. Todorov applied the vcif2 validation to the new dictionaries. CBFlib reads the dictionaries without breaking, which means that the major part of the validation provided by xcheck already exists. That will be used as a base.
Work by G. Todorov.
Project Status 1 August 2009.
The failing UPS systems from last quarter were upgraded with fresh batteries. Late in this quarter, SourceForge changed their file-release and wiki support. We are making the necessary adaptations. There were no other significant issues or activity on infrastructure in this quarter.
Project Status 2 May 2009.
Several UPS systems failed on battery age. Computers were switched to other UPS systems and battery replacement in the older UPS systems is in progress. There were no other significant issues or activity on infrastructure in this quarter.
Project Status 2 February 2009.
No significant issues or activity on infrastructure in this quarter.
Project Status 1 November 2008.
The infrastructure has been functional this quarter despite some serious problems with the sourceforge server and some minor problems with fan and UPS failures.
Work by H. J. Bernstein and N. Darakev
Project Status 2 August 2008.
In this quarter the remediation of the hardware problems addressed in the previous quarter was completed. The backup server was replaced with a system with 3 TB of storage. In the previous quarter the CBFlib CVS was replicated from the GFORGE server to sourceforge in the cbflib project, and the sourceforge CBFlib is now heavily used. As noted in the prior report, by the time of the Osaka meeting we expect to complete the move of the code and web pages for this project to sourceforge for the convenience of the community. The primary development activities will continue on the GFORGE server.
Work by H. J. Bernstein, G. Todorov and N. Darakev
Project Status 1 May 2008.
In this quarter additional hardware problems arose. The GFORGE server was replaced and the necessary new disks to replace the file backup server are being tested now. The replacement of the backup file server should be completed in the next few days. A vcif.org domain was purchased, a sourceforge vcif project started, and the vcif2 validation web page was placed at www.vcif.org. As vcif3 matures, its features will be added at that site. The CBFlib CVS has been replicated from the GFORGE server to sourceforge in the cbflib project. By the time of the Osaka meeting we expect to have the code and web pages for this project all available on sourceforge for the convenience of the community. The primary development activities will continue on the GFORGE server.
Work by H. J. Bernstein and G. Todorov
Project Status 1 February 2008.
In this quarter the main development machine for our lab, arcib.dowling.edu, was replaced with a faster, more reliable machine with more disk space.
Work by G. Todorov and D. O'Brien.
For August 2004 through August 2006, with funding from the International Union of Crystallography, we worked on improvements to CIFTEST, vcif, and CIFtbx as well as a new line folding package to help provide new and upgraded CIF software to facilitate publication in IUCr journals. The work drew on and intersected with other work supported by grants from the U.S. National Science Foundation and the U.S. Department of Energy. The result was a set of open source software packages:
Website: http://www.bernstein-plus-sons.com/software/ciftest
Download: CIFTEST_2.1.tar.gz
Website: http://arcib.dowling.edu/vcif
Download: vcifHTML.tar.gz CBFlib_0.7.6.1.tar.gz CBFlib_0.7.6_Data_Files.tar.gz
Website: http://www.bernstein-plus-sons.com/software/ciffold
Download: CIFFOLD_0.5.4.tar.gz
Website: http://www.bernstein-plus-sons.com/software/ciftbx
Download: ciftbx_3.0.4.tar.gz
In order to support the evolving needs of the community for new and upgraded CIF software to facilitate publication in IUCr journals, we have established a CIF software support effort at Dowling College under the direction of Herbert J. Bernstein, a major CIF software developer, leveraging the infrastructure already in place in Dr. Bernstein's lab for bioinformatics software development.
G. Todorov presenting a poster on vcif2, and G. Todorov and G. Darakev talking to R. Grosse-Kunstleve about CBFlib, at ACA 2006 in Hawaii.
Foils of presentation on the project for XX Congress IUCr, Florence, IT, 23-31 August 2005.
I. Awuah Asiamah, K. Mitev and G. Todorov preparing the presentation for IUCr 2005, and G. Todorov and K. Mitev presenting at IUCr 2005 MS 86.
Among the sub-projects are:
Current IUCr release: journals.iucr.org/iucr-top/cif/developers/trip/ by Brian McMahon 10 May 2000
Project Status 3 September 2006: The work done on the other packages, including CIFtbx 3.0.4 and CBFlib 0.7.6.1, has been integrated into a new release, CIFTEST 2.1. The test package illustrates three important approaches to comparing output CIFs. In order to resolve the variations in the handling of one-row loops, for the cif2cif section CIFTEST uses cif2cbf as a filter to resolve the ambiguity as discussed in the prior report. In order to resolve the variations in handling of leading zeros, for the cif2pdb section CIFTEST uses sed scripts to deal with that issue. The release for CIFTEST is on the project web site and at:
http://www.bernstein-plus-sons.com/software/ciftest
Work by H. J. Bernstein based on the kit produced by G. Todorov.
Project Status 1 June 2006: The management of one-row loops has been further investigated and is proving both interesting and complex. In most cases, one-row loops appear to be best handled as tag-value pairs, except when presented with a significant number of matrix or vector elements. Such stylistic variations will require replacing straight comparison of test cases with a pass through a program like cif2cbf to get to a uniform presentation. Work by H. J. Bernstein.
Project Status 1 April 2006: In the course of the new work on vcif (see below), some new test cases have been generated, including tests for the existing binary format. These will be integrated with ciftest when the vcif release is ready. One of the interesting issues is the handling of a one-row loop either as a loop or as a series of tag-value assignments. Work by H. J. Bernstein.
Project Status 1 February 2006: The framework for CIFTEST2 created by G. Todorov, making B. McMahon's trip package more general, has been extended to include the vcif test cases. Environment variables have been used to make it easier to customize the choices of programs to be used for the tests. The first release candidate of CIFTEST2.0 is at:
http://arcib.dowling.edu/~bernsteh/software/CIFTEST2.0
Work by G. Todorov and H. J. Bernstein.
Project Status 1 December 2005: The framework for CIFTEST2 has been created, which allows modular testing of multiple cases that involve a variety of CIF packages. The CIFFOLD test cases will be integrated into CIFTEST2 shortly. Testing has begun and a copy has been made available to B. McMahon. Formal release is expected in the next reporting period. Work by G. Todorov.
Project Status 1 October 2005: New test cases have been created as part of the work on CIFFOLD, and released in the CIFFOLD kit. These cases will be back integrated into CIFTEST shortly.
Project Status 1 August 2005: Continuing from the prior reporting period the intensive work on CIFFOLD has resulted in interesting new test cases for long lines and for cases that might break parsers. I. Awuah Asiamah is cleaning up and organizing these test cases. Some of the test cases have become part of the "make tests" section of CIFFOLD already. Work by I. Awuah Asiamah, K. Mitev and H. J. Bernstein.
Project Status 28 May 2005: The intensive work on CIFFOLD has resulted in interesting new test cases for long lines and for cases that might break parsers. These cases need to be cleaned up and revised to avoid intellectual property issues, but will become a significant contribution to the test suite. Work by I. Awuah Asiamah and K. Mitev.
Project Status 30 Jan 2005: The script, runtest, was revised to use the new vcif and to handle command line arguments for all test cases (see ciftest_1_2). New test cases (see below) have not yet been incorporated, to ensure that current test behavior will be reproduced using the evolving vcif (see below).
Project Status 5 Dec 2004: S. Louris has prepared initial test cases for CIF 1.1 line folding (see ctc001).
Current IUCr release: www.uk.iucr.org/iucr-top/cif/software/vcif/index.html. The IUCR has re-released vcif under the GPL. That version has been posted at arcib.dowling.edu/software/vcif/ as vcif 1.1.
Project Status 3 September 2006: The work on vcif and CIFTEST was presented as a poster by G. Todorov at the summer 2006 ACA meeting in Hawaii. The poster was well-attended with lively discussions.
As discussed in the prior report, we settled on CBFlib as the base both for the vcif2 syntax checking and for parent-child and category checking. Type checking is hard-coded, rather than dictionary regular expression driven. The dictionary regular expressions were just not solid enough. Mr. Todorov has packaged the checking in the form of a web page and PHP script at
Work by H. Bernstein, G. Todorov and G. Darakev with consultation by K. Mitev.
Project Status 1 June 2006: The abstract for the summer 2006 ACA meeting in Hawaii on the work on vcif and CIFTEST has been accepted for a poster presentation.
We have settled on CBFlib as the base both for the vcif2 syntax checking and for parent-child and category checking. Type checking is hard-coded, rather than dictionary regex driven. The dictionary regular expressions are just not solid enough. The alias code was taken from release 0.7.5 to 0.7.6 and works. As noted above, we are reworking the line folding in CBFlib. The hash-table performance seems to be good enough to make the use of a formal database package unnecessary to achieve the goals of this project for most realistic CIFs, and the use of CBFlib under the GPL or LGPL should address the concerns Brian raised.
Work by H. Bernstein in consultation with G. Todorov, K. Mitev. Some CBFlib testing by A. Hammersley and J. Wright at ESRF.
Project Status 1 April 2006: An abstract for the summer 2006 ACA meeting in Hawaii has been prepared on the work on vcif and CIFTEST.
As noted in the prior bi-monthly report, Mr. Mitev proposed an interesting approach to validation involving the use of a postgres database for the layered dictionaries to then produce a ghost schema against which to validate CIFs. Brian McMahon raised some concerns about making the system dependent on auxiliary software that not all users might have available. We remain convinced that the key to full CIF validation, especially for DDL2-based CIFs, is to make use of the rules for relational databases, which is most efficiently done with an SQL server. However, Brian McMahon's concerns are valid. Therefore we are structuring the new code to allow for use of either an internal or an external database. Handling the second option, however, requires a significant upgrade to the API we are using (derived from the CBFlib API), which we have begun and which is also bearing fruit for imgCIF. We have extended the CBFlib parser to support save frames so that DDL2 dictionaries may be read, and have added hash-table-based searches similar to the ones we use in CIFtbx to achieve acceptable performance. For your information, a current snapshot of the parser bison grammar is available at
http://arcib.dowling.edu/~bernsteh/.cifiucr/cbf_stx_1Apr06.y
We have also made arrangements with SSRL to bring the CBFlib API under the GPL or the LGPL as alternative licenses. The parse of both DDL1 and DDL2 dictionaries is working, and we are now working on the dictionary layering code and denormalization of the item and category attributes from scattered tables to a smaller number of larger hash-indexed tables. The first tests of the dictionary parse code will be of tag aliasing. This will be done in collaboration with colleagues at ESRF who are interested because of the utility of this feature in handling older deprecated imgCIF categories in newer CBF files. We hope to incorporate the new parser and dictionary handling into the CBFlib 0.7.5 release to be posted to the web shortly, so that this critical code will get more extensive testing. Work by H. Bernstein in consultation with G. Todorov, K. Mitev. Some CBFlib testing by A. Hammersley and J. Wright at ESRF.
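The tag-aliasing test mentioned above amounts to a hash-table lookup from deprecated names to current ones. The Python sketch below illustrates that idea only; it is not the CBFlib C implementation, and the item names are hypothetical placeholders rather than real imgCIF aliases. CIF names are compared case-insensitively, so keys are stored lower-cased.

```python
from typing import Dict, Iterable, Tuple


class AliasTable:
    """Hash-indexed alias lookup: each alias maps to its current (canonical) name."""

    def __init__(self, pairs: Iterable[Tuple[str, str]]):
        self._canonical: Dict[str, str] = {}
        for alias, name in pairs:
            self._canonical[alias.lower()] = name
            self._canonical.setdefault(name.lower(), name)

    def resolve(self, tag: str) -> str:
        # Unknown tags are returned unchanged.
        return self._canonical.get(tag.lower(), tag)


def normalize_block(block: Dict[str, str], table: AliasTable) -> Dict[str, str]:
    """Rename aliased tags to their canonical names, rejecting duplicates."""
    out: Dict[str, str] = {}
    for tag, value in block.items():
        name = table.resolve(tag)
        if name in out:
            raise ValueError(f"{tag} duplicates {name}")
        out[name] = value
    return out


# Hypothetical names standing in for a deprecated and a current item.
table = AliasTable([("_old_category.item", "_new_category.item")])
normalized = normalize_block({"_OLD_CATEGORY.item": "1", "_other.tag": "2"}, table)
```

Rejecting a block that gives both the old and the new name for the same item avoids silently choosing one of two possibly conflicting values.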
Project Status 1 February 2006: Mr. Mitev has proposed an interesting approach to validation involving the use of a postgres database for the layered dictionaries to then produce a ghost schema against which to validate CIFs. This appears to be the best option for implementing the additional validation needed for vcif2. This will be investigated further in the next reporting period.
Project Status 1 December 2005: There is nothing new to report on vcif at this time.
Project Status 1 October 2005: There is nothing new to report on vcif at this time.
Project Status 1 August 2005: Since I. Awuah Asiamah has become the primary tester for the packages in this project, he is taking over responsibility for vcif. If the visa issues can be resolved in a timely manner, I. Awuah Asiamah will attend the IUCr Congress in Florence (as will K. Mitev, G. Todorov and H. J. Bernstein), which will be helpful in discussions relating to both CIFTEST and vcif.
Project Status 28 May 2005: Work done recently on CIFFOLD includes new code for syntactic validation, that is being considered for incorporation into vcif. This code is able to provide useful analyses even when the data block is not specified. Work by K. Mitev.
Project Status 30 Jan 2005: Changes made to bring code closer to current ANSI-C conventions and to avoid conflicts in building for some platforms, such as MS Windows. Command line option added to specify CIF level and command line processing revised to allow both long and short argument names. Updated code tested to ensure ability to process original CIFTEST cases with unchanged output. (see work in progress vcif002.patch -- incorporates changes below -- do not apply both patches). HJB
Project Status 5 Dec 2004: Mods prepared to extend line length (see work in progress vcif001.patch). Preliminary tests by S. Louris, continued by H. Bernstein.
Current IUCr release: none
Project Status 3 September 2006: On reflection and after some experimentation, we concluded that there was no need to add the deprecated "\;" semicolon escape convention to ciffold, since conversion for files using the incorrect convention can be done either with sed or with the old CBFlib, and it would be best not to encourage the writing of additional incompatible files. If there is objection to this approach at the IUCr, we can put out the code to support the old convention, but absent such objection, we believe the project goals for ciffold have been met.
The release for ciffold is on the project web site and at:
http://www.bernstein-plus-sons.com/software/ciffold
Work by H. J. Bernstein.
Project Status 1 June 2006: The logic in ciffold is being reviewed with respect to the handling of folding that places a semicolon in column 1 of the next line. Ciffold handles this by doing the fold one character earlier. Ciffold allows a semicolon to be moved to column 1 if it is not followed by a blank or tab. CBFlib follows a different convention, escaping such a semicolon with a leading backslash, which conflicts with the IUCr convention of treating a "\;" as an ogonek. CBFlib has been changed to conform with the Ciffold convention, but, since there are datasets in the field that have used the CBFlib convention, we will add it as a deprecated option in both packages. Work by H. J. Bernstein. The review of semicolon handling started with a discussion with Mr. Mitev.
Project Status 1 April 2006: The release candidate discussed in the prior bimonthly report has been made the default ciffold release at:
http://www.bernstein-plus-sons.com/software/ciffold
Work by K. Mitev and H. J. Bernstein
Project Status 1 February 2006: The problem identified by Mr. Mitev at the end of the prior reporting period was investigated. The problem was one of insufficient blank stripping at the ends of lines within text fields. Mr. Mitev proposed a solution which addressed that problem, but, in the course of integrating the fix, a subtle problem was found in the mapping of single line quoted strings presented as folded text fields to apostrophe- and double-quote- quoted strings. In particular there are cases that should be left as folded text fields and that also bring to light important test cases to be added to vcif and CIFTEST. A fix for the problem is being tested in the next release candidate for CIFFOLD which should be posted for external release after additional testing. The release candidate is at:
http://arcib.dowling.edu/~bernsteh/software/CIFFOLD_0.5.4
Work by K. Mitev and H. J. Bernstein
Project Status 1 December 2005: At the end of this reporting period Mr. Mitev reported a potential problem with the handling of unfolding of folded text fields containing lines that end with the sequence backslash-blank. This is being investigated further by H. Bernstein.
Project Status 1 October 2005: During this reporting period, CIFFOLD was presented in Florence, and B. McMahon requested a new mode of operation in which only lines longer than 80 characters would be folded and all other lines would remain unchanged, to simplify reporting of changes to authors during the IUCr publication process. The change was implemented in September by adding a new command-line option ("-n") for minimal folding as well as a new ncurses menu page. The updated code has just been released as version 0.5.3 and made the default at:
http://www.bernstein-plus-sons.com/software/ciffold
Work by H. J. Bernstein, with consultation by K. Mitev.
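The minimal-folding behaviour can be sketched in a few lines of Python. This is an illustration, not CIFFOLD's code: `minimal_fold` is a hypothetical name, folds are made at a fixed column with a trailing backslash, and the refinements discussed in other entries (word boundaries, semicolons in column 1, the `;\` header line) are omitted.

```python
from typing import List

def minimal_fold(lines: List[str], width: int = 80) -> List[str]:
    """Fold only lines longer than `width`; shorter lines pass through unchanged."""
    out: List[str] = []
    for line in lines:
        while len(line) > width:
            out.append(line[: width - 1] + "\\")  # width-1 chars plus the fold marker
            line = line[width - 1 :]
        out.append(line)
    return out
```

Because every line of 80 characters or fewer is left byte-for-byte identical, a comparison against the original shows only the lines that actually had to be folded.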
Project Status 1 August 2005: During this reporting period, the program CIFFOLD was tested and documented and progressed from release 0.4.3 to release 0.5.1 as problems were discovered and corrected and the quality of the output was improved (e.g. to fold between blank-separated words in the style of CIFtbx3, rather than precisely on column 80 as in earlier releases of CIFFOLD). The code has been working well, and a link has been placed on the public web page for CIFFOLD to allow retrieval of folded versions of the long-line mmCIF datasets currently being released by the RCSB PDB. As of this writing the 0.5.1 release is the default release. The 0.5.2 release, currently available for testing at http://www.bernstein-plus-sons.com/software/CIFFOLD_0.5.2, changes the handling of folded long single- or double-quoted strings to include a terminal backslash in the resulting text field, so that an extra newline will not be added on reconstruction, and also deals with additional cases of embedded "; " sequences in text fields that might end up in column 1. If this release does well in testing, it should be the default release during the Congress. Work by K. Mitev, G. Todorov and H. J. Bernstein, with testing by I. Awuah Asiamah.
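Folding between blank-separated words rather than exactly at column 80 can be sketched as follows. The helper name `fold_at_words` is hypothetical, not CIFtbx3's or CIFFOLD's routine. Keeping the blank at the end of the first piece, just before the backslash, is one possible choice that lets unfolding restore the original text exactly. When no blank is available, the sketch falls back to a column fold.

```python
from typing import List

def fold_at_words(line: str, width: int = 80) -> List[str]:
    """Fold `line`, preferring to break just after a blank."""
    pieces: List[str] = []
    while len(line) > width:
        # Last blank that still fits in width-1 columns (room for the backslash).
        cut = line.rfind(" ", 0, width - 1)
        if cut == -1:
            cut = width - 1  # no blank available: fold at the column limit
        else:
            cut += 1         # keep the blank on the first piece
        pieces.append(line[:cut] + "\\")
        line = line[cut:]
    pieces.append(line)
    return pieces
```

For example, `fold_at_words("the quick brown fox", width=10)` gives `["the \\", "quick \\", "brown fox"]`.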
Project Status 28 May 2005: After testing and comments by the students in the lab, a full release (CIFFOLD_0.1) was prepared and posted at www.bernstein-plus-sons.com/software/ciffold in mid-April 2005. Further testing by the students and by B. McMahon resulted in rapid evolution of the code to a reasonably stable release (CIFFOLD_0.4.3) on 14 May 2005. The work on cif2cif using the CIFtbx3 release was fed back into the work on CIFFOLD. Work by K. Mitev, G. Todorov and H. J. Bernstein, with testing by R. Chachra, C. Chigbo, S. Louris and, especially, I. Awuah Asiamah. Helpful comments provided by B. McMahon.
Project Status 2 April 2005: K. Mitev has prepared a full GUI release for testing (see the current state of the release, ciffold003). The release is being actively tested by others in the lab to see whether it can be made ready for release and for inclusion in the ITVG CDROM this month. A tar archive for others who wish to test this prerelease is available: CIFFOLD.tar.gz. Work by K. Mitev and G. Todorov, with testing so far by H. Bernstein, G. Todorov, R. Chachra, I. Awuah Asiamah and S. Louris.
Project Status 30 Jan 2005: K. Mitev and G. Todorov are working on a GUI front end and integrity checking. (see the work in progress ciffold002).
Project Status 5 Dec 2004: K. Mitev is working on this code (see the work in progress ciffold001).
Current IUCr release: www.iucr.org/iucr-top/cif/software/ciftbx3/README.html (from this project) and http://www.iucr.org/iucr-top/cif/software/ciftbx/README.html (the prior version).
Project Status 3 September 2006: The vcif2 validation code has been adapted from C to Fortran and incorporated into CIFtbx version 3.0.4 to provide "extended integrity checking comparable to that in vcif2". The full patch code to convert from CIFtbx 2.6.4 to CIFtbx 3.0.4 is available at
http://arcib.dowling.edu/cifiucr/ciftbx003.patch
and a release kit is available at
http://arcib.dowling.edu/cifiucr/ciftbx_3.0.4.cshar.Z
The full release of the enhanced CIFtbx3 is available on the project web site (see above) and at
http://www.bernstein-plus-sons.com/software/ciftbx
We are pleased to report that Syd Hall recently agreed to allow the LGPL as an alternate license to the GPL for the API. We have incorporated the necessary license revisions into this patch and the kit. Since completion of the project, we have released the current versions of cif2cif, Cyclops, cif2pdb and cif2xml based on CIFtbx 3.0.4 rather than on CIFtbx 3.0.3, reflecting the improvement in the license situation.
The specific changes made to CIFtbx in the transition from the 3.0.3 release to the 3.0.4 release were:
The 3.0.4 release completed the extension of validation to include checking for missing parents and validation of data values against dictionary-specified ranges and enumerations. Failure to provide dictionary-specified mandatory items is also reported. The new dict_ check codes 'parck' and 'parno' turn checking of parent-child relationships on and off, respectively. The default is 'parck'. The new character variable dicpname_ returns the dictionary-specified parent of the name in dicname_. The meaning of the existing dict_ check code 'dtype' has been extended to include checks of input data for compliance with dictionary-specified ranges and enumerations, and type checking is more rigorous than in the past. The new logical variable valid_ is set to .true. if an input data item has been validated against the dictionary; if no dictionary type check is specified or the item does not conform, valid_ is .false. Warnings are also reported for validation failures. Additional internal name changes were made to avoid conflicts:
tbxxpcat (formerly procat)
tbxxsstb (new)
tbxxfstb (new)
tbxxnid (formerly newdent)
tbxxoid (new)
We believe the project goals for CIFtbx have been met. Work by H. J. Bernstein
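The range and enumeration checks added in release 3.0.4 can be illustrated with a short Python analogue. CIFtbx itself is Fortran, and its routines are not reproduced here. Several details are assumptions of this sketch: the hypothetical `validate` function, the DDL1-style `min:max` range string with an optional open end, the stripping of a parenthesised standard uncertainty, and accepting the CIF null values `.` and `?`. The boolean result is only loosely analogous to `valid_`.

```python
from typing import Optional, Sequence

def validate(value: str,
             enum: Optional[Sequence[str]] = None,
             rng: Optional[str] = None) -> bool:
    """Check a CIF value against an optional enumeration and 'min:max' range."""
    if value in (".", "?"):
        return True  # assumption: CIF null values are not range-checked
    if enum is not None and value not in enum:
        return False
    if rng is not None:
        lo, _, hi = rng.partition(":")  # either bound may be empty (open)
        try:
            x = float(value.split("(")[0])  # drop an s.u. such as 1.234(5)
        except ValueError:
            return False  # a non-numeric value cannot satisfy a numeric range
        if lo and x < float(lo):
            return False
        if hi and x > float(hi):
            return False
    return True
```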
Project Status 1 June 2006: Nothing to report for this period. We will return to this package after the vcif validation upgrade to incorporate similar changes. Note, however, the discussion of semicolon handling in ciffold and CBFlib in the ciffold status report of the same date. Similar changes are being prepared for CIFtbx. Work by H. J. Bernstein
Project Status 1 April 2006: Nothing to report for this period. We will return to this after the vcif validation upgrade to incorporate similar changes. We note that CIFtbx_3.0.3 downloads have risen from 20 per week to 30 per week in this period.
Project Status 1 February 2006: The prior release of CIFtbx3 (3.0.2) and the new release (3.0.3) were tested on various platforms, and, after careful review of the results, CIFtbx 3.0.3 was made the default release on 18 January 2006 at
http://www.bernstein-plus-sons.com/software/ciftbx
It should be noted that the CIFtbx test cases are now also incorporated into CIFTEST.
Work on CIFtbx by G. Todorov, J. Jemilawon and H. J. Bernstein.
Project Status 1 December 2005: The next major phase of work with CIFtbx is augmentation of the integrity checking. In order to do this, further performance improvements are needed. In this period the performance of CIFtbx was improved by reworking portions of the code to use counted strings, which avoid unnecessary replication of trailing blanks in the larger buffers created for handling the long lines of CIF 1.1, and by increasing the number of pages kept resident, so that it will be feasible to make additional accesses to the data from dictionaries. The cif2pdb Makefile in the CIFtbx package was cleaned up. The code was released for testing as release 3.0.3 of CIFtbx at
http://www.bernstein-plus-sons.com/software/ciftbx_3.0.3/
Work on CIFtbx by H. J. Bernstein and G. Todorov.
Project Status 1 October 2005: During this reporting period the package was given further testing as the base for a new version of the program cif2pdb being used in another project. The package was upgraded early in August to release 3.0.2 to correct the handling of an index in dtype and to add new types from the PDB extensions dictionary. The updated version is now in use in support of the web page at
http://biomol.dowling.edu/WPDB
which is part of a project funded by the U. S. Department of Energy on creation of a new, wide PDB format. Work on CIFtbx by H. J. Bernstein. Work on the DOE project is a collaboration between F. C. Bernstein and H. J. Bernstein.
Project Status 1 August 2005: As reported in the prior reporting period, fully operational folding and unfolding with acceptable performance on long lines was integrated with CIFtbx and released as CIFtbx3. During this reporting period, CIFtbx3 was made the default release of CIFtbx and, with S. R. Hall's approval, released under the GPL. The package was given extensive testing as the base for a new version of the program cif2pdb being used in another project, and worked correctly with excellent performance.
Project Status 28 May 2005: Fully operational folding and unfolding with acceptable performance on long lines was integrated with CIFtbx. The program cif2cif was upgraded to include options for folding and unfolding, making it an alternative to CIFFOLD and, more importantly, a template for Fortran programmers on how to adapt a Fortran application to use CIFtbx3 for long lines. The first release of CIFtbx3 (ciftbx_3.0.0) was posted at www.bernstein-plus-sons.com/software/ciftbx_3.0.0. There was an upgrade to www.bernstein-plus-sons.com/software/ciftbx_3.0.1 on 7 April 2005. This version seems to be reasonably stable. Links have been created from www.bernstein-plus-sons.com/software/CIFtbx3 to the ciftbx_3.0.1 release and from www.bernstein-plus-sons.com/software/CIFtbx2 to the ciftbx_2.6.4 release. The primary software distribution link for ciftbx will be upgraded from CIFtbx2 to CIFtbx3 during the next reporting period. We are pleased to note that the IUCr web site has the CIFtbx 3.0.1 release. Work by H. Bernstein with testing by K. Mitev and others.
Project Status 2 April 2005: The performance issue uncovered in the last cycle has been reasonably well addressed. The code for folding is written and the code for comment unfolding is written. With the addition of the code for text unfolding, this version may be ready for release and inclusion in the ITVG CDROM this month. (see work in progress ciftbx002.patch). Work by H. Bernstein.
Project Status 30 Jan 2005: Work on this package has brought to light serious performance issues in working with large numbers of large character strings in Fortran. The code of CIFtbx is being reworked to use representations of strings more appropriate to working in Fortran, combining trailing-blank trimming and run-length encoding.
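To illustrate the idea, here is a Python sketch that combines trailing-blank trimming with run-length encoding of blank runs. The CIFtbx Fortran representation is not described in this report. The token format (strings for literal text, integers for runs of blanks) and the names `encode` and `decode` are hypothetical. Passing a width to `decode` restores the blank padding of a fixed-length Fortran string.

```python
from itertools import groupby
from typing import List, Optional, Union

Token = Union[str, int]

def encode(s: str) -> List[Token]:
    """Trim trailing blanks, then replace each run of blanks with its length."""
    tokens: List[Token] = []
    for is_blank, run in groupby(s.rstrip(" "), key=lambda c: c == " "):
        chunk = "".join(run)
        tokens.append(len(chunk) if is_blank else chunk)
    return tokens

def decode(tokens: List[Token], width: Optional[int] = None) -> str:
    """Expand blank runs; optionally re-pad to a fixed Fortran-style length."""
    s = "".join(" " * t if isinstance(t, int) else t for t in tokens)
    return s if width is None else s.ljust(width)
```

A mostly blank 2048-character buffer holding a short value then costs only a few tokens instead of thousands of stored blanks.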
Project Status 5 Dec 2004: Mods in progress to extend line length and to do folding (see work in progress with code for folding ciftbx001.patch). Work by H. Bernstein.
This is a major set of inter-related projects, expected to take more than two years to complete. A phased release to Chester of partial preliminary versions of all of these packages will be made on this web site and feedback from Chester will be used to guide completion of the packages. Comments and suggestions by other interested parties would be appreciated.
As versions of these packages mature they will be released to the community as open source software without charge to encourage wide use. The software will be released using the GNU GPL license.