In 1990 we began experimenting with electronic documents that merge our scientific software with our word-processing software. A year later we manufactured a CD-ROM containing a new textbook, Joe Dellinger's doctoral dissertation, and two progress reports of the Stanford Exploration Project. We distributed these CD-ROMs to sponsors and many friends at the 1991 SEG meeting. (SEP-CD-1 is available from Stanford University Press, $15 plus shipping, tel 415-723-1593.) In 1990, we set this sequence of goals:
We met all these goals and set new ones:
Our basic goal is reproducible research. The electronic document is our means to this end. In principle, reproducibility in research can be achieved without electronic documents and that is how we started. Our first nonelectronic reproducible document was a textbook in which the paper document contained the name of a program script in every figure caption. The program scripts were organized by book chapter and section so they could be correlated to an accompanying magnetic tape dump of the file system. The magnetic tape also contained all the necessary data to feed the program script.
Now that we have begun using CD-ROM publication, we can go much further. Every figure caption contains a pushbutton that jumps to the appropriate science directory (folder), initiates a figure rebuild command, and then displays the figure, possibly as a movie or interactive program. We normally display seismic images of the earth's interior, but to reach wider audiences, Figure CI060210 shows a satellite weather picture which the pushbutton will animate as seen on commercial television. We include all our plot software as well as freely available software from many sources, including compilers and the LaTeX word processing system. Naturally we cannot include licensed software, but with the exception of Fortran and C compilers and the UNIX system itself, our publication includes source code for everything needed. The CD-ROM, at 680 megabytes, is so large that we have had room for many executable programs for popular brands of workstations. The presence of these executables gives our readers a fast start.
Figure CI060210: This figure is interactive. Pressing the button at the end of the caption will bring up a movie of satellite pictures showing the weather situation over California during one day in June 1992.
Nearly everyone would rather read a paper book than the bitmapped page images on a screen that you see with an electronic document. But the illustrations in the electronic book are mostly in color, many are movies, and some are interactive. So the electronic book gives the reader a better understanding of the results. We typically use an interactive movie program to compare seismic sections where successive frames include processing with various parameters. The movie medium is much more informative than comparing seismic sections side by side. 3-D volumes are much better exhibited by movies than static paper illustrations. We are delivering a volume of software that is accessed like a book.
Our basic goal is reproducible research and the ability to publish it in reproducible form. The electronic document is our means to this end.
Authors producing an electronic document are required to file everything in a standard way and use certain names in their command scripts. Otherwise it is no harder to make an electronic document than a paper document. To propagate the methodology to the other 14 members of our group, we needed only a one-hour training session. Of course, our group was already accustomed to our common software collection for making paper documents.
The only word processor meeting our requirements is TeX, which, although it produces unsurpassed displays of mathematics, is not as easy to learn or use as popular commercial word processing software.
At the present time, a minimum hardware configuration costs 5-10 thousand dollars.
To insert an illustration in a document, the author must provide a pathname to a PostScript plot file with a name either of the form NAME.ps or of the form NAME.ps.save. The caption of the illustration is marked [R], denoting reproducible, when the plot file name is of the form NAME.ps, and it is marked nonreproducible [NR] when it is of the form NAME.ps.save. The burn-illustration commands destroy file names of the type NAME.ps. For a nonreproducible figure to be marked reproducible requires two unlikely events: first a mislabeling, and second, a failure to run the burn and build cycle on the completed document.
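The naming convention above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text; the function names caption_mark and burn are hypothetical and are not the names of our actual commands.

```python
import os

def caption_mark(plot_path):
    """Return the caption mark implied by a plot file name.

    NAME.ps.save -> "[NR]" (nonreproducible), NAME.ps -> "[R]" (reproducible).
    """
    name = os.path.basename(plot_path)
    if name.endswith(".ps.save"):
        return "[NR]"
    if name.endswith(".ps"):
        return "[R]"
    raise ValueError("not a recognized plot file name: " + name)

def burn(directory):
    """Delete every NAME.ps below directory; NAME.ps.save files survive.

    Hypothetical stand-in for the burn-illustration step: after a burn,
    only figures that can really be rebuilt will reappear as NAME.ps.
    """
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".ps"):
                path = os.path.join(root, name)
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

A figure that was mislabeled as reproducible would vanish in the burn and fail to come back in the build, which is why both events must occur for the mislabeling to go unnoticed.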
An interactive illustration has a pushbutton in the caption. The pushbutton is the interface between the word-processing program and the command scripts. The label on the pushbutton holds the pathname to the command script and the name of the figure. The word processor executes a "change directory" to where the command script is, and then launches it. What happens next is determined by the conventions and style of the author. Our most recent research progress report (SEP-73) has 18 authors and 246 illustrations. Somewhat more than half the illustrations are reproducible. There are 165 captions with pushbuttons and 81 without. Nonreproducible illustrations are mostly line drawings from a draw program. Other nonreproducible illustrations used our parallel computer or had been done at other sites. Most of the pushbuttons simply rebuild and replot as described below, but an increasing number show movies.
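The pushbutton mechanism can be sketched as follows. The label layout used here (script pathname, a space, then the figure name) and the function names are assumptions made for illustration; the text says only that the label holds both items. The sketch also assumes the command script is a shell script.

```python
import os
import subprocess

def parse_button_label(label):
    """Split a label 'PATH/TO/SCRIPT FIGURE' into (directory, script, figure).

    The label layout is a hypothetical one chosen for this sketch.
    """
    script_path, figure = label.rsplit(None, 1)
    directory, script = os.path.split(script_path)
    return directory, script, figure

def press_button(label):
    """Change to the script's directory, then launch it with the figure name."""
    directory, script, figure = parse_button_label(label)
    # cwd= performs the "change directory" before the script starts.
    result = subprocess.run(["sh", script, figure], cwd=directory or ".")
    return result.returncode
```

Everything after the launch is up to the author's script, which matches the division of labor described above.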
makefile to maintain software. The remainder of this paragraph explains the makefile concept for those unfamiliar with it. The final result of a research calculation is a plot file to be included in a document (and perhaps a movie to be launched when the figure caption button is pressed). To get these final results, many source files are needed, such as data, main programs, subroutines, and parameter files. Between the source files and the plot files are intermediate files such as object files, executable files, and processed data files. After a plot file has been built, it should be younger than the intermediate files, which should be younger than the source files. The makefile contains the commands needed to build everything and knows the expected age relations among the files. When the age relations are as expected, then a call to rebuild a plot file will report that the plot file has already been built. Changing the date of any file, say by editing a program or a parameter file, will cause out-of-date files to be rebuilt the next time the plot file is called for. Thus, when a reader presses a button in a figure caption, the figure is rebuilt if it is absent or out-of-date.
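The age test at the heart of the makefile concept is small enough to write out. The following Python sketch is not how make is implemented; it only illustrates the rule that a target is rebuilt when it is absent or older than something it depends on. The function name is hypothetical.

```python
import os

def is_out_of_date(target, dependencies):
    """Return True if target is absent or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    # A dependency modified after the target was built makes it stale.
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)
```

A build tool applies this test recursively, from the plot file back through the intermediate files to the sources, and runs the build commands only where the test succeeds.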
We find that makefile methods are highly nonstandard from one UNIX system to another, and that a publicly available variant of make called cake works on all systems and provides more powerful ways of maintaining documents. One thing we particularly like about cake is that, unlike make, it does not require intermediate files to be present. Thus, after we finish building a plot file, we can clean up (cake clean) by removing all the intermediate files such as objects, executables, and processed data. (Authors could use make instead of cake, but the price they would pay on the next call for a plot is that make would insist on rebuilding all the missing intermediate files.)
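The difference described above can be shown with a toy model. The sketch below does not reproduce the internals of make or cake; it only contrasts the two policies as this paragraph describes them, with a hypothetical flag tolerate_missing selecting the cake-like behavior.

```python
import os

def _mtime(path):
    """Modification time of path, or None if the file is absent."""
    return os.path.getmtime(path) if os.path.exists(path) else None

def rebuild_needed(plot, intermediates, sources, tolerate_missing):
    """Decide whether a plot file must be rebuilt.

    tolerate_missing=False: an absent intermediate file forces a rebuild.
    tolerate_missing=True:  absent intermediates are skipped, so only a
                            newer source or intermediate forces a rebuild.
    """
    plot_time = _mtime(plot)
    if plot_time is None:
        return True
    for path in intermediates:
        t = _mtime(path)
        if t is None:
            if tolerate_missing:
                continue
            return True
        if t > plot_time:
            return True
    return any(os.path.getmtime(s) > plot_time for s in sources)
```

Under the tolerant policy a cleaned-up directory holding only sources and finished plot files is still considered up to date, which is what makes the cleanup step cheap.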
Makefiles and cakefiles contain rules for building files from other files. For each rule, the created files are called "targets" and they are derived from files called "dependencies". We have built about six major documents with over a thousand illustrations. This experience has been distilled to 2-3 pages of common cake rules called SEP.idoc.rules that we typically include in every figure-building cakefile.
Reasons to produce our interactive publications on CD-ROM are: When we hand a CD-ROM to someone and they put it in their drive, they have instantaneously mounted 600 Mb (megabytes) of our file system in their computer. They did not first need to clear 600 Mb of space in their computer and they did not need to wait two hours for the tape to read. CD-ROM is more suited than magnetic tape to please the recipient. Thus CD-ROM is a better publication medium. CD-ROM readers are likely to become more numerous than tape readers because manufacturers have begun distributing software on CD-ROM and the drives themselves are much cheaper than tape readers. CD-ROM is more durable than tape and more compact.
Manufacturing costs are variable, but in June 1992 we received an offer which amounted to manufacturing the first four disks for $150 each; thereafter the price would drop by a factor of a hundred, to $1.50 per disk. Production times are about a week.
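To see what this offer means for a realistic print run, the arithmetic can be written as a small Python function. The function name and the run size of 100 disks are illustrative assumptions; the prices are those quoted above.

```python
def manufacturing_cost(disks, first_batch=4, first_price=150.00, later_price=1.50):
    """Total cost in dollars under the June 1992 offer quoted in the text."""
    first = min(disks, first_batch)          # disks charged at the high price
    rest = max(disks - first_batch, 0)       # disks charged at the low price
    return first * first_price + rest * later_price

total = manufacturing_cost(100)   # 4 * 150.00 + 96 * 1.50 = 744.0
per_disk = total / 100            # 7.44 dollars per disk
```

The first four disks dominate the bill, so the cost per disk falls quickly as the run grows.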
cd and ls commands). The UNIX tar command will take a UNIX directory tree and create a single file from it, but this file will not be in the proper form for a CD-ROM drive. For the material on a CD-ROM to be a file system that can be mounted, the file must be prepared by a tar-like program. One such program that we purchased (with a huge academic discount) for CD-ROM mastering is called makedisk from the Young Minds Company (YM). Before that we mimicked a CD-ROM with the UNIX mount command by mounting a file system in read-only mode.
Nearby is a tree with about 3500 leaves on it and in a nearby computer file system there are about 3500 files. In winter time all the leaves fall off the tree and then light passes directly through it because the leaves have much more surface area than the branches. Likewise the computer files take much more space than the directory structure. So the trick to using a CD-ROM as if it were a read-write file system is to first make a copy of the CD-ROM's directory structure on your hard disk. Like the tree branches, that isn't too voluminous. In this directory structure you do not put files, but pointers (symbolic links) to the files on the CD-ROM. Now whenever you need to change a file, you replace the link to the CD-ROM by a copy of the file on your hard disk.
The tree without its leaves is often called a "shadow tree". To make one we could use a well-known simple shell program called lndir (distributed with X) but instead we use a program called cd_link, free from Young Minds. It works much like lndir except that it performs the required name translation from ISO 9660 names (a CD-ROM standard) to our directory names. The only drawback is that it takes about 5 to 30 minutes to make the shadow tree. Luckily, you only pay this price the first time you load the disk.
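The shadow-tree idea can be sketched in Python. This is neither lndir nor cd_link: it performs no ISO 9660 name translation, and the function names are hypothetical. It only shows the two steps described above, building a directory skeleton filled with symbolic links, and replacing a link by a private copy when a file needs to change.

```python
import os
import shutil

def make_shadow_tree(cdrom_root, shadow_root):
    """Mirror the directories of cdrom_root under shadow_root,
    placing a symbolic link where each file would be."""
    cdrom_root = os.path.abspath(cdrom_root)
    for directory, _subdirs, files in os.walk(cdrom_root):
        relative = os.path.relpath(directory, cdrom_root)
        shadow_dir = os.path.normpath(os.path.join(shadow_root, relative))
        os.makedirs(shadow_dir, exist_ok=True)
        for name in files:
            os.symlink(os.path.join(directory, name),
                       os.path.join(shadow_dir, name))

def make_writable(shadow_path):
    """Replace a link into the read-only tree by an editable copy."""
    if os.path.islink(shadow_path):
        original = os.path.realpath(shadow_path)
        os.remove(shadow_path)
        # copyfile copies contents only, so the copy is not read-only.
        shutil.copyfile(original, shadow_path)
```

Only the directories and links occupy hard-disk space at first; a file costs real space only once it has been made writable.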
From the Young Minds documentation we learn that the CD-ROM standard, ISO 9660, mandates that file names be restricted in several ways. Basically, names must (1) have no lowercase letters and (2) have fewer than 8 or 11 characters. makedisk maps our file names into ISO 9660 compliant names. Then, after mounting the disk, the user needs to run a program cd_link (supplied by YM as C source) that appears to build (on the user's hard disk) a shadow tree with our names and symbolic links to the ISO 9660 named files on the CD-ROM. We understand that manufacturers are planning CD-ROM drivers that should make CD-ROM usable on our target range of workstations without requiring the use of cd_link or a shadow tree.
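To make the naming restriction concrete, here is a small Python check. It assumes the strictest common reading of the standard (ISO 9660 Level 1): a base name of at most 8 characters, an optional extension of at most 3, drawn from uppercase letters, digits, and underscore. That reading is an assumption of this sketch rather than something taken from the Young Minds documentation, and the sketch does not show how makedisk maps names.

```python
import re

# Assumed Level 1 pattern: up to 8 characters, optional dot, up to 3 more.
_LEVEL1_NAME = re.compile(r"[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?")

def is_level1_name(name):
    """Return True if name fits the assumed ISO 9660 Level 1 pattern."""
    return _LEVEL1_NAME.fullmatch(name) is not None

def offending_names(names):
    """List the names that would need translation before mastering."""
    return [n for n in names if not is_level1_name(n)]
```

A name such as SEP.idoc.rules fails on both counts (lowercase letters and too many characters), which is why a translation step and a shadow tree carrying the original names are needed.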
A disadvantage of publication on UNIX machines is that superuser status is now needed to use the mount command. Thus, an amateur cannot pop a disk in and out as simply as on a Macintosh. It is rumored that experts are working on this problem, but we can expect to wait a while before a multivendor solution comes into view.
We recently received a free CD-ROM called "Software Store" which contains demonstrations of a wide range of software for sale. A little booklet came with the disk. Unfortunately the first four pages of this booklet are devoted to "super user" commands required to be run before we can see anything. Our CDs are a little easier, but not much.
I (JFC) have found some publishers that are willing to publish a traditional book even though there is a publicly licensed electronic book. Inexpensive copy machines did not put publishers out of business, and they don't expect clumsy workstations will either.