Handout following the ECPGR Workshop Central Crop Databases on-line, Bonn, June 8-10, 1997
Downloadable databases on the Internet
Theo van Hintum
Centre for Genetic Resources, the Netherlands (CGN), Centre for Plant Breeding and Reproduction Research,
P.O. Box 16, 6700 AA Wageningen, The Netherlands
The easiest way to provide access to a database via the Internet is
to create a downloadable version of the database, or of parts of it, and
put it on the net. Any user connected to the Internet can then
download it and use it the way (s)he likes.
Pros and cons
Providing downloadable databases should be considered a complementary
service to providing on-line searchable databases. Both approaches have
their pros and cons.
Some advantages of downloadable databases are:
-
Downloadable databases are very easy to prepare. Often an off-line version
of the database is already available for distribution on floppy disk.
-
Downloaded databases can be analyzed and processed by the user, and many
users need to be able to load the database onto their local computer. For
instance, if genebanks provided downloadable databases per crop, with
passport data in a format following the Multicrop Passport Descriptor
List, a central crop database for any crop could be created within a
few days.
-
Downloadable databases allow additional documents to be sent along with the
data sets. These documents can provide the information needed for a proper
interpretation of the data sets, as well as information on how to
order material, etc.
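The second advantage above, building a central crop database from per-crop downloads, largely comes down to merging files that share one structure. The following is a minimal Python sketch, assuming each genebank offers a tab-delimited TXT file with a header row. The field names (INSTCODE, ACCENUMB, GENUS, SPECIES, ORIGCTY) are borrowed from later versions of the Multicrop Passport Descriptors and, like the function name merge_passport_files, are illustrative assumptions rather than part of this handout.

```python
import csv
import io

# Hypothetical subset of passport fields. The names follow later versions of
# the Multicrop Passport Descriptors and are used here for illustration only.
FIELDS = ["INSTCODE", "ACCENUMB", "GENUS", "SPECIES", "ORIGCTY"]


def merge_passport_files(files):
    """Merge tab-delimited passport files downloaded from several genebanks.

    Each file must start with a header row containing at least FIELDS; any
    other columns are dropped. Records are keyed on (INSTCODE, ACCENUMB), so
    an accession that occurs in two downloads yields one row (the later wins).
    """
    merged = {}
    for f in files:
        reader = csv.DictReader(f, delimiter="\t")
        missing = [name for name in FIELDS if name not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"file lacks required fields: {missing}")
        for row in reader:
            key = (row["INSTCODE"], row["ACCENUMB"])
            merged[key] = {name: row[name] for name in FIELDS}
    return list(merged.values())


if __name__ == "__main__":
    # Two made-up downloads; institute codes and accession numbers are fictitious.
    header = "\t".join(FIELDS) + "\n"
    bank_a = io.StringIO(header + "AAA001\tA-1\tLactuca\tsativa\tNLD\n")
    bank_b = io.StringIO(header + "BBB002\tB-7\tLactuca\tserriola\tTUR\n")
    for record in merge_passport_files([bank_a, bank_b]):
        print(record)
```

Keying on institute code plus accession number is only one possible way of spotting duplicates; a real central crop database would also need rules for handling conflicting values.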
Some disadvantages:
-
Access is indirect; the user must take an extra step (downloading the files)
before actually accessing the data. This costs time and requires some knowledge.
-
Downloadable databases are generally snapshots. Updates of the database
made after the files were generated will obviously not be included.
-
Data sets can start to lead lives of their own. Especially in the case of
coding systems, the dangers of downloading, modifying and further distributing
the data will be evident.
Requirements of downloadable databases
Downloadable databases should be as accessible and as up-to-date as
possible.
The accessibility can be improved by using
-
a well-known format for the file containing the data (DBF or TXT) and the
accompanying text files (DOC or TXT),
-
a well-known or easily interpretable structure for the data file, e.g. the
IPGRI Multicrop Passport Descriptor list,
-
standard or interpretable coding, i.e.
-
standard coding systems such as the codes defined in the IPGRI Multicrop
Passport Descriptor list, the ISO country codes or the FAO institution
codes, or
-
local codes accompanied by decoding information in a separate file.
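As an illustration of the accessibility points above, here is a minimal Python sketch that writes the data as a tab-delimited TXT file and the local codes as a separate decode file, refusing to export any code that the decode file would not explain. The function name write_data_and_decode, the STATUS field and its codes W, L and C are hypothetical examples, not codes defined in this handout.

```python
import csv
import io


def write_data_and_decode(records, fields, local_codes, data_out, decode_out):
    """Write records as a tab-delimited data file plus a separate decode file.

    local_codes maps a field name to a {code: meaning} dict holding the
    genebank's own (local) codes for that field. Raises ValueError if the
    data use a code that the decode file would not explain.
    """
    # Check everything first, so inconsistent data never produce half a file.
    for rec in records:
        for field, table in local_codes.items():
            if rec[field] not in table:
                raise ValueError(f"undocumented code {rec[field]!r} in field {field}")
    data = csv.writer(data_out, delimiter="\t", lineterminator="\n")
    data.writerow(fields)
    for rec in records:
        data.writerow([rec[f] for f in fields])
    decode = csv.writer(decode_out, delimiter="\t", lineterminator="\n")
    decode.writerow(["FIELD", "CODE", "MEANING"])
    for field, table in local_codes.items():
        for code, meaning in sorted(table.items()):
            decode.writerow([field, code, meaning])


if __name__ == "__main__":
    # Hypothetical local status codes; real files would be opened with
    # open("DATA.TXT", "w", newline="") and similar.
    records = [{"ACCENUMB": "A-1", "STATUS": "W"}, {"ACCENUMB": "A-2", "STATUS": "L"}]
    codes = {"STATUS": {"W": "wild", "L": "landrace", "C": "cultivar"}}
    data, decode = io.StringIO(), io.StringIO()
    write_data_and_decode(records, ["ACCENUMB", "STATUS"], codes, data, decode)
    print(data.getvalue())
    print(decode.getvalue())
```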
The data sets should not be too large and should have a relevant coverage (e.g. one data set per crop).
Providing the user with a tool to browse or query the data might be
helpful, though most users can be expected to have access to some database
or other software that is able to provide this functionality.
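Where no suitable software is at hand, even a very small tool can serve for browsing. The following minimal Python sketch assumes a tab-delimited TXT data file with a header row; the function name query and the field names in the example are hypothetical.

```python
import csv
import io


def query(data_file, **criteria):
    """Yield the rows of a tab-delimited data file whose fields equal all criteria.

    For example, query(f, GENUS="Lactuca", ORIGCTY="TUR") yields only the rows
    with both values; calling it without criteria yields every row.
    """
    for row in csv.DictReader(data_file, delimiter="\t"):
        if all(row.get(field) == value for field, value in criteria.items()):
            yield row


if __name__ == "__main__":
    sample = io.StringIO(
        "ACCENUMB\tGENUS\tORIGCTY\n"
        "A-1\tLactuca\tTUR\n"
        "A-2\tLactuca\tNLD\n"
    )
    for row in query(sample, ORIGCTY="TUR"):
        print(row["ACCENUMB"], row["GENUS"])
```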
It is advisable to bundle and compress all relevant files into one ZIP
file. Relevant files include:
-
the data file(s),
-
the file(s) containing the information needed to decode the codes used
in the data file(s),
-
the file describing the structure of the data and decode file(s), and
-
a file called something like READ_ME.TXT with some introductory words and a short
description of the contents of all other files.
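Bundling the files listed above can be scripted with Python's standard zipfile module. The following is a minimal sketch; the function name bundle and all file names other than READ_ME.TXT are assumptions made for illustration.

```python
import io
import zipfile


def bundle(zip_target, files):
    """Bundle files into one compressed ZIP archive.

    zip_target is a path or a binary file object; files maps the name each
    file should get inside the archive to its text content. READ_ME.TXT is
    required because it describes all the other files.
    """
    if "READ_ME.TXT" not in files:
        raise ValueError("READ_ME.TXT describing the other files is missing")
    with zipfile.ZipFile(zip_target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)  # str content is stored as UTF-8


if __name__ == "__main__":
    buf = io.BytesIO()
    bundle(buf, {
        "READ_ME.TXT": "PASSPORT.TXT: data\nCODES.TXT: decoding\nSTRUCT.TXT: structure\n",
        "PASSPORT.TXT": "ACCENUMB\tSTATUS\nA-1\tW\n",
        "CODES.TXT": "FIELD\tCODE\tMEANING\nSTATUS\tW\twild\n",
        "STRUCT.TXT": "ACCENUMB: accession number\nSTATUS: see CODES.TXT\n",
    })
    with zipfile.ZipFile(buf) as zf:
        print(zf.namelist())
```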
To keep the files as up-to-date as possible, automatic procedures can be
developed that refresh the files when appropriate, for instance after each
update of the database. One might even consider automatic distribution of
updated versions to important users via e-mail.
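One simple form of such an automatic procedure is sketched below in Python. It assumes that the database is exported to a file whose modification time reflects the last update, and it rebuilds the ZIP file only when the ZIP is missing or older than that export. The names needs_refresh, refresh and PASSPORT.DBF are hypothetical, and the procedure would still have to be started periodically, for instance by a scheduler such as cron.

```python
import os
import tempfile


def needs_refresh(source_path, zip_path):
    """Return True if the ZIP file is missing or older than the database export."""
    if not os.path.exists(zip_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(zip_path)


def refresh(source_path, zip_path, rebuild):
    """Call rebuild(source_path, zip_path) only when the ZIP is out of date.

    Returns True if a rebuild was done, False otherwise.
    """
    if needs_refresh(source_path, zip_path):
        rebuild(source_path, zip_path)
        return True
    return False


if __name__ == "__main__":
    def rebuild(source_path, zip_path):
        # Placeholder: a real procedure would export the data and rebuild the ZIP.
        with open(zip_path, "w") as f:
            f.write("rebuilt\n")

    with tempfile.TemporaryDirectory() as folder:
        source = os.path.join(folder, "PASSPORT.DBF")
        target = os.path.join(folder, "PASSPORT.ZIP")
        open(source, "w").close()
        print(refresh(source, target, rebuild))  # first call: no ZIP exists yet
        print(refresh(source, target, rebuild))  # second call: ZIP at least as new as source
```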
Final steps
Once the ZIP files are available, they have to be put on the net and
referenced in HTML documents. If no local server is available
and/or knowledge of HTML is lacking, a member of the Internet Advisory
Group (e.g. CGN, IPGRI, NGB, ZADI) should be approached. They will provide
you with a simple solution within minutes.
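For those with a local server, the referencing HTML document can be very simple. The following minimal Python sketch generates such a page; the function name index_page and the assumption that the ZIP files sit in the same directory as the page are illustrative only.

```python
import html
import urllib.parse


def index_page(title, downloads):
    """Return a minimal HTML page linking to downloadable ZIP files.

    downloads is a list of (file_name, description) pairs. File names are
    URL-quoted for the link and all text is HTML-escaped.
    """
    items = "\n".join(
        f'<li><a href="{urllib.parse.quote(name)}">{html.escape(name)}</a>: '
        f"{html.escape(description)}</li>"
        for name, description in downloads
    )
    return (
        "<html>\n<head><title>{0}</title></head>\n<body>\n<h1>{0}</h1>\n"
        "<ul>\n{1}\n</ul>\n</body>\n</html>\n"
    ).format(html.escape(title), items)


if __name__ == "__main__":
    print(index_page("Downloadable passport data", [
        ("LETTUCE.ZIP", "lettuce passport data with decoding files"),
    ]))
```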
Conclusion
Downloadable databases, accompanied by information for their interpretation
and compressed in ZIP files, are an attractive way of providing access
to databases. They should be considered as an option in addition to providing
access via on-line searchable databases. Since the technology needed is
available in any genebank, the databases can be on the net within weeks.