       Proposal for the International Institute of Social History
                            Tjebbe van Tijen
                              7 April 1994
          The creation of a Wide Area Archive & Library (WAAL)

The archive & library consists of digital documents representing
all kinds of information from text (in the first stages of the
project) to images and sound (in the future). The reason for
constituting a WAAL is that, although the production and
proliferation of electronic documents has been astronomic, very
little has been done for the long term preservation of this kind
of information. It is clear that the new information technologies
have a strong impact on society, especially through the diffusion
of information by telecommunication. This phenomenon has been
compared on several occasions with the 'revolution of the printed
word' as it developed from the 15th century onwards. The 'digital
revolution' will soon be a popular subject for historical study.
To make such studies possible
we have to act now to rescue what will otherwise be lost forever.

There are distinct differences between printed and digital
information. The first is tangible and readable without any
devices (except spectacles in some cases), the second is
disembodied and can only be perceived with the help of special
appliances. Papyrus, parchment and paper have carried information
from generation to generation for more than 4000 years. It is not
likely that this 'paper memory system' will be fully replaced by
digital documents, as some over-enthusiastic computer lovers
proclaim. Nevertheless we should also start to take care of the
'digital memory system', if we do not want to leave our
successors with a historical void. The new form of information
circulation over electronic networks has a very ephemeral
quality. Text is often written and read directly on and from the
computer terminal screen. Not much thought is given to the long
term preservation of such texts, and where it is, the necessary
facilities, finances and expertise are not available.

There are a few characteristics of these new media that will force
us to rethink the concepts we use to determine the selective
criteria for building historical collections of information
items. The notions of 'small' and 'big' publishers, limited and
wide circulation, are less applicable. The ease with which
documents can be duplicated, adapted and re-circulated, placed
from one electronic bulletin board to a whole network, from one
network to other networks, does away with the earlier distinction
between mass media and their implicit counterpart, 'non mass
media'. The ease with which one can now circulate information
from local to global scale also has consequences for another
concept of 'paper world' collection building, namely the
importance given to the 'place of origin' of an information item.
Collections are often built up geographically. Consequently, work
tasks are also divided over different geographical areas. With
the implosion of physical space in the world wide electronic
network, collection structuring and task division should be
revised. Collecting digital information can be done from any
point in the interconnected global network. The traditional
division into document types like correspondence, manuscript, book,
periodical, press release, pamphlet, hand out, leaflet, is
becoming less distinct. A whole chain of activities of the
publisher, printer, distributor, bookshop, has suddenly been
united in one process: computer networking. 

Of course the International Institute of Social History should
make a selection of the hundreds of thousands of digital documents
that are (still) available now. One major network with an
information content closest to the collection profile of the
Institute is the Association for Progressive Communications
(APC). APC started in 1984 in the San Francisco Bay Area, at that
time under the name PeaceNet, as an initiative of the Ark
Communications Institute, the Center for Innovative Diplomacy,
Community Data Processing and the Foundation for the Arts of Peace.
In 1987 PeaceNet was managed by the newly formed Institute for
Global Communications (IGC), set up by the Tides Foundation.
Other networks were created, such as EcoNet and ConflictNet.
Among the financial supporters of these initiatives was Apple
Computer. Later the network made connections with similar
initiatives in other countries such as GreenNet in England. In
1987 Peter Gabriel directed financial support to the project
from a fund raising rock concert held in Tokyo the year before. The
transatlantic link with GreenNet proved so successful that other
funds for furthering the net could be found from foundations like
MacArthur, Ford, General Service and the United Nations
Development Program. In 1990 the Association for Progressive
Communications was formed to coordinate the by now global
networking activities. There were more than 15,000 subscribers
in 90 countries in 1993, mostly Non-Governmental Organisations
(NGOs) (see map).

The outline of the proposal that I discussed last week with
Michael Polman and Alfred Heitink from the Antenna Foundation in
Nijmegen reads as follows:

First step: gather all archive material of the APC network, in
so far as it has been preserved somewhere in the world. A rough
estimate is that it will amount to between 2 and 3 Gigabyte since
1984. The feed of material is now around 1 Mb per day. This
estimate covers mainly material in English, but also includes text
in Spanish, German and Portuguese. The proposal is to make a
contract with the representative of the APC network in the
Netherlands, the Antenna Foundation. The Antenna Foundation will
make a separate agreement with another partner, GreenNet in
London, to assure long term continuity. In principle all APC
materials are free on the network. Participating host
organisations in different countries have an agreement that they
will charge only for the transport costs of the information, not
for the information itself. There are some exceptions, as with
the materials from Inter Press Service (IPS). In such
cases separate deals need to be made with these information

At the moment the most cost effective and safe method of
preservation is writing the digital archive material to CD-ROM.
Each CD-ROM has a capacity of a bit more than 600 Mb. The whole
APC archive could be written on 5 to 6 such CD-ROMs. With the
lowering of the prices of hardware and software it is now feasible
to buy a CD-Recordable writing device with a dedicated
computer and appliances for a price around fl. 15.000,-. Blank
CD-Recordable discs now cost between fl. 50,- and fl. 75,- a
piece. The writing of the CDs can thus be done 'in house'. The
great advantage is that once the material has been prepared for
storing on a CD-ROM, other copies can be made easily and cheaply,
either by 'burning' another CD-Recordable, or by having them
duplicated in a small run by a duplicating company. The same
digital material can also be formatted on CD-ROM for use on
different platforms (PC, MAC, UNIX). Duplicates of archives can
also be exchanged with other institutions or made into a
publication. Of course permission from the copyright holders is
needed before such a publication can be made.
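
The storage estimate above can be checked with a few lines of
arithmetic. The following is only a minimal sketch: it takes the
figures from the text (2 to 3 Gigabyte, about 600 Mb per disc,
fl. 50,- to fl. 75,- per blank disc), assumes 1 Gigabyte = 1024 Mb,
and ignores any file system overhead on the disc. The function
names are hypothetical and chosen for illustration.

```python
import math

CD_CAPACITY_MB = 600       # "a bit more than 600 Mb" per disc, rounded down
DISC_PRICE_FL = (50, 75)   # price range of one blank CD-Recordable, in guilders
MB_PER_GB = 1024           # assumption: binary Gigabyte


def discs_needed(archive_mb, capacity_mb=CD_CAPACITY_MB):
    """Number of discs needed to hold archive_mb megabytes."""
    return math.ceil(archive_mb / capacity_mb)


def media_cost_range(n_discs, price_range=DISC_PRICE_FL):
    """Lowest and highest cost (in guilders) of n_discs blank discs."""
    low, high = price_range
    return n_discs * low, n_discs * high


if __name__ == "__main__":
    for size_gb in (2, 3):
        n = discs_needed(size_gb * MB_PER_GB)
        print(size_gb, "Gigabyte:", n, "discs, fl.", media_cost_range(n))
```

Under these assumptions the upper estimate of 3 Gigabyte needs 6
discs, in line with the 5 to 6 CD-ROMs mentioned above, so the cost
of blank media is small compared with the writing equipment.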

The main steps for the APC project will be:

-    archiving/preservation;
-    classifying/normalisation;
-    making the material publicly available.

Each of these tasks can be divided into separate steps:

Archiving/preservation
-    through direct Internet connections;
-    archive materials on DAT cartridges;
-    the original structure of bulletin boards and networks with
     news groups, subject lists, conferences, electronic
     journals and file sections will be preserved as much as
     possible;
-    deselection by automatic filtering, for instance all
     messages of less than 5 Kb, or messages that consist mainly
     of quotations of other messages;
-    detection of double items on the basis of unique 'message
     ID' (only within a news group);
-    registration of verification by using ... of original text.
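
The deselection and duplicate detection steps listed above could be
sketched roughly as follows. This is an illustration under stated
assumptions, not the software that would be contracted: the 5 Kb
threshold and the scope of the 'message ID' check (only within a
news group) come from the text, while the reading of "mainly
quotations" as more than half of the non-empty lines starting with
">" is an assumption, as are all names used.

```python
from typing import NamedTuple

MIN_SIZE_BYTES = 5 * 1024    # "less than 5 Kb" threshold from the text
MAX_QUOTED_FRACTION = 0.5    # assumption: "mainly" means more than half


class Message(NamedTuple):
    group: str        # news group or conference of the posting
    message_id: str   # value of the 'message ID' header
    body: str         # message text without headers


def quoted_fraction(body):
    """Share of non-empty lines that start with the quote mark '>'."""
    lines = [ln for ln in body.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    quoted = sum(1 for ln in lines if ln.lstrip().startswith(">"))
    return quoted / len(lines)


def passes_filter(msg):
    """False for messages that are too small or mainly quotations."""
    if len(msg.body.encode("utf-8")) < MIN_SIZE_BYTES:
        return False
    return quoted_fraction(msg.body) <= MAX_QUOTED_FRACTION


def select(messages):
    """Drop doubles within a group, then apply the deselection filter."""
    seen = set()
    kept = []
    for msg in messages:
        key = (msg.group, msg.message_id)   # unique only within a group
        if key in seen:
            continue
        seen.add(key)
        if passes_filter(msg):
            kept.append(msg)
    return kept
```

Because the key combines group and message ID, the same message
posted to two groups is kept twice, which matches the restriction
"only within a news group".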

Classifying/normalisation
-    automatic description on the basis of formal elements in
     the headers of messages (from - date - subject line);
-    registration of conference(s) or list(s) where the message
     has been posted (also multiple appearance);
-    automatic classification of specific names derived from
     full text (person, corporations, geographical names) on the
     basis of expertise dictionaries;
-    semi automatic classification with descriptors/keywords on
     the basis of expertise dictionaries, in such a way that sets
     of message descriptions can easily be selected or
     deselected by the classifier;
-    normalisation of text that has been non correctly
-    reformatting for CD-ROM of view copy of texts that use
     national/language specific routines for non lower ASCII
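
The automatic description from message headers mentioned in the
list above might look like the sketch below. It uses the email
parser from the Python standard library and assumes that messages
carry the usual 'From', 'Date' and 'Subject' headers, and that the
groups a message was posted to are listed comma separated in a
'Newsgroups' header. The function name and the field names of the
resulting record are hypothetical.

```python
from email import message_from_string


def describe(raw_message):
    """Build a description record from the headers of one raw message."""
    msg = message_from_string(raw_message)
    groups = msg.get("Newsgroups", "")
    return {
        "from": msg.get("From", ""),
        "date": msg.get("Date", ""),
        "subject": msg.get("Subject", ""),
        # a message may appear in several conferences or lists
        "posted_in": [g.strip() for g in groups.split(",") if g.strip()],
    }
```

Missing headers yield empty fields instead of an error, so that
incomplete messages can still be registered and corrected later by
the classifier.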

Making the material publicly available
-    bringing the indexes that refer to the full text on line
     (through an existing bulletin board system, direct
     dialling, on Internet, distributing the index to other
     bulletin boards);
-    bringing the whole text on line (so-called FTP site),
     either based at a computer at the Institute or for instance
     on the GreenNet computer in London;
-    establishing a service whereby on the basis of the
     descriptions (indexes) selections of text can be made 'on
     line' or by buying a floppy disc for use at home; the
     requested material can then be delivered on floppy, in an
     email box or on a CD-Recordable (with an automatic billing
     and payment registration program);
-    and of course consultation directly at the Institute.

Once the information is preserved on CD-ROM an 'on line resource
center' will be constituted at the International Institute of
Social History.

Costs

What follows is a rough estimate of costs, which can be divided
into one time investments and annual operating costs, although the
dynamic hardware and software market will make it necessary to
renew the hardware and software on a regular basis.

Starting options:
-    Hardware and software for archiving materials on CD-ROM
-    Multiple CD-ROM player to put in local and external network
-    Software development, training and support 10.000,-
-    Transport costs of data 5.000,-
-    Peripheral equipment (high speed modems, cabling, network
     facilities) 5.000,-

How to proceed

I propose that the project be developed in stages, whereby
in the first stage the project will be set up by an external
company on the basis of a contract with a fixed price. The
Antenna Foundation would be the most suitable candidate. A
steering committee will be formed for the project, with two
representatives of the Institute and two of the company. The
project will include training of personnel of the Institute. The
dedicated software that has to be developed should, as much as
possible, be made up of combinations of existing widely accepted
software modules and its construction should be modular and be
open to adaptations by the Institute. The software should be able
to handle a wide variety of text and database material formats
and platforms.


Of course, if the Institute decides to do the writing of CD-ROMs
'in house', the equipment can be used at the same time for other
projects such as:

-    a compilation of existing text format inventories of the
     Institute and affiliates (with one general index);
-    publication of new inventories on CD-ROM (ID Archiv);
-    Archives de Bakunin;
-    publication of the general catalogue (OPC) on CD-ROM;
-    back up safety copies of image files of the iconographic

Tjebbe van Tijen using the Antenna and NLnet Internet Services
E-mail: iisg@antenna.nl or Tjebbe.van.Tijen@inter.NL.net

