Subject: A Visual Archive of Cinematographical Topics
Wolfgang Ernst / Harun Farocki

Visual archiving: Sorting and storing images

Cultural memory of images has traditionally linked images with texts, terms
and verbal indexes. With the transition of images into digital storage,
non-verbal methods of classification are gradually gaining importance. It is
not archiving as such that poses a problem for video memory; rather, the
search methods used to find pictorial information are still limited to
models which have been developed for retrieving texts:

Typically, available methods depend on file IDs, keywords, or text
associated with the images. <...> they don't allow queries based directly on
the visual properties of the images, are dependent on the particular
vocabulary used. <Flickner et al. 1997: 7>

The question arises which new kind of knowledge will exist exclusively in
the form of images, which part of traditional knowledge can be transformed
into images and which part might simply vanish.

Techno-image archaeology aims at rethinking the notion of images from the
vantage point of the process of archiving. The archive here is seen as a
medium of storage and as a form of organization of all that can be accessed
as knowledge. The function of image archives such as museums or data banks
far exceeds the mere storage and conservation of images. Instead of just
collecting passively and after the fact, archives actively define what is
archivable at all. In doing so, they also determine what is allowed to be
forgotten.

In terms of technology, an archive is a coupling of storage media, content
format and address structure. In this case the image is to be conceived as
a data format. Methodically, this implies leaving behind the contemplation
and description of single images in favour of an investigation of sets of
images.

In his 1766 essay Laocoön, G. E. Lessing discussed the aesthetic conflict
between the logic of language and the logic of images in terms of a
genuinely multi-media semiotics: pictura is no longer ut poesis, as Horace
had declared; time-based media (like dramatic speech and linear narratives)
differ from space-based media (like simultaneous pictures). The
digitization of images today provides a technical basis for inquiry into
this conflict, so that the investigation can be grounded in the terms of
the medium computer. It would make no sense to retell a teleological story
of image processing which finally reached its goal in digitization; on the
contrary, this history of images is to be revised from the present vantage
point of digitization. How can, for example, archives be related to
algorithms of image processing, of pattern recognition and computer

In sharp contrast to hermeneutics, the media-archaeological investigation of
image archives does not take images as carriers of experiences and meanings.
The relation between vision and image cannot serve as the guideline of the
investigation, since image processing by computers can no longer be
re-enacted in terms of the anthropological semantics of the human eye. The
methodological starting point is rather a theory of technical media based on
Michel Foucault's discourse analysis and Claude Shannon's mathematical theory
of communication, as well as practices and notions of data-structure oriented

The artes memoriae were visual techniques of memorization from the
rhetoric of antiquity to the Renaissance. Since then, museums, collections,
images of picture galleries and catalogues have always dealt with the
programming of material image banks. The striving for visual knowledge in
the (literally) Age of Enlightenment in the eighteenth century led to visual
encyclopedias and their visualizations (like the planches, i. e. the visual
supplement of the great French Encyclopédie edited by Diderot and
d'Alembert). Photography then became the switching medium from perception
to technology, creating the first technical image archives. Films, in turn,
have been archives themselves (Hollywood and the rules of image sequences).

When it comes to (re-)programming image-oriented structures in digital data
bases of given image archives, priority goes to the development of a
visually addressable image archive. By combining multiresolution image
representation with simple octree structures, a variable archive module
might be applied. This makes it possible to test different algorithms that
create different visual sequences and neighbourhoods. Most operators of
image processing and pattern recognition, such as filters and invariant
transformations, can be integrated into the structure of the data base in
order to make clusters of images accessible. The next step will be the
development of an interactive visual agent capable of "intelligent"
retrieval of images and visual sketches in large data banks.
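The text does not specify how multiresolution representation and octrees are to be combined. A minimal sketch, under assumptions of our own: each image (a list of rows of 8-bit RGB tuples) is reduced to an image pyramid by 2x2 averaging, the mean color of its coarsest level is located in an octree over the RGB cube, and images whose octree paths share a prefix count as visual neighbours. The class OctreeArchive and all function names are hypothetical, and a real module would use richer features than a mean color.

```python
from collections import defaultdict


def downsample(img):
    """One pyramid step: halve the resolution by averaging 2x2 pixel blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            tuple(
                sum(img[2 * y + dy][2 * x + dx][c] for dy in (0, 1) for dx in (0, 1)) // 4
                for c in range(3)
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def pyramid(img):
    """Multiresolution representation: from full resolution to the coarsest level."""
    levels = [img]
    while len(levels[-1]) > 1 and len(levels[-1][0]) > 1:
        levels.append(downsample(levels[-1]))
    return levels


def mean_color(img):
    """Average RGB value over all pixels of an image."""
    pixels = [p for row in img for p in row]
    return tuple(sum(p[c] for p in pixels) // len(pixels) for c in range(3))


def octree_path(color, depth):
    """Octant indices (0-7) locating an 8-bit RGB color in an octree over the color cube."""
    path = []
    for level in range(depth):
        shift = 7 - level  # most significant bit first = coarsest subdivision first
        r, g, b = ((c >> shift) & 1 for c in color)
        path.append((r << 2) | (g << 1) | b)
    return tuple(path)


class OctreeArchive:
    """Hypothetical archive module: images are filed under the octree cell of the
    mean color of their coarsest pyramid level; cells are stored in a dict keyed
    by octree path instead of as linked tree nodes."""

    def __init__(self, depth=3):
        self.depth = depth
        self.cells = defaultdict(list)

    def _path(self, img):
        return octree_path(mean_color(pyramid(img)[-1]), self.depth)

    def add(self, name, img):
        self.cells[self._path(img)].append(name)

    def neighbours(self, img, level=None):
        """Names of archived images whose octree paths share the first `level` octants."""
        level = self.depth if level is None else level
        prefix = self._path(img)[:level]
        return sorted(
            n for path, names in self.cells.items() if path[:level] == prefix for n in names
        )
```

Shortening the shared prefix (the level argument) widens the neighbourhood, so the same archive yields coarser or finer visual clusters without re-indexing.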

Navigating images on the borderline of digital addressability

While occidental culture has for the longest time practically subjected the
memory of images to verbal or numerical access (alphanumerical indexing by
authors and subjects; even Sergei Eisenstein subjected films to the idea of
deciphering their virtual story-book by transcribing moving images into a
score, a kind of reverse engineering of the written script), the iconic
turn predicted by W. J. T. Mitchell is still to come in the field of
image-based multimedia information retrieval.

In media culture there is still the problem that analogue audio-visual
sources cannot, or should not, be addressed like texts and books in a
library; these resources form a largely unconquered multi-media datascape.

Addressing and sorting non-scriptural media remains an urgent challenge (not
only for commercial TV), one which, since the arrival of fast-processing
computers, can be met by digitizing analogue sources. This does not
necessarily result in better image quality but rather in the unforeseen
option of addressing not only images (by frames) but even every single
picture element (pixel). Images and sounds thus become calculable and can be
subjected to algorithms of pattern recognition, procedures which will
"excavate" unexpected optical statements and perspectives out of the
audio-visual archive. For the first time, this archive can organize itself
not just according to meta-data but according to its own criteria: visual
memory in its own medium (endogenic).
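Once digitized, a film can be addressed by frame, row and column. As a hedged illustration of pattern recognition working directly on that address space, the sketch below performs brute-force template matching with a sum-of-absolute-differences score on grayscale frames (lists of rows of 0-255 integers). The names sad and find_pattern are hypothetical, and exhaustive search is chosen for clarity rather than speed.

```python
def sad(frame, patch, top, left):
    """Sum of absolute differences between `patch` and the frame region at (top, left)."""
    return sum(
        abs(frame[top + y][left + x] - patch[y][x])
        for y in range(len(patch))
        for x in range(len(patch[0]))
    )


def find_pattern(video, patch, tolerance=0):
    """Return every (frame, row, column) address where `patch` matches within `tolerance`.

    `video` is a list of grayscale frames; each frame is a list of rows of 0-255 values.
    """
    ph, pw = len(patch), len(patch[0])
    hits = []
    for t, frame in enumerate(video):
        for top in range(len(frame) - ph + 1):
            for left in range(len(frame[0]) - pw + 1):
                if sad(frame, patch, top, left) <= tolerance:
                    hits.append((t, top, left))
    return hits
```

The result is a list of addresses, not descriptions: the archive answers a visual query with positions in its own pixel space.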

By translating analogue, photographic images (of which film, of course,
still consists) into digital codes, not only do images become addressable
by mathematical operations, but their ordering, too, can literally be
calculated (a re-appearance of the principles of picture-hanging envisaged
by Diderot in the eighteenth century).
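A calculated ordering can be illustrated, under stated assumptions, by a greedy nearest-neighbour chain: given one feature vector per image (for instance a mean color), each next image is the one closest to the image placed last. This is only one possible similarity-based ordering, not a reconstruction of Diderot's principles; hang_by_similarity is a hypothetical name.

```python
import math


def hang_by_similarity(features, start):
    """Greedy nearest-neighbour ordering of images.

    `features` maps image names to numeric feature vectors of equal length.
    Starting from `start`, the next image is always the one whose vector is
    closest (Euclidean distance) to the previously placed image; ties are
    broken alphabetically so the result is deterministic.
    """
    order = [start]
    remaining = set(features) - {start}
    while remaining:
        last = features[order[-1]]
        nxt = min(remaining, key=lambda name: (math.dist(last, features[name]), name))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

With features {"a": (0, 0), "b": (10, 0), "c": (1, 0), "d": (5, 0)} and start "a", the sketch hangs the images in the order a, c, d, b.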

The subjection of images to words is not just a question of addressing, but
also of continuing to apply the structuralist-linguistic paradigm to
audiovisual data.

Within the medium of film, the practice of montage (cutting) has always
already performed a kind of image-based image sorting (by similarity, e. g.).
Cutting has two options: to link images by similarity or by contrast
(Eisenstein's option). Only video, as a kind of intermediary medium between
classical cinema and the digital image, has replaced the mechanical
addressing of cinematographic images by different means (timecode),
offering new options of navigating in stored image space. Automated digital
linking of images by similarity, though, creates rather unexpected,
improbable links, which in information theory are the most informative,
least redundant ones. It also makes it possible to search for the least
probable cuts.
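The information-theoretic point can be made concrete. Assuming shots have already been assigned to visual classes (hypothetically, clusters of image features rather than verbal labels), the probability of a cut from one class to the next can be estimated from existing montages, and the self-information -log2 p of a candidate cut measures how improbable, and thus how informative, it is. The sketch uses add-one (Laplace) smoothing so that unseen cuts receive a finite but high surprisal; all function names are ours.

```python
import math
from collections import Counter


def transition_model(sequences, alphabet, alpha=1.0):
    """Estimate P(next class | previous class) from observed montages.

    `sequences` are lists of shot classes in screening order; add-alpha
    smoothing keeps unseen cuts possible but improbable.
    """
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[prev, nxt] += 1
            prev_counts[prev] += 1
    k = len(alphabet)
    return {
        (p, n): (pair_counts[p, n] + alpha) / (prev_counts[p] + alpha * k)
        for p in alphabet
        for n in alphabet
    }


def surprisal(model, prev, nxt):
    """Self-information of a cut in bits: the rarer the cut, the higher the value."""
    return -math.log2(model[prev, nxt])


def least_probable_cuts(model, candidates, n=1):
    """The `n` candidate cuts (prev, next) with the highest surprisal."""
    return sorted(candidates, key=lambda cut: surprisal(model, *cut), reverse=True)[:n]
```

A cut that never occurs in the observed montages (here A to C) comes out as the least probable and hence most informative candidate.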

Jurij Lotman explained in his film semiotics: "Joining chains of varied
shots into a meaningful sequence forms a story." This is contrasted by
Roger Odin's analysis of Chris Marker's film La Jetée (1963): how can a
medium consisting of single, discrete shots in which nothing moves
internally (photographic moments of time, frozen images) create narrative
effects? Cinematographic sequences are time-based, but film as such, the
cinematographic apparatus, "has no first layer of narrativity" when looked
at media-archaeologically <Gaudreault 1990: 72>.

The absence of reproduction of movement <...> tends to block narrativity
since the lack of movement means that there is no Before/After opposition
within each shot, the narrative can only be derived from the sequence of
shots, that is from montage. <Odin, as quoted in: Gaudreault 1990: 72>

What happens if that sequence is no longer arranged according to
iconological or narrative codes, but rather in an inherently
similarity-based mode, leading to a genuinely (image- or
media-)archaeological montage?

After a century of building a genuinely audio-visual technical memory, a
new cultural practice of mnemic immediacy emerges: the recycling and
feedback of the media archive (a new archival economy of memory). With new
options of measuring, naming, describing and addressing digitally stored
images, this ocean calls for being navigated (literally: cybernetics) in
different ways rather than merely ordered by classification (the
encyclopedic paradigm of the Enlightenment).

This state of affairs has motivated the film director Harun Farocki and the
media scientists Friedrich Kittler and Wolfgang Ernst to design a project
that performs an equivalent to lexicographical research: a collection of
filmic expressions. Contrary to familiar semantic research in the history of
ideas (which Farocki calls contentism, that is, the fixation on the fable,
the narrative bits), such a filmic archive will no longer concentrate on
protagonists and plots and list images and sequences according to their
authors, time and place of recording, and subject; instead, digital image
data banks allow visual sequences to be systematized according to genuinely
iconic notions (topoi or, for time-based images, the different notion of
Bakhtinian chronotopes) and narrative elements, revealing literally new
insights into their semantic, symbolic and stylistic values. This is exactly
what the filmmaker Harun Farocki strove for when, in summer 1995 at the
Potsdam Einstein Foundation, he proposed the project of a kind of visual
library of film which would not only classify its images according to
directors, place and time of shooting, but go beyond that: digitally
systematizing sequences of images according to motifs, topoi and, e. g.,
narrative statements, thus helping to create a culture of visual thinking
with a visual grammar analogous to linguistic capacities.

Unlike the verbal realm, the visual realm still lacks an active thesaurus
and a grammar for linking images; our predominantly script-oriented culture
still lacks the competence of genuinely filmic communication ("reading" and
understanding).

Genuinely mediatic criteria for storing electronic or filmic images have
been listed by the director of the Federal Archives of Germany (Kahlenberg)
and the chief archivist of the nationwide public TV channel ZDF (Schmitt).
Alongside economically driven criteria (the reuse of recorded broadcasts),
they name, under historical-semantic-iconographic "content-related
criteria": 1. "dominant events" (historical, event-centred), 2. "political
and social indications of longer-term developments and tendencies", 3.
"social reality in everyday life". Under "design-related or aesthetic
criteria" follow: 1. "optical peculiarities" (remarkable camera
perspectives, such as "canted framing and extreme high- or low-angle
shots"), 2. "the dramaturgical design of image sequences" (cut, opposition
of single frames), 3. "special pictorial motifs" (landscapes, people),
which come close to Farocki's topoi. Last but not least, of course, come
"media-specific aspects": the media archives proper, documenting the
history of a TV channel itself.

On the market, though, digital video browsing still seeks to reaffirm
textual notions, such as the story format as the segmentation unit of a
video sequence, e. g. the news story, "a series of related scenes with a
common content. The system needs to determine the beginning and ending of
an individual news story." In technical terms, though, beginning and end
are nothing but cuts here.

With film, time enters the pictorial archive. Once digitized, even the
single frame is no longer a static photographic image but a virtual object,
constantly re-inscribed on the computer monitor in the refresh cycles of
electronic light beams. While the visual archive has for the longest time in
history been an institution associated with unchangeable content, the memory
of (time-based) images becomes dynamic itself. Thus, images acquire a
temporal index.

The equivalent of iconographic studies of images is the search for
macroscopic time objects in moving images, "for instance larger sequences
constituting a narrative unit". The media-archaeological gaze at film, on
the contrary, segments it serially.

What do we mean by the notion of "excavating the archive"? The answer is
media archaeology instead of iconographic history: what the computer
digitally "excavates" is a genuinely media-mediated gaze on a well-defined
number of (what we call) images.

In a different commercial news analysis system, Farocki's notion of kinetic
topoi reappears: "Each segment has some informative label, or topic. It is
this kind of table of contents that we strive to automatically generate"
(i. e. by topic segmentation). Of course, "motion is the major indicator of
content change"; a zoom shot, e. g., is best abstracted by its first frame,
its last frame and one frame in the middle <Zhang et al. 1997: 143>.
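The zoom rule quoted from Zhang et al. translates directly into code. For other shots, the sketch assumes a simple threshold rule of our own choosing: a new key frame is taken whenever a frame's feature vector has moved more than threshold away from the previous key frame, so that content change drives selection. Features, threshold and the name key_frames are illustrative assumptions; in practice the zoom flag would come from camera-motion analysis.

```python
import math


def key_frames(shot, zoom=False, threshold=30.0):
    """Return indices of representative frames for one shot.

    `shot` is a list of per-frame feature vectors (e.g. mean colors).
    Zoom shots: first, middle and last frame, following the quoted zoom rule.
    Other shots (our assumption): start with the first frame and add a frame
    whenever it lies more than `threshold` away from the previous key frame.
    """
    if not shot:
        return []
    last = len(shot) - 1
    if zoom:
        return sorted({0, last // 2, last})
    keys = [0]
    for i in range(1, len(shot)):
        if math.dist(shot[i], shot[keys[-1]]) > threshold:
            keys.append(i)
    return keys
```

For a nine-frame zoom the sketch returns frames 0, 4 and 8; for a static shot it returns only frame 0.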

"Current video processing technologies reduce the volume of information by
transforming the dynamic medium of video into the static medium of images,
that is, a video stream is segmented and a representative image is ex-<...>";
that is exactly what indexing by words (description) does. How can the
analysis avoid being frozen into a data bank? "Image analysis looks at the
images in the video stream. Image analysis is primarily used for the
identification of scene breaks and to select static frame icons that are
representative of a scene" <Hauptmann / Witbrock 1997: 222>. It uses color
histogram analysis and optical flow analysis, while speech analysis handles
the audio component (which can be done by transforming the spoken content of
news stories into a phoneme string). Thus the image stream is not subjected
to verbal description but rather accompanied by an audio-visual frame
analysis.
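Hauptmann and Witbrock name color histogram analysis as a means of identifying scene breaks. A minimal sketch of that idea, with parameters chosen by us rather than taken from their system: each frame (rows of 8-bit RGB tuples) is reduced to a normalized joint RGB histogram with a few bins per channel, consecutive histograms are compared by half their L1 distance (0 for identical, 1 for disjoint), and a break is reported where that distance exceeds a threshold. Bin count, distance and threshold are assumptions.

```python
def color_histogram(frame, bins=4):
    """Normalized joint RGB histogram with `bins` levels per channel."""
    step = 256 // bins

    def level(value):
        return min(value // step, bins - 1)  # keep 255 inside the last bin

    hist = [0] * bins**3
    pixels = [p for row in frame for p in row]
    for r, g, b in pixels:
        hist[(level(r) * bins + level(g)) * bins + level(b)] += 1
    return [count / len(pixels) for count in hist]


def histogram_difference(h1, h2):
    """Half the L1 distance: 0.0 for identical, 1.0 for disjoint histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2


def scene_breaks(frames, threshold=0.5, bins=4):
    """Indices i such that a scene break is assumed between frame i - 1 and frame i."""
    hists = [color_histogram(f, bins) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_difference(hists[i - 1], hists[i]) > threshold
    ]
```

Because histograms discard pixel positions, camera and object motion within a shot changes them only slightly, which is why they suit break detection better than raw pixel comparison.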

Retrieval and browsing require that the source material first be effectively
indexed. While most previous research in indexing has been text-based (Davis
1993, Rowe et al. 1994), content-based indexing of video with visual
features is still a research problem. Visual features can be divided into
two levels <cf. Erwin Panofsky's three iconological strata of the image>:
low-level image features <radical surface>, and semantic features based
on objects and events. <...> a viable solution seems to be to index
representative key-frames (O'Connor 1991) extracted from the video sources

- but what is "representative", in that archivo-archaeological context? "Key
frames utilize only spatial information and ignore the temporal nature of a
video to a large extent" <Zhang et al. 1997: 149>.

The basic unit of video to be represented or indexed is usually assumed to
be a single camera shot, consisting of one or more frames generated and
recorded contiguously and representing a continuous action in time and
space. Thus, temporal segmentation is the problem of detecting boundaries
between consecutive camera shots. The general approach to the solution has
been the definition of a suitable quantitative difference metric which
represents significant qualitative differences between frames <Zhang et al.
1997: 142>

- which is exactly the boundary between the iconological and the
archaeological gaze, between semantics and statistics, between narrative and
formal (in the sense of Wölfflin) topoi.
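A single threshold on a "suitable quantitative difference metric" finds abrupt cuts; gradual transitions (dissolves, fades) instead produce a run of moderate differences. One answer known from the shot-detection literature is a two-threshold "twin-comparison" scheme. The sketch below is a simplified version under our own assumptions: it takes a precomputed series of consecutive-frame differences (for instance histogram distances), and it approximates the accumulated difference over a candidate transition by summing consecutive differences rather than comparing the first and current frame directly. Thresholds and the function name are illustrative.

```python
def twin_comparison(diffs, t_high, t_low):
    """Classify shot boundaries from consecutive-frame differences.

    diffs[i] compares frame i with frame i + 1; returned indices refer to
    positions in `diffs`. Yields ("cut", i) for abrupt boundaries and
    ("gradual", start, end) for runs of moderate change whose summed
    difference exceeds `t_high` (a simplification of the accumulated
    comparison).
    """
    events = []
    start, accumulated = None, 0.0
    for i, d in enumerate(diffs):
        if d > t_high:
            events.append(("cut", i))
            start, accumulated = None, 0.0  # an abrupt cut ends any candidate
        elif d > t_low:
            if start is None:
                start, accumulated = i, 0.0  # possible start of a gradual transition
            accumulated += d
        else:
            if start is not None and accumulated > t_high:
                events.append(("gradual", start, i - 1))
            start, accumulated = None, 0.0
    if start is not None and accumulated > t_high:
        events.append(("gradual", start, len(diffs) - 1))
    return events
```

With t_high = 0.5 and t_low = 0.1, a single jump of 0.9 counts as a cut, while three successive differences of 0.2, 0.25 and 0.2 together count as one gradual transition.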

Of course, a topos is a rhetorical category; rhetoric, though, is more of a
technique than a question of content: the philosopher Immanuel Kant, e. g.,
considers the ordering art of topics to be a kind of storage grid for
general notions, just as books in a library are distributed and stored on
shelves with different inscriptions. Do we always have to group image
features into meaningful objects and attach semantic descriptions to scenes
<Flickner et al. 1997: 8>, or does it rather make sense to concentrate on
syntax, thus treating semantics as second-order syntax?

