t byfield on Mon, 5 Nov 2001 20:08:50 +0100 (CET)



<nettime> phil agre: Design for a Web Filtering Service



----- Forwarded 

Date: Sun, 4 Nov 2001 11:21:31 -0800
From: Phil Agre <pagre@alpha.oac.ucla.edu>
To: "Red Rock Eater News Service" <rre@lists.gseis.ucla.edu>
Subject: [RRE]Design for a Web Filtering Service


  Design for a Web Filtering Service

  Phil Agre
  http://dlis.gseis.ucla.edu/pagre/

  Version of 4 November 2001.
  2600 words.


For a few years now I've been using this mailing list to filter the Web.  
I ask people to send me URL's on certain topics.  Then I look at the
pages, keep the good ones, and assemble packages of URL's with titles and
short commentaries.  The results are sent to the mailing list, kept in a
Web-based archive of the list, and included in a single big Web page of
all the URL's I've sent out -- the latter mostly to help me avoid
duplicates.  I have sent out many thousands of URL's under this system:
about 1500 on the recent US elections, about 4000 about the current war,
and perhaps 6000 on other topics. Complete lists of the URL's can be found
through the RRE home page:

  http://dlis.gseis.ucla.edu/people/pagre/rre.html

These 10,000+ URL's have been submitted by hundreds of people, and the
great strength of Web filtering is that it draws on the diversity of the
participants.  I would never have heard of most of those URL's on my own.

Community Web filtering seems like a good idea, and it's time to explore
automated tools to support it.  In this article I will suggest a design
for a Web-based filtering tool.  I cannot participate in building such a
tool, but I would be happy to try out any prototypes that others might
construct.  I have established a discussion list for people who might be
interested in working on a tool:

  http://groups.yahoo.com/group/webfilter

This list is open and unmoderated.  If something more sophisticated is
needed, I am hoping that people who join the list can self-organize. I
believe that such a tool would be an excellent public service, not just
for this list but for many others as well, and I hope that some
public-spirited subscribers will be interested in taking initiative to
build it.  I would be happy to publicize their contributions, or else help
them preserve their anonymity, whichever they would prefer.

Here, then, is my proposed design.  I am sure that people who design
Web-based services for a living can do better, but I also hope that any
designers will listen to my rationales, which are based on years of
experience running a community Web filtering service by hand.

The "webfilter", as I'll call it, is a cross between a discussion list, a
weblog, and a bookmark file.  It is not just a weblog, since it includes
numerous functionalities to deal with long lists of URL's. Nor is it just
a discussion list, since the goal is to produce a reasonably clean and
orderly presentation of the URL's.  Nor is it just a bookmark file,
because of its community nature.

The webfilter lives on the Web.  The key idea is to require people who
submit URL's to impose a minimal degree of order on their submissions.
Right now, I get free-form text messages from submitters, and I have to
fish through these messages by hand to recover a useable URL.  Even though
most submitters are well-intentioned, it takes a lot of work to process
these text messages.  The messages are so diverse that it would be
impossible to write an automated tool to parse them.  That's why we need a
more structured tool on the Web.  The tool should be friendly, simple, and
efficient, but I don't think that will be hard.

The webfilter code should be open-source -- I'm not interested in a
proprietary system.  Each site that runs the webfilter code will be called
a "webfilter" (by analogy with "weblog").  Each webfilter will provide one
or more "services".  For example, on my own list I have effectively been
providing three services: the election, the war, and miscellaneous (mostly
politics and social aspects of technology). Lots of people would like me
to separate the political service from the non-political services, and the
webfilter could support many such divisions.

Services come in two types: edited and unedited (by analogy with moderated
and unmoderated mailing lists).  Each service has an owner, and some
services have editors.  The owner and editor need not be the same person.  
Each will have a password.

Each webfilter service on a given site has five modes: Submit, Edit,
Revise, Configure, and View.  I'll explain each mode in turn, stopping to
explain the reasons for each design choice.

(1) Submit mode

The owner creates the service using the Configure mode and then advertises
a Web page where people can submit URL's to it.  No login is required.  
The Submit page is very simple.  It has three boxes, a menu, and a button:

 * URL - The box should be large enough to handle the cumbersome,
   multiline URL's generated by some online publications, such as
   the Globe and Mail.  Required.

 * Title - We should encourage people to extract the real title of
   the material on the page and insert it in this box.  The title
   of the Web page itself may not be useful.  Optional.

 * Commentary - Space for several lines of commentary.  I typically
   provide one line of commentary at most, but many people prefer
   to include several lines, and many readers prefer more clues about
   whether they should click on a link.  Optional.

 * Category - Large lists of URL's are overwhelming unless they are
   broken down into categories.  Imposing the categories takes work,
   and we should shift most of this work onto the submitting users.
   The list owner should establish the categories, and should have
   an interface for editing them.  The categories should then appear
   to submitters as a (potentially two-level) pop-up menu.  This
   is optional, simply because the default category will be "other".
   Still, submitters should be politely encouraged to provide a more
   specific category.

 * A button called "submit".
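
To make the form concrete, here is a minimal sketch in Python of the record
the Submit page would produce. The names (`Submission`, `normalize_url`,
`validate`) and the rule that only http and https URL's are accepted are
illustrative assumptions, not part of the design above.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

DEFAULT_CATEGORY = "other"  # the default named in the design


@dataclass
class Submission:
    """One entry from the Submit page (hypothetical field names)."""
    url: str                           # required
    title: str = ""                    # optional
    commentary: str = ""               # optional, may span several lines
    category: str = DEFAULT_CATEGORY   # optional, falls back to "other"


def normalize_url(raw):
    """Join a URL that was pasted across several lines into one string."""
    return "".join(part.strip() for part in raw.splitlines())


def validate(sub, categories):
    """Return a list of problems; an empty list means the form is acceptable."""
    problems = []
    parts = urlsplit(normalize_url(sub.url))
    # Assumption: only http/https URL's with a host name are accepted.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("URL is required and must start with http:// or https://")
    if sub.category != DEFAULT_CATEGORY and sub.category not in categories:
        problems.append("unknown category: " + sub.category)
    return problems
```

A submission with problems would be sent back to the Submit page rather than
on to the confirmation page.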

The "submit" button should rapidly bring up a confirmation page. The
confirmation page will include the submitted page (i.e., the page whose
URL has been entered) in a frame, and above the frame the URL, title, and
commentary should be neatly presented.  If the same URL has already been
published in the same service, then an appropriate notice should appear,
together with a link to the archive entry for that issue.  (Users should
not be prevented from resubmitting URL's that have already been published
on the service, but the editor should have a configuration option to
automatically throw such submissions in the trash.)  The confirmation page
should have two buttons: "confirm" and "edit".  The "confirm" button sends
the page to the input queue, says "thank you", and gives the user a blank
Submit page.  The "edit" button returns to the Submit page with the user's
URL, title, and comment.

We need a confirmation page because people often accidentally submit URL's
that don't work.  The counterargument is that the confirmation protocol
imposes overhead on the majority of users that outweighs the hassle of
badly formed submissions.

When a page is sent to the input queue for a service, the webfilter should
check for duplications.  If several people have submitted the same URL
with different commentaries then the webfilter needs to do something
reasonable, which I'll discuss under Edit mode.
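
One possible reading of "check for duplications" is to group queued
submissions by URL, so that the editor sees each page once. The sketch below
assumes Python, an exact string match on the URL, and a hypothetical queue
layout (dicts with "url", "title", and "commentary" keys).

```python
def group_duplicates(queue):
    """Group queued submissions by URL, preserving first-seen order.

    Each result carries every distinct title and commentary that was
    submitted for that URL. URL's are compared as exact strings.
    """
    grouped = {}  # dicts keep insertion order in Python 3.7+
    for item in queue:
        entry = grouped.setdefault(
            item["url"], {"url": item["url"], "titles": [], "commentaries": []}
        )
        for key, field in (("titles", "title"), ("commentaries", "commentary")):
            text = item.get(field, "").strip()
            if text and text not in entry[key]:
                entry[key].append(text)
    return list(grouped.values())
```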

(2) Edit mode

Webfilter services, as I say, can be either edited or unedited. Once it
has been set up, an unedited service is entirely automated. An edited
service has an editor, who is a user with a login name and a password.  
(In principle it could have several editors, but I will keep it simple.)  
Each service has a separate URL for the editor; this URL is presumably not
advertised, though no harm would result if it were.  (This is better than
having a single login page for everyone, which makes the editor enter the
name of the service every time and clutters the interface for non-editors.  
Both types of users need to be able to bookmark the page for their
respective mode for a given service.)

Having logged in, the editor is offered a link to Configure mode,
explained in a moment.  But in most cases the goal is editing, and it is
crucial for the Edit interface to be efficient.  The editor's job is to
rake through the URL's that have been submitted by subscribers to the
service.  The editor should presumably be shown a list of these URL's,
perhaps just the titles with hyperlinks, together with statistics about
how many URL's are in the queue, when the last batch of URL's was
released, the total number of URL's published to date, and so on.  (The
latter item is purely for the editor's curiosity.)

The editor's goal is to assemble an "issue" of the webfilter service. Each
issue consists of a title (i.e., the Subject: line of an email), a
prefatory text, lists of URL's under successive categories with titles and
commentaries, and perhaps a concluding text.  The editor can work on an
issue incrementally, and need not publish it until it is ready.  So the
service will always have a partially assembled issue stored in its
database.  The editor can only work on a single issue of a given service
at a time.  If the editor wants to fork off several issues, that probably
means it's time to break the service into several services with distinct
identities.

The crucial part comes when the editor settles down to filtering the
submitted pages.  Having clicked a button in the Edit mode called
"filter", the editor should be presented with a series of framed pages.  
One frame (at the bottom) will have the submitted page, and the other (at
the top) will have the URL, title, and commentary (all in their own boxes)
and the categorization (with the same pop-up menu that the submitters
see).  Four buttons are also provided on the top frame: "accept" (include
this page in the issue), "reject" (throw this page in the trash),
"postpone" (hold this page back for potential inclusion later on, either
in this issue or the next), and "stop" (make no decision on this page, and
instead return to the main Edit mode page).
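
The four buttons amount to a small loop over the input queue. The following
Python sketch is one way to express it; `filter_queue` and the `decide`
callback (which stands in for the editor pressing a button) are hypothetical
names, and keeping postponed pages in the queue is an assumption.

```python
DECISIONS = ("accept", "reject", "postpone", "stop")


def filter_queue(queue, decide):
    """Apply the editor's decision to each queued page in turn.

    decide(page) must return one of DECISIONS.
    Returns (accepted, trash, remaining). `remaining` holds the postponed
    pages followed by every page not yet decided when "stop" was pressed.
    """
    accepted, trash, postponed = [], [], []
    for position, page in enumerate(queue):
        choice = decide(page)
        if choice not in DECISIONS:
            raise ValueError("unknown decision: " + repr(choice))
        if choice == "stop":
            # No decision on this page: it and everything after it stay queued.
            return accepted, trash, postponed + list(queue[position:])
        {"accept": accepted, "reject": trash, "postpone": postponed}[choice].append(page)
    return accepted, trash, postponed
```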

The general idea is that the editor can change any of the entries that the
submitting user has made.  The editor can change the URL (for example,
removing the junk after the "?"), the title (for example, substituting a
paraphrase or a descriptive rant for a title that may not be
self-explanatory), the commentary (for example, deleting the commentary
altogether or editing it down to something simpler), and the
categorization (for example, supplying a category when the submitting user
has kept the default).  A particular problem arises, as I mentioned, when
several users have submitted the same URL with different commentaries (or
different titles, for that matter).  I don't know what the right answer is
here.  Perhaps the editor should simply get all of the commentaries (or
titles) in a single box, and should be able to delete or edit the whole lot
of them at will.
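
Two of these editing steps are mechanical enough to sketch in Python. The
helper names (`strip_query`, `merge_for_editing`) are hypothetical.
`strip_query` should be a one-click aid rather than an automatic rule, since
on some sites the part after the "?" is what identifies the article.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Drop everything from the "?" onward (query string and fragment)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def merge_for_editing(texts):
    """Join several commentaries (or titles) into one block for a single box."""
    return "\n\n".join(t.strip() for t in texts if t.strip())
```

The editor would then cut the merged block down by hand.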

Once the editor hits an "accept", "reject", or "postpone" button, it is
crucial that a new framed page appear as quickly as possible.  It should
be possible for an editor to crank through dozens of submitted pages every
day, making rapid decisions on each one.  Once the queue has been
exhausted or the editor has hit the "stop" button, the Edit mode page
should come back.  In addition to the features that I've mentioned, that
page should also have tools for publishing an issue consisting of the
URL's that have already been accepted.  I would suggest that the editor be
able to supply (in the Configure mode) a "boilerplate" text that goes at
the top of each issue, and then before publishing should have a chance to
edit that boilerplate text, for example by adding extra comments.  
(Perhaps the interface for editing the boilerplate text should go in the
Revise mode.)

The Edit mode page should have, inter alia, a "revise" button and a
"publish" button.  I'll explain the Revise mode in a moment.

When an issue is published, several things should happen.  An email
version of the issue should be sent to everyone who has subscribed to the
service, the URL's in the issue should be included in one giant historical
file for the service (for quick reference as to what has already been
published), and the URL's should also be entered into a database, indexed
by the category that they have been published under. This database is what
the View mode looks at.  Finally, the editor should be returned to the
Edit page, which should clearly reflect the successful publishing of the
new issue, along with whatever URL's still remain in the queue.
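
The bookkeeping at publication time could look roughly like the Python
sketch below. The data layout (an `issue` dict, a `history` list, an `index`
dict keyed by category) is assumed for illustration, and actually sending
the e-mail is left out; the function only returns the plain-text body.

```python
def publish(issue, history, index):
    """Record an issue's links and return the plain-text body for e-mail.

    issue:   {"title": str, "preface": str, "entries": [entry, ...]}, where
             each entry has "url", "title", "commentary", and "category".
    history: list of every URL published so far (the historical file).
    index:   dict mapping category -> list of entries (read by View mode).
    """
    # Gather entries under their categories, in order of first appearance.
    by_category = {}
    for entry in issue["entries"]:
        by_category.setdefault(entry["category"], []).append(entry)

    lines = [issue["title"], "", issue["preface"], ""]
    for category, entries in by_category.items():
        lines += ["[" + category + "]", ""]
        for entry in entries:
            lines.append(entry["title"])
            if entry["commentary"]:
                lines.append(entry["commentary"])
            lines += [entry["url"], ""]
            history.append(entry["url"])
            index.setdefault(category, []).append(entry)
    return "\n".join(lines)
```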

(3) Revise mode

When "revise" is pressed in Edit mode, the editor is taken to a page that
resembles in spirit a Web browser's bookmark editor.  This is probably the
most difficult mode to program.  A mock-up of the issue should appear, and
the editor should have point-and-click commands to rearrange the links,
delete them, edit the titles and commentaries again, and so on.  The
mock-up should gather the URL's under each category in the order they were
submitted, and the categories themselves should appear in the same order
as they do in the pop-up menu.  The order of the URL's within each
category of the issue is quite important, and the editor's tools for
moving URL's around within a category (or, I suppose, between categories)
should afford rapid wholesale rearrangement.  It is possible to make this
interface infinitely complex, for example by providing features to break
categories down into new subcategories or whatever, but this sort of thing
is not crucial.  In running webfilter services by hand, I have sometimes
used standardized categories and sometimes allowed categories to emerge
spontaneously within each issue.  But it is not crucial to be able to
improvise new categories, so long as it is easy to edit the category menu
in the Configure mode.
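
The default ordering of the mock-up, and the basic "move" operation behind
the point-and-click commands, can be sketched in Python as follows. The
function names and the "submitted" sequence number on each entry are
assumptions made for the example.

```python
def mock_up(entries, category_menu):
    """Order accepted entries for the mock-up.

    Categories follow the order of the pop-up menu; within a category,
    entries follow the order in which they were submitted. Entries whose
    category is missing from the menu sort last.
    """
    rank = {name: i for i, name in enumerate(category_menu)}
    return sorted(
        entries,
        key=lambda e: (rank.get(e["category"], len(rank)), e["submitted"]),
    )


def move(entries, old_position, new_position):
    """Move one entry to a new position in the list, in place."""
    entries.insert(new_position, entries.pop(old_position))
```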

Incremental changes in the Revise page should be permanently stored in the
mocked-up issue.  The Revise page should have a "return" button that
returns to the Edit mode page without publishing the issue, as well as a
"publish" button.

(4) Configure mode

Along the way, I have listed some of the configuration options that the
editor should have.  It is easy to multiply such options, and you can
probably imagine them as well as I can.  It would be nice for each service
to have its own visual identity, selected from some options. The Configure
mode should presumably be used when a webfilter service is first
established, but it should hopefully not be needed very often afterward.  
I would guess that the most commonly used Configure function will be
adding new categories to the category menu.

(5) View mode

The owner also publicizes a URL for the service's View mode.  (For the
sake of cleanliness this should be a different page from the service's
Submit mode, though obviously the two pages should be linked to each
other.)  If you wanted to get fancy then you could offer users the option
of logging in and configuring the way they view a service. But for most
purposes a simple interface should suffice: links to the recent issues in
reverse chronological order.  It would also be nice to offer links to
reverse chronological lists of the links under particular categories: the
user would select these lists using the same pop-up menu that they use to
submit links.  Each list within a given category would, of course, consist
of sequential entries: title, commentary, and URL.  The URL should be
hyperlinked to the page it names.  It would be nice if the server could
periodically check all of the pages to make sure they still exist,
labeling links that have gone bad so as to save users the trouble of
trying to follow them.

The View mode page should also have a simple interface for subscribing to
the service by e-mail.  Just type in your address and hit the "subscribe"
button.  The server should obtain a confirmation by return e-mail before
adding the address to the list.
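
The confirmation step can be sketched in Python with a random token that is
mailed to the address and must be returned before the address is added. The
class and method names are hypothetical, and sending the message itself is
omitted.

```python
import secrets


class SubscriberList:
    """Holds confirmed subscribers and addresses awaiting confirmation."""

    def __init__(self):
        self.pending = {}       # token -> address awaiting confirmation
        self.confirmed = set()  # addresses that have confirmed

    def request(self, address):
        """Register a request; the returned token would be e-mailed out."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = address
        return token

    def confirm(self, token):
        """Add the address once its token comes back; False if unknown."""
        address = self.pending.pop(token, None)
        if address is None:
            return False
        self.confirmed.add(address)
        return True
```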

That's it.

----- Backwarded





#  distributed via <nettime>: no commercial use without permission
#  <nettime> is a moderated mailing list for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  more info: majordomo@bbs.thing.net and "info nettime-l" in the msg body
#  archive: http://www.nettime.org contact: nettime@bbs.thing.net