[ba-unrev-talk] Licensing of the unrevii email archives (was re: Progress on...)
cdent@burningchrome.com wrote:
> Kathryn La Barre and I have been working with the archive of the
> unrev-II list to experiment with computational methods for
> determining "aboutness" of the subjects, messages and threads. This
> "aboutness" could then be used in an iterative process to
> generate facets for a faceted access structure to the archive.
> [Snip]
> We are not very far along, but have laid the groundwork for some
> interesting research. What we have done thus far is gathered at
> the following URL:
>
> http://ella.slis.indiana.edu./~klabarre/unrev_firstpage.html (01)
Chris- (02)
This is fantastic and brave work! I hope you continue to elaborate this
and share the results under some specific free or open source license.
http://www.gnu.org/philosophy/license-list.html
http://www.opensource.org/licenses/
(I didn't see a license in parser.pl by the way -- just your copyright.) (03)
=== an issue of email archive licensing === (04)
On my own machine, I've been using my own archive of the Bootstrap
mailing list (from Netscape as someone on the list) as one of several
large email archives for playing around privately with various archiving
approaches using some software (the Pointrel Data Repository System) I
have been independently working on.
http://www.kurtz-fernhout.com/pointrel/
I also use archives related to Squeak and Python for example, plus my
own personal mail. I find mailing lists served by mailman (or the
earlier pipermail behind it) the easiest to work with, as they provide
downloadable archives. An example is here:
http://lists.squeakfoundation.org/pipermail/squeakfoundation/ (05)
A major issue that has deterred me from publishing any work involving
the Bootstrap mailing list data has been a lack of clear license related
to using the email archive. While I myself applaud what you are doing,
it would be nice to know with as much assurance as possible that releasing
such work publicly was on solid legal grounds (if you have not obtained
such assurances already from either Stanford or the Bootstrap
Institute). (06)
Rather than contact all the mailing list participants (a legally viable
but likely impractical approach), I believe the Bootstrap Institute
could provide this assurance based on the authority granted them from
"permission to use" (one good thing about it -- although the liability
indemnification provision still hinders my own participation). Someone
in a legal position of authority there (like Doug) could just
submit to the email list a clear statement of their permission for
republication of the archive in any fashion or under a specific license
like the GNU Free Documentation License
http://www.gnu.org/copyleft/fdl.html
or others: (07)
http://www.gnu.org/philosophy/license-list.html#FreeDocumentationLicenses
It would be nice to see that statement in writing as I think supporting
efforts such as yours is very much in tune with what the Bootstrap
Institute is trying to achieve, and the Bootstrap mailing lists have so
much interesting content on them they are an excellent testing ground
for any such email handling techniques in a self-reflective sort of way.
I myself would greatly value a clear license on the use of the Bootstrap
mail archives and then could consider putting up my own experiments
(assuming the "permission to use" liability indemnification issues were
resolved). (08)
As an aside (and true to form if you review my postings to the list) I
myself think that licensing issues are one of the biggest things
preventing a true OHS (Open Hyperdocument System) from being developed
as a comprehensive digital library to help resolve world problems such
as energy difficulties or world hunger issues. We had the technology two
decades ago to put all the world's textual materials on line but it did
not happen. The best we got was Project Gutenberg
http://promo.net/pg/
or the Humanity Libraries Project
http://www.humaninfo.org/
which, while both fabulous efforts as labors of love (and deserving of
much more support), do not come close to containing most of the world's
textual materials (just those out of copyright or to which a few
copyright holders were willing to grant an additional license). (09)
The US Library of Congress has some related efforts
http://www.wired.com/news/culture/0,1284,41166,00.html
but by perhaps unintentionally using (to coin a phrase)
a "license denial through chaffing"(*) approach (consisting of mixing
public domain works and licensed works and putting the liability on the
user to determine the licensing of works) they prevent in many cases any
significant use of their accomplishments. For reference, see "the LOC's
American Memory Project"
http://memory.loc.gov/ammem/amhome.html
and the "license denial through chaffing" licensing:
http://memory.loc.gov/ammem/copyrit2.html
> The Library is offering broad public access to American Memory
> collections as a contribution to education and scholarship.
> Some materials in these collections may be protected by the U.S.
> Copyright Law (Title 17, U.S.C.) and/or by the copyright or
> neighboring-rights laws of other nations. More information about
> U.S. Copyright is provided by the Copyright Office. Additionally,
> the reproduction of some materials may be restricted by terms of
> Library of Congress gift or purchase agreements, donor restrictions,
> privacy and publicity rights, licensing and trademarks.
> Transmission or reproduction of protected items beyond that allowed
> by fair use requires the written permission of the copyright owners.
> The nature of historical archival collections means that copyright or
> other information about restrictions may be difficult or even
> impossible to determine. Whenever possible, the Library provides
> information about copyright owners and other restrictions in the
> catalog records, finding aids, special-program illustration captions,
> and other texts that accompany collections. The Library provides
> such information as a service to aid patrons in determining the
> appropriate use of an item, but that determination ultimately
> rests with the patron. (010)
(*)The "Chaffing and winnowing" technique was invented by Ronald L.
Rivest
http://theory.lcs.mit.edu/~rivest/chaffing.txt
http://www.sciam.com/1998/0698issue/0698techbus4.html
as a way to get confidentiality without encryption, whereby valuable data
is obscured in chaff data so any eavesdropper without the secret
authentication key doesn't know which data is the real message. I use the term "license
denial through chaffing" here to mean combined works like the LOC
"American Memory Project" where some is public domain and some is
licensed under various arbitrary licenses and you can't easily tell
which is which, rendering the whole work as well as any part useless as
far as prudently making derived works or even just redistributing the
whole without changes. Another example of chaffing at work to provide
security is the Windows System directory where one doesn't know what
DLLs belong to what application -- preventing easy copying of applications
to another computer (as opposed to the Macintosh OS in most cases). (011)
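As a rough illustration of the chaffing and winnowing idea described above, here is a minimal Python sketch. It assumes the sender and receiver share a secret authentication key, and that "wheat" packets carry a valid HMAC while chaff packets carry a random one. The function names (make_tag, add_chaff, winnow) and the packet layout are hypothetical, chosen only for this example; Rivest's paper works with much smaller packets (down to single bits), so this is not a faithful or secure implementation, only a picture of why the receiver can sort wheat from chaff and an eavesdropper cannot.

```python
import hashlib
import hmac
import os
import random


def make_tag(key: bytes, serial: int, payload: bytes) -> bytes:
    """Authenticate one packet: HMAC-SHA256 over serial number plus payload."""
    message = serial.to_bytes(4, "big") + payload
    return hmac.new(key, message, hashlib.sha256).digest()


def add_chaff(key: bytes, payloads, chaff_per_packet: int = 2):
    """Return shuffled (serial, payload, tag) packets: real ones plus chaff.

    Chaff packets reuse a real serial number but carry random bytes and a
    random tag, so only a holder of the key can tell them apart.
    """
    packets = []
    for serial, payload in enumerate(payloads):
        packets.append((serial, payload, make_tag(key, serial, payload)))
        for _ in range(chaff_per_packet):
            packets.append((serial, os.urandom(len(payload)), os.urandom(32)))
    random.shuffle(packets)
    return packets


def winnow(key: bytes, packets):
    """Keep only packets whose tag verifies, then restore serial order."""
    wheat = [
        (serial, payload)
        for serial, payload, tag in packets
        if hmac.compare_digest(tag, make_tag(key, serial, payload))
    ]
    return [payload for _, payload in sorted(wheat)]
```

The analogy to licensing is that a mixed collection without per-item license labels leaves every reader in the eavesdropper's position: there is no key with which to winnow the usable items from the rest.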
Here is a post I made on a related topic last year on gnu.misc.discuss
called: "License management tools: good, bad, or ugly?" on the issue of
using DRM systems to ensure freedoms to use, since effectively content
not accompanied by clear licenses in prudent practice cannot be used to
make derived works, rendering all content unusable by default down the
road unless one very carefully manages licenses by hand (since
content+permission do not travel together as an atomic unit). (012)
http://groups.google.com/groups?hl=en&selm=3AF45969.A5ED86A8%40kurtz-fernhout.com (013)
The complete thread is here: (014)
http://groups.google.com/groups?hl=en&th=bc41381803b7e0c0&rnum=1 (015)
Ideally content+"permission-to-use" would always travel together (given
the requirements of today's legal system) as opposed to being handled
manually in various files as is typical today. Even when they do sort of
travel together (as when a license is in a zip file) you often can't
preview the license before accepting the entire package -- this always
annoys me when there is no way on a web site to determine the license of
a package without downloading it first -- and if I don't want to accept
the license, I'd usually rather not see or receive the material at all
(especially if it is source code) as it creates a potential liability.
Ideally, I'd like to set up automated filters on my workstation so my
computer can automatically reject content however it is sent (email,
HTML, FTP) that does not conform to my notion of acceptable licenses for
my current purposes. (And of course, others would be free to set their
own filters according to their own preferences.) The intent here is to
use digital rights management tools to affirm freedom (by rejecting
unfree content) as opposed to denying freedom (by allowing my computer to
host RIAA robot guards to police my behavior and reduce my privacy after
I accept unfree content). (016)
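The two ideas in the paragraph above (content and permission traveling together as one unit, and a workstation filter that rejects anything outside a personal list of acceptable licenses) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the LicensedContent record, the license identifier strings, and the ACCEPTABLE set are hypothetical and do not describe any existing tool or standard.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class LicensedContent:
    """Content and its permission bundled as one immutable unit."""
    body: str
    license_id: Optional[str]  # None means no license was stated


# Hypothetical personal allow-list; each user would choose their own.
ACCEPTABLE = frozenset({"GPL-2.0", "GFDL-1.1", "public-domain"})


def is_acceptable(item: LicensedContent, acceptable=ACCEPTABLE) -> bool:
    """Reject by default: unlabeled content is treated as unusable."""
    return item.license_id is not None and item.license_id in acceptable


def filter_incoming(
    items: Iterable[LicensedContent], acceptable=ACCEPTABLE
) -> Tuple[List[LicensedContent], List[LicensedContent]]:
    """Split incoming items into (accepted, rejected) by license alone."""
    accepted, rejected = [], []
    for item in items:
        (accepted if is_acceptable(item, acceptable) else rejected).append(item)
    return accepted, rejected
```

Note that the filter decides from the license field alone and never needs to look at the body, which matches the wish to be able to refuse material without first seeing or receiving it.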
To be clear, I am not saying you could not claim a "fair use" defense
for what you are doing with the Bootstrap archive. I am not in any way
suggesting right now you stop or even slow your excellent work. Given
the public nature of the list, one could hope that if you were somehow
prosecuted for civil or criminal penalties (under the DMCA some copying
== felony), you might convince a judge or jury that what you did seemed
fair and reasonable (I am not a lawyer though). I am more just saying
that you (or anyone else) would be on safer grounds when making such
modified archives available by relying on explicit permission obtained
through a clear license. For me, my own republishing of the entire list
archives would be currently well outside my own legal liability comfort
zone when considering risk vs. reward (but I am very conservative on
this). (017)
I've been trying to get some prototype code together to show others what
I mean, but so far haven't had time to put something
together. I guess Lawrence Lessig http://lessig.org/ is moving in this
direction as well with his Creative Commons.
http://www.wired.com/wired/archive/9.12/lessig_pr.html (018)
For whatever ill words one can say about the GPL license (e.g. coming
right now from Bill Gates),
http://slashdot.org/article.pl?sid=02/04/19/2256208&mode=thread&tid=109&threshold=2
the body of published material now licensed under the GPL plus future
derived works presents much less of this "license denial through
chaffing" problem as long as derived works continue to (as required)
reference the GPL. (019)
To conclude, whatever fancy OHS tools are created, they will have
difficulty being used in practice unless they address "license denial
through chaffing". (020)
-Paul Fernhout
Kurtz-Fernhout Software
=========================================================
Developers of custom software and educational simulations
Creators of the Garden with Insight(TM) garden simulator
http://www.kurtz-fernhout.com (021)