Re: [ba-ohs-talk] backlink database data
Hey Eugene, (01)
What about a filter to exclude what is below the sig line? That data should
usually be reasonably redundant and unrelated to the body text. (02)
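Here is a minimal sketch of such a filter in Python. It assumes the sender uses the conventional "-- " sig delimiter on a line of its own and that quoted lines start with ">". The function names and the URL pattern are made up for illustration and are not taken from the actual archiver code.

```python
import re

# Simplified URL pattern (an assumption; the archiver's real rule is not known).
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")


def strip_signature(body: str) -> str:
    """Return the body without anything at or below the last sig delimiter."""
    lines = body.splitlines()
    # The conventional delimiter is "-- "; some mailers drop the trailing
    # space, so compare after stripping whitespace on the right.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].rstrip() == "--":
            return "\n".join(lines[:i])
    return "\n".join(lines)


def extract_body_links(body: str) -> list[str]:
    """Extract URLs from the unquoted, non-signature part of a message."""
    kept = [
        line
        for line in strip_signature(body).splitlines()
        if not line.lstrip().startswith(">")  # skip quoted text
    ]
    urls = URL_PATTERN.findall("\n".join(kept))
    # Drop sentence punctuation that the pattern may have swallowed.
    return [url.rstrip(".,;:!?") for url in urls]


message = (
    "See http://www.bootstrap.org/lists/backlinks.txt for the data.\n"
    "> quoted link: http://example.com/old\n"
    "-- \n"
    "Eugene  http://www.eekim.com/\n"
)
print(extract_body_links(message))
```

Only the link in the unquoted body text survives. A sig block that does not use the delimiter would still slip through, so this is a heuristic, not a guarantee.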
Thoughts? (03)
---Sheldon (04)
Eugene Eric Kim wrote: (05)
> I've added a new feature to the list archiving code that the Bootstrap
> Alliance mailing lists (ba-unrev-talk, ba-ohs-talk) use: backlink
> extraction. Whenever an e-mail gets converted into HTML, the archiver
> also extracts URLs from the e-mail body and appends them to a text file.
>
> You can see live results of the links from this list and from
> ba-unrev-talk at:
>
> http://www.bootstrap.org/lists/backlinks.txt
>
> This is an extremely crude, early-stage experiment. The hypothesis is
> that this backlink data, combined with a good front-end, can serve as a
> useful and automatic way of integrating data in a repository.
>
> Restated, an e-mail with a link is an annotation. For example, this very
> e-mail could be considered an annotation to the document located in the
> URL above. In fact, you should find the above URL in the file.
>
> Unfortunately, this annotation is usually not visible when viewing the
> URL, because the Web has no notion of back-links. However, you can create
> this notion by extracting the links from local documents on a Web server
> and recording those links in a database. (This is essentially what Google
> does.) Extracting back-links from e-mail archives has the added bonus
> that e-mail is a static document, meaning that at least one end of the
> link (the e-mail end) will rarely break. I say rarely, because you could
> change the location of the archives on the web site, or delete them
> altogether.
>
> What does this mean for all of you? Well, I'm not a front-end kind of
> guy, but I know there are people on this list who are. So, this is an
> open challenge to create useful front-ends to this data.
>
> One early observation: There is a lot of "useless" data in the file. For
> instance, in my .sig below, I have a URL to my home page. So every e-mail
> I send to this list creates a back-link to my home page, even though my
> home page isn't really relevant to the content of these e-mails. The same
> goes for quoted text -- you have a lot of redundant text in e-mail
> threads, and hence, a lot of redundant links in the back-link database.
>
> Here is my roadmap for further developing this feature:
>
> 1. People on ba-ohs-talk build useful front-ends to this data. In the
> process of doing this, people make useful suggestions as to what other
> metadata need be stored in the file.
>
> 2. Replace the text file with a real backlink database. Hopefully, this
> backlink database can be used by other projects as well, including a2h
> (my Augment-to-(X)(HT)ML translator) and the Hyperscope.
>
> 3. Eventually bind the backlink database to a peer-to-peer infrastructure,
> so that multiple databases can be installed all over the Net, with all of
> them sharing data.
>
> -Eugene
>
> --
> +=== Eugene Eric Kim ===== eekim@eekim.com ===== http://www.eekim.com/ ===+
> | "Writer's block is a fancy term made up by whiners so they |
> +===== can have an excuse to drink alcohol." --Steve Martin ===========+ (06)
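
The backlink idea in the quoted message amounts to an inverted index: for each target URL, remember which archived e-mails link to it. The sketch below shows one possible lookup structure for a front-end. The (source, target) record shape and the message file names are assumptions for illustration, since the actual format of backlinks.txt is not described here.

```python
from collections import defaultdict


def build_backlink_index(links):
    """Map each target URL to the set of sources that link to it.

    `links` is an iterable of (source, target) pairs; this record shape
    is hypothetical, not the real backlinks.txt format.
    """
    index = defaultdict(set)
    for source, target in links:
        index[target].add(source)  # a set ignores repeated links
    return index


def backlinks_for(index, target):
    """Return the sources linking to `target`, sorted for stable output."""
    return sorted(index.get(target, ()))


# Hypothetical archive entries.
links = [
    ("msg00001.html", "http://www.bootstrap.org/lists/backlinks.txt"),
    ("msg00002.html", "http://www.bootstrap.org/lists/backlinks.txt"),
    ("msg00002.html", "http://www.eekim.com/"),
    ("msg00002.html", "http://www.eekim.com/"),
]
index = build_backlink_index(links)
print(backlinks_for(index, "http://www.bootstrap.org/lists/backlinks.txt"))
```

Storing sources in a set keeps repeated links from the same message, such as a sig URL, from showing up more than once per message.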