[ba-ohs-talk] Fwd: [helium] Re: LSA, clustering and spring-graphs
Reference Website: http://helium.knownspace.org (01)
In direct and in indirect ways, several readers of ohs-talk are involved in
the KnownSpace project, a project conceived by Gregory Rawlins
http://www.cs.indiana.edu/~rawlins/ who is, I think, a brilliant, and
productive fellow. Below, I have included his recent (as recent as today)
justification for the project. First, let me give a bit of background from
my perspective. (02)
Why is KnownSpace important? I can think of lots of reasons, but the largest
ones center on these key points:
KnownSpace is about secure transactions
KnownSpace is about lots of simple agents doing things
KnownSpace is about knowledge representation, manipulation, and
presentation (03)
In short, KnownSpace is clearly an engine that would support any proper OHS
project.
Readers of this list who are in direct or indirect ways coupled with
KnownSpace include:
Chris Dent, now very active in Gregory's class where Helium is
being developed
Alex Shapiro, who created TouchGraph, which, if I got it right, is
now part of the KnownSpace project
Jack Park (moi), who built a MySQL back end for the earlier
Hydrogen KnownSpace (which, btw, is not being used in any current
implementation, but which probably will be used in my own implementations)
and provoked a move to SourceForge (04)
There may be others. (05)
KnownSpace is a Java-based, Apache-licensed society of agents, a communications
system for those agents, and a knowledge representation scheme based on
Entities and Attributes, as discussed in the paper "A New Data Model:
Persistent Attribute-Centric Objects" which is linked on Gregory's site. (06)
I can't begin to list all the things that Helium can do because that list
grows far too rapidly for me to keep track. I stopped downloading from the
CVS for a while; just too many new features, fixes, and so forth going on;
this is one of the most productive software class efforts I have
seen. But one capability that I think is wildly important is email
processing, including text document clustering. We all need that, and soon! (07)
Following is Gregory's response to a comment on the email list in which he
articulates a point of view that I think is of value to all the software
projects that are now orbiting around the OHS attractor basin. (08)
Enjoy.
Jack (09)
>now we're getting into the question of what knownspace is for, or the issue
>of ultimate causes. originally i started on knownspace because i used to
>run machine learning classes (where we'd discuss and then implement
>learning algorithms to do various kinds of clustering/data mining/whatever
>using neural networks or genetic algorithms or classifier systems or
>discriminant analysis or whatever). each term i'd take the class from
>grunting savagery and utter ignorance to something approaching cognizance
>of what's out there and with enough tools to make a start on some problem.
>the best students would take that and run with it and produce
>implementations that, some of them, were worthwhile---at least in terms of
>idea application where most of the ideas were generated by other people.
>those implementations were all over the map---c, pascal, c++, visual basic,
>whatever, and on whatever machine they were most comfortable with or in
>whichever environment the thing they were mainly using/extending first
>used (not only solaris, but also linux, and of course windows and macs).
>this would happen each term. and each succeeding term i'd have to start
>over NOT ONLY with a new set of ignorant savages but also with the
>same scattered set of languages and platforms and environments, whatever
>they were most familiar with, whatever was most prevalent in the cs dept at
>the time, whatever was cheapest for home use, whatever. even when someone
>exceptional managed to grub a few steps up the mountainside of mud, next
>term it was as if it never had been (except that i remembered that it once
>existed) and the class as a whole had slid back to the start. term after
>term, year after year. it was mindlessly stupid.
>
>take one particular example right here in bloomington. doug hofstadter's
>students over the years have worked in mostly one environment built, in
>some sense on melanie mitchell's copycat thesis (melanie was one of doug's
>first students). but anyone wishing to capitalize on all that work must
>first get that whole world of computation. all those thoughts each smart
>student thinks are mostly lost to anyone else. david leake's students
>mostly work on case-based reasoning, but each problem was distinct so
>cross-pollination was small. and new students could only start with a set
>of prior ideas, rarely, if ever, with a set of prior implementations.
>
>the only place we build on prior implementations is in the commercial
>world when one company manages to win massive market share for some
>implementation or physical machine, or in the academic marketplace of ideas
>where one free implementation is flexible enough to win wide acceptance
>(unix mainly, now linux probably) so that current workers could build on
>past workers' work. but those implementations are so far away in complexity
>from interesting problems that building on them to get to doing something
>really interesting is just too damned hard for any one person or small
>group.
>
>this is a pattern replicated all over the world. ai, user interfaces,
>networking, user modelling, visualization, human-computer interfaces, over
>and over and on and on. lots of drek being done, but some of it good, but
>none of it replicable (aside from the cleverest ideas, but not the
>cleverest implementations).
>
>i thought about what science would be like if there were no replicability.
>what if each physicist had to start over from the beginning and become
>galileo and newton? what if there were no accepted standard places to
>publish so that every other interested scientist could build the same
>equipment easily to replicate work and maybe get a new idea that could then
>add to what was known, what could be done, what could be thought? that is
>what computer 'science' is now (minus the mathematical foundations, where
>we can exchange and build on ideas easily since they are pure ideas, and
>not implementation-dependent machines).
>
>i thought this was all very very stupid.
>
>what was needed was the first step. unfortunately it's a BIG step. a
>free and open computational environment robust enough and flexible enough,
>and understandable enough that nearly everyone could add to it, modify it,
>share their work over it. instead of working on toy problems because that's
>all you can reach starting from scratch each time, eventually the world as
>a whole could begin to work on something really hard---ideally, a problem
>that's in front of everyone all the time anyway, so i chose information
>management, a problem i've struggled with since the 80s. and still struggle
>with today.
> best,
> gregory. (010)
---------------------------------------------------------------------------
XML Topic Maps: Creating and Using Topic Maps for the Web.
Addison-Wesley, ISBN 0-201-74960-2. (011)
http://www.nexist.org/wiki/User0Blog (012)