[ba-ohs-talk] Fwd: [topicmaps-comment] on a new small stratified ontology for cyber war
This post is too rich in conceptual issues to ignore.
Jack (01)
>From: psp <beadmaster@ontologystream.com>
><snippage/> (02)
><header> Again, I apologize for creating this public discussion. However it
>seems important at least to me. And I do remind everyone that the time
>required for you to delete the message is not so great.</header>
>
>***
>
>Richard Ballard's contribution here is important and relevant to the issue
>of producing ontologies for arbitrary domains. He and I have talked about
>what it would take to code a software system and demonstrate a methodology
>that would produce a stratified ontology supporting sense making about the
>events that occur in hacking activities and cyber war. Perhaps this is a
>less than 5M project, with operational deployment (independent of all other
>systems) within nine months, and refinements over a three year period of
>time.
>
>Len Bullard is one of the developers of HyTime, a notational system for
>modeling the production of music, and which has been adopted into many small
>ontologies used in agile transformation of information. His comments are
>welcome, and are thoughtful of the issues posed.
>
>Frank Sowa's recent work on these related subjects is at: (03)
<jp> I think he meant John Sowa</jp> (04)
>http://www.jfsowa.com/pubs/signproc.htm
>
>
>The working notes that I developed, due to discussions with Dennis Wisnosky,
>on this stratified sense making system are in a four panel PowerPoint at:
>
>http://www.ontologystream.com/EI/slipstream_files/frame.htm
>
>This architecture is designed to interface with Richard Ballard's Mark 3
>knowledge system.
>
>***
>
>The primary compatibility can be seen in Dick's references to the REF-REF
>matches { machine derivable by co-occurrence and other methods (n-gram,
>tensor, Latent Semantic Indexing, and a few other esoteric evolutionary
>programming processes) } and now the functional load mapping of the "single
>node" formative ontology of the event map (please look at the representation
>of Port 80 - the HTTP port) in the top right corner of the paper:
>
>http://www.ontologystream.com/bSLIP/finalReview.htm
>
>This visualization is original to me.. but related to the work of Soviet era
>cognitive visualization of the theorems of elementary number theory:
>
>http://www.ontologystream.com/IRRTest/Evaluation/ARLReport.htm
>
>The connection to number theory is in elementary number base conversions
>(changing the base from base 10 to base 6 alters the "solvability" of the
>problem of representing 1/3 in a terminating expansion). This is related in
>my unpublished work to the Whorf hypothesis (in non-translatability) and to
>Godel/Cantor theory (in the foundation of finite mathematics).
>
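A minimal sketch of the base-conversion point above, using only Python's standard library. The function name `expand` and the 20-digit cutoff are illustrative assumptions, not anything taken from the SLIP software. It shows that 1/3 has no terminating expansion in base 10 but terminates after one digit in base 6.

```python
from fractions import Fraction


def expand(frac, base, max_digits=20):
    """Return (digits, terminates) for the fractional part of a
    non-negative Fraction written in the given base.

    `terminates` is True only if the expansion ended within max_digits.
    """
    digits = []
    remainder = frac - int(frac)  # keep only the fractional part
    for _ in range(max_digits):
        if remainder == 0:
            return digits, True
        remainder *= base
        digit = int(remainder)    # next digit after the radix point
        digits.append(digit)
        remainder -= digit
    return digits, remainder == 0


one_third = Fraction(1, 3)
base10_digits, base10_ends = expand(one_third, 10)
base6_digits, base6_ends = expand(one_third, 6)
print(base10_digits[:5], base10_ends)
print(base6_digits, base6_ends)
```

Exact rational arithmetic (`Fraction`) is used so that the termination test is not blurred by floating-point rounding.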
>This work on transforming unsolvable problems (Peter Kugler would call this
>a by-pass) leads to very fast scatter gather (clustering) algorithms so that
>what takes 4 hours using FoxPro Rushmore indexing is reduced to 25 seconds.
>So the clustering in the SLIP (stochastic) and eventChemistry work is fast
>enough to interact with a human's attention span during a real-time
>investigation of data invariance. Self organizing feature maps often take a
>day to cook a representation of a text corpus. The new algorithms change
>this investment of time to perhaps several minutes. The theory is very
>simple and the demonstration was already in the December 7th, 2001 SLIP
>Browsers: downloadable with short tutorial at:
>
>http://www.ontologystream.com/SLIP/files/ArbitaryEventLog.htm
>
>
>The SLIP atoms are Peircean nodes (not graphs - but a single node)! Sigh..
>the insight that I seem to have *that no one else has shown to me* is that
>the mental event is a single node "noun subject" with the reference link
>*potential* enumerated. This is not a Bayes representation, because the
>causes of these potential links are not probabilities. The link forms
>(emerges) within a stratified architecture with the decomposition of past
>memory being the substructure and the anticipation due to ecological
>affordance being the ultrastructure (notational engineer Jeff Long's term
>used here slightly differently). A simple algorithmic process (I invented
>in 1997) makes routing and retrieval situated.
>
>http://www.bcngroup.org/area3/pprueitt/kmbook/Appendix.htm
>
>So atoms, required by a formative compound ontology, first are created by a
>stochastic process (see the papers) and then meaning is acquired by human
>introspection (introspection is the "I" word in science, yes?). Peircean
>thirdness is in moving from a level of atom ontologies (firstness) to the
>level of compound, via scope (secondness). The current Topic Maps, even with
>HyTime, are not doing this yet... (at least not that I am aware of.)
>
>The key elements of this architecture are:
>
>Each of four levels of the taxonomy has human terminology evolution
>processes, in conjunction with human communities of practice.
>
>The bottom layer of this layered taxonomy is an open semiotic system
>depending on invariance types (categories) produced from the data
>aggregation of the Internet traffic at selected points within the Internet
>systems.
>
>The second layer is an intrusion event level that is responsive to the
>already deployed infrastructure of Intrusion Detection Systems (IDSs) and to
>the visualization of intrusion event patterns at the third level.
>
>The third level is a knowledge management system having knowledge
>propagation and a knowledge base system developed based on Peircean logics
>(cognitive graphs) that have a formative and thus situational aspect.
>
>The fourth level is a machine representation of the compliance models
>produced by policy makers.
>
>A National Defense System against Cyber War would deploy a structured and
>stratified taxonomy across the governmental Computer Emergency Response Team
>(CERT) centers. This system would be independent of the current systems,
>and would have a knowledge management component for virtual collaboration.
>
>***
>
>Why have I found it necessary to make this conversation public?
>
>1) The theoretical and practical issues relevant to a working dynamic and
>stratified taxonomy of this nature must see the light of day. These issues
>can be partially solved, not solved at all, or solved in ways that burden
>the National response to cyber war.
>
>2) The proper solutions to this problem are useful in eBusiness, decision
>support systems (commercial and military), and in systems for virtual
>education. They CANNOT become part of the confusion which is the classified
>technologies (most of which are simply well known not to work).
>
>3) Private and personal reasons: since November 2001 no one has been
>considerate enough to pay me and my programmer for the work that we continue
>to do, all the while treating the issue as if a field test of **my**
>software, software that was not yet completed, is in progress. It has been
>treated as if the new work, the more complete axioms and theorems
>foundational to a new area of pure mathematics, and guidance from those whom
>I would ask to develop proper outcome metrics are somehow not needed or
>wanted. The business value proposition is in competition with something
>that makes the business unit more money.
>
>***
>
>This MUST be considered a policy issue because it is one more example, where
>there are many many more examples, of how the innovations needed in defining
>knowledge science are restricted by the practices of the business mind in
>the exercise of control over the science mind. This might be ok if the
>Nation and the economy were not traveling at light speed towards a brick wall.
>
>I realize that the general systems dynamic has nothing to do with me, or the
>business unit. It is systemic and ubiquitous. It is a fact of life.
>
>However, the larger moral issue is in regards to why we as a culture have
>given absolute power to those who are practiced in this control. They know
>nothing of the issues that might be solved. They brought us the .com bubble
>because they felt too important to understand that in most cases there was
>no product even considered by these invested companies. Yes? Perhaps there
>is some other reason why the .com bubble occurred? I do not think so.
>
>I again call for a Manhattan Project to establish Knowledge Science, and
>extend the true capabilities of Information Technology.
>
>http://www.bcngroup.org/area3/manhattan/sindex.htm
>
>This project could change the nature of the public discussion about what IT
>is good for, by bringing an understanding of the existing science to bear on
>the (mostly unprovable and often deceptive) theories in artificial
>intelligence and information technology. In response to Microsoft's
>advertisement "Where would you like to go?", I say I want to go somewhere
>where there is a stable operating system that will not change just as soon as
>I get my programs to work.
>
>Paul S. Prueitt
>Chantilly VA
>
>
>
>***
>
>
>
>-----Original Message-----
>From: Richard Ballard [mailto:rlballard@earthlink.net]
>Sent: Saturday, February 02, 2002 1:31 AM
>To: eventChemistry@yahoogroups.com; Topicmaps-Comment; Thomas B. Passin
>Cc: Mark Turner; Douglas Weidner; Tim Barber; Dorothy Denning; Doug
>Dearie; Dr. Robert Brammer; Rita Colwell; James L. Olds;
>Humanmarkup-Comment; Katarina Auer; Paul Zavidniak; William Sander;
>Dennis Wisnosky; Albright; Ivan Prueitt; Pharris(Contr-Ito); George
>Lakoff; Wojciech M. Jaworski
>Subject: [eventChemistry] Reaction to -- multilingual thesaurus -
>language, scope, and topic naming constraint
>
>
>Paul & Others:
>
>This conversation is a wonderfully entangled cameo of semantics, taken as
>the nexus or solution or insolvability of all things conceptual. Every one of
>us becomes tempted at some time of life to untangle this problem or, via
>some simplifying assumption, finesse it as a barrier and move past. Some
>settle in and decide to spend their lives either solving or contributing to
>it from some particular perspective. The pernicious raise the issue just to
>assert that no problem can be solved unless their favorite problem is solved
>first. Delightfully that perniciousness, while present, is not blatant here.
>But still it goes round and round.
>
>At some point, the question has to be called for and some division of the
>house. I usually ask two questions: (1) What do you want language to do for
>you that makes semantics the issue? (2) From what you have learned so far is
>this problem going to be solved in years, decades, centuries, millennia, or
>ever? I would certainly like to hear an optimistic answer, particularly from
>George Lakoff or others who are so heavily invested.
>
>For me some 20 years was devoted to natural language dialog systems,
>sub-language analysis, and related linguistic issues in user interface
>design and computer based instruction and tutoring. When I turned to full
>time knowledge engineering (some 18 years ago), my faith and sympathy for
>language as a system for knowledge representation became a losing struggle.
>I abandoned it completely 10 years ago. I consider that a breakthrough and
>will say more on it at the Knowledge Technology Conference in Seattle March
>11-13.
>
>In knowledge coding we have the problem of identifying "ideas" with some
>code, symbol, or phrase and then integrating the knowledge gathered and
>acquired by modeling from many sources. Each source has its own ontological
>commitment and the problem and goal is to marry these views at points where
>they share a common idea. In formal languages, like computer programming, we
>speak of DEFs and REFs. DEFs are places where the author has carefully
>defined as precisely as possible what a given phrase, symbol, or idea means
>as compared to REFs where some phrase, symbol, abbreviation, or figurative
>pronoun is used in reference to ideas that were never defined. In computers,
>the job of detecting conflicting DEF-DEF assertions and perfecting DEF-REF
>matches and self-consistency is accomplished by compilers, matches across
>sources by linkers. None of these tools tries to make REF-REF matches,
>unless some necessary characteristic matches exactly.
>
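The DEF/REF bookkeeping described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function `link`, the dictionary layout, and the sample sources "a" and "b" are all hypothetical, and a real compiler or linker does far more. The sketch flags conflicting DEF-DEF assertions, resolves DEF-REF matches across sources, and leaves a REF with no DEF unresolved rather than guessing at a REF-REF match.

```python
from collections import defaultdict


def link(sources):
    """sources maps a source name to {"defs": {symbol: meaning},
    "refs": [symbol, ...]}. Returns (conflicts, resolved, unresolved)."""
    defs = defaultdict(dict)  # symbol -> {source name: meaning}
    for name, src in sources.items():
        for symbol, meaning in src["defs"].items():
            defs[symbol][name] = meaning

    # DEF-DEF conflict: one symbol defined with different meanings.
    conflicts = {s: m for s, m in defs.items() if len(set(m.values())) > 1}

    resolved, unresolved = [], []
    for name, src in sources.items():
        for symbol in src["refs"]:
            # DEF-REF match if any source defines the symbol; otherwise
            # the REF stays unresolved (no REF-REF guessing).
            (resolved if symbol in defs else unresolved).append((name, symbol))
    return conflicts, resolved, unresolved


# Hypothetical sample sources, for illustration only.
sources = {
    "a": {"defs": {"port": "network endpoint"}, "refs": ["host"]},
    "b": {"defs": {"port": "harbor"}, "refs": ["port"]},
}
conflicts, resolved, unresolved = link(sources)
print(conflicts, resolved, unresolved)
```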
>In natural language sources, the ratio of DEFs to REFs is very small. (1 in
>10 might be a useful, integrative, "learnable" knowledge source.) Try to
>find definitions in the foregoing conversations. What passes for
>conversation is invariably REF-REF matches. It is hard to believe that
>language evolved under the imperatives of exactly matching ideas and
>meaning; more likely its natural selection criterion was "adequate
>similarity" within the "bonding cultural illusion" of shared feelings,
>interest, and understanding. Language and (unfortunately?) language
>misunderstanding and ambiguity are exactly what cultures and civilizations
>need to sustain unity under the stress of cultural diversity and broad
>differences in education, motives, and real interest.
>
>The "plasticity" of language to change and become whatever it needs to
>become makes the idea of "correct sense matching through language" more
>likely to mean politically correct, culturally correct, religiously correct,
>legally correct, than it is to be logically correct. Whose penalties are
>most severe? Well, who am I talking to and who else is listening?
>
>In large scale knowledge base construction we employ four primary talents
>(acquisition editors, modelers, production editors, and consulting subject
>specialists). Acquisition editors are trained to seek out and recognize the
>highest quality knowledge sources relevant to the target audience's primary
>needs and demands. Modelers sort through these sources, focusing primarily
>on the quality and completeness of their "dominant mediating conceptual
>structures" (taxonomies, compositions, task/subtask hierarchies, flows,
>choice and constraint structures, etc.). Within these contexts concept
>meanings are strongly typed independent of the language used; modelers make
>the first order ontological assignments and direct the word processing "pick
>and shovel" workers who add great productivity and volume to their efforts.
>This is the human equivalent of compilation.
>
>Production editors, assisted by consulting subject specialists, focus on
>source differences in abstraction level and granularity -- the processing of
>proximate matches. This work goes on within narrow subject areas suited to
>sublanguage analysis in limited domains where contextual settings and
>"subject expertise" resolve and validate the matches made. This is the human
>equivalent of linking. All of this work is value added and well worth the
>effort if the sources are suitable and highly structured, which from the
>knowledge management perspective means thick, repetitive, tabular books and
>databases that for some reason cost a lot to produce (because of their
>completeness), and make dull reading. These are the first things a company
>is likely to throw out.
>
>Most knowledge acquisition by modeling efforts become economic today where
>direct labor costs fall within $5K-$10K per source document (excluding
>royalties, licensing, etc.). Within the next 2-5 years this legacy mining
>might be expected to grow very fast, given market awareness and delivery
>tool environments.
>
>The dictionary was invented to stabilize word use, spelling, and meaning
>assignments against constant generational drift. Even when overloading words
>with 10-20 alternate meanings there are not enough to match one word to one
>concept. In the main, we use noun phrases for concept titling and acronyms,
>abbreviations, and pronouns when we get tired of writing these. Our literary
>forms favor constant reference variation to keep from sounding repetitive or
>one dimensional. These forces stressing human attention span and need for
>stimulation tell us that language has more to do than help us compare ideas.
>
>If we look hard at technical book stores today we will see the ontological
>equivalent of the dictionary taking up whole bookshelves. It's the field of
>medical coding. If your doctor orders a 26910 treatment and you're not
>suffering from either a 170.5, 198.5, 730.13, or 991.1, then that could cost
>you serious money, because your insurance company will not pay for it. If
>you want to sell clothes to Nordstrom, then you are going to have to enter
>into their standardized retail buying network and match their coding system
>in all your paper work. If we need an exact concept matching language, we
>will get it and it will not come from the dictionary.
>
>Dick
>
>PS. As is your way, feel free to share this.
>
>-----Original Message-----
>From: psp [mailto:beadmaster@ontologyStream.com]
>Sent: Friday, February 01, 2002 8:32 AM
>To: Topicmaps-Comment; Thomas B. Passin
>Cc: Douglas Weidner; Tim Barber; Dorothy Denning; Doug Dearie; Dr.
>Robert Brammer; Rita Colwell; James L. Olds; eventChemistry;
>Humanmarkup-Comment; Katarina Auer; Paul Zavidniak; William Sander;
>Dennis Wisnosky; Albright; Ivan Prueitt; Pharris(Contr-Ito); George
>Lakoff
>Subject: [eventChemistry] RE: [topicmaps-comment] multilingual thesaurus
>- language, scope, and topic naming constraint
>
>
><header> This is a complex message - perhaps of some theoretical interest
>to the cc list. However, if Points of Contact at DARPA, OSTP and NSF are
>not interested in this discussion; then we request a different point of
>contact. -Paul Prueitt OSI </header>
>
>****
>****
>
>Tom Passin said about the excellent post by Bernard Vatant,
>
>
>"I didn't think of representing that those words themselves stood
>for different concepts. Interesting!"
>
>to the topicmaps-comment forum (at Oasis).
>
>***
>
><Paul Prueitt>
>
>A brief note here regarding the scope of a word due to language setting. I
>think that what I will say here will not be a surprise to linguists.
>
>It is NOT simply a "technical understanding of the language" that provides
>the real scope of a word in a language. Meaning occurs and can only be
>fully understood in the cultural setting and realities of the social system.
>To hold the opposite position (that an Interlingua exists in an absolute
>sense) is speculative, at best. This position is reductionism at core (this
>is my claim), since it claims that all natural language can be reduced to a
>single deep structure. Perhaps Professor Lakoff will make a comment on
>this?
>
>"Contextual is also pragmatic, as the word *lives* in a cultural setting.
>(Fiona Citkin, Head translator of the ARL sponsored conference (1995 - 1999)
>on Soviet Semiotics) private communication.)"
>
>In most cases the (Whorf?) problem is not so bad. However, in many cases
>profound misunderstanding can come because of an assumption that it is a
>technical understanding of a second language that stands in for the cultural
>experience. Yes? Machine translation systems have this problem often.
>Yes?
>
>On the practice of constructing static topic maps? Well **perhaps** the TM
>community sees the real problem that comes from an early binding of scope
>during the production of TMs by one person and the use of the TM by someone
>who has a different point of view.
>
>These TMs are becoming engines that will do things? And thus the issue of
>false Sense Making is vital - since evidence indicates miscommunication
>**between humans** sometimes distorts the meaning in diplomatic channels.
>Tonfoni makes the (private) argument that diplomatic miscommunication was
>responsible for many of the diplomatic errors made before the Gulf War.
>{Certainly, the American Nation is close, in many instances, to false sense
>making with respect to many issues where we are using great force to achieve
>outcomes that are proper, but where... we are not properly understanding the
>**scope**. } This is not a small matter!
>
>*False* sense making (Karl Weick, Sensemaking in Organizations), using off
>the shelf ontology (static TM), is a big problem that is not completely
>solved using HyTime...
>
>http://www.bcngroup.org/area3/pprueitt/private/KM_files/frame.htm
>
>The issue is reflected in the problem with machine based declassification
>and an operational theory of similarity, as I have stated in:
>
>http://www.bcngroup.org/area3/pprueitt/SDIUT/sdlong.htm
>
>This is a long and unpublished paper.
>
>I hope that the TM community will realize that I am NOT criticizing the
>important work that has been done over the past several years using Topic
>Maps. But there continues to be a problem, and Bernard's message states
>this problem *perfectly*. yes?
>
>***
>
>
>I have an approach to mapping the functional load between one word and all
>other words in natural use in a language. This is completely novel and new
>(I think).
>
>It is the eventChemistry as applied to word co-occurrence. I have studied
>the Aesop fable collection in English... but I need some help with issues
>like noun and verb differentiation.. and case grammars. There are a lot of
>similarities to Latent Semantic Indexing.. but eventChemistry has
>visualization and a few other surprises.
>
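As a rough illustration of what "word co-occurrence" yields for a single word, here is a minimal sketch. It is plain sentence-level co-occurrence counting, not the eventChemistry or SLIP algorithm, and the function name `cooccurrence` and the two sample sentences are made up for the example. Each word ends up as one node with a counted set of links to the words it appears with.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence(sentences):
    """Map each word to a Counter of the words sharing a sentence with it."""
    links = defaultdict(Counter)
    for sentence in sentences:
        # Count each distinct pair once per sentence.
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            links[a][b] += 1
            links[b][a] += 1
    return links


# Made-up sample text, for illustration only.
sample = ["the fox saw the grapes", "the fox left"]
links = cooccurrence(sample)
print(dict(links["fox"]))
```

Issues such as noun and verb differentiation or case grammar, raised above, are deliberately outside this sketch; it treats every whitespace-separated token alike.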
>Is there anyone (a linguist) who would like to do this work on the fable
>collection (likely requiring 30 - 40 hours of effort, using the
>eventChemistry software)? What we might go after is a description of the
>functional load of some of the terms as used by Aesop in his fables.
>
>http://www.ontologystream.com/bSLIP/finalReview.htm
>
>So, some of you already see where this is going; the notion is that mapping
>single word usage in natural settings will provide a single atom (node with
>affordance links) --- as in Peirce's Unifying Logic Vision... "concepts are
>like chemical compounds that are composed of atoms".
>
>This single atom is like the event atoms I have developed to study cyber war
>and innovation adoption (both of these are **intrusions** from one level of
>natural activity into another level of natural activity.) Please just look
>at the short paper on this at the above URL.
>
>It would seem that this would make a good publication, and perhaps even
>identify a value proposition?
>
>The mark-up of the context setting is addressed nicely in the work of
>Tonfoni
>
>http://www.bcngroup.org/area3/gtonfoni/EIVD/index.html
>
>
>
>Paul Prueitt
>OntologyStream Inc.
>Chantilly VA
>
>
>I have copied Bernard's message below for two other forums.. as the issue of
>scope is so beautifully expressed:
>
>
>
>****
>
>-----Original Message-----
>From: Bernard Vatant [mailto:bernard.vatant@mondeca.com]
>Sent: Friday, February 01, 2002 4:46 AM
>To: topicmaps-comments
>Cc: stefan.jensen@eea.eu.int
>Subject: Re: [topicmaps-comment] multilingual thesaurus - language,
>scope, and topic naming constraint
>
>
>
>
>Thanks to all who tried to answer, both on this list and through private
>communications.
>
>Now let me expose what I found out yesterday night - just after switching
>off the computer - with that delicious feeling you have when a long searched
>solution suddenly appears obvious and crystal clear, just because you have,
>at last, looked at it the right and simple way, and all the previous attempts
>look awkward and far-fetched.
>
>But, be patient. A bit of history. Last year, I was investigating that
>question with Seruba research team, unfortunately swept from the scene by
>economical constraints. The solution I had suggested at the time was to
>consider terms in different languages as n distinct topics, independent from
>the abstract descriptor, itself considered topic n+1. And then link those
>guys together through associations, asserting something like:
>"This topic is an abstract descriptor, representing an abstract concept,
>independent from any language. Those topics represent the term used in those
>languages to represent this descriptor concept".
>In putting the concept and the terms on different levels of topics, we had a
>technical way to manage synonymy and polysemy. But, like solutions proposed
>by Kal or Tom, that was only a stealth, and I remember one of Seruba's
>linguists, very skeptical about it, keeping saying to me "It works, but it
>does not make sense!"
>
>And he was right! The only sustainable viewpoint is that there is no such
>thing as a *concept independent of its representation by a term in a certain
>language*. Every attachment of a term to a concept is always asserted in the
>scope of a certain language, and every other language conveys a slightly or
>radically different view of the world and organisation of concepts, and
>that's why lingual diversity is so precious, and translation so difficult ...
>
>So we have to go back to basics: one subject = one topic.
>(DAN : økonomi), (DUT : economie), (ENG : economy), (FRE : économie),
>(GER : Wirtschaft), (SPA : economía) convey a priori six different concepts
>and views of the world, that someone familiar with all those languages could
>certainly feel, even if the differences are subtle. Hence they are six
>different subjects, and therefore have to be represented by six different
>topics. They are not six names of the same topic in different scopes, and
>definitely not variants.
>And they are not even representations of a same descriptor in different
>languages. The 7th topic, standing in the middle of nowhere outside of any
>language scope, does not make sense, because it has no meaningful subject.
>Note that if you give a definition of the descriptor, you always give it in
>some default language ...
>
>So what is a descriptor, putting together those six concepts for the purpose
>of cross-language communication and translation?
>What do you do when you gather topics? Obvious - you build an association.
>And what is the scope of that association? The scope of the language
>viewpoint from which you assert this association, that means the default
>language of the thesaurus ...
>This association asserts that those topics can be considered as
>"equivalent", allowing a translation which makes sense, maybe in a certain
>scope. Note that the scope is not on the names, but on the association. And
>that the associations are not necessarily the same if I stand from another
>language viewpoint. So if I edit the thesaurus with a different default
>language, I will certainly have to change the set of associations.
>
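The model Bernard describes (six topics, no language-neutral seventh topic, and one association whose scope is the thesaurus's default language) can be sketched as plain data structures. This is an illustration only, not XTM: the class names `Topic` and `Association`, the helper `translate`, and the choice of ENG as the default language are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    language: str  # each (language, term) pair is its own subject
    name: str


@dataclass(frozen=True)
class Association:
    scope: str          # language viewpoint asserting the equivalence
    members: frozenset  # the topics treated as "equivalent"


topics = [
    Topic("DAN", "økonomi"), Topic("DUT", "economie"),
    Topic("ENG", "economy"), Topic("FRE", "économie"),
    Topic("GER", "Wirtschaft"), Topic("SPA", "economía"),
]

# The scope sits on the association, not on the names.
# ENG as the thesaurus default language is an assumption.
equivalence = Association(scope="ENG", members=frozenset(topics))


def translate(association, name, source, target):
    """Return the target-language term if the association links it to
    the given source-language term, else None."""
    if Topic(source, name) not in association.members:
        return None
    for topic in association.members:
        if topic.language == target:
            return topic.name
    return None


print(translate(equivalence, "economy", "ENG", "GER"))
```

Editing the thesaurus from a different language viewpoint would then mean building a different set of `Association` objects over the same topics, rather than renaming anything.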
>That approach is deeply respecting the diversity of *concepts* conveyed by
>the different languages. All previous approaches are in fact killing the
>linguistic diversity, if you look at them closely, because the default
>language of the descriptor imposes the set of concepts, and the other
>languages are to find willy-nilly a name for it.
>
>And this is really enabled by the topic map representation.
>
>Think about it. I've got to put all that in XTM now.
>
>Regards
>
>Bernard (05)