Re: [unrev-II] Making HTML into XHTML

From: Peter Jones (ppj@concept67.fsnet.co.uk)
Date: Sat Jul 14 2001 - 05:42:39 PDT


    Not so long ago I ran into the problem of turning the HTML output of MS Word
    into XHTML, and then doing some funky things with the resulting XHTML.
    HTML Tidy was far too zealous for our needs.
    Purely as a prototype I wrote a rough Perl (+ other stuff) hack that can be
    found here:
    http://www.concept67.fsnet.co.uk/xml/

    It works for most cases of Word HTML, but it needs improvements: proper
    Latin-1 entity replacements (a simple matter of changing the parsing DTD
    slightly to reference the entity set) and a much more efficient algorithm (a
    bit of thought, if you can be bothered). As it stands the code is also
    Windows-only (no shebang lines, and it uses InstantSaxon rather than Java
    Saxon for the XSLT).
    The well-formedness fixer also adds an extra </div> at the end to cope with
    the particulars of Word output.
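
    For anyone who wants the Latin-1 entities handled before touching the DTD,
    here is a minimal sketch of a different route: rewriting named entities as
    numeric character references, which any XML parser accepts without an
    entity set. It is in Python rather than the Perl of the scripts above, it is
    not the DTD change described, and the function name is made up for
    illustration.

```python
import re
from html.entities import name2codepoint

# The five entities XML predefines; these must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# A named entity reference such as &eacute; or &nbsp;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def numeric_entities(text):
    """Replace named HTML entities with numeric character references.

    Hypothetical helper: unknown names and the XML-predefined entities
    are passed through unchanged.
    """
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return ENTITY_RE.sub(repl, text)
```

    Running it over the text before parsing means the parser never sees a name
    it has no declaration for.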

    It converts a dense 100-page A4 Word HTML doc in about 15-17 seconds on my
    laptop.

    However, there is a script in there that adds a nesting structure, so that
    each <h1> element and its text encloses the <h2> elements (and their text)
    that follow it, and so on down the levels.
    This is pure Perl and can be used independently of the rest.
    So once you have a well-formed XHTML document, you can add level nesting.
    This comes in handy for all sorts of transforms and queries.
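
    To make the nesting idea concrete, here is a rough sketch of one way to do
    it. It is in Python, not the Perl script mentioned above, and it makes its
    own assumptions: the input is already well-formed, the elements carry no
    XML namespace, the headings are direct children of the element passed in,
    and each level is wrapped in a <div class="levelN"> (the wrapper element
    and class name are invented for illustration).

```python
import re
import xml.etree.ElementTree as ET

HEADING_RE = re.compile(r"^h([1-6])$")


def nest_by_heading(body):
    """Return a copy of `body` with heading-delimited sections nested.

    Each <hN> and the content after it go inside <div class="levelN">;
    deeper headings nest inside the section of the shallower heading
    that precedes them. Assumes un-namespaced tags (an assumption).
    """
    root = ET.Element(body.tag, body.attrib)
    root.text = body.text
    stack = [(0, root)]  # (heading level, container element)
    for child in list(body):
        match = HEADING_RE.match(child.tag)
        if match:
            level = int(match.group(1))
            # Close every open section at this level or deeper.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(
                stack[-1][1], "div", {"class": "level%d" % level}
            )
            stack.append((level, section))
        # Headings and ordinary content both land in the innermost section.
        stack[-1][1].append(child)
    return root
```

    With the sections nested like this, a transform or query can address
    "everything under this heading" as a single subtree.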

    You'll also need to adapt the XSLT to match any bootstrap.org special styles
    (otherwise it will strip them). Should be fairly obvious how that works, but
    if you need help, mail me.

    What's also neat is that because I've used only regular expressions, fixed
    operations from James Clark's SP toolkit, and XSLT, the whole thing is very
    configurable (caveat: if you know what you are doing ;-), and with minor
    adaptation (maybe the addition of a command line switch to cancel the extra
    end </div> tag) the whole thing could be used for non-Word HTML too.
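
    The command line switch could look something like the sketch below. It is
    Python rather than the Perl of the actual scripts, and the option name
    --no-extra-div, both function names, and the choice to append the tag at
    the very end of the text are assumptions for illustration only.

```python
import argparse


def close_document(xhtml, extra_div=True):
    """Append the Word-specific closing </div> unless it is switched off."""
    return (xhtml + "</div>") if extra_div else xhtml


def parse_args(argv=None):
    """Parse the (hypothetical) options of a well-formedness fixer."""
    parser = argparse.ArgumentParser(
        description="Sketch of a fixer front end with an opt-out switch"
    )
    parser.add_argument(
        "--no-extra-div",
        dest="extra_div",
        action="store_false",
        help="do not append the extra closing </div> (for non-Word HTML)",
    )
    return parser.parse_args(argv)
```

    The default keeps the Word behaviour, so existing invocations would not
    change; only non-Word input needs the switch.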

    Crude but effective.

    cheers,
    Peter

    ----- Original Message -----
    From: "Henry van Eyken" <vaneyken@sympatico.ca>
    To: <unrev-II@yahoogroups.com>
    Sent: Friday, July 13, 2001 9:43 PM
    Subject: Re: [unrev-II] "As We May Think", etc.

    > Peter.
    >
    > Wow, you overwhelm me with your background and insights. Never expected
    > that a hastily scripted comment of mine would lead to all this! But the
    > forum is the better for it.
    >
    > Let me get to your concluding sentences first, "And if I was augmented,
    > what would become of the present relationship between intellect and
    > morality? I'm really fired up by these questions."
    >
    > It is my understanding that Doug Engelbart's augmenting of the
    > Collective IQ is what it says it is: augmenting the collective intellect
    > (never mind the original meaning of "I.Q."). But one might expect that,
    > with the ensuing quickening pace of action, there is a corresponding
    > shortening of the interval between the moral dilemmas that individuals
    > face, with less time and fewer supporting resources to resolve them (as
    > during such crises as war and surviving ecological catastrophes). Hence
    > it would be well to pay attention to values too.
    >
    > Recently, I have expressed my feeling to Doug and Bootstrap friends that
    > our site should, if we can so manage, change its nature from some sort
    > of a house organ, which it presently is, to a public medium. Then we can
    > also touch on questions raised during this discussion. In fact, we might
    > distill a good number of facts and insights from the discussion groups
    > to serve as foundations for reflections more coherent than a discussion
    > forum permits.
    >
    > In the meantime, I am pondering how to morph the present site into one I
    > dimly envisage. This includes paying attention to the way we handle
    > documents, hence my interest in markup languages. It would be a simple
    > matter just to run our pages through a program like "Tidy" to ensure
    > they are, in the jargon of the aficionados, "well-formed." But I hope
    > to gain a better understanding of the subject in order (a) to better
    > appreciate many posts contributed to our discussion group and (b) to
    > work as much as possible within the "Engelbart spirit."
    >
    > Hope I have not been too cryptic.
    >
    > Henry
    >
    >
    >
    >
    > Peter Jones wrote:
    >
    > > Henry, Dennis,
    > >
    > > Thank you both. Very interesting point from Henry: Did Kant consider
    > > rationality to be free from culturally imbued mores?
    > >
    > > If I remember correctly (very hazy, too long ago, book not to-hand),
    > > Kant's Groundwork for a Metaphysic of Morals, the text that contains
    > > definition of the C.I., had to posit a being of ideal rationality that
    > > one should aspire to be like in order to motivate proper use of the
    > > C.I. itself. Since we're not ideal we should just try very hard (else
    > > God won't like us). Or something like that. Dennis seems to be more on
    > > the ball on that front so perhaps he can fill in the gaps more
    > > accurately.
    > >
    > > Just to complicate matters slightly, today I ran across a book:
    > > Kant on Freedom, Law and Happiness by Paul Guyer (distinguished
    > > Kant scholar, responsible for a series of the best translations
    > > going). The back cover blurb states that Guyer is arguing for a new
    > > interpretation of Kant's ethics implying that there is much more to
    > > Kant's ethics than just the C.I. Wish I had the time/energy to read
    > > that one.
    > >
    > > Another one I spotted that I don't have time to read was:
    > > Metaphor in Context by Joseph Stern (distinguished Univ. of
    > > Chicago prof.). Blurb states that Prof. Stern has come up with an
    > > interpretation of metaphor that places it within existing semantic
    > > frameworks (contra the now apparently traditional view that metaphor
    > > didn't really fit in so it couldn't be handled). Here I would refer
    > > folks to another page on that site that Bernard Vatant pointed us to:
    > > http://www.uia.org/homemeta.htm
    > > Particularly interesting for me were the points about the use of
    > > metaphor in international debate as a means of facilitating universal
    > > understanding.
    > >
    > > There was also another book, containing scholarly discussion of
    > > professional ethics that looked interesting (but it was wrapped so I
    > > couldn't scan it there and then) called "Matters of Breath". All in
    > > the philosophy section, so not much handy code in those, I suspect.
    > >
    > > So, how does one build ethical augmentation into augmenting systems
    > > then? Or would that be a bad idea (unethical in itself, even?)?
    > >
    > > And also, could there be ethical augmentation that uses metaphor to
    > > raise points of ethical controversy to common understanding, moving
    > > beyond cultural differences?
    > >
    > > And if I was augmented, what would become of the present relationship
    > > between intellect and morality?
    > >
    > > I'm really fired up by these questions.
    > >
    > > cheers,
    > > Peter
    >
    >
    >
    > Community email addresses:
    > Post message: unrev-II@onelist.com
    > Subscribe: unrev-II-subscribe@onelist.com
    > Unsubscribe: unrev-II-unsubscribe@onelist.com
    > List owner: unrev-II-owner@onelist.com
    >
    > Shortcut URL to this page:
    > http://www.onelist.com/community/unrev-II
    >
    > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
    >
    >
    >


    This archive was generated by hypermail 2b29 : Sat Jul 14 2001 - 05:57:38 PDT