Re: [unrev-II] Making HTML into XHTML

From: Peter Jones (ppj@concept67.fsnet.co.uk)
Date: Sat Jul 14 2001 - 05:42:39 PDT

Next message: Henry van Eyken: "Re: [unrev-II] Making HTML into XHTML"

Previous message: Peter Jones: "Re: [unrev-II] "As We May Think", etc."
In reply to: Henry van Eyken: "Re: [unrev-II] "As We May Think", etc."
Next in thread: Henry van Eyken: "Re: [unrev-II] Making HTML into XHTML"
Reply: Henry van Eyken: "Re: [unrev-II] Making HTML into XHTML"

Not so long ago I ran into the problem of turning the HTML output of MS Word
into XHTML, and then doing some funky things with the resulting XHTML.
HTML Tidy was far too zealous for our needs, so purely as a prototype I wrote
a rough Perl (+ other stuff) hack that can be found here:
http://www.concept67.fsnet.co.uk/xml/

It works for most cases of Word HTML, but it needs improvements: proper
Latin-1 entity replacements (a simple matter of changing the parsing DTD
slightly to reference the entity set) and a much more efficient algorithm (a
bit of thought, if you can be bothered). As it stands the code is for Windows
(no shebang lines, and it uses InstantSaxon rather than Java Saxon for the
XSLT).
The well-formedness fixer also adds an extra </div> at the end to cope with
the particulars of Word output.
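
To illustrate the entity issue mentioned above, here is a minimal sketch in
Python (not the Perl from the prototype, and not the DTD change suggested
above). It swaps in a different technique: rewriting named entities as numeric
character references, which any XML parser accepts without an entity set. The
function name numeric_entities is hypothetical, and the lookup table is
Python's html.entities.name2codepoint, which covers the HTML 4 entity names
(Latin-1 plus the symbol and special sets).

```python
import re
from html.entities import name2codepoint

# The five entities XML predefines are left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def numeric_entities(text):
    """Replace known named entities with numeric character references."""
    def repl(match):
        name = match.group(1)
        # Unknown names and XML's own entities pass through unchanged.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"
    return ENTITY.sub(repl, text)


if __name__ == "__main__":
    print(numeric_entities("caf&eacute;&nbsp;&amp; co."))
```

Unknown entity names are deliberately passed through untouched so that a later
validation step can still flag them.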

It converts a dense 100-page A4 Word HTML doc in about 15-17 seconds on my
laptop.

However, there is a script in there that adds a nesting structure so that
<h1> elements and their text nest <h2> elements and their text and so on.
This is pure Perl and can be used independently of the rest.
So once you have a well-formed XHTML document, you can add level nesting.
This comes in handy for all sorts of transforms and queries.
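
As a rough illustration of the level nesting described above, here is a
minimal sketch in Python rather than the original Perl. It is not the script
from the site. It assumes a well-formed document whose heading elements are
direct children of one container and carry no XML namespace prefix; the
function name nest_by_heading and the wrapper <div class="levelN"> are
hypothetical choices made for the example.

```python
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"^h([1-6])$")


def nest_by_heading(body):
    """Return a copy of `body` whose children are nested by heading level."""
    root = ET.Element(body.tag, dict(body.attrib))
    root.text = body.text
    # Stack of (heading level, container); level 0 is the root itself.
    stack = [(0, root)]
    for child in list(body):
        match = HEADING.match(child.tag)
        if match:
            level = int(match.group(1))
            # Close every open section at this level or deeper.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(
                stack[-1][1], "div", {"class": f"level{level}"}
            )
            stack.append((level, section))
        # Headings and ordinary content both land in the innermost section.
        stack[-1][1].append(child)
    return root


if __name__ == "__main__":
    doc = ET.fromstring(
        "<body><h1>A</h1><p>a</p><h2>B</h2><p>b</p><h1>C</h1></body>"
    )
    print(ET.tostring(nest_by_heading(doc), encoding="unicode"))
```

With the nesting in place, a query such as "everything under the second <h1>"
becomes a single subtree lookup instead of a scan over sibling elements.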

You'll also need to adapt the XSLT to match any bootstrap.org special styles
(otherwise it will strip them). Should be fairly obvious how that works, but
if you need help, mail me.

What's also neat is that because I've used only regular expressions, fixed
operations from James Clark's SP toolkit, and XSLT, the whole thing is very
configurable (caveat: if you know what you are doing ;-), and with minor
adaptation (maybe the addition of a command line switch to cancel the extra
end </div> tag) the whole thing could be used for non-Word HTML too.

Crude but effective.

cheers,
Peter

----- Original Message -----
From: "Henry van Eyken" <vaneyken@sympatico.ca>
To: <unrev-II@yahoogroups.com>
Sent: Friday, July 13, 2001 9:43 PM
Subject: Re: [unrev-II] "As We May Think", etc.

> Peter.
>
> Wow, you overwhelm me with your background and insights. Never expected
> that a hastily scripted comment of mine would lead to all this! But the
> forum is the better for it.
>
> Let me get to your concluding sentences first, "And if I was augmented,
> what would become of the present relationship between intellect and
> morality? I'm really fired up by these questions."
>
> It is my understanding that Doug Engelbart's augmenting of the
> Collective IQ is what it says it is: augmenting the collective intellect
> (never mind the original meaning of "I.Q."). But one might expect that,
> as the pace of action quickens, the intervals between the moral dilemmas
> individuals face shorten correspondingly, leaving less time and fewer
> supporting resources to resolve them (as during crises such as war and
> surviving ecological catastrophes). Hence it would be well to pay
> attention to values too.
>
> Recently, I have expressed my feeling to Doug and Bootstrap friends that
> our site should, if we can so manage, change its nature from some sort
> of a house organ, which it presently is, to a public medium. Then we can
> also touch on questions raised during this discussion. In fact, we might
> distill a good amount of facts and insights from the discussion groups
> to serve as foundations for reflections more coherent than a discussion
> forum permits.
>
> In the meantime, I am pondering how to morph the present site into one I
> dimly envisage. This includes paying attention to the way we handle
> documents, hence my interest in markup languages. It would be a simple
> matter just to run our pages through a program like "Tidy" to ensure
> they are, in the jargon of the aficionados, "well-formed." But I hope
> to gain a better understanding of the subject in order (a) to better
> appreciate many posts contributed to our discussion group and (b) to
> work as much as possible within the "Engelbart spirit."
>
> Hope I have not been too cryptic.
>
> Henry
>
>
>
>
> Peter Jones wrote:
>
> > Henry, Dennis,
> >
> > Thank you both.
> >
> > Very interesting point from Henry: Did Kant consider rationality to be
> > free from culturally imbued mores?
> >
> > If I remember correctly (very hazy, too long ago, book not to-hand),
> > Kant's Groundwork for a Metaphysic of Morals, the text that contains
> > definition of the C.I., had to posit a being of ideal rationality that
> > one should aspire to be like in order to motivate proper use of the
> > C.I. itself. Since we're not ideal we should just try very hard (else
> > God won't like us). Or something like that. Dennis seems to be more on
> > the ball on that front so perhaps he can fill in the gaps more
> > accurately.
> >
> > Just to complicate matters slightly, today I ran across a book:
> > Kant on Freedom, Law and Happiness by Paul Guyer (distinguished
> > Kant scholar, responsible for a series of the best translations
> > going). The back cover blurb states that Guyer is arguing for a new
> > interpretation of Kant's ethics implying that there is much more to
> > Kant's ethics than just the C.I.. Wish I had the time/energy to read
> > that one.
> >
> > Another one I spotted that I don't have time to read was:
> > Metaphor in Context by Joseph Stern (distinguished Univ. of
> > Chicago prof.)
> > Blurb states that Prof. Stern has come up with an interpretation of
> > metaphor that places it within existing semantic frameworks (contra
> > the now apparently traditional view that metaphor didn't really fit in
> > so it couldn't be handled).
> >
> > Here I would refer folks to another page on that site that Bernard
> > Vatant pointed us to:
> > http://www.uia.org/homemeta.htm
> > Particularly interesting for me were the points about the use of
> > metaphor in international debate as a means of facilitating universal
> > understanding.
> >
> > There was also another book, containing scholarly discussion of
> > professional ethics that looked interesting (but it was wrapped so I
> > couldn't scan it there and then) called "Matters of Breath".
> >
> > All in the philosophy section, so not much handy code in those, I
> > suspect.
> >
> > So, how does one build ethical augmentation into augmenting systems
> > then? Or would that be a bad idea (unethical in itself, even?)?
> >
> > And also, could there be ethical augmentation that uses metaphor to
> > raise points of ethical controversy to common understanding, moving
> > beyond cultural differences?
> >
> > And if I was augmented, what would become of the present relationship
> > between intellect and morality?
> >
> > I'm really fired up by these questions.
> >
> > cheers,
> > Peter
>
>
>
> Community email addresses:
> Post message: unrev-II@onelist.com
> Subscribe: unrev-II-subscribe@onelist.com
> Unsubscribe: unrev-II-unsubscribe@onelist.com
> List owner: unrev-II-owner@onelist.com
>
> Shortcut URL to this page:
> http://www.onelist.com/community/unrev-II
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
>
>
>

This archive was generated by hypermail 2b29 : Sat Jul 14 2001 - 05:57:38 PDT