Net Buzz with Richard Wiggins
Volume 1, Number 19 | March 18, 1998

XML: What Every Webmaster Should Know

XML and Metadata; HTML Conversion Issues

It seems that everyone talks about metadata but the world can't settle on a standard for metadata. Will XML solve that?

One of the problems with metadata is that doing it in a formalized way on computers is relatively new, so we're still discovering things we didn't know before (and XML is new, so there are discoveries there too). Metadata itself isn't new: the Romans had it, and medieval legal manuscripts have more metadata than data. But there was a certain amount of conflict between the library people and the computer people to start with, I think, as each community learned from the other. That's past, but there's still a lot of work to do in finding out what the most usable format is. XML itself is just a language to let you specify a structure for different classes of document, expressed as a set of tags, so it will handle metadata if that's what you set it up to do. Compare it with asking if C++ will index a phone book -- sure, if you write the program. Where XML scores is that it makes it easier to write an application, so writing one which caters for metadata is easier too.

The publishing industry has long understood the value of SGML. What industries outside of publishing do you see adopting XML first?

You can see some of this emerging in the list of new XML applications: finance and software are two at the top, with commercial transaction apps for general business use pretty close. Scientific apps are up there (chemistry was actually the very first with CML), and then there's a huge cluster of vertical markets: real estate agent markup language, for example, and furniture/woodwork industry markup language. One of my favorites is DESSERT, a markup language for recipes announced at SGML/XML'97. Don't laugh -- the cookery book/magazine market is huge.

Something specific I'm curious about: why did the XML standards committee decide that XML tags, unlike HTML tags, would be case sensitive?
This is essential for use in languages other than English. It lets a German have element types like <Häuser>, but even more importantly, it lets non-Latin-alphabet languages use 16-bit or 32-bit coded character sets, so Japanese users can have element types that can't be represented in a character set assuming case. There are significant problems if you try case-folding with non-8-bit alphabets, because some scripts simply don't have the concept of "case", and some have it but only for some symbols. So the simplest and most useful thing to do was to remove case-folding completely. This also solves the perennial battle between the SGML rule of fold-to-uppercase (IBM legacy) and the UNIX practice of fold-to-lowercase.

You point out in the XML FAQ that I need not convert existing HTML documents to XML any time soon; however, if I want the power that XML offers, I'll have an incentive to convert. How hard is it going to be to convert from HTML to XML?

If users have start-tags and end-tags whose case currently matches (i.e., <UL>...</UL>, <ul>...</ul>, <Ul>...</Ul>, or <uL>...</uL>), then there's no problem. It's when they've hand-edited a page and ended up with <Ul>...</uL> or <uL>...</Ul> that it causes trouble. The vast majority of HTML pages are reasonably consistent, though.
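The following is a minimal sketch of what case sensitivity means in practice. It uses Python's standard-library `xml.etree.ElementTree` parser, a modern tool chosen only for illustration, and the sample snippets are hypothetical. The helper `is_well_formed` checks whether each snippet parses:

- Start- and end-tags whose case matches are accepted.
- The hand-edited `<Ul>...</uL>` pair is rejected, because `Ul` and `uL` are different names.
- Non-ASCII element type names such as `<Häuser>` are accepted.

The last lines use Python's Unicode-based `str` methods to show why folding is awkward: Japanese text has no case at all, and German "ß" uppercases to two letters.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text, False if it reports an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical snippets: only the mismatched <Ul>...</uL> pair should fail.
samples = [
    "<UL><LI>one</LI></UL>",   # start- and end-tags match in case
    "<Ul><lI>one</lI></Ul>",   # odd mix of case, but consistent
    "<Ul><LI>one</LI></uL>",   # hand-edited: Ul and uL are different names
    "<Häuser>zwei</Häuser>",   # German element type name
    "<文書>三</文書>",          # Japanese element type name (no case at all)
]
for text in samples:
    print(is_well_formed(text), text)

# Names are compared exactly, never folded: <UL> is not the same type as <ul>.
root = ET.fromstring("<UL><li>x</li></UL>")
print(root.tag == "UL", root.tag == "ul")    # True False

# Case mapping is not a simple one-to-one table: "ß" uppercases to two letters,
# so converting back to lowercase does not restore the original word.
print("straße".upper())                      # STRASSE
print("straße".upper().lower() == "straße")  # False
```

Only the third sample prints `False`. That is exactly the `<Ul>...</uL>` case described above, and it is the kind of tag that a conversion pass has to catch.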
There are already several XML editors around, so my guess is that there won't be many people masochistic enough to use Notepad for XML! In any case, Emacs-psgml has an XML mode now; it works on PC/Windows as well as UNIX, and it's free, so there's really no need to use "dumb" editors any more. There will be "tidy-up-your-HTML-and-make-it-XML" programs too. Any first-year CS student should be able to write a program to consistently upcase or downcase the element type names in the tags of an HTML file ("making" it into XML is not much harder).
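The exercise described above might look something like this in Python. This is a rough sketch only, and `normalize_tag_case` is a hypothetical helper name. It uses a regular expression rather than a real HTML parser, so it has some limits:

- A bare "<" followed by a letter in ordinary text, or inside `<SCRIPT>` content, would also be rewritten.
- It performs only the case step. It does not make the other changes XML needs, such as closing empty elements like `<BR>`.

```python
import re

# Matches "<" or "</" followed by an element type name. Comments, <!DOCTYPE ...>
# and processing instructions begin with "!" or "?", so they are not matched,
# and attribute names and values after the element name are left untouched.
TAG_NAME = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")

def normalize_tag_case(html: str, upper: bool = False) -> str:
    """Lowercase (or, with upper=True, uppercase) every element type name
    in the start-tags and end-tags of html. Hypothetical helper."""
    def fix(match):
        slash, name = match.group(1), match.group(2)
        return "<" + slash + (name.upper() if upper else name.lower())
    return TAG_NAME.sub(fix, html)

if __name__ == "__main__":
    page = '<UL>\n  <Li><A HREF="Index.html">Home</a></lI>\n</Ul>'
    print(normalize_tag_case(page))
    # <ul>
    #   <li><a HREF="Index.html">Home</a></li>
    # </ul>
```

Downcasing and upcasing are equally valid for XML. What matters is only that every start-tag and its end-tag end up spelled identically. Attribute names such as `HREF` are left as they are, which is still consistent within the file.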
Comments are welcome
Produced by Rich Wiggins and
All Rights Reserved. Legal Notices.
Created: March 18, 1998
Revised: March 18, 1998
URL: https://webreference.com/outlook/column19/page2.html