Discussion:
Normalisation in XML, HTML etc
Peter Kirk
2003-10-15 10:57:19 UTC
I have heard it mentioned in general terms that W3C has specified that
text should be normalised according to NFC. What actually is the scope
of this specification? Does it apply to all XML, HTML etc? Is it
mandatory or just a recommendation?

I would also like to know if this is actually applied or enforced by
products such as OpenOffice and Microsoft Office 2003 which use XML as
one of their native document formats. Will text saved in these formats
be normalised to NFC? Should it be?
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/




To Unsubscribe, send a blank message to: unicode-***@yahooGroups.com

This mailing list is just an archive. The instructions to join the true Unicode List are on http://www.unicode.org/unicode/consortium/distlist.html


Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
John Cowan
2003-10-15 11:34:20 UTC
Post by Peter Kirk
I have heard it mentioned in general terms that W3C has specified that
text should be normalised according to NFC. What actually is the scope
of this specification? Does it apply to all XML, HTML etc? Is it
mandatory or just a recommendation?
It is not mandatory. It is a SHOULD, which is between MUST (mandatory)
and MAY (permissive); it means that "there may exist valid reasons
in particular circumstances to ignore a particular item, but the full
implications must be understood and carefully weighed before choosing
a different course."

XML 1.0 is silent on the subject. XML 1.1 (not yet finalized) says
that XML parsers SHOULD (in the sense above) verify that their input is
normalized, and explains exactly what "normalized" means in connection
with various XML constructs; for example, the character just after a
start-tag SHOULD NOT be a combining character.
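
As a rough illustration of the start-tag rule, here is a minimal Python sketch that reports elements whose content begins with a combining character. It is not taken from the XML 1.1 draft: the function name is hypothetical, and "combining character" is approximated as any character with a non-zero canonical combining class, which is narrower than the draft's own definition.

```python
import unicodedata
import xml.etree.ElementTree as ET


def elements_starting_with_combining(xml_source):
    """Return the tags of elements whose text begins with a combining character.

    Assumption: "combining" means a non-zero canonical combining class.
    Only the text directly after each start-tag is examined.
    """
    root = ET.fromstring(xml_source)
    flagged = []
    for element in root.iter():
        text = element.text
        if text and unicodedata.combining(text[0]) != 0:
            flagged.append(element.tag)
    return flagged


if __name__ == "__main__":
    # U+0301 COMBINING ACUTE ACCENT directly after the <a> start-tag.
    sample = "<doc><a>\u0301x</a><b>ok</b></doc>"
    print(elements_starting_with_combining(sample))
```

The check only reports the offending elements; it leaves the document untouched.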
Post by Peter Kirk
I would also like to know if this is actually applied or enforced by
products such as OpenOffice and Microsoft Office 2003 which use XML as
one of their native document formats. Will text saved in these formats
be normalised to NFC? Should it be?
Output SHOULD be normalized; input SHOULD be verified as normalized,
but not forcibly normalized (doing so is a security hole). Whether
any particular product does this is up to the people who make the
product, and I have no information on either of those.
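
The split described above (normalize on output, verify but do not alter on input) might be sketched in Python roughly as follows. The function names are hypothetical, and the sketch relies only on the standard `unicodedata` module; it is not how any particular product behaves.

```python
import unicodedata


def write_text(text):
    """Producer side: normalize to NFC before emitting."""
    return unicodedata.normalize("NFC", text)


def read_text(text):
    """Consumer side: return the input unchanged plus a flag saying
    whether it was already in NFC. The text itself is never rewritten."""
    already_nfc = unicodedata.normalize("NFC", text) == text
    return text, already_nfc


if __name__ == "__main__":
    decomposed = "e\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
    print(read_text(decomposed))              # flagged, but left as received
    print(read_text(write_text(decomposed)))  # producer output passes the check
```

The caller decides what to do with text that fails the check (warn, reject, or accept as is); the reader never silently changes what it was given.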
--
One art / There is      John Cowan <***@reutershealth.com>
No less / No more       http://www.reutershealth.com
All things / To do      http://www.ccil.org/~cowan
With sparks / Galore    -- Douglas Hofstadter

