Line Separator and Paragraph Separator

Discussion:

Jill Ramonsky

2003-10-20 14:37:50 UTC

Are the LS and PS characters actually used in real plain-text documents?

I ask because plain text documents are created by text editors. The text
editor I happen to use is TextPad (there are hundreds of others, and
everyone has their favorite). It can save in UTF-8, and so on. But it
always saves documents with CRLF separating the lines. (It's a Windows
system).

Going a little deeper, applications like this are often written in C or
C++. These languages have the convention that "\n" in a string literal
means "new line". Strictly speaking, BY DEFINITION (from the C and C++
specs), "\n" is supposed to mean LF, and nothing else, but programs
compiled on Windows will reinterpret "\n" in a string literal to mean
either LF only (when in memory) or CRLF (when encoded to or from a file
or stream opened in text mode). Yes, it's a kludge, but it obviously
works quite well. I suspect (but I don't know for sure) that the Mac
will interpret "\n" as CR only.

It would seem impossible (or at least, a violation of the C/C++ specs)
to reinterpret "\n" as LS in C/C++ ... but then again, that
specification has already been violated, so maybe the precedent is there
and that no longer matters.

Nonetheless, it would seem, at least /slightly/ sensible to me that text
files encoded as UTF-8 should be using LS instead of CRLF. But this
appears to be difficult to achieve. There is no C/C++ escape sequence
which is defined to mean LS (unless you're prepared to write
"\xE2\x80\xA8" instead of "\n" all over the place), and what "\n"
generates is platform-dependent.
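
As an illustration of what such a byte-level escape would have to spell out, here is a minimal Python sketch that prints the UTF-8 bytes of LS (U+2028) and PS (U+2029) in C-style \x notation. The helper name c_escape is made up for this example.

```python
def c_escape(ch: str) -> str:
    """Return the UTF-8 bytes of ch written as C-style \\x escapes."""
    return "".join(f"\\x{b:02X}" for b in ch.encode("utf-8"))

LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR

print(c_escape(LS))  # \xE2\x80\xA8
print(c_escape(PS))  # \xE2\x80\xA9
```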

We can't change C or C++, of course, but would it make sense for other
computer languages, in particular future computer languages, either to
redefine "\n" to mean LS (if the encoding is capable of representing
it)* or to introduce a new escape sequence, ("\l"?) to mean LS? (Of
course, if we introduced "\l" for LS, we could also introduce "\p" for PS).

Thoughts anyone?

Jill

*FOOTNOTE - this is actually quite difficult to achieve if you're
storing stuff internally as bytes. Windows knows whether or not to
convert LF->CRLF and vice versa by means of a parameter passed to
fopen(), but this parameter can only distinguish between "text" and
"binary", not between "latin-1 text" and "utf-8 text". Things get easier
if you store chars internally as Unicode chars, of course.

------------------------ Yahoo! Groups Sponsor ---------------------~-->
KnowledgeStorm has over 22,000 B2B technology solutions. The most comprehensive IT buyers' information available. Research, compare, decide. E-Commerce | Application Dev | Accounting-Finance | Healthcare | Project Mgt | Sales-Marketing | More
http://us.click.yahoo.com/IMai8D/UYQGAA/cIoLAA/8FfwlB/TM
---------------------------------------------------------------------~->

To Unsubscribe, send a blank message to: unicode-***@yahooGroups.com

This mailing list is just an archive. The instructions to join the true Unicode List are on http://www.unicode.org/unicode/consortium/distlist.html

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/

Frank da Cruz

2003-10-20 15:52:50 UTC

Post by Jill Ramonsky
Are the LS and PS characters actually used in real plain-text documents?

At some point in the early 1990s, the thinking was that ASCII control
characters were included in Unicode only for round-trip compatibility
with existing character sets, but their semantics were undefined, and anyway
they were not needed since they were from the bygone days of terminals and
similar antique contraptions, whereas in modern times all text is "flowed"
by "smart rendering engines".

Ten years hence, the terminal-to-host model is still widely used, as is text
with hard line breaks, but to convince the skeptics and ultra-modernists
that line breaks were still a useful concept, I mentioned line-oriented
programming languages (such as Fortran), and poetry. Hence the line
separator.

Later everybody realized you couldn't stamp out ASCII control characters,
so we're still using them; LS and PS never caught on as far as I know.
LS would obviously have been an improvement over the existing situation,
in which platforms that would otherwise have compatible text record
formats use different line separators (CR, LF, CRLF). To this day that
causes no end of confusion.

At some point after Unicode 2.0, the C1 controls were adopted from ISO 6429,
in which we have a Next Line control (NEL, U+0085), which might also have
served the purpose, but it never caught on either.

- Frank

John Cowan

2003-10-20 17:32:04 UTC

Post by Jill Ramonsky
Are the LS and PS characters actually used in real plain-text documents?

You can find such documents, but they're not common. LS was an attempt
to unify the diverse standards for line-end characters by providing a
new one, but IMHO it flopped. (XML 1.1, however, will interpret LS
as a line-end character.)
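
A minimal Python sketch of the end-of-line handling that XML 1.1 specifies (section 2.11 of that specification): CR LF, CR NEL, a lone NEL, LS, and any remaining lone CR are each translated to a single LF before parsing. The function name normalize_eol is an assumption for this example, and PS is deliberately left alone because it is not in the XML 1.1 list.

```python
import re

# Line-end sequences that XML 1.1 translates to a single LF:
# CR LF, CR NEL, NEL, LS, and any remaining lone CR.
# Two-character sequences come first so they are matched as a unit.
_XML11_EOL = re.compile("\r\n|\r\x85|\x85|\u2028|\r")

def normalize_eol(text: str) -> str:
    """Replace every XML 1.1 line-end sequence in text with LF."""
    return _XML11_EOL.sub("\n", text)

sample = "a\r\nb\rc\x85d\u2028e\r\x85f"
print(normalize_eol(sample).split("\n"))  # ['a', 'b', 'c', 'd', 'e', 'f']
```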

Post by Jill Ramonsky
These languages have the convention that "\n" in a string literal
means "new line". Strictly speaking, BY DEFINITION (from the C and C++
specs), "\n" is supposed to mean LF, and nothing else,

It means any one character that serves a new-linish function, which can
be LF or CR or NEL, for example. On EBCDIC-based systems, the native
C compiler interprets \n as 0x25, which is NEL.

Post by Jill Ramonsky
compiled on Windows will reinterpret "\n" in a string literal to mean
either LF only (when in memory) or CRLF (when encoded to or from a file
or stream opened in text mode).

It's any LF character that gets that treatment, of course, not just one
from a string literal. The fact that DOSish systems map LF to CRLF on
output and back on input has nothing to do with the C \n character.
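
The same separation between the in-memory character and the on-disk convention can be sketched in Python, whose text layer performs a comparable translation. In this sketch the LF is built with chr(10) rather than a "\n" literal to show that the literal is not what triggers the mapping; the file name demo.txt is arbitrary, and newline="\r\n" is used to mimic a DOS-style runtime on any platform.

```python
import os
import tempfile

# LF characters built without any "\n" string literal
text = "first" + chr(10) + "second" + chr(10)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    # newline="\r\n": the text layer writes every LF as CRLF
    with open(path, "w", newline="\r\n") as f:
        f.write(text)

    # Binary mode shows the bytes that actually reached the file
    with open(path, "rb") as f:
        raw = f.read()

    # newline=None (the default): CRLF is mapped back to LF on input
    with open(path, "r", newline=None) as f:
        round_trip = f.read()

print(raw)                 # b'first\r\nsecond\r\n'
print(round_trip == text)  # True
```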

Post by Jill Ramonsky
I suspect (but I don't know for sure) that the Mac
will interpret "\n" as CR only.

Yes.

Post by Jill Ramonsky
It would seem impossible (or at least, a violation of the C/C++ specs)
to reinterpret "\n" as LS in C/C++ ... but then again, that
specification has already been violated, so maybe the precedent is there
and that no longer matters.

It is not a violation.
--
John Cowan ***@reutershealth.com
Real FORTRAN programmers can program FORTRAN in any language. --Allen Brown

John Delacour

2003-10-20 20:15:03 UTC

Post by Jill Ramonsky
I suspect (but I don't know for sure) that the Mac
will interpret "\n" as CR only.

Yes.

Or, since the Mac that knows Unicode is Mac OS 10.*, no.

JD

John Cowan

2003-10-20 21:06:36 UTC

Post by John Delacour
Or, since the Mac that knows Unicode is Mac OS 10.*, no.

I wasn't speaking of Unicode. Classic Mac C compilers definitely
generate 0x0D rather than 0x0A when "\n" appears in the source.
I don't know what Mac OS X C compilers do.
--
John Cowan <***@reutershealth.com>
http://www.reutershealth.com http://www.ccil.org/~cowan
Principles. You can't say A is made of B or vice versa. All mass
is interaction. --Richard Feynman

Doug Ewell

2003-10-21 06:00:08 UTC

Post by Jill Ramonsky
Are the LS and PS characters actually used in real plain-text
documents?
I ask because plain text documents are created by text editors. The
text editor I happen to use is TextPad (there are hundreds of others,
and everyone has their favorite). It can save in UTF-8, and so on. But
it always saves documents with CRLF separating the lines. (It's a
Windows system).

SC UniPad <http://www.unipad.org> can save any type of Unicode file
(UTF-7/8/16/32, SCSU, or ASCII with \uXXXX escapes) with any type of
line separator (CR and/or LF, or LS). The tricky part is that ASCII
transparency is really the whole reason for UTF-8 to exist, and if you
use LS instead of CR/LF, you don't really have ASCII transparency any
more.

Another point is that the ONE character LS actually uses more bytes in
UTF-8 than the TWO characters CR and LF. I don't know if this fact
actually contributes to the low use of LS in the real world, but it
probably doesn't help.
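
Both points can be checked with a short Python sketch. The helper is_ascii_only is a made-up name; it simply tests whether every byte is below 0x80, which is the property an ASCII-only tool relies on.

```python
LS = "\u2028"   # one character
CRLF = "\r\n"   # two characters

ls_bytes = LS.encode("utf-8")
crlf_bytes = CRLF.encode("utf-8")
print(len(ls_bytes), len(crlf_bytes))  # 3 2

def is_ascii_only(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)

print(is_ascii_only("line 1\r\nline 2".encode("utf-8")))    # True
print(is_ascii_only("line 1\u2028line 2".encode("utf-8")))  # False
```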

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

Jill Ramonsky

2003-10-21 12:05:09 UTC

Interesting.

I do strongly suspect, however, that at least part of the reason that LS
and PS didn't take off was that they are more than seven bits wide, and
hence cannot be transported in plain ASCII text.

I wonder why it was not felt a good idea at the time (the early 1990s)
to have defined LS and PS, but with codepoints somewhere in the range
U+0000 to U+001F. I think it would have been fairly easy to find some
mostly unused ones, for example U+0010 and U+0011. The reason? SMTP
traffic is (by definition) transmitted across 7-bit-wide channels. HTTP
traffic is transmitted across 8-bit-wide channels. In the internet world,
"newline" is CRLF, and everything else has to be converted to it for
transmission across the internet.

Personally, I would have added a THIRD kind of separator, a "soft line
break". The reason? Some email relays insist on a "maximum line length"
for emails. In these days of MIME types and attachments, we inject CRLF
into the files to keep such relays happy, but the renderer ignores them
as "just whitespace". If we had had a "soft line break" character (in
the range U+0000 to U+001F), we could have retrofitted it into existing
email protocols. Had we done this, SLB could have been considered "just
whitespace", while LS and PS would have been non-ignorable in HTML (and
in fact, equivalent to <br> and <p> respectively).

I'm not surprised that NEL never caught on though.

Jill

John Cowan

2003-10-21 12:57:37 UTC

Post by Jill Ramonsky
I wonder why it was not felt a good idea at the time (the early 1990s)
to have defined LS and PS, but with codepoints somewhere in the range
U+0000 to U+001F.

Pretty much because other ISO standards specify the meaning of that set,
and Unicode/ISO 10646 very much didn't want to go there. I say "meaning",
but there are actually multiple possible meanings, though most of them are
fairly consistent.

Post by Jill Ramonsky
I'm not surprised that NEL never caught on though.

Note that the presence of NEL in the C1 area (U+0080 to U+009F) reflects
an earlier attempt to do the same thing that generated LS. Some ISO
committee recognized that LF was being overloaded to mean "move to the
next line" and "go back to the beginning, then move to the next line"
and introduced the characters U+0084 (IND) and U+0085 (NEL) to
disambiguate, presumably in hopes that LF would eventually be abandoned
in favor of IND and NEL as appropriate.

No such luck, Doc. <chomp/><chomp/>
--
John Cowan ***@reutershealth.com www.reutershealth.com www.ccil.org/~cowan
I am he that buries his friends alive and drowns them and draws them
alive again from the water. I came from the end of a bag, but no bag
went over me. I am the friend of bears and the guest of eagles. I am
Ringwinner and Luckwearer; and I am Barrel-rider. --Bilbo to Smaug

Jill Ramonsky

2003-10-21 15:12:50 UTC

Hmm.

Well, I can't say I've ever found a use for putting either a C0 or a C1
control into a text file, beyond the usual CR, LF and TAB. My code also
often considers FF to be whitespace, although I've never actually
(knowingly) encountered it in a real text file.

I would have thought that low codepoints would be highly valuable
commodities. Though some may have exotic uses, my experience is that
most of them don't seem to be used. In the past (that is, in the
pre-Unicode days, or when specifically working with ASCII or Latin-1
strings), I have tended to treat the control characters rather like the
Private Use Area - a space in which I can do what I want so long as I
don't expect the "outside world" to agree. I've even invented (and used)
some 8-bit encodings which leave the whole of Latin-1 unchanged (apart
from the C1s) and use C1 characters a bit like "surrogate pairs" to
reach the rest. (I didn't expect this to catch on, it was for internal
use only).

I'm really surprised that Unicode "didn't want to go there".
Still, that's life.
Jill

Elliotte Rusty Harold

2003-10-21 15:53:51 UTC

Post by Jill Ramonsky
Hmm.
Well, I can't say I've ever found a use for putting either a C0 or a
C1 control into a text file, beyond the usual CR, LF and TAB. My
code also often considers FF to be whitespace, although I've never
actually (knowingly) encountered it in a real text file.

I have. It shows up in a lot of old text files as a page separator
character. It's also occasionally used as a document separator when
someone wants to stuff multiple XML documents in the same file.
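
Treating FF as a page or document separator takes very little code. A minimal Python sketch, with split_pages as a hypothetical helper name:

```python
FF = "\f"  # U+000C FORM FEED

def split_pages(text):
    """Split text into the chunks that lie between form feeds."""
    return text.split(FF)

pages = split_pages("page one\n\fpage two\n")
print(len(pages))  # 2
print(pages)       # ['page one\n', 'page two\n']
```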
--
Elliotte Rusty Harold
***@metalab.unc.edu
Processing XML with Java (Addison-Wesley, 2002)
http://www.cafeconleche.org/books/xmljava
http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA

John Burger

2003-10-21 17:18:47 UTC

Post by Elliotte Rusty Harold

My code also often considers FF to be whitespace, although I've never
actually (knowingly) encountered it in a real text file.

I have. It shows up in a lot of old text files as a page separator
character.

It's still useful as a way to force lpr (Unix plain-text print utility)
to start a new page.

- John Burger
MITRE

John Cowan

2003-10-21 16:38:21 UTC

Post by Jill Ramonsky
Well, I can't say I've ever found a use for putting either a C0 or a C1
control into a text file, beyond the usual CR, LF and TAB. My code also
often considers FF to be whitespace, although I've never actually
(knowingly) encountered it in a real text file.

All RFCs contain FFs at the end of each page.

Post by Jill Ramonsky
I would have thought that low codepoints would be highly valuable
commodities. Though some may have exotic uses, my experience is that
most of them don't seem to be used.

That's why Microsoft felt free to reassign 80-9F to graphic characters in
its various codepages, which means the C1 controls can no longer be sent
reliably across serial transmission lines, which is what most control
characters were intended for.

Post by Jill Ramonsky
I have tended to treat the control characters rather like the
Private Use Area - a space in which I can do what I want so long as I
don't expect the "outside world" to agree.

Indeed, that's safe enough. But Unicode is all about interchange, so if
it reassigned any ISO controls, it would step on other uses of them.

Post by Jill Ramonsky
I've even invented (and used)
some 8-bit encodings which leave the whole of Latin-1 unchanged (apart
from the C1s) and use C1 characters a bit like "surrogate pairs" to
reach the rest.

There is actually a standards-compliant way to achieve "code extension"
of that type, for up to 4 spaces of 94+96 characters each (you are not
allowed to redefine space or DEL):

o use 0E (shift out) to switch to the 2nd space
o use 0F (shift in) to return to the main space
o use 8E (single shift 2) to mark the next byte as being in the 3rd space
o use 8F (single shift 3) to mark the next byte as being in the 4th space

If you need more than 4 spaces, you then enter the Great Pain, also known
as ISO 2022.
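
The four shift controls in the list above can be sketched as a small state machine in Python. This is only an illustration under simplifying assumptions: the function name tag_spaces is made up, every non-shift byte is tagged with a space number (including controls, space, and DEL), and no attempt is made to map the tagged bytes to actual characters.

```python
SO, SI = 0x0E, 0x0F     # locking shifts: switch to the 2nd space / back to the main one
SS2, SS3 = 0x8E, 0x8F   # single shifts: affect the next byte only

def tag_spaces(data: bytes):
    """Return (space, byte) pairs, with space numbered 1..4."""
    out = []
    locked = 1      # space selected by SO/SI until changed again
    single = None   # space selected by SS2/SS3 for one byte only
    for b in data:
        if b == SO:
            locked = 2
        elif b == SI:
            locked = 1
        elif b == SS2:
            single = 3
        elif b == SS3:
            single = 4
        else:
            out.append((single or locked, b))
            single = None  # a single shift is used up after one byte
    return out

stream = bytes([0x41, SO, 0x42, 0x43, SI, 0x44, SS2, 0x45, 0x46])
print(tag_spaces(stream))
# [(1, 65), (2, 66), (2, 67), (1, 68), (3, 69), (1, 70)]
```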
--
John Cowan ***@reutershealth.com
http://www.ccil.org/~cowan http://www.reutershealth.com
Evolutionary psychology is the theory that men are nothing but horn-dogs,
and that women only want them for their money. --Susan McCarthy (adapted)

Peter Kirk

2003-10-21 19:03:10 UTC

Post by Jill Ramonsky
Hmm.
Well, I can't say I've ever found a use for putting either a C0 or a
C1 control into a text file, beyond the usual CR, LF and TAB. My code
also often considers FF to be whitespace, although I've never actually
(knowingly) encountered it in a real text file.

I just encountered a C0 control in one of John Cowan's always
entertaining signatures, in a plain text e-mail. Well, the source was
actually "Fran=1B)B=E7ois Yergeaus" but the "=1B" is a quoted-printable
encoded form of U+001B. I'm not sure what the display intention was as
Mozilla made no attempt to render it properly.
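
The quoted-printable decoding step can be reproduced with Python's standard quopri module. This sketch only shows which raw bytes come out; it makes no claim about which character set the escape sequence was meant to select.

```python
import quopri

encoded = b"Fran=1B)B=E7ois"
decoded = quopri.decodestring(encoded)

print(decoded)             # b'Fran\x1b)B\xe7ois'
print(decoded[4] == 0x1B)  # True: the fifth byte is ESC (U+001B)
```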
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/

Marco Cimarosti

2003-10-21 16:23:38 UTC

[...] I've even invented (and used) some 8-bit encodings which
leave the whole of Latin-1 unchanged (apart from the C1s) and use C1
characters a bit like "surrogate pairs" to reach the rest.

Doug, are you listening? It seems there's a new clone of UTF:-)Z waiting for
implementation!

_ Marco
