Discussion:
Klingons and their allies - Beyond 17 planes
Jill Ramonsky
2003-10-17 10:00:45 UTC
-----Original Message-----
Sent: Thursday, October 16, 2003 4:04 PM
To: Philippe Verdy
Subject: Beyond 17 planes, was: Java char and Unicode 3.0+
Plenty of room there to encode not just all the scripts of the Galactic
Federation but even to squeeze in those of the Klingons and their allies!
The reason that the Klingon alphabet is not currently part of Unicode is
that the Klingon Language Institute submitted a proposal for the Klingon
script to the Unicode Consortium, and the Unicode consortium rejected
it. I have been unable to fathom their reasons.

It seems a simple enough case to argue - EITHER the 0x110000 character
space is amply big enough for everyone, as John Cowan asserts. (I quote,
"Similarly, the number of characters used by the peoples of the Earth
for writing their various languages is not going to be expanded by the
discovery of 10,000 characters used for writing the lost script
of Atlantis. The earth is finite and small, and there's no place for
large writing systems to hide from the eagle eyes of the Roadmappers."),
OR it isn't, in which case there is an argument for adding more planes.
[I should stress at this point the Klingon script /is/ used by the
peoples of the Earth, right here in the 21st century]. Here's what the
Klingon Language Institute has to say:

/The Klingon *pIqaD* script was on the Roadmap for inclusion in
Unicode for several years before it was rejected. There were many
debates on its appropriateness, with one camp maintaining that
fictional scripts in general, and Klingon in particular, didn't
belong in Unicode. That view was eventually defeated, with the
relevant criteria ending up being whether a script is used by a
large enough body of users who need to exchange data, and whether it
is historically important enough with respect to existing recorded
data. Klingon was rejected, but it failed because its potential
users don't use it. The fact is that Klingon language publications,
by and large, use the Romanized transcription presented in The
Klingon Dictionary. This is arguably a chicken-and-egg situation,
but nobody argued that point successfully to the relevant Unicode
committees. /

/However, being rejected doesn't mean that Klingon is not compatible
with Unicode today. Some years ago, Klingon was one of the supported
languages in a popular distribution of the Linux operating system,
with a *pIqaD*-style metafont character set mapped to a specific
region of the Unicode Private Use Area. That mapping has been made
somewhat more "public" in the CSUR, a published list of constructed
scripts: /

It seems to me that if 0x110000 codepoints isn't a big enough space to
fit in the Klingon alphabet (and other alphabets which were similarly
rejected) then we need more codepoints. Simple as that. The "chicken and
egg" situation described above is quite real. Esperanto speakers were
writing c^, ch and even cx /long/ before the character c^ became
available for everyone's use. More codepoints may allow more scripts not
to be rejected in the first place.
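The arithmetic behind the "17 planes" in the subject line can be sketched in a few lines of Python. This is an illustration, not code from anyone in the thread: the constants follow the Unicode codespace U+0000..U+10FFFF, and `plane_of` is a hypothetical helper name.

```python
# Minimal sketch: the Unicode codespace of 0x110000 code points
# divides into 17 planes of 0x10000 (65,536) code points each.

PLANE_SIZE = 0x10000          # code points per plane
CODESPACE_SIZE = 0x110000     # total code points, U+0000..U+10FFFF
MAX_CODE_POINT = CODESPACE_SIZE - 1


def plane_of(cp: int) -> int:
    """Return the plane number (0..16) of a code point (hypothetical helper)."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"U+{cp:X} is outside the Unicode codespace")
    return cp >> 16  # the bits above the low 16 select the plane


if __name__ == "__main__":
    print(CODESPACE_SIZE // PLANE_SIZE)   # 17
    print(MAX_CODE_POINT.bit_length())    # 21
    print(plane_of(0x0041), plane_of(0xF8D0), plane_of(0x10FFFD))  # 0 0 16
```

Note that 21 bits could in principle address 32 planes (`(1 << 21) // 0x10000 == 32`); the ceiling of 17 comes from UTF-16 surrogate pairs, not from the bit width.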

Jill
John Cowan
2003-10-17 11:43:49 UTC
Post by Jill Ramonsky
It seems a simple enough case to argue - EITHER the 0x110000 character
space is amply big enough for everyone, as John Cowan asserts.
Big enough for everyone, but not for everything. Encoding Klingon has
a cost beyond the allocation of codepoints: proposals must be written
(taking time away from other proposals that need to be written), committees
must deliberate, facts must be checked. Most of that work had already
been done for Klingon, as it's a dirt-simple script, much more so than
Latin, to say nothing of Hebrew. But it's a precedent.
Post by Jill Ramonsky
[I should stress at this point the Klingon script /is/ used by the
peoples of the Earth, right here in the 21st century].
Well, in fact the people who use it most are the _Star Trek_ set designers,
and they use it not to write Klingon, but purely as a design element.
There are many glyphs that appear on the show that aren't used in the
standard mapping.
Post by Jill Ramonsky
The fact is that Klingon language publications,
by and large, use the Romanized transcription presented in The
Klingon Dictionary. This is arguably a chicken-and-egg situation,
but nobody argued that point successfully to the relevant Unicode
committees. /
I don't think for a moment it's a chicken-and-egg situation. Klingon
is written in the Latin script in essentially all running-text (as opposed
to decorative) instances of its use. If it were c-and-e, the script could
still be written by hand -- though it must have the worst ductus of any
script ever devised, and would probably be writable only with the
assistance of a set of rubber stamps.
Post by Jill Ramonsky
It seems to me that if 0x110000 codepoints isn't a big enough space to
fit in the Klingon alphabet (and other alphabets which were similarly
rejected) then we need more codepoints.
That would be true if Klingon had been rejected for lack of space. It
wasn't. It was rejected for inappropriateness in other respects.

(BTW, Michael, I can't agree that Klingon script is a cipher for Latin.
The mapping is to Klingon phonemes, not Latin letters as such.)
--
Long-short-short, long-short-short / Dactyls in dimeter,
Verse form with choriambs / (Masculine rhyme): ***@reutershealth.com
One sentence (two stanzas) / Hexasyllabically http://www.reutershealth.com
Challenges poets who / Don't have the time. --robison who's at texas dot net


------------------------ Yahoo! Groups Sponsor ---------------------~-->
KnowledgeStorm has over 22,000 B2B technology solutions. The most comprehensive IT buyers' information available. Research, compare, decide. E-Commerce | Application Dev | Accounting-Finance | Healthcare | Project Mgt | Sales-Marketing | More
http://us.click.yahoo.com/IMai8D/UYQGAA/cIoLAA/8FfwlB/TM
---------------------------------------------------------------------~->

To Unsubscribe, send a blank message to: unicode-***@yahooGroups.com

This mailing list is just an archive. The instructions to join the true Unicode List are on http://www.unicode.org/unicode/consortium/distlist.html


Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
Mark E. Shoulson
2003-10-17 15:28:53 UTC
Post by John Cowan
Post by Jill Ramonsky
It seems a simple enough case to argue - EITHER the 0x110000 character
space is amply big enough for everyone, as John Cowan asserts.
Big enough for everyone, but not for everything. Encoding Klingon has
a cost beyond the allocation of codepoints: proposals must be written
(taking time away from other proposals that need to be written), committees
must deliberate, facts must be checked. Most of that work had already
been done for Klingon, as it's a dirt-simple script, much more so than
Latin, to say nothing of Hebrew. But it's a precedent.
Doesn't *everything* take time from other proposals? Maybe we should
close this list down and stop taking more proposals, because of this
"cost." I thought that doing this very work was what the Unicode
Consortium was created for.
Post by John Cowan
Post by Jill Ramonsky
[I should stress at this point the Klingon script /is/ used by the
peoples of the Earth, right here in the 21st century].
Well, in fact the people who use it most are the _Star Trek_ set designers,
and they use it not to write Klingon, but purely as a design element.
There are many glyphs that appear on the show that aren't used in the
standard mapping.
But there are people who DO use the characters in the mapping, and use
them to write Klingon.
Post by John Cowan
Post by Jill Ramonsky
The fact is that Klingon language publications,
by and large, use the Romanized transcription presented in The
Klingon Dictionary. This is arguably a chicken-and-egg situation,
but nobody argued that point successfully to the relevant Unicode
committees. /
I don't think for a moment it's a chicken-and-egg situation. Klingon
is written in the Latin script in essentially all running-text (as opposed
to decorative) instances of its use. If it were c-and-e, the script could
still be written by hand -- though it must have the worst ductus of any
script ever devised, and would probably be writable only with the
assistance of a set of rubber stamps.
Not so. OK, yes, it IS so that the script is hideous and if it were a
natural script I'd say it could only have evolved from stamping. But
there *are* people who use it in handwriting, who keep journals in it,
who write notes to one another... (I'm not one of them, in general, but
there are Klingonists who do). I'll see if I can get you some names and
data.

It *is* a c-and-e problem, as I've said just now. We *can't* send email
or make web pages in Klingon: I've tried, and even with Mozilla (a
generally standards-compliant browser) the PUA doesn't work as it ought
to, and if it did it wouldn't matter since the PUA by definition isn't
meant for information interchange.
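The point that the PUA is not meant for interchange can be made concrete: the standard gives Private Use code points the general category `Co` and no character names, so their meaning lives only in outside agreements such as the CSUR. The Python sketch below checks a code point against the three PUA ranges. The Klingon range U+F8D0..U+F8FF is given as an assumption about the CSUR allocation, and `is_private_use` and `describe` are hypothetical helpers.

```python
import unicodedata

# The three Private Use Area ranges defined by the Unicode Standard.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)

# Assumption: the CSUR range for Klingon pIqaD, used here for illustration.
CSUR_KLINGON = (0xF8D0, 0xF8FF)


def is_private_use(cp: int) -> bool:
    """True if cp falls in one of the PUA ranges (hypothetical helper)."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def describe(cp: int) -> str:
    """Report what the standard itself says about a code point."""
    ch = chr(cp)
    # PUA characters have no standard name; only a private agreement
    # (such as the CSUR) says what they mean.
    name = unicodedata.name(ch, "<no standard name>")
    return f"U+{cp:04X} category={unicodedata.category(ch)} name={name}"


if __name__ == "__main__":
    for cp in (0x0061, CSUR_KLINGON[0]):
        print(describe(cp), "private use:", is_private_use(cp))
```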
Post by John Cowan
Post by Jill Ramonsky
It seems to me that if 0x110000 codepoints isn't a big enough space to
fit in the Klingon alphabet (and other alphabets which were similarly
rejected) then we need more codepoints.
That would be true if Klingon had been rejected for lack of space. It
wasn't. It was rejected for inappropriateness in other respects.
None of which have ever made sense to me.
Post by John Cowan
(BTW, Michael, I can't agree that Klingon script is a cipher for Latin.
The mapping is to Klingon phonemes, not Latin letters as such.)
Hebrew's also a cipher to slightly augmented Latin, didn't you know
that? Hebrew scholars for decades/centuries have used a standard
transcription of Hebrew into Latin script plus diacriticals. So the
Hebrew text is also expressible as Latin. Yes, the mapping is to
phonemes, not letters.

~mark



John Cowan
2003-10-17 17:07:05 UTC
Post by Mark E. Shoulson
But
there *are* people who use it in handwriting, who keep journals in it,
who write notes to one another... (I'm not one of them, in general, but
there are Klingonists who do). I'll see if I can get you some names and
data.
This doesn't really meet what Unicode needs. Keeping journals is a personal
use, and writing notes is a point-to-point use. What we need is something
resembling publishing: a book, a journal, a newsletter. Something where
in order to computerize it, people have to agree on the encoding who can't
make person-to-person agreements.
Post by Mark E. Shoulson
It *is* a c-and-e problem, as I've said just now. We *can't* send email
or make web pages in Klingon: I've tried, and even with Mozilla (a
generally standards-compliant browser) the PUA doesn't work as it ought
to, and if it did it wouldn't matter since the PUA by definition isn't
meant for information interchange.
Try looking at http://publish.reutershealth.com/cgi-bin/qapla with Mozilla
Firebird or IE6. Make sure you have the Code2000 font installed.
Post by Mark E. Shoulson
Hebrew's also a cipher to slightly augmented Latin, didn't you know
that?
:-)
--
"Take two turkeys, one goose, four John Cowan
cabbages, but no duck, and mix them http://www.ccil.org/~cowan
together. After one taste, you'll duck ***@reutershealth.com
soup the rest of your life." http://www.reutershealth.com
--Groucho


John Cowan
2003-10-17 18:27:15 UTC
Post by Mark E. Shoulson
Oh. Right. So we just need to have a standardized encoding for Klingon
to get a standardized encoding for Klingon. That seems simple enough.
(and the fact that we already have and use one (or two: the PUA and the
"xifan" coding) doesn't count because... why again?)
No. You have two encodings, now USE them to actually publish stuff.
Start sending out KLI hard copy works in pIqaD. Reprint some books
using it. That's how you build a case for promoting your locally
standardized encoding to a universally standardized one.

Thus spake Kenneth Whistler:

# Demonstration of use in published or otherwise printed material
# is an important criterion. So is demonstration of use in implemented
# software, which may not be the same thing at all. (Some characters
# are invisible units of processing, after all, and may have no direct
# manifestation in print.) So is indication of support by a relevant community
# of users -- sometimes in advance of published use. So is use in textual
# interchange, or conversion, or use in interworking with legacy
# character encodings -- again, sometimes in the absence of printed
# publications. Architectural necessity to solve some problem in
# the encoding is another possible criterion. (In case anybody is
# interested in a bit of trivia, that is how U+0229 LATIN SMALL LETTER
# E WITH CEDILLA got in, by the way.) Preexisting encoding in other
# character encodings, even if badly conceived in the first place,
# is another criterion used for lots of characters.
Post by Mark E. Shoulson
I'm attaching a screenshot of http://www.kli.org/QQ/QQ0202.html?mode=UTF
which SHOULD be a Unicode encoding. This is with Mozilla 1.4 and
Code2000. Even people who can read pIqaD can't read this. The "qapla'"
page works okay, but note that only some letters are affected (that's
okay; English doesn't *really* need its g, m, q, r, and z, right?)
I see the same partial mojibake on Mozilla Firebird/Linux, but the
Windows version seems to work correctly. So does fully patched IE6.
--
"We are lost, lost. No name, no business, no Precious, nothing. Only empty.
Only hungry: yes, we are hungry. A few little fishes, nassty bony little
fishes, for a poor creature, and they say death. So wise they are; so just,
so very just." --Gollum ***@reutershealth.com www.ccil.org/~cowan


Mark E. Shoulson
2003-10-17 19:48:12 UTC
Post by John Cowan
Post by Mark E. Shoulson
I'm attaching a screenshot of http://www.kli.org/QQ/QQ0202.html?mode=UTF
which SHOULD be a Unicode encoding. This is with Mozilla 1.4 and
Code2000. Even people who can read pIqaD can't read this. The "qapla'"
page works okay, but note that only some letters are affected (that's
okay; English doesn't *really* need its g, m, q, r, and z, right?)
I see the same partial mojibake on Mozilla Firebird/Linux, but the
Windows version seems to work correctly. So does fully patched IE6.
I'm pretty sure this is happening because the stylesheet specifies
something non-Code2000, and since Times or whatever has things at those
codepoints, the (r) and such win, while where it's silent, the browser
falls back on Code2000. It worked on your qapla page because you have
an explicit fontface=code2000 there.

(Ah, so why didn't I fix it? Mainly because I only just figured it out.)
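The diagnosis above can be modelled with a simplified per-character fallback rule: for each character, use the first font in the stylesheet's list that has a glyph for it, and fall back to another installed font otherwise. This is an assumption about browser behaviour in general, not Mozilla's actual algorithm, and the coverage table (including which PUA code points a Times-like font covers) is hypothetical.

```python
# Simplified model of per-character font fallback (an assumption, not any
# browser's real code): each character goes to the first font in the
# preference list that covers it, else to the fallback font.

# Hypothetical coverage: a "Times"-like font that happens to have glyphs at
# two PUA code points, and Code2000 covering the whole assumed Klingon range.
FONT_COVERAGE: dict[str, set[int]] = {
    "Times": {0xF8E5, 0xF8E8},
    "Code2000": set(range(0xF8D0, 0xF900)),
}


def pick_fonts(text: str, preference: list[str], fallback: str) -> list[str]:
    """Return the font chosen for each character of text."""
    chosen = []
    for ch in text:
        for font in preference:
            if ord(ch) in FONT_COVERAGE.get(font, set()):
                chosen.append(font)
                break
        else:
            chosen.append(fallback)  # no preferred font covers this character
    return chosen


if __name__ == "__main__":
    text = "\uF8D0\uF8E5\uF8E8"
    # Stylesheet names Times first: two characters get the wrong glyphs.
    print(pick_fonts(text, ["Times"], fallback="Code2000"))
    # Naming Code2000 first keeps every character in Code2000.
    print(pick_fonts(text, ["Code2000", "Times"], fallback="Code2000"))
```

Under this model, naming Code2000 explicitly (as the qapla page does) stops a general-purpose font from claiming PUA code points it uses for unrelated symbols.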

~mark



Mark E. Shoulson
2003-10-17 18:54:24 UTC
Post by John Cowan
Post by Mark E. Shoulson
But
there *are* people who use it in handwriting, who keep journals in it,
who write notes to one another... (I'm not one of them, in general, but
there are Klingonists who do). I'll see if I can get you some names and
data.
This doesn't really meet what Unicode needs. Keeping journals is a personal
use, and writing notes is a point-to-point use. What we need is something
resembling publishing: a book, a journal, a newsletter. Something where
in order to computerize it, people have to agree on the encoding who can't
make person-to-person agreements.
Oh. Right. So we just need to have a standardized encoding for Klingon
to get a standardized encoding for Klingon. That seems simple enough.
(and the fact that we already have and use one (or two: the PUA and the
"xifan" coding) doesn't count because... why again?)
Post by John Cowan
Post by Mark E. Shoulson
It *is* a c-and-e problem, as I've said just now. We *can't* send email
or make web pages in Klingon: I've tried, and even with Mozilla (a
generally standards-compliant browser) the PUA doesn't work as it ought
to, and if it did it wouldn't matter since the PUA by definition isn't
meant for information interchange.
Try looking at http://publish.reutershealth.com/cgi-bin/qapla with Mozilla
Firebird or IE6. Make sure you have the Code2000 font installed.
I'm attaching a screenshot of http://www.kli.org/QQ/QQ0202.html?mode=UTF
which SHOULD be a Unicode encoding. This is with Mozilla 1.4 and
Code2000. Even people who can read pIqaD can't read this. The "qapla'"
page works okay, but note that only some letters are affected (that's
okay; English doesn't *really* need its g, m, q, r, and z, right?)

~mark


Jill Ramonsky
2003-10-17 12:36:39 UTC
-----Original Message-----
Sent: Friday, October 17, 2003 12:44 PM
To: Jill Ramonsky
Subject: Re: Klingons and their allies - Beyond 17 planes
Post by Jill Ramonsky
It seems a simple enough case to argue - EITHER the 0x110000 character
space is amply big enough for everyone, as John Cowan asserts.
Big enough for everyone, but not for everything.
Aha. Then at least we agree on something. An 0x110000 character space is
not big enough for everyTHING.

In that case, I would argue that, in order to provide a big enough
character space for everything, IF twenty-one bits is not enough THEN we
should use more bits. Users of any script, regardless of whether it's
Klingon or anything else, should always be able to get codepoints for
their script. Nobody should ever need to "justify" its use to a
committee. It should suffice to claim "At least two people use it, so we
want codepoints for it". Klingon, at least, /does/ now use space in the
PUA, but of course that's a problem for anyone who doesn't agree on the
particular choice of mapping.

You could argue that that's what the private use area is for. I would
argue that codepoints above 0x10FFFF could be considered as just another
private use area ... only somewhat larger. So large, in fact, that you
need never see a clash, ever.

Jill
John Cowan
2003-10-17 13:35:53 UTC
Post by Jill Ramonsky
Aha. Then at least we agree on something. An 0x110000 character space is
not big enough for everyTHING.
You persist in misunderstanding. Suppose I came along and told you
I wanted to create a Unicode codepoint for each word in every language
on Earth. Would you blithely allocate me a 24-billion-codepoint
private space? And then my friend comes along and wants to do the
same, but he can't use my encoding because he relies on binary
ordering and he needs to get the languages grouped in alphabetical
order, whereas I sort them by language family. Boom, another 24 billion
codepoints gone. Now comes someone else who figures that 64 x 64 resolution
is good enough for representing glyphs, and wants a codepoint for each
possible glyph. That's 2^(64^2) = 2^4096, or
10443888814131525066917527107166243825799642490473837803842334832839
53907971557456848826811934997558340890106714439262837987573438185793
60726323608785136527794595697654370999834036159013438371831442807001
18559462263763188393977127456723346843445866174968079087058037040712
84048740118609114467977783598029006686938976881787785946905630190260
94059957945343282346930302669644305902501597239986771421554169383555
98852914863182379144344967340878118726394964751001890413490084170616
75093668333850551032972088269550769983616369411933015213796825837188
09183365675122131849284636812555022599830041234478486259567449219461
70238065059132456108257318353800876086221028342701976982023131690176
78006675195485079921636419370285375124784014907159135459982790513399
61155179427110683113409058427288427979155484978295432353451706522326
90613949059876930021229633956877828789484406160074129456749198230505
71642377154816321380631045902916136926708342856440730447899971901781
46576347322385026725305989979599609079946920177462481771844986745565
92501783290704731194331655508075682218465717463732968849128195203174
57002440926616910874148385078411929804522981857338977648103126085903
00130241346718972667321649151113160292078173803343609024380470834040
3154190336 more codepoints gone. We aren't going to run out of
integers, of course, but we will quickly run out of money, brains,
and time.
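The glyph-count figure is easy to check with arbitrary-precision integers: a 64 x 64 one-bit bitmap has 4,096 pixels, each on or off, so there are 2^4096 possible glyphs. A short Python check (Python's `int` has no fixed size limit):

```python
# A 64 x 64 bitmap with one bit per pixel has 64 * 64 = 4096 pixels,
# and each pixel is independently on or off.
SIDE = 64
pixels = SIDE * SIDE
glyph_count = 2 ** pixels  # Python ints are arbitrary precision

print(pixels)                  # 4096
print(len(str(glyph_count)))   # 1234
print(str(glyph_count)[:20])   # leading digits of the figure quoted above
```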

Or we can say that the purpose of the Unicode Standard is to encode
characters used for computer (and a fortiori computer-moderated
human) interchange of text.
Post by Jill Ramonsky
In that case, I would argue that, in order to provide a big enough
character space for everything, IF twenty-one bits is not enough THEN we
should use more bits.
21 bits is plenty. Not everything that *can* be fit into that space
should be.
Post by Jill Ramonsky
You could argue that that's what the private use area is for.
Exactly.
Post by Jill Ramonsky
I would
argue that codepoints above 0x10FFFF could be considered as just another
private use area ... only somewhat larger. So large, in fact, that you
need never see a clash, ever.
Only if you are willing to deal with infinite precision integers, as
I do above.
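One concrete obstacle to code points above 0x10FFFF is UTF-16: a surrogate pair carries only 20 bits on top of the BMP, so the largest value it can reach is 0x10000 + 0xFFFFF = 0x10FFFF. The sketch below is a hand-written encoder for illustration only (`to_utf16_units` is a hypothetical name); real code would use a library codec such as Python's `str.encode("utf-16-be")`.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Encode one code point as a list of 16-bit UTF-16 code units."""
    if cp < 0:
        raise ValueError("code points are non-negative")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is a surrogate, not a character")
    if cp < 0x10000:
        return [cp]                        # BMP: a single code unit
    offset = cp - 0x10000                  # what remains must fit in 20 bits
    if offset >= 1 << 20:
        raise ValueError(f"U+{cp:X} is beyond U+10FFFF; no surrogate pair exists")
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low surrogate
    return [high, low]


if __name__ == "__main__":
    print([hex(u) for u in to_utf16_units(0x10FFFF)])  # ['0xdbff', '0xdfff']
    try:
        to_utf16_units(0x110000)
    except ValueError as err:
        print(err)
```

Going past plane 16 would therefore need a change to UTF-16 itself, not merely a larger integer type.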
--
Eric Raymond is the Margaret Mead John Cowan
of the Open Source movement. ***@reutershealth.com
--Lloyd A. Conway, http://www.ccil.org/~cowan
amazon.com review http://www.reutershealth.com


Michael Everson
2003-10-17 13:31:37 UTC
Post by Jill Ramonsky
The reason that the Klingon alphabet is not currently part of
Unicode is that the Klingon Language Institute submitted a proposal
for the Klingon script to the Unicode Consortium,
That isn't true. It was *I* who submitted the proposal.
Post by Jill Ramonsky
and the Unicode consortium rejected it. I have been unable to fathom
their reasons.
It was rejected because the people who read and write Klingon don't
Post by Jill Ramonsky
The fact is that Klingon language publications, by and large, use
the Romanized transcription presented in The Klingon Dictionary.
This is arguably a chicken-and-egg situation, but nobody argued that
point successfully to the relevant Unicode committees.
It is not a chicken-and-egg situation. Were the Klingon Dictionary
reissued without Latin orthography, and were articles in HolQeD
regularly written in Klingon script, one might well take notice. CSUR
gives an encoding which can be used; we have yet to see it being used!
Post by Jill Ramonsky
More codepoints may allow more scripts not to be rejected in the first place.
Space is not why Klingon was not accepted.
--
Michael Everson * * Everson Typography * * http://www.evertype.com


Jill Ramonsky
2003-10-17 13:43:26 UTC
Don't patronise me. I don't misunderstand you, I just disagree with you.
I think you're wrong. Deal with it.

I'm happy to discuss this or any other issue, but ONLY if you drop the
insults.
Jill
-----Original Message-----
Sent: Friday, October 17, 2003 2:36 PM
To: Jill Ramonsky
Subject: Re: Klingons and their allies - Beyond 17 planes
You persist in misunderstanding.
Rick McGowan
2003-10-17 14:56:32 UTC
Jill Ramonsky wrote...
It seems to me that if 0x110000 codepoints isn't a big enough space to fit in
the Klingon alphabet (and other alphabets which were similarly rejected)
then we need more codepoints. Simple as that.
Rejection of Klingon has *absolutely* nothing to do with space. Jill
quoted, but apparently did not *read* a statement (ostensibly from KLI but
apparently only existing in a FAQ from a mail list not on the kli.org
Klingon was rejected, but it failed because its potential users don't use it.
Jill went on to write:

Answer the question "please come up with more than 1 million things that
Every script that ever got rejected by the Unicode Consortium.
Count the rejected characters, please. The number is nowhere near a
million. *SPACE* is simply not the issue in rejection. It has to do with
usage.
In such a system no application need ever be rejected, for any reason.
Inclusion would be automatic for every submission.
Who will write software to keep track of all that? Sorry, but that notion
is economically preposterous. If anyone anytime can make a new character
and have it automatically added to the standard, you don't have a very
stable standard, you have a bunch of competing private uses, and nobody
from one moment to the next has any idea what is actually in the standard
or how it relates to anything else. (The notion is anti-communicative and
entirely against the trend of history which, in all civilizations I know of
in all time periods has tended toward greater standardization, not less.)
The chaos of such a free-for-all would probably end up working itself out
into a series of private agreements among user groups and industry
cartels... Soon someone would get the bright idea of defining a
circumscribed subset so more people could have some hope of communicating.
And then we would be right back in Unicode land.

And as usual this message reflects solely my personal opinion and not that
of anyone else.

Rick


Mark E. Shoulson
2003-10-17 17:07:31 UTC
Post by Rick McGowan
Jill Ramonsky wrote...
It seems to me that if 0x110000 codepoints isn't a big enough space to fit in
the Klingon alphabet (and other alphabets which were similarly rejected)
then we need more codepoints. Simple as that.
Rejection of Klingon has *absolutely* nothing to do with space. Jill
quoted, but apparently did not *read* a statement (ostensibly from KLI but
apparently only existing in a FAQ from a mail list not on the kli.org
Klingon was rejected, but it failed because its potential users don't use it.
See http://ptolemy.tlg.uci.edu/~opoudjis/Klingon/piqad.html for some
discussion on this from Nick Nicholas. He makes some of the same
points: dozens of scripts out there are used more in transliteration
than in native form, by scholars. Who *writes* in Linear B? People
write in transliterations of it.

Note, too, that Klingon's PUA is used in jbovlaste
(http://www.lojban.org/jbovlaste/natlang/listing.html?lang=i-klingon).
Post by Rick McGowan
In such a system no application need ever be rejected, for any reason.
Inclusion would be automatic for every submission.
Who will write software to keep track of all that? Sorry, but that notion
is economically preposterous. If anyone anytime can make a new character
and have it automatically added to the standard, you don't have a very
stable standard, you have a bunch of competing private uses, and nobody
from one moment to the next has any idea what is actually in the standard
or how it relates to anything else. (The notion is anti-communicative and
entirely against the trend of history which, in all civilizations I know of
in all time periods has tended toward greater standardization, not less.)
The chaos of such a free-for-all would probably end up working itself out
into a series of private agreements among user groups and industry
cartels... Soon someone would get the bright idea of defining a
circumscribed subset so more people could have some hope of communicating.
And then we would be right back in Unicode land.
I have to confess, I personally would like to see a way to access the
higher planes somehow, eventually (e.g. hyper-surrogates or something).
But your argument is correct. Basically, an (effectively) infinite
space assignable at will by anyone is what we had already, before
computers were invented. Anyone could make up glyphs, and every writing
system in fact did. But it gets impossible to keep track of all of them.
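For readers wondering where the "0x110000 code points" and "17 planes" in this thread come from: the ceiling is set by UTF-16, which reaches past the Basic Multilingual Plane only through a high surrogate (U+D800..U+DBFF) followed by a low surrogate (U+DC00..U+DFFF). That gives 1024 x 1024 extra code points on top of plane 0. The Python sketch below only illustrates that existing arithmetic. It does not implement the "hyper-surrogates" Mark mentions, which remain purely hypothetical, and the helper names are our own.

```python
# Sketch: why the Unicode code space stops at U+10FFFF (17 planes).
# A supplementary code point is encoded in UTF-16 as a high surrogate
# (U+D800..U+DBFF) followed by a low surrogate (U+DC00..U+DFFF).

HIGH_START, LOW_START = 0xD800, 0xDC00
PLANE_SIZE = 0x10000  # code points per plane


def to_surrogates(cp: int) -> tuple[int, int]:
    """Encode a supplementary code point as a UTF-16 surrogate pair."""
    if not PLANE_SIZE <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:X} is not a supplementary code point")
    offset = cp - PLANE_SIZE  # a 20-bit value, 0..0xFFFFF
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


def from_surrogates(high: int, low: int) -> int:
    """Decode a UTF-16 surrogate pair back to its code point."""
    return PLANE_SIZE + ((high - HIGH_START) << 10) + (low - LOW_START)


# Plane 0 plus every code point a surrogate pair can address.
total = PLANE_SIZE + 1024 * 1024
print(hex(total), total // PLANE_SIZE)  # 0x110000 17

# Cross-check the last code point against Python's own UTF-16 encoder.
cp = 0x10FFFF
hi, lo = to_surrogates(cp)
assert (hi, lo) == (0xDBFF, 0xDFFF)
assert chr(cp).encode("utf-16-be") == hi.to_bytes(2, "big") + lo.to_bytes(2, "big")
assert from_surrogates(hi, lo) == cp
```

Any scheme for going beyond plane 16 would therefore need something other than today's surrogate pairs, which is why the limit is a structural one and not merely a policy choice.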

~mark



John Cowan
2003-10-17 18:52:29 UTC
Who *writes* in Linear B? People write in transliterations of it.
Now they do. But the people who actually did write in LinB, even though
they had no computers (abacuses, maybe), were in fact publishing what
they wrote.
Note, too, that Klingon's PUA is used in jbovlaste
(http://www.lojban.org/jbovlaste/natlang/listing.html?lang=i-klingon).
The President of the LLG takes official notice of this. :-)
--
John Cowan ***@reutershealth.com
http://www.reutershealth.com http://www.ccil.org/~cowan
Humpty Dump Dublin squeaks through his norse
Humpty Dump Dublin hath a horrible vorse
But for all his kinks English / And his irismanx brogues
Humpty Dump Dublin's grandada of all rogues. --Cousin James


Peter Constable
2003-10-17 15:37:12 UTC
From: unicode-***@unicode.org [mailto:unicode-***@unicode.org] On Behalf Of Jill Ramonsky
The fact is that Klingon language publications, by and large, use the
Romanized transcription presented in The Klingon Dictionary. This is
arguably a chicken-and-egg situation
No, it's not. People have been creating documents for scripts that are
not supported in any industry standard for years.


Peter



Peter Constable
2003-10-17 15:41:45 UTC
-----Original Message-----
On Behalf Of Mark E. Shoulson
Doesn't *everything* take time from other proposals?
You mean, discussions like this?


Peter

Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division



Mark E. Shoulson
2003-10-17 17:10:40 UTC
That's what I mean. We'd better shut down the list.

~mark
Post by Peter Constable
-----Original Message-----
On Behalf Of Mark E. Shoulson
Doesn't *everything* take time from other proposals?
You mean, discussions like this?
Peter
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
Nick Nicholas
2003-10-17 16:12:16 UTC
On Saturday, Oct 18, 2003, at 02:04 Australia/Melbourne, Nick Nicholas
Date: Fri, 17 Oct 2003 11:00:45 +0100
Subject: Klingons and their allies - Beyond 17 planes
The reason that the Klingon alphabet is not currently part of Unicode is
that the Klingon Language Institute submitted a proposal for the Klingon
script to the Unicode Consortium, and the Unicode consortium rejected
it. I have been unable to fathom their reasons.
The relevant debates are linked on my page at
http://www.tlg.uci.edu/~opoudjis/Klingon/piqad.html . And it's quite
obvious what the real reasons were: the feeling that Klingon would
bring Unicode into disrepute. (How many newspaper and web articles on
Unicode in the late '90s included the phrase "even Klingon"?) And my
impression was that feeling was even stronger in the ISO than the UTC.
But hey, I wasn't there...
[I should stress at this point the Klingon script /is/ used by the
peoples of the Earth, right here in the 21st century]. Here's what the
I think the KLI's summary is fair. (After all, I contributed to it. :-)
Date: Fri, 17 Oct 2003 07:43:49 -0400
Subject: Re: Klingons and their allies - Beyond 17 planes
Post by Jill Ramonsky
It seems a simple enough case to argue - EITHER the 0x110000 character
space is amply big enough for everyone, as John Cowan asserts.
Big enough for everyone, but not for everything. Encoding Klingon has
a cost beyond the allocation of codepoints: proposals must be written
(taking time away from other proposals that need to be written),
committees must deliberate, facts must be checked. Most of that work had
already been done for Klingon, as it's a dirt-simple script, much more so
than Latin, to say nothing of Hebrew. But it's a precedent.
Not a compelling one, though, and Michael in his time has repudiated
it: http://groups.yahoo.com/group/unicode/message/11541 .
Post by Jill Ramonsky
The fact is that Klingon language publications,
by and large, use the Romanized transcription presented in The
Klingon Dictionary. This is arguably a chicken-and-egg situation,
but nobody argued that point successfully to the relevant Unicode
committees. /
I don't think for a moment it's a chicken-and-egg situation.
I'm with Mark on this one, John: if it were official, you would see
more pIqaD online. By no means most Klingon, but I do think more.
Klingon is written in the Latin script in essentially all running-text
(as opposed to decorative) instances of its use. If it were c-and-e, the
script could still be written by hand -- though it must have the worst
ductus of any script ever devised, and would probably be writable only
with the assistance of a set of rubber stamps.
Or by proprietary font (which has happened, as you will find by
searching for XIFAN online, but not much), or by the PUA.
Date: Fri, 17 Oct 2003 13:36:39 +0100
Subject: RE: Klingons and their allies - Beyond 17 planes
In that case, I would argue that, in order to provide a big enough
character space for everything, IF twenty-one bits is not enough THEN we
should use more bits. Users of any script, regardless of whether it's
Klingon or anything else, should always be able to get codepoints for
their script. Nobody should ever need to "justify" its use to a
committee. It should suffice to claim "At least two people use it, so we
want codepoints for it". Klingon, at least, /does/ now use space in the
PUA, but of course that's a problem for anyone who doesn't agree on the
particular choice of mapping.
John is right that this is a slippery slope; any two people can be
boneheads. That said, I think the voting down of pIqaD was stuffy and
pointless; like I say on my pIqaD site, "Personally, I do not regard
pIqaD as more or less frivolous than Tengwar—or for that matter
Meroitic" (since, as Bunz has often argued, specialists on ancient
languages only ever work in transliteration, so the scholarly market
won't use them all that much). But it's done, and both the KLI and I
built ourselves a bridge and got over it. People on the KLI list did in
fact comment at the time that the "semi-private" assignment of Klingon
in the PUA (and there has been more than one font using that
assignment) was a good outcome.
Date: Fri, 17 Oct 2003 06:31:37 -0700
Subject: Re: Klingons and their allies - Beyond 17 planes
It was rejected because the people who read and write Klingon don't
Post by Jill Ramonsky
The fact is that Klingon language publications, by and large, use
the Romanized transcription presented in The Klingon Dictionary.
This is arguably a chicken-and-egg situation, but nobody argued that
point successfully to the relevant Unicode committees.
It is not a chicken-and-egg situation. Were the Klingon Dictionary
reissued without Latin orthography, and were articles in HolQeD
regularly written in Klingon script, one might well take notice. CSUR
gives an encoding which can be used; we have yet to see it being used!
http://www.lojban.org/jbovlaste/natlang/listing.html?lang=i-klingon .
Which is not even a Klingonist site. You overstate our case: the people
who read and write Klingon *rarely* actually use the script. Yes, its
most prominent venue is on the commemorative T-shirts at the KLI annual
conventions, but it's not like no one there is able to nut out what they
mean --- or design them in the first place. One might argue that
T-shirts and Lojban online dictionary forms are not enough of a
precedent for assigning a block; but I'm not sure how well pIqaD would
compare in usage past or present to, say, the Elbasan script.

Ah well. Best get back to Classical Greek before we start seeing more
well-considered and judicious posts like
http://groups.yahoo.com/group/unicode/message/5552 ...

--
Nick Nicholas, French/Italian, University of Melbourne
***@unimelb.edu.au
http://www.opoudjis.net

"And slowly bowed the grieving breast, / like a nightingale that on a
spring night / was choked, alas, in the very hour it sang, / amid the
fragrances and the flowering brambles."
-- N. Kazantzakis, Tertsines: Christ



Peter Kirk
2003-10-17 17:41:25 UTC
... or for that matter Meroitic" (since, as Bunz has often argued,
specialists on ancient languages only ever work in transliteration, so
the scholarly market won't use them all that much). ...
That is not true of all ancient scripts. Some, like biblical Hebrew, are
still in regular use by specialists as well as non-specialists. And, as
Bunz also argues in http://www.unicode.org/notes/tn3/bunz-iuc17pap.pdf,
other ancient scripts, while not much used by specialists for their
actual discussions, are nevertheless used quite widely in tutorial
materials and in materials prepared for the general public, e.g. popular
historical science, encyclopedias, etc.

So there is a real need for Unicode to encode and standardise these
ancient scripts - or at least some of them, based on the other criteria
which Bunz explores. Fortunately, we have been assured several times,
there is plenty of room for them in the 17 planes, and for Klingon etc.
if we want them.
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/




Nick Nicholas
2003-10-18 02:26:00 UTC
On Saturday, Oct 18, 2003, at 03:41 Australia/Melbourne, Peter Kirk
Post by Peter Kirk
... or for that matter Meroitic" (since, as Bunz has often argued,
specialists on ancient languages only ever work in transliteration,
so the scholarly market won't use them all that much). ...
That is not true of all ancient scripts. Some, like biblical Hebrew,
are still in regular use by specialists as well as non-specialists.
Ah, no fair. I don't regard Hebrew as archaic, since it continues in
use; the more so since the biblical Hebrew in current use is
typographically 9th century AD --- as is for that matter the Greek in
current use. And of course, as Elaine has pointed out to us, other
archaic Semitic scripts get transliterated *into* Hebrew.
Post by Peter Kirk
And, as Bunz also argues in
http://www.unicode.org/notes/tn3/bunz-iuc17pap.pdf, other ancient
scripts, while not much used by specialists for their actual
discussions, are nevertheless used quite widely in tutorial
materials and in materials prepared for the general public, e.g.
popular historical science, encyclopedias, etc.
Weeell, yes, but if we're talking raw amount of text appearing in
original script and in transliteration, transliteration almost always
wins, and by an appreciable margin. Furthermore, transliteration
doesn't force you to make determinations on what is emic and what is
etic. And this explains why specialists in Egyptian hieroglyphics and
cuneiform are reluctant to do anything with them.

I'm not disputing that there's space for archaic scripts, of course
(it's all been roadmapped already, after all); nor even that encoding
them (if feasible) is a noble and worthy effort. I have after all been
involved in preparing such proposals for Archaic and Hellenistic Greek.
I'm just pointing out that, in terms of both actual usage and
probability of a proposal gelling --- not to mention a user community
who would like to have the encoding there --- Klingon does not compare
unfavourably with Meroitic, and I think the "disrepute" consideration
was as much a consideration as the "actual use". The proof for or
against, I guess, will come if Cirth gets into Unicode.

But this issue has provoked grumbles in the past from UTC members ---
particularly when someone asked for explicit criteria on including
scripts; so like I say, whatever. Klingon's in the PUA, and that's OK.
If software can't cope with the PUA, that *is* defeating the purpose of
the PUA (two people can and should be allowed to exchange data in it by
agreement, they just shouldn't expect everyone else to subscribe to
that agreement). Unfortunately there's no block of codepoints in PUA
pushing its incorporation into software, the way Cantonese did for the
Astral Planes; but it is still a misfeature, and it is appropriate for
Mark to complain about it. Though I think the solution is to fix PUA
support, not to bring Klingon up to the ISO again...
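Nick's description of the PUA can be made concrete. Software only has to preserve private-use code points intact; what they *mean* comes from an agreement between the correspondents, such as the CSUR Klingon assignment mentioned earlier in the thread. The sketch below is in Python (chosen for illustration). The three ranges are the Private Use Areas defined by the Unicode Standard. The `AGREEMENT` table and the `describe` helper are hypothetical, invented for this example, and deliberately do **not** reproduce the CSUR assignment.

```python
import unicodedata

# Private Use ranges defined by the Unicode Standard.
PUA_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]


def is_private_use(cp: int) -> bool:
    """True if the code point lies in one of the Private Use Areas."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


# Hypothetical private agreement between two correspondents.
# These assignments are invented for illustration; they are NOT the CSUR table.
AGREEMENT = {0xE000: "letter a", 0xE001: "letter b"}


def describe(text: str, agreement: dict[int, str]) -> list[str]:
    """Name each character, using the private agreement for PUA code points."""
    out = []
    for ch in text:
        cp = ord(ch)
        if is_private_use(cp):
            # Meaning comes only from the agreement. Without it the character
            # is opaque, but it is still valid data, not an error.
            out.append(agreement.get(cp, f"U+{cp:04X} (unknown private use)"))
        else:
            out.append(unicodedata.name(ch, f"U+{cp:04X}"))
    return out


print(describe("\uE000\uE002A", AGREEMENT))
# ['letter a', 'U+E002 (unknown private use)', 'LATIN CAPITAL LETTER A']
```

A recipient who does not share the agreement still receives the same code points unchanged. Only the interpretation (and, as Michael notes, the choice of font) is left to the parties who agreed on it.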

--------------------
=================================----------------------
Dr Nick Nicholas. Unimelb, Aus. ***@unimelb.edu.au;
www.opoudjis.net
"Electronic editors have to live in hope: hope that the long-awaited
standards for encoding texts for the computer will arrive; hope that they
will be workable; hope that software will appear to handle these texts;
hope that all the scholars of the world will have computers which can
drive the software (which does not yet exist) to handle the texts (which
have not yet been made) encoded in standard computer markup (which has not
yet been devised). To hope for all this requires a considerable belief in
the inevitability of progress and in the essential goodness of mankind."
(Peter M.W. Robinson)




Doug Ewell
2003-10-18 05:57:58 UTC
Post by Nick Nicholas
If software can't cope with the PUA, that *is* defeating the purpose
of the PUA (two people can and should be allowed to exchange data in
it by agreement, they just shouldn't expect everyone else to
subscribe to that agreement).
The best explanation of the PUA I've heard in a long time. If Nick's
words aren't the absolute truth, a lot of assumptions I've been making
about the PUA for the past 7 years are blown out of the water.

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/




Tom Gewecke
2003-10-18 13:41:36 UTC
Post by Doug Ewell
Post by Nick Nicholas
If software can't cope with the PUA, that *is* defeating the purpose
of the PUA (two people can and should be allowed to exchange data in
it by agreement, they just shouldn't expect everyone else to
subscribe to that agreement).
The best explanation of the PUA I've heard in a long time. If Nick's
words aren't the absolute truth, a lot of assumptions I've been making
about the PUA for the past 7 years are blown out of the water.
It seems to me that modern software is perfectly able to "cope" with the
PUA. Using OS X I can read and write email and web pages in Cirth,
Tengwar, and Klingon perfectly well. The problems mentioned earlier in
this thread disappear if one uses correct html/css for websites and uses
html mail rather than plain text with the Mozilla mail client, which
otherwise won't let you choose the font for incoming mail.





Doug Ewell
2003-10-18 15:42:23 UTC
The problems mentioned earlier in this thread disappear if one uses
correct html/css for websites and uses html mail rather than plain
text with the Mozilla mail client, which otherwise won't let you
choose the font for incoming mail.
For all the criticisms that people love to fling toward Outlook Express,
it does let me choose the font for incoming mail. The e-mail messages I
received in Ewellic were plain text.

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/



Peter Kirk
2003-10-18 17:53:02 UTC
Post by Doug Ewell
The problems mentioned earlier in this thread disappear if one uses
correct html/css for websites and uses html mail rather than plain
text with the Mozilla mail client, which otherwise won't let you
choose the font for incoming mail.
For all the criticisms that people love to fling toward Outlook Express,
it does let me choose the font for incoming mail. The e-mail messages I
received in Ewellic were plain text.
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
There definitely seems to be a bug in Mozilla in this area. See for
example http://bugzilla.mozilla.org/show_bug.cgi?id=201695 (still
unconfirmed - it seems that Mozilla bug reports in this area don't even
get looked at in six months) and
http://bugzilla.mozilla.org/show_bug.cgi?id=26182 (where they claim the
problem was fixed three years ago, but it has reappeared). Maybe there
is a fix by setting some user preferences in a special file, but there
doesn't seem to be a fix in the UI.

I'll file a new bug - done, see
http://bugzilla.mozilla.org/show_bug.cgi?id=222777. It will be
interesting to see what response I get, if any.
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/




Peter Kirk
2003-10-21 19:14:30 UTC
Post by Peter Kirk
Post by Doug Ewell
The problems mentioned earlier in this thread disappear if one uses
correct html/css for websites and uses html mail rather than plain
text with the Mozilla mail client, which otherwise won't let you
choose the font for incoming mail.
For all the criticisms that people love to fling toward Outlook Express,
it does let me choose the font for incoming mail. The e-mail messages I
received in Ewellic were plain text.
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
There definitely seems to be a bug in Mozilla in this area. See for
example http://bugzilla.mozilla.org/show_bug.cgi?id=201695 (still
unconfirmed - it seems that Mozilla bug reports in this area don't
even get looked at in six months) and
http://bugzilla.mozilla.org/show_bug.cgi?id=26182 (where they claim
the problem was fixed three years ago, but it has reappeared). Maybe
there is a fix by setting some user preferences in a special file, but
there doesn't seem to be a fix in the UI.
I'll file a new bug - done, see
http://bugzilla.mozilla.org/show_bug.cgi?id=222777. It will be
interesting to see what response I get, if any.
It turns out that my bug is a duplicate of
http://bugzilla.mozilla.org/show_bug.cgi?id=91190 and work is in
progress on fixing it. Meanwhile there is a workaround. UTF-8 plain text
messages, and web pages, are displayed with the default font for the
system locale. So change the fonts for your system locale, in my case
"western", and your plain text UTF-8 messages will show up in that font
- even with UTF-8 codes outside your system code page. With this
approach (see comments 34 and 36 to bug 91190) I can even read Mark
Davis' signature - that is, it appears correctly, I'd love to know what
it means!
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/




Mark Davis
2003-10-21 20:02:13 UTC
Post by Peter Kirk
I can even read Mark
Davis' signature - that is, it appears correctly, I'd love to know what
it means!
शिष्यादिच्छेत्पराजयम्

shiSyAdicchetparAjayam

shiSyAt ‘from the student’

icchet ‘one should desire’

parAjayam ‘defeat’

‘A teacher should wish to be defeated by his own student in scholarship’


I got this from Peri Bhaskararao on our recent trip to India. I had said
something reminiscent of this in a talk, and he let me know of this saying --
which I liked -- then sent the details to me later.

Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄

Michael Everson
2003-10-18 15:57:22 UTC
Permalink
Post by Nick Nicholas
If software can't cope with the PUA, that *is* defeating the purpose
of the PUA (two people can and should be allowed to exchange data in
it by agreement, they just shouldn't expect everyone else to
subscribe to that agreement). Unfortunately there's no block of
codepoints in PUA pushing its incorporation into software, the way
Cantonese did for the Astral Planes; but it is still a misfeature,
and it is appropriate for Mark to complain about it.
As far as I understand, the problem is only one of fonts: an OS
makes assumptions about which font should be used to display a PUA
character.
Post by Nick Nicholas
Though I think the solution is to fix PUA support, not to bring
Klingon up to the ISO again...
In time, if Klingon demonstrates actual use, it could be
reconsidered. And I mean in time, and substantial use. But there's
nothing that says that it, or indeed the Phaistos Disc, are banned
forever.
--
Michael Everson * * Everson Typography * * http://www.evertype.com


Marco Cimarosti
2003-10-17 16:42:42 UTC
Post by John Cowan
You persist in misunderstanding. Suppose I came along and told you
I wanted to create a Unicode codepoint for each word in every language
on Earth. Would you blithely allocate me a 24-billion-codepoint
private space?
Why? 200 million should be more than enough: that's more than 30,000 words
for each living language.

Of course, you should only encode abstract words, such as <ENGLISH VERB
JOKE>, and combining morphemes such as <ENGLISH COMBINING INFLECTION PAST
TENSE>, <ENGLISH COMBINING INFLECTION PRESENT PARTICIPLE>, etc.

It will be the task of the uttering engine to utter a sequence like <ENGLISH
VERB JOKE> + <ENGLISH COMBINING INFLECTION PRESENT PARTICIPLE> with the
ligature "joking". Of course, this will only happen with OpenLex-enabled
uttering engines: naive uttering engines based on the old TrueLex would render
with the fallback uttering "joke -ing".

To make it more interesting, you could also encode a few useless
compatibility presentation inflected forms, such as <ENGLISH VERB SPEAK PAST
TENSE FORM>, which will get decomposed to <ENGLISH VERB SPEAK> + <ENGLISH
COMBINING INFLECTION PAST TENSE>, and finally be rendered as "spoke",
"speaked" or "speak -ed", depending on the platform.

Notice that a few words will need contextual forms, such as <ENGLISH
INDETERMINATE ARTICLE>, which will display as "a" or "an" depending on the
following code point.

Languages, such as Swahili, which use prefixes instead of suffixes will be
encoded in "logical order", i.e. with the combining prefix after the root.
It will be the task of the uttering engine to reorder the prefix. E.g., the
Swahili word "watu" (plural of "mtu" = "man") will be encoded as <SWAHILI
NOUN TU> + <SWAHILI COMBINING INFLECTION PLURAL FOR PEOPLE> and, in theory,
it will be rendered as "watu". In practice, it will always be rendered as
"-tu wa-" because no one will invest in implementing Swahili rendering.

Ciao.
Marco





Patrick Andries
2003-10-17 17:18:35 UTC
----- Original Message -----
Post by Marco Cimarosti
Languages, such as Swahili, which use prefixes instead of suffixes will be
encoded in "logical order", i.e. with the combining prefix after the root.
It will be the task of the uttering engine to reorder the prefix. E.g., the
Swahili word "watu" (plural of "mtu" = "man") will be encoded as <SWAHILI
NOUN TU> + <SWAHILI COMBINING INFLECTION PLURAL FOR PEOPLE> and, in theory,
it will be rendered as "watu". In practice, it will always be rendered as
"-tu wa-" because no one will invest in implementing Swahili rendering.
I believe there is a strong case for Bantu character unification here; we
don't want to have twenty or so class characters for each Bantu language or
dialect. A new Rapporteur Group?

Incidentally, I notice that in my Meillet and Cohen the b of the class 8
prefix bî (the plural of ki) has a horizontal stroke across the lower stem of
the b. I also noticed this in a transcription of a Mayotte language. Meillet
and Cohen also use it to write the class 2 and class 8 prefixes in Herero:
ob-a and ib-i respectively.

How is that character coded in Unicode?

- o - 0 - o
Unicode et ISO 10646 en français
http://pages.infinit.net/hapax








John Cowan
2003-10-17 17:57:44 UTC
Post by Patrick Andries
Incidentally, I notice that in my Meillet and Cohen the b of the class 8
prefix bî (the plural of ki) has a horizontal stroke across the lower stem of
the b. I also noticed this in a transcription of a Mayotte language. Meillet
and Cohen also use it to write the class 2 and class 8 prefixes in Herero:
ob-a and ib-i respectively.
It's probably a glyphic variant of U+0180.
--
Only do what only you can do. John Cowan <***@reutershealth.com>
--Edsger W. Dijkstra, http://www.reutershealth.com
deceased 6 August 2002 http://www.ccil.org/~cowan


Patrick Andries
2003-10-17 18:42:36 UTC
----- Original Message -----
Post by John Cowan
Post by Patrick Andries
Incidentally, I notice that in my Meillet and Cohen the b of the class 8
prefix bî (the plural of ki) has a horizontal stroke across the lower stem of
the b. I also noticed this in a transcription of a Mayotte language. Meillet
and Cohen also use it to write the class 2 and class 8 prefixes in Herero:
ob-a and ib-i respectively.
It's probably a glyphic variant of U+0180.
In the case of the Malagasy writing system (not Mayotte, sorry), someone
suggested to me that it may be phonetically equivalent to U+0253. The sample I
have is « Kib-osi kimaôre en orthographe malgache » ("Kib-osi kimaôre in
Malagasy orthography"), where Kib-osi kimaôre means "the Malagasy of Mayotte"
(Mayotte being a French island off Madagascar). I will check the actual value
and usage of this b in Malagasy.




P. A.









Patrick Andries
2003-10-21 16:11:42 UTC
Post by Patrick Andries
In the case of the Malagasy writing system (not Mayotte, sorry), someone
suggested to me that it may be phonetically equivalent to U+0253. The sample I
have is « Kib-osi kimaôre en orthographe malgache » ("Kib-osi kimaôre in
Malagasy orthography"), where Kib-osi kimaôre means "the Malagasy of Mayotte"
(Mayotte being a French island off Madagascar). I will check the actual value
and usage of this b in Malagasy.
I got some news from the author of this notation. He tells me the barred b and d (with a bar across the lower part of the vertical stroke) are glyphic variants of the implosives « ɓ » U+0253 and « ɗ » U+0257. The barred forms were chosen for pragmatic reasons: if necessary, they can easily be typed as « b » and « d ». A few books have been printed using the notation.


(Full message in French on Unicode-Afrique)
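If texts in this notation ever need converting to the implosive letters, the mapping the author describes could be applied mechanically. A minimal Python sketch, assuming the barred letters were keyed in as U+0180 (b with stroke, the character John suggested) and U+0111 (d with stroke); neither character's usual glyph puts the bar on the lower stem, so both are only approximations, and the table name and function are hypothetical. Only lowercase letters are handled.

```python
import unicodedata

# HYPOTHETICAL normalization table for the Mayotte Malagasy notation.
# Per the notation's author, barred b and d are glyph variants of the
# implosives U+0253 and U+0257. Encoding the barred letters as U+0180 and
# U+0111 is an ASSUMPTION: they are just the closest existing characters.
BARRED_TO_IMPLOSIVE = str.maketrans({
    "\u0180": "\u0253",  # LATIN SMALL LETTER B WITH STROKE -> B WITH HOOK
    "\u0111": "\u0257",  # LATIN SMALL LETTER D WITH STROKE -> D WITH HOOK
})


def to_implosives(text: str) -> str:
    """Replace the barred stand-ins with the implosive letters they represent."""
    return text.translate(BARRED_TO_IMPLOSIVE)


if __name__ == "__main__":
    # Show the standard character names of the four letters involved.
    for ch in "\u0180\u0111\u0253\u0257":
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print(to_implosives("ki\u0180osi"))
```

Because the barred forms are described as glyph variants rather than distinct letters, a one-way replacement like this loses nothing; the reverse direction is only a display choice.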
Peter Kirk
2003-10-17 19:05:58 UTC
Post by Patrick Andries
----- Original Message -----
Post by Marco Cimarosti
Languages, such as Swahili, which use prefixes instead of suffixes will be
encoded in "logical order", i.e. with the combining prefix after the root.
It will be the task of the uttering engine to reorder the prefix. E.g., the
Swahili word "watu" (plural of "mtu" = "man") will be encoded as <SWAHILI
NOUN TU> + <SWAHILI COMBINING INFLECTION PLURAL FOR PEOPLE> and, in theory,
it will be rendered as "watu". In practice, it will always be rendered as
"-tu wa-" because no one will invest in implementing Swahili rendering.
I believe there is a strong case for Bantu character unification here; we
don't want to have twenty or so class characters for each Bantu language or
dialect. A new Rapporteur Group?
If so, isn't there a case also for unification of Germanic languages? It
is clear that at this level many words are common between English,
German, Dutch, Swedish etc. although pronounced differently in the
different languages. Even the detail that Swedish definite articles are
suffixes can be dealt with by OpenLex as a rendering detail. And there
is the precedent of the unification of CJK characters across Chinese
"dialects" whose pronunciation differs just as widely. The scheme could
perhaps be extended across more of the Indo-European language family.

:-)
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/




John Cowan
2003-10-17 17:33:08 UTC
Post by Marco Cimarosti
Why? 200 million should be more than enough: that's more than 30,000 words
for each living language.
The Oxford English Dictionary has almost 10 times that many main entries.
And if we want to record every obvious derivative, 4 million words (times
6000 languages) seems a reasonable upper bound. Granted, English has
a fat vocabulary, but let's think big here.
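The figures traded here are easy to check against the 17-plane code space. A minimal Python sketch; the 6000-language count is the round figure used in this thread, not an authoritative number.

```python
# Sizes discussed in this thread, compared with the Unicode code space.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000          # 65,536 code points per plane
LIVING_LANGUAGES = 6000                  # round figure used in the thread

unicode_code_space = PLANES * CODE_POINTS_PER_PLANE   # equals 0x110000

# Marco's figure: 200 million word code points shared by all languages.
words_per_language = 200_000_000 // LIVING_LANGUAGES

# John's upper bound: 4 million words (with derivatives) per language.
word_code_space = 4_000_000 * LIVING_LANGUAGES

print(f"17-plane code space:    {unicode_code_space:,}")
print(f"200 million / 6000:     {words_per_language:,} words per language")
print(f"4 million x 6000:       {word_code_space:,} code points")
print(f"Ratio to the 17 planes: about {word_code_space // unicode_code_space:,}x")
```

With these inputs, 200 million spread over 6000 languages gives 33,333 words each, and John's bound comes to 24 billion, which is the 24-billion-codepoint private space mentioned earlier and more than 21,000 times the 1,114,112 code points of the 17 planes.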
Post by Marco Cimarosti
In practice, it will always be rendered as
"-tu wa-" because no one will invest in implementing Swahili rendering.
How right you are.
--
John Cowan <***@reutershealth.com> http://www.reutershealth.com
I amar prestar aen, han mathon ne nen, http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith. --Galadriel, _LOTR:FOTR_


Philippe Verdy
2003-10-17 21:19:40 UTC
Post by John Cowan
Post by Marco Cimarosti
Why? 200 million should be more than enough: that's more than 30,000 words
for each living language.
The Oxford English Dictionary has almost 10 times that many main entries.
And if we want to record every obvious derivative, 4 million words (times
6000 languages) seems a reasonable upper bound. Granted, English has
a fat vocabulary, but let's think big here.
Post by Marco Cimarosti
In practice, it will always be rendered as
"-tu wa-" because no one will invest in implementing Swahili rendering.
Isn't most of that work already implemented for Indic scripts that require
glyph reordering? Or do you mean the complexity of the work needed to
create the glyph reordering tables (for logical to visual order)?

Isn't the Indic system now flexible enough to encode Bantu & Swahili
languages? If not, that's a shame, because Swahili is one of the most widely
spoken languages in the world (with millions of speakers), ahead of many
regional European languages that are fully encoded and supported in
Unicode, and publishing in it urgently needs to be made easier, to
preserve its associated culture.



John Cowan
2003-10-17 22:18:36 UTC
Post by Philippe Verdy
Isn't most of that work already implemented for Indic scripts that require
glyph reordering? Or do you mean the complexity of the work needed to
create the glyph reordering tables (for logical to visual order)?
Relax, it's a joke.
Post by Philippe Verdy
Isn't the Indic system now flexible enough to encode Bantu & Swahili
languages? If not, that's a shame, because Swahili is one of the most widely
spoken languages in the world (with millions of speakers), ahead of many
regional European languages that are fully encoded and supported in
Unicode, and publishing in it urgently needs to be made easier, to
preserve its associated culture.
Swahili uses Latin script, and of course it is representable in Unicode.
--
John Cowan ***@reutershealth.com www.ccil.org/~cowan www.reutershealth.com
"If I have seen farther than others, it is because I was standing on
the shoulders of giants."
--Isaac Newton


Peter Kirk
2003-10-17 23:05:31 UTC
Post by Philippe Verdy
Post by John Cowan
Post by Marco Cimarosti
Why? 200 million should be more than enough: that's more than 30,000 words
for each living language.
The Oxford English Dictionary has almost 10 times that many main entries.
And if we want to record every obvious derivative, 4 million words (times
6000 languages) seems a reasonable upper bound. Granted, English has
a fat vocabulary, but let's think big here.
Post by Marco Cimarosti
In practice, it will always be rendered as
"-tu wa-" because no one will invest in implementing Swahili rendering.
Isn't most of that work already implemented for Indic scripts that require
glyph reordering? Or do you mean the complexity of the work needed to
create the glyph reordering tables (for logical to visual order)?
Isn't the Indic system now flexible enough to encode Bantu & Swahili
languages? If not, that's a shame, because Swahili is one of the most widely
spoken languages in the world (with millions of speakers), ahead of many
regional European languages that are fully encoded and supported in
Unicode, and publishing in it urgently needs to be made easier, to
preserve its associated culture.
Are we talking about a real non-Latin script, some kind of syllabary or
logographic script, for Swahili and other Bantu languages? If so, I have
never heard of one and I have not seen it roadmapped. The Latin script
in common use certainly doesn't require complex ordering behaviour,
although for some Bantu languages characters outside ISO-8859-1 may be
required.

Or did someone not notice that Marco's comments were about the word "joke"?
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/




j***@att.net
2003-10-18 03:30:30 UTC
.
Rick McGowan wrote,
Post by Rick McGowan
Rejection of Klingon has *absolutely* nothing to do with space.
...
*SPACE* is simply not the issue in rejection.
"Space... the final frontier."
Post by Rick McGowan
Answer the question "please come up with more than 1 million
things that need to be encoded?" ...
Easy. Just start with the Chinese grass radical variants...
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2326.pdf

Best regards,

James Kass
.


j***@att.net
2003-10-18 23:52:20 UTC
.
Tom Gewecke wrote,
Post by Tom Gewecke
It seems to me that modern software is perfectly able to "cope" with the
PUA. Using OS X I can read and write email and web pages in Cirth,
Tengwar, and Klingon perfectly well. The problems mentioned earlier in
this thread disappear if one uses correct html/css for websites and uses
html mail rather than plain text with the Mozilla mail client, which
otherwise won't let you choose the font for incoming mail.
In addition to the problem of the OS substituting improper glyphs
from inappropriate fonts unexpectedly, there's often a problem with
line breaking.

Since the PUA has no properties, some applications seem to ignore the
space character and break lines arbitrarily, splitting words in the
middle.
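The symptom can be reproduced with a toy model (not an implementation of the UAX #14 line-breaking algorithm). A minimal Python sketch, assuming three made-up "words" of BMP PUA characters separated by ordinary spaces: a wrapper that breaks only at spaces keeps each word intact, while one that allows a break after any character, as is normal for Han text, splits words in the middle. Both function names are hypothetical.

```python
from __future__ import annotations

# Toy model of the symptom, NOT an implementation of UAX #14 line breaking.


def wrap_at_spaces(text: str, width: int) -> list[str]:
    """Break only at U+0020 SPACE, as for an ordinary alphabetic script."""
    lines: list[str] = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate           # word still fits (or line is empty)
        else:
            lines.append(current)         # start a new line with this word
            current = word
    if current:
        lines.append(current)
    return lines


def wrap_anywhere(text: str, width: int) -> list[str]:
    """Allow a break after any character, as is normal for Han text."""
    lines: list[str] = []
    while text:
        lines.append(text[:width].strip(" "))
        text = text[width:]
    return lines


if __name__ == "__main__":
    # Three made-up "words", each three BMP PUA characters long.
    words = ["\uE000\uE001\uE002", "\uE003\uE004\uE005", "\uE006\uE007\uE008"]
    text = " ".join(words)
    print([len(line) for line in wrap_at_spaces(text, 5)])   # whole words
    print([len(line) for line in wrap_anywhere(text, 5)])    # words split
```

An application only knows which of the two behaviours is right if it knows what kind of script the PUA characters belong to, which is exactly the information the PUA does not carry.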

Best regards,

James Kass
.


Thomas Chan
2003-10-19 02:04:37 UTC
Post by j***@att.net
In addition to the problem of the OS substituting improper glyphs
from inappropriate fonts unexpectedly, there's often a problem with
line breaking.
Since the PUA has no properties, some applications seem to ignore the
space character and break lines arbitrarily, splitting words in the
middle.
No properties, or Han properties?


Thomas Chan
***@cornell.edu



Doug Ewell
2003-10-19 18:32:34 UTC
Post by j***@att.net
In addition to the problem of the OS substituting improper glyphs
from inappropriate fonts unexpectedly, there's often a problem with
line breaking.
Since the PUA has no properties, some applications seem to ignore the
space character and break lines arbitrarily, splitting words in the
middle.
That's exactly what happens in my sample pages. I didn't think it was
because the PUA had "no" properties so much as "default" properties,
which (as Thomas Chan indicated) might be Han-based or Han-influenced.
You can always switch to a font that will display glyphs for your PUA
characters, but it's harder to adapt a rendering engine to observe PUA
character properties.
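The "default properties" point can be checked with Python's standard `unicodedata` module, which exposes part of the Unicode Character Database (General_Category and character names, but not the Line_Break property). A minimal sketch: every PUA code point gets the same placeholder values, General_Category Co and no name, so nothing in the data tells software that a given PUA character is a letter. The helper names are hypothetical.

```python
import unicodedata

# The Private Use Areas as defined by the Unicode Standard.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


def is_private_use(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def describe(ch: str) -> str:
    """Code point, General_Category and name (if any) of one character."""
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {name}"


if __name__ == "__main__":
    # A Latin letter, a Han ideograph, and a BMP PUA code point.
    for ch in ("a", "\u4E00", "\uF8D0"):
        print(describe(ch), "(private use)" if is_private_use(ord(ch)) else "")
```

A Latin letter reports Ll and a Han ideograph Lo, while the PUA code point reports only Co, which is why a renderer has to fall back on guesses (Han-like or otherwise) for everything else.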

In any case, I am absolutely certain :-) :-) that the arbitrary mid-word
line breaking is what has discouraged would-be readers from pointing out
the typo (since fixed) in my transcription of a Dorothy Parker poem:

http://users.adelphia.net/~dewell/sopp-ew.html

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/



Chris Jacobs
2003-10-20 02:34:27 UTC
----- Original Message -----
From: "Doug Ewell" <***@adelphia.net>
To: "Unicode Mailing List" <***@unicode.org>
Cc: <***@att.net>; "Tom Gewecke" <***@bluesky.org>
Sent: Sunday, October 19, 2003 8:32 PM
Subject: Re: Klingons and their allies - Beyond 17 planes
Post by Doug Ewell
Post by j***@att.net
In addition to the problem of the OS substituting improper glyphs
from inappropriate fonts unexpectedly, there's often a problem with
line breaking.
Since the PUA has no properties, some applications seem to ignore the
space character and break lines arbitrarily, splitting words in the
middle.
That's exactly what happens in my sample pages. I didn't think it was
because the PUA had "no" properties so much as "default" properties,
which (as Thomas Chan indicated) might be Han-based or Han-influenced.
You can always switch to a font that will display glyphs for your PUA
characters, but it's harder to adapt a rendering engine to observe PUA
character properties.
One problem is that there seems to be no way in plain-text Unicode to specify
who is in charge of a particular interpretation of the PUA.

As I understand the position of the designers of Unicode, they definitely
don't want to be in charge of this and want to let the users of the PUA
fight it out among themselves.

Nevertheless, I think that if Unicode doesn't want to decide how the PUA is
to be interpreted, it should at the very least provide a mechanism by which a
user of the PUA can specify which specification he prefers.

I plan to propose such a mechanism:

I want to propose a char with the following properties:

Scalar Value: U+E0002

This starts a PUA interpretation selector tag.
The content of the tag is a font family name.
For all PUA chars between this tag and the corresponding Cancel tag the
copyright holder of the font is the sole authority about how the PUA should
be interpreted.

Any comments?
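To make the proposal concrete, here is one way such a tag could be serialized, modelled on the Plane 14 language-tag scheme, in which tag characters U+E0020..U+E007E mirror printable ASCII and U+E007F is CANCEL TAG. A minimal Python sketch: U+E0002 is only proposed here and is not an assigned character, the function names and font name are hypothetical, and the parser is simplified (one tag, ended by a bare CANCEL TAG).

```python
from typing import Optional, Tuple

# HYPOTHETICAL: U+E0002 is only proposed in this message; it is not assigned.
PUA_INTERPRETATION_TAG = 0xE0002
TAG_BASE = 0xE0000        # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E
CANCEL_TAG = 0xE007F


def make_tagged_run(font_family: str, pua_text: str) -> str:
    """Wrap pua_text in the proposed selector tag, spelled with tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in font_family):
        raise ValueError("tag content must be printable ASCII")
    tag = chr(PUA_INTERPRETATION_TAG) + "".join(chr(TAG_BASE + ord(c)) for c in font_family)
    return tag + pua_text + chr(CANCEL_TAG)


def read_tagged_run(text: str) -> Tuple[Optional[str], str]:
    """Return (font family name or None, text with the tag characters removed).

    Simplified: handles a single tag and treats a bare CANCEL TAG as its end.
    """
    family, body = [], []
    in_tag_content = False
    for ch in text:
        cp = ord(ch)
        if cp == PUA_INTERPRETATION_TAG:
            in_tag_content = True
        elif cp == CANCEL_TAG:
            in_tag_content = False
        elif TAG_BASE + 0x20 <= cp <= TAG_BASE + 0x7E:
            if in_tag_content:
                family.append(chr(cp - TAG_BASE))
        else:
            in_tag_content = False   # the first ordinary character ends the tag
            body.append(ch)
    return ("".join(family) or None), "".join(body)


if __name__ == "__main__":
    run = make_tagged_run("Example Font", "\uF8D0\uF8D1")   # hypothetical font name
    print(len(run), read_tagged_run(run))
```

Even this toy version has to decide what ends the tag content and what a bare CANCEL TAG means, details any real proposal would need to specify.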
Post by Doug Ewell
In any case, I am absolutely certain :-) :-) that the arbitrary mid-word
line breaking is what has discouraged would-be readers from pointing out
http://users.adelphia.net/~dewell/sopp-ew.html
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
Asmus Freytag
2003-10-20 03:59:16 UTC
Why does this have to be in 'plain text'??

Plain text can be streams or strings. For streams, such a mechanism might
make sense, if you could identify a compelling case that's not better
handled by HTML, XML etc.

For strings, embedding font names in front of characters just violates some
implicit assumptions, e.g. that the average string is 'short', that the
number of bytes is a small and at least probabilistically determinable
multiple of the number of characters, etc. etc. Not to forget that strings
are often assumed to be the plainest of plain text.
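A quick way to see the cost described here is to measure a short PUA string before and after prefixing it with a font-name tag of the kind proposed in this thread. A minimal Python sketch; the tag layout (a proposed, unassigned U+E0002, Plane 14 tag characters, then U+E007F CANCEL TAG) and the font name are assumptions for illustration.

```python
# HYPOTHETICAL tag layout: proposed U+E0002, tag characters, CANCEL TAG.
TAG_START = 0xE0002
TAG_BASE = 0xE0000        # tag characters mirror ASCII, offset by 0xE0000
CANCEL_TAG = 0xE007F


def tagged(font_family: str, text: str) -> str:
    """Prefix text with a font-name tag and close it with CANCEL TAG."""
    tag = chr(TAG_START) + "".join(chr(TAG_BASE + ord(c)) for c in font_family)
    return tag + text + chr(CANCEL_TAG)


plain = "\uF8D0"                              # one BMP PUA character
marked = tagged("Example Font", plain)       # hypothetical font name

for label, s in (("plain", plain), ("tagged", marked)):
    print(f"{label:7} {len(s):3} code points {len(s.encode('utf-8')):3} UTF-8 bytes")
```

Every Plane 14 tag character takes 4 bytes in UTF-8, so the single visible character grows from 3 bytes to 59, and the byte count now depends on the length of a font name rather than on the text itself.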

A lot of architectures will break if you violate these implicit assumptions
by hosting a mini-markup inside a string. And for at least half of them (my
scientific estimate) performance will prevent them from doing anything
about it, so you are stuck.

The language tagging scheme was designed for use with a string-based
protocol, but one where the protocol contained the rules of interpreting
any tagging. What you are proposing is something that's supposed to just
infect any run of characters without warning.

Who's going to implement this, why, where and when?

A./
Post by Chris Jacobs
----- Original Message -----
Sent: Sunday, October 19, 2003 8:32 PM
Subject: Re: Klingons and their allies - Beyond 17 planes
Post by Doug Ewell
Post by j***@att.net
In addition to the problem of the OS substituting improper glyphs
from inappropriate fonts unexpectedly, there's often a problem with
line breaking.
Since the PUA has no properties, some applications seem to ignore the
space character and break lines arbitrarily, splitting words in the
middle.
That's exactly what happens in my sample pages. I didn't think it was
because the PUA had "no" properties so much as "default" properties,
which (as Thomas Chan indicated) might be Han-based or Han-influenced.
You can always switch to a font that will display glyphs for your PUA
characters, but it's harder to adapt a rendering engine to observe PUA
character properties.
One problem is that there seems to be no way in plain-text Unicode to specify
who is in charge of a particular interpretation of the PUA.
As I understand the position of the designers of Unicode, they definitely
don't want to be in charge of this and want to let the users of the PUA
fight it out among themselves.
Nevertheless, I think that if Unicode doesn't want to decide how the PUA is
to be interpreted, it should at the very least provide a mechanism by which a
user of the PUA can specify which specification he prefers.
Scalar Value: U+E0002
This starts a PUA interpretation selector tag.
The content of the tag is a Font family name.
For all PUA chars between this tag and the corresponding Cancel tag the
copyright holder of the font is the sole authority about how the PUA should
be interpreted.
Any comments?
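Here is a minimal Python sketch of how a text run might be wrapped under this proposal. U+E0002 as a "PUA interpretation selector" is hypothetical: it is the value proposed here, not an assigned character. U+E007F CANCEL TAG and the ASCII-parallel tag characters U+E0020..U+E007E are real Plane 14 characters. The sketch assumes the font name ends at the first non-tag character, following the way Plane 14 language tags are delimited. The function names are invented for the illustration.

```python
PUA_SELECTOR = 0xE0002  # hypothetical: proposed selector, not assigned by Unicode
CANCEL_TAG = 0xE007F    # U+E007F CANCEL TAG


def to_tag_chars(ascii_text: str) -> str:
    """Spell printable ASCII using the tag characters U+E0020..U+E007E."""
    for c in ascii_text:
        if not 0x20 <= ord(c) <= 0x7E:
            raise ValueError(f"no tag character for {c!r}")
    return "".join(chr(0xE0000 + ord(c)) for c in ascii_text)


def wrap_pua_run(font_family: str, pua_text: str) -> str:
    """Selector + font name in tag characters, then the PUA text, then CANCEL TAG."""
    return chr(PUA_SELECTOR) + to_tag_chars(font_family) + pua_text + chr(CANCEL_TAG)


run = wrap_pua_run("Foobar", "\ue017\ue009")
print(" ".join(f"U+{ord(c):04X}" for c in run))
# U+E0002 U+E0046 U+E006F U+E006F U+E0062 U+E0061 U+E0072 U+E017 U+E009 U+E007F
```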
Post by Doug Ewell
In any case, I am absolutely certain :-) :-) that the arbitrary mid-word
line breaking is what has discouraged would-be readers from pointing out
http://users.adelphia.net/~dewell/sopp-ew.html
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
------------------------ Yahoo! Groups Sponsor ---------------------~-->
KnowledgeStorm has over 22,000 B2B technology solutions. The most comprehensive IT buyers' information available. Research, compare, decide. E-Commerce | Application Dev | Accounting-Finance | Healthcare | Project Mgt | Sales-Marketing | More
http://us.click.yahoo.com/IMai8D/UYQGAA/cIoLAA/8FfwlB/TM
---------------------------------------------------------------------~->

To Unsubscribe, send a blank message to: unicode-***@yahooGroups.com

This mailing list is just an archive. The instructions to join the true Unicode List are on http://www.unicode.org/unicode/consortium/distlist.html


Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
Curtis Clark
2003-10-20 04:06:16 UTC
Post by Chris Jacobs
One problem is that there seems to be no way in plain-text Unicode to specify
who is in charge of a particular interpretation of the PUA.
At last! Another use for Plane 14! :-)
--
Curtis Clark http://www.csupomona.edu/~jcclark/
Mockingbird Font Works http://www.mockfont.com/



Doug Ewell
2003-10-20 06:18:56 UTC
Post by Chris Jacobs
As I understand the position of the designers of Unicode they
definitely don't want to be in charge of this and want to let the
users of the PUA fight it out among themselves.
"Come to a mutual agreement" is probably more in the spirit. I doubt
the original designers of Unicode expected much competition among PUA
mappings.
Post by Chris Jacobs
Nevertheless, I think that if Unicode doesn't want to decide how the PUA is
to be interpreted, it should at the very least provide a mechanism
by which a user of the PUA can specify which specification he
prefers.
I'm pretty sure UTC wants to stay as far away as possible from something
like this that could be misunderstood as running a PUA registry.
Post by Chris Jacobs
Scalar Value: U+E0002
This starts a PUA interpretation selector tag.
The content of the tag is a Font family name.
For all PUA chars between this tag and the corresponding Cancel tag
the copyright holder of the font is the sole authority about how the
PUA should be interpreted.
Any comments?
Plenty. You're assuming a one-to-one relationship between font and PUA
mapping, and especially between font maker and PUA registration
authority, that doesn't necessarily exist. Code2000, for instance, is
not the only font that covers some of the ConScript ranges, particularly
Tengwar and Klingon. For the PUA mappings established by Microsoft and
Apple, there are numerous fonts distributed not only by those companies,
but by others.

Ideally, PUA characters should also have complete (or nearly complete)
information on Unicode properties, such as directionality and combining
class. This isn't necessarily the kind of information you could get by
asking the font vendor or examining a font file. Font files don't even
have Unicode character names, just short identifiers like "aacute."
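You can see this gap directly with Python's standard unicodedata module, which exposes the Unicode Character Database. For an ordinary character it returns a full name and properties. For a PUA code point it can only report defaults: General Category Co and no character name. A font file cannot supply this information either. The glyph name "aacute" in the comment is the conventional font glyph name; unicodedata does not return it.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few UCD properties for one character."""
    return {
        "code point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, None),  # None when the UCD has no name
        "category": unicodedata.category(ch),
        "bidi": unicodedata.bidirectional(ch),
        "combining class": unicodedata.combining(ch),
    }


print(describe("\u00e1"))  # glyph usually called "aacute" in a font
print(describe("\ue017"))  # a Private Use code point: only default values
```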

Despite the wording "For all PUA chars...", there is no real guarantee
that an implementation would respect this font tag for PUA characters
only, and I think there'd have to be.

Finally, there is not a great sentiment within the UTC for expanding the
role of Plane 14 tags in general. In my November 2002 paper "In defense
of Plane 14 language tags" (L2/02-396R), I wrote that deprecating those
tags (which was under discussion at the time) would implicitly deprecate
the entire concept of Plane 14 tagging, and discourage the introduction
of new, non-language-related Plane 14 tags like the one you describe.
As it turns out, there are those who feel that would be a good thing.

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/



Marco Cimarosti
2003-10-20 08:23:19 UTC
Post by Peter Kirk
Are we talking about a real non-Latin script, some kind of
syllabary or logographic script, for Swahili and other
Bantu languages? [...]
Or did someone not notice that Marco's comments were about
the word "joke"?
Indeed.

In the last few months, I have been relatively serious, so someone may not
know or remember that I am the unofficial Unicode List's clown.
*<|:o)

_ Marco


Philippe Verdy
2003-10-20 09:43:20 UTC
Post by Marco Cimarosti
Post by Peter Kirk
Are we talking about a real non-Latin script, some kind of
syllabary or logographic script, for Swahili and other
Bantu languages? [...]
Or did someone not notice that Marco's comments were about
the word "joke"?
Indeed.
In the last few months, I have been relatively serious, so someone may not
know or remember that I am the unofficial Unicode List's clown.
Please accept my apologies: I had not checked the script used by Swahili when it
was discussed (joked about). However, after reading your message, I thought
that this language was mostly transliterated into ASCII, and that there may have
existed some historic native scripts for writing this language, in a context
where culture is/was mostly transmitted orally.

As Africa has been influenced by many foreign invasions, there may in fact
exist other scripts used to represent this language (notably some Semitic
script). Do you know whether such historic texts exist for this language, written
in Arabic, Ethiopic, or some Indic script imported by merchants or
missionaries?



Peter Kirk
2003-10-20 11:12:04 UTC
Post by Philippe Verdy
Post by Marco Cimarosti
Post by Peter Kirk
Are we talking about a real non-Latin script, some kind of
syllabary or logographic script, for Swahili and other
Bantu languages? [...]
Or did someone not notice that Marco's comments were about
the word "joke"?
Indeed.
In the last few months, I have been relatively serious, so someone may not
know or remember that I am the unofficial Unicode List's clown.
Please accept my apologies: I had not checked the script used by Swahili when it
was discussed (joked about). However, after reading your message, I thought
that this language was mostly transliterated into ASCII, and that there may have
existed some historic native scripts for writing this language, in a context
where culture is/was mostly transmitted orally.
As Africa has been influenced by many foreign invasions, there may in fact
exist other scripts used to represent this language (notably some Semitic
script). Do you know whether such historic texts exist for this language, written
in Arabic, Ethiopic, or some Indic script imported by merchants or
missionaries?
The best candidate I can find for a historic script for a Bantu language is
the script for the Bamun or Bamum language of Cameroon, which is
"Bantoid" but not "Narrow Bantu" (see
http://www.ethnologue.com/show_language.asp?code=BAX and
http://www.ethnologue.com/show_family.asp?subid=22). The script was
originally devised about 1897 with 466 picto-ideographic characters, was
developed further and simplified to a 72-character phonetic script in
1918, and was used widely until about 1933, after which it fell into gradual
disuse. Presumably this is the same as the "Bamum" roadmapped for
1900-1AFF in plane 1. The other roadmapped indigenous African scripts,
Vai, Mende and Bassa (not to forget Tifinagh, Egyptian and Meroitic,
also Ethiopic, which is not strictly indigenous), are not for Bantu
languages. And the same is true of one African script which is well
attested but apparently not roadmapped: Egyptian demotic.
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/




Jill Ramonsky
2003-10-20 09:10:17 UTC
I challenge you to find a document or script which used the characters
represented by codepoints U+E0020 to U+E007F /before/ their inclusion in
Unicode.

In point of fact, I challenge you to find any document, script,
application, or indeed any use whatsoever for certain particular ones of
these tag characters. Name me one document in which the character
U+E0040 (tag commercial at) or U+E007C (tag vertical line) is used. At all.

Jill
-----Original Message-----
Sent: Friday, October 17, 2003 4:37 PM
Subject: RE: Klingons and their allies - Beyond 17 planes
Post by Jill Ramonsky
The fact is that Klingon language publications, by and large, use the
Romanized transcription presented in The Klingon Dictionary. This is
arguably a chicken-and-egg situation
No, it's not. People have been creating documents for scripts that are
not supported in any industry standard for years.
Peter
Peter Kirk
2003-10-20 10:22:02 UTC
Post by Jill Ramonsky
I challenge you to find a document or script which used the characters
represented by codepoints U+E0020 to U+E007F /before/ their inclusion
in Unicode.
In point of fact, I challenge you to find any document, script,
application, or indeed any use whatsoever for certain particular ones of
these tag characters. Name me one document in which the character
U+E0040 (tag commercial at) or U+E007C (tag vertical line) is used. At all.
Jill
It depends what is considered to be a tag. Currently only language tags
are defined in Unicode. But if "Author" were considered to be a
potentially definable tag, I could cite your posting as an example of
the use of the logical character "tag commercial at", not of course as
the Unicode character but as its logical equivalent in an alternative
encoding and markup. For your posting included "@" in "From: Jill
Ramonsky <***@Aculab.com>".
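The tag characters Jill names are a mechanical mirror of printable ASCII. Each one sits at U+E0000 plus the ASCII code, which is what lets Peter read "@" as a logical "tag commercial at". Here is a short Python check against the standard unicodedata module. The helper names are made up for this example.

```python
import unicodedata

TAG_OFFSET = 0xE0000  # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def ascii_to_tag(c: str) -> str:
    """Return the tag character that mirrors one printable ASCII character."""
    if not 0x20 <= ord(c) <= 0x7E:
        raise ValueError(f"no tag character for {c!r}")
    return chr(TAG_OFFSET + ord(c))


def tag_to_ascii(t: str) -> str:
    """Return the ASCII character mirrored by one tag character."""
    if not 0xE0020 <= ord(t) <= 0xE007E:
        raise ValueError(f"not an ASCII-parallel tag character: U+{ord(t):04X}")
    return chr(ord(t) - TAG_OFFSET)


for c in "@|":
    t = ascii_to_tag(c)
    print(f"{c!r} <-> U+{ord(t):04X} {unicodedata.name(t)}")
# '@' <-> U+E0040 TAG COMMERCIAL AT
# '|' <-> U+E007C TAG VERTICAL LINE
```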
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/




Jill Ramonsky
2003-10-20 10:47:00 UTC
So, if I have understood this correctly (which is by no means certain),
these tag characters were added to Unicode in the vague hope that some
people might one day start using them, or on the off-chance that someone
might one day need them. This seems a far cry from the demand that
actual printed publications must appear in a given script before Unicode
will accept it. The reason given for the rejection of Klingon doesn't
seem to have been applied to the tag characters.

So what's the explanation for this discrepancy then? Hypocrisy?
Prejudice? Please enlighten me.

Alternatively, maybe I've misunderstood and there is, in fact, no such
requirement that a script appear in published books before it may be
added to Unicode ... in which case, of course, it cannot be used as an
argument for the Consortium's rejection of Klingon.

Jill
-----Original Message-----
Sent: Monday, October 20, 2003 11:22 AM
To: Jill Ramonsky
Subject: Re: Klingons and their allies - Beyond 17 planes
Post by Jill Ramonsky
I challenge you to find a document or script which used the characters
represented by codepoints U+E0020 to U+E007F /before/ their inclusion
in Unicode.
In point of fact, I challenge you to find any document, script,
application, or indeed any use whatsoever for certain particular ones of
these tag characters. Name me one document in which the character
U+E0040 (tag commercial at) or U+E007C (tag vertical line) is used. At all.
Jill
It depends what is considered to be a tag. Currently only language tags
are defined in Unicode. But if "Author" were considered to be a
potentially definable tag, I could cite your posting as an example of
the use of the logical character "tag commercial at", not of course as
the Unicode character but as its logical equivalent in an alternative
--
Peter Kirk
http://www.qaya.org/
John Cowan
2003-10-20 11:43:08 UTC
Post by Jill Ramonsky
So, if I have understood this correctly (which is by no means certain),
these tag characters were added to Unicode in the vague hope that some
people might one day start using them, or on the off-chance that someone
might one day need them.
Not.

They were added in order to ward off an abuse of UTF-8 by a certain
committee that insisted it needed lightweight language tagging in
a certain computer protocol. The tags were never a "script". Everyone
on the UTC sincerely hopes, I believe, that they never get used at all.
For 99.9% of all use cases, ordinary markup is the Right Thing for
language tagging.
Post by Jill Ramonsky
Alternatively, maybe I've misunderstood and there is, in fact, no such
requirement that a script appear in published books before it may be
added to Unicode ... in which case, of course, it cannot be used as an
argument for the Consortium's rejection of Klingon.
"Books" is an equivoque. Publishing (i.e. distributing to the public)
in some medium of writing is certainly an important factor.
--
John Cowan <***@reutershealth.com>
http://www.ccil.org/~cowan http://www.reutershealth.com
Charles li reis, nostre emperesdre magnes,
Set anz totz pleinz ad ested in Espagnes.


Philippe Verdy
2003-10-20 15:10:46 UTC
Post by John Cowan
Post by Jill Ramonsky
So, if I have understood this correctly (which is by no means certain),
these tag characters were added to Unicode in the vague hope that some
people might one day start using them, or on the off-chance that someone
might one day need them.
Not.
They were added in order to ward off an abuse of UTF-8 by a certain
committee that insisted it needed lightweight language tagging in
a certain computer protocol. The tags were never a "script". Everyone
on the UTC sincerely hopes, I believe, that they never get used at all.
For 99.9% of all use cases, ordinary markup is the Right Thing for
language tagging.
I also agree that "language" tags are not needed in Unicode
(otherwise the text they surround would have to be treated specially
for a specific language, with distinct character properties, clustering,
rendering and so on, so that the text remains legible; this would
break the unification model).

However, I think it's a good idea to have qualifying "script" tags in the area
that Unicode will not regulate: the PUAs. This allows adding semantics to
them and can effectively close the gap in their correct interpretation,
notably when Unicode text with PUAs from various sources is merged
into a single document: these PUAs can then be interpreted correctly and
less ambiguously within their context.

This also means that, in this case, PUAs would be effectively usable
and interchangeable between systems using distinct PUA conventions
without needing extra planes. All that would remain is to describe
which script tags can be used, how they should be coded, and whether a
registry (like the IANA charsets database) should preferably be used,
with this registry containing charmaps and assignments for these PUAs.



Marco Cimarosti
2003-10-20 12:23:57 UTC
Post by Philippe Verdy
As Africa has been influenced by many foreign invasions,
there may in fact exist other scripts to represent this
language [...]
Yes: until the recent past, Swahili was also commonly written in the Arabic
alphabet.

_ Marco




Marco Cimarosti
2003-10-20 13:56:59 UTC
Post by Chris Jacobs
[...]
Nevertheless, I think that if Unicode doesn't want to decide how the
PUA is to be interpreted
Please take notice of this "interpreted": I'll come back to this soon.
Post by Chris Jacobs
it should at the very least provide a mechanism by which
a user of the PUA can specify which specification he
prefers.
Scalar Value: U+E0002
This starts a PUA interpretation
Again, please take notice of this "interpretation".
Post by Chris Jacobs
selector tag. The content of the tag is a Font family
name. For all PUA chars between this tag and the
corresponding Cancel tag the copyright holder of the font
is the sole authority about how the PUA should
be interpreted.
Again, "interpreted"...
Post by Chris Jacobs
Any comments?
Yes.

A font tells me how a certain run of text should be *displayed* in rich
text, not how it should be *interpreted* in plain text.

Imagine that I have been asked to write a function AreTheseLetters() which
gets a string argument (i.e., a piece of plain text) and returns a Boolean
value indicating whether all the characters in it are letters.

For non-PUA characters, I already implemented this using Unicode's "General
Category" property: I decided that all characters whose General Category is
"L*" are "letters". My default assumption about PUA characters is that they
are not letters.
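Marco's function is easy to sketch with Python's unicodedata module. The snake_case name are_these_letters is just the Python spelling of his hypothetical AreTheseLetters(). The explicit Co branch records his default assumption about PUA characters. That is the branch a PUA interpretation tag would have to override.

```python
import unicodedata


def are_these_letters(text: str) -> bool:
    """True if every character's General Category is one of the L* values."""
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Co":
            # Private Use: no interpretation available, so assume "not a letter".
            return False
        if not category.startswith("L"):
            return False
    return True


print(are_these_letters("Foobar"))        # True
print(are_these_letters("Foo1"))          # False: "1" is Nd
print(are_these_letters("\ue017\ue009"))  # False: PUA, default assumption
```

Note that an empty string returns True vacuously. Whether that is the right answer is a separate design decision.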

So far so good. Now I want to use your PUA Plane 14 tags, if present, to
override the above assumption about PUA characters. E.g., imagine that my
string contains this:

(U+0E0000 U+0E0002 U+0E0046 U+0E006F U+0E006F U+0E0062 U+0E0061
U+0E0072 U+0E002E U+0E0074 U+0E0074 U+0E0066 U+0E007F U+E017 U+E009)

This is what I am going to do:

1) I parse the tags at the beginning of the string and save the relevant
information in a temporary variable which we will call PuaInterpretation;

2) I remove the tags.
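Steps 1 and 2 can be sketched in Python as below, assuming the tag layout of the example above:

- The string starts with a run of Plane 14 characters.
- Within that run, the hypothetical selector U+E0002 introduces a name spelled in tag characters.
- U+E007F, or the end of the tag run, terminates the name.
- Any other tag character in the run, such as the leading U+E0000, is skipped.

The function name is made up for the example.

```python
from typing import Optional, Tuple

PUA_SELECTOR = 0xE0002  # hypothetical selector from the proposal, not assigned
CANCEL_TAG = 0xE007F    # U+E007F CANCEL TAG


def split_pua_tag(text: str) -> Tuple[Optional[str], str]:
    """Return (PuaInterpretation, text with the leading tag run removed)."""
    # Find the run of Plane 14 tag-block characters at the start.
    end = 0
    while end < len(text) and 0xE0000 <= ord(text[end]) <= 0xE007F:
        end += 1
    tag_run, rest = text[:end], text[end:]

    # Step 1: decode the name that follows the selector.
    interpretation = None
    name_chars = []
    capturing = False
    for ch in tag_run:
        cp = ord(ch)
        if cp == PUA_SELECTOR:
            capturing, name_chars = True, []
        elif cp == CANCEL_TAG:
            if capturing:
                interpretation = "".join(name_chars)
            capturing = False
        elif capturing and 0xE0020 <= cp <= 0xE007E:
            name_chars.append(chr(cp - 0xE0000))
    if capturing:  # the tag run ended without CANCEL TAG
        interpretation = "".join(name_chars)

    # Step 2: the tags are dropped; only the remaining text is returned.
    return interpretation, rest


sample = ("\U000E0000\U000E0002"
          + "".join(chr(0xE0000 + ord(c)) for c in "Foobar.ttf")
          + "\U000E007F\ue017\ue009")
interp, remainder = split_pua_tag(sample)
print(interp, [f"U+{ord(c):04X}" for c in remainder])  # Foobar.ttf ['U+E017', 'U+E009']
```

A string with no leading tags comes back unchanged, with an interpretation of None.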

Now, my PuaInterpretation variable contains the following information:

Foobar.ttf

And my string contains the following text:


(U+E017 U+E009)

Now, what's the next step? What am I supposed to do to find out whether,
according to the PUA interpretation called "Foobar.ttf", U+E017 and U+E009
are letters or not?

_ Marco



Kent Karlsson
2003-10-20 16:05:48 UTC
...
Post by Marco Cimarosti
For non-PUA characters, I already implemented this using Unicode's
"General Category" property: I decided that all characters whose General
Category is "L*" are "letters".
Nit: That isn't quite true (but I'm not doubting your choice). The
HANGUL * FILLER characters aren't letters, even though they are of
GC Lo. Indeed, they are even invisible (but the Jamo ones are needed
for representing isolated letters using Jamos in the adopted architecture
for Hangul in Unicode; the non-Jamo Hangul fillers are there just
for compatibility with an older standard, nothing lettery about them).
Nor are LAO ELLIPSIS and THAI CHARACTER PAIYANNOI letters,
though Lo. They are really punctuation.
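Kent's nit can be folded into a letter test by subtracting an explicit exception list. The set below contains only the characters he mentions. It is illustrative, not an exhaustive audit of the Lo category. The Python sketch checks these characters against the standard unicodedata module.

```python
import unicodedata

# Characters with General Category Lo that do not behave as letters.
NOT_REALLY_LETTERS = {
    0x115F,  # HANGUL CHOSEONG FILLER
    0x1160,  # HANGUL JUNGSEONG FILLER
    0x3164,  # HANGUL FILLER (compatibility)
    0xFFA0,  # HALFWIDTH HANGUL FILLER (compatibility)
    0x0EAF,  # LAO ELLIPSIS
    0x0E2F,  # THAI CHARACTER PAIYANNOI
}


def is_letter(ch: str) -> bool:
    """L* General Category, minus the known non-letters above."""
    return (unicodedata.category(ch).startswith("L")
            and ord(ch) not in NOT_REALLY_LETTERS)


for cp in sorted(NOT_REALLY_LETTERS):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: "
          f"gc={unicodedata.category(ch)}, is_letter={is_letter(ch)}")
```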
Post by Marco Cimarosti
My default assumption about PUA characters is that they are not letters.
Hmm. A common default seems to be to treat them as CJK. Non-PUA
CJK is Lo... (Except for radicals, which are So.) Granted, I'm not too
fond of that default myself. The situation is a bit similar for Braille,
where the "glyphs" are given, but nothing much else.

/kent k
Philippe Verdy
2003-10-21 08:30:19 UTC
Post by Marco Cimarosti
Foobar.ttf

(U+E017 U+E009)
Now, what's the next step? What am I supposed to do to find out whether,
according to the PUA interpretation called "Foobar.ttf", U+E017 and U+E009
are letters or not?
Indeed, I don't like the idea of tagging PUA text with "font name
tags".

I would rather tag the PUA text with "script name tags" (I mean
the extended user-defined script codes like "x-klingon", followed by
a base code point indicator and a codespace length, like
"x-klingon;b=E000;l=80"):

- this gives a real interpretation to PUAs, evaluated in their context,

- it allows remapping them locally to other ranges in case of conflict
between multiple PUA conventions in use

- the script indicator name can be mapped locally to a character properties
database, indexed at the relative codepoint in the PUA convention codespace.

- any number of fonts can be designed to work with PUAs even if they are
sharing conflicting codespaces.

- any language can use this system.

- no more need for extra planes

- experimentation with new scripts still not standardized is possible,
including for character properties, breaking behavior, layout, grapheme
clustering, ...

- emulation of new standardized scripts becomes possible on previous
implementations that lack support for new characters or scripts...
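Here is a minimal Python sketch of how a descriptor like "x-klingon;b=E000;l=80" might be parsed. It computes relative code points, which serve as the index into a per-script property table. It can also remap a run to another base when two conventions collide. The proposal does not say how b and l are written. Reading both as hexadecimal is an assumption, so l=80 means 128 code points here. The class and function names are also assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PuaConvention:
    script: str  # user-defined script code, e.g. "x-klingon"
    base: int    # first code point of the convention's PUA range
    length: int  # number of code points in that range

    def relative(self, cp: int) -> int:
        """Offset of cp inside this convention's codespace."""
        offset = cp - self.base
        if not 0 <= offset < self.length:
            raise ValueError(f"U+{cp:04X} is outside {self.script}")
        return offset

    def remap(self, cp: int, new_base: int) -> int:
        """Move cp to the same offset in a range starting at new_base."""
        return new_base + self.relative(cp)


def parse_script_tag(tag: str) -> PuaConvention:
    """Parse "name;b=HEX;l=HEX" (assumption: both values are hexadecimal)."""
    script, *params = tag.split(";")
    fields = dict(p.split("=", 1) for p in params)
    return PuaConvention(script, int(fields["b"], 16), int(fields["l"], 16))


klingon = parse_script_tag("x-klingon;b=E000;l=80")
print(klingon)                             # PuaConvention(script='x-klingon', base=57344, length=128)
print(hex(klingon.relative(0xE017)))       # 0x17
print(hex(klingon.remap(0xE017, 0xF000)))  # 0xf017
```

The relative offset is what would let one local property database serve a script wherever it happens to be mapped.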



j***@att.net
2003-10-21 01:41:48 UTC
.
Marco Cimarosti wrote,
Post by Marco Cimarosti
So far so good. Now I want to use your PUA Plane 14 tags, if present, to
override the above assumption about PUA characters. E.g., imagine that my
󠀀󠀂󠁆󠁯󠁏󠁢󠁡󠁲󠀮󠁎󠁎󠁊󠁿> ?
(U+0E0000 U+0E0002 U+0E0046 U+0E006F U+0E006F U+0E0062 U+0E0061
U+0E0072 U+0E002E U+0E0074 U+0E0074 U+0E0066 U+0E007F U+E017 U+E009)
1) I parse the tags at the beginning of the string and save the relevant
information in a temporary variable which we will call PuaInterpretation;
2) I remove the tags.
Foobar.ttf

(U+E017 U+E009)
Now, what's the next step? What am I supposed to do to find out whether,
according to the PUA interpretation called "Foobar.ttf", U+E017 and U+E009
are letters or not?
Hmmm, the UTF-8 non-BMP string apparently got munged.

Anyway, the next step is for your function to load the file
"Foobar.puapropertiesclass".

This file is a plain-text file following the same format as UNIDATA. It's
extensible -- if the font vendor doesn't include it with the font download,
then the savvy end-user can simply construct it with a plain-text editor.

Now your function has all the necessary information and can determine
whether the PUA code points are letters, or not.
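James's suggestion can be sketched in Python as follows. The file name "Foobar.puapropertiesclass" comes from his message. The contents below are invented for the example, and the sketch reads only the first three UnicodeData.txt fields: code point, name and General Category. A PUA code point missing from the file falls back to the default Co, so it still counts as "not a letter".

```python
import unicodedata

# Invented contents of a hypothetical "Foobar.puapropertiesclass" file,
# written in the semicolon-separated UnicodeData.txt format.
FOOBAR_PROPERTIES = """\
E009;FOOBAR LETTER KA;Lo;0;L;;;;;N;;;;;
E017;FOOBAR FULL STOP;Po;0;ON;;;;;N;;;;;
"""


def load_pua_categories(lines) -> dict:
    """Map code point -> General Category from UnicodeData.txt-style lines."""
    categories = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.split(";")
        categories[int(fields[0], 16)] = fields[2]
    return categories


def are_these_letters(text: str, pua_categories: dict) -> bool:
    """L* test, letting the PUA property file override the default "Co"."""
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Co":
            category = pua_categories.get(ord(ch), "Co")
        if not category.startswith("L"):
            return False
    return True


foobar = load_pua_categories(FOOBAR_PROPERTIES.splitlines())
print(are_these_letters("\ue009", foobar))        # True
print(are_these_letters("\ue017\ue009", foobar))  # False: U+E017 is Po here
```

With a real file, the same loader could be fed an open text file object, since it only iterates over lines.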

Best regards,

James Kass
.

